On Fri, Dec 5, 2008 at 11:03 AM, Henning P. Schmiedehausen <
[EMAIL PROTECTED]> wrote:

> [EMAIL PROTECTED] writes:
>
> >Author: etnu
> >Date: Fri Dec  5 03:49:32 2008
> >New Revision: 723717
>
> >URL: http://svn.apache.org/viewvc?rev=723717&view=rev
> >Log:
> >Improved JSON serialization library that is several times faster than the
> closest alternative in CPU usage and that produces a few orders of magnitude
> less garbage by avoiding making many allocations when escaping strings.
> Included is a benchmark that demonstrates the CPU gap, but running it under
> a profiler will show the far more dramatic difference in memory usage.
>
> Where is the JIRA for this? Where are the numbers you are talking about?


Run the benchmark suite that is included and mentioned in the commit
message. For best results, run it under a profiler to see how terrible the
serialization mechanism in org.json.JSONObject is.

There's no JIRA because we agreed to a commit then review process for code
changes that don't change behavior, due to the pain of creating JIRA tickets
relative to the benefits in this case. We're really banking on eventual
improvements to the codereview.appspot site that will allow for small bug
fix / performance patches to be reviewed cleanly.

>+    } catch (IOException e) {
> >+      // Can't happen.
> >+    }
>
> if you are so sure, throw a RuntimeException. This would show you when your
> assumption was not correct.


StringBuilder can't throw an IOException. The signatures of append can throw
IOException only because IOBuffers are one class of appendable. Changing the
signatures to take a StringBuilder instead of Appendable eliminates the
IOException from ever occuring.  I suppose someone could add new code in
there that throws an IOException.


>
>
> >+        default:
> >+          if (current < ' ' || (current >= '\u0080' && current <
> '\u00a0') ||
> >+              (current >= '\u2000' && current < '\u2100')) {
> [...]
> >+                // The three possible alternative approaches for dealing
> with unicode characters are
> >+                // as follows:
> >+                // Method 1 (from json.org.JSONObject)
> >+                // 1. Append "000" + Integer.toHexString(current)
> >+                // 2. Truncate this value to 4 digits by using
> value.substring(value.length() - 4)
> >+                //
> >+                // Method 2 (from net.sf.json.JSONObject)
> >+                // This method is fairly unique because the entire thing
> uses an intermediate fixed
> >+                // size buffer of 1KB. It's an interesting approach, but
> overall performs worse than
> >+                // org.json
> >+                // 1. Append "000" + Integer.toHexString(current)
> >+                // 2. Append value.charAt(value.length() - 4)
> >+                // 2. Append value.charAt(value.length() - 3)
> >+                // 2. Append value.charAt(value.length() - 2)
> >+                // 2. Append value.charAt(value.length() - 1)
> >+                //
> >+                // Method 3 (previous experiment)
> >+                // 1. Calculate Integer.hexString(current)
> >+                // 2. for (int i = 0; i < 4 - value.length(); ++i) {
> buf.append('0'); }
> >+                // 3. buf.append(value)
> >+                //
> >+                // Of these, the second proved fastest. The current
> implementation performs slightly
> >+                // faster than the second in the benchmark found in
> JsonSerializerTest.
> >+                buf.append(UNICODE_TABLE[current]);
> [...]
>
> That *looks* very hackish. Are you sure that this is the fastest/best way
> to do this?


Which method looks hackish? A lookup table seemed a hell of a lot cleaner
than the hideous way that both org.json.JSONObject and
net.sf.json.JSONobject do it, which is why I included the long explanation
of the alternative schemes. The check at the beginning ensures that only
JSON that actually has to be escaped is escaped, avoiding output size blow
up in asian, cyrillic, or indic languages.


>
>
> The Sun JDK uses a shift (in java.util.Properties::saveConcert) which
> usually translates well into VM instructions. Maybe benchmark this,
> too.


For reference, it does this:

 if (((aChar < 0x0020) || (aChar > 0x007e)) & escapeUnicode ) {
                        outBuffer.append('\\');
                        outBuffer.append('u');
                        outBuffer.append(toHex((aChar >> 12) & 0xF));
                        outBuffer.append(toHex((aChar >>  8) & 0xF));
                        outBuffer.append(toHex((aChar >>  4) & 0xF));
                        outBuffer.append(toHex( aChar        & 0xF));

where toHex simply gets the hex value out of a lookup table (and is probably
inlined).

If you notice, though, this was designed to escape all unicode code points,
which is probably why sun opted for the smaller (16 char) lookup table.
Mapping every possible unicode code point in a lookup table would require
over a million entries. Fortunately, JSON only requires escaping a tiny
subset of code points, and the rest can be safely encoded as plain old UTF8.

toHex might exhibit better overall performance (a 16 byte array is more
likely to fit into cache than an ~20K entry) though, so I'll benchmark it
and compare.


>
>        Ciao
>            Henning
>
> --
> Henning P. Schmiedehausen - Palo Alto, California, U.S.A.
> [EMAIL PROTECTED] "We're Germans and we use Unix.
> [EMAIL PROTECTED]          That's a combination of two demographic groups
>                            known to have no sense of humour whatsoever."
>                               -- Hanno Mueller, de.comp.os.unix.programming
>

Reply via email to