Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-17 Thread Kenton Varda
I see. So in fact your code is quite possibly slower in non-ASCII cases? It sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

On Fri, May 7, 2010 at 6:21 PM, Evan Jones ev...@mit.edu wrote:

 Hence, I think what is happening is that the JDK allocates 4-5 times as much memory per encode as I do. In the cases where the data is ASCII, my code is faster, since it allocates exactly the right amount of space and doesn't need an extra copy. When the data is not ASCII, my code may still be faster, since it doesn't overallocate quite as much (in exchange, my code does many copies).






Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-17 Thread Evan Jones

On May 17, 2010, at 15:38 , Kenton Varda wrote:
 I see. So in fact your code is quite possibly slower in non-ASCII cases? It sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.


Yes. I've been playing with this a bit in my spare time since the last email, but I don't have any results I'm happy with yet. Rough notes:

* Encoding is (quite a bit?) faster than String.getBytes() if you assume one byte per character.
* If you guess the number of bytes per character poorly and have to do multiple allocations and copies, the regular Java version will win. If you get it right (even if you first guess 1 byte per character), it looks like it can be slightly faster than or on par with the Java version.
* Re-using a temporary byte[] for string encoding may be faster than String.getBytes(), which effectively allocates a temporary byte[] each time.



I'm going to try to rework my code with a slightly different policy (sketched in code below):

a) Assume 1 byte per character and attempt the encode. If we run out of space:
b) Use a shared temporary buffer and continue the encode. If we run out of space:
c) Allocate a worst-case 4-bytes-per-character buffer and finish the encode.



This should be much better than the JDK version for ASCII, a bit  
better for short strings that fit in the shared temporary buffer,  
and not significantly worse for the rest, but I'll need to test it to  
be sure.
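
A minimal sketch of that policy, assuming a java.nio.charset.CharsetEncoder-based implementation (the buffer size, helper names, and error handling are my illustration, not the actual patch):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

// Shared per-thread temporary buffer for step (b); 1024 bytes is a guess.
private static final ThreadLocal<byte[]> TEMP = new ThreadLocal<byte[]>() {
    @Override protected byte[] initialValue() { return new byte[1024]; }
};

static byte[] encodeUtf8(String s) {
    CharsetEncoder enc = Charset.forName("UTF-8").newEncoder();
    CharBuffer in = CharBuffer.wrap(s);

    // (a) Optimistic: 1 byte per character, exact for pure ASCII.
    ByteBuffer first = ByteBuffer.wrap(new byte[s.length()]);
    if (!enc.encode(in, first, true).isOverflow()) {
        enc.flush(first);
        return first.position() == first.capacity()
                ? first.array() : Arrays.copyOf(first.array(), first.position());
    }

    // (b) Ran out of space: continue into the shared temporary buffer.
    // The CharBuffer tracks how far the encode got, so each phase
    // resumes exactly where the previous buffer overflowed.
    ByteBuffer temp = ByteBuffer.wrap(TEMP.get());
    if (!enc.encode(in, temp, true).isOverflow()) {
        enc.flush(temp);
        return concat(first, temp, null);
    }

    // (c) Still out of space: worst case 4 bytes per remaining character.
    enc.encode(in, last = ByteBuffer.wrap(new byte[in.remaining() * 4]), true);
    ByteBuffer last = ByteBuffer.wrap(new byte[in.remaining() * 4]);
    enc.encode(in, last, true);  // malformed-input handling omitted
    enc.flush(last);
    return concat(first, temp, last);
}

// Joins the filled portions of up to three buffers into one exact-size array.
private static byte[] concat(ByteBuffer a, ByteBuffer b, ByteBuffer c) {
    int n = a.position() + b.position() + (c == null ? 0 : c.position());
    byte[] result = new byte[n];
    int off = 0;
    System.arraycopy(a.array(), 0, result, off, a.position());
    off += a.position();
    System.arraycopy(b.array(), 0, result, off, b.position());
    off += b.position();
    if (c != null) System.arraycopy(c.array(), 0, result, off, c.position());
    return result;
}

The ASCII fast path never touches the shared buffer, which is what should make this beat the JDK's worst-case allocation in the common case.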


This is sort of just a fun experiment for me at this point, so who  
knows when I may get around to actually finishing this.


Evan

--
Evan Jones
http://evanjones.ca/




Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-07 Thread Kenton Varda
Yeah I don't think we should add a way to inject decoders into ByteString...

I'd be very interested to hear why the JDK is not optimal here.

On Mon, May 3, 2010 at 6:16 PM, Evan Jones ev...@mit.edu wrote:

 My guess: this probably isn't worth an extra 10-20% performance improvement. I'll try to clean up and maintain the patch, so speed freak types can always patch their own implementations.






Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-07 Thread Evan Jones

On May 7, 2010, at 18:54 , Kenton Varda wrote:

I'd be very interested to hear why the JDK is not optimal here.


I dug into this. I *think* the problem is that the JDK ends up allocating a huge temporary array for the UTF-8 data, so the garbage collection cost is higher for the JDK's implementation than for mine. Basically the code does this:



* allocate a new byte[] array that is string length * max bytes per character (4 for the UTF-8 encoder)
* use the java.nio.charset.CharsetEncoder to encode the char[] into the byte[] (wrapped in CharBuffer / ByteBuffer)
* copy the exact number of bytes out of the byte[] into a new byte[], and return that
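
In code, that allocate-encode-copy pattern looks roughly like this (a sketch of the steps above, not the JDK's actual source; the method name is made up):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

// Sketch of the JDK-style allocate-encode-copy pattern.
static byte[] encodeLikeJdk(String s) {
    CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
    // Worst-case allocation: string length * max bytes per character.
    byte[] worstCase = new byte[(int) (s.length() * encoder.maxBytesPerChar())];
    ByteBuffer out = ByteBuffer.wrap(worstCase);
    encoder.encode(CharBuffer.wrap(s), out, true);
    encoder.flush(out);
    // Copy the exact number of encoded bytes into a right-sized array.
    return Arrays.copyOf(worstCase, out.position());
}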


The only trick the JDK gets to use that normal Java code can't is  
that they can access the string's char[] buffer directly, whereas I  
need to copy it out into a char[] array.



Hence, I think what is happening is that the JDK allocates 4-5 times as much memory per encode as I do. In the cases where the data is ASCII, my code is faster, since it allocates exactly the right amount of space and doesn't need an extra copy. When the data is not ASCII, my code may still be faster, since it doesn't overallocate quite as much (in exchange, my code does many copies).



Conclusion: there is a legitimate reason for this code to be faster than the JDK's code. But it still may not be worth including this patch in the mainline protocol buffer code.


Evan

--
Evan Jones
http://evanjones.ca/




Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-03 Thread Kenton Varda
Interesting. Since this seems like a JVM implementation issue, I wonder if the results are different on Dalvik (Android). Also, the extra code sounds undesirable for lite mode, but my guess is that you had to place this code inside CodedOutputStream, which is shared by lite mode. So yeah, there are some downsides. Not sure how I feel about it.

On Wed, Apr 28, 2010 at 11:17 AM, Evan Jones ev...@mit.edu wrote:

 I've integrated this change into protobuf and with the microbenchmark in the protobuf source tree, it shows a performance improvement of ~20% (numbers below).

 Cons:
 - Far more code to handle encoding (an extra hundred lines or so), which means there could be bugs.
 - Extra memory (~2 kB per thread for encoding)

Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-05-03 Thread Evan Jones

On May 3, 2010, at 21:11 , Evan Jones wrote:
Yes, I actually changed ByteString, since ByteString.copyFromUtf8 is  
how protocol buffers get UTF-8 encoded strings at this point.


Although now that I think about it, I think it might be possible to  
enable this only for SPEED messages, if that might make it  
interesting. It may require some gross stuff in ByteString, though, in  
order to be able to create ByteStrings using a different code path,  
without including the .class files in the lite jar.


My guess: this probably isn't worth an extra 10-20% performance  
improvement. I'll try to clean up and maintain the patch, so speed  
freak types can always patch their own implementations.


Evan

--
Evan Jones
http://evanjones.ca/




[protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

2010-04-28 Thread Evan Jones

Evan Jones wrote:
Problem 2: Using the NIO encoders/decoders can be faster than String.getBytes, but only if it is used >= 4 times. If used only once, it is worse. The same is approximately true about decoding. Lame results: http://evanjones.ca/software/java-string-encoding.html


I'm revisiting this old issue, thanks to being reminded about it by an  
earlier message. I've tested this with recent JVMs, and it still  
seems to hold true: using the NIO encoders and decoders can be faster  
than using String.getBytes(). My numbers show that encoding is  
approximately 40% faster, while decoding shows a smaller improvement.  
See the following for details on this microbenchmark:


http://evanjones.ca/software/java-string-encoding.html


This surprises me, since it suggests that Sun/Oracle could replace  
their implementation of String.getBytes() with something similar to my  
code, and get a performance improvement. In fact, with privileged  
access to the internals of a String, they should be able to do even  
better.


I've integrated this change into protobuf and with the microbenchmark  
in the protobuf source tree, it shows a performance improvement of  
~20% (numbers below). I'll send a code review with the code shortly,  
once I've cleaned it up a bit, in case anyone wants to look at it.



Pros:
+ Faster protocol buffer encoding.

Cons:
- Far more code to handle encoding (an extra hundred lines or so), which means there could be bugs.
- Extra memory (~2 kB per thread for encoding)
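
For a sense of where that per-thread memory would go, the reusable state might look something like the following (a hypothetical sketch; the class name and buffer size are made up, not the actual patch):

import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical per-thread encoding state. A CharsetEncoder is stateful
// and not thread-safe, so each thread keeps its own, along with a scratch
// byte[] that is reused across encode calls instead of being reallocated
// each time (roughly the "~2 kB per thread" cost noted above).
final class EncodeScratch {
    final CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
    final byte[] buffer = new byte[2048];

    private static final ThreadLocal<EncodeScratch> INSTANCE =
            new ThreadLocal<EncodeScratch>() {
                @Override protected EncodeScratch initialValue() {
                    return new EncodeScratch();
                }
            };

    static EncodeScratch get() { return INSTANCE.get(); }
}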


I'm unsure if the benefits outweigh the costs, particularly since the  
fact this appears faster seems to be a surprising result, and I  
wouldn't be shocked to find that future JVM / JDK releases could make  
this optimization useless. I'll leave that decision to the protocol  
buffer maintainers.


Evan



Results: All the serialize results are better. The deserialize results are unchanged (as expected).

SpeedMessage1 (which is small):
Original: Serialize to byte string: 12360424 iterations in 29.83s; 90.097984MB/s
Optimized: Serialize to byte string: 15911623 iterations in 29.997s; 115.337776MB/s

SpeedMessage2 (larger):
Original: Serialize to byte string: 33482 iterations in 29.754s; 90.757484MB/s
Optimized: Serialize to byte string: 40381 iterations in 30.031s; 108.44853MB/s


Raw results on my Macbook (Core2 Duo CPU):

ORIGINAL
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 12360424 iterations in 29.83s; 90.097984MB/s
Serialize to byte array: 12244951 iterations in 30.303s; 87.86307MB/s
Serialize to memory stream: 8699469 iterations in 23.732s; 79.70642MB/s
Serialize to /dev/null with FileOutputStream: 6075179 iterations in 27.764s; 47.578636MB/s
Serialize to /dev/null reusing FileOutputStream: 6975006 iterations in 29.99s; 50.571175MB/s
Serialize to /dev/null with FileChannel: 10375092 iterations in 29.864s; 75.54034MB/s
Serialize to /dev/null reusing FileChannel: 11166943 iterations in 30.699s; 79.09427MB/s
Deserialize from byte string: 14463117 iterations in 30.06s; 104.618355MB/s
Deserialize from byte array: 14436567 iterations in 30.007s; 104.61074MB/s
Deserialize from memory stream: 6221772 iterations in 28.024s; 48.274624MB/s

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat
Serialize to byte string: 33482 iterations in 29.754s; 90.757484MB/s
Serialize to byte array: 33103 iterations in 29.517s; 90.45062MB/s
Serialize to memory stream: 28872 iterations in 29.939s; 77.77786MB/s
Serialize to /dev/null with FileOutputStream: 32934 iterations in 29.927s; 88.756MB/s
Serialize to /dev/null reusing FileOutputStream: 32979 iterations in 29.887s; 88.99622MB/s
Serialize to /dev/null with FileChannel: 32447 iterations in 29.921s; 87.46108MB/s
Serialize to /dev/null reusing FileChannel: 32585 iterations in 29.903s; 87.88594MB/s
Deserialize from byte string: 38388 iterations in 29.879s; 103.620544MB/s
Deserialize from byte array: 38677 iterations in 29.866s; 104.446075MB/s
Deserialize from memory stream: 37879 iterations in 29.954s; 101.990585MB/s


OPTIMIZED
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 15911623 iterations in 29.997s; 115.337776MB/s
Serialize to byte array: 16152646 iterations in 30.008s; 117.041954MB/s
Serialize to memory stream: 14859367 iterations in 29.551s; 109.33597MB/s
Serialize to /dev/null with FileOutputStream: 7224915 iterations in 29.954s; 52.446056MB/s
Serialize to /dev/null reusing FileOutputStream: 7479144 iterations in 30.081s; 54.062305MB/s
Serialize to /dev/null with FileChannel: 12730586 iterations in 30.025s; 92.193504MB/s
Serialize to /dev/null reusing FileChannel: 14024645 iterations in 30.399s; 100.31538MB/s
Deserialize from byte string: 14390338 iterations in 29.958s; 104.44631MB/s
Deserialize from byte array: 14496442 iterations in 30.142s; 104.57414MB/s
Deserialize from memory