Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

On Fri, May 7, 2010 at 6:21 PM, Evan Jones ev...@mit.edu wrote:

On May 7, 2010, at 18:54 , Kenton Varda wrote: I'd be very interested to hear why the JDK is not optimal here.

I dug into this. I *think* the problem is that the JDK ends up allocating a huge temporary array for the UTF-8 data. Hence, the garbage collection cost is higher for the JDK's implementation than for mine. Basically the code does this:

* allocate a new byte[] array that is string length * max bytes per character (= 4 for the UTF-8 encoder)
* use the java.nio.charset.CharsetEncoder to encode the char[] into the byte[] (wrapped in CharBuffer / ByteBuffer)
* copy the exact number of bytes out of the byte[] into a new byte[], and return that

The only trick the JDK gets to use that normal Java code can't is that it can access the string's char[] buffer directly, whereas I need to copy it out into a char[] array. Hence, I think what is happening is that the JDK allocates 4-5 times as much memory per encode as I do. In the cases where the data is ASCII, my code is faster, since it allocates exactly the right amount of space and doesn't need an extra copy. When the data is not ASCII, my code may still be faster, since it doesn't overallocate quite as much (in exchange, my code does many copies).

Conclusion: there is a legitimate reason for this code to be faster than the JDK's code. But it still may not be worth including this patch in the mainline protocol buffer code.

Evan

--
Evan Jones
http://evanjones.ca/

--
You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to proto...@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
On May 17, 2010, at 15:38 , Kenton Varda wrote: I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

Yes. I've been playing with this a bit in my spare time since the last email, but I don't have any results I'm happy with yet. Rough notes:

* Encoding is (quite a bit?) faster than String.getBytes() if you assume one byte per character.
* If you guess the number of bytes per character poorly and have to do multiple allocations and copies, the regular Java version will win. If you get it right (even if you first guess 1 byte per character), it looks like it can be slightly faster than or on par with the Java version.
* Re-using a temporary byte[] for string encoding may be faster than String.getBytes(), which effectively allocates a temporary byte[] each time.

I'm going to try to rework my code with a slightly different policy:

a) Assume 1 byte per character and attempt the encode. If we run out of space:
b) Use a shared temporary buffer and continue the encode. If we run out of space:
c) Allocate a worst-case 4-bytes-per-character buffer and finish the encode.

This should be much better than the JDK version for ASCII, a bit better for short strings that fit in the shared temporary buffer, and not significantly worse for the rest, but I'll need to test it to be sure. This is sort of just a fun experiment for me at this point, so who knows when I may get around to actually finishing it.

Evan
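For what it's worth, the a/b/c policy above can be sketched roughly as follows. This is an illustration with invented names, not the actual patch; the 2 kB shared-buffer size is a guess, and error handling for malformed surrogate pairs is omitted.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Hypothetical sketch of the three-stage encode policy: optimistic
// 1-byte-per-char buffer, then a shared temporary buffer, then a
// worst-case 4-bytes-per-char buffer.
public class StagedUtf8Encoder {
    // Stage (b)'s shared per-thread temporary buffer (size is illustrative).
    private static final ThreadLocal<byte[]> SHARED = new ThreadLocal<byte[]>() {
        @Override protected byte[] initialValue() { return new byte[2048]; }
    };

    public static byte[] encode(String s) {
        CharsetEncoder enc = Charset.forName("UTF-8").newEncoder();
        CharBuffer in = CharBuffer.wrap(s);

        // (a) Assume 1 byte per character (exact for ASCII).
        ByteBuffer first = ByteBuffer.wrap(new byte[s.length()]);
        CoderResult r = enc.encode(in, first, true);
        if (r.isUnderflow()) {
            return copyOut(first, null, null);
        }

        // (b) Ran out of space: continue the same encode into the shared buffer.
        ByteBuffer second = ByteBuffer.wrap(SHARED.get());
        r = enc.encode(in, second, true);
        if (r.isUnderflow()) {
            return copyOut(first, second, null);
        }

        // (c) Still out of space: finish into a worst-case buffer.
        ByteBuffer third = ByteBuffer.wrap(new byte[4 * s.length()]);
        enc.encode(in, third, true);
        return copyOut(first, second, third);
    }

    // Concatenate the used prefixes of up to three buffers.
    private static byte[] copyOut(ByteBuffer a, ByteBuffer b, ByteBuffer c) {
        int len = a.position() + (b == null ? 0 : b.position())
                + (c == null ? 0 : c.position());
        byte[] result = new byte[len];
        int off = 0;
        System.arraycopy(a.array(), 0, result, off, a.position());
        off += a.position();
        if (b != null) {
            System.arraycopy(b.array(), 0, result, off, b.position());
            off += b.position();
        }
        if (c != null) {
            System.arraycopy(c.array(), 0, result, off, c.position());
        }
        return result;
    }
}
```

Note that the shared buffer's contents must be copied out before the next encode on the same thread, which `copyOut` does; this works because `CharsetEncoder.encode` is designed to be called repeatedly, draining the input buffer into successive output buffers until it reports underflow.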
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
Yeah, I don't think we should add a way to inject decoders into ByteString... I'd be very interested to hear why the JDK is not optimal here.

On Mon, May 3, 2010 at 6:16 PM, Evan Jones ev...@mit.edu wrote:

On May 3, 2010, at 21:11 , Evan Jones wrote: Yes, I actually changed ByteString, since ByteString.copyFromUtf8 is how protocol buffers get UTF-8 encoded strings at this point.

Although now that I think about it, I think it might be possible to enable this only for SPEED messages, if that might make it interesting. It may require some gross stuff in ByteString, though, in order to be able to create ByteStrings using a different code path, without including the .class files in the lite jar. My guess: this probably isn't worth an extra 10-20% performance improvement. I'll try to clean up and maintain the patch, so speed-freak types can always patch their own implementations.

Evan
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
On May 7, 2010, at 18:54 , Kenton Varda wrote: I'd be very interested to hear why the JDK is not optimal here.

I dug into this. I *think* the problem is that the JDK ends up allocating a huge temporary array for the UTF-8 data. Hence, the garbage collection cost is higher for the JDK's implementation than for mine. Basically the code does this:

* allocate a new byte[] array that is string length * max bytes per character (= 4 for the UTF-8 encoder)
* use the java.nio.charset.CharsetEncoder to encode the char[] into the byte[] (wrapped in CharBuffer / ByteBuffer)
* copy the exact number of bytes out of the byte[] into a new byte[], and return that

The only trick the JDK gets to use that normal Java code can't is that it can access the string's char[] buffer directly, whereas I need to copy it out into a char[] array. Hence, I think what is happening is that the JDK allocates 4-5 times as much memory per encode as I do. In the cases where the data is ASCII, my code is faster, since it allocates exactly the right amount of space and doesn't need an extra copy. When the data is not ASCII, my code may still be faster, since it doesn't overallocate quite as much (in exchange, my code does many copies).

Conclusion: there is a legitimate reason for this code to be faster than the JDK's code. But it still may not be worth including this patch in the mainline protocol buffer code.

Evan
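The allocate-encode-trim sequence described above can be sketched in plain Java (this is an illustration of the pattern, not the JDK's actual source):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Sketch of the allocation pattern described in the email: size a temporary
// array for the worst case, encode into it, then copy the used prefix into
// an exactly-sized array. Two allocations and an extra copy per call.
public class JdkStyleEncode {
    public static byte[] encode(String s) {
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        // Worst-case temporary: string length * maxBytesPerChar.
        byte[] temp = new byte[(int) (s.length() * encoder.maxBytesPerChar())];
        ByteBuffer out = ByteBuffer.wrap(temp);
        encoder.encode(CharBuffer.wrap(s), out, true);
        encoder.flush(out);
        // Second allocation + copy: trim to the exact encoded length.
        byte[] exact = new byte[out.position()];
        System.arraycopy(temp, 0, exact, 0, out.position());
        return exact;
    }
}
```

For ASCII input the temporary array is several times larger than the result, which is the garbage-collection cost the email is pointing at.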
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
Interesting. Since this seems like a JVM implementation issue, I wonder if the results are different on Dalvik (Android). Also, the extra code sounds undesirable for lite mode, but my guess is that you had to place this code inside CodedOutputStream, which is shared by lite mode. So yeah, there are some downsides. Not sure how I feel about it.

On Wed, Apr 28, 2010 at 11:17 AM, Evan Jones ev...@mit.edu wrote:

Evan Jones wrote: Problem 2: Using the NIO encoders/decoders can be faster than String.getBytes, but only if it is used >= 4 times. If used only once, it is worse. The same is approximately true about decoding. Lame results: http://evanjones.ca/software/java-string-encoding.html

I'm revisiting this old issue, thanks to being reminded about it by an earlier message. I've tested this with recent JVMs, and it still seems to hold true: using the NIO encoders and decoders can be faster than using String.getBytes(). My numbers show that encoding is approximately 40% faster, while decoding shows a smaller improvement. See the following for details on this microbenchmark: http://evanjones.ca/software/java-string-encoding.html

This surprises me, since it suggests that Sun/Oracle could replace their implementation of String.getBytes() with something similar to my code and get a performance improvement. In fact, with privileged access to the internals of a String, they should be able to do even better.

I've integrated this change into protobuf, and with the microbenchmark in the protobuf source tree it shows a performance improvement of ~20% (numbers below). I'll send a code review with the code shortly, once I've cleaned it up a bit, in case anyone wants to look at it.

Pros:
+ Faster protocol buffer encoding.

Cons:
- Far more code to handle encoding (an extra hundred lines or so), which means there could be bugs.
- Extra memory (~2 kB per thread for encoding).

I'm unsure if the benefits outweigh the costs, particularly since the fact that this appears faster seems to be a surprising result, and I wouldn't be shocked to find that future JVM / JDK releases make this optimization useless. I'll leave that decision to the protocol buffer maintainers.

Evan

Results: All the serialize results are better. The deserialize results are unchanged (as expected).

SpeedMessage1 (which is small):
Original:  Serialize to byte string: 12360424 iterations in 29.83s; 90.097984MB/s
Optimized: Serialize to byte string: 15911623 iterations in 29.997s; 115.337776MB/s

SpeedMessage2 (larger):
Original:  Serialize to byte string: 33482 iterations in 29.754s; 90.757484MB/s
Optimized: Serialize to byte string: 40381 iterations in 30.031s; 108.44853MB/s

Raw results on my Macbook (Core2 Duo CPU):

ORIGINAL
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 12360424 iterations in 29.83s; 90.097984MB/s
Serialize to byte array: 12244951 iterations in 30.303s; 87.86307MB/s
Serialize to memory stream: 8699469 iterations in 23.732s; 79.70642MB/s
Serialize to /dev/null with FileOutputStream: 6075179 iterations in 27.764s; 47.578636MB/s
Serialize to /dev/null reusing FileOutputStream: 6975006 iterations in 29.99s; 50.571175MB/s
Serialize to /dev/null with FileChannel: 10375092 iterations in 29.864s; 75.54034MB/s
Serialize to /dev/null reusing FileChannel: 11166943 iterations in 30.699s; 79.09427MB/s
Deserialize from byte string: 14463117 iterations in 30.06s; 104.618355MB/s
Deserialize from byte array: 14436567 iterations in 30.007s; 104.61074MB/s
Deserialize from memory stream: 6221772 iterations in 28.024s; 48.274624MB/s

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat
Serialize to byte string: 33482 iterations in 29.754s; 90.757484MB/s
Serialize to byte array: 33103 iterations in 29.517s; 90.45062MB/s
Serialize to memory stream: 28872 iterations in 29.939s; 77.77786MB/s
Serialize to /dev/null with FileOutputStream: 32934 iterations in 29.927s; 88.756MB/s
Serialize to /dev/null reusing FileOutputStream: 32979 iterations in 29.887s; 88.99622MB/s
Serialize to /dev/null with FileChannel: 32447 iterations in 29.921s; 87.46108MB/s
Serialize to /dev/null reusing FileChannel: 32585 iterations in 29.903s; 87.88594MB/s
Deserialize from byte string: 38388 iterations in 29.879s; 103.620544MB/s
Deserialize from byte array: 38677 iterations in 29.866s; 104.446075MB/s
Deserialize from memory stream: 37879 iterations in 29.954s; 101.990585MB/s

OPTIMIZED
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 15911623 iterations in 29.997s; 115.337776MB/s
Serialize to byte array: 16152646 iterations in 30.008s; 117.041954MB/s
Serialize to memory stream: 14859367 iterations in 29.551s; 109.33597MB/s
Serialize to /dev/null with FileOutputStream: 7224915 iterations in 29.954s; 52.446056MB/s
Serialize to /dev/null reusing
Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
On May 3, 2010, at 21:11 , Evan Jones wrote: Yes, I actually changed ByteString, since ByteString.copyFromUtf8 is how protocol buffers get UTF-8 encoded strings at this point.

Although now that I think about it, I think it might be possible to enable this only for SPEED messages, if that might make it interesting. It may require some gross stuff in ByteString, though, in order to be able to create ByteStrings using a different code path, without including the .class files in the lite jar. My guess: this probably isn't worth an extra 10-20% performance improvement. I'll try to clean up and maintain the patch, so speed-freak types can always patch their own implementations.

Evan
[protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements
Evan Jones wrote: Problem 2: Using the NIO encoders/decoders can be faster than String.getBytes, but only if it is used >= 4 times. If used only once, it is worse. The same is approximately true about decoding. Lame results: http://evanjones.ca/software/java-string-encoding.html

I'm revisiting this old issue, thanks to being reminded about it by an earlier message. I've tested this with recent JVMs, and it still seems to hold true: using the NIO encoders and decoders can be faster than using String.getBytes(). My numbers show that encoding is approximately 40% faster, while decoding shows a smaller improvement. See the following for details on this microbenchmark: http://evanjones.ca/software/java-string-encoding.html

This surprises me, since it suggests that Sun/Oracle could replace their implementation of String.getBytes() with something similar to my code and get a performance improvement. In fact, with privileged access to the internals of a String, they should be able to do even better.

I've integrated this change into protobuf, and with the microbenchmark in the protobuf source tree it shows a performance improvement of ~20% (numbers below). I'll send a code review with the code shortly, once I've cleaned it up a bit, in case anyone wants to look at it.

Pros:
+ Faster protocol buffer encoding.

Cons:
- Far more code to handle encoding (an extra hundred lines or so), which means there could be bugs.
- Extra memory (~2 kB per thread for encoding).

I'm unsure if the benefits outweigh the costs, particularly since the fact that this appears faster seems to be a surprising result, and I wouldn't be shocked to find that future JVM / JDK releases make this optimization useless. I'll leave that decision to the protocol buffer maintainers.

Evan

Results: All the serialize results are better. The deserialize results are unchanged (as expected).

SpeedMessage1 (which is small):
Original:  Serialize to byte string: 12360424 iterations in 29.83s; 90.097984MB/s
Optimized: Serialize to byte string: 15911623 iterations in 29.997s; 115.337776MB/s

SpeedMessage2 (larger):
Original:  Serialize to byte string: 33482 iterations in 29.754s; 90.757484MB/s
Optimized: Serialize to byte string: 40381 iterations in 30.031s; 108.44853MB/s

Raw results on my Macbook (Core2 Duo CPU):

ORIGINAL
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 12360424 iterations in 29.83s; 90.097984MB/s
Serialize to byte array: 12244951 iterations in 30.303s; 87.86307MB/s
Serialize to memory stream: 8699469 iterations in 23.732s; 79.70642MB/s
Serialize to /dev/null with FileOutputStream: 6075179 iterations in 27.764s; 47.578636MB/s
Serialize to /dev/null reusing FileOutputStream: 6975006 iterations in 29.99s; 50.571175MB/s
Serialize to /dev/null with FileChannel: 10375092 iterations in 29.864s; 75.54034MB/s
Serialize to /dev/null reusing FileChannel: 11166943 iterations in 30.699s; 79.09427MB/s
Deserialize from byte string: 14463117 iterations in 30.06s; 104.618355MB/s
Deserialize from byte array: 14436567 iterations in 30.007s; 104.61074MB/s
Deserialize from memory stream: 6221772 iterations in 28.024s; 48.274624MB/s

Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat
Serialize to byte string: 33482 iterations in 29.754s; 90.757484MB/s
Serialize to byte array: 33103 iterations in 29.517s; 90.45062MB/s
Serialize to memory stream: 28872 iterations in 29.939s; 77.77786MB/s
Serialize to /dev/null with FileOutputStream: 32934 iterations in 29.927s; 88.756MB/s
Serialize to /dev/null reusing FileOutputStream: 32979 iterations in 29.887s; 88.99622MB/s
Serialize to /dev/null with FileChannel: 32447 iterations in 29.921s; 87.46108MB/s
Serialize to /dev/null reusing FileChannel: 32585 iterations in 29.903s; 87.88594MB/s
Deserialize from byte string: 38388 iterations in 29.879s; 103.620544MB/s
Deserialize from byte array: 38677 iterations in 29.866s; 104.446075MB/s
Deserialize from memory stream: 37879 iterations in 29.954s; 101.990585MB/s

OPTIMIZED
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 15911623 iterations in 29.997s; 115.337776MB/s
Serialize to byte array: 16152646 iterations in 30.008s; 117.041954MB/s
Serialize to memory stream: 14859367 iterations in 29.551s; 109.33597MB/s
Serialize to /dev/null with FileOutputStream: 7224915 iterations in 29.954s; 52.446056MB/s
Serialize to /dev/null reusing FileOutputStream: 7479144 iterations in 30.081s; 54.062305MB/s
Serialize to /dev/null with FileChannel: 12730586 iterations in 30.025s; 92.193504MB/s
Serialize to /dev/null reusing FileChannel: 14024645 iterations in 30.399s; 100.31538MB/s
Deserialize from byte string: 14390338 iterations in 29.958s; 104.44631MB/s
Deserialize from byte array: 14496442 iterations in 30.142s; 104.57414MB/s
Deserialize from memory
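A minimal sketch of the encoder reuse being discussed in this thread (illustrative names, not the benchmark code from evanjones.ca): keeping one CharsetEncoder per thread amortizes the setup work that a fresh encode otherwise repeats on every call.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Sketch: reuse a CharsetEncoder across calls instead of creating fresh
// encoding state each time. One encoder per thread, since CharsetEncoder
// is not safe for concurrent use.
public class ReusedEncoder {
    private static final ThreadLocal<CharsetEncoder> UTF8 = new ThreadLocal<CharsetEncoder>() {
        @Override protected CharsetEncoder initialValue() {
            return Charset.forName("UTF-8").newEncoder();
        }
    };

    public static byte[] encode(String s) {
        try {
            CharsetEncoder enc = UTF8.get();
            enc.reset();
            ByteBuffer out = enc.encode(CharBuffer.wrap(s));
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Only reachable for malformed surrogate pairs in the input.
            throw new RuntimeException(e);
        }
    }
}
```

The convenience `encode(CharBuffer)` still allocates its own output buffer internally, so this only captures part of the idea; the full scheme in the thread also reuses the temporary byte[] between calls.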