I've done some quick and dirty benchmarking of Java string encoding/ 
decoding to/from UTF-8 for an unrelated project, but I've realized  
that these performance improvements could be added to protobufs. The  
"easy" way to do UTF-8 conversions is the way CodedInputStream/ 
CodedOutputStream does it: using String.getBytes() and new String().  
It turns out that using the java.nio.charset.CharsetDecoder/ 
CharsetEncoder *can* be faster. However, to make it faster the objects  
need to be reused, due to the cost of allocating temporary buffers and  
objects.
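
To make the comparison concrete, here is a minimal sketch of what
"reusing the objects" could look like on the encode side. The class
name ReusableUtf8Encoder and the 4096-byte starting buffer are my own
placeholders, not anything in protobufs, and it assumes Java 7+ for
StandardCharsets:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: holds one encoder and one scratch buffer so that
// repeated calls do not allocate them again. Not thread-safe.
final class ReusableUtf8Encoder {
    // REPLACE is intended to mirror String.getBytes(), which substitutes
    // a replacement byte for malformed input instead of throwing.
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private ByteBuffer out = ByteBuffer.allocate(4096);

    byte[] encode(String s) {
        // UTF-8 needs at most 3 bytes per Java char (a surrogate pair is
        // 2 chars and 4 bytes). Overflow for huge strings is ignored here.
        int maxBytes = s.length() * 3;
        if (out.capacity() < maxBytes) {
            out = ByteBuffer.allocate(maxBytes);
        }
        out.clear();
        encoder.reset();
        CoderResult result = encoder.encode(CharBuffer.wrap(s), out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (!result.isUnderflow()) {
            throw new IllegalStateException("unexpected coder result: " + result);
        }
        // Copy only the bytes that were actually written.
        return Arrays.copyOf(out.array(), out.position());
    }
}
```

The one-shot equivalent is s.getBytes("UTF-8"); the sketch only pays
off when the same instance encodes many strings.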

Before I attempt to make any improvements, I want to see if anyone
(Kenton primarily) has opinions on whether these changes make sense.
They would add ~100 lines of code to replace something which is now a
few lines of code, and it is a small improvement (approximately 40%
less time per encode/decode, on a list of 1400 strings in different
languages). I haven't tried adding this to protobufs yet, so the final
performance improvement is unknown.


Problem 1: A Java protobuf string is stored as a String instance. It  
typically gets converted to UTF-8 *twice*: Once in getSerializedSize()  
via a call to CodedOutputStream.computeStringSize, then again in  
writeTo().

Solution: Cache the byte[] version of String fields. This would  
increase the memory size of each message (an additional pointer per  
string, plus the space for the byte[]), but would HALVE the number of  
conversions. I suspect this will be a fair bit faster. If added, it  
should only be added for the SPEED generated messages.
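
Roughly what I have in mind, as a sketch only. CachedStringField is a
made-up stand-in for a generated message with one string field, the
conversions counter exists purely to show the effect, and the real
generated code would also need to deal with length prefixes and thread
safety:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a generated message holding one string field.
final class CachedStringField {
    private final String value;
    private byte[] utf8;      // lazily filled cache of the UTF-8 form
    int conversions = 0;      // instrumentation for this sketch only

    CachedStringField(String value) {
        this.value = value;
    }

    // Converts on first use, then returns the cached array.
    // Not thread-safe as written.
    private byte[] utf8Bytes() {
        if (utf8 == null) {
            utf8 = value.getBytes(StandardCharsets.UTF_8);
            conversions++;
        }
        return utf8;
    }

    // Size of the string payload only (no tag or length prefix).
    int serializedSize() {
        return utf8Bytes().length;
    }

    // Writes the payload, reusing the bytes computed for the size call.
    void writeTo(ByteBuffer out) {
        out.put(utf8Bytes());
    }
}
```

Calling serializedSize() and then writeTo() on the same instance does
one conversion instead of two, at the cost of keeping the byte[] alive
for as long as the message is.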


Problem 2: Using the NIO encoders/decoders can be faster than
String.getBytes, but only if the same encoder is reused at least 4
times. If used only once, it is slower. The same is approximately true
about decoding. Lame results:
http://evanjones.ca/software/java-string-encoding.html

Solution 1: Add a custom encoder/decoder to CodedOutputStream,  
allocated as needed. This could be *bad* for applications that call  
Message.toByteString or .toByteArray frequently for messages with few  
strings, since that creates and throws away a single CodedOutputStream  
instance.

Solution 2: Add a custom encoder/decoder per thread via a ThreadLocal.
This requires fetching the ThreadLocal, which is slightly expensive,
and adds some per-thread memory overhead (~4 kB, tunable). However, the
allocations are done ONCE per thread, which should be significantly
better.
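
A sketch of the per-thread variant, under a few assumptions of mine:
the names ThreadLocalUtf8 and BUFFER_SIZE are placeholders, strings
that might not fit the fixed buffer simply fall back to
String.getBytes, and ThreadLocal.withInitial needs Java 8 (older JDKs
would override initialValue() instead):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: one encoder and one scratch buffer per thread.
final class ThreadLocalUtf8 {
    private static final int BUFFER_SIZE = 4096;  // tunable per-thread cost

    private static final class State {
        final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        final ByteBuffer buffer = ByteBuffer.allocate(BUFFER_SIZE);
    }

    // Allocated once per thread, on that thread's first call to encode().
    private static final ThreadLocal<State> STATE =
            ThreadLocal.withInitial(State::new);

    static byte[] encode(String s) {
        // Worst case is 3 bytes per Java char; if that might not fit,
        // use the simple path rather than growing the shared buffer.
        if ((long) s.length() * 3 > BUFFER_SIZE) {
            return s.getBytes(StandardCharsets.UTF_8);
        }
        State state = STATE.get();
        state.buffer.clear();
        state.encoder.reset();
        CoderResult result =
                state.encoder.encode(CharBuffer.wrap(s), state.buffer, true);
        if (result.isUnderflow()) {
            result = state.encoder.flush(state.buffer);
        }
        if (!result.isUnderflow()) {
            throw new IllegalStateException("unexpected coder result: " + result);
        }
        return Arrays.copyOf(state.buffer.array(), state.buffer.position());
    }
}
```

Whether the ThreadLocal lookup eats the gain for messages with one or
two short strings is exactly the part I have not measured.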


--
Evan Jones
http://evanjones.ca/
