I've done some quick and dirty benchmarking of Java string encoding/
decoding to/from UTF-8 for an unrelated project, but I've realized
that these performance improvements could be added to protobufs. The
"easy" way to do UTF-8 conversions is the way CodedInputStream/
CodedOutputStream does it: using String.getBytes() and new String().
It turns out that using the java.nio.charset.CharsetDecoder/
CharsetEncoder *can* be faster. However, to make it faster the objects
need to be reused, due to the cost of allocating the temporary buffers
and the encoder/decoder objects themselves.
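To make the reuse idea concrete, here is a minimal sketch; the class name, method names, and the 4 kB starting buffer size are my own choices for illustration, not anything taken from CodedInputStream/CodedOutputStream:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// The encoder, decoder, and scratch buffer are allocated once and reused,
// avoiding the per-call allocations hidden inside String.getBytes() and
// new String(). Deliberately NOT thread-safe; each thread needs its own.
class ReusableUtf8 {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    private ByteBuffer scratch = ByteBuffer.allocate(4096); // grown on demand

    byte[] encode(String s) throws CharacterCodingException {
        CharBuffer in = CharBuffer.wrap(s);
        encoder.reset();
        scratch.clear();
        while (true) {
            CoderResult r = encoder.encode(in, scratch, true);
            if (r.isUnderflow()) {
                r = encoder.flush(scratch);
            }
            if (r.isUnderflow()) {
                break; // all input consumed and flushed
            } else if (r.isOverflow()) {
                grow(); // scratch buffer too small; double it and retry
            } else {
                r.throwException(); // e.g. an unpaired surrogate
            }
        }
        scratch.flip();
        byte[] out = new byte[scratch.remaining()];
        scratch.get(out);
        return out;
    }

    String decode(byte[] bytes) throws CharacterCodingException {
        // The convenience decode() resets the decoder before use, so the
        // instance can be reused; it allocates only the result buffer.
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    private void grow() {
        ByteBuffer bigger = ByteBuffer.allocate(scratch.capacity() * 2);
        scratch.flip();
        bigger.put(scratch);
        scratch = bigger;
    }
}
```

The output is byte-for-byte identical to String.getBytes("UTF-8"); only the allocation pattern differs.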

Before I attempt to make any improvements, I want to see if anyone
(Kenton primarily) has any opinions on whether these changes make sense. They would
add ~100 lines of code to replace something which is now a few lines
of code, and it is a small improvement (approximately 40% less time
per encode/decode, on a list of 1400 strings in different languages).
I haven't tried adding this to protobufs yet, so final performance
improvements are unknown:

Problem 1: A Java protobuf string is stored as a String instance. It
typically gets converted to UTF-8 *twice*: Once in getSerializedSize()
via a call to CodedOutputStream.computeStringSize, then again in
writeTo().
Solution: Cache the byte version of String fields. This would
increase the memory size of each message (an additional pointer per
string, plus the space for the byte[]), but would HALVE the number of
conversions. I suspect this will be a fair bit faster. If added, it
should only be added for the SPEED generated messages.
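A rough sketch of what that caching could look like, with one string field. The class and field names here (CachedStringField, nameBytes) are invented for illustration, not actual generated code, and getSerializedSize() is simplified (real generated code would also count the field tag and varint length prefix):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of caching the UTF-8 bytes of a String field so
// getSerializedSize() and serialization share ONE conversion instead of two.
class CachedStringField {
    private final String name;
    private byte[] nameBytes; // the extra pointer per string field

    CachedStringField(String name) {
        this.name = name;
    }

    // Encodes at most once; subsequent callers reuse the cached byte[].
    private byte[] nameBytes() {
        if (nameBytes == null) {
            nameBytes = name.getBytes(StandardCharsets.UTF_8);
        }
        return nameBytes;
    }

    int getSerializedSize() {
        // Simplified: ignores the tag and length-prefix bytes.
        return nameBytes().length;
    }

    byte[] toByteArray() {
        return nameBytes().clone();
    }
}
```

Because messages are immutable once built, the cached bytes never go stale; the cost is just the extra reference (and the byte[] itself once populated).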

Problem 2: Using the NIO encoders/decoders can be faster than
String.getBytes, but only if it is used >= 4 times. If used only once,
it is worse. The same is approximately true of decoding.

Lame Solution 1: Add a custom encoder/decoder to CodedOutputStream,
allocated as needed. This could be *bad* for applications that call
Message.toByteString or .toByteArray frequently for messages with few
strings, since that creates and throws away a single CodedOutputStream
(and its encoder/decoder) per call.

Solution 2: Add a custom encoder/decoder per thread via a ThreadLocal.
This requires fetching the ThreadLocal, which is slightly expensive,
and adds some per-thread memory overhead (~4 kB, tunable). However, the
allocations are done ONCE per thread, which should be significantly
cheaper overall.
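Solution 2 could look roughly like this (a sketch under my own naming, using the modern ThreadLocal.withInitial for brevity; the original could just as well override initialValue()):

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// One encoder per thread: the allocation happens once per thread, and
// each encode pays only the ThreadLocal lookup on top of the conversion.
class ThreadLocalUtf8 {
    private static final ThreadLocal<CharsetEncoder> ENCODER =
            ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newEncoder());

    static byte[] encode(String s) throws CharacterCodingException {
        CharsetEncoder encoder = ENCODER.get(); // the slightly expensive fetch
        // The convenience encode() resets the encoder before use, so the
        // per-thread instance is safely reusable call after call.
        ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }
}
```

Since CharsetEncoder is not thread-safe, confining each instance to one thread via the ThreadLocal is what makes the reuse safe without locking.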

You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.