Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

Kenton Varda Tue, 22 Dec 2009 16:59:55 -0800

These ideas sound good to me.

On Tue, Dec 22, 2009 at 9:26 AM, Evan Jones <[email protected]> wrote:


> Problem 1: A Java protobuf string is stored as a String instance. It
> typically gets converted to UTF-8 *twice*: Once in getSerializedSize()
> via a call to CodedOutputStream.computeStringSize, then again in
> writeTo().
>
> Solution: Cache the byte[] version of String fields. This would
> increase the memory size of each message (an additional pointer per
> string, plus the space for the byte[]), but would HALVE the number of
> conversions. I suspect this will be a fair bit faster. If added, it
> should only be added for the SPEED generated messages.
>

I wonder if we can safely discard the cached byte array during serialization
on the assumption that most messages are serialized only once?
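
Here is a minimal sketch of the cached-bytes idea combined with discarding the cache in writeTo(). It assumes a hypothetical single-field message class: CachedStringMessage is illustrative, not protobuf's generated code, and the conversion counter exists only to make the saving visible.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a SPEED-optimized message with one string field;
// not actual protobuf generated code.
final class CachedStringMessage {
    private final String name;
    // UTF-8 form of `name`, filled in by getSerializedSize() and consumed by writeTo().
    private byte[] nameUtf8;
    // Counts String -> UTF-8 conversions, purely for illustration.
    private int conversions;

    CachedStringMessage(String name) {
        this.name = name;
    }

    private byte[] utf8() {
        if (nameUtf8 == null) {
            nameUtf8 = name.getBytes(StandardCharsets.UTF_8);
            conversions++;
        }
        return nameUtf8;
    }

    // Payload size only; a real message would also count the tag and length prefix.
    int getSerializedSize() {
        return utf8().length;
    }

    void writeTo(OutputStream out) throws IOException {
        byte[] bytes = utf8();
        // Drop the cache, assuming most messages are serialized only once.
        nameUtf8 = null;
        out.write(bytes);
    }

    int conversions() {
        return conversions;
    }
}
```

A message serialized once now pays for one conversion instead of two; serializing it again repeats the conversion. The cache is a plain mutable field, so this sketch is not safe for a message shared across threads.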


> Problem 2: Using the NIO encoders/decoders can be faster than
> String.getBytes, but only if the same encoder is reused at least 4
> times. If it is used only once, it is slower. The same is roughly true
> for decoding. Lame
> results: http://evanjones.ca/software/java-string-encoding.html
>
> Solution 1: Add a custom encoder/decoder to CodedOutputStream,
> allocated as needed. This could be *bad* for applications that call
> Message.toByteString or .toByteArray frequently for messages with few
> strings, since that creates and throws away a single CodedOutputStream
> instance.
>
> Solution 2: Add a custom encoder/decoder per thread via a ThreadLocal.
> This requires fetching the ThreadLocal, which is slightly expensive,
> and adds some per-thread memory overhead (~4 kB, tunable). However, the
> allocations are done ONCE per thread, which should be significantly
> better.
>

Fetching a threadlocal should just be a pointer dereference on any decent
threading implementation.  Is it really that expensive in Java?
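
A rough sketch of Solution 2 follows. It assumes a hypothetical ThreadLocalUtf8 helper that keeps one CharsetEncoder and one output buffer per thread. The 4096-byte starting size mirrors the ~4 kB figure above and is an arbitrary choice. Only standard java.nio.charset calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: one reusable UTF-8 encoder and output buffer per thread.
final class ThreadLocalUtf8 {
    // Starting buffer size (assumed); grows on demand.
    private static final int INITIAL_BUFFER_BYTES = 4096;

    private static final class State {
        final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer buffer = ByteBuffer.allocate(INITIAL_BUFFER_BYTES);
    }

    // Allocated once per thread, on that thread's first call.
    private static final ThreadLocal<State> STATE = ThreadLocal.withInitial(State::new);

    static byte[] encode(String s) {
        State st = STATE.get();
        CharsetEncoder enc = st.encoder;
        enc.reset();
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer out = st.buffer;
        out.clear();
        // With REPLACE actions, encode() and flush() only report underflow or overflow.
        while (enc.encode(in, out, true).isOverflow()) {
            out = grow(st, out);
        }
        while (enc.flush(out).isOverflow()) {
            out = grow(st, out);
        }
        // Copy out the result; the buffer itself stays with the thread.
        return Arrays.copyOf(out.array(), out.position());
    }

    // Doubles the buffer, keeps the bytes already written, and remembers it for this thread.
    private static ByteBuffer grow(State st, ByteBuffer out) {
        ByteBuffer bigger = ByteBuffer.allocate(out.capacity() * 2);
        out.flip();
        bigger.put(out);
        st.buffer = bigger;
        return bigger;
    }
}
```

In the standard JDK, ThreadLocal.get() looks the value up in a small per-thread hash table rather than following a single pointer. That lookup is likely the cost being weighed here; the sketch does not measure it. Because a grown buffer is kept, per-thread memory can rise to the size of the largest string encoded on that thread.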

Solution 3:  Maintain a private freelist of encoder objects within
CodedOutputStream.  Allocate one the first time a string is encoded on a
particular stream object, and return it to the freelist on flush() (which is
always called before discarding the stream unless an exception interrupts
serialization).  It may make sense for the freelist to additionally be
thread-local to avoid locking, but if it's only one lock per serialization
maybe it's not a big deal?
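
A sketch of Solution 3 follows. It assumes a hypothetical SketchOutputStream in place of CodedOutputStream and a single synchronized freelist with no thread-local layer. The stream borrows an encoder on its first string and returns it in flush(). The varint length prefix that a real string field needs is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;

// Hypothetical process-wide freelist of UTF-8 encoders, guarded by one lock.
final class EncoderFreelist {
    private static final ArrayDeque<CharsetEncoder> FREE = new ArrayDeque<>();
    private static int created;

    static synchronized CharsetEncoder acquire() {
        CharsetEncoder e = FREE.pollFirst();
        if (e == null) {
            created++;
            e = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE);
        }
        return e;
    }

    static synchronized void release(CharsetEncoder e) {
        FREE.addFirst(e);
    }

    static synchronized int createdCount() {
        return created;
    }
}

// Hypothetical stand-in for CodedOutputStream; only string writes are shown.
final class SketchOutputStream {
    private final ByteArrayOutputStream sink;
    private CharsetEncoder encoder; // borrowed on first use, returned in flush()

    SketchOutputStream(ByteArrayOutputStream sink) {
        this.sink = sink;
    }

    void writeString(String s) {
        if (encoder == null) {
            encoder = EncoderFreelist.acquire();
        }
        encoder.reset();
        // maxBytesPerChar() bounds the bytes produced per input char, so this buffer cannot overflow.
        int capacity = s.length() * (int) Math.ceil(encoder.maxBytesPerChar());
        ByteBuffer out = ByteBuffer.allocate(capacity);
        encoder.encode(CharBuffer.wrap(s), out, true);
        encoder.flush(out);
        sink.write(out.array(), 0, out.position());
    }

    void flush() {
        if (encoder != null) {
            EncoderFreelist.release(encoder);
            encoder = null;
        }
    }
}
```

If an exception skips flush(), the borrowed encoder is simply garbage collected and the freelist allocates a fresh one later. The failure mode is therefore an extra allocation rather than a leak.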

--

You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.