Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

Kenton Varda Wed, 23 Dec 2009 13:59:46 -0800

On Wed, Dec 23, 2009 at 7:44 AM, Evan Jones <ev...@mit.edu> wrote:

> On Dec 22, 2009, at 19:59 , Kenton Varda wrote:
>
>> I wonder if we can safely discard the cached byte array during
>> serialization on the assumption that most messages are serialized only once?
>>
>
> This is a good idea, and it seems to me that this should definitely be
> possible. It would need to be done somewhat carefully, since Message objects
> are supposed to be thread safe, but I don't think this is particularly hard.



Right.  Assuming pointer reads and writes are atomic -- a reasonable
assumption in general, but we can guarantee it with "volatile" -- then it is
safe for one thread to set the cached value to null even if another thread
is reading it simultaneously.  Either the other thread will successfully get
the pointer and be able to use it, or it will get null and have to rebuild
it.
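
To make the idea concrete, here is a minimal sketch of a cache that is discarded on serialization. CachedUtf8String and its method names are hypothetical and are not part of protobuf. Note that the Java Language Specification already makes reference reads and writes atomic; "volatile" additionally ensures that a write by one thread becomes visible to the others.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder for a string field; not an actual protobuf class.
final class CachedUtf8String {
    private final String value;
    // volatile: a reader sees either a complete array reference or null.
    private volatile byte[] cachedBytes;

    CachedUtf8String(String value) {
        this.value = value;
    }

    // Used when computing the serialized size: encode once and keep the result.
    byte[] getBytes() {
        byte[] bytes = cachedBytes; // single read of the volatile field
        if (bytes == null) {
            bytes = value.getBytes(StandardCharsets.UTF_8);
            cachedBytes = bytes;
        }
        return bytes;
    }

    // Used when writing the message: hand out the bytes and drop the cache,
    // assuming most messages are serialized only once.
    byte[] takeBytesForSerialization() {
        byte[] bytes = cachedBytes; // copy to a local before clearing
        if (bytes == null) {
            bytes = value.getBytes(StandardCharsets.UTF_8); // rebuild if another thread cleared it
        }
        cachedBytes = null;
        return bytes;
    }

    boolean isCached() {
        return cachedBytes != null;
    }
}
```

Each method reads the field exactly once into a local variable, so a concurrent writer setting it to null cannot cause a NullPointerException; the worst case is that a thread re-encodes the string.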


> The only additional tricky part: on subsequent serializations, it would be
> useful to know the serialized size of the string, in order to serialize the
> string directly into the output buffer, rather than needing to create a
> temporary byte[] array to get the length. This is also a solvable problem,
> and my lame benchmarks suggest this is only a small improvement anyway.


We could cache the size separately, but since it would only be useful in the
case of multiple serializations, it's probably not worthwhile.  Who
serializes the same immutable message multiple times?
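
For reference, the size Evan mentions can also be computed without a temporary byte[] by walking the string's chars. The following is only a sketch: Utf8Length and encodedLength are hypothetical names, not protobuf API, and no claim is made here about whether this is faster than encoding.

```java
// Hypothetical helper; not part of protobuf.
final class Utf8Length {
    // Returns the number of bytes the UTF-8 encoding of s would occupy.
    static int encodedLength(CharSequence s) {
        int length = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                length += 1; // ASCII
            } else if (c < 0x800) {
                length += 2;
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                length += 4; // surrogate pair encodes one supplementary code point
                i++;         // skip the low surrogate
            } else if (Character.isSurrogate(c)) {
                // Assumption: an unpaired surrogate is counted as one byte,
                // mirroring the '?' substitution done by String.getBytes.
                length += 1;
            } else {
                length += 3; // remaining BMP characters
            }
        }
        return length;
    }
}
```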


>> Solution 3:  Maintain a private freelist of encoder objects within
>> CodedOutputStream.  Allocate one the first time a string is encoded on a
>> particular stream object, and return it to the freelist on flush() (which is
>> always called before discarding the stream unless an exception interrupts
>> serialization).  It may make sense for the freelist to additionally be
>> thread-local to avoid locking, but if it's only one lock per serialization
>> maybe it's not a big deal?
>>
>
> I would guess that this might be more expensive than the ThreadLocal, but I
> don't know that for sure. It would avoid the "one encoder/decoder per
> thread" overhead. Do you think it is worth it?
>

If a simple ThreadLocal is faster, use that.  I was just thinking that if
ThreadLocal lookup is slow, then it would be useful to only have to do that
lookup once per CodedOutputStream object -- then subsequent uses are just
dereferencing a field.
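
A rough sketch of what I mean, assuming a stream object is only ever used by one thread. EncoderCachingStream is a hypothetical stand-in for CodedOutputStream, not its actual code:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a stream class; not the real CodedOutputStream.
final class EncoderCachingStream {
    // One encoder per thread, created on first use.
    private static final ThreadLocal<CharsetEncoder> ENCODERS =
            ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newEncoder());

    // Filled in on the first string written to this stream.
    private CharsetEncoder encoder;

    byte[] encode(String s) {
        if (encoder == null) {
            encoder = ENCODERS.get(); // the only ThreadLocal lookup for this stream
        }
        try {
            // encode(CharBuffer) resets the encoder, so it can be reused across calls.
            ByteBuffer out = encoder.encode(CharBuffer.wrap(s));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Malformed input such as an unpaired surrogate.
            throw new IllegalArgumentException("string is not encodable as UTF-8", e);
        }
    }
}
```

After the first string, every later encode() call on the same stream is just a field dereference. If a stream were handed to a different thread, it would keep using the first thread's encoder, so this only holds under the single-thread assumption above.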
