Even with the extra call to access the offset, I would think there would be
some advantage to not making the data copies, which generate garbage.

I am interested in your patch whenever it surfaces.

I seem to remember you saying that using an Encoder/Decoder didn't pay off
when the number of strings to en/decode was small. 
Did the same hold true when using a ThreadLocal?
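
For concreteness, the ThreadLocal variant asked about here could look roughly
like the sketch below. It is not code from this thread: the class name
ThreadLocalUtf8 and its encode method are hypothetical, and it uses
StandardCharsets and ThreadLocal.withInitial, which need Java 8 or later
(the JDK 6 discussed here would need Charset.forName("UTF-8") and an
initialValue() override instead). It keeps one CharsetEncoder per thread so
that the encoder is not re-created for every string.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: caches one UTF-8 encoder per thread.
final class ThreadLocalUtf8 {
    // CharsetEncoder is stateful and not thread-safe, so each thread gets its own.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE));

    private ThreadLocalUtf8() {}

    /** Encodes s as UTF-8 using the calling thread's cached encoder. */
    static byte[] encode(String s) {
        try {
            // encode(CharBuffer) resets the encoder, encodes, and flushes.
            ByteBuffer out = ENCODER.get().encode(CharBuffer.wrap(s));
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the API declares it.
            throw new IllegalStateException(e);
        }
    }
}
```

Note that this sketch still allocates a ByteBuffer and a byte[] per call, so
it only saves the cost of constructing the encoder, not the data copies.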


-----Original Message-----
From: Evan Jones [mailto:ev...@mit.edu] 
Sent: Monday, May 31, 2010 4:32 PM
To: David Dabbs
Cc: Protocol Buffers
Subject: Re: [protobuf] Java UTF-8 encoding/decoding: possible performance

On May 31, 2010, at 14:25 , David Dabbs wrote:
> you may access a String's internals via reflection in a "safe," albeit
> potentially implementation-specific way. See class code below.
> As long as your java.lang.String uses "value" for the char[] and
> "offset" for the storage offset, this should work.
> No sun.misc.Unsafe used. Only tested/used on JDK6.

Good idea! Unfortunately, this isn't much faster for small strings. It  
is faster if you just get the value char[] array. However, when I  
modified my implementation to get both the char[] value and int  
offset, it ended up being about the same speed for my test data set,  
which is composed mostly of "short" UTF-8 and ASCII strings.  
Unfortunately, a correct implementation will need to get both values.  
Since this is also somewhat "dangerous," it doesn't seem like a great  
idea for my data.

At any rate: I'll try to find some time to prepare a protocol  
buffer patch with my "encode to a temporary ByteBuffer" trick, which  
does make things a bit faster. I won't necessarily advocate for this  
patch to be included, but after having spent this much time on this,  
I'll certainly try to maintain the patch for a while, in case others  
are interested.
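
The patch itself is not shown in this message, so the following is only a
sketch of one plausible reading of "encode to a temporary ByteBuffer": reuse
a single CharsetEncoder and a single scratch ByteBuffer, encode each string
into the scratch buffer, and copy the bytes to the output from there. The
class name TempBufferUtf8Writer, its capacity parameter, and the write method
are assumptions made for illustration and are not part of the protobuf API.
An instance is not thread-safe.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes strings through a reusable scratch buffer.
final class TempBufferUtf8Writer {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final ByteBuffer temp;

    TempBufferUtf8Writer(int capacity) {
        // A single code point can need up to 4 bytes in UTF-8.
        if (capacity < 4) {
            throw new IllegalArgumentException("capacity must be at least 4");
        }
        this.temp = ByteBuffer.allocate(capacity);
    }

    /** Writes s to out as UTF-8 and returns the number of bytes written. */
    int write(String s, OutputStream out) throws IOException {
        CharBuffer in = CharBuffer.wrap(s);
        encoder.reset();
        int total = 0;
        while (true) {
            CoderResult result = encoder.encode(in, temp, true);
            if (result.isError()) {
                result.throwException(); // malformed input, e.g. a lone surrogate
            }
            total += drain(out);
            if (result.isUnderflow()) {
                break; // all input consumed
            }
            // Otherwise the scratch buffer overflowed; it is empty again, so retry.
        }
        while (true) {
            CoderResult result = encoder.flush(temp);
            total += drain(out);
            if (result.isUnderflow()) {
                break;
            }
        }
        return total;
    }

    // Copies the scratch buffer's contents to out and empties the buffer.
    private int drain(OutputStream out) throws IOException {
        temp.flip();
        int n = temp.remaining();
        out.write(temp.array(), temp.arrayOffset() + temp.position(), n);
        temp.clear();
        return n;
    }
}
```

The design choice here is that no per-string byte[] is allocated; whether
that is actually faster for short strings is exactly what the measurements
discussed above would have to show.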


Evan Jones
