On May 17, 2010, at 15:38, Kenton Varda wrote:
I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

Yes. I've been playing with this a bit in my spare time since the last email, but I don't have any results I'm happy with yet. Rough notes:

* Encoding is (quite a bit?) faster than String.getBytes() if you assume one byte per character.
* If you "guess" the number of bytes per character poorly and have to do multiple allocations and copies, the regular Java version will win. If you get it right (even if you first guess 1 byte per character), it looks like it can be slightly faster than, or on par with, the Java version.
* Re-using a temporary byte[] for string encoding may be faster than String.getBytes(), which effectively allocates a temporary byte[] each time.
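
A minimal sketch of the first and third notes: encode assuming one byte per character into a scratch byte[] that is allocated once and reused, falling back to String.getBytes() as soon as a non-ASCII char shows up. The names AsciiFastPath, encodeAscii and writeUtf8 are hypothetical (not part of the protobuf API), and ByteArrayOutputStream is only a stand-in for whatever output buffer the real encoder writes to.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class AsciiFastPath {
    /**
     * Copies s into scratch assuming one byte per char. Returns the byte count,
     * or -1 if s contains a non-ASCII char or is longer than scratch.
     */
    static int encodeAscii(String s, byte[] scratch) {
        int n = s.length();
        if (n > scratch.length) {
            return -1; // does not fit even at one byte per char
        }
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (c >= 0x80) {
                return -1; // UTF-8 needs two or more bytes for this char
            }
            scratch[i] = (byte) c;
        }
        return n;
    }

    /** Appends the UTF-8 bytes of s to out, using the reusable scratch buffer when s is ASCII. */
    static void writeUtf8(String s, byte[] scratch, ByteArrayOutputStream out) {
        int len = encodeAscii(s, scratch);
        if (len >= 0) {
            out.write(scratch, 0, len); // no per-call allocation on this path
        } else {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8); // JDK fallback allocates
            out.write(bytes, 0, bytes.length);
        }
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[1024]; // allocated once, reused for every string
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeUtf8("hello", scratch, out);      // ASCII fast path: 5 bytes
        writeUtf8("h\u00e9llo", scratch, out); // falls back: 6 bytes
        System.out.println(out.size());        // prints 11
    }
}
```

Note that for a non-ASCII string the chars before the first non-ASCII one are scanned twice, which is the kind of wasted work the question above is worried about.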

I'm going to try to rework my code with a slightly different policy:

a) Assume 1 byte per character and attempt the encode. If we run out of space:
b) Use a shared temporary buffer and continue the encode. If we run out of space:
c) Allocate a worst-case 4-byte-per-character buffer and finish the encode.
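
The a) / b) / c) policy can be sketched roughly as a hand-written UTF-8 encoder that resumes where the previous stage ran out of room. Several details are assumptions, not from the thread: the names StagedUtf8Encoder, encode and encodeUntilFull are hypothetical, the "shared temporary buffer" is modeled as a 4 KB per-thread ThreadLocal array, unpaired surrogates are written as '?', and the result is returned as an exact-size byte[].

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StagedUtf8Encoder {
    // Hypothetical scratch size; the thread does not specify one.
    private static final int SCRATCH_SIZE = 4096;
    // "Shared" temporary buffer, modeled as one array per thread (an assumption).
    private static final ThreadLocal<byte[]> SCRATCH =
        ThreadLocal.withInitial(() -> new byte[SCRATCH_SIZE]);

    /** Encodes s as UTF-8 using the a) / b) / c) fallback policy. */
    static byte[] encode(String s) {
        final int n = s.length();
        int[] state = {0, 0}; // {next char index, bytes written so far}

        // a) Guess one byte per char; if everything fits, the buffer is already exact-size.
        byte[] buf = new byte[n];
        encodeUntilFull(s, buf, state);
        if (state[0] == n) return buf;

        // b) Move the partial output into the shared scratch buffer and keep going.
        byte[] scratch = SCRATCH.get();
        if (state[1] <= scratch.length) {
            System.arraycopy(buf, 0, scratch, 0, state[1]);
            encodeUntilFull(s, scratch, state);
            if (state[0] == n) return Arrays.copyOf(scratch, state[1]); // trim to exact size
            buf = scratch;
        }

        // c) Worst case: 4 bytes per char as proposed (3 per UTF-16 char would already suffice).
        //    4 * n overflows for strings over ~536M chars; ignored in this sketch.
        byte[] big = new byte[4 * n];
        System.arraycopy(buf, 0, big, 0, state[1]);
        encodeUntilFull(s, big, state);
        return Arrays.copyOf(big, state[1]);
    }

    /** Encodes from char state[0] into dst at byte state[1] until done or the next code point won't fit. */
    private static void encodeUntilFull(String s, byte[] dst, int[] state) {
        final int n = s.length();
        int i = state[0];
        int pos = state[1];
        while (i < n) {
            char c = s.charAt(i);
            if (c < 0x80) {                                   // 1 byte
                if (pos + 1 > dst.length) break;
                dst[pos++] = (byte) c;
                i++;
            } else if (c < 0x800) {                           // 2 bytes
                if (pos + 2 > dst.length) break;
                dst[pos++] = (byte) (0xC0 | (c >>> 6));
                dst[pos++] = (byte) (0x80 | (c & 0x3F));
                i++;
            } else if (Character.isHighSurrogate(c) && i + 1 < n
                    && Character.isLowSurrogate(s.charAt(i + 1))) { // surrogate pair: 4 bytes
                if (pos + 4 > dst.length) break;
                int cp = Character.toCodePoint(c, s.charAt(i + 1));
                dst[pos++] = (byte) (0xF0 | (cp >>> 18));
                dst[pos++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
                dst[pos++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                dst[pos++] = (byte) (0x80 | (cp & 0x3F));
                i += 2;
            } else if (Character.isSurrogate(c)) {            // unpaired surrogate: '?'
                if (pos + 1 > dst.length) break;
                dst[pos++] = (byte) '?';
                i++;
            } else {                                          // rest of the BMP: 3 bytes
                if (pos + 3 > dst.length) break;
                dst[pos++] = (byte) (0xE0 | (c >>> 12));
                dst[pos++] = (byte) (0x80 | ((c >>> 6) & 0x3F));
                dst[pos++] = (byte) (0x80 | (c & 0x3F));
                i++;
            }
        }
        state[0] = i;
        state[1] = pos;
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "h\u00e9llo", "\u20ac5", "\uD83D\uDE00!"};
        for (String s : samples) {
            // Compare against the JDK encoder for these well-formed samples.
            System.out.println(Arrays.equals(encode(s), s.getBytes(StandardCharsets.UTF_8)));
        }
    }
}
```

Only the pure-ASCII path returns its first buffer without a copy; stages b) and c) each copy the partial output forward and then trim with Arrays.copyOf, which is the extra copying the benchmarks would have to weigh against String.getBytes().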

This should be much better than the JDK version for ASCII, a bit better for "short" strings that fit in the shared temporary buffer, and not significantly worse for the rest, but I'll need to test it to be sure.

This is sort of just a "fun" experiment for me at this point, so who knows when I may get around to actually "finishing" this.


Evan Jones

You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.