Re: [protobuf] Re: Java UTF-8 encoding/decoding: possible performance improvements

Evan Jones Mon, 17 May 2010 16:10:39 -0700

On May 17, 2010, at 15:38 , Kenton Varda wrote:

I see. So in fact your code is quite possibly slower in non-ASCII cases? In fact, it sounds like having even one non-ASCII character would force extra copies to occur, which I would guess would defeat the benefit, but we'd need benchmarks to tell for sure.

Yes. I've been playing with this a bit in my spare time since the last email, but I don't have any results I'm happy with yet. Rough notes:

* Encoding is (quite a bit?) faster than String.getBytes() if you assume one byte per character.
* If you "guess" the number of bytes per character poorly and have to do multiple allocations and copies, the regular Java version will win. If you get it right (even if you first guess 1 byte per character) it looks like it can be slightly faster or on par with the Java version.
* Re-using a temporary byte[] for string encoding may be faster than String.getBytes(), which effectively allocates a temporary byte[] each time.
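To illustrate the last note, here is a minimal sketch of encoding through a re-used temporary byte[] instead of calling String.getBytes(). It is not the code discussed in this thread: the class name ReusableBufferWriter, the method writeUtf8, and the 1024-byte scratch size are hypothetical, and it uses the standard java.nio.charset.CharsetEncoder API. An instance is not thread-safe, since the scratch array is shared between calls.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encodes strings to UTF-8 through one re-used scratch array.
final class ReusableBufferWriter {
    // Assumed size; must be at least 4 bytes so the encoder can always make progress.
    private final byte[] scratch = new byte[1024];
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Writes the UTF-8 encoding of s to out and returns the number of bytes written.
    int writeUtf8(String s, OutputStream out) throws IOException {
        CharBuffer in = CharBuffer.wrap(s);
        ByteBuffer buf = ByteBuffer.wrap(scratch);
        encoder.reset();
        int total = 0;
        while (true) {
            CoderResult result = encoder.encode(in, buf, true);
            if (result.isUnderflow()) {
                // All input consumed; let the encoder emit any trailing bytes.
                result = encoder.flush(buf);
            }
            // Drain whatever was produced, then re-use the same array.
            out.write(scratch, 0, buf.position());
            total += buf.position();
            buf.clear();
            if (result.isUnderflow()) {
                return total;
            }
            // Otherwise the scratch array filled up (overflow); loop and continue.
        }
    }
}
```

Whether this actually beats String.getBytes() would still need to be measured; the sketch only shows where the per-call allocation goes away.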



I'm going to try to rework my code with a slightly different policy:

a) Assume 1 byte per character and attempt the encode. If we run out of space:
b) Use a shared temporary buffer and continue the encode. If we run out of space:
c) Allocate a worst case 4 byte per character buffer and finish the encode.
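The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the reworked code itself: TieredUtf8Encoder, encodeInto, and the 4096-byte shared buffer size are hypothetical names and values, and the sketch relies on java.nio.charset.CharsetEncoder rather than a hand-written encoding loop. The shared buffer makes an instance unsafe to use from several threads at once.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the a) / b) / c) policy; not thread-safe.
final class TieredUtf8Encoder {
    private static final int SHARED_SIZE = 4096; // assumed size of the shared buffer
    private final byte[] shared = new byte[SHARED_SIZE];
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    byte[] encode(String s) {
        CharBuffer in = CharBuffer.wrap(s);
        encoder.reset();

        // a) Assume 1 byte per character.
        ByteBuffer first = ByteBuffer.allocate(s.length());
        if (encodeInto(in, first)) {
            // Pure ASCII fills the buffer exactly, so no copy is needed.
            return first.position() == first.capacity()
                    ? first.array()
                    : Arrays.copyOf(first.array(), first.position());
        }

        // b) Continue into the shared temporary buffer.
        ByteBuffer second = ByteBuffer.wrap(shared);
        ByteBuffer third = null;
        if (!encodeInto(in, second)) {
            // c) Worst case for the rest. The factor 4 follows the note above;
            // it is a safe upper bound for UTF-8 output per Java char.
            third = ByteBuffer.allocate(4 * in.remaining());
            if (!encodeInto(in, third)) {
                throw new IllegalStateException("worst-case buffer overflowed");
            }
        }

        // Stitch the pieces together into one exactly-sized array.
        int firstLen = first.position();
        int secondLen = second.position();
        int thirdLen = (third == null) ? 0 : third.position();
        byte[] result = new byte[firstLen + secondLen + thirdLen];
        System.arraycopy(first.array(), 0, result, 0, firstLen);
        System.arraycopy(shared, 0, result, firstLen, secondLen);
        if (third != null) {
            System.arraycopy(third.array(), 0, result, firstLen + secondLen, thirdLen);
        }
        return result;
    }

    // Returns true if all remaining input was encoded, false if out ran out of space.
    private boolean encodeInto(CharBuffer in, ByteBuffer out) {
        CoderResult result = encoder.encode(in, out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        return result.isUnderflow();
    }
}
```

In this sketch the ASCII case does one allocation and no copy, while any string that spills past step a) pays for one extra copy when the pieces are joined, which is the cost the benchmarks would need to weigh.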

This should be much better than the JDK version for ASCII, a bit better for "short" strings that fit in the shared temporary buffer, and not significantly worse for the rest, but I'll need to test it to be sure.

This is sort of just a "fun" experiment for me at this point, so who knows when I may get around to actually "finishing" this.


Evan

--
Evan Jones
http://evanjones.ca/
