On May 17, 2010, at 15:38, Kenton Varda wrote:
I see. So in fact your code is quite possibly slower in non-ASCII
cases? In fact, it sounds like having even one non-ASCII character
would force extra copies to occur, which I would guess would defeat
the benefit, but we'd need benchmarks to tell for sure.
Yes. I've been playing with this a bit in my spare time since the last
email, but I don't have any results I'm happy with yet. Rough notes:
* Encoding is (quite a bit?) faster than String.getBytes() if you
assume one byte per character.
* If you "guess" the number of bytes per character poorly and have to do
multiple allocations and copies, the regular Java version will win. If
you get it right (even if you first guess 1 byte per character) it
looks like it can be slightly faster or on par with the Java version.
* Re-using a temporary byte[] for string encoding may be faster than
String.getBytes(), which effectively allocates a temporary byte[] each
time.
I'm going to try to rework my code with a slightly different policy:
a) Assume 1 byte per character and attempt the encode. If we run out
of space:
b) Use a shared temporary buffer and continue the encode. If we run
out of space:
c) Allocate a worst-case 4-byte-per-character buffer and finish the
encode.
This should be much better than the JDK version for ASCII, a bit
better for "short" strings that fit in the shared temporary buffer,
and not significantly worse for the rest, but I'll need to test it to
be sure.
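
A minimal sketch of the (a)/(b)/(c) policy above, for illustration only.
It assumes UTF-8 output and uses the standard java.nio CharsetEncoder
rather than a hand-rolled encoder, so it is not necessarily how the code
under discussion works. The class name TieredUtf8Encoder, the sharedSize
parameter, and the concat helper are all hypothetical. The shared buffer
makes an instance unsafe to use from more than one thread at a time, and
nothing here has been benchmarked.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration of the three-step policy; not thread-safe.
final class TieredUtf8Encoder {
    private final byte[] shared;  // reused across calls for step (b)
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            // Replace unpaired surrogates with '?', as String.getBytes() does.
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    TieredUtf8Encoder(int sharedSize) {
        this.shared = new byte[sharedSize];
    }

    byte[] encode(String s) {
        encoder.reset();
        CharBuffer in = CharBuffer.wrap(s);

        // (a) Assume 1 byte per character. Pure ASCII fills the array
        // exactly, so it can be returned without a copy.
        ByteBuffer first = ByteBuffer.allocate(s.length());
        if (encoder.encode(in, first, true).isUnderflow()) {
            encoder.flush(first);
            return first.position() == first.capacity()
                    ? first.array()
                    : Arrays.copyOf(first.array(), first.position());
        }

        // (b) Out of space: continue into the shared temporary buffer.
        ByteBuffer second = ByteBuffer.wrap(shared);
        if (encoder.encode(in, second, true).isUnderflow()) {
            encoder.flush(second);
            return concat(first, second, null);
        }

        // (c) Still out of space: allocate a conservative 4 bytes per
        // remaining char, which is always enough to finish.
        ByteBuffer third = ByteBuffer.allocate(in.remaining() * 4);
        encoder.encode(in, third, true);
        encoder.flush(third);
        return concat(first, second, third);
    }

    // Copies the filled part of each buffer into one right-sized array.
    private static byte[] concat(ByteBuffer a, ByteBuffer b, ByteBuffer c) {
        int cLen = (c == null) ? 0 : c.position();
        byte[] out = new byte[a.position() + b.position() + cLen];
        System.arraycopy(a.array(), 0, out, 0, a.position());
        System.arraycopy(b.array(), 0, out, a.position(), b.position());
        if (c != null) {
            System.arraycopy(c.array(), 0, out, a.position() + b.position(), cLen);
        }
        return out;
    }
}
```

Note that only the ASCII case avoids the final copy; any string that
reaches (b) or (c) pays for one right-sized allocation plus a copy, which
is the cost the benchmarks would need to weigh against String.getBytes().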
This is sort of just a "fun" experiment for me at this point, so who
knows when I may get around to actually "finishing" this.