This does somewhat suggest that it might be worthwhile to specifically tag a field as ASCII-only. There are enough cases of this that it could be a huge win.
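
For reference, here is a rough sketch of the a/b/c policy Evan describes in the quoted message below. It assumes java.nio's CharsetEncoder does the actual encoding. The class name StagedUtf8Encoder and the 1024-byte shared buffer are made up for illustration, and this is not Evan's actual code.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only. Not thread safe: the shared buffer and
// the encoder are reused across calls.
final class StagedUtf8Encoder {
    private static final int SHARED_SIZE = 1024; // arbitrary size for this sketch
    private final byte[] shared = new byte[SHARED_SIZE];
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    byte[] encode(String s) {
        CharBuffer in = CharBuffer.wrap(s);
        encoder.reset();

        // (a) Assume 1 byte per character. For pure ASCII this is the only
        // allocation and no copy is needed.
        byte[] first = new byte[s.length()];
        ByteBuffer a = ByteBuffer.wrap(first);
        if (encoder.encode(in, a, true).isUnderflow() && a.position() == first.length) {
            return first;
        }

        // (b) Ran out of space: continue the encode in the shared temporary buffer.
        ByteBuffer b = ByteBuffer.wrap(shared);
        if (encoder.encode(in, b, true).isUnderflow()) {
            return join(first, a.position(), shared, b.position(), null, 0);
        }

        // (c) Still out of space: allocate a worst-case buffer of 4 bytes per
        // remaining character, as in the policy, and finish the encode.
        byte[] last = new byte[in.remaining() * 4];
        ByteBuffer c = ByteBuffer.wrap(last);
        encoder.encode(in, c, true);
        return join(first, a.position(), shared, b.position(), last, c.position());
    }

    // Concatenates the used prefixes of up to three buffers into one array.
    private static byte[] join(byte[] x, int xn, byte[] y, int yn, byte[] z, int zn) {
        byte[] out = new byte[xn + yn + zn];
        System.arraycopy(x, 0, out, 0, xn);
        System.arraycopy(y, 0, out, xn, yn);
        if (z != null) {
            System.arraycopy(z, 0, out, xn + yn, zn);
        }
        return out;
    }
}
```

A field known to be ASCII-only would never leave step (a), which is where the win would come from. The sketch still pays for an extra copy in steps (b) and (c), so only benchmarks can say how it compares with String.getBytes() on non-ASCII input.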

On 5/17/10, Evan Jones <ev...@mit.edu> wrote:
> On May 17, 2010, at 15:38 , Kenton Varda wrote:
>> I see. So in fact your code is quite possibly slower in non-ASCII
>> cases? In fact, it sounds like having even one non-ASCII character
>> would force extra copies to occur, which I would guess would defeat
>> the benefit, but we'd need benchmarks to tell for sure.
>
> Yes. I've been playing with this a bit in my spare time since the last
> email, but I don't have any results I'm happy with yet. Rough notes:
>
> * Encoding is (quite a bit?) faster than String.getBytes() if you
> assume one byte per character.
> * If you "guess" the number bytes per character poorly and have to do
> multiple allocations and copies, the regular Java version will win. If
> you get it right (even if you first guess 1 byte per character) it
> looks like it can be slightly faster or on par with the Java version.
> * Re-using a temporary byte[] for string encoding may be faster than
> String.getBytes(), which effectively allocates a temporary byte[] each
> time.
>
>
> I'm going to try to rework my code with a slightly different policy:
>
> a) Assume 1 byte per character and attempt the encode. If we run out
> of space:
> b) Use a shared temporary buffer and continue the encode. If we run
> out of space:
> c) Allocate a worst case 4 byte per character buffer and finish the
> encode.
>
>
> This should be much better than the JDK version for ASCII, a bit
> better for "short" strings that fit in the shared temporary buffer,
> and not significantly worse for the rest, but I'll need to test it to
> be sure.
>
> This is sort of just a "fun" experiment for me at this point, so who
> knows when I may get around to actually "finishing" this.
>
> Evan
>
> --
> Evan Jones
> http://evanjones.ca/
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to
> protobuf+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/protobuf?hl=en.
>
>

--
Sent from my mobile device

Chris