This does somewhat suggest that it might be worthwhile to specifically
tag a field as ASCII-only. There are enough fields like this in practice
that it could be a huge win.
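
To make the idea concrete, here is a minimal sketch of what an encoder
could do for a field known to be ASCII-only: allocate exactly one byte per
character and narrow each char directly, with no charset machinery. The
class and method names are hypothetical and are not part of the protobuf
API; how such a tag would actually be declared on a field is left open.

```java
class AsciiFieldSketch {
    /** Returns true if every char of s is in the 7-bit ASCII range. */
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    /**
     * Encodes a string that is expected to be ASCII-only.
     * Exactly one byte per char, so the output size is known up front.
     * Rejects non-ASCII input rather than silently corrupting it.
     */
    static byte[] encodeAscii(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < out.length; i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException("non-ASCII char at index " + i);
            }
            out[i] = (byte) c; // ASCII maps to the same single byte in UTF-8
        }
        return out;
    }
}
```

For ASCII input the result is byte-for-byte identical to the UTF-8
encoding, so the wire format would not change.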


On 5/17/10, Evan Jones <ev...@mit.edu> wrote:
> On May 17, 2010, at 15:38 , Kenton Varda wrote:
>> I see.  So in fact your code is quite possibly slower in non-ASCII
>> cases?  In fact, it sounds like having even one non-ASCII character
>> would force extra copies to occur, which I would guess would defeat
>> the benefit, but we'd need benchmarks to tell for sure.
>
> Yes. I've been playing with this a bit in my spare time since the last
> email, but I don't have any results I'm happy with yet. Rough notes:
>
> * Encoding is (quite a bit?) faster than String.getBytes() if you
> assume one byte per character.
> * If you "guess" the number of bytes per character poorly and have to do
> multiple allocations and copies, the regular Java version will win. If
> you get it right (even if you first guess 1 byte per character) it
> looks like it can be slightly faster or on par with the Java version.
> * Re-using a temporary byte[] for string encoding may be faster than
> String.getBytes(), which effectively allocates a temporary byte[] each
> time.
>
>
> I'm going to try to rework my code with a slightly different policy:
>
> a) Assume 1 byte per character and attempt the encode. If we run out
> of space:
> b) Use a shared temporary buffer and continue the encode. If we run
> out of space:
> c) Allocate a worst case 4 byte per character buffer and finish the
> encode.
>
>
> This should be much better than the JDK version for ASCII, a bit
> better for "short" strings that fit in the shared temporary buffer,
> and not significantly worse for the rest, but I'll need to test it to
> be sure.
>
> This is sort of just a "fun" experiment for me at this point, so who
> knows when I may get around to actually "finishing" this.
>
> Evan
>
> --
> Evan Jones
> http://evanjones.ca/
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to
> protobuf+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/protobuf?hl=en.
>
>
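
For reference, the a/b/c policy quoted above could be sketched roughly as
follows. This is not Evan's code, just one way to read the three steps,
built on the JDK's CharsetEncoder. The class name, the 1024-byte size of
the shared buffer, and the choice to replace malformed input (to match
what String.getBytes() does) are all assumptions. The instance is not
thread-safe, since it reuses one encoder and one temporary buffer.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class TieredUtf8EncoderSketch {
    // Hypothetical size for the shared temporary buffer.
    private static final int SHARED_SIZE = 1024;

    private final byte[] shared = new byte[SHARED_SIZE];
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    byte[] encode(String s) {
        CharBuffer in = CharBuffer.wrap(s);
        encoder.reset();

        // (a) Assume 1 byte per character and attempt the encode.
        ByteBuffer first = ByteBuffer.allocate(s.length());
        if (encodeInto(in, first)) {
            // Pure ASCII fills the buffer exactly, so no copy is needed.
            return first.position() == first.capacity()
                    ? first.array() : concat(first);
        }

        // (b) Ran out of space: continue into the shared temporary buffer.
        ByteBuffer second = ByteBuffer.wrap(shared);
        if (encodeInto(in, second)) {
            return concat(first, second);
        }

        // (c) Still out of space: worst-case buffer for the remaining chars,
        // using the 4-bytes-per-character bound from the policy above.
        ByteBuffer third = ByteBuffer.allocate(in.remaining() * 4);
        if (!encodeInto(in, third)) {
            throw new IllegalStateException("worst-case buffer was too small");
        }
        return concat(first, second, third);
    }

    /** Encodes as much of in as fits in out; true once all input is done. */
    private boolean encodeInto(CharBuffer in, ByteBuffer out) {
        CoderResult r = encoder.encode(in, out, true);
        if (r.isUnderflow()) {
            r = encoder.flush(out);
        }
        return r.isUnderflow();
    }

    /** Copies the written part of each buffer into one exact-size array. */
    private static byte[] concat(ByteBuffer... parts) {
        int total = 0;
        for (ByteBuffer p : parts) {
            total += p.position();
        }
        byte[] out = new byte[total];
        int off = 0;
        for (ByteBuffer p : parts) {
            System.arraycopy(p.array(), 0, out, off, p.position());
            off += p.position();
        }
        return out;
    }
}
```

Note that steps (b) and (c) each end with a copy into an exact-size
array, so only the ASCII case avoids copying entirely. Whether this beats
String.getBytes() for non-ASCII input would still need benchmarks.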

-- 
Chris
