Re: [protobuf] Java UTF-8 encoding/decoding: possible performance improvements

David Yu Tue, 22 Dec 2009 20:18:29 -0800

On Wed, Dec 23, 2009 at 11:14 AM, Kenton Varda <[email protected]> wrote:

> On Tue, Dec 22, 2009 at 7:06 PM, David Yu <[email protected]> wrote:
>
>> There should be a writeByteArray(int fieldNumber, byte[] value) in
>> CodedOutputStream so that the cached bytes of strings would
>> be written directly.  A ByteString would not help: it uses more memory
>> since it creates a copy of the byte array.
>>
>
> We could cache the bytes as a ByteString.  Converting a String to a
> ByteString does not require a redundant copy, as ByteString has methods for
> this.
>
> I think it would be better to do it this way because, in the long run, we
> actually want to extend ByteString to allow avoiding copies in some cases.
>  For example, if you are serializing a message to a ByteString (you called
> toByteString()) or parsing from a ByteString, then handling "bytes" fields
> should not require any copy.  Instead, it should be possible to construct a
> ByteString which is a substring of some other ByteString in O(1) time, as
> well as concatenate ByteStrings in O(1) time.
>
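The O(1) substring idea above could look roughly like the sketch below. This is only an illustration: SharedBytes and its methods are hypothetical names, not the protobuf ByteString API, and the sketch covers substring only (O(1) concatenation would need a tree-like structure on top of this).

```java
// Hypothetical immutable byte-sequence view; NOT the real ByteString class.
final class SharedBytes {
    private final byte[] data;   // backing array, shared between views
    private final int offset;    // start of this view inside data
    private final int length;    // number of bytes visible through this view

    // Assumption: the caller hands over the array and never mutates it later.
    SharedBytes(byte[] data) {
        this(data, 0, data.length);
    }

    private SharedBytes(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int size() {
        return length;
    }

    byte byteAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[offset + index];
    }

    // O(1): creates a new view over the same array, no bytes are copied.
    SharedBytes substring(int begin, int end) {
        if (begin < 0 || end > length || begin > end) {
            throw new IndexOutOfBoundsException(begin + ".." + end);
        }
        return new SharedBytes(data, offset + begin, end - begin);
    }
}
```

The trade-off is that a small substring keeps the whole backing array reachable, so this saves copies at the cost of possibly holding on to more memory.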
> So this way, if the size-computation step converted the String to a
> ByteString and cached that, no further copy of the bytes would ever be
> needed in many cases.
>
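A minimal sketch of the caching idea, assuming a hypothetical per-field holder (CachedUtf8Field is an invented name, not protobuf code). It uses a plain byte[] instead of a ByteString so that it runs with the JDK alone, and it leaves out the tag and length prefix that a real encoder would write.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;

// Hypothetical holder for one string field; not part of the protobuf API.
final class CachedUtf8Field {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    private final String value;
    private byte[] utf8; // filled in on first use, then reused

    CachedUtf8Field(String value) {
        this.value = value;
    }

    // Encodes the string at most once; later calls return the same array.
    private byte[] utf8Bytes() {
        if (utf8 == null) {
            utf8 = value.getBytes(UTF_8);
        }
        return utf8;
    }

    // Size-computation step: this is where the single encoding pass happens.
    int computeSize() {
        return utf8Bytes().length;
    }

    // Write step: reuses the cached bytes instead of encoding again.
    void writeTo(ByteArrayOutputStream out) {
        byte[] bytes = utf8Bytes();
        out.write(bytes, 0, bytes.length);
    }
}
```

Calling computeSize() and then writeTo() encodes the string once instead of twice. The sketch is not thread-safe; a shared message object would need to publish the cached array safely.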

Cool.
Btw, the ByteString's snippet is:
 return new ByteString(text.getBytes("UTF-8"));

Another improvement would be to avoid the charset lookup by name on every
call: cache the Charset.forName("UTF-8") object and reuse it.
I believe you google guys have also been evangelizing this :-) (PDF from
http://code.google.com/p/guava-libraries/)
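
A rough sketch of the cached-Charset suggestion, using only the JDK. The class name Utf8 and its methods are made up for illustration. Whether this is actually faster than the by-name overload depends on the JVM version, so it is worth measuring.

```java
import java.nio.charset.Charset;

// Hypothetical helper class; the name is an assumption for this example.
final class Utf8 {
    // Looked up once when the class is initialized, then shared.
    static final Charset UTF_8 = Charset.forName("UTF-8");

    private Utf8() {}

    // String.getBytes(Charset) (Java 6+) takes the Charset object directly,
    // so there is no lookup by name and no UnsupportedEncodingException.
    static byte[] encode(String text) {
        return text.getBytes(UTF_8);
    }

    // Decodes with the same cached Charset.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_8);
    }
}
```

With this in place, the snippet above would call text.getBytes(Utf8.UTF_8) instead of text.getBytes("UTF-8").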

-- 
When the cat is away, the mouse is alone.
- David Yu
