On Jan 11, 2011, at 0:45, Nicolae Mihalache wrote:
But I have noticed in Java that it is impossible to create a message
containing a "bytes" field without copying some buffers around. For
example, if I have an encoded message of 1MB with a few regular fields
and one big bytes field, decoding the message will make a copy of the
entire buffer instead of keeping a reference to it.

By "decoding" I'm assuming you mean deserializing the message from a file or something.

The copy is a disadvantage, but it makes things much easier: the buffer used to read data can be recycled for the next message. Without the copy, the library would need complicated tracking of which chunks of memory are still "in use".
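
To make the trade-off concrete, here is a minimal sketch in plain Java. It is not the protobuf implementation, and RecycledBufferDemo, Slice, copyOut and fill are hypothetical names made up for illustration. It contrasts keeping a reference into a reusable read buffer with copying the bytes out before the buffer is recycled:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Protocol Buffers.
class RecycledBufferDemo {
    // A "zero-copy" field: remembers the shared read buffer and a range within it.
    static final class Slice {
        final byte[] backing;
        final int offset;
        final int length;

        Slice(byte[] backing, int offset, int length) {
            this.backing = backing;
            this.offset = offset;
            this.length = length;
        }

        String text() {
            return new String(backing, offset, length, StandardCharsets.US_ASCII);
        }
    }

    // A copied field: owns a private array that later reads cannot touch.
    static byte[] copyOut(byte[] buffer, int offset, int length) {
        return Arrays.copyOfRange(buffer, offset, offset + length);
    }

    // Simulates reading the next message into the start of the reusable buffer.
    static void fill(byte[] buffer, String message) {
        byte[] src = message.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, buffer, 0, src.length);
    }

    public static void main(String[] args) {
        byte[] readBuffer = new byte[16];
        fill(readBuffer, "first");
        Slice shared = new Slice(readBuffer, 0, 5);
        byte[] copied = copyOut(readBuffer, 0, 5);

        fill(readBuffer, "other"); // buffer recycled for the next message

        System.out.println(shared.text());                                  // prints "other"
        System.out.println(new String(copied, StandardCharsets.US_ASCII)); // prints "first"
    }
}
```

The slice silently changes once the buffer is reused, which is exactly the bookkeeping problem the copy sidesteps.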

However, now that you mention it: in the case of big buffers, CodedInputStream.readBytes() gets called, which currently makes two copies of the data (it calls readRawBytes(), then ByteString.copyFrom()). This could probably be "fixed" in CodedInputStream.readBytes(), which might improve performance a fair bit. I'll put this on my TODO list of things to look at, since I think my code does this pretty frequently.
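
Roughly, the double copy and one possible way around it look like the sketch below. This uses hypothetical stand-ins (ReadBytesSketch, Bytes, readRaw, wrapOwned), not the real CodedInputStream or ByteString code, and the "fix" shown is only my assumption about what could work: since the array returned by the raw read is freshly allocated and nobody else holds a reference to it, the immutable wrapper could take ownership of it instead of copying again.

```java
import java.util.Arrays;

// Hypothetical stand-ins; the real CodedInputStream and ByteString differ.
class ReadBytesSketch {
    static int copies = 0; // counts array copies, for demonstration only

    // Minimal immutable byte container, standing in for ByteString.
    static final class Bytes {
        private final byte[] data;

        private Bytes(byte[] data) {
            this.data = data;
        }

        // Defensive copy: safe for any caller-supplied array.
        static Bytes copyFrom(byte[] src) {
            copies++;
            return new Bytes(src.clone());
        }

        // No copy: safe only if the caller gives up its sole reference to the array.
        static Bytes wrapOwned(byte[] owned) {
            return new Bytes(owned);
        }

        int size() {
            return data.length;
        }

        byte byteAt(int index) {
            return data[index];
        }
    }

    // Plays the role of readRawBytes(): copies `size` bytes into a new array.
    static byte[] readRaw(byte[] streamBuffer, int pos, int size) {
        copies++;
        return Arrays.copyOfRange(streamBuffer, pos, pos + size);
    }

    // Current behaviour as described above: raw read, then a second copy.
    static Bytes readBytesTwoCopies(byte[] streamBuffer, int pos, int size) {
        return Bytes.copyFrom(readRaw(streamBuffer, pos, size));
    }

    // Assumed fix: the fresh array is handed straight to the wrapper.
    static Bytes readBytesOneCopy(byte[] streamBuffer, int pos, int size) {
        return Bytes.wrapOwned(readRaw(streamBuffer, pos, size));
    }
}
```

The result is still immutable from the caller's point of view, because the only reference to the array lives inside the wrapper.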

Even worse when encoding: if I read some data from a file, it does not seem
possible to put it directly into a ByteString, so I first have to make
a byte[], then copy it into the ByteString, and when encoding it makes
yet another byte[].

The copy cannot be avoided without complicating the API: copying gives you thread safety and means you don't need to worry about the ByteBuffer being accidentally changed afterwards, etc. The latest version of Protocol Buffers in Subversion has ByteString.copyFrom(ByteBuffer), which will do what you want efficiently.
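
For the read-from-file case, the pattern would look something like the sketch below. To keep it runnable without the protobuf jar, copyFrom here is a plain-Java stand-in that returns a byte[]; with the library on the classpath you would call ByteString.copyFrom(buffer) instead. The class name, the method names and the choice of a memory-mapped file are my assumptions for illustration. The point is that the file contents go into the immutable copy in one step, with no intermediate byte[] of your own.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; FileToBytesSketch is not part of Protocol Buffers.
class FileToBytesSketch {
    // Stand-in for ByteString.copyFrom(ByteBuffer): copies the remaining
    // bytes once into a private array, leaving the caller's position untouched.
    static byte[] copyFrom(ByteBuffer buffer) {
        byte[] copy = new byte[buffer.remaining()];
        buffer.duplicate().get(copy);
        return copy;
    }

    // Maps the file read-only and copies it exactly once.
    // Assumes the file is smaller than 2 GB (the limit for a single mapping).
    static byte[] readFile(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            return copyFrom(mapped);
        }
    }
}
```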


Evan Jones
