Thanks Evan, that was very helpful. I got rid of the external object
and created the internal objects directly. After that, the only part
that was taking time was decoding. I like the idea of using bytes for
serialization and doing my own encoding/decoding on top of that. That
way I can delay decoding until it is needed; for comparisons, for
example, I should just be able to use the raw bytes. Also, do you
think that encoding/decoding with UTF-16 would be faster? Clearly it is not as
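A rough sketch of what I mean (plain Java, the class and field names are made up): keep the serialized UTF-8 bytes around, compare on the bytes, and only pay the decoding cost if someone actually asks for a String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LazyField {
    private final byte[] raw;  // serialized UTF-8 bytes, as read from the wire
    private String decoded;    // decoded lazily, only if someone asks

    LazyField(byte[] raw) { this.raw = raw; }

    // Comparisons can stay on the raw bytes: byte-wise equality of
    // UTF-8 data is the same as equality of the decoded strings.
    boolean sameAs(LazyField other) {
        return Arrays.equals(raw, other.raw);
    }

    // Pay the UTF-8 -> UTF-16 decoding cost only on first access.
    String asString() {
        if (decoded == null) {
            decoded = new String(raw, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    public static void main(String[] args) {
        LazyField a = new LazyField("hello".getBytes(StandardCharsets.UTF_8));
        LazyField b = new LazyField("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.sameAs(b));   // prints "true"
        System.out.println(a.asString());  // prints "hello"
    }
}
```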
On Aug 22, 11:58 am, Evan Jones <ev...@mit.edu> wrote:
> On Aug 19, 2010, at 11:45 , achintms wrote:
> > I have an application that is reading data from disk and is using
> > proto buffers to create java objects. When doing performance analysis
> > I was surprised to find out that most of the time was spent in and
> > around proto buffers and not reading data from disk.
> In my experience, protocol buffers are more than fast enough to be
> able to keep up with disk speeds. That is, when reading uncached data
> from the disk at 100 MB/s, protocol buffers can decode it at that
> speed. Now, if your data is cached, and your application is not doing
> much with the data, then I would expect protocol buffers to take 100%
> of the CPU time, since the disk read doesn't take CPU, and your
> application isn't doing much.
> In other words: in a more "real" application, I would expect protocol
> buffers will take only a very small portion of your application's time.
> > Again I expected that decoding strings would be almost all the time
> > (although decoding here still seems slower than in C in my
> > experience). I am trying to figure out why mergeFrom method for this
> > message is taking 6 sec (own time).
> Decoding strings in Java is way slower because it actually decodes the
> UTF-8 encoded strings into UTF-16 strings in memory. The C++ version
> just leaves the data in UTF-8. If this is a performance issue for your
> application, you may wish to consider using the bytes protocol buffer
> type rather than strings. This is less convenient, and means you can
> "screw up" by accidentally sending invalid data, but is faster.
> > There are around 15 SubMessages.
> This is basically the problem right here. Each time you parse one of
> these messages, it ends up allocating a new object for each of these
> sub messages, and a new object for each string inside them. This is
> pretty slow.
> As I said above: I suspect that in a "real" application, this won't be
> a problem. However, it would be faster if you got rid of all the sub
> messages (assuming that you don't actually need them for some other
> purpose).
> Finally, I'll take a moment to promote my patch that improves Java
> message *encoding* performance, by optimizing string encoding. It is
> available at the following URL. Unfortunately, there is no similar
> approach to improving the decoding performance.
> Evan Jones
> http://evanjones.ca/
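As a sketch of the bytes approach Evan describes (assuming a field declared as `bytes name = 1;` rather than `string name = 1;`, so the generated accessor returns a `com.google.protobuf.ByteString` instead of an eagerly decoded String):

```java
import com.google.protobuf.ByteString;

public class BytesField {
    public static void main(String[] args) {
        // With `bytes name = 1;` the parser keeps the raw UTF-8 data in a
        // ByteString instead of decoding it into a UTF-16 String up front.
        // ByteStrings are constructed directly here to stand in for the
        // generated accessor of a parsed message.
        ByteString a = ByteString.copyFromUtf8("some value");
        ByteString b = ByteString.copyFromUtf8("some value");

        // Equality works directly on the bytes -- no decode required.
        System.out.println(a.equals(b));   // prints "true"

        // Decode only at the point a java.lang.String is actually needed.
        System.out.println(a.toStringUtf8());
    }
}
```

The trade-off Evan mentions still applies: nothing validates that the bytes are well-formed UTF-8, so the sender can "screw up" silently.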