Thanks Evan, that was very helpful. I got rid of the external object
and created the internal objects directly. After that, the only part
still taking time was decoding. I like the idea of using bytes for
serialization and doing my own encoding/decoding on top of that; that
way I can delay decoding until it is needed. For example, for
comparisons I should just be able to use the bytes directly. Also, do
you think encoding/decoding with UTF-16 would be faster? Clearly it is
not as
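The delay-decoding idea above can be sketched without any protobuf dependency. `LazyString` below is a hypothetical wrapper (not part of the protobuf API): it keeps the raw UTF-8 bytes from a `bytes` field and only builds a Java `String` on demand, while equality checks work directly on the bytes (two valid UTF-8 encodings of the same string are byte-for-byte identical).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper illustrating "delay decoding until needed":
// hold the raw UTF-8 bytes and decode to a String only on first use.
final class LazyString {
    private final byte[] utf8;  // raw bytes as read from a bytes field
    private String decoded;     // cache, filled lazily by toString()

    LazyString(byte[] utf8) { this.utf8 = utf8; }

    // Comparison works directly on the bytes; no decoding required.
    boolean sameAs(LazyString other) {
        return Arrays.equals(utf8, other.utf8);
    }

    @Override public String toString() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}

public class LazyDemo {
    public static void main(String[] args) {
        LazyString a = new LazyString("hello".getBytes(StandardCharsets.UTF_8));
        LazyString b = new LazyString("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(a.sameAs(b)); // prints "true"; no String was built
        System.out.println(a);           // prints "hello"; decodes only here
    }
}
```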

On Aug 22, 11:58 am, Evan Jones <> wrote:
> On Aug 19, 2010, at 11:45 , achintms wrote:
> > I have an application that is reading data from disk and is using
> > proto buffers to create java objects. When doing performance analysis
> > I was surprised to find out that most of the time was spent in and
> > around proto buffers and not reading data from disk.
>
> In my experience, protocol buffers are more than fast enough to keep
> up with disk speeds. That is, when reading uncached data from the
> disk at 100 MB/s, protocol buffers can decode it at that speed. Now,
> if your data is cached and your application is not doing much with
> the data, then I would expect protocol buffers to take 100% of the
> CPU time, since the disk read doesn't take CPU and your application
> isn't doing much.
>
> In other words: in a more "real" application, I would expect protocol
> buffers to take only a very small portion of your application's time.
>
> > Again I expected that decoding strings would take almost all the
> > time (although decoding here still seems slower than in C, in my
> > experience). I am trying to figure out why the mergeFrom method for
> > this message is taking 6 sec (own time).
>
> Decoding strings in Java is way slower because it actually decodes
> the UTF-8 encoded strings into UTF-16 strings in memory. The C++
> version just leaves the data in UTF-8. If this is a performance issue
> for your application, you may wish to consider using the bytes
> protocol buffer type rather than string. This is less convenient, and
> means you can "screw up" by accidentally sending invalid data, but it
> is faster.
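The string-vs-bytes trade-off can be shown at the schema level. `Record`, `name`, and `name_raw` below are hypothetical names for illustration; the point is that a `string` field is eagerly decoded into a Java `String` at parse time, while a `bytes` field is surfaced as raw bytes with no UTF-8 to UTF-16 conversion, leaving decoding (and validity checking) to the application.

```proto
// Hypothetical schema sketch: same payload, two declarations.
message Record {
  optional string name = 1;      // parsed eagerly into a Java String
                                 // (UTF-8 decoded to UTF-16 on parse)
  optional bytes  name_raw = 2;  // surfaced as raw bytes; the app
                                 // decodes on demand and must validate
}
```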
>
> > There are around 15 sub-messages.
>
> This is basically the problem right here. Each time you parse one of
> these messages, it ends up allocating a new object for each of these
> sub-messages, and a new object for each string inside them. This is
> pretty slow.
>
> As I said above, I suspect that in a "real" application this won't be
> a problem. However, it would be faster if you got rid of all the
> sub-messages (assuming that you don't actually need them for some
> other reason).
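The flattening suggestion can be sketched in schema form. The message and field names below are hypothetical; the idea is that every nested message costs one extra object allocation per parse in Java, so hoisting its fields into the parent trades a little structure for less allocation churn.

```proto
// Hypothetical nested layout: parsing one Span allocates a Span
// object plus one Point object per populated sub-message field.
message Point {
  optional int32 x = 1;
  optional int32 y = 2;
}
message Span {
  optional Point start = 1;
  optional Point end = 2;
}

// Flattened equivalent: same data, fewer allocations per parse.
message FlatSpan {
  optional int32 start_x = 1;
  optional int32 start_y = 2;
  optional int32 end_x = 3;
  optional int32 end_y = 4;
}
```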
>
> Finally, I'll take a moment to promote my patch that improves Java
> message *encoding* performance by optimizing string encoding. It is
> available at the following URL. Unfortunately, there is no similar
> approach to improving decoding performance.
>
> Evan
> --
> Evan Jones

You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.