[protobuf] Re: Performance of java proto buffers

2010-08-23 Thread achintms
Thanks Evan. That was very helpful. I got rid of the external object
and created the internal objects directly. After that, the only part
that was taking time was decoding. I like the idea of using bytes for
serialization and doing my own encoding/decoding on top of that. That
way I can delay decoding until it is needed. For example, for
comparisons I should be able to use the bytes directly. Also, do you
think it would be faster if I encoded/decoded using UTF-16? Clearly it
is not as compact.
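
A minimal sketch of the "delay decoding until it is needed" idea, using only the JDK. The class name LazyUtf8 is hypothetical and not part of protobuf; it assumes the field is declared as bytes in the .proto file and that the raw array has already been obtained from the message (for example via ByteString.toByteArray()).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical wrapper around the raw UTF-8 bytes of a protobuf `bytes` field. */
final class LazyUtf8 implements Comparable<LazyUtf8> {
    private final byte[] utf8;   // raw bytes as they came off the wire
    private String decoded;      // filled in on the first toString() call

    LazyUtf8(byte[] utf8) {
        this.utf8 = utf8;
    }

    /** Equality on the raw bytes; no decoding happens. */
    @Override
    public boolean equals(Object o) {
        return o instanceof LazyUtf8 && Arrays.equals(utf8, ((LazyUtf8) o).utf8);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(utf8);
    }

    /** Unsigned byte order; for valid UTF-8 this is Unicode code point order. */
    @Override
    public int compareTo(LazyUtf8 other) {
        int n = Math.min(utf8.length, other.utf8.length);
        for (int i = 0; i < n; i++) {
            int a = utf8[i] & 0xFF;
            int b = other.utf8[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return utf8.length - other.utf8.length;
    }

    /** Decodes UTF-8 into a Java String only when first requested, then caches it. */
    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    /** True once toString() has paid the decoding cost. */
    boolean isDecoded() {
        return decoded != null;
    }
}
```

Two caveats. The byte-wise ordering matches String.compareTo for most text but not for characters outside the Basic Multilingual Plane, because String.compareTo compares UTF-16 code units. On the UTF-16 question: storing UTF-16 in a bytes field takes two bytes per ASCII character instead of one, so any decoding time saved would have to be measured against the extra data read from disk.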

On Aug 22, 11:58 am, Evan Jones wrote:
> On Aug 19, 2010, at 11:45 , achintms wrote:
>
> > I have an application that is reading data from disk and is using
> > proto buffers to create java objects. When doing performance analysis
> > I was surprised to find out that most of the time was spent in and
> > around proto buffers and not reading data from disk.
>
> In my experience, protocol buffers are more than fast enough to be  
> able to keep up with disk speeds. That is, when reading uncached data  
> from the disk at 100 MB/s, protocol buffers can decode it at that  
> speed. Now, if your data is cached, and your application is not doing  
> much with the data, then I would expect protocol buffers to take 100%  
> of the CPU time, since the disk read doesn't take CPU, and your  
> application isn't doing much.
>
> In other words: in a more "real" application, I would expect protocol  
> buffers will take only a very small portion of your application's time.
>
> > Again I expected that decoding strings would be almost all the time
> > (although decoding here still seems slower than in C in my
> > experience). I am trying to figure out why mergeFrom method for this
> > message is taking 6 sec (own time).
>
> Decoding strings in Java is way slower because it actually decodes the  
> UTF-8 encoded strings into UTF-16 strings in memory. The C++ version  
> just leaves the data in UTF-8. If this is a performance issue for your  
> application, you may wish to consider using the bytes protocol buffer  
> type rather than strings. This is less convenient, and means you can  
> "screw up" by accidentally sending invalid data, but is faster.
>
> > There are around 15 SubMessages.
>
> This is basically the problem right here. Each time you parse one of  
> these messages, it ends up allocating a new object for each of these  
> sub messages, and a new object for each string inside them. This is  
> pretty slow.
>
> As I said above: I suspect that in a "real" application, this won't be  
> a problem. However, it would be faster if you get rid of all the sub  
> messages (assuming that you don't actually need them for some other  
> reason).
>
> Finally, I'll take a moment to promote my patch that improves Java  
> message *encoding* performance, by optimizing string encoding. It is  
> available at the following URL. Unfortunately, there is no similar  
> approach to improving the decoding performance.
>
> http://codereview.appspot.com/949044/
>
> Evan
>
> --
> Evan Jones
> http://evanjones.ca/

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Performance of java proto buffers

2010-08-19 Thread achintms
I have an application that is reading data from disk and is using
proto buffers to create java objects. When doing performance analysis
I was surprised to find out that most of the time was spent in and
around proto buffers and not reading data from disk.

On profiling further (using YourKit), I found the following breakdown
of the function that converts a byte[] to a message:

1. Total - 13 sec
2. Decoding strings - 7 sec
3. Message.Builder.mergeFrom - 6 sec

Again I expected that decoding strings would be almost all the time
(although decoding here still seems slower than in C in my
experience). I am trying to figure out why mergeFrom method for this
message is taking 6 sec (own time). My message looks something like
this:

message Message {
  enum SubMessageType {
    TYPE1 = 1;
    TYPE2 = 2;
    ...
  }
  required SubMessageType type = 1;
  optional SubMessage1 subMessage1 = 2;
  optional SubMessage2 subMessage2 = 3;
  ...
}

There are around 15 SubMessages. Each of the sub messages is
basically a very simple message with just one string field. Also,
according to the profiler, reading the sub messages takes around 7 sec,
all of it in String decoding (as I mentioned above).

The code for Message.Builder.mergeFrom is basically just a loop that
reads a tag and tries to match it to one of the sub message types.
This loop would have at most 2 iterations (1 for the enum and 1 for
the sub message). I am confused, then, about how this method is taking up so
much time. I noticed that there were synthetic accessor warnings in
the proto-buffer generated code and some of that showed up in the
profiler as well (UnknownFieldSet.newBuilder...access$000). That
doesn't quite account for the whole time though.
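
For reference, the tag loop described above can be sketched by hand with only the JDK. This is an illustration of the protobuf wire format (a tag is the field number shifted left by 3, OR'ed with the wire type), not the code that protoc generates; the class name TagLoopSketch is made up, and it handles only the two wire types this message would use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written sketch of a tag-dispatch loop; not the generated protobuf code. */
final class TagLoopSketch {
    private final byte[] buf;
    private int pos;

    TagLoopSketch(byte[] buf) {
        this.buf = buf;
    }

    /** Reads a base-128 varint; 32 bits are enough for this sketch. */
    private int readVarint32() {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = buf[pos++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalStateException("varint too long for this sketch");
    }

    /** Walks the top-level fields once and describes what it saw. */
    List<String> parse() {
        List<String> seen = new ArrayList<>();
        while (pos < buf.length) {
            int tag = readVarint32();
            int fieldNumber = tag >>> 3;
            int wireType = tag & 0x7;
            if (wireType == 0) {
                // Varint, e.g. the enum in field 1.
                seen.add("field " + fieldNumber + " varint=" + readVarint32());
            } else if (wireType == 2) {
                // Length-delimited, e.g. one of the sub messages.
                int length = readVarint32();
                seen.add("field " + fieldNumber + " bytes=" + length);
                pos += length;  // a real parser would build the sub message here
            } else {
                throw new IllegalStateException("wire type not handled: " + wireType);
            }
        }
        return seen;
    }
}
```

For a message carrying the enum and one sub message, the loop body runs twice, as described above. The loop itself does little work; the cost is more likely to sit in what the length-delimited branch triggers in a real parser, namely allocating the sub message and decoding its string.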

In your experience, is this expected? Am I doing something wrong? Can I
add any options or write the message differently to make it more
efficient? I would appreciate any insight.




Re: Streaming different types of messages

2009-03-27 Thread achintms

Thanks. Also, how do I know the type of the message? One way would be
to check all the optional fields of the wrapper message (each
represents a different type of message) and pick the one that is
set. Is that the only way?

On Mar 27, 12:08 pm, Dave Bailey wrote:
> Kenton,
>
> I don't suppose there'd ever be a way to mark a set of fields as
> mutually exclusive?
>
> -dave
>
> On Mar 27, 8:42 am, "Jon Skeet" wrote:
>
> > On Mar 27, 1:32 pm, achin...@gmail.com wrote:
>
> > > If I understand correctly there is no good way to use proto buffers to
> > > stream different types of messages, right? For example if my stream
> > > has a mix of several messages of type m1 and m2, I will have to devise
> > > a scheme outside of proto buffers to separate it into 2 streams and
> > > then pass it through parsers for each.
>
> > > In other words is there a way to do event based parsing using proto
> > > buffers, or even a way to say don't parse a repeated field unless
> > > needed.
>
> > No, you don't have to do it into separate streams. Instead, stream a
> > sequence of messages each of which has either an m1, or an m2, or an
> > m3 etc. This basically ends up being (tag) (message) (tag) (message),
> > where the tag is effectively identifying the type of message.
>
> > All you need to do is create the wrapper message, and the rest should
> > work fine.
>
> > Jon
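
The (tag) (message) (tag) (message) layout described above can be illustrated with plain JDK streams. This is only a sketch of the framing idea under stated assumptions: the class name, the tag values, and the 1-byte tag plus 4-byte length header are all hypothetical, and the payloads stand in for serialized m1/m2 messages. With a protobuf wrapper message, the wrapper's own field tags play this role, and later protobuf Java releases provide writeDelimitedTo and parseDelimitedFrom for length-prefixed streaming.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of (tag)(message) framing for a stream of mixed message types. */
final class TaggedStreamSketch {
    static final int TAG_M1 = 1;  // hypothetical tag values
    static final int TAG_M2 = 2;

    /** Writes one record: 1-byte tag, 4-byte length, then the payload. */
    static void writeRecord(DataOutputStream out, int tag, byte[] payload)
            throws IOException {
        out.writeByte(tag);
        out.writeInt(payload.length);
        out.write(payload);
    }

    /** Reads records until end of stream and dispatches on the tag. */
    static List<String> readAll(DataInputStream in) throws IOException {
        List<String> events = new ArrayList<>();
        while (true) {
            int tag = in.read();  // -1 signals a clean end of stream
            if (tag < 0) {
                return events;
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            switch (tag) {
                case TAG_M1:
                    events.add("m1:" + payload.length);  // hand payload to the m1 parser
                    break;
                case TAG_M2:
                    events.add("m2:" + payload.length);  // hand payload to the m2 parser
                    break;
                default:
                    events.add("unknown:" + tag);        // payload skipped, never parsed
            }
        }
    }
}
```

Because each payload is read as opaque bytes before any parser is chosen, a reader can also hold on to the bytes and parse them only when needed, which gives the event-style processing asked about in the original question.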



Streaming different types of messages

2009-03-27 Thread achintms

If I understand correctly, there is no good way to use proto buffers to
stream different types of messages, right? For example, if my stream
has a mix of several messages of type m1 and m2, I will have to devise
a scheme outside of proto buffers to separate it into 2 streams and
then pass each through its own parser.

In other words, is there a way to do event-based parsing using proto
buffers, or even a way to say "don't parse a repeated field unless
needed"?