[protobuf] Re: Performance of java proto buffers
Thanks Evan. That was very helpful. I got rid of the external object and created the internal objects directly. After that, the only part that was taking time was decoding.

I like the idea of using bytes for serialization and doing my own encoding/decoding on top of that. That way I can delay decoding until it is needed. For comparisons, for example, I should be able to use the bytes directly.

Also, do you think it would be faster if I encoded/decoded using UTF-16? Clearly it is not as compact.

On Aug 22, 11:58 am, Evan Jones wrote:
> On Aug 19, 2010, at 11:45, achintms wrote:
>
> > I have an application that is reading data from disk and is using
> > proto buffers to create java objects. When doing performance analysis
> > I was surprised to find out that most of the time was spent in and
> > around proto buffers and not reading data from disk.
>
> In my experience, protocol buffers are more than fast enough to be
> able to keep up with disk speeds. That is, when reading uncached data
> from the disk at 100 MB/s, protocol buffers can decode it at that
> speed. Now, if your data is cached, and your application is not doing
> much with the data, then I would expect protocol buffers to take 100%
> of the CPU time, since the disk read doesn't take CPU, and your
> application isn't doing much.
>
> In other words: in a more "real" application, I would expect protocol
> buffers will take only a very small portion of your application's time.
>
> > Again I expected that decoding strings would be almost all the time
> > (although decoding here still seems slower than in C in my
> > experience). I am trying to figure out why mergeFrom method for this
> > message is taking 6 sec (own time).
>
> Decoding strings in Java is way slower because it actually decodes the
> UTF-8 encoded strings into UTF-16 strings in memory. The C++ version
> just leaves the data in UTF-8.
> If this is a performance issue for your application, you may wish to
> consider using the bytes protocol buffer type rather than strings. This
> is less convenient, and means you can "screw up" by accidentally sending
> invalid data, but is faster.
>
> > There are around 15 SubMessages.
>
> This is basically the problem right here. Each time you parse one of
> these messages, it ends up allocating a new object for each of these
> sub messages, and a new object for each string inside them. This is
> pretty slow.
>
> As I said above: I suspect that in a "real" application, this won't be
> a problem. However, it would be faster if you get rid of all the sub
> messages (assuming that you don't actually need them for some other
> reason).
>
> Finally, I'll take a moment to promote my patch that improves Java
> message *encoding* performance, by optimizing string encoding. It is
> available at the following URL. Unfortunately, there is no similar
> approach to improving the decoding performance.
>
> http://codereview.appspot.com/949044/
>
> Evan
>
> --
> Evan Jones http://evanjones.ca/

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
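Evan's suggestion of using the bytes type, combined with the idea of delaying decoding until it is needed, could look roughly like the following. This is a minimal sketch, assuming the bytes field always holds valid UTF-8; the class LazyUtf8 and its methods are hypothetical names, not part of the protobuf library. It keeps the raw bytes, builds a String only on first use, and does equality and ordering on the bytes themselves. The last two lines of main touch on the UTF-16 question: plain ASCII text takes twice as many bytes in UTF-16 as in UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical wrapper around the raw contents of a protobuf `bytes` field
// that is assumed to contain valid UTF-8 text. The String is created lazily.
final class LazyUtf8 implements Comparable<LazyUtf8> {
    private final byte[] utf8;  // raw field contents, never decoded eagerly
    private String decoded;     // cached result of the first toString() call

    LazyUtf8(byte[] utf8) {
        this.utf8 = utf8;
    }

    static LazyUtf8 of(String s) {
        return new LazyUtf8(s.getBytes(StandardCharsets.UTF_8));
    }

    boolean isDecoded() {
        return decoded != null;
    }

    // Decodes UTF-8 into a (UTF-16) java.lang.String only when first asked.
    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    // For valid UTF-8, byte equality is the same as equality of the decoded strings.
    @Override
    public boolean equals(Object o) {
        return o instanceof LazyUtf8 && Arrays.equals(utf8, ((LazyUtf8) o).utf8);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(utf8);
    }

    // Unsigned byte-wise comparison. For valid UTF-8 this orders by Unicode
    // code point, which can differ from String.compareTo (UTF-16 code units)
    // for characters outside the Basic Multilingual Plane.
    @Override
    public int compareTo(LazyUtf8 other) {
        byte[] a = utf8;
        byte[] b = other.utf8;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        LazyUtf8 x = LazyUtf8.of("apple");
        LazyUtf8 y = LazyUtf8.of("banana");
        System.out.println(x.compareTo(y) < 0); // true, compared without decoding
        System.out.println(x.isDecoded());      // false
        System.out.println(x);                  // apple
        System.out.println(x.isDecoded());      // true
        // Size comparison for ASCII text (UTF_16LE avoids a byte-order mark).
        System.out.println("apple".getBytes(StandardCharsets.UTF_8).length);    // 5
        System.out.println("apple".getBytes(StandardCharsets.UTF_16LE).length); // 10
    }
}
```

Whether UTF-16 would actually decode faster is not settled by this sketch; it only shows the size cost and how comparisons can skip decoding entirely.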
[protobuf] Performance of java proto buffers
I have an application that reads data from disk and uses proto buffers to create Java objects. When doing performance analysis, I was surprised to find that most of the time was spent in and around proto buffers rather than in reading data from disk.

On profiling further (using YourKit), I found the following breakdown for the function that converts a byte[] to a message:

1. Total - 13 sec
2. Decoding strings - 7 sec
3. .Builder.mergeFrom - 6 sec

I expected string decoding to account for almost all of the time (although decoding here still seems slower than in C, in my experience). I am trying to figure out why the mergeFrom method for this message is taking 6 sec (own time). My message looks something like this:

message Message {
  enum SubMessageType {
    TYPE1 = 1;
    TYPE2 = 2;
    ...
  }
  required SubMessageType type = 1;
  optional SubMessage1 subMessage1 = 2;
  optional SubMessage2 subMessage2 = 3;
  ...
}

There are around 15 SubMessages. Each sub message is a very simple message with just one string field. The profiler also shows that reading the sub messages takes around 7 sec, all of it in string decoding (as mentioned above).

The code for .Builder.mergeFrom is basically just a loop that reads a tag and tries to match it to one of the sub message types. This loop has at most 2 iterations (1 for the enum and 1 for the sub message), so I am confused about how this method can take up so much time.

I noticed synthetic accessor warnings in the proto buffer generated code, and some of those accessors showed up in the profiler as well (UnknownFieldSet.newBuilder...access$000). That doesn't account for the whole time, though.

In your experience, is this expected? Am I doing something wrong? Can I add any options or write the message differently to make it more efficient? I would appreciate any insight.
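The tag-matching loop described in the post can be made concrete with a hand-written decoder for a message of the same shape. This is a minimal sketch, not the protoc-generated mergeFrom: TagLoopSketch and Parsed are hypothetical names, and each sub message is assumed to carry its string in field number 1. It confirms that the outer loop runs only twice (enum plus one sub message); the remaining work is the nested length-delimited parse and the UTF-8 to String copy. Generated code additionally builds a separate object for each sub message, which this sketch skips.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written decoder for a message shaped like:
//   field 1:     SubMessageType enum (varint)
//   fields 2..N: SubMessageK, each assumed to hold one string in field 1
// It mirrors the tag-dispatch idea only; it is not the generated code.
final class TagLoopSketch {
    static final class Parsed {
        int type;      // value of the enum field
        int subField;  // field number of the sub message that was present
        String text;   // that sub message's single string field
    }

    private final byte[] buf;
    private int pos;

    TagLoopSketch(byte[] buf) {
        this.buf = buf;
    }

    // Base-128 varint, enough for tags, lengths and small enum values.
    private int readVarint32() {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            byte b = buf[pos++];
            result |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint too long for this sketch");
    }

    // Inner loop: parses the sub message's bytes up to `end`.
    private String readSubMessage(int end) {
        String value = null;
        while (pos < end) {
            int tag = readVarint32();
            if (tag != ((1 << 3) | 2)) { // field 1, wire type 2 (length-delimited)
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
            int len = readVarint32();
            value = new String(buf, pos, len, StandardCharsets.UTF_8); // UTF-8 -> UTF-16 copy
            pos += len;
        }
        return value;
    }

    // Outer loop: one iteration per top-level field present in the input.
    Parsed parse() {
        Parsed out = new Parsed();
        while (pos < buf.length) {
            int tag = readVarint32();
            int field = tag >>> 3;
            int wireType = tag & 7;
            if (field == 1 && wireType == 0) {
                out.type = readVarint32();
            } else if (field >= 2 && wireType == 2) {
                int len = readVarint32();
                out.subField = field;
                out.text = readSubMessage(pos + len);
            } else {
                throw new IllegalArgumentException("unexpected tag " + tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // type = 2, subMessage2 (field 3) = { field 1: "hi" }
        byte[] msg = {0x08, 0x02, 0x1A, 0x04, 0x0A, 0x02, 'h', 'i'};
        Parsed p = new TagLoopSketch(msg).parse();
        System.out.println(p.type + " " + p.subField + " " + p.text); // 2 3 hi
    }
}
```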
Re: Streaming different types of messages
Thanks. Also, how do I know the type of the message? One way would be to check all the optional fields of the wrapper message (each representing a different type of message) and pick the one that is set. Is that the only way?

On Mar 27, 12:08 pm, Dave Bailey wrote:
> Kenton,
>
> I don't suppose there'd ever be a way to mark a set of fields as
> mutually exclusive?
>
> -dave
>
> On Mar 27, 8:42 am, "Jon Skeet " wrote:
> > On Mar 27, 1:32 pm, achin...@gmail.com wrote:
> >
> > > If I understand correctly there is no good way to use proto buffers to
> > > stream different types of messages, right? For example if my stream
> > > has a mix of several messages of type m1 and m2, I will have to devise
> > > a scheme outside of proto buffers to separate it into 2 streams and
> > > then pass it through parsers for each.
> > >
> > > In other words is there a way to do event based parsing using proto
> > > buffers, or even a way to say don't parse a repetitive field unless
> > > needed.
> >
> > No, you don't have to do it into separate streams. Instead, stream a
> > sequence of messages each of which has either an m1, or an m2, or an
> > m3 etc. This basically ends up being (tag) (message) (tag) (message),
> > where the tag is effectively identifying the type of message.
> >
> > All you need to do is create the wrapper message, and the rest should
> > work fine.
> >
> > Jon
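Jon's (tag) (message) (tag) (message) layout also suggests an answer to the question of how to tell the type: if the stream is read one field at a time, the field number in each tag already names the type, so nothing has to be checked after parsing. A minimal sketch in plain Java, assuming field 1 carries m1 and field 2 carries m2 (hypothetical numbers); it hand-codes the varint and length-delimited framing instead of using the protobuf runtime, and TaggedStream and Handler are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Each entry is written as a length-delimited field whose field number says
// which message type the payload holds (1 = m1, 2 = m2 in this sketch).
final class TaggedStream {
    interface Handler {
        void onMessage(int fieldNumber, byte[] payload) throws IOException;
    }

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static void writeEntry(ByteArrayOutputStream out, int fieldNumber, byte[] payload) {
        writeVarint(out, ((long) fieldNumber << 3) | 2); // wire type 2 = length-delimited
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
    }

    // Returns -1 if the stream ends cleanly before a new varint starts.
    static long readVarint(InputStream in, boolean eofAllowed) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (eofAllowed && shift == 0) {
                    return -1;
                }
                throw new EOFException("truncated varint");
            }
            result |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("malformed varint");
    }

    // Event-style reader: the handler sees the field number (the type) and the
    // raw payload, one entry at a time, without a wrapper object being built.
    static void readAll(InputStream in, Handler handler) throws IOException {
        long tag;
        while ((tag = readVarint(in, true)) >= 0) {
            int fieldNumber = (int) (tag >>> 3);
            if ((tag & 7) != 2) {
                throw new IOException("expected a length-delimited entry");
            }
            int len = (int) readVarint(in, false);
            byte[] payload = in.readNBytes(len); // Java 11+
            if (payload.length != len) {
                throw new EOFException("truncated entry");
            }
            handler.onMessage(fieldNumber, payload);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeEntry(out, 1, new byte[] {0x08, 0x01}); // an m1
        writeEntry(out, 2, new byte[] {0x08, 0x02}); // an m2
        writeEntry(out, 1, new byte[] {0x08, 0x03}); // another m1
        List<Integer> seen = new ArrayList<>();
        readAll(new ByteArrayInputStream(out.toByteArray()), (field, payload) -> seen.add(field));
        System.out.println(seen); // [1, 2, 1]
    }
}
```

In a real program each payload would then go to the parser for the matching type, for example the static parseFrom(byte[]) on the generated m1 or m2 class.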
Streaming different types of messages
If I understand correctly, there is no good way to use proto buffers to stream different types of messages, right? For example, if my stream has a mix of several messages of type m1 and m2, I will have to devise a scheme outside of proto buffers to separate it into 2 streams and then pass each stream through its own parser.

In other words, is there a way to do event-based parsing using proto buffers, or even a way to say "don't parse a repeated field unless needed"?
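The second half of the question, leaving a repeated field unparsed until it is needed, is possible at the wire level: every field starts with a tag whose low three bits give the wire type, and the wire type alone says how many bytes to skip. A minimal sketch, assuming a hand-written scanner (FieldScanner is a hypothetical name) rather than the generated parser: it records where each occurrence of one length-delimited field sits so those entries can be decoded later, and skips everything else. Groups (wire types 3 and 4) are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner: walks the top-level fields of an encoded message,
// remembers {offset, length} for one chosen length-delimited field, and
// skips all other fields using only their wire type.
final class FieldScanner {
    private final byte[] buf;
    private int pos;

    FieldScanner(byte[] buf) {
        this.buf = buf;
    }

    private long readVarint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7f) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("malformed varint");
    }

    List<int[]> locate(int wantedField) {
        List<int[]> spans = new ArrayList<>();
        pos = 0;
        while (pos < buf.length) {
            long tag = readVarint();
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            switch (wireType) {
                case 0: readVarint(); break; // varint: consume and ignore
                case 1: pos += 8; break;     // fixed64
                case 5: pos += 4; break;     // fixed32
                case 2: {                    // length-delimited
                    int len = (int) readVarint();
                    if (field == wantedField) {
                        spans.add(new int[] {pos, len});
                    }
                    pos += len;              // skip the payload without parsing it
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        if (pos != buf.length) {
            throw new IllegalArgumentException("truncated message");
        }
        return spans;
    }

    public static void main(String[] args) {
        byte[] msg = {
            0x08, (byte) 0x96, 0x01,     // field 1, varint 150
            0x12, 0x01, 0x41,            // field 2, length 1, "A"
            0x12, 0x01, 0x42,            // field 2, length 1, "B"
            0x1D, 0x00, 0x00, 0x00, 0x00 // field 3, fixed32 0
        };
        List<int[]> spans = new FieldScanner(msg).locate(2);
        System.out.println(spans.size());    // 2
        System.out.println(spans.get(0)[0]); // 5
    }
}
```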