[protobuf] Improve message parsing speed

Michael Grove Wed, 20 Feb 2013 20:05:44 -0800

I am using protobuf for the wire format of a protocol I'm working on, as a 
replacement for JSON.  The original protobuf messages were little more than 
JSON expressed as protobuf: each message contained the same fields, with the 
same format, as the JSON structure.  This worked fine, but the payloads 
tended to be the same size as or larger than their JSON equivalents.  I 
then tried the union types technique, specifically with extensions as 
outlined in the docs [1].  This worked very well with respect to message 
size: the resulting messages were much smaller than with the previous 
approach.
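
For illustration, here is a minimal, stdlib-only Java sketch of the protobuf 
wire encoding behind the union technique: only the enum discriminator and 
the fields that are actually set get written, each preceded by a varint key 
of (field_number << 3) | wire_type.  The field numbers (1 for the enum, 100 
and 101 for an int and a string extension) and the names 
UnionEncodingSketch and encodeMessageType1 are hypothetical, not taken from 
the real .proto.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UnionEncodingSketch {
    static final int WIRETYPE_VARINT = 0;
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Unsigned varint: 7 payload bits per byte, high bit set on every byte but the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field key is (fieldNumber << 3) | wireType, itself written as a varint.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Length-delimited field: key, varint byte length, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, int fieldNumber, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeTag(out, fieldNumber, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Hypothetical layout: field 1 = enum discriminator, field 100 = int extension,
    // field 101 = string extension. Absent (null) extensions cost zero bytes.
    public static byte[] encodeMessageType1(int typeEnum, Integer intExt, String stringExt) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(out, 1, WIRETYPE_VARINT);
        writeVarint(out, typeEnum);
        if (intExt != null) {
            writeTag(out, 100, WIRETYPE_VARINT);
            writeVarint(out, intExt); // assumes a non-negative int
        }
        if (stringExt != null) {
            writeString(out, 101, stringExt);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encodeMessageType1(2, 300, null).length);   // 6
        System.out.println(encodeMessageType1(3, null, "abc").length); // 8
    }
}
```

Keys for field numbers 1-15 fit in one byte, while field numbers 16 and up 
need at least two, so an extension range starting at 100 costs one extra 
byte per present field compared with low field numbers.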


However, the cost of parsing the smaller messages far outweighs the savings 
from reduced IO.  When I run a simple profiling example, the top 10-15 hot 
spots are all in message parsing.  The ten most expensive methods are as 
follows:

MessageType1$Builder.mergeFrom
MessageType2$Builder.mergeFrom
MessageType1.getDescriptor
MessageType1$Builder.getDescriptorForType
MessageType3$Builder.mergeFrom
MessageType2.getDescriptor
MessageType2$Builder.getDescriptorForType
MessageType1$Builder.create
MessageType1$Builder.buildPartial
MessageType3.isInitialized

The organization is straightforward: MessageType3 contains a repeated list 
of MessageType2.  MessageType2 has three required fields of type 
MessageType1.  MessageType1 has a single required value, an enum, whose 
value determines which of the extensions (again as shown in [1]) are 
present on the message.  There are 6 possible extensions to MessageType1, 
each holding a single primitive value such as an int or a string.  No more 
than 3 of the 6 extensions tend to be used at any given time.
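
The per-field work repeated for every MessageType1 can be sketched, again 
in stdlib-only Java, as a decode loop: read a varint key, split it into 
field number and wire type, then read the value.  This is a hand-written 
illustration using the same hypothetical field numbers (1, 100, 101) and a 
hypothetical class name, not the generated code; in particular, generated 
Java code resolves extension fields through the ExtensionRegistry passed to 
parseFrom rather than through a fixed switch like this one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class UnionDecodingSketch {
    private final byte[] buf;
    private int pos;

    public UnionDecodingSketch(byte[] buf) {
        this.buf = buf;
    }

    // Reads an unsigned varint, 7 bits at a time, until a byte without the high bit.
    long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
            if (shift >= 64) {
                throw new IllegalArgumentException("malformed varint");
            }
        }
    }

    // Same shape as a mergeFrom loop: key -> (field number, wire type) -> value.
    // Returns fieldNumber -> decoded value (Long for varints, String for strings).
    public Map<Integer, Object> parseMessageType1() {
        Map<Integer, Object> fields = new TreeMap<>();
        while (pos < buf.length) {
            long key = readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            switch (wireType) {
                case 0: // varint: the enum discriminator or an int extension
                    fields.put(fieldNumber, readVarint());
                    break;
                case 2: { // length-delimited: a string extension
                    int len = (int) readVarint();
                    fields.put(fieldNumber, new String(buf, pos, len, StandardCharsets.UTF_8));
                    pos += len;
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // enum = 2 (field 1), int extension = 300 (hypothetical field 100)
        byte[] wire = {0x08, 0x02, (byte) 0xA0, 0x06, (byte) 0xAC, 0x02};
        System.out.println(new UnionDecodingSketch(wire).parseMessageType1()); // {1=2, 100=300}
    }
}
```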

The test transmits 1.85M MessageType2 objects from client to server, and 
the top two mergeFrom hot spots take ~32% of execution time.  The objects 
are bundled in roughly 64k chunks, using 58 top-level MessageType3 objects.
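
Working only from the figures above (treating 1.85M as exactly 1,850,000), 
a quick count of how many messages the server parses: each MessageType2 
carries three MessageType1 sub-messages, so the run involves roughly 5.55M 
MessageType1 parses on top of the MessageType2 and MessageType3 ones.  The 
class and method names are hypothetical.

```java
public class ParseWorkload {
    // Figures from the test description; 1.85M is treated as exact here.
    static final long MESSAGE_TYPE2_COUNT = 1_850_000L;
    static final long MESSAGE_TYPE3_COUNT = 58L;
    static final int MESSAGE_TYPE1_PER_TYPE2 = 3; // three required MessageType1 fields

    public static long messageType1Count() {
        return MESSAGE_TYPE2_COUNT * MESSAGE_TYPE1_PER_TYPE2;
    }

    // Every nested message is parsed separately, so all three levels add up.
    public static long totalMessagesParsed() {
        return MESSAGE_TYPE3_COUNT + MESSAGE_TYPE2_COUNT + messageType1Count();
    }

    public static double averageType2PerType3() {
        return (double) MESSAGE_TYPE2_COUNT / MESSAGE_TYPE3_COUNT;
    }

    public static void main(String[] args) {
        System.out.printf("MessageType1 parsed: %d%n", messageType1Count());
        System.out.printf("All messages parsed: %d%n", totalMessagesParsed());
        System.out.printf("MessageType2 per MessageType3: %.0f%n", averageType2PerType3());
    }
}
```

It prints 5550000, 7400058 and 31897 respectively.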

Obviously all of the hot spot methods are auto-generated (Java).  There 
might be some hand changes I could make to that code, but I'd lose that 
work whenever I regenerate it.  Are there any tricks or changes that could 
improve the parse time of these messages?

Thanks.

Michael

[1] https://developers.google.com/protocol-buffers/docs/techniques
