On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao <xiaof...@google.com> wrote:
> On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove <m...@clarkparsia.com> wrote:
>> I am using protobuf for the wire format of a protocol I'm working on as a
>> replacement for JSON. The original protobuf messages were not much more
>> than JSON as protobuf; my protobuf message just contained the same fields
>> w/ the same format as the JSON structure. This worked fine, but the
>> payloads tended to be the same size as or larger than their JSON
>> equivalents. I tried using the union types technique, specifically with
>> extensions as outlined in the docs, and this worked very well wrt
>> compression; the resulting messages were much smaller than with the
>> previous approach. However, the cost of parsing the smaller messages far
>> outweighs the advantage of less IO.
> You mean parsing protobufs performs worse than parsing JSON?
For the nested structure based on extensions, as described in the techniques
section of the protobuf docs, throughput is about the same. I assume that
means parsing is slower, because I'm sending fewer bytes over the wire. My
original attempt at a protobuf-based format was the fastest option, but it
tended to send the most bytes over the wire, often more than the raw
data I was sending.
>> When I run a simple profiling example, the top 10-15 hot spots are all
>> parsing of the messages. The top ten most expensive methods are as follows:
>> The organization is pretty straightforward: MessageType3 contains a
>> repeated list of MessageType2. MessageType2 has three required fields of
>> type MessageType1. MessageType1 has a single required value, which is an
>> enum. The value of the enum defines which of the extensions, again as
>> shown in the docs, are present on the message. There are a total of 6
>> possible extensions to MessageType1, each of which is a single primitive
>> value, such as an int or a string. There tend to be no more than 3 of the
>> 6 possible extensions used at any given time.
>> The top two mergeFrom hot spots take ~32% of execution time; the test is
>> the transmission of 1.85M objects of MessageType2 from client to server.
>> These are bundled in roughly 64k chunks, using 58 top level MessageType3
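For readers following along, here is a rough sketch of the layout described
above, written in proto2 syntax. Only the message names and the overall shape
(repeated MessageType2, three required MessageType1 fields, a required enum
plus extensions) come from the description; the field names, enum values,
extension names, and tag numbers are made up for illustration, and only two
of the six extensions are shown.

```proto
// Hypothetical reconstruction; all field names and numbers are assumptions.
message MessageType1 {
  // The enum value tells the reader which extension(s) to look for.
  enum Kind {
    INT_VALUE = 1;
    STRING_VALUE = 2;
    // ... remaining kinds omitted
  }
  required Kind kind = 1;

  // Range reserved for the union-style extensions.
  extensions 100 to max;
}

// Each extension carries a single primitive value.
extend MessageType1 {
  optional int64 int_value = 100;
  optional string string_value = 101;
  // ... four more extensions omitted
}

message MessageType2 {
  // Three required fields, all of type MessageType1.
  required MessageType1 first = 1;
  required MessageType1 second = 2;
  required MessageType1 third = 3;
}

message MessageType3 {
  // One top-level message per chunk, holding many small MessageType2 values.
  repeated MessageType2 items = 1;
}
```

With a shape like this, every MessageType2 on the wire means four nested
message objects for the parser to build, which would be consistent with
mergeFrom dominating the profile.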
> You can try the new parser API introduced in 2.5.0rc1, i.e., use
> MessageType3.parseFrom() instead of the Builder API to parse the message.
> Another option is to simplify the message structure. Instead of nesting
> many small MessageType2 in MessageType3, you can simply put the repeated
> extensions in MessageType3.
This sounds good, I will try both of these options.
Is 2.5.0rc1 fairly stable?
>> Obviously all of the hot spot methods are auto-generated (Java). There
>> might be some hand changes I could make to that code, but if I ever
>> re-generate, then I'd lose that work. I am wondering if there are any
>> tricks or changes that could be made to improve the parse time of the
>>  https://developers.google.com/protocol-buffers/docs/techniques
>> You received this message because you are subscribed to the Google Groups
>> "Protocol Buffers" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to protobuf+unsubscr...@googlegroups.com.
>> To post to this group, send email to firstname.lastname@example.org.
>> Visit this group at http://groups.google.com/group/protobuf?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.