On Fri, Feb 22, 2013 at 12:27 AM, Mike Grove <[email protected]> wrote:
> On Thu, Feb 21, 2013 at 9:02 AM, Feng Xiao <[email protected]> wrote:
>> On Thu, Feb 21, 2013 at 8:37 PM, Mike Grove <[email protected]> wrote:
>>> On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao <[email protected]> wrote:
>>>> On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove <[email protected]> wrote:
>>>>>
>>>>> I am using protobuf for the wire format of a protocol I'm working on,
>>>>> as a replacement for JSON. The original protobuf messages were not
>>>>> much more than the JSON re-expressed as protobuf; my protobuf messages
>>>>> just contained the same fields, in the same format, as the JSON
>>>>> structure. This worked fine, but the payloads tended to be the same
>>>>> size as, or larger than, their JSON equivalents. I then tried the
>>>>> union types technique, specifically with extensions, as outlined in
>>>>> the docs [1], and this worked very well with respect to size: the
>>>>> resulting messages were much smaller than with the previous approach.
>>>>>
>>>>> However, the cost of parsing the smaller messages far outweighs the
>>>>> advantage of less IO.
>>>>
>>>> You mean parsing protobufs performs worse than parsing JSON?
>>>
>>> For the nested structure based on extensions, as described in the
>>> techniques section of the protobuf docs, throughput is about the same.
>>> I assume that means parsing is slower, since I'm sending fewer bytes
>>> over the wire. My original attempt at a protobuf-based format was the
>>> fastest option, but it tended to send the most bytes over the wire,
>>> often more than the raw data I was sending.
>>>
>>>>> When I run a simple profiling example, the top 10-15 hot spots are
>>>>> all parsing of the messages.
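[Editor's note: for readers unfamiliar with the union-types-via-extensions
technique from [1], the layout described in this thread might look roughly
like the following proto2 sketch. All message, enum, and field names, field
numbers, and primitive types here are hypothetical illustrations, not the
poster's actual schema.]

```proto
// Hypothetical reconstruction of the layout described in this thread.

message MessageType1 {
  enum ValueType {
    INT_VALUE    = 1;
    STRING_VALUE = 2;
    // ...one constant for each of the six possible extensions
  }
  // The enum tells the reader which extension carries the payload.
  required ValueType type = 1;
  extensions 100 to 199;
}

extend MessageType1 {
  optional int32  int_value    = 100;
  optional string string_value = 101;
  // ...four more single-primitive extensions
}

message MessageType2 {
  required MessageType1 first  = 1;
  required MessageType1 second = 2;
  required MessageType1 third  = 3;
}

message MessageType3 {
  repeated MessageType2 entries = 1;
}
```

[Feng's flattening suggestion later in the thread amounts to dropping the
MessageType1/MessageType2 wrappers and putting repeated fields directly on
MessageType3, so far fewer nested-message parses are needed per chunk.]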
>>>>> The top ten most expensive methods are as follows:
>>>>>
>>>>> MessageType1$Builder.mergeFrom
>>>>> MessageType2$Builder.mergeFrom
>>>>> MessageType1.getDescriptor
>>>>> MessageType1$Builder.getDescriptorForType
>>>>> MessageType3$Builder.mergeFrom
>>>>> MessageType2.getDescriptor
>>>>> MessageType2$Builder.getDescriptorForType
>>>>> MessageType1$Builder.create
>>>>> MessageType1$Builder.buildPartial
>>>>> MessageType3.isInitialized
>>>>>
>>>>> The organization is pretty straightforward: MessageType3 contains a
>>>>> repeated list of MessageType2. MessageType2 has three required fields
>>>>> of type MessageType1. MessageType1 has a single required value, an
>>>>> enum, whose value defines which of the extensions, again as shown in
>>>>> [1], are present on the message. There are a total of 6 possible
>>>>> extensions to MessageType1, each of which is a single primitive value,
>>>>> such as an int or a string. No more than 3 of the 6 possible
>>>>> extensions tend to be used at any given time.
>>>>>
>>>>> The top two mergeFrom hot spots take ~32% of execution time; the test
>>>>> is the transmission of 1.85M objects of MessageType2 from client to
>>>>> server. These are bundled in roughly 64k chunks, using 58 top-level
>>>>> MessageType3 objects.
>>>>
>>>> You can try the new parser API introduced in 2.5.0rc1, i.e., use
>>>> MessageType3.parseFrom() instead of the Builder API to parse the
>>>> message. Another option is to simplify the message structure: instead
>>>> of nesting many small MessageType2 messages in MessageType3, you can
>>>> simply put the repeated extensions in MessageType3.
>>>
>>> This sounds good, I will try both of these options.
>>>
>>> Is 2.5.0rc1 fairly stable?
>>
>> Yes, no big changes have been made since then.
>
> 2.5.0rc1 did not work for me. For the messages in question, I changed
> from using mergeFrom to using parseFrom, and I get 'Protocol message tag
> had invalid wire type.'
> errors when parsing the result.
>
> Did the internal format of messages change?

No.

> I am using protobuf with Netty; there is a frame size that I must keep
> my protobuf payload within, and calling toByteArray after adding each
> MessageType2 to the MessageType3 builder is way too expensive. So I'm
> using CodedOutputStream and the toByteArray of each MessageType2
> directly to construct the serialized form of MessageType3.

You might not be constructing the message in the right way, and I have a
feeling that your code can be improved by avoiding some unnecessary copies.
Could you attach your serialization code so I can have a look?

> This way I can keep track of how many bytes I've written into the stream
> and can stop before exceeding the Netty frame size.
>
> This is the only thing I can think of on my end that would cause the
> parsing issues.
>
> Thanks.
>
> Michael
>
>>> Thanks.
>>>
>>> Michael
>>>
>>>>> Obviously all of the hot spot methods are auto-generated (Java).
>>>>> There might be some hand changes I could make to that code, but if I
>>>>> ever re-generate, I'd lose that work. I am wondering if there are any
>>>>> tricks or changes that could be made to improve the parse time of the
>>>>> messages?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Michael
>>>>>
>>>>> [1] https://developers.google.com/protocol-buffers/docs/techniques

--
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/protobuf?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
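[Editor's note: for anyone landing on this thread with the same 'Protocol
message tag had invalid wire type.' error while hand-assembling a parent
message from child toByteArray() output: on the wire, an embedded message
field must be preceded by its field tag (wire type 2) and a varint-encoded
length; otherwise the parser misreads the child's first byte as a tag. A
minimal self-contained sketch of that framing follows. The class name,
field number, and helper methods are illustrative, not from the thread.]

```java
import java.io.ByteArrayOutputStream;

public class WireFraming {
    // Wire type 2 = length-delimited (strings, bytes, embedded messages).
    static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // A protobuf field key is (field_number << 3) | wire_type.
    static int makeTag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    // Base-128 varint, least-significant 7-bit group first.
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Frame one embedded-message payload (e.g. MessageType2.toByteArray())
    // as field `fieldNumber` of the parent: tag, then length, then payload.
    static byte[] frameEmbedded(int fieldNumber, byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, makeTag(fieldNumber, WIRETYPE_LENGTH_DELIMITED));
        writeVarint(out, payload.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A 2-byte child payload: field 1, varint 42 (0x08 0x2A).
        byte[] framed = frameEmbedded(1, new byte[] {0x08, 0x2A});
        // Tag 0x0A (field 1, wire type 2), length 0x02, then the payload.
        System.out.printf("%02X %02X %02X %02X%n",
                framed[0], framed[1], framed[2], framed[3]);
    }
}
```

[If memory serves for the 2.x Java API, CodedOutputStream.writeMessage(
fieldNumber, message) performs exactly this framing for you, and the static
CodedOutputStream.computeMessageSize(fieldNumber, message) returns the
framed size without serializing, which is handy for staying under a Netty
frame limit without calling toByteArray on the whole builder.]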
