On Fri, Feb 22, 2013 at 12:27 AM, Mike Grove <[email protected]> wrote:

>
>
> On Thu, Feb 21, 2013 at 9:02 AM, Feng Xiao <[email protected]> wrote:
>
>>
>>
>> On Thu, Feb 21, 2013 at 8:37 PM, Mike Grove <[email protected]> wrote:
>>
>>>
>>>
>>>
>>> On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove <[email protected]> wrote:
>>>>
>>>>> I am using protobuf for the wire format of a protocol I'm working on
>>>>> as a replacement for JSON.  The original protobuf messages were not much
>>>>> more than JSON expressed as protobuf; my protobuf messages just contained
>>>>> the same fields with the same format as the JSON structure.  This worked
>>>>> fine, but the payloads tended to be the same size as or larger than their
>>>>> JSON equivalents.  I then tried the union types technique, specifically
>>>>> with extensions as outlined in the docs [1], and this worked very well
>>>>> with respect to compression: the resulting messages were much smaller
>>>>> than with the previous approach.
>>>>>
>>>>> However, the cost of parsing the smaller messages far outweighs the
>>>>> advantage of doing less IO.
>>>>>
>>>>
>>>
>>>>  You mean parsing protobufs performs worse than parsing JSON?
>>>>
>>>
>>> For the nested structure based on extensions, as described in the
>>> techniques section of the protobuf docs, throughput is about the same.  I
>>> assume that means parsing is slower, since I'm sending fewer bytes over
>>> the wire.  My original attempt at a protobuf-based format was the fastest
>>> option, but it tended to send the most bytes over the wire, often more
>>> than the raw data I was sending.
>>>
>>>
>>>>
>>>>
>>>>> When I run a simple profiling example, the top 10-15 hot spots are all
>>>>> in message parsing.  The top ten most expensive methods are as follows:
>>>>>
>>>>> MessageType1$Builder.mergeFrom
>>>>> MessageType2$Builder.mergeFrom
>>>>> MessageType1.getDescriptor
>>>>> MessageType1$Builder.getDescriptorForType
>>>>> MessageType3$Builder.mergeFrom
>>>>> MessageType2.getDescriptor
>>>>> MessageType2$Builder.getDescriptorForType
>>>>> MessageType1$Builder.create
>>>>> MessageType1$Builder.buildPartial
>>>>> MessageType3.isInitialized
>>>>>
>>>>> The organization is pretty straightforward: MessageType3 contains a
>>>>> repeated list of MessageType2.  MessageType2 has three required fields of
>>>>> type MessageType1.  MessageType1 has a single required value, which is an
>>>>> enum.  The value of the enum defines which of the extensions, again as
>>>>> shown in [1], are present on the message.  There are a total of 6 possible
>>>>> extensions to MessageType1, each of which is a single primitive value,
>>>>> such as an int or a string.  No more than 3 of the 6 possible extensions
>>>>> tend to be used at any given time.
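>>>>>
>>>>> Roughly, the dispatch on the receiving side looks like the sketch below
>>>>> (the enum constants, the getValue() accessor, and the extension
>>>>> identifiers are placeholders, not the real names):
>>>>>
>>>>>   void handle(MessageType1 m1) {
>>>>>     switch (m1.getValue()) {
>>>>>       case INT_VALUE:
>>>>>         int n = m1.getExtension(MyProtos.intValue);       // int extension
>>>>>         break;
>>>>>       case STRING_VALUE:
>>>>>         String s = m1.getExtension(MyProtos.stringValue); // string extension
>>>>>         break;
>>>>>       // ... one case per possible extension (6 in total)
>>>>>     }
>>>>>   }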
>>>>>
>>>>> The top two mergeFrom hot spots take ~32% of execution time.  The test
>>>>> is the transmission of 1.85M MessageType2 objects from client to server.
>>>>> These are bundled in roughly 64k chunks, using 58 top-level MessageType3
>>>>> objects.
>>>>>
>>>> You can try the new parser API introduced in 2.5.0rc1, i.e., use
>>>> MessageType3.parseFrom() instead of the Builder API to parse the message.
>>>> Another option is to simplify the message structure: instead of nesting
>>>> many small MessageType2 messages in MessageType3, you can simply put the
>>>> repeated extensions directly on MessageType3.
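>>>>
>>>> Either way, since your messages use extensions, make sure to parse with an
>>>> extension registry.  A rough sketch (the outer class name MyProtos below
>>>> is just a stand-in for whatever your generated code is called, and data is
>>>> the serialized byte[]):
>>>>
>>>>   ExtensionRegistry registry = ExtensionRegistry.newInstance();
>>>>   MyProtos.registerAllExtensions(registry);
>>>>
>>>>   // Builder API:
>>>>   MessageType3 viaBuilder =
>>>>       MessageType3.newBuilder().mergeFrom(data, registry).build();
>>>>
>>>>   // 2.5.0 parser API:
>>>>   MessageType3 viaParser = MessageType3.parseFrom(data, registry);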
>>>>
>>>
>>> This sounds good, I will try both of these options.
>>>
>>> Is 2.5.0rc1 fairly stable?
>>>
>> Yes, no big changes made since then.
>>
>>
>
> 2.5.0rc1 did not work for me.  For the messages in question, I changed
> from using mergeFrom to using parseFrom and I get 'Protocol message tag had
> invalid wire type.' errors when parsing the result.
>
> Did the internal format of messages change?
>
No.


> I am using protobuf with Netty; there is a frame size that I must keep my
> protobuf payload within, and calling toByteArray after adding each
> MessageType2 to the MessageType3 builder is way too expensive.  So I'm
> using CodedInputStream and toByteArray of MessageType2 directly to
> construct the serialized form of MessageType3.
>
You might not be constructing the message in the right way, and I have a
feeling your code could be improved by avoiding some unnecessary copies.
Could you attach your serialization code so I can have a look?
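
For what it's worth, one way to stay under the frame limit without
hand-assembling the MessageType3 bytes is to track the encoded size as you
add each MessageType2 and call toByteArray only once per chunk.  A rough
sketch (the repeated field name "items", its ITEMS_FIELD_NUMBER constant,
FRAME_LIMIT, pending, and send() are all placeholders):

  MessageType3.Builder chunk = MessageType3.newBuilder();
  int size = 0;
  for (MessageType2 m2 : pending) {
    // tag + length + payload for one repeated "items" entry
    int entrySize =
        CodedOutputStream.computeMessageSize(MessageType3.ITEMS_FIELD_NUMBER, m2);
    if (size > 0 && size + entrySize > FRAME_LIMIT) {
      send(chunk.build().toByteArray());  // one serialization per chunk
      chunk = MessageType3.newBuilder();
      size = 0;
    }
    chunk.addItems(m2);
    size += entrySize;
  }
  if (size > 0) {
    send(chunk.build().toByteArray());
  }

If MessageType3 has any fields besides the repeated one, their sizes would
need to be counted as well.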


> This way I can keep track of how many bytes I've written into the stream
> and can stop before exceeding the Netty frame size.
>

> This is the only thing I can think of on my end that would cause parsing
> issues.
>
> Thanks.
>
> Michael
>
>
>>
>>> Thanks.
>>>
>>> Michael
>>>
>>>
>>>>
>>>>
>>>>> Obviously, all of the hot spot methods are auto-generated (Java).
>>>>> There might be some hand edits I could make to that code, but if I ever
>>>>> re-generate, I'd lose that work.  I am wondering if there are any
>>>>> tricks or changes that could be made to improve the parse time of the
>>>>> messages?
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Michael
>>>>>
>>>>> [1] https://developers.google.com/protocol-buffers/docs/techniques
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>


