Re: [protobuf] Improve message parsing speed

2013-02-21 Thread Feng Xiao
On Fri, Feb 22, 2013 at 12:27 AM, Mike Grove  wrote:

>
>
> On Thu, Feb 21, 2013 at 9:02 AM, Feng Xiao  wrote:
>
>>
>>
>> On Thu, Feb 21, 2013 at 8:37 PM, Mike Grove  wrote:
>>
>>>
>>>
>>>
>>> On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao  wrote:
>>>


 On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove wrote:

> I am using protobuf for the wire format of a protocol I'm working on
> as a replacement for JSON.  The original protobuf messages were not much
> more than JSON expressed as protobuf; my protobuf message just contained
> the same fields with the same format as the JSON structure.  This worked
> fine, but the payloads tended to be the same size as or larger than their
> JSON equivalents.  I tried using the union types technique, specifically
> with extensions as outlined in the docs [1], and this worked very well
> with respect to compression; the resulting messages were much smaller
> than with the previous approach.
>
> However, the cost of parsing the smaller messages far outweighs the
> advantage of less I/O.
>

>>>
  You mean parsing protobufs performs worse than parsing JSON?

>>>
>>> For the nested structure based on extensions, as described in the
>>> techniques section of the protobuf docs, throughput is about the same.  I
>>> assume that means parsing is slower, since I'm sending fewer bytes over
>>> the wire.  My original attempt at a protobuf-based format was the fastest
>>> option, but it tended to send the most bytes over the wire, often more
>>> than the raw data I was sending.
>>>
>>>


> When I run a simple profiling example, the top 10-15 hot spots are all in
> message parsing.  The ten most expensive methods are as follows:
>
> MessageType1$Builder.mergeFrom
> MessageType2$Builder.mergeFrom
> MessageType1.getDescriptor()
> MessageType1$Builder.getDescriptorForType
> MessageType3$Builder.mergeFrom
> MessageType2.getDescriptor
> MessageType2$Builder.getDescriptorForType
> MessageType1$Builder.create
> MessageType1$Builder.buildPartial
> MessageType3.isInitialized
>
> The organization is pretty straightforward: MessageType3 contains a
> repeated list of MessageType2.  MessageType2 has three required fields of
> type MessageType1.  MessageType1 has a single required value, which is an
> enum.  The value of the enum defines which of the extensions, again as
> shown in [1], are present on the message.  There are a total of 6 possible
> extensions to MessageType1, each of which is a single primitive value, such
> as an int or a string.  No more than 3 of the 6 possible extensions tend to
> be used at any given time.
>
> The top two mergeFrom hot spots take ~32% of execution time; the test
> is the transmission of 1.85M MessageType2 objects from client to server.
> These are bundled in roughly 64k chunks, using 58 top-level MessageType3
> objects.
>
 You can try the new parser API introduced in 2.5.0rc1, i.e., use
 MessageType3.parseFrom()  instead of the Builder API to parse the message.
 Another option is to simplify the message structure. Instead of nesting
 many small MessageType2 in MessageType3, you can simply put the repeated
 extensions in MessageType3.

>>>
>>> This sounds good, I will try both of these options.
>>>
>>> Is 2.5.0rc1 fairly stable?
>>>
>> Yes, no big changes made since then.
>>
>>
>
> 2.5.0rc1 did not work for me.  For the messages in question, I changed
> from using mergeFrom to using parseFrom and I get 'Protocol message tag had
> invalid wire type.' errors when parsing the result.
>
> Did the internal format of the messages change?
>
No.


> I am using protobuf with Netty; there is a frame size that I must keep my
> protobuf payload within, and calling toByteArray after adding each
> MessageType2 to the MessageType3 builder is way too expensive.  So I'm
> using CodedInputStream and toByteArray of MessageType2 directly to
> construct the serialized form of MessageType3.
>
You might not be constructing the message in the right way, and I have a
feeling that your code can be improved by avoiding some unnecessary copies.
Could you attach your serialization code so I can have a look?
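
For illustration, a minimal sketch of the kind of copy-avoiding pattern being
suggested here.  MessageType2 refers to the poster's generated class, and
writeDelimitedTo / parseDelimitedFrom are just one option for streaming many
small messages without first accumulating them into a large byte array, not
necessarily what either poster ends up using:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Assumes MessageType2 is the generated protobuf class from the poster's .proto.
final class DelimitedStreaming {
  // Write each message with a varint size prefix; no intermediate container
  // message or whole-batch byte[] copy is built.
  static void writeAll(Iterable<MessageType2> messages, OutputStream out)
      throws IOException {
    for (MessageType2 m : messages) {
      m.writeDelimitedTo(out);
    }
  }

  // Read the messages back one at a time; returns null at end of stream.
  static MessageType2 readOne(InputStream in) throws IOException {
    return MessageType2.parseDelimitedFrom(in);
  }
}

Note that this produces a stream of size-prefixed MessageType2 messages rather
than a single serialized MessageType3, so both ends would have to agree on that
framing; the bytes written per message can still be tracked via
getSerializedSize() to stay under a frame limit.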


> This way I can keep track of how many bytes I've written into the stream
> and can stop before exceeding the Netty frame size.
>

> This is the only thing I can think of on my end that would cause parsing
> issues.
>
> Thanks.
>
> Michael
>
>
>>
>>> Thanks.
>>>
>>> Michael
>>>
>>>


> Obviously, all of the hot spot methods are auto-generated (Java).  There
> might be some hand edits I could make to that code, but if I ever
> re-generate, I'd lose that work.  Are there any tricks or changes that
> could be made to improve the parse time of the messages?
>
> Thanks.
>
> Michael
>
> [1] https://developers.google.com/protocol-buffers/docs/techniques

Re: [protobuf] Improve message parsing speed

2013-02-21 Thread Mike Grove
On Thu, Feb 21, 2013 at 9:02 AM, Feng Xiao  wrote:

>
>
> On Thu, Feb 21, 2013 at 8:37 PM, Mike Grove  wrote:
>
>>
>>
>>
>> On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao  wrote:
>>
>>>
>>>
>>> On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove wrote:
>>>
 I am using protobuf for the wire format of a protocol I'm working on as
 a replacement for JSON.  The original protobuf messages were not much more
 than JSON expressed as protobuf; my protobuf message just contained the same
 fields with the same format as the JSON structure.  This worked fine, but the
 payloads tended to be the same size as or larger than their JSON equivalents.
 I tried using the union types technique, specifically with extensions as
 outlined in the docs [1], and this worked very well with respect to
 compression; the resulting messages were much smaller than with the previous
 approach.

 However, the cost of parsing the smaller messages far outweighs the
 advantage of less I/O.

>>>
>>
>>>  You mean parsing protobufs performs worse than parsing JSON?
>>>
>>
>> For the nested structure based on extensions, as described in the
>> techniques section of the protobuf docs, throughput is about the same.  I
>> assume that means parsing is slower, since I'm sending fewer bytes over
>> the wire.  My original attempt at a protobuf-based format was the fastest
>> option, but it tended to send the most bytes over the wire, often more
>> than the raw data I was sending.
>>
>>
>>>
>>>
 When I run a simple profiling example, the top 10-15 hot spots are all in
 message parsing.  The ten most expensive methods are as follows:

 MessageType1$Builder.mergeFrom
 MessageType2$Builder.mergeFrom
 MessageType1.getDescriptor()
 MessageType1$Builder.getDescriptorForType
 MessageType3$Builder.mergeFrom
 MessageType2.getDescriptor
 MessageType2$Builder.getDescriptorForType
 MessageType1$Builder.create
 MessageType1$Builder.buildPartial
 MessageType3.isInitialized

 The organization is pretty straightforward: MessageType3 contains a
 repeated list of MessageType2.  MessageType2 has three required fields of
 type MessageType1.  MessageType1 has a single required value, which is an
 enum.  The value of the enum defines which of the extensions, again as
 shown in [1], are present on the message.  There are a total of 6 possible
 extensions to MessageType1, each of which is a single primitive value, such
 as an int or a string.  No more than 3 of the 6 possible extensions tend to
 be used at any given time.

 The top two mergeFrom hot spots take ~32% of execution time; the test
 is the transmission of 1.85M MessageType2 objects from client to server.
 These are bundled in roughly 64k chunks, using 58 top-level MessageType3
 objects.

>>> You can try the new parser API introduced in 2.5.0rc1, i.e., use
>>> MessageType3.parseFrom()  instead of the Builder API to parse the message.
>>> Another option is to simplify the message structure. Instead of nesting
>>> many small MessageType2 in MessageType3, you can simply put the repeated
>>> extensions in MessageType3.
>>>
>>
>> This sounds good, I will try both of these options.
>>
>> Is 2.5.0rc1 fairly stable?
>>
> Yes, no big changes made since then.
>
>

2.5.0rc1 did not work for me.  For the messages in question, I changed from
using mergeFrom to using parseFrom and I get 'Protocol message tag had
invalid wire type.' errors when parsing the result.

Did the internal format of the messages change?  I am using protobuf with
Netty; there is a frame size that I must keep my protobuf payload within, and
calling toByteArray after adding each MessageType2 to the MessageType3
builder is way too expensive.  So I'm using CodedInputStream and
toByteArray of MessageType2 directly to construct the serialized form of
MessageType3.  This way I can keep track of how many bytes I've written
into the stream and can stop before exceeding the Netty frame size.
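
For reference, one way to assemble that serialized form so it still parses back
as a MessageType3: each nested MessageType2 has to be written with the repeated
field's tag and a length prefix, not just its raw toByteArray bytes, otherwise
the parser can easily end up reporting an invalid wire type.  A rough sketch,
where the field number and the class names are stand-ins for whatever the real
.proto declares, and CodedOutputStream does the writing:

import com.google.protobuf.CodedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Sketch only.  Assumes MessageType3 declares something like
// "repeated MessageType2 items = 1;"; the field number below must match
// the real .proto for the output to parse as a MessageType3.
final class Type3Framer {
  private static final int ITEMS_FIELD_NUMBER = 1;

  // Writes as many entries as fit within maxFrameBytes, in the wire format
  // MessageType3.parseFrom expects (tag, varint length, message bytes),
  // and returns how many entries were written.
  static int writeWithinFrame(List<MessageType2> items, OutputStream out,
                              int maxFrameBytes) throws IOException {
    CodedOutputStream coded = CodedOutputStream.newInstance(out);
    int written = 0;
    int bytes = 0;
    for (MessageType2 m : items) {
      int size = CodedOutputStream.computeMessageSize(ITEMS_FIELD_NUMBER, m);
      if (bytes + size > maxFrameBytes) {
        break;  // the next entry would exceed the Netty frame limit
      }
      coded.writeMessage(ITEMS_FIELD_NUMBER, m);
      bytes += size;
      written++;
    }
    coded.flush();
    return written;
  }
}

The receiver can then hand the whole frame to MessageType3.parseFrom (with an
ExtensionRegistry if the extensions need to be decoded) and get the repeated
field populated.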

This is the only thing I can think of on my end that would cause parsing
issues.

Thanks.

Michael


>
>> Thanks.
>>
>> Michael
>>
>>
>>>
>>>
 Obviously, all of the hot spot methods are auto-generated (Java).  There
 might be some hand edits I could make to that code, but if I ever
 re-generate, I'd lose that work.  Are there any tricks or changes that
 could be made to improve the parse time of the messages?

 Thanks.

 Michael

 [1] https://developers.google.com/protocol-buffers/docs/techniques


Re: [protobuf] Improve message parsing speed

2013-02-21 Thread Feng Xiao
On Thu, Feb 21, 2013 at 8:37 PM, Mike Grove  wrote:

>
>
>
> On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao  wrote:
>
>>
>>
>> On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove wrote:
>>
>>> I am using protobuf for the wire format of a protocol I'm working on as
>>> a replacement for JSON.  The original protobuf messages were not much more
>>> than JSON expressed as protobuf; my protobuf message just contained the same
>>> fields with the same format as the JSON structure.  This worked fine, but the
>>> payloads tended to be the same size as or larger than their JSON equivalents.
>>> I tried using the union types technique, specifically with extensions as
>>> outlined in the docs [1], and this worked very well with respect to
>>> compression; the resulting messages were much smaller than with the previous
>>> approach.
>>>
>>> However, the cost of parsing the smaller messages far outweighs the
>>> advantage of less I/O.
>>>
>>
>
>>  You mean parsing protobufs performs worse than parsing JSON?
>>
>
> For the nested structure based on extensions, as described in the techniques
> section of the protobuf docs, throughput is about the same.  I assume that
> means parsing is slower, since I'm sending fewer bytes over the wire.  My
> original attempt at a protobuf-based format was the fastest option, but it
> tended to send the most bytes over the wire, often more than the raw
> data I was sending.
>
>
>>
>>
>>> When I run a simple profiling example, the top 10-15 hot spots are all in
>>> message parsing.  The ten most expensive methods are as follows:
>>>
>>> MessageType1$Builder.mergeFrom
>>> MessageType2$Builder.mergeFrom
>>> MessageType1.getDescriptor()
>>> MessageType1$Builder.getDescriptorForType
>>> MessageType3$Builder.mergeFrom
>>> MessageType2.getDescriptor
>>> MessageType2$Builder.getDescriptorForType
>>> MessageType1$Builder.create
>>> MessageType1$Builder.buildPartial
>>> MessageType3.isInitialized
>>>
>>> The organization is pretty straightforward: MessageType3 contains a
>>> repeated list of MessageType2.  MessageType2 has three required fields of
>>> type MessageType1.  MessageType1 has a single required value, which is an
>>> enum.  The value of the enum defines which of the extensions, again as
>>> shown in [1], are present on the message.  There are a total of 6 possible
>>> extensions to MessageType1, each of which is a single primitive value, such
>>> as an int or a string.  No more than 3 of the 6 possible extensions tend to
>>> be used at any given time.
>>>
>>> The top two mergeFrom hot spots take ~32% of execution time; the test is
>>> the transmission of 1.85M MessageType2 objects from client to server.
>>> These are bundled in roughly 64k chunks, using 58 top-level MessageType3
>>> objects.
>>>
>> You can try the new parser API introduced in 2.5.0rc1, i.e., use
>> MessageType3.parseFrom()  instead of the Builder API to parse the message.
>> Another option is to simplify the message structure. Instead of nesting
>> many small MessageType2 in MessageType3, you can simply put the repeated
>> extensions in MessageType3.
>>
>
> This sounds good, I will try both of these options.
>
> Is 2.5.0rc1 fairly stable?
>
Yes, no big changes made since then.


>
> Thanks.
>
> Michael
>
>
>>
>>
>>> Obviously, all of the hot spot methods are auto-generated (Java).  There
>>> might be some hand edits I could make to that code, but if I ever
>>> re-generate, I'd lose that work.  Are there any tricks or changes that
>>> could be made to improve the parse time of the messages?
>>>
>>> Thanks.
>>>
>>> Michael
>>>
>>> [1] https://developers.google.com/protocol-buffers/docs/techniques
>>>
>>>
>>>
>>>
>>
>>
>





Re: [protobuf] Improve message parsing speed

2013-02-21 Thread Mike Grove
On Thu, Feb 21, 2013 at 12:25 AM, Feng Xiao  wrote:

>
>
> On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove wrote:
>
>> I am using protobuf for the wire format of a protocol I'm working on as a
>> replacement for JSON.  The original protobuf messages were not much more
>> than JSON expressed as protobuf; my protobuf message just contained the same
>> fields with the same format as the JSON structure.  This worked fine, but the
>> payloads tended to be the same size as or larger than their JSON equivalents.
>> I tried using the union types technique, specifically with extensions as
>> outlined in the docs [1], and this worked very well with respect to
>> compression; the resulting messages were much smaller than with the previous
>> approach.
>>
>> However, the cost of parsing the smaller messages far outweighs the
>> advantage of less I/O.
>>
>

> You mean parsing protobufs performs worse than parsing JSON?
>

For the nested structure based on extensions, as described in the techniques
section of the protobuf docs, throughput is about the same.  I assume that
means parsing is slower, since I'm sending fewer bytes over the wire.  My
original attempt at a protobuf-based format was the fastest option, but it
tended to send the most bytes over the wire, often more than the raw
data I was sending.


>
>
>> When I run a simple profiling example, the top 10-15 hot spots are all in
>> message parsing.  The ten most expensive methods are as follows:
>>
>> MessageType1$Builder.mergeFrom
>> MessageType2$Builder.mergeFrom
>> MessageType1.getDescriptor()
>> MessageType1$Builder.getDescriptorForType
>> MessageType3$Builder.mergeFrom
>> MessageType2.getDescriptor
>> MessageType2$Builder.getDescriptorForType
>> MessageType1$Builder.create
>> MessageType1$Builder.buildPartial
>> MessageType3.isInitialized
>>
>> The organization is pretty straightforward: MessageType3 contains a
>> repeated list of MessageType2.  MessageType2 has three required fields of
>> type MessageType1.  MessageType1 has a single required value, which is an
>> enum.  The value of the enum defines which of the extensions, again as
>> shown in [1], are present on the message.  There are a total of 6 possible
>> extensions to MessageType1, each of which is a single primitive value, such
>> as an int or a string.  No more than 3 of the 6 possible extensions tend to
>> be used at any given time.
>>
>> The top two mergeFrom hot spots take ~32% of execution time; the test is
>> the transmission of 1.85M MessageType2 objects from client to server.
>> These are bundled in roughly 64k chunks, using 58 top-level MessageType3
>> objects.
>>
> You can try the new parser API introduced in 2.5.0rc1, i.e., use
> MessageType3.parseFrom()  instead of the Builder API to parse the message.
> Another option is to simplify the message structure. Instead of nesting
> many small MessageType2 in MessageType3, you can simply put the repeated
> extensions in MessageType3.
>

This sounds good; I will try both of these options.

Is 2.5.0rc1 fairly stable?

Thanks.

Michael


>
>
>> Obviously, all of the hot spot methods are auto-generated (Java).  There
>> might be some hand edits I could make to that code, but if I ever
>> re-generate, I'd lose that work.  Are there any tricks or changes that
>> could be made to improve the parse time of the messages?
>>
>> Thanks.
>>
>> Michael
>>
>> [1] https://developers.google.com/protocol-buffers/docs/techniques
>>
>>
>>
>>
>
>





Re: [protobuf] Improve message parsing speed

2013-02-20 Thread Feng Xiao
On Thu, Feb 21, 2013 at 12:11 AM, Michael Grove wrote:

> I am using protobuf for the wire format of a protocol I'm working on as a
> replacement for JSON.  The original protobuf messages were not much more
> than JSON expressed as protobuf; my protobuf message just contained the same
> fields with the same format as the JSON structure.  This worked fine, but the
> payloads tended to be the same size as or larger than their JSON equivalents.
> I tried using the union types technique, specifically with extensions as
> outlined in the docs [1], and this worked very well with respect to
> compression; the resulting messages were much smaller than with the previous
> approach.
>
> However, the cost of parsing the smaller messages far outweighs the
> advantage of less I/O.
>
You mean parsing protobufs performs worse than parsing JSON?


> When I run a simple profiling example, the top 10-15 hot spots are all in
> message parsing.  The ten most expensive methods are as follows:
>
> MessageType1$Builder.mergeFrom
> MessageType2$Builder.mergeFrom
> MessageType1.getDescriptor()
> MessageType1$Builder.getDescriptorForType
> MessageType3$Builder.mergeFrom
> MessageType2.getDescriptor
> MessageType2$Builder.getDescriptorForType
> MessageType1$Builder.create
> MessageType1$Builder.buildPartial
> MessageType3.isInitialized
>
> The organization is pretty straightforward: MessageType3 contains a
> repeated list of MessageType2.  MessageType2 has three required fields of
> type MessageType1.  MessageType1 has a single required value, which is an
> enum.  The value of the enum defines which of the extensions, again as
> shown in [1], are present on the message.  There are a total of 6 possible
> extensions to MessageType1, each of which is a single primitive value, such
> as an int or a string.  No more than 3 of the 6 possible extensions tend to
> be used at any given time.
>
> The top two mergeFrom hot spots take ~32% of execution time; the test is
> the transmission of 1.85M MessageType2 objects from client to server.
> These are bundled in roughly 64k chunks, using 58 top-level MessageType3
> objects.
>
You can try the new parser API introduced in 2.5.0rc1, i.e., use
MessageType3.parseFrom() instead of the Builder API to parse the message.
Another option is to simplify the message structure: instead of nesting
many small MessageType2 messages in MessageType3, you can simply put the
repeated extensions in MessageType3.
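
A small illustration of the difference follows, with MessageType3 standing in
for the poster's generated class; if the nested extensions need to be decoded,
an ExtensionRegistry can be passed as an extra argument to either call.

import com.google.protobuf.InvalidProtocolBufferException;

// Sketch only; MessageType3 is assumed to be the generated class in question.
final class ParseExamples {
  // Older pattern: merge the bytes into a builder, then build the message.
  static MessageType3 viaBuilder(byte[] data) throws InvalidProtocolBufferException {
    return MessageType3.newBuilder().mergeFrom(data).build();
  }

  // 2.5.0 pattern: the generated parseFrom delegates to the new Parser machinery.
  static MessageType3 viaParser(byte[] data) throws InvalidProtocolBufferException {
    return MessageType3.parseFrom(data);
  }
}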


> Obviously, all of the hot spot methods are auto-generated (Java).  There
> might be some hand edits I could make to that code, but if I ever
> re-generate, I'd lose that work.  Are there any tricks or changes that
> could be made to improve the parse time of the messages?
>
> Thanks.
>
> Michael
>
> [1] https://developers.google.com/protocol-buffers/docs/techniques
>
>
>
>





[protobuf] Improve message parsing speed

2013-02-20 Thread Michael Grove
I am using protobuf for the wire format of a protocol I'm working on as a
replacement for JSON.  The original protobuf messages were not much more
than JSON expressed as protobuf; my protobuf message just contained the same
fields with the same format as the JSON structure.  This worked fine, but the
payloads tended to be the same size as or larger than their JSON equivalents.
I tried using the union types technique, specifically with extensions as
outlined in the docs [1], and this worked very well with respect to
compression; the resulting messages were much smaller than with the previous
approach.

However, the cost of parsing the smaller messages far outweighs the advantage
of less I/O.  When I run a simple profiling example, the top 10-15 hot spots
are all in message parsing.  The ten most expensive methods are as follows:

MessageType1$Builder.mergeFrom
MessageType2$Builder.mergeFrom
MessageType1.getDescriptor()
MessageType1$Builder.getDescriptorForType
MessageType3$Builder.mergeFrom
MessageType2.getDescriptor
MessageType2$Builder.getDescriptorForType
MessageType1$Builder.create
MessageType1$Builder.buildPartial
MessageType3.isInitialized

The organization is pretty straightforward: MessageType3 contains a
repeated list of MessageType2.  MessageType2 has three required fields of
type MessageType1.  MessageType1 has a single required value, which is an
enum.  The value of the enum defines which of the extensions, again as
shown in [1], are present on the message.  There are a total of 6 possible
extensions to MessageType1, each of which is a single primitive value, such
as an int or a string.  No more than 3 of the 6 possible extensions tend to
be used at any given time.
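
A rough illustration of how that enum-plus-extensions layout is typically read
back on the Java side: every name below (the outer class, the enum accessor,
and the two extension identifiers) is a made-up stand-in for whatever the real
.proto generates, and the ExtensionRegistry is what makes the parser decode the
extensions instead of leaving them as unknown fields.

import com.google.protobuf.ExtensionRegistry;
import com.google.protobuf.InvalidProtocolBufferException;

// ValueProtos stands in for the generated outer class; intValue and stringValue
// stand in for two of the six extensions on MessageType1.
final class ExtensionReading {
  static Object readValue(byte[] bytes) throws InvalidProtocolBufferException {
    ExtensionRegistry registry = ExtensionRegistry.newInstance();
    ValueProtos.registerAllExtensions(registry);  // generated helper

    MessageType1 msg = MessageType1.parseFrom(bytes, registry);
    switch (msg.getKind()) {  // the required enum saying which extension is present
      case INT:
        return msg.getExtension(ValueProtos.intValue);
      case STRING:
        return msg.getExtension(ValueProtos.stringValue);
      default:
        throw new IllegalStateException("unhandled kind: " + msg.getKind());
    }
  }
}

In this thread's setup the MessageType1 instances arrive nested inside
MessageType3, so the registry would be passed to the MessageType3 parse call
rather than to MessageType1 directly.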

The top two mergeFrom hot spots take ~32% of execution time; the test is
the transmission of 1.85M MessageType2 objects from client to server.
These are bundled in roughly 64k chunks, using 58 top-level MessageType3
objects.

Obviously, all of the hot spot methods are auto-generated (Java).  There
might be some hand edits I could make to that code, but if I ever
re-generate, I'd lose that work.  Are there any tricks or changes that could
be made to improve the parse time of the messages?

Thanks.

Michael

[1] https://developers.google.com/protocol-buffers/docs/techniques
