Re: [protobuf] python proto optimizations

2014-02-28 Thread Michael Haberler

Am 27.02.2014 um 01:11 schrieb patr...@dropbox.com:

 Hey guys,
 
 
 I wrote two different patches which optimize python proto performance.  Both 
 patches are running in production at Dropbox.  

I would love to see these patches being reintegrated.

Q: what is the relation to Python C bindings, if any?

regards, Michael

 What is the best way to upstream these changes?
 
 
 Patrick
 
 
 
 Patch #1. Python message patch 
 (https://www.dropbox.com/s/q0y44ypti0by779/protobuf-2.5.0.patch1):
 
 Changes:
 - precompute various varint tables
 - don't use proto's ByteSize function for serialization
 - simplified some code (got rid of the listener)
 - got rid of StringIO
 
 Internal benchmark:
 - random repeated int32s - ~18% faster
 - random repeated int64s - ~20% faster
 - random repeated strings - 27% faster
 - random repeated bytes - 27% faster
 - repeated message with each with a single random string - ~20% faster
 
 NOTE:
 - predefined_varints.py is generated by generate_predefined_varints.py
 
 
 
 Patch #2. C++ experimental binding patch 
 (https://www.dropbox.com/s/5nr0v76nfraaxif/protobuf-2.5.0.patch2):
 
 Changes:
 - fixed memory ownership / dangling pointer (see NOTE #1 for known issues)
1. inc ref count parent message when accessing a field, 
2. a cleared field's is freed only when the parent is deleted
 - fixed MakeDescriptor to correctly generating simple proto (see NOTE #2)
 - fixed MergeFrom to not crash on check failure due to self merge 
 - fixed both repeated and non-repeated field clearing 
 - modified varint deserialization to always return PyLong (to match existing 
 python implementation) 
 - always mark message as mutate when extending a repeated field (even when 
 extending by an empty list) 
 - deleted/updated bad tests from the protobuf test suite 
 
 Internal benchmark (relative to the first patch):
 - 30x faster for repeated varints
 - 8x faster for repeated string
 - 6x faster for repeated bytes 
 - 26x speed up for repeated nested msg
 
 NOTE: 
 1. In the current implementation, a new python object is created each time a 
 field is accessed. To make this 100% correct, we should return the same 
 python object whenever the same field is accessed; however, I don't think the 
 accounting overhead is worth it.  Implications due to the current 
 implementation:
- repeatedly clearing / mutating the same message can cause memory blow up 
- There's a subtle bug with clearing / mutating default message fields: 
 
This is correct. Holding a reference to a MUTATED field X, then clearing 
 the parent, then mutate X. e.g., 
child = parent.optional_nested_msg 
child.field = 123 # this mutates the field 
parent.Clear() 
child.field = 321 
assert not parent.HasField('child') # passes 
 
This is incorrect. Holding a reference to a UNMUTATED field X, then 
 clearing the parent, then mutate X. 
child = parent.optional_nested_msg 
parent.Clear() 
child.field = 321 # this inadvertently causes parent to generate a 
 different empty msg for optional_nested_msg. 
assert not parent.HasField('optional_nested_msg') # fail 
 
Luckily, these access patterns are extremely rare (at least at dropbox).
 
 2. I wrote a fully functional MakeDescriptor for c++ protos when I was at 
 google.  Talk to the F1 team (specifically Bart Samwel / Chad Whipkey) if 
 you're interested in upstreaming that to the opensource community.
 
 -- 
 You received this message because you are subscribed to the Google Groups 
 Protocol Buffers group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to protobuf+unsubscr...@googlegroups.com.
 To post to this group, send email to protobuf@googlegroups.com.
 Visit this group at http://groups.google.com/group/protobuf.
 For more options, visit https://groups.google.com/groups/opt_out.

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to protobuf+unsubscr...@googlegroups.com.
To post to this group, send email to protobuf@googlegroups.com.
Visit this group at http://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/groups/opt_out.


Re: [protobuf] python proto optimizations

2014-02-28 Thread Feng Xiao
For patch #1, you can submit it to the issue tracker. We don't have time to
look at it immediately but will get to it in the next release.

For patch #2, we don't accept patches to the experimental C++ binding
support any more because it will be replaced by a new design/implementation
in the next release. Sorry for that but AFAIK the new design/implementation
is significantly better so it's worthwhile to try if you are currently
using the old one (which has hard-to-address memory ownership problems).

On Wed, Feb 26, 2014 at 4:11 PM, patr...@dropbox.com wrote:

 Hey guys,


 I wrote two different patches which optimize python proto performance.
  Both patches are running in production at Dropbox.  What is the best way
 to upstream these changes?


 Patrick

 

 Patch #1. Python message patch (
 https://www.dropbox.com/s/q0y44ypti0by779/protobuf-2.5.0.patch1):

 Changes:
 - precompute various varint tables
 - don't use proto's ByteSize function for serialization
 - simplified some code (got rid of the listener)
 - got rid of StringIO

 Internal benchmark:
 - random repeated int32s - ~18% faster
 - random repeated int64s - ~20% faster
 - random repeated strings - 27% faster
 - random repeated bytes - 27% faster
 - repeated message with each with a single random string - ~20% faster

 NOTE:
 - predefined_varints.py is generated by generate_predefined_varints.py

 

 Patch #2. C++ experimental binding patch (
 https://www.dropbox.com/s/5nr0v76nfraaxif/protobuf-2.5.0.patch2):

 Changes:
 - fixed memory ownership / dangling pointer (see NOTE #1 for known issues)
 1. inc ref count parent message when accessing a field,
 2. a cleared field's is freed only when the parent is deleted
 - fixed MakeDescriptor to correctly generating simple proto (see NOTE #2)
 - fixed MergeFrom to not crash on check failure due to self merge
 - fixed both repeated and non-repeated field clearing
 - modified varint deserialization to always return PyLong (to match
 existing python implementation)
 - always mark message as mutate when extending a repeated field (even when
 extending by an empty list)
 - deleted/updated bad tests from the protobuf test suite

 Internal benchmark (relative to the first patch):
 - 30x faster for repeated varints
 - 8x faster for repeated string
 - 6x faster for repeated bytes
 - 26x speed up for repeated nested msg

 NOTE:
 1. In the current implementation, a new python object is created each time
 a field is accessed. To make this 100% correct, we should return the same
 python object whenever the same field is accessed; however, I don't think
 the accounting overhead is worth it.  Implications due to the current
 implementation:
 - repeatedly clearing / mutating the same message can cause memory
 blow up
 - There's a subtle bug with clearing / mutating default message
 fields:

 This is correct. Holding a reference to a MUTATED field X, then
 clearing the parent, then mutate X. e.g.,
 child = parent.optional_nested_msg
 child.field = 123 # this mutates the field
 parent.Clear()
 child.field = 321
 assert not parent.HasField('child') # passes

 This is incorrect. Holding a reference to a UNMUTATED field X, then
 clearing the parent, then mutate X.
 child = parent.optional_nested_msg
 parent.Clear()
 child.field = 321 # this inadvertently causes parent to generate a
 different empty msg for optional_nested_msg.
 assert not parent.HasField('optional_nested_msg') # fail

 Luckily, these access patterns are extremely rare (at least at
 dropbox).

 2. I wrote a fully functional MakeDescriptor for c++ protos when I was at
 google.  Talk to the F1 team (specifically Bart Samwel / Chad Whipkey) if
 you're interested in upstreaming that to the opensource community.

 --
 You received this message because you are subscribed to the Google Groups
 Protocol Buffers group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to protobuf+unsubscr...@googlegroups.com.
 To post to this group, send email to protobuf@googlegroups.com.
 Visit this group at http://groups.google.com/group/protobuf.
 For more options, visit https://groups.google.com/groups/opt_out.


-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to protobuf+unsubscr...@googlegroups.com.
To post to this group, send email to protobuf@googlegroups.com.
Visit this group at http://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/groups/opt_out.