Am 27.02.2014 um 01:11 schrieb patr...@dropbox.com:
> Hey guys,
> I wrote two different patches which optimize python proto performance. Both
> patches are running in production at Dropbox.
I would love to see these patches being reintegrated.
Q: what is the relation to Python C bindings, if any?
> What is the best way to upstream these changes?
> Patch #1. Python message patch
> - precompute various varint tables
> - don't use proto's ByteSize function for serialization
> - simplified some code (got rid of the listener)
> - got rid of StringIO
> Internal benchmark:
> - random repeated int32s - ~18% faster
> - random repeated int64s - ~20% faster
> - random repeated strings - 27% faster
> - random repeated bytes - 27% faster
> - repeated message with each with a single random string - ~20% faster
> - predefined_varints.py is generated by generate_predefined_varints.py
> Patch #2. C++ experimental binding patch
> - fixed memory ownership / dangling pointer (see NOTE #1 for known issues)
> 1. inc ref count parent message when accessing a field,
> 2. a cleared field's is freed only when the parent is deleted
> - fixed MakeDescriptor to correctly generating simple proto (see NOTE #2)
> - fixed MergeFrom to not crash on check failure due to self merge
> - fixed both repeated and non-repeated field clearing
> - modified varint deserialization to always return PyLong (to match existing
> python implementation)
> - always mark message as mutate when extending a repeated field (even when
> extending by an empty list)
> - deleted/updated bad tests from the protobuf test suite
> Internal benchmark (relative to the first patch):
> - 30x faster for repeated varints
> - 8x faster for repeated string
> - 6x faster for repeated bytes
> - 26x speed up for repeated nested msg
> 1. In the current implementation, a new python object is created each time a
> field is accessed. To make this 100% correct, we should return the same
> python object whenever the same field is accessed; however, I don't think the
> accounting overhead is worth it. Implications due to the current
> - repeatedly clearing / mutating the same message can cause memory blow up
> - There's a subtle bug with clearing / mutating default message fields:
> This is correct. Holding a reference to a MUTATED field X, then clearing
> the parent, then mutate X. e.g.,
> child = parent.optional_nested_msg
> child.field = 123 # this mutates the field
> child.field = 321
> assert not parent.HasField('child') # passes
> This is incorrect. Holding a reference to a UNMUTATED field X, then
> clearing the parent, then mutate X.
> child = parent.optional_nested_msg
> child.field = 321 # this inadvertently causes parent to generate a
> different empty msg for optional_nested_msg.
> assert not parent.HasField('optional_nested_msg') # fail
> Luckily, these access patterns are extremely rare (at least at dropbox).
> 2. I wrote a fully functional MakeDescriptor for c++ protos when I was at
> google. Talk to the F1 team (specifically Bart Samwel / Chad Whipkey) if
> you're interested in upstreaming that to the opensource community.
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to protobuf+unsubscr...@googlegroups.com.
> To post to this group, send email to firstname.lastname@example.org.
> Visit this group at http://groups.google.com/group/protobuf.
> For more options, visit https://groups.google.com/groups/opt_out.
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
To unsubscribe from this group and stop receiving emails from it, send an email
To post to this group, send email to email@example.com.
Visit this group at http://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/groups/opt_out.