Re: Slicing support in Python

Alek Storm Fri, 05 Dec 2008 22:59:26 -0800

On Wed, Dec 3, 2008 at 5:32 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> Sorry, I think you misunderstood.  The C++ parsers generated by protoc
> (with optimize_for = SPEED) are an order of magnitude faster than the
> dynamic *C++* parser (used with optimize_for = CODE_SIZE and
> DynamicMessage).  The Python parser is considerably slower than either of
> them, but that's beside the point.  Your "decoupled" parser which produces a
> tag/value tree will be at least as slow as the existing C++ dynamic parser,
> probably slower (since it sounds like it would use some sort of dictionary
> structure rather than flat classes/structs).
>

Oh, I forgot we have two C++ parsers.  The method I described uses the
generated (SPEED) parser, so it should be a great deal quicker.  It just
outputs a tree instead of a message, leaving the smart object creation to
Python.

>> Run this backwards when serializing, and you get another advantage: you can
>> easily swap out the function that converts the tree into serialized protobuf
>> for one that outputs XML, JSON, etc.
>>
>
> You can already easily write encoders and decoders for alternative formats
> using reflection.
>

Honestly, I think using reflection for something as basic as changing the
output format is hackish and could get ugly.  Reflection should only be used
in certain circumstances, e.g., generating message objects, because it
exposes the internals.  There's a chance we could change how Protocol
Buffers works under the hood in a way that screws up an XML outputter, which
wouldn't happen if we just expose a clean interface.
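
A minimal sketch of the "clean interface" idea, under loud assumptions: the tree
shape (a list of (name, value) pairs, with a nested list standing for a
submessage) and the functions to_plain, to_json, to_xml and serialize are all
made up for illustration and are not part of Protocol Buffers.  It only shows
that an output format can be swapped without touching message internals.

```python
import json

# Hypothetical parser output: (field_name, value) pairs; a nested list is a submessage.
tree = [("name", "Alek"), ("id", 7), ("phone", [("number", "555-1234")])]


def to_plain(node):
    """Convert the pair list into dicts (repeated names would collapse; fine for a sketch)."""
    if isinstance(node, list):
        return {name: to_plain(value) for name, value in node}
    return node


def to_json(tree):
    """Encode the tree as JSON text."""
    return json.dumps(to_plain(tree), sort_keys=True)


def to_xml(tree):
    """Encode the tree as XML text (no escaping; illustration only)."""
    parts = []
    for name, value in tree:
        body = to_xml(value) if isinstance(value, list) else str(value)
        parts.append("<%s>%s</%s>" % (name, body, name))
    return "".join(parts)


def serialize(tree, encoder):
    """The only seam a caller needs: pass whichever encoder is wanted."""
    return encoder(tree)


json_text = serialize(tree, to_json)
xml_text = serialize(tree, to_xml)
```

Neither encoder looks at descriptors or generated classes, so a change to how
messages are stored would not reach them as long as the tree shape holds.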

>> Let's include it - it gives us a more complete list interface, there's no
>> downside, and the users can decide whether they want to use it.  We can't
>> predict all possible use cases.
>>
>
> Ah, yes, the old "Why not?" argument.  :)  Actually, I far prefer the
> opposite argument:  If you aren't sure if someone will want a feature, don't
> include it.  There is always a down side to including a feature.  Even if
> people choose not to use it, it increases code size, maintenance burden,
> memory usage, and interface complexity.  Worse yet, if people do use it,
> then we're permanently stuck with it, whether we like it or not.  We can't
> change it later, even if we decide it's wrong.  For example, we may decide
> later -- based on an actual use case, perhaps -- that it would really have
> been better if remove() compared elements by content rather than by
> identity, so that you could remove a message from a repeated field by
> constructing an identical message and then calling remove().  But we
> wouldn't be able to change it.  We'd have to instead add a different method
> like removeByValue(), which would be ugly and add even more complexity.
>
> Protocol Buffers got where they are by stubbornly refusing the vast
> majority of feature suggestions.  :)
>

Ha, I thought you might say that.  It's a good philosophy, and I completely
understand where you're coming from.  So I concede that point, and it all
boils down to "complete interface" vs. "compact interface".

But just for the record, I'm pretty sure Python's list remove() method
compares by value, and that lists have no method that compares by identity.  So
there would be no reason to include a compare-by-identity method in protobuf
repeated fields.
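
A quick check of that claim about the built-in list.  The Point class below is a
hypothetical stand-in for a message type that defines value equality; it is not
a protobuf class.

```python
class Point:
    """Hypothetical message-like class with value equality."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


points = [Point(1), Point(2)]
twin = Point(2)               # equal to points[1], but a different object
is_same_object = twin is points[1]

points.remove(twin)           # succeeds: list.remove() matches with ==
remaining = [p.x for p in points]
```

Here remove() drops the element that compares equal to twin even though twin
itself was never in the list.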

> That said, you do have a good point that the interface should be similar to
> standard Python lists if possible.  But given the other problems that
> prevent this, it seems like a moot point.
>

Okay, you place more value on "compact interface".  So are we keeping
remove() for scalar values?  I think their interfaces should be consistent,
but I don't think you think that's as important.

On Wed, Dec 3, 2008 at 10:25 AM, Petar Petrov <[EMAIL PROTECTED]> wrote:

> It's not that simple. We would also like to improve performance at least in
> MergeFrom/CopyFrom/ParseASCII/IsInitialized.
>

Okay.  So let's say we have a pure-C++ parser with a Python wrapper.  This
brings us back to getting slicing to work in C++ with no garbage collector.
Kenton, could you elaborate on what you meant earlier by "ownership
problems" specific to the C++ version?  I can't really see anything that
would affect PB repeated fields that isn't taken care of by handing the user
control over allocation and deallocation of the field elements.

> Currently each composite field has a reference to its parent. This makes it
> impossible to add the same composite to two different repeated composite
> fields. The .add() method guarantees that this never happens.
>

Is there anything wrong with having a list of parents?  I'm guessing I'm
being naive - would speed be affected too much by that?
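
A rough sketch of the list-of-parents variant, to make the cost concrete.  It
assumes the parent reference exists so that a child can tell its container it
has changed; RepeatedField, Composite and the dirty flag are hypothetical names,
not the real protobuf classes.

```python
class RepeatedField:
    """Hypothetical repeated composite field."""

    def __init__(self):
        self.items = []
        self.dirty = False    # stand-in for "cached state needs recomputing"

    def append(self, item):
        self.items.append(item)
        item.parents.append(self)   # the element now remembers every container
        self.dirty = True


class Composite:
    """Hypothetical composite element that may live in several fields."""

    def __init__(self):
        self.parents = []
        self.value = None

    def set_value(self, value):
        self.value = value
        # Every mutation walks all parents instead of following one reference.
        for parent in self.parents:
            parent.dirty = True


a, b = RepeatedField(), RepeatedField()
shared = Composite()
a.append(shared)
b.append(shared)
a.dirty = b.dirty = False     # pretend both fields were just serialized

shared.set_value(42)          # one change must reach both containers
```

The extra work per mutation grows with the number of parents, and removing an
element from a field would also have to drop that field from the element's
parents list.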

> I think protobuf's repeated composite fields aren't and shouldn't be
> equivalent to python lists.
>

Okay, that's cleared up now.  Thanks.

Cheers,
Alek Storm
