Re: Slicing support in Python

2008-12-19 Thread Alek Storm
On Fri, Dec 19, 2008 at 1:17 PM, Petar Petrov wrote:

> Looks like we are still keeping append, insert, remove and __setitem__?
>

Forgot those.  Okay, fixed in the latest patch.

Cheers,
Alek Storm

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Slicing support in Python

2008-12-19 Thread Petar Petrov
On Thu, Dec 18, 2008 at 4:06 AM, Alek Storm  wrote:

> On Sat, Dec 13, 2008 at 5:09 PM, Petar Petrov wrote:
>
>> On Mon, Dec 8, 2008 at 5:36 PM, Alek Storm  wrote:
>>
>>> Okay, then we just need to cache the size only during serialization.  The
>>> children's sizes are calculated and stored, then added to the parent's
>>> size.  Write the parent size, then write the parent, then the child size,
>>> then the child, on down the tree.  Then it's O(n) (same as we have
>>> currently) and no ownership problems, because we can drop the weak reference
>>> from child to parent.  Would that work?
>>
>>
>> It may work, but ByteSize is a part of the public interface of the
>> message, so making it slower may not be a good idea.
>> However the parent reference will still be needed.
>>
>> file.py:
>> m3 = M3()
>> m3.m2.m1.i = 3
>> m3.HasField('m2') # should be True
>>
>> How does m3 know if m2 was set? This information is right now provided by
>> the setter of 'i' in m1 (by calling TransitionToNonEmpty on the parent,
>> which calls TransitionToNonEmpty on its parent and so on).
>>
>
> Oops, I wasn't clear.  Of course HasField should work for non-repeated
> fields; I only meant to get rid of the weak reference when the message's
> parent is a repeated composite field, because HasField isn't used for those,
> so we don't need it if we cache the size only during serialization.  So we
> get a bunch of benefits in exchange for making a rarely used part of the
> interface slower, and only when used outside of the internal serialization
> functions.  What do you think?
>

I think it's better to avoid adding features that weren't requested until we
have a solution to the real problem with the API: speed.  A change like this
will make that problem even worse, and extending the interface might make
performance improvements harder to do.  While the API is slow, features just
fade out.


>
>
>> So the parent references are still needed. Let's keep the slice assignment
>> of repeated scalar fields and just remove the slice assignment of repeated
>> composite fields (I still don't find it useful). E.g. we can keep
>> __getslice__, __delitem__ and __delslice__ for repeated composite fields.
>>
>
> Okay, I'll submit a patch with just those methods.  We can definitely agree
> on that :).  The above discussion is separate.


Looks like we are still keeping append, insert, remove and __setitem__?
Another concern here is that in an eventual C++ implementation of the API,
these will be a source of object ownership problems in cases like:
a = M1()
b = M1()
sub_message = a.add()
b.append(sub_message)
# who owns sub_message?

Adding them is a good idea only once we have solved the performance
problem (not before).
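
The ownership question can be illustrated in pure Python.  This is a minimal
sketch, not the protobuf implementation: Message, RepeatedComposite and the
_parent weak reference are hypothetical names, and append() is written the
naive way on purpose to show what goes wrong when an element has a single
parent link.

```python
import weakref


class Message:
    """Hypothetical stand-in for a generated message with one parent link."""

    def __init__(self):
        self._parent = None  # weak reference to the owning message, if any

    def set_parent(self, parent):
        self._parent = weakref.ref(parent)

    def parent(self):
        return self._parent() if self._parent is not None else None


class RepeatedComposite:
    """Hypothetical repeated composite field owned by `owner`."""

    def __init__(self, owner):
        self._owner = owner
        self._items = []

    def add(self):
        # add() creates the element itself, so it always has exactly one parent.
        item = Message()
        item.set_parent(self._owner)
        self._items.append(item)
        return item

    def append(self, item):
        # Naive append: silently re-points the element's single parent link.
        item.set_parent(self._owner)
        self._items.append(item)


a, b = Message(), Message()
field_a, field_b = RepeatedComposite(a), RepeatedComposite(b)
sub_message = field_a.add()
field_b.append(sub_message)

# sub_message now sits in both lists but can only notify one parent.
print(sub_message.parent() is b)      # True
print(sub_message in field_a._items)  # True
```

After the append, a change to sub_message could only be reported to b, so any
state a caches about its children (such as a size) would go stale.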


>
> By the way, I think something went wrong with your email - apparently it
> was sent to the group, but didn't show up there, so I just now found it in
> my inbox.  Weird.
>
> Cheers,
> Alek Storm
>




Re: Slicing support in Python

2008-12-18 Thread Alek Storm
On Sat, Dec 13, 2008 at 5:09 PM, Petar Petrov wrote:

> On Mon, Dec 8, 2008 at 5:36 PM, Alek Storm  wrote:
>
>> Okay, then we just need to cache the size only during serialization.  The
>> children's sizes are calculated and stored, then added to the parent's
>> size.  Write the parent size, then write the parent, then the child size,
>> then the child, on down the tree.  Then it's O(n) (same as we have
>> currently) and no ownership problems, because we can drop the weak reference
>> from child to parent.  Would that work?
>
>
> It may work, but ByteSize is a part of the public interface of the message,
> so making it slower may not be a good idea.
> However the parent reference will still be needed.
>
> file.py:
> m3 = M3()
> m3.m2.m1.i = 3
> m3.HasField('m2') # should be True
>
> How does m3 know if m2 was set? This information is right now provided by
> the setter of 'i' in m1 (by calling TransitionToNonEmpty on the parent,
> which calls TransitionToNonEmpty on its parent and so on).
>

Oops, I wasn't clear.  Of course HasField should work for non-repeated
fields; I only meant to get rid of the weak reference when the message's
parent is a repeated composite field, because HasField isn't used for those,
so we don't need it if we cache the size only during serialization.  So we
get a bunch of benefits in exchange for making a rarely used part of the
interface slower, and only when used outside of the internal serialization
functions.  What do you think?


> So the parent references are still needed. Let's keep the slice assignment
> of repeated scalar fields and just remove the slice assignment of repeated
> composite fields (I still don't find it useful). E.g. we can keep
> __getslice__, __delitem__ and __delslice__ for repeated composite fields.
>

Okay, I'll submit a patch with just those methods.  We can definitely agree
on that :).  The above discussion is separate.

By the way, I think something went wrong with your email - apparently it was
sent to the group, but didn't show up there, so I just now found it in my
inbox.  Weird.

Cheers,
Alek Storm




Re: Slicing support in Python

2008-12-13 Thread Petar Petrov
On Mon, Dec 8, 2008 at 5:36 PM, Alek Storm  wrote:

> On Mon, Dec 8, 2008 at 1:16 PM, Kenton Varda  wrote:
>
>> On Sat, Dec 6, 2008 at 1:03 AM, Alek Storm  wrote:
>>
>>> Is it really that useful to have ByteSize() cached for repeated fields?
>>> If it's not, we get everything I mentioned above for free.  I'm genuinely
>>> not sure - it only comes up when serializing the message in wire_format.py.
>>> What do you think?
>>>
>>
>> Yes, it's just as necessary as it is with optional fields.  The main
>> problem is that the size of a message must be written before the message
>> contents itself.  If, while serializing, you call ByteSize() to get this
>> size every time you write a message, then you'll end up computing the size
>> of deeply-nested messages many times (once for each outer message within
>> which they're nested).  Caching avoids that problem.
>>
>
> Okay, then we just need to cache the size only during serialization.  The
> children's sizes are calculated and stored, then added to the parent's
> size.  Write the parent size, then write the parent, then the child size,
> then the child, on down the tree.  Then it's O(n) (same as we have
> currently) and no ownership problems, because we can drop the weak reference
> from child to parent.  Would that work?


It may work, but ByteSize is a part of the public interface of the message,
so making it slower may not be a good idea.
However the parent reference will still be needed.

Example:
file.proto:

message M1 {
  optional int32 i = 1;
}

message M2 {
  optional M1 m1 = 1;
}

message M3 {
  optional M2 m2 = 1;
}

file.py:
m3 = M3()
m3.m2.m1.i = 3
m3.HasField('m2') # should be True

How does m3 know if m2 was set? This information is right now provided by
the setter of 'i' in m1 (by calling TransitionToNonEmpty on the parent,
which calls TransitionToNonEmpty on its parent and so on).
As opposed to the C++ API (where a call to mutable_m2 will mark m2 as set),
in the Python API mutable and non-mutable calls are not so easy to
distinguish.
So in this case:
m3 = M3()
m3.m2.m1.HasField('i') # Should be False
m3.HasField('m2') # Should be False, even though we used m3.m2.*

So the parent references are still needed. Let's keep the slice assignment
of repeated scalar fields and just remove the slice assignment of repeated
composite fields (I still don't find it useful). E.g. we can keep
__getslice__, __delitem__ and __delslice__ for repeated composite fields.
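
The upward notification described above can be sketched as follows.  This is
only an illustration under stated assumptions: Node, child(), set_scalar() and
has_field() are hypothetical names, not the generated protobuf classes, and
only the idea of a TransitionToNonEmpty call walking up the parent links is
taken from the discussion.

```python
import weakref


class Node:
    """Hypothetical message node; not the real protobuf classes."""

    def __init__(self, parent=None, name_in_parent=None):
        self._parent = weakref.ref(parent) if parent is not None else None
        self._name_in_parent = name_in_parent
        self._children = {}  # lazily created sub-messages
        self._values = {}    # scalar values that were assigned
        self._set = set()    # names of fields that count as set

    def child(self, name):
        # Reading m3.m2 creates the sub-message but does NOT mark it as set.
        if name not in self._children:
            self._children[name] = Node(parent=self, name_in_parent=name)
        return self._children[name]

    def set_scalar(self, name, value):
        self._values[name] = value
        self._set.add(name)
        self._transition_to_non_empty()

    def _transition_to_non_empty(self):
        # Walk up the parent links, marking this node as set in each ancestor.
        parent = self._parent() if self._parent is not None else None
        if parent is not None and self._name_in_parent not in parent._set:
            parent._set.add(self._name_in_parent)
            parent._transition_to_non_empty()

    def has_field(self, name):
        return name in self._set


m3 = Node()
m3.child("m2").child("m1")                     # read-only access
print(m3.has_field("m2"))                      # False
m3.child("m2").child("m1").set_scalar("i", 3)  # first real assignment
print(m3.has_field("m2"))                      # True
```

Without the child-to-parent reference, set_scalar() would have no way to reach
m3, which is the reason the parent link cannot simply be dropped for singular
sub-messages.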


>
> Cheers,
> Alek Storm
>
>
> >
>




Re: Slicing support in Python

2008-12-08 Thread Alek Storm
On Mon, Dec 8, 2008 at 1:16 PM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> On Sat, Dec 6, 2008 at 1:03 AM, Alek Storm <[EMAIL PROTECTED]> wrote:
>
>> Is it really that useful to have ByteSize() cached for repeated fields?
>> If it's not, we get everything I mentioned above for free.  I'm genuinely
>> not sure - it only comes up when serializing the message in wire_format.py.
>> What do you think?
>>
>
> Yes, it's just as necessary as it is with optional fields.  The main
> problem is that the size of a message must be written before the message
> contents itself.  If, while serializing, you call ByteSize() to get this
> size every time you write a message, then you'll end up computing the size
> of deeply-nested messages many times (once for each outer message within
> which they're nested).  Caching avoids that problem.
>

Okay, then we just need to cache the size only during serialization.  The
children's sizes are calculated and stored, then added to the parent's
size.  Write the parent size, then write the parent, then the child size,
then the child, on down the tree.  Then it's O(n) (same as we have
currently) and no ownership problems, because we can drop the weak reference
from child to parent.  Would that work?
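
A minimal sketch of that idea, assuming a toy wire format (a one-byte length
prefix per nested message, no tags, no varints) rather than the real protobuf
encoding.  Msg, compute_sizes, write and serialize are hypothetical names used
only for illustration.

```python
class Msg:
    """Hypothetical message: a payload of bytes plus nested child messages."""

    def __init__(self, payload=b"", children=()):
        self.payload = payload
        self.children = list(children)


def compute_sizes(msg, sizes):
    """Post-order pass: store each message's body size in `sizes`, keyed by id."""
    total = len(msg.payload)
    for child in msg.children:
        # 1-byte length prefix plus the child's body (assumes sizes < 256).
        total += 1 + compute_sizes(child, sizes)
    sizes[id(msg)] = total
    return total


def write(msg, sizes, out):
    """Pre-order pass: emit payload, then each child as length prefix + body."""
    out.extend(msg.payload)
    for child in msg.children:
        out.append(sizes[id(child)])
        write(child, sizes, out)


def serialize(msg):
    sizes = {}  # lives only for the duration of this call
    compute_sizes(msg, sizes)
    out = bytearray()
    write(msg, sizes, out)
    return bytes(out)


leaf = Msg(b"ab")
mid = Msg(b"c", [leaf])
root = Msg(b"d", [mid])
print(serialize(root))  # b'd\x04c\x02ab'
```

Each message is visited once per pass, so the work stays linear in the number
of messages, and nothing here needs a reference from child to parent.  The
trade-off is that a size query made outside serialize() has to recompute.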

Cheers,
Alek Storm




Re: Slicing support in Python

2008-12-08 Thread Kenton Varda
On Sat, Dec 6, 2008 at 1:03 AM, Alek Storm <[EMAIL PROTECTED]> wrote:

> But it does give us a lot of cool functionality, like adding the same
> message to two parents, and (yes!) slicing support.  I thought this was
> common practice in C++, but it's been quite a while since I've coded it.
>

Nope, in the C++ world we have to worry excessively about ownership, and we
generally make defensive copies rather than trying to allow an object to be
referenced from two places.


> Is it really that useful to have ByteSize() cached for repeated fields?  If
> it's not, we get everything I mentioned above for free.  I'm genuinely not
> sure - it only comes up when serializing the message in wire_format.py.
> What do you think?
>

Yes, it's just as necessary as it is with optional fields.  The main problem
is that the size of a message must be written before the message contents
itself.  If, while serializing, you call ByteSize() to get this size every
time you write a message, then you'll end up computing the size of
deeply-nested messages many times (once for each outer message within which
they're nested).  Caching avoids that problem.
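
The repeated work can be counted with a small experiment.  This is a sketch
under simplifying assumptions: a chain of nested messages, one byte of data per
message, a one-byte length prefix, and hypothetical names (Msg, byte_size,
serialize, chain) that are not part of protobuf.

```python
class Msg:
    """Hypothetical message with at most one nested child."""

    def __init__(self, child=None):
        self.child = child
        self.cached_size = None


def byte_size(msg, counter, use_cache):
    if use_cache and msg.cached_size is not None:
        return msg.cached_size
    counter[0] += 1  # count each real size computation
    size = 1         # pretend every message has one byte of its own data
    if msg.child is not None:
        size += 1 + byte_size(msg.child, counter, use_cache)  # prefix + body
    if use_cache:
        msg.cached_size = size
    return size


def serialize(msg, counter, use_cache):
    # The length prefix of a nested message must be written before its body.
    out = bytearray(b"x")
    if msg.child is not None:
        out.append(byte_size(msg.child, counter, use_cache))
        out += serialize(msg.child, counter, use_cache)
    return bytes(out)


def chain(depth):
    """Build `depth` messages, each nested inside the next."""
    msg = None
    for _ in range(depth):
        msg = Msg(msg)
    return msg


uncached, cached = [0], [0]
serialize(chain(10), uncached, use_cache=False)
serialize(chain(10), cached, use_cache=True)
print(uncached[0], cached[0])  # 45 9
```

For a chain of depth 10 the uncached version performs 9 + 8 + ... + 1 = 45 size
computations, while the cached version performs 9, one per nested message.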




Re: Slicing support in Python

2008-12-06 Thread Alek Storm
On Sat, Dec 6, 2008 at 12:42 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> On Fri, Dec 5, 2008 at 10:59 PM, Alek Storm <[EMAIL PROTECTED]> wrote:
>
>> On Wed, Dec 3, 2008 at 5:32 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>>
>>>  Sorry, I think you misunderstood.  The C++ parsers generated by protoc
>>> (with optimize_for = SPEED) are an order of magnitude faster than the
>>> dynamic *C++* parser (used with optimize_for = CODE_SIZE and
>>> DynamicMessage).  The Python parser is considerably slower than either of
>>> them, but that's beside the point.  Your "decoupled" parser which produces a
>>> tag/value tree will be at least as slow as the existing C++ dynamic parser,
>>> probably slower (since it sounds like it would use some sort of dictionary
>>> structure rather than flat classes/structs).
>>>
>>
>> Oh, I forgot we have two C++ parsers.  The method I described uses the
>> generated (SPEED) parser, so it should be a great deal quicker.  It just
>> outputs a tree instead of a message, leaving the smart object creation to
>> Python.
>>
>
> No, the static (SPEED) parser parses to generated C++ objects.  It doesn't
> make sense to say that we'll use the static parser to parse to this abstract
> "tree" structure, because the whole point of the static parser is that it
> parses to concrete objects.  If it didn't, it wouldn't be so fast.  (In
> fact, the biggest bottleneck in protobuf parsing is memory bandwidth, and I
> can't see how your "tree" structure would be anywhere near as compact as a
> generated message class.)
>

Gah, you're right.  I was thinking of it the wrong way.  I still kinda like
it, but since apparently the abstraction required would negate the speed
increase, I guess it's time to drop it.


>> Honestly, I think using reflection for something as basic as changing the
>> output format is hackish and could get ugly.
>>
>
> I think you're thinking of a different kind of reflection.  I'm talking
> about the google::protobuf::Reflection interface.  The whole point of this
> interface is to allow you to do things like write custom codecs for things
> like JSON or XML.  Take a look at text_format.cc for an example usage.
>

Ah.  I wasn't as familiar with the C++ version as I thought.  Still, I
thought it would be cool to have PB/XML/JSON/etc outputters operate at the
same level.

> If a message Foo has a repeated field of type Bar, then the Bar objects in
> that field are owned by Foo.  When you delete Foo, all the Bars are
> deleted.  Leaving it up to the user to delete the Bar objects themselves is
> way too much of a burden.
>

But it does give us a lot of cool functionality, like adding the same
message to two parents, and (yes!) slicing support.  I thought this was
common practice in C++, but it's been quite a while since I've coded it.


> Is there anything wrong with having a list of parents?  I'm guessing I'm
>> being naive - would speed be affected too much by that?
>
>
> Way too complicated, probably a lot of overhead, and not very useful in
> practice.
>

Is it really that useful to have ByteSize() cached for repeated fields?  If
it's not, we get everything I mentioned above for free.  I'm genuinely not
sure - it only comes up when serializing the message in wire_format.py.
What do you think?

Cheers,
Alek Storm




Re: Slicing support in Python

2008-12-06 Thread Kenton Varda
On Fri, Dec 5, 2008 at 10:59 PM, Alek Storm <[EMAIL PROTECTED]> wrote:

> On Wed, Dec 3, 2008 at 5:32 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>>  Sorry, I think you misunderstood.  The C++ parsers generated by protoc
>> (with optimize_for = SPEED) are an order of magnitude faster than the
>> dynamic *C++* parser (used with optimize_for = CODE_SIZE and
>> DynamicMessage).  The Python parser is considerably slower than either of
>> them, but that's beside the point.  Your "decoupled" parser which produces a
>> tag/value tree will be at least as slow as the existing C++ dynamic parser,
>> probably slower (since it sounds like it would use some sort of dictionary
>> structure rather than flat classes/structs).
>>
>
> Oh, I forgot we have two C++ parsers.  The method I described uses the
> generated (SPEED) parser, so it should be a great deal quicker.  It just
> outputs a tree instead of a message, leaving the smart object creation to
> Python.
>

No, the static (SPEED) parser parses to generated C++ objects.  It doesn't
make sense to say that we'll use the static parser to parse to this abstract
"tree" structure, because the whole point of the static parser is that it
parses to concrete objects.  If it didn't, it wouldn't be so fast.  (In
fact, the biggest bottleneck in protobuf parsing is memory bandwidth, and I
can't see how your "tree" structure would be anywhere near as compact as a
generated message class.)

But then, protobuf objects might as well be your "tree".  There's no reason
to define a separate "tree" structure when we already have a structure
explicitly designed for holding protocol buffer data.  Your "tree" probably
would not be any easier to access than an actual protocol message object.


> Honestly, I think using reflection for something as basic as changing the
> output format is hackish and could get ugly.
>

I think you're thinking of a different kind of reflection.  I'm talking
about the google::protobuf::Reflection interface.  The whole point of this
interface is to allow you to do things like write custom codecs for things
like JSON or XML.  Take a look at text_format.cc for an example usage.


> But just for the record, I'm pretty sure Python's list remove() method
> compares by value, and doesn't have a method that compares by identity.  So
> there would be no reason to include a compare-by-identity method in protobuf
> repeated fields.
>

OK, well, honestly, I think a remove-by-value makes even less sense.
Comparing large non-flat data structures by value is awkward and rarely
useful.  But this is exactly the kind of thing that we don't know if we
don't have a use case to examine.


> Okay, you place more value on "compact interface".  So are we keeping
> remove() for scalar values?  I think their interfaces should be consistent,
> but I don't think you think that's as important.


remove() makes sense for scalars.  There's no question of identity vs. value
comparison, and it's not awkward or unusual to compare two scalar values.
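
For reference, the difference between the two kinds of removal looks like this
in plain Python.  Point and remove_by_identity are hypothetical names; Point
merely stands in for a message type that defines value equality.

```python
class Point:
    """Hypothetical value type standing in for a message with __eq__."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


def remove_by_identity(items, target):
    """Remove the element that *is* target; raise ValueError if absent."""
    for index, item in enumerate(items):
        if item is target:
            del items[index]
            return
    raise ValueError("target object is not in the list")


first, second = Point(1), Point(1)

items = [first, second]
items.remove(second)       # list.remove compares with ==, so `first` goes
print(items[0] is second)  # True

items = [first, second]
remove_by_identity(items, second)
print(items[0] is first)   # True
```

With scalars the two behaviours are indistinguishable, which is why the
question only arises for composite elements.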


> Okay.  So let's say we have a pure-C++ parser with a Python wrapper.  This
> brings us back to getting slicing to work in C++ with no garbage collector.
> Kenton, could you elaborate on what you meant earlier by "ownership
> problems" specific to the C++ version?  I can't really see anything that
> would affect PB repeated fields that isn't taken care of by handing the user
> control over allocation and deallocation of the field elements.


If a message Foo has a repeated field of type Bar, then the Bar objects in
that field are owned by Foo.  When you delete Foo, all the Bars are
deleted.  Leaving it up to the user to delete the Bar objects themselves is
way too much of a burden.


> Is there anything wrong with having a list of parents?  I'm guessing I'm
> being naive - would speed be affected too much by that?


Way too complicated, probably a lot of overhead, and not very useful in
practice.




Re: Slicing support in Python

2008-12-05 Thread Alek Storm
On Wed, Dec 3, 2008 at 5:32 AM, Kenton Varda <[EMAIL PROTECTED]> wrote:

> Sorry, I think you misunderstood.  The C++ parsers generated by protoc
> (with optimize_for = SPEED) are an order of magnitude faster than the
> dynamic *C++* parser (used with optimize_for = CODE_SIZE and
> DynamicMessage).  The Python parser is considerably slower than either of
> them, but that's beside the point.  Your "decoupled" parser which produces a
> tag/value tree will be at least as slow as the existing C++ dynamic parser,
> probably slower (since it sounds like it would use some sort of dictionary
> structure rather than flat classes/structs).
>

Oh, I forgot we have two C++ parsers.  The method I described uses the
generated (SPEED) parser, so it should be a great deal quicker.  It just
outputs a tree instead of a message, leaving the smart object creation to
Python.

>> Run this backwards when serializing, and you get another advantage: you can
>> easily swap out the function that converts the tree into serialized protobuf
>> for one that outputs XML, JSON, etc.
>>
>
> You can already easily write encoders and decoders for alternative formats
> using reflection.
>

Honestly, I think using reflection for something as basic as changing the
output format is hackish and could get ugly.  Reflection should only be used
in certain circumstances, e.g., generating message objects, because it
exposes the internals.  There's a chance we could change how Protocol
Buffers works under the hood in a way that screws up an XML outputter, which
wouldn't happen if we just expose a clean interface.

>> Let's include it - it gives us a more complete list interface, there's no
>> downside, and the users can decide whether they want to use it.  We can't
>> predict all possible use cases.
>>
>
> Ah, yes, the old "Why not?" argument.  :)  Actually, I far prefer the
> opposite argument:  If you aren't sure if someone will want a feature, don't
> include it.  There is always a down side to including a feature.  Even if
> people choose not to use it, it increases code size, maintenance burden,
> memory usage, and interface complexity.  Worse yet, if people do use it,
> then we're permanently stuck with it, whether we like it or not.  We can't
> change it later, even if we decide it's wrong.  For example, we may decide
> later -- based on an actual use case, perhaps -- that it would really have
> been better if remove() compared elements by content rather than by
> identity, so that you could remove a message from a repeated field by
> constructing an identical message and then calling remove().  But we
> wouldn't be able to change it.  We'd have to instead add a different method
> like removeByValue(), which would be ugly and add even more complexity.
>
> Protocol Buffers got where they are by stubbornly refusing the vast
> majority of feature suggestions.  :)
>

Ha, I thought you might say that.  It's a good philosophy, and I completely
understand where you're coming from.  So I concede that point, and it all
boils down to "complete interface" vs. "compact interface".

But just for the record, I'm pretty sure Python's list remove() method
compares by value, and doesn't have a method that compares by identity.  So
there would be no reason to include a compare-by-identity method in protobuf
repeated fields.

> That said, you do have a good point that the interface should be similar to
> standard Python lists if possible.  But given the other problems that
> prevent this, it seems like a moot point.
>

Okay, you place more value on "compact interface".  So are we keeping
remove() for scalar values?  I think their interfaces should be consistent,
but I don't think you think that's as important.

On Wed, Dec 3, 2008 at 10:25 AM, Petar Petrov <[EMAIL PROTECTED]> wrote:

> It's not that simple. We would also like to improve performance at least in
> MergeFrom/CopyFrom/ParseASCII/IsInitialized.
>

Okay.  So let's say we have a pure-C++ parser with a Python wrapper.  This
brings us back to getting slicing to work in C++ with no garbage collector.
Kenton, could you elaborate on what you meant earlier by "ownership
problems" specific to the C++ version?  I can't really see anything that
would affect PB repeated fields that isn't taken care of by handing the user
control over allocation and deallocation of the field elements.

> Currently each composite field has a reference to its parent. This makes it
> impossible to add the same composite to two different repeated composite
> fields. The .add() method guarantees that this never happens.
>

Is there anything wrong with having a list of parents?  I'm guessing I'm
being naive - would speed be affected too much by that?


> I think protobuf's repeated composite fields aren't and shouldn't be
> equivalent to python lists.
>

Okay, that's cleared up now.  Thanks.

Cheers,
Alek Storm


Re: Slicing support in Python

2008-12-05 Thread Alek Storm

Hi Kenton and Petar,

Sorry I haven't been able to reply for a few days; I've been so
swamped this week.  Hopefully I'll be able to conjure up an
intelligent reply tomorrow :)

Cheers,
Alek Storm



Re: Slicing support in Python

2008-12-03 Thread Dave Bailey

Well, what I'm getting at is that many scripting languages may
guarantee (or provide a way to guarantee) thread safety of the
reference counting operations.  In Perl, if you want access to a
variable from multiple threads, you explicitly have to designate it as
"shared" so that the variable can be read and written safely.  Python
has a "global interpreter lock" that a Python/C++ wrapper would
presumably use.  I'm not sure about Ruby et al.

The performance of C++/PB in Perl and Python is obviously going to be
vastly better than that of a pure-Perl or pure-Python implementation,
even with this deep-copying going on, and I really doubt it would be a
bottleneck in any actual Perl or Python application (I mean, if you
really want the speed, those aren't the languages to use...).  But it
would still be nice to have a way to unlock the maximum speed.  Let me
know if you change your mind!

-dave

On Dec 3, 3:31 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> It still adds a lot of complication.  And I think most cases where people
> start out thinking thread-safety won't be an issue, particularly with
> reference counting, they later find out otherwise.
>
> On Wed, Dec 3, 2008 at 3:22 PM, Dave Bailey <[EMAIL PROTECTED]> wrote:
>
> > What if thread safety wasn't an issue?
>
> > -dave
>
> > On Dec 3, 2:41 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> > > Ehhh...  Reference counting is slow (assuming it needs to be
> > thread-safe),
> > > and I think even adding it as an option would add an excessive amount of
> > > complication to the system.
>
> > > On Wed, Dec 3, 2008 at 2:04 PM, Dave Bailey <[EMAIL PROTECTED]> wrote:
>
> > > > On Dec 3, 2:00 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> > > > > On Dec 2, 10:49 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
> > > > > > C++ compatibility matters because eventually we want to be able to
> > > > generate
> > > > > > Python code which just wraps C++ code for efficiency.  C++ isn't
> > > > garbage
> > > > > > collected, so append() can't easily be implemented in this case
> > without
> > > > > > having ownership problems.  Slice assignment has the same problem.
> > > > > > Also note that even pure-python protocol buffers have a sort of
> > > > "ownership"
> > > > > > issue:  Sub-messages all contain pointers back to their parents, so
> > > > that
> > > > > > when a sub-message is modified, the parent's cached size can be
> > marked
> > > > > > dirty.  (Also, singular sub-messages have to inform their parents
> > when
> > > > the
> > > > > > first field within them is set, but that doesn't apply here.)
>
> > > > (Here is my post without all of the ridiculous formatting):
>
> > > > While you're on this topic, I ran into this ownership issue while
> > > > implementing the Perl/XS wrapper around the generated C++ code.  I
> > > > think it is the same issue that would face the author of a Python or
> > > > Ruby C++ extension of the generated C++.  I ended up having to new() a
> > > > copy of every message that I transferred from C++ to Perl or vice
> > > > versa.  So, for example, a statement like
>
> > > > $team->member($i)->set_first_name('Dave');
>
> > > > won't have the same effect as (C++)
>
> > > > team.mutable_member(i)->set_first_name("Dave");
>
> > > > because $team->member($i) will generate a copy of the underlying C++
> > > > object, so that it can be managed by Perl's reference counting without
> > > > any concern as to whether or not the underlying C++ object has been
> > > > deleted because the containing message went out of scope.
>
> > > > Anyway, I thought it might be possible to allow for shared ownership
> > > > of a message object if there were a reference counted variant of
> > > > RepeatedPtrField (something like RepeatedSharedPtrField or
> > > > whatever), which would provide incref() and decref() methods such that
> > > > Perl and C++ could use the same underlying C++ objects in the
> > > > generated code.  This would really help the performance of the Perl/XS
> > > > code if all of that copy construction could be avoided somehow.  The C++
> > > > code generator would need an option that would instruct it to
> > > > generate RepeatedSharedPtrField members (and incref and decref
> > > > calls, where appropriate) for repeated messages (instead of using the
> > > > default RepeatedPtrField).
>
> > > > What do you think?  Is something like this possible, even though it
> > > > would require a change to protobuf?  It is an issue for all {Python,
> > > > Perl, Ruby, ...}/C++ extension wrappers for Protocol Buffers.  I have
> > > > found that protobuf is a faster Perl data serialization mechanism than
> > > > the (generic) Storable module, but I think it can be even faster.
>
> > > > -dave

Re: Slicing support in Python

2008-12-03 Thread Kenton Varda
It still adds a lot of complication.  And I think most cases where people
start out thinking thread-safety won't be an issue, particularly with
reference counting, they later find out otherwise.

On Wed, Dec 3, 2008 at 3:22 PM, Dave Bailey <[EMAIL PROTECTED]> wrote:

>
> What if thread safety wasn't an issue?
>
> -dave
>
> On Dec 3, 2:41 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> > Ehhh...  Reference counting is slow (assuming it needs to be
> thread-safe),
> > and I think even adding it as an option would add an excessive amount of
> > complication to the system.
> >
> > On Wed, Dec 3, 2008 at 2:04 PM, Dave Bailey <[EMAIL PROTECTED]> wrote:
> >
> > > On Dec 3, 2:00 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> > > > On Dec 2, 10:49 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> >
> > > > > C++ compatibility matters because eventually we want to be able to
> > > generate
> > > > > Python code which just wraps C++ code for efficiency.  C++ isn't
> > > garbage
> > > > > collected, so append() can't easily be implemented in this case
> without
> > > > > having ownership problems.  Slice assignment has the same problem.
> > > > > Also note that even pure-python protocol buffers have a sort of
> > > "ownership"
> > > > > issue:  Sub-messages all contain pointers back to their parents, so
> > > that
> > > > > when a sub-message is modified, the parent's cached size can be
> marked
> > > > > dirty.  (Also, singular sub-messages have to inform their parents
> when
> > > the
> > > > > first field within them is set, but that doesn't apply here.)
> >
> > > (Here is my post without all of the ridiculous formatting):
> >
> > > While you're on this topic, I ran into this ownership issue while
> > > implementing the Perl/XS wrapper around the generated C++ code.  I
> > > think it is the same issue that would face the author of a Python or
> > > Ruby C++ extension of the generated C++.  I ended up having to new() a
> > > copy of every message that I transferred from C++ to Perl or vice
> > > versa.  So, for example, a statement like
> >
> > > $team->member($i)->set_first_name('Dave');
> >
> > > won't have the same effect as (C++)
> >
> > > team.mutable_member(i)->set_first_name("Dave");
> >
> > > because $team->member($i) will generate a copy of the underlying C++
> > > object, so that it can be managed by Perl's reference counting without
> > > any concern as to whether or not the underlying C++ object has been
> > > deleted because the containing message went out of scope.
> >
> > > Anyway, I thought it might be possible to allow for shared ownership
> > > of a message object if there were a reference counted variant of
> > > RepeatedPtrField (something like RepeatedSharedPtrField or
> > > whatever), which would provide incref() and decref() methods such that
> > > Perl and C++ could use the same underlying C++ objects in the
> > > generated code.  This would really help the performance of the Perl/XS
> > > code if all of that copy construction could be avoided somehow.  The C+
> > > + code generator would need an option that would instruct it to
> > > generate RepeatedSharedPtrField members (and incref and decref
> > > calls, where appropriate) for repeated messages (instead of using the
> > > default RepeatedPtrField).
> >
> > > What do you think?  Is something like this possible, even though it
> > > would require a change to protobuf?  It is an issue for all {Python,
> > > Perl, Ruby, ...}/C++ extension wrappers for Protocol Buffers.  I have
> > > found that protobuf is a faster Perl data serialization mechanism that
> > > the (generic) Storable module, but I think it can be even faster.
> >
> > > -dave
> >
>




Re: Slicing support in Python

2008-12-03 Thread Dave Bailey

What if thread safety wasn't an issue?

-dave

On Dec 3, 2:41 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> Ehhh...  Reference counting is slow (assuming it needs to be thread-safe),
> and I think even adding it as an option would add an excessive amount of
> complication to the system.
>
> On Wed, Dec 3, 2008 at 2:04 PM, Dave Bailey <[EMAIL PROTECTED]> wrote:
>
> > On Dec 3, 2:00 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> > > On Dec 2, 10:49 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
> > > > C++ compatibility matters because eventually we want to be able to
> > generate
> > > > Python code which just wraps C++ code for efficiency.  C++ isn't
> > garbage
> > > > collected, so append() can't easily be implemented in this case without
> > > > having ownership problems.  Slice assignment has the same problem.
> > > > Also note that even pure-python protocol buffers have a sort of
> > "ownership"
> > > > issue:  Sub-messages all contain pointers back to their parents, so
> > that
> > > > when a sub-message is modified, the parent's cached size can be marked
> > > > dirty.  (Also, singular sub-messages have to inform their parents when
> > the
> > > > first field within them is set, but that doesn't apply here.)
>
> > (Here is my post without all of the ridiculous formatting):
>
> > While you're on this topic, I ran into this ownership issue while
> > implementing the Perl/XS wrapper around the generated C++ code.  I
> > think it is the same issue that would face the author of a Python or
> > Ruby C++ extension of the generated C++.  I ended up having to new() a
> > copy of every message that I transferred from C++ to Perl or vice
> > versa.  So, for example, a statement like
>
> > $team->member($i)->set_first_name('Dave');
>
> > won't have the same effect as (C++)
>
> > team.mutable_member(i)->set_first_name("Dave");
>
> > because $team->member($i) will generate a copy of the underlying C++
> > object, so that it can be managed by Perl's reference counting without
> > any concern as to whether or not the underlying C++ object has been
> > deleted because the containing message went out of scope.
>
> > Anyway, I thought it might be possible to allow for shared ownership
> > of a message object if there were a reference counted variant of
> > RepeatedPtrField (something like RepeatedSharedPtrField or
> > whatever), which would provide incref() and decref() methods such that
> > Perl and C++ could use the same underlying C++ objects in the
> > generated code.  This would really help the performance of the Perl/XS
> > code if all of that copy construction could be avoided somehow.  The C+
> > + code generator would need an option that would instruct it to
> > generate RepeatedSharedPtrField members (and incref and decref
> > calls, where appropriate) for repeated messages (instead of using the
> > default RepeatedPtrField).
>
> > What do you think?  Is something like this possible, even though it
> > would require a change to protobuf?  It is an issue for all {Python,
> > Perl, Ruby, ...}/C++ extension wrappers for Protocol Buffers.  I have
> > found that protobuf is a faster Perl data serialization mechanism that
> > the (generic) Storable module, but I think it can be even faster.
>
> > -dave



Re: Slicing support in Python

2008-12-03 Thread Kenton Varda
Ehhh...  Reference counting is slow (assuming it needs to be thread-safe),
and I think even adding it as an option would add an excessive amount of
complication to the system.

On Wed, Dec 3, 2008 at 2:04 PM, Dave Bailey <[EMAIL PROTECTED]> wrote:

>
> On Dec 3, 2:00 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> > On Dec 2, 10:49 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> >
> > > C++ compatibility matters because eventually we want to be able to
> generate
> > > Python code which just wraps C++ code for efficiency.  C++ isn't
> garbage
> > > collected, so append() can't easily be implemented in this case without
> > > having ownership problems.  Slice assignment has the same problem.
> > > Also note that even pure-python protocol buffers have a sort of
> "ownership"
> > > issue:  Sub-messages all contain pointers back to their parents, so
> that
> > > when a sub-message is modified, the parent's cached size can be marked
> > > dirty.  (Also, singular sub-messages have to inform their parents when
> the
> > > first field within them is set, but that doesn't apply here.)
> >
>
> (Here is my post without all of the ridiculous formatting):
>
> While you're on this topic, I ran into this ownership issue while
> implementing the Perl/XS wrapper around the generated C++ code.  I
> think it is the same issue that would face the author of a Python or
> Ruby C++ extension of the generated C++.  I ended up having to new() a
> copy of every message that I transferred from C++ to Perl or vice
> versa.  So, for example, a statement like
>
> $team->member($i)->set_first_name('Dave');
>
> won't have the same effect as (C++)
>
> team.mutable_member(i)->set_first_name("Dave");
>
> because $team->member($i) will generate a copy of the underlying C++
> object, so that it can be managed by Perl's reference counting without
> any concern as to whether or not the underlying C++ object has been
> deleted because the containing message went out of scope.
>
> Anyway, I thought it might be possible to allow for shared ownership
> of a message object if there were a reference counted variant of
> RepeatedPtrField (something like RepeatedSharedPtrField or
> whatever), which would provide incref() and decref() methods such that
> Perl and C++ could use the same underlying C++ objects in the
> generated code.  This would really help the performance of the Perl/XS
> code if all of that copy construction could be avoided somehow.  The C+
> + code generator would need an option that would instruct it to
> generate RepeatedSharedPtrField members (and incref and decref
> calls, where appropriate) for repeated messages (instead of using the
> default RepeatedPtrField).
>
> What do you think?  Is something like this possible, even though it
> would require a change to protobuf?  It is an issue for all {Python,
> Perl, Ruby, ...}/C++ extension wrappers for Protocol Buffers.  I have
> found that protobuf is a faster Perl data serialization mechanism that
> the (generic) Storable module, but I think it can be even faster.
>
> -dave
>
> >
>




Re: Slicing support in Python

2008-12-03 Thread Dave Bailey

On Dec 3, 2:00 pm, Dave Bailey <[EMAIL PROTECTED]> wrote:
> On Dec 2, 10:49 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
> > C++ compatibility matters because eventually we want to be able to generate
> > Python code which just wraps C++ code for efficiency.  C++ isn't garbage
> > collected, so append() can't easily be implemented in this case without
> > having ownership problems.  Slice assignment has the same problem.
> > Also note that even pure-python protocol buffers have a sort of "ownership"
> > issue:  Sub-messages all contain pointers back to their parents, so that
> > when a sub-message is modified, the parent's cached size can be marked
> > dirty.  (Also, singular sub-messages have to inform their parents when the
> > first field within them is set, but that doesn't apply here.)
>

(Here is my post without all of the ridiculous formatting):

While you're on this topic, I ran into this ownership issue while
implementing the Perl/XS wrapper around the generated C++ code.  I
think it is the same issue that would face the author of a Python or
Ruby C++ extension of the generated C++.  I ended up having to new() a
copy of every message that I transferred from C++ to Perl or vice
versa.  So, for example, a statement like

$team->member($i)->set_first_name('Dave');

won't have the same effect as (C++)

team.mutable_member(i)->set_first_name("Dave");

because $team->member($i) will generate a copy of the underlying C++
object, so that it can be managed by Perl's reference counting without
any concern as to whether or not the underlying C++ object has been
deleted because the containing message went out of scope.
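
The difference between the two statements can be sketched in Python.  This
is only a toy model of the copy-versus-share behaviour, not the Perl/XS
wrapper or the generated C++: the `Team` and `Member` classes, the
`member()` / `mutable_member()` accessors as written here, and the name
"Alice" are all assumptions made for illustration.

```python
import copy


class Member:
    def __init__(self, first_name=""):
        self.first_name = first_name


class Team:
    def __init__(self, members):
        self._members = list(members)

    def member(self, i):
        # Copying accessor: the caller owns the result, so it stays valid
        # even if the team goes away, but edits never reach the team.
        return copy.deepcopy(self._members[i])

    def mutable_member(self, i):
        # Sharing accessor: edits are visible through the team.
        return self._members[i]


team = Team([Member("Alice")])

team.member(0).first_name = "Dave"           # modifies a temporary copy
after_copy_edit = team.mutable_member(0).first_name

team.mutable_member(0).first_name = "Dave"   # modifies the stored object
after_shared_edit = team.mutable_member(0).first_name
```

After the first assignment the stored member is unchanged; only the second
one reaches it.  A reference-counted container would aim to make the first
form behave like the second without risking a dangling object.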

Anyway, I thought it might be possible to allow for shared ownership
of a message object if there were a reference counted variant of
RepeatedPtrField (something like RepeatedSharedPtrField or
whatever), which would provide incref() and decref() methods such that
Perl and C++ could use the same underlying C++ objects in the
generated code.  This would really help the performance of the Perl/XS
code if all of that copy construction could be avoided somehow.  The
C++ code generator would need an option that would instruct it to
generate RepeatedSharedPtrField members (and incref and decref
calls, where appropriate) for repeated messages (instead of using the
default RepeatedPtrField).

What do you think?  Is something like this possible, even though it
would require a change to protobuf?  It is an issue for all {Python,
Perl, Ruby, ...}/C++ extension wrappers for Protocol Buffers.  I have
found that protobuf is a faster Perl data serialization mechanism than
the (generic) Storable module, but I think it can be even faster.

-dave




Re: Slicing support in Python

2008-12-03 Thread Dave Bailey

On Dec 2, 10:49 pm, Kenton Varda <[EMAIL PROTECTED]> wrote:
> C++ compatibility matters because eventually we want to be able to generate
> Python code which just wraps C++ code for efficiency.  C++ isn't garbage
> collected, so append() can't easily be implemented in this case without
> having ownership problems.  Slice assignment has the same problem.
> Also note that even pure-python protocol buffers have a sort of "ownership"
> issue:  Sub-messages all contain pointers back to their parents, so that
> when a sub-message is modified, the parent's cached size can be marked
> dirty.  (Also, singular sub-messages have to inform their parents when the
> first field within them is set, but that doesn't apply here.)

While you're on this topic, I ran into this ownership issue while
implementing the Perl/XS wrapper around the generated C++ code.  I
think it is the same issue that would face the author of a Python or
Ruby C++ extension of the generated C++.  I ended up having to new() a
copy of every message that I transferred from C++ to Perl or vice
versa.  So, for example, a statement like

$team->member($i)->set_first_name('Dave');

won't have the same effect as (C++)

team.mutable_member(i)->set_first_name("Dave");

because $team->member($i) will generate a copy of the underlying C++
object, so that it can be managed by Perl's reference counting without
any concern as to whether or not the underlying C++ object has been
deleted because the containing message went out of scope.

Anyway, I thought it might be possible to allow for shared ownership
of a message object if there were a reference counted variant of
RepeatedPtrField (something like RepeatedSharedPtrField or
whatever), which would provide incref() and decref() methods such that
Perl and C++ could use the same underlying C++ objects in the
generated code.  This would really help the performance of the Perl/XS
code if all of that copy construction could be avoided somehow.  The
C++ code generator would need an option that would instruct it to
generate RepeatedSharedPtrField members (and incref and decref
calls, where appropriate) for repeated messages (instead of using the
default RepeatedPtrField).

What do you think?  Is something like this possible, even though it
would require a change to protobuf?  It is an issue for all {Python,
Perl, Ruby, ...}/C++ extension wrappers for Protocol Buffers.

-dave





Re: Slicing support in Python

2008-12-03 Thread Petar Petrov
On Wed, Dec 3, 2008 at 4:30 AM, Alek Storm <[EMAIL PROTECTED]> wrote:

> (Okay, back on track)
>
> On Tue, Dec 2, 2008 at 11:17 PM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>> On Tue, Dec 2, 2008 at 11:08 PM, Alek Storm <[EMAIL PROTECTED]> wrote:
>>
>>> I would think encoding and decoding would be the main bottlenecks, so
>>> can't those be wrappers around C++, while let object handling (reflection.py
>>> and friends) be pure-python?  It seems like the best of both worlds.
>>>
>>
>>>
>> Well, the generated serializing and parsing code in C++ is an order of
>> magnitude faster than the dynamic (reflection-based) code.  But to use
>> generated code you need to be using C++ object handling.
>>
>
> Not if you decouple them.  Abstractly, the C++ parser receives a serialized
> message and descriptor and returns a tree of the form [(tag_num, value)]
> where tag_num is an integer and value is either a scalar or a subtree (for
> submessages).  The Python reflection code takes the tree and fills the
> message object with its values.  It's simple, fast, and the C++ parser can
> be easily swapped out for a pure-Python one on systems that don't support
> the C++ version.
>
> Run this backwards when serializing, and you get another advantage: you can
> easily swap out the function that converts the tree into serialized protobuf
> for one that outputs XML, JSON, etc.
>

It's not that simple. We would also like to improve performance at least in
MergeFrom/CopyFrom/ParseASCII/IsInitialized.


>
>
>> You're right.  If it's a waste of time for them, most people won't use
>>> it.  But if there's no point to it, why do normal Python lists have it?
>>> It's useful enough to be included there.  And since repeated fields act just
>>> like lists, it should be included here too.
>>
>>
>> I think Python object lists are probably used in a much wider variety of
>> ways than protocol buffer repeated fields generally are.
>>
>
> Let's include it - it gives us a more complete list interface, there's no
> downside, and the users can decide whether they want to use it.  We can't
> predict all possible use cases.
>

The thing is, once they start to use it, you can't remove it later if it
turns out to be a problem ...


>
>  In fact, it doesn't even have to be useful for repeated composites.  The
>>> fact that repeated scalars have it means it's automatically included for
>>> repeated composites, because they should have the exact same interface.
>>> Polymorphism is what we want here.
>>
>>
>> But they already can't have the same interface because append() doesn't
>> work.  :)
>>
>
> We don't have confirmation on that yet ;).  Having the same interface is
> what we should be shooting for.
>

Currently each composite field has a reference to its parent. This makes it
impossible to add the same composite to two different repeated composite
fields. The .add() method guarantees that this never happens.

Take a look at this example:

.proto:
message M1 {
  optional int32 i = 1;
}

message M2 {
  repeated M1 m1 = 1;
}

message M3 {
  repeated M1 m1 = 1;
}

usage:
m2 = M2()
m3 = M3()
m1 = M1()
m1.i = 1

m2.m1.append(m1)
m3.m1.append(m1)
print m2.ByteSize() # Correct
print m3.ByteSize() # Correct

m1.i =  # This should mark m2.ByteSize() and m3.ByteSize() dirty.
print m2.ByteSize() # Incorrect: m1 references only its new parent m3, so when m1 is updated, it notifies only m3.
print m3.ByteSize() # Correct

I think protobuf's repeated composite fields aren't and shouldn't be
equivalent to python lists.
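
The same failure can be reproduced without protobuf at all.  The sketch
below is a toy model, not the actual reflection.py code: the `Parent` and
`Child` classes, the single `_parent` back-reference, and the use of digit
count as a stand-in for encoded size are all assumptions made for
illustration.

```python
class Parent:
    """Toy container that caches its size and recomputes only when dirty."""

    def __init__(self):
        self.children = []
        self._cached_size = 0
        self._dirty = True

    def append(self, child):
        # The child keeps exactly one back-reference, so appending it to a
        # second container silently overwrites the first parent.
        child._parent = self
        self.children.append(child)
        self._dirty = True

    def byte_size(self):
        if self._dirty:
            self._cached_size = sum(c.size() for c in self.children)
            self._dirty = False
        return self._cached_size


class Child:
    def __init__(self, value):
        self._parent = None
        self._value = value

    def set_value(self, value):
        self._value = value
        if self._parent is not None:
            self._parent._dirty = True  # only the latest parent is notified

    def size(self):
        # Stand-in for an encoded size: number of decimal digits.
        return len(str(self._value))


m2, m3, m1 = Parent(), Parent(), Child(1)
m2.append(m1)
m3.append(m1)
m2.byte_size()
m3.byte_size()

m1.set_value(1000)
stale = m2.byte_size()  # m2 was never marked dirty, so this is the old size
fresh = m3.byte_size()  # m3 was notified and recomputes
```

Here `stale` keeps the size computed before the update, while `fresh`
reflects the new value.  An add()-style method that creates the child
itself avoids the problem because no child can ever have two containers.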


>
> Thanks,
> Alek Storm
>
>
> >
>




Re: Slicing support in Python

2008-12-03 Thread Kenton Varda
On Wed, Dec 3, 2008 at 12:02 AM, Alek Storm <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 2, 2008 at 11:17 PM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>> Well, the generated serializing and parsing code in C++ is an order of
>> magnitude faster than the dynamic (reflection-based) code.  But to use
>> generated code you need to be using C++ object handling.
>>
>
> Not if you decouple them.  Abstractly, the C++ parser receives a serialized
> message and descriptor and returns a tree of the form [(tag_num, value)]
> where tag_num is an integer and value is either a scalar or a subtree (for
> submessages).  The Python reflection code takes the tree and fills the
> message object with its values.  It's simple, fast, and the C++ parser can
> be easily swapped out for a pure-Python one on systems that don't support
> the C++ version.
>

Sorry, I think you misunderstood.  The C++ parsers generated by protoc (with
optimize_for = SPEED) are an order of magnitude faster than the dynamic
*C++* parser (used with optimize_for = CODE_SIZE and DynamicMessage).  The
Python parser is considerably slower than either of them, but that's beside
the point.  Your "decoupled" parser which produces a tag/value tree will be
at least as slow as the existing C++ dynamic parser, probably slower (since
it sounds like it would use some sort of dictionary structure rather than
flat classes/structs).
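
For reference, the tag/value tree being discussed might look roughly like
the following.  This is a guess at the shape of the proposal, not code from
any patch: the `fill` function and the `field_names` mapping (a stand-in
for a real descriptor) are hypothetical.

```python
def fill(tree, field_names):
    """Turn [(tag_num, value)] into a nested dict keyed by field name.

    field_names maps tag_num -> (name, sub_field_names or None).  It is a
    hypothetical stand-in for a real message descriptor.
    """
    message = {}
    for tag_num, value in tree:
        name, sub_fields = field_names[tag_num]
        if sub_fields is not None:
            # A submessage arrives as a subtree; recurse into it.
            value = fill(value, sub_fields)
        message[name] = value
    return message


inner = {1: ("i", None)}
outer = {1: ("m1", inner), 2: ("label", None)}
result = fill([(1, [(1, 3)]), (2, "x")], outer)
```

Every field here passes through a dictionary lookup and an intermediate
tuple, which is the kind of overhead that flat generated classes avoid.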


> Run this backwards when serializing, and you get another advantage: you can
> easily swap out the function that converts the tree into serialized protobuf
> for one that outputs XML, JSON, etc.
>

You can already easily write encoders and decoders for alternative formats
using reflection.


>
>
>
>> You're right.  If it's a waste of time for them, most people won't use
>>> it.  But if there's no point to it, why do normal Python lists have it?
>>> It's useful enough to be included there.  And since repeated fields act just
>>> like lists, it should be included here too.
>>
>>
>> I think Python object lists are probably used in a much wider variety of
>> ways than protocol buffer repeated fields generally are.
>>
>
> Let's include it - it gives us a more complete list interface, there's no
> downside, and the users can decide whether they want to use it.  We can't
> predict all possible use cases.
>

Ah, yes, the old "Why not?" argument.  :)  Actually, I far prefer the
opposite argument:  If you aren't sure if someone will want a feature, don't
include it.  There is always a down side to including a feature.  Even if
people choose not to use it, it increases code size, maintenance burden,
memory usage, and interface complexity.  Worse yet, if people do use it,
then we're permanently stuck with it, whether we like it or not.  We can't
change it later, even if we decide it's wrong.  For example, we may decide
later -- based on an actual use case, perhaps -- that it would really have
been better if remove() compared elements by content rather than by
identity, so that you could remove a message from a repeated field by
constructing an identical message and then calling remove().  But we
wouldn't be able to change it.  We'd have to instead add a different method
like removeByValue(), which would be ugly and add even more complexity.
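
To make the identity-versus-content distinction concrete: Python's own
list.remove() compares with ==, so the behaviour depends entirely on
whether the element type defines __eq__.  The two classes below are
hypothetical stand-ins for a message type, not protobuf classes.

```python
class ByIdentity:
    """No __eq__ defined, so == falls back to identity."""

    def __init__(self, i):
        self.i = i


class ByValue:
    """Defines __eq__, so == compares content."""

    def __init__(self, i):
        self.i = i

    def __eq__(self, other):
        return isinstance(other, ByValue) and self.i == other.i


items = [ByIdentity(1), ByIdentity(2)]
try:
    items.remove(ByIdentity(1))  # equal-looking but distinct object
    removed_by_identity = True
except ValueError:
    removed_by_identity = False

values = [ByValue(1), ByValue(2)]
values.remove(ByValue(1))        # matches on content
remaining = [v.i for v in values]
```

Whichever of the two behaviours ships first is the one users will come to
depend on.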

Protocol Buffers got where they are by stubbornly refusing the vast majority
of feature suggestions.  :)

That said, you do have a good point that the interface should be similar to
standard Python lists if possible.  But given the other problems that
prevent this, it seems like a moot point.




Re: Slicing support in Python

2008-12-02 Thread Kenton Varda
C++ compatibility matters because eventually we want to be able to generate
Python code which just wraps C++ code for efficiency.  C++ isn't garbage
collected, so append() can't easily be implemented in this case without
having ownership problems.  Slice assignment has the same problem.
Also note that even pure-python protocol buffers have a sort of "ownership"
issue:  Sub-messages all contain pointers back to their parents, so that
when a sub-message is modified, the parent's cached size can be marked
dirty.  (Also, singular sub-messages have to inform their parents when the
first field within them is set, but that doesn't apply here.)

As for remove(), it seems awkward because you have to identify the message
to be removed rather than the index to be removed.  If you know what message
you want removed, you probably already know what its index is.  Therefore,
removing by identity rather than index is a waste of time, because you're
forcing a linear scan through the list to find the element to be removed.
(Though removing by index may also be linear-time depending on how lists
are implemented in Python; I don't really know.)

On Tue, Dec 2, 2008 at 10:34 PM, Alek Storm <[EMAIL PROTECTED]> wrote:

> On Tue, Dec 2, 2008 at 10:04 PM, Petar Petrov <[EMAIL PROTECTED]>wrote:
>
>> How useful is a remove method on a repeated composite field? None has
>> asked about such. Do you have any use cases which require it? What are they?
>>
>> If you want to remove a value from a repeated composite field, you
>> basically have to create the same composite value and pass it to .remove().
>> When can this be useful?
>>
>> I think removal of composite values doesn't make much sense. Maybe even
>> slicing ( __getslice__ could be usefull, but __setslice__ very unlikely).
>>
>
> Just as useful as the remove() method for lists of objects in Python.  I
> think repeated scalar and composite fields should act exactly like lists,
> because that's intuitively what they are.  There's no technical downside to
> including the remove() method, and it brings repeated composites more in
> line with repeated scalars.
>
> Also, you don't have to recreate the composite in order to pass it to
> .remove() - you could easily have another reference to the object stored
> somewhere.  And I don't know why nobody would want to assign to lists of
> objects - this kind of thing is very common.
>
> In fact, this whole patch was pretty easy.  There must be some reason it
>>> hasn't been done before.
>>
>>
>> The problem is not in adding slicing support. The problem is that if there
>> is a Python/C API at some point, it will be very hard to translate the
>> slicing support to the C++ API. However repeated scalar slicing is
>> translatable (using RepeatedField's Set/RemoveLast/Add methods) but I don't
>> think slicing of repeated composite fields is possible with the C++ API.
>>
>> I think we should only keep the repeated scalar slicing and maybe only
>> __getslice__ and __delslice__ for repeated composites.
>>
>
> Why should it matter if there's no slicing in the C++ API?  Each
> implementation of Protocol Buffers should feel natural for each language; it
> shouldn't be held back just because some other language doesn't support the
> same features.  The builder pattern is completely normal in Java, but the
> C++ version uses an entirely different strategy.  Slicing is completely
> normal in Python.
>
> The repeated scalar and composite interfaces should be exactly the same.
> Everything is Python is polymorphic - it feels wrong if you can't do
> something with a list simply based on the types it contains.  This is why I
> included a comment in the add() method saying it should be deprecated - the
> functionaly is superseded by append(), and there's no corresponding method
> in the repeated scalar class.
>
> Cheers,
>
> Alek Storm
>
> >
>




Re: Slicing support in Python

2008-12-02 Thread Alek Storm
On Tue, Dec 2, 2008 at 10:04 PM, Petar Petrov <[EMAIL PROTECTED]>wrote:

> How useful is a remove method on a repeated composite field? None has asked
> about such. Do you have any use cases which require it? What are they?
>
> If you want to remove a value from a repeated composite field, you
> basically have to create the same composite value and pass it to .remove().
> When can this be useful?
>
> I think removal of composite values doesn't make much sense. Maybe even
> slicing ( __getslice__ could be usefull, but __setslice__ very unlikely).
>

Just as useful as the remove() method for lists of objects in Python.  I
think repeated scalar and composite fields should act exactly like lists,
because that's intuitively what they are.  There's no technical downside to
including the remove() method, and it brings repeated composites more in
line with repeated scalars.

Also, you don't have to recreate the composite in order to pass it to
.remove() - you could easily have another reference to the object stored
somewhere.  And I don't know why nobody would want to assign to lists of
objects - this kind of thing is very common.

>> In fact, this whole patch was pretty easy.  There must be some reason it
>> hasn't been done before.
>
>
> The problem is not in adding slicing support. The problem is that if there
> is a Python/C API at some point, it will be very hard to translate the
> slicing support to the C++ API. However repeated scalar slicing is
> translatable (using RepeatedField's Set/RemoveLast/Add methods) but I don't
> think slicing of repeated composite fields is possible with the C++ API.
>
> I think we should only keep the repeated scalar slicing and maybe only
> __getslice__ and __delslice__ for repeated composites.
>

Why should it matter if there's no slicing in the C++ API?  Each
implementation of Protocol Buffers should feel natural for each language; it
shouldn't be held back just because some other language doesn't support the
same features.  The builder pattern is completely normal in Java, but the
C++ version uses an entirely different strategy.  Slicing is completely
normal in Python.

The repeated scalar and composite interfaces should be exactly the same.
Everything in Python is polymorphic - it feels wrong if you can't do
something with a list simply because of the types it contains.  This is why I
included a comment in the add() method saying it should be deprecated - the
functionality is superseded by append(), and there's no corresponding method
in the repeated scalar class.

Cheers,
Alek Storm




Re: Slicing support in Python

2008-12-02 Thread Petar Petrov
On Tue, Dec 2, 2008 at 1:22 AM, Alek Storm <[EMAIL PROTECTED]> wrote:

> This patch (attached, works with r81) adds slicing support for repeated
> fields in Python.  It implements retrieval, setting, and deletion for both
> scalar and composite repeated fields, and adds several unit tests for each
> method.
>
> There was already a remove() method for repeated scalars, but not for
> composites.  I added one without any difficulty.  Did I miss something?


How useful is a remove method on a repeated composite field? No one has asked
for one. Do you have any use cases which require it? What are they?

If you want to remove a value from a repeated composite field, you basically
have to create the same composite value and pass it to .remove(). When can
this be useful?
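
To make the point concrete: list.remove() in Python deletes the first element
that compares equal to its argument, so removing a composite by value means
building an equal object first. The sketch below uses a plain dataclass as a
stand-in for a message type (Person and its fields are hypothetical); it is
not a generated protobuf class.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Stand-in for a composite (message) value; dataclasses compare by field.
    name: str
    id: int


people = [Person("ann", 1), Person("bob", 2)]

# Removal by value: construct an equal object, then remove it.
people.remove(Person("ann", 1))
after_remove = list(people)

# Removal by position needs no duplicate object.
del people[0]
```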

I think removal of composite values doesn't make much sense. The same may be
true of slicing (__getslice__ could be useful, but __setslice__ is very
unlikely to be).

> In fact, this whole patch was pretty easy.  There must be some reason it
> hasn't been done before.


The problem is not in adding slicing support. The problem is that if there
is a Python/C API at some point, it will be very hard to translate the
slicing support to the C++ API. However repeated scalar slicing is
translatable (using RepeatedField's Set/RemoveLast/Add methods) but I don't
think slicing of repeated composite fields is possible with the C++ API.

I think we should only keep the repeated scalar slicing and maybe only
__getslice__ and __delslice__ for repeated composites.
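
For the scalar case, here is a sketch of how slice deletion could be expressed
using only element-wise get/set plus remove-last. MiniRepeatedField and
delete_slice are hypothetical Python stand-ins that mimic the operations named
above; this is not the C++ RepeatedField class, and the real signatures may
differ.

```python
class MiniRepeatedField:
    """Hypothetical stand-in exposing only Get/Set/Add/RemoveLast/size."""

    def __init__(self):
        self._data = []

    def size(self):
        return len(self._data)

    def Get(self, index):
        return self._data[index]

    def Set(self, index, value):
        self._data[index] = value

    def Add(self, value):
        self._data.append(value)

    def RemoveLast(self):
        self._data.pop()


def delete_slice(field, start, stop):
    """Delete elements [start, stop) using only the operations above."""
    n = field.size()
    # Clamp the bounds the way Python slicing does for non-negative indices.
    start = max(0, min(start, n))
    stop = max(start, min(stop, n))
    gap = stop - start
    # Shift the tail left over the deleted range...
    for i in range(stop, n):
        field.Set(i - gap, field.Get(i))
    # ...then drop the now-duplicated elements from the end.
    for _ in range(gap):
        field.RemoveLast()


f = MiniRepeatedField()
for v in [10, 20, 30, 40, 50]:
    f.Add(v)
delete_slice(f, 1, 3)  # same effect as: del values[1:3]
remaining = [f.Get(i) for i in range(f.size())]
```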


>
> The __eq__() method for repeated composites disallows comparing to any
> other sequence type, including ordinary lists.  Why can't we do this?
>
> Adding more methods to the _RepeatedScalarFieldContainer and
> _RepeatedCompositeFieldContainer classes makes it even more painfully
> obvious that they need to be merged.  I'm sure we can make it polymorphic,
> and that'll probably be my next patch.
>
> The docs aren't on the wiki, so I can't add anything about this.  Are there
> plans to move it to the wiki, by any chance?
>
> Thanks,
> Alek Storm




Re: Slicing support in Python

2008-12-02 Thread Alek Storm

You're talking about scalar fields, right?  Because nested message
fields can be serialized individually, just like their parents.  I
think it's a cool idea, and it doesn't look that hard to implement,
but I'm not sure how useful it would be - if you're streaming a
message, you just need (tag, value) pairs, which is what a full
message is anyway, so there's no extra overhead.  However, streaming
messages definitely needs to be implemented.
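
To illustrate the (tag, value) point: on the wire, each field is a key,
(field_number << 3) | wire_type, followed by its value, and a message is just
those records back to back.  The sketch below hand-encodes varint fields
only.  encode_varint, encode_varint_field, decode_varint and read_fields are
hypothetical helpers, not part of the protobuf Python API.

```python
def encode_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number, value):
    """Encode one (tag, value) record with wire type 0 (varint)."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(data):
    """Yield (field_number, value) pairs one at a time; varint fields only."""
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        if key & 0x07 != 0:
            raise ValueError("this sketch only handles wire type 0")
        value, pos = decode_varint(data, pos)
        yield key >> 3, value


# Two independently encoded fields, concatenated, form a valid stream.
stream = encode_varint_field(1, 150) + encode_varint_field(2, 7)
pairs = list(read_fields(stream))
```

Because read_fields is a generator, a consumer can act on each field as soon
as its bytes have been decoded.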

Cheers,
Alek Storm



Re: Slicing support in Python

2008-12-02 Thread Kenton Varda
Hi Shane,
Your message seems unrelated to this thread.  Did you mean to start a new
thread?

On Tue, Dec 2, 2008 at 1:45 PM, Shane Green <[EMAIL PROTECTED]> wrote:

> If I understand correctly, serializing a protocol-buffer message creates
> a byte-string that describes the fields of the serialized message.  The
> field descriptions include their type identifiers, field numbers, and
> current values.
>
> An instance of the type of message that was serialized can then
> configure itself to be equal to the serialized message by parsing the
> byte-string representation.  By telling a message instance to parse its
> serialized representation, one is basically setting the instance's
> "value."
>
> It would seem reasonable that this pattern be maintained down to the
> level of individual fields.  Meaning that message field instances could
> serialize themselves to byte-strings and initialize themselves from
> byte-strings.
>
> I can see multiple values to such a capability.  It would be possible,
> for example, to partially initialize message instances by initializing
> specific fields within the instance, which may be very useful for doing
> things like streaming messages.  Also, protocol-buffer fields could be
> used outside the context of protocol-buffer messages, which may or may
> not be valuable.  Does this capability already exist?
>
>
>
> Thanks,
> Shane
>
> On Tue, 2008-12-02 at 12:21 -0800, Kenton Varda wrote:
> > Cool!  Can you send this to me and Petar (cc'd) via
> > codereview.appspot.com?
> >
> >
> > On Tue, Dec 2, 2008 at 1:22 AM, Alek Storm <[EMAIL PROTECTED]>
> > wrote:
> > > There was already a remove() method for repeated scalars, but
> > > not for composites.  I added one without any difficulty.  Did
> > > I miss something?  In fact, this whole patch was pretty easy.
> > > There must be some reason it hasn't been done before.
> >
> >
> > It could easily be that no one had gotten around to it.  To be
> > perfectly honest, the Python code doesn't get very much attention
> > here.  :/
> >
> > > The docs aren't on the wiki, so I can't add anything about
> > > this.  Are there plans to move it to the wiki, by any chance?
> >
> >
> > Unfortunately that would require a ton of work (to convert to wiki
> > markup) and the resulting docs would not be as nice.  If you'd like to
> > send me a change to the HTML, though, I could put it up.
> >




Re: Slicing support in Python

2008-12-02 Thread Shane Green

I suppose the behaviour could be emulated by defining a message type for
each field type, and then using message-type fields only.  But having
the capability on the fields themselves would be nice, if it doesn't
exist already.

On Tue, 2008-12-02 at 14:45 -0700, Shane Green wrote:
> If I understand correctly, serializing a protocol-buffer message creates
> a byte-string that describes the fields of the serialized message.  The
> field descriptions include their type identifiers, field numbers, and
> current values.
> 
> An instance of the type of message that was serialized can then
> configure itself to be equal to the serialized message by parsing the
> byte-string representation.  By telling a message instance to parse its
> serialized representation, one is basically setting the instance's
> "value."
> 
> It would seem reasonable that this pattern be maintained down to the
> level of individual fields.  Meaning that message field instances could
> serialize themselves to byte-strings and initialize themselves from
> byte-strings.
> 
> I can see multiple values to such a capability.  It would be possible,
> for example, to partially initialize message instances by initializing
> specific fields within the instance, which may be very useful for doing
> things like streaming messages.  Also, protocol-buffer fields could be
> used outside the context of protocol-buffer messages, which may or may
> not be valuable.  Does this capability already exist?
> 
> 
> 
> Thanks, 
> Shane
>
> On Tue, 2008-12-02 at 12:21 -0800, Kenton Varda wrote:
> > Cool!  Can you send this to me and Petar (cc'd) via
> > codereview.appspot.com?
> > 
> > 
> > On Tue, Dec 2, 2008 at 1:22 AM, Alek Storm <[EMAIL PROTECTED]>
> > wrote:
> > > There was already a remove() method for repeated scalars, but
> > > not for composites.  I added one without any difficulty.  Did
> > > I miss something?  In fact, this whole patch was pretty easy.
> > > There must be some reason it hasn't been done before.
> > 
> > 
> > It could easily be that no one had gotten around to it.  To be
> > perfectly honest, the Python code doesn't get very much attention
> > here.  :/
> >  
> > > The docs aren't on the wiki, so I can't add anything about
> > > this.  Are there plans to move it to the wiki, by any chance?
> > 
> > 
> > Unfortunately that would require a ton of work (to convert to wiki
> > markup) and the resulting docs would not be as nice.  If you'd like to
> > send me a change to the HTML, though, I could put it up.
> > 





Re: Slicing support in Python

2008-12-02 Thread Shane Green

If I understand correctly, serializing a protocol-buffer message creates
a byte-string that describes the fields of the serialized message.  The
field descriptions include their type identifiers, field numbers, and
current values.

An instance of the type of message that was serialized can then
configure itself to be equal to the serialized message by parsing the
byte-string representation.  By telling a message instance to parse its
serialized representation, one is basically setting the instance's
"value."

It would seem reasonable that this pattern be maintained down to the
level of individual fields.  Meaning that message field instances could
serialize themselves to byte-strings and initialize themselves from
byte-strings.

I can see multiple values to such a capability.  It would be possible,
for example, to partially initialize message instances by initializing
specific fields within the instance, which may be very useful for doing
things like streaming messages.  Also, protocol-buffer fields could be
used outside the context of protocol-buffer messages, which may or may
not be valuable.  Does this capability already exist?



Thanks, 
Shane

On Tue, 2008-12-02 at 12:21 -0800, Kenton Varda wrote:
> Cool!  Can you send this to me and Petar (cc'd) via
> codereview.appspot.com?
> 
> 
> On Tue, Dec 2, 2008 at 1:22 AM, Alek Storm <[EMAIL PROTECTED]>
> wrote:
> > There was already a remove() method for repeated scalars, but
> > not for composites.  I added one without any difficulty.  Did
> > I miss something?  In fact, this whole patch was pretty easy.
> > There must be some reason it hasn't been done before.
> 
> 
> It could easily be that no one had gotten around to it.  To be
> perfectly honest, the Python code doesn't get very much attention
> here.  :/
>  
> > The docs aren't on the wiki, so I can't add anything about
> > this.  Are there plans to move it to the wiki, by any chance?
> 
> 
> Unfortunately that would require a ton of work (to convert to wiki
> markup) and the resulting docs would not be as nice.  If you'd like to
> send me a change to the HTML, though, I could put it up.





Re: Slicing support in Python

2008-12-02 Thread Kenton Varda
Cool!  Can you send this to me and Petar (cc'd) via codereview.appspot.com?

On Tue, Dec 2, 2008 at 1:22 AM, Alek Storm <[EMAIL PROTECTED]> wrote:

> There was already a remove() method for repeated scalars, but not for
> composites.  I added one without any difficulty.  Did I miss something?  In
> fact, this whole patch was pretty easy.  There must be some reason it hasn't
> been done before.


It could easily be that no one had gotten around to it.  To be perfectly
honest, the Python code doesn't get very much attention here.  :/


> The docs aren't on the wiki, so I can't add anything about this.  Are there
> plans to move it to the wiki, by any chance?


Unfortunately that would require a ton of work (to convert to wiki markup)
and the resulting docs would not be as nice.  If you'd like to send me a
change to the HTML, though, I could put it up.
