On Mon, Dec 8, 2008 at 5:36 PM, Alek Storm <[email protected]> wrote:

> On Mon, Dec 8, 2008 at 1:16 PM, Kenton Varda <[email protected]> wrote:
>
>> On Sat, Dec 6, 2008 at 1:03 AM, Alek Storm <[email protected]> wrote:
>>
>>> Is it really that useful to have ByteSize() cached for repeated fields?
>>> If it's not, we get everything I mentioned above for free.  I'm genuinely
>>> not sure - it only comes up when serializing the message in wire_format.py.
>>> What do you think?
>>>
>>
>> Yes, it's just as necessary as it is with optional fields.  The main
>> problem is that the size of a message must be written before the message
>> contents itself.  If, while serializing, you call ByteSize() to get this
>> size every time you write a message, then you'll end up computing the size
>> of deeply-nested messages many times (once for each outer message within
>> which they're nested).  Caching avoids that problem.
>>
>
> Okay, then we just need to cache the size only during serialization.  The
> children's sizes are calculated and stored, then added to the parent's
> size.  Write the parent size, then write the parent, then the child size,
> then the child, on down the tree.  Then it's O(n) (same as we have
> currently) and no ownership problems, because we can drop the weak reference
> from child to parent.  Would that work?


That may work, but ByteSize() is part of the message's public interface,
so making it slower may not be a good idea.
In any case, the parent reference will still be needed.

Example:
file.proto:

message M1 {
  optional int32 i = 1;
}

message M2 {
  optional M1 m1 = 1;
}

message M3 {
  optional M2 m2 = 1;
}

file.py:
m3 = M3()
m3.m2.m1.i = 3
m3.HasField('m2') # should be True

How does m3 know whether m2 was set? Right now this information is provided
by the setter of 'i' in m1 (by calling TransitionToNonEmpty on the parent,
which calls TransitionToNonEmpty on its parent, and so on).
Unlike the C++ API (where a call to mutable_m2 marks m2 as set), the Python
API gives no easy way to distinguish mutating from non-mutating access.
So in this case:
m3 = M3()
m3.m2.m1.HasField('i') # Should be False
m3.HasField('m2') # Should be False, even though we used m3.m2.*

So the parent references are still needed. Let's keep slice assignment for
repeated scalar fields and remove only slice assignment for repeated
composite fields (I still don't find it useful). That is, we can keep
__getslice__, __delitem__ and __delslice__ for repeated composite fields.


>
> Cheers,
> Alek Storm
>
