On Wed, Dec 3, 2008 at 4:30 AM, Alek Storm <[EMAIL PROTECTED]> wrote:
> (Okay, back on track)
>
> On Tue, Dec 2, 2008 at 11:17 PM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>> On Tue, Dec 2, 2008 at 11:08 PM, Alek Storm <[EMAIL PROTECTED]> wrote:
>>
>>> I would think encoding and decoding would be the main bottlenecks, so
>>> can't those be wrappers around C++, while letting object handling
>>> (reflection.py and friends) be pure Python? It seems like the best of
>>> both worlds.
>>
>> Well, the generated serializing and parsing code in C++ is an order of
>> magnitude faster than the dynamic (reflection-based) code. But to use
>> generated code you need to be using C++ object handling.
>
> Not if you decouple them. Abstractly, the C++ parser receives a serialized
> message and descriptor and returns a tree of the form [(tag_num, value)],
> where tag_num is an integer and value is either a scalar or a subtree (for
> submessages). The Python reflection code takes the tree and fills the
> message object with its values. It's simple, fast, and the C++ parser can
> be easily swapped out for a pure-Python one on systems that don't support
> the C++ version.
>
> Run this backwards when serializing, and you get another advantage: you can
> easily swap out the function that converts the tree into serialized protobuf
> for one that outputs XML, JSON, etc.

It's not that simple. We would also like to improve performance at least in
MergeFrom/CopyFrom/ParseASCII/IsInitialized.

>>> You're right. If it's a waste of time for them, most people won't use
>>> it. But if there's no point to it, why do normal Python lists have it?
>>> It's useful enough to be included there. And since repeated fields act just
>>> like lists, it should be included here too.
>>
>> I think Python object lists are probably used in a much wider variety of
>> ways than protocol buffer repeated fields generally are.
>
> Let's include it - it gives us a more complete list interface, there's no
> downside, and the users can decide whether they want to use it.
> We can't predict all possible use cases.

The thing is, once they start using it, you can't remove it later if it
turns out to be a problem...

>>> In fact, it doesn't even have to be useful for repeated composites. The
>>> fact that repeated scalars have it means it's automatically included for
>>> repeated composites, because they should have the exact same interface.
>>> Polymorphism is what we want here.
>>
>> But they already can't have the same interface because append() doesn't
>> work. :)
>
> We don't have confirmation on that yet ;). Having the same interface is
> what we should be shooting for.

Currently each composite field holds a reference to its parent. This makes it
impossible to safely add the same composite to two different repeated
composite fields, and the .add() method guarantees that this never happens.
Take a look at what would go wrong with a list-style append() that adds the
same message object to both fields:

.proto:

  message M1 {
    optional int32 i = 1;
  }

  message M2 {
    repeated M1 m1 = 1;
  }

  message M3 {
    repeated M1 m1 = 1;
  }

usage:

  m2 = M2()
  m3 = M3()
  m1 = M1()
  m1.i = 1
  m2.m1.append(m1)
  m3.m1.append(m1)
  print(m2.ByteSize())  # Correct
  print(m3.ByteSize())  # Correct
  m1.i = 11111111       # This should mark both m2.ByteSize() and m3.ByteSize() dirty.
  print(m2.ByteSize())  # Incorrect: m1 now references only its newest parent, m3,
                        # so when m1 is updated, it notifies m3 alone.
  print(m3.ByteSize())  # Correct

I think protobuf's repeated composite fields aren't, and shouldn't be,
equivalent to Python lists.

> Thanks,
> Alek Storm
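A minimal pure-Python sketch of the parser half of the decoupling proposal quoted above: it turns serialized bytes into the [(tag_num, value)] tree described there, recursing into submessages. The `descriptor` argument is a hypothetical stand-in (a dict mapping the field numbers of submessage fields to their own nested dict), not the real protobuf Descriptor API. Scalars come back as raw wire values (no sign or zigzag interpretation), and groups are not handled. `tree_to_dict` is an illustrative, hypothetical consumer showing how a different back end could take the same tree.

```python
import struct

# Wire types from the protobuf encoding format.
VARINT, FIXED64, LENGTH_DELIMITED, FIXED32 = 0, 1, 2, 5


def read_varint(buf, pos):
    """Decode a base-128 varint starting at buf[pos]; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # raises IndexError if the buffer is truncated
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_tree(buf, descriptor):
    """Turn serialized bytes into a [(tag_num, value)] tree.

    `descriptor` is a HYPOTHETICAL stand-in for a message descriptor: a dict
    mapping submessage field numbers to the nested dict for that submessage.
    Length-delimited fields not listed in it are returned as raw bytes.
    """
    tree = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        tag_num, wire_type = key >> 3, key & 0x7
        if wire_type == VARINT:
            value, pos = read_varint(buf, pos)
        elif wire_type == FIXED64:
            (value,) = struct.unpack_from("<Q", buf, pos)
            pos += 8
        elif wire_type == LENGTH_DELIMITED:
            length, pos = read_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            chunk = buf[pos:pos + length]
            pos += length
            if tag_num in descriptor:
                value = parse_tree(chunk, descriptor[tag_num])  # subtree
            else:
                value = chunk
        elif wire_type == FIXED32:
            (value,) = struct.unpack_from("<I", buf, pos)
            pos += 4
        else:
            # Groups (wire types 3 and 4) are out of scope for this sketch.
            raise ValueError("unsupported wire type %d" % wire_type)
        tree.append((tag_num, value))
    return tree


def tree_to_dict(tree):
    """Hypothetical alternative back end: nested {tag_num: [values]} dicts."""
    out = {}
    for tag_num, value in tree:
        if isinstance(value, list):
            value = tree_to_dict(value)
        out.setdefault(tag_num, []).append(value)
    return out
```

For example, the bytes b"\x0a\x02\x08\x01\x0a\x03\x08\x96\x01" (an M2 holding two M1 messages with i = 1 and i = 150) parse with the descriptor {1: {}} into [(1, [(1, 1)]), (1, [(1, 150)])]. Note that this covers only the parsing step; it does nothing for MergeFrom/CopyFrom/IsInitialized, which the reply above also wants to speed up.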
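To make the stale-ByteSize() problem with a shared M1 concrete, here is a small self-contained toy model, assuming a design like the one the reply describes: each child message keeps a single parent pointer, and parents cache their size until a child marks them dirty. The classes `Child`, `Parent`, and `RepeatedComposite`, the varint-based size rule, and the list-style `append()` are hypothetical simplifications for illustration, not the actual protobuf Python implementation.

```python
# Toy model (NOT the real protobuf Python classes): each child keeps ONE parent
# pointer, and parents cache their byte size until a child marks them dirty.


def varint_size(n):
    """Number of bytes needed to encode non-negative n as a varint."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


class Child:
    """Stands in for M1: one optional varint field `i` (field number 1)."""

    def __init__(self):
        self._i = None
        self._parent = None  # a single back-reference
        self._cached_size = None

    @property
    def i(self):
        return self._i

    @i.setter
    def i(self, value):
        self._i = value
        self._cached_size = None
        if self._parent is not None:
            self._parent.mark_dirty()  # notifies only the most recent parent

    def byte_size(self):
        if self._cached_size is None:
            # tag byte + varint payload, or nothing if the field is unset
            self._cached_size = 0 if self._i is None else 1 + varint_size(self._i)
        return self._cached_size


class RepeatedComposite:
    def __init__(self, owner):
        self._owner = owner
        self._items = []

    def __iter__(self):
        return iter(self._items)

    def add(self):
        """Create a fresh child that is owned by exactly one container."""
        child = Child()
        child._parent = self._owner
        self._items.append(child)
        self._owner.mark_dirty()
        return child

    def append(self, child):
        """HYPOTHETICAL list-style append: stores the same object and
        overwrites its single parent pointer."""
        child._parent = self._owner
        self._items.append(child)
        self._owner.mark_dirty()


class Parent:
    """Stands in for M2/M3: `repeated M1 m1 = 1;` with a cached byte size."""

    def __init__(self):
        self.m1 = RepeatedComposite(self)
        self._cached_size = None

    def mark_dirty(self):
        self._cached_size = None

    def byte_size(self):
        if self._cached_size is None:
            total = 0
            for child in self.m1:
                n = child.byte_size()
                total += 1 + varint_size(n) + n  # tag + length prefix + payload
            self._cached_size = total
        return self._cached_size


if __name__ == "__main__":
    m2, m3, m1 = Parent(), Parent(), Child()
    m1.i = 1
    m2.m1.append(m1)
    m3.m1.append(m1)
    print(m2.byte_size(), m3.byte_size())  # 4 4
    m1.i = 11111111
    print(m2.byte_size(), m3.byte_size())  # 4 7 (m2's cached size is stale)
```

By contrast, with `c = m2.m1.add(); c.i = 1`, a later `c.i = 11111111` takes `m2.byte_size()` from 4 to 7 as expected, because a child created by `add()` has exactly one owner and its notification can never go to the wrong parent.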