On Wed, Dec 3, 2008 at 12:02 AM, Alek Storm <[EMAIL PROTECTED]> wrote:
> On Tue, Dec 2, 2008 at 11:17 PM, Kenton Varda <[EMAIL PROTECTED]> wrote:
>
>> Well, the generated serializing and parsing code in C++ is an order of
>> magnitude faster than the dynamic (reflection-based) code.  But to use
>> generated code you need to be using C++ object handling.
>>
>
> Not if you decouple them.  Abstractly, the C++ parser receives a serialized
> message and descriptor and returns a tree of the form [(tag_num, value)]
> where tag_num is an integer and value is either a scalar or a subtree (for
> submessages).  The Python reflection code takes the tree and fills the
> message object with its values.  It's simple, fast, and the C++ parser can
> be easily swapped out for a pure-Python one on systems that don't support
> the C++ version.
>

Sorry, I think you misunderstood.  The C++ parsers generated by protoc (with
optimize_for = SPEED) are an order of magnitude faster than the dynamic
*C++* parser (used with optimize_for = CODE_SIZE and DynamicMessage).  The
Python parser is considerably slower than either of them, but that's beside
the point.

Your "decoupled" parser which produces a tag/value tree will be at least as
slow as the existing C++ dynamic parser, probably slower (since it sounds
like it would use some sort of dictionary structure rather than flat
classes/structs).

> Run this backwards when serializing, and you get another advantage: you can
> easily swap out the function that converts the tree into serialized protobuf
> for one that outputs XML, JSON, etc.
>

You can already easily write encoders and decoders for alternative formats
using reflection.

>>> You're right.  If it's a waste of time for them, most people won't use
>>> it.  But if there's no point to it, why do normal Python lists have it?
>>> It's useful enough to be included there.  And since repeated fields act just
>>> like lists, it should be included here too.
>>
>>
>> I think Python object lists are probably used in a much wider variety of
>> ways than protocol buffer repeated fields generally are.
>>
>
> Let's include it - it gives us a more complete list interface, there's no
> downside, and the users can decide whether they want to use it.  We can't
> predict all possible use cases.
>

Ah, yes, the old "Why not?" argument.  :)

Actually, I far prefer the opposite argument:  If you aren't sure if someone
will want a feature, don't include it.  There is always a down side to
including a feature.  Even if people choose not to use it, it increases code
size, maintenance burden, memory usage, and interface complexity.  Worse
yet, if people do use it, then we're permanently stuck with it, whether we
like it or not.  We can't change it later, even if we decide it's wrong.

For example, we may decide later -- based on an actual use case, perhaps --
that it would really have been better if remove() compared elements by
content rather than by identity, so that you could remove a message from a
repeated field by constructing an identical message and then calling
remove().  But we wouldn't be able to change it.  We'd have to instead add a
different method like removeByValue(), which would be ugly and add even more
complexity.

Protocol Buffers got where they are by stubbornly refusing the vast majority
of feature suggestions.  :)

That said, you do have a good point that the interface should be similar to
standard Python lists if possible.  But given the other problems that
prevent this, it seems like a moot point.
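
To make the remove() example concrete, here is a rough Python sketch of the two
behaviours being contrasted.  It does not use the protobuf library at all:
Message and RepeatedField are hypothetical stand-ins written for this
illustration, and remove_by_identity() / remove_by_value() are made-up names,
not part of any real API.

```python
class Message:
    """Hypothetical stand-in for a generated message; equal if fields match."""

    def __init__(self, **fields):
        self.fields = dict(fields)

    def __eq__(self, other):
        return isinstance(other, Message) and self.fields == other.fields


class RepeatedField:
    """Hypothetical stand-in for a repeated message field."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def add(self, **fields):
        # Create a new element in place and hand it back to the caller.
        msg = Message(**fields)
        self._items.append(msg)
        return msg

    def remove_by_identity(self, msg):
        # Removes only the exact object that was passed in.
        for i, item in enumerate(self._items):
            if item is msg:
                del self._items[i]
                return
        raise ValueError("message not in repeated field")

    def remove_by_value(self, msg):
        # Removes the first element whose contents equal msg.
        for i, item in enumerate(self._items):
            if item == msg:
                del self._items[i]
                return
        raise ValueError("message not in repeated field")
```

With identity comparison, a freshly constructed Message(name="a") is never found
in the field, even when an element with the same contents is present; with
content comparison it is.  Once callers rely on one of these behaviours, a
single remove() cannot switch to the other without breaking them.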