On Tue, Feb 1, 2011 at 23:02, fpmc <[email protected]> wrote: > Your proposal has one VisitXXX function for each repeated type. How > does it handle a message with two repeated fields of the same type?
I guess the naming is confusing in the example. The Visit is per field-name; but since the typed is named the same as the field in this example, it is confusing. > > On Feb 2, 2:30 pm, Kenton Varda <[email protected]> wrote: >> On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh <[email protected]> wrote: >> > Conceptually this sounds great, the big question to me is whether this >> > should be implemented as an option in the compiler or as a separate plugin. >> > I haven't taken a thorough look at the patch, but I'd guess it adds a >> > decent >> > amount to the core code generator. I have a preference for the plugin >> > approach, but of course I'm primarily an internal protobuf user, so I'm >> > willing to be convinced otherwise :-) Would using a plugin, possibly even >> > shipped with the standard implementation, make this feature too >> > inconvenient >> > to use? Or is there enough demand for this that it warrants implementing as >> > an option? >> >> First of all, note that this feature is off by default. You have to turn it >> on with the generate_visitors message-level option. The only new code added >> to the base library is a couple templates in WireFormatLite, which are of >> course never instantiated if you don't generate visitor code. >> >> There are a few reasons I prefer to make this part of the base code >> generator: >> >> - If you look at the patch, you'll see that the code generation for the two >> Guide classes actually shares a lot with the code generation for >> MergeFromCodedStream and SerializeWithCachedSizes. To make this a plugin, >> either we'd have to expose parts of the C++ code generator internals >> publicly (eww) or we'd have to reproduce a lot of code (also eww). >> >> - The Reader and Writer classes directly use WireFormatLite, which is a >> private interface. >> >> - It seems clear that this feature is widely desired by open source users. >> We're not talking about a niche use case here. >> >> > Regarding the proposed interfaces: I can imagine some applications where >> > the const refs passed to the visitor methods may be too restrictive - the >> > user may instead want to take ownership of the object. e.g., suppose the >> > stream is a series of requests, and each of the visitor handlers needs to >> > start some asynchronous work. It would be good to hear if users have use >> > cases that don't quite fit into this model (or at least if the existing use >> > cases will work). >> >> Interesting point. In the Reader case, it's creating new objects, so in >> theory it ought to be able to hand off ownership to the Visitor it calls. >> But, the Walker is walking an existing object and thus clearly cannot give >> up ownership. It seems clear that some use cases need const references, >> which means that the only way we could support ownership passing is by >> adding another parallel set of methods. I suppose they could have default >> implementations that delegate to the const reference versions, in which case >> only people who wanted to optimize for them would need to override them. >> But I'd like to see that this is really desired first -- it's easy enough >> to add later. >> >> Also note that my code currently doesn't reuse message objects, but >> improving it to do so would be straightforward. A Reader could allocate one >> object of each sub-message type for reuse. But, it seems like that wouldn't >> play well with ownership-passing. >> >> >> >> >> >> >> >> >> >> > On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda <[email protected]> wrote: >> >> >> Hello open source protobuf users, >> >> >> *Background* >> >> >> Probably the biggest deficiency in the open source protocol buffers >> >> libraries today is a lack of built-in support for handling streams of >> >> messages. True, it's not too hard for users to support it manually, by >> >> prefixing each message with its size as described here: >> >> >>http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre... >> >> >> However, this is awkward, and typically requires users to reach into the >> >> low-level CodedInputStream/CodedOutputStream classes and do a lot of work >> >> manually. Furthermore, many users want to handle streams >> >> of heterogeneous message types. We tell them to wrap their messages in an >> >> outer type using the "union" pattern: >> >> >> http://code.google.com/apis/protocolbuffers/docs/techniques.html#union >> >> >> But this is kind of ugly and has unnecessary overhead. >> >> >> These problems never really came up in our internal usage, because inside >> >> Google we have an RPC system and other utility code which builds on top of >> >> protocol buffers and provides appropriate abstraction. While we'd like to >> >> open source this code, a lot of it is large, somewhat messy, and highly >> >> interdependent with unrelated parts of our environment, and no one has had >> >> the time to rewrite it all cleanly (as we did with protocol buffers >> >> itself). >> >> >> *Proposed solution: Generated Visitors* >> >> >> I've been wanting to fix this for some time now, but didn't really have a >> >> good idea how. CodedInputStream is annoyingly low-level, but I couldn't >> >> think of much better an interface for reading a stream of messages off the >> >> wire. >> >> >> A couple weeks ago, though, I realized that I had been failing to consider >> >> how new kinds of code generation could help this problem. I was trying to >> >> think of solutions that would go into the protobuf base library, not >> >> solutions that were generated by the protocol compiler. >> >> >> So then it became pretty clear: A protobuf message definition can also be >> >> interpreted as a definition for a streaming protocol. Each field in the >> >> message is a kind of item in the stream. >> >> >> // A stream of Foo and Bar messages, and also strings. >> >> message MyStream { >> >> option generate_visitors = true; // enables generation of streaming >> >> classes >> >> repeated Foo foo = 1; >> >> repeated Bar bar = 2; >> >> repeated string baz = 3; >> >> } >> >> >> All we need to do is generate code appropriate for treating MyStream as a >> >> stream, rather than one big message. >> >> >> My approach is to generate two interfaces, each with two provided >> >> implementations. The interfaces are "Visitor" and "Guide". >> >> MyStream::Visitor looks like this: >> >> >> class MyStream::Visitor { >> >> public: >> >> virtual ~Visitor(); >> >> >> virtual void VisitFoo(const Foo& foo); >> >> virtual void VisitBar(const Bar& bar); >> >> virtual void VisitBaz(const std::string& baz); >> >> }; >> >> >> The Visitor class has two standard implementations: "Writer" and >> >> "Filler". MyStream::Writer writes the visited fields to a >> >> CodedOutputStream, using the same wire format as would be used to encode >> >> MyStream as one big message. MyStream::Filler fills in a MyStream message >> >> object with the visited values. >> >> >> Meanwhile, Guides are objects that drive Visitors. >> >> >> class MyStream::Guide { >> >> public: >> >> virtual ~Guide(); >> >> >> // Call the methods of the visitor on the Guide's data. >> >> virtual void Accept(MyStream::Visitor* visitor) = 0; >> >> >> // Just fill in a message object directly rather than use a visitor. >> >> virtual void Fill(MyStream* message) = 0; >> >> }; >> >> >> The two standard implementations of Guide are "Reader" and "Walker". >> >> MyStream::Reader reads items from a CodedInputStream and passes them to >> >> the >> >> visitor. MyStream::Walker walks over a MyStream message object and passes >> >> all the fields to the visitor. >> >> >> To handle a stream of messages, simply attach a Reader to your own Visitor >> >> implementation. Your visitor's methods will then be called as each item >> >> is >> >> parsed, kind of like "SAX" XML parsing, but type-safe. >> >> >> *Nonblocking I/O* >> >> >> The "Reader" type declared above is based on blocking I/O, but many users >> >> would prefer a non-blocking approach. I'm less sure how to handle this, >> >> but >> >> my thought was that we could provide a utility class like: >> >> >> class NonblockingHelper { >> >> public: >> >> template <typename MessageType> >> >> NonblockingHelper(typename MessageType::Visitor* visitor); >> >> >> // Push data into the buffer. If the data completes any fields, >> >> // they will be passed to the underlying visitor. Any left-over data >> >> // is remembered for the next call. >> >> void PushData(void* data, int size); >> >> }; >> >> >> With this, you can use whatever non-blocking I/O mechanism you want, and >> >> just have to push the data into the NonblockingHelper, which will take >> >> care >> >> of calling the Visitor as necessary. >> >> >> *C++ implementation* >> >> >> I've written up a patch implementing this for C++ (not yet including the >> >> nonblocking part): >> >> >> http://codereview.appspot.com/4077052 >> >> >> *Feedback* >> >> >> What do you think? >> >> >> I know I'm excited to use this in some of my own side projects (which is >> >> why I spent my weekend working on it), but before adding this to the >> >> official implementation we should make sure it is broadly useful. > > -- > You received this message because you are subscribed to the Google Groups > "Protocol Buffers" group. > To post to this group, send email to [email protected]. > To unsubscribe from this group, send email to > [email protected]. > For more options, visit this group at > http://groups.google.com/group/protobuf?hl=en. > > -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
