Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

Henner Zeller Wed, 02 Feb 2011 10:13:29 -0800

On Tue, Feb 1, 2011 at 23:02, fpmc <[email protected]> wrote:
> Your proposal has one VisitXXX function for each repeated type.  How
> does it handle a message with two repeated fields of the same type?


I guess the naming is confusing in the example. The Visit is per
field-name; but since the typed is named the same as the field in this
example, it is confusing.

>
> On Feb 2, 2:30 pm, Kenton Varda <[email protected]> wrote:
>> On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh <[email protected]> wrote:
>> > Conceptually this sounds great, the big question to me is whether this
>> > should be implemented as an option in the compiler or as a separate plugin.
>> > I haven't taken a thorough look at the patch, but I'd guess it adds a 
>> > decent
>> > amount to the core code generator. I have a preference for the plugin
>> > approach, but of course I'm primarily an internal protobuf user, so I'm
>> > willing to be convinced otherwise :-) Would using a plugin, possibly even
>> > shipped with the standard implementation, make this feature too 
>> > inconvenient
>> > to use? Or is there enough demand for this that it warrants implementing as
>> > an option?
>>
>> First of all, note that this feature is off by default.  You have to turn it
>> on with the generate_visitors message-level option.  The only new code added
>> to the base library is a couple templates in WireFormatLite, which are of
>> course never instantiated if you don't generate visitor code.
>>
>> There are a few reasons I prefer to make this part of the base code
>> generator:
>>
>> - If you look at the patch, you'll see that the code generation for the two
>> Guide classes actually shares a lot with the code generation for
>> MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
>> either we'd have to expose parts of the C++ code generator internals
>> publicly (eww) or we'd have to reproduce a lot of code (also eww).
>>
>> - The Reader and Writer classes directly use WireFormatLite, which is a
>> private interface.
>>
>> - It seems clear that this feature is widely desired by open source users.
>>  We're not talking about a niche use case here.
>>
>> > Regarding the proposed interfaces: I can imagine some applications where
>> > the const refs passed to the visitor methods may be too restrictive - the
>> > user may instead want to take ownership of the object. e.g., suppose the
>> > stream is a series of requests, and each of the visitor handlers needs to
>> > start some asynchronous work. It would be good to hear if users have use
>> > cases that don't quite fit into this model (or at least if the existing use
>> > cases will work).
>>
>> Interesting point.  In the Reader case, it's creating new objects, so in
>> theory it ought to be able to hand off ownership to the Visitor it calls.
>>  But, the Walker is walking an existing object and thus clearly cannot give
>> up ownership.  It seems clear that some use cases need const references,
>> which means that the only way we could support ownership passing is by
>> adding another parallel set of methods.  I suppose they could have default
>> implementations that delegate to the const reference versions, in which case
>> only people who wanted to optimize for them would need to override them.
>>  But I'd like to see that this is really desired first -- it's easy enough
>> to add later.
>>
>> Also note that my code currently doesn't reuse message objects, but
>> improving it to do so would be straightforward.  A Reader could allocate one
>> object of each sub-message type for reuse.  But, it seems like that wouldn't
>> play well with ownership-passing.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> > On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda <[email protected]> wrote:
>>
>> >> Hello open source protobuf users,
>>
>> >> *Background*
>>
>> >> Probably the biggest deficiency in the open source protocol buffers
>> >> libraries today is a lack of built-in support for handling streams of
>> >> messages.  True, it's not too hard for users to support it manually, by
>> >> prefixing each message with its size as described here:
>>
>> >>http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...
>>
>> >> However, this is awkward, and typically requires users to reach into the
>> >> low-level CodedInputStream/CodedOutputStream classes and do a lot of work
>> >> manually.  Furthermore, many users want to handle streams
>> >> of heterogeneous message types.  We tell them to wrap their messages in an
>> >> outer type using the "union" pattern:
>>
>> >>  http://code.google.com/apis/protocolbuffers/docs/techniques.html#union
>>
>> >> But this is kind of ugly and has unnecessary overhead.
>>
>> >> These problems never really came up in our internal usage, because inside
>> >> Google we have an RPC system and other utility code which builds on top of
>> >> protocol buffers and provides appropriate abstraction. While we'd like to
>> >> open source this code, a lot of it is large, somewhat messy, and highly
>> >> interdependent with unrelated parts of our environment, and no one has had
>> >> the time to rewrite it all cleanly (as we did with protocol buffers 
>> >> itself).
>>
>> >> *Proposed solution:  Generated Visitors*
>>
>> >> I've been wanting to fix this for some time now, but didn't really have a
>> >> good idea how.  CodedInputStream is annoyingly low-level, but I couldn't
>> >> think of much better an interface for reading a stream of messages off the
>> >> wire.
>>
>> >> A couple weeks ago, though, I realized that I had been failing to consider
>> >> how new kinds of code generation could help this problem.  I was trying to
>> >> think of solutions that would go into the protobuf base library, not
>> >> solutions that were generated by the protocol compiler.
>>
>> >> So then it became pretty clear:  A protobuf message definition can also be
>> >> interpreted as a definition for a streaming protocol.  Each field in the
>> >> message is a kind of item in the stream.
>>
>> >>   // A stream of Foo and Bar messages, and also strings.
>> >>   message MyStream {
>> >>     option generate_visitors = true;  // enables generation of streaming
>> >> classes
>> >>     repeated Foo foo = 1;
>> >>     repeated Bar bar = 2;
>> >>     repeated string baz = 3;
>> >>   }
>>
>> >> All we need to do is generate code appropriate for treating MyStream as a
>> >> stream, rather than one big message.
>>
>> >> My approach is to generate two interfaces, each with two provided
>> >> implementations.  The interfaces are "Visitor" and "Guide".
>> >>  MyStream::Visitor looks like this:
>>
>> >>   class MyStream::Visitor {
>> >>    public:
>> >>     virtual ~Visitor();
>>
>> >>     virtual void VisitFoo(const Foo& foo);
>> >>     virtual void VisitBar(const Bar& bar);
>> >>     virtual void VisitBaz(const std::string& baz);
>> >>   };
>>
>> >> The Visitor class has two standard implementations:  "Writer" and
>> >> "Filler".  MyStream::Writer writes the visited fields to a
>> >> CodedOutputStream, using the same wire format as would be used to encode
>> >> MyStream as one big message.  MyStream::Filler fills in a MyStream message
>> >> object with the visited values.
>>
>> >> Meanwhile, Guides are objects that drive Visitors.
>>
>> >>   class MyStream::Guide {
>> >>    public:
>> >>     virtual ~Guide();
>>
>> >>     // Call the methods of the visitor on the Guide's data.
>> >>     virtual void Accept(MyStream::Visitor* visitor) = 0;
>>
>> >>     // Just fill in a message object directly rather than use a visitor.
>> >>     virtual void Fill(MyStream* message) = 0;
>> >>   };
>>
>> >> The two standard implementations of Guide are "Reader" and "Walker".
>> >>  MyStream::Reader reads items from a CodedInputStream and passes them to 
>> >> the
>> >> visitor.  MyStream::Walker walks over a MyStream message object and passes
>> >> all the fields to the visitor.
>>
>> >> To handle a stream of messages, simply attach a Reader to your own Visitor
>> >> implementation.  Your visitor's methods will then be called as each item 
>> >> is
>> >> parsed, kind of like "SAX" XML parsing, but type-safe.
>>
>> >> *Nonblocking I/O*
>>
>> >> The "Reader" type declared above is based on blocking I/O, but many users
>> >> would prefer a non-blocking approach.  I'm less sure how to handle this, 
>> >> but
>> >> my thought was that we could provide a utility class like:
>>
>> >>   class NonblockingHelper {
>> >>    public:
>> >>     template <typename MessageType>
>> >>     NonblockingHelper(typename MessageType::Visitor* visitor);
>>
>> >>     // Push data into the buffer.  If the data completes any fields,
>> >>     // they will be passed to the underlying visitor.  Any left-over data
>> >>     // is remembered for the next call.
>> >>     void PushData(void* data, int size);
>> >>   };
>>
>> >> With this, you can use whatever non-blocking I/O mechanism you want, and
>> >> just have to push the data into the NonblockingHelper, which will take 
>> >> care
>> >> of calling the Visitor as necessary.
>>
>> >> *C++ implementation*
>>
>> >> I've written up a patch implementing this for C++ (not yet including the
>> >> nonblocking part):
>>
>> >>  http://codereview.appspot.com/4077052
>>
>> >> *Feedback*
>>
>> >> What do you think?
>>
>> >> I know I'm excited to use this in some of my own side projects (which is
>> >> why I spent my weekend working on it), but before adding this to the
>> >> official implementation we should make sure it is broadly useful.
>
> --
> You received this message because you are subscribed to the Google Groups 
> "Protocol Buffers" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to 
> [email protected].
> For more options, visit this group at 
> http://groups.google.com/group/protobuf?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.

Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

Reply via email to