Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-04-03 Thread Kenton Varda
On Sat, Apr 2, 2011 at 3:53 PM, Jeffrey Damick jeffreydam...@gmail.comwrote:

 This may be a naive question, but wouldn't the format in text_format
 be a prime example another protocol? It seems that if you are able
 to reuse the vistor generate the text format, then it would be easily
 extendable by others for json or the latest encoding of the week..  I
 look forward to seeing it pushed into the tree.


TextFormat is already implemented purely in terms of public interfaces --
namely, the reflection interface.  Thus it is already possible to write,
say, a JSON encoder/decoder for protobufs, and indeed several people have
done this.

The current visitor proposal (which I haven't had time to work on in awhile,
but will get back to eventually...) does not provide any new way to
implement TextFormat, because all visitor classes are type-specific.  In
other words, to implement TextFormat via visitors you would need to write an
implementation for every single type, rather than one implementation that
covers all types.  This could perhaps be solved by inventing some sort of
generic visitor adapter, but I haven't done any such thing in my patch,
since reflection already solves most of the same problems.



 thanks
 -jeff

 On Feb 8, 2:34 pm, Kenton Varda ken...@google.com wrote:
  On Tue, Feb 8, 2011 at 5:47 AM, Evan Jones ev...@mit.edu wrote:
 
The Visitor class has two standard implementations:  Writer and
   Filler.  MyStream::Writer writes the visited fields to a
   CodedOutputStream, using the same wire format as would be used to
 encode
   MyStream as one big message.
 
   Imagine I wanted a different protocol. Eg. I want something that
 checksums
   each message, or maybe compresses them, etc. Will I need to subclass
   MessageType::Visitor for each stream that I want to encode? Or will I
 need
   to change the code generator?
 
  To do these things generically, we'd need to introduce some sort of
  equivalent of Reflection for streams.  This certainly seems like it could
 be
  a useful addition to the family, but I wanted to get the basic
 functionality
  out there first and then see if this is needed.
 
  Note that I expect people will generally only stream their top-level
  message.  Although the proposal allows for streaming sub-messages as
 well, I
  expect that people will normally want to parse them into message objects
  which are handled whole.  So, you only have to manually implement the
  top-level stream, and then you can invoke some reflective algorithm from
  there.
 

 --
 You received this message because you are subscribed to the Google Groups
 Protocol Buffers group.
 To post to this group, send email to protobuf@googlegroups.com.
 To unsubscribe from this group, send email to
 protobuf+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/protobuf?hl=en.



-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-04-03 Thread Jeffrey Damick
It just seems like a lot machinery has to be repeated across
encoders/decoders to walk the messages  fields vs. a more event driven
style like your vistor writer/filler which would abstract some of that, but
it comes down to a matter of taste i suppose.  I'm definitely in favor the
generic vistor adapter..

thanks,
-jeff



On Sun, Apr 3, 2011 at 3:28 AM, Kenton Varda ken...@google.com wrote:

 On Sat, Apr 2, 2011 at 3:53 PM, Jeffrey Damick jeffreydam...@gmail.comwrote:

 This may be a naive question, but wouldn't the format in text_format
 be a prime example another protocol? It seems that if you are able
 to reuse the vistor generate the text format, then it would be easily
 extendable by others for json or the latest encoding of the week..  I
 look forward to seeing it pushed into the tree.


 TextFormat is already implemented purely in terms of public interfaces --
 namely, the reflection interface.  Thus it is already possible to write,
 say, a JSON encoder/decoder for protobufs, and indeed several people have
 done this.

 The current visitor proposal (which I haven't had time to work on in
 awhile, but will get back to eventually...) does not provide any new way to
 implement TextFormat, because all visitor classes are type-specific.  In
 other words, to implement TextFormat via visitors you would need to write an
 implementation for every single type, rather than one implementation that
 covers all types.  This could perhaps be solved by inventing some sort of
 generic visitor adapter, but I haven't done any such thing in my patch,
 since reflection already solves most of the same problems.



 thanks
 -jeff

 On Feb 8, 2:34 pm, Kenton Varda ken...@google.com wrote:
  On Tue, Feb 8, 2011 at 5:47 AM, Evan Jones ev...@mit.edu wrote:
 
The Visitor class has two standard implementations:  Writer and
   Filler.  MyStream::Writer writes the visited fields to a
   CodedOutputStream, using the same wire format as would be used to
 encode
   MyStream as one big message.
 
   Imagine I wanted a different protocol. Eg. I want something that
 checksums
   each message, or maybe compresses them, etc. Will I need to subclass
   MessageType::Visitor for each stream that I want to encode? Or will I
 need
   to change the code generator?
 
  To do these things generically, we'd need to introduce some sort of
  equivalent of Reflection for streams.  This certainly seems like it
 could be
  a useful addition to the family, but I wanted to get the basic
 functionality
  out there first and then see if this is needed.
 
  Note that I expect people will generally only stream their top-level
  message.  Although the proposal allows for streaming sub-messages as
 well, I
  expect that people will normally want to parse them into message objects
  which are handled whole.  So, you only have to manually implement the
  top-level stream, and then you can invoke some reflective algorithm from
  there.
 

 --
 You received this message because you are subscribed to the Google Groups
 Protocol Buffers group.
 To post to this group, send email to protobuf@googlegroups.com.
 To unsubscribe from this group, send email to
 protobuf+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/protobuf?hl=en.




-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-04-02 Thread Jeffrey Damick
This may be a naive question, but wouldn't the format in text_format
be a prime example another protocol? It seems that if you are able
to reuse the vistor generate the text format, then it would be easily
extendable by others for json or the latest encoding of the week..  I
look forward to seeing it pushed into the tree.

thanks
-jeff

On Feb 8, 2:34 pm, Kenton Varda ken...@google.com wrote:
 On Tue, Feb 8, 2011 at 5:47 AM, Evan Jones ev...@mit.edu wrote:

   The Visitor class has two standard implementations:  Writer and
  Filler.  MyStream::Writer writes the visited fields to a
  CodedOutputStream, using the same wire format as would be used to encode
  MyStream as one big message.

  Imagine I wanted a different protocol. Eg. I want something that checksums
  each message, or maybe compresses them, etc. Will I need to subclass
  MessageType::Visitor for each stream that I want to encode? Or will I need
  to change the code generator?

 To do these things generically, we'd need to introduce some sort of
 equivalent of Reflection for streams.  This certainly seems like it could be
 a useful addition to the family, but I wanted to get the basic functionality
 out there first and then see if this is needed.

 Note that I expect people will generally only stream their top-level
 message.  Although the proposal allows for streaming sub-messages as well, I
 expect that people will normally want to parse them into message objects
 which are handled whole.  So, you only have to manually implement the
 top-level stream, and then you can invoke some reflective algorithm from
 there.


-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-07 Thread Jason Hsueh
I'm starting to look at the patch (meant to start end of last week but got
caught up in other stuff)

On Tue, Feb 1, 2011 at 9:30 PM, Kenton Varda ken...@google.com wrote:

 On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh jas...@google.com wrote:

 Conceptually this sounds great, the big question to me is whether this
 should be implemented as an option in the compiler or as a separate plugin.
 I haven't taken a thorough look at the patch, but I'd guess it adds a decent
 amount to the core code generator. I have a preference for the plugin
 approach, but of course I'm primarily an internal protobuf user, so I'm
 willing to be convinced otherwise :-) Would using a plugin, possibly even
 shipped with the standard implementation, make this feature too inconvenient
 to use? Or is there enough demand for this that it warrants implementing as
 an option?


 First of all, note that this feature is off by default.  You have to turn
 it on with the generate_visitors message-level option.  The only new code
 added to the base library is a couple templates in WireFormatLite, which are
 of course never instantiated if you don't generate visitor code.

 There are a few reasons I prefer to make this part of the base code
 generator:

 - If you look at the patch, you'll see that the code generation for the two
 Guide classes actually shares a lot with the code generation for
 MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
 either we'd have to expose parts of the C++ code generator internals
 publicly (eww) or we'd have to reproduce a lot of code (also eww).

 - The Reader and Writer classes directly use WireFormatLite, which is a
 private interface.


 - It seems clear that this feature is widely desired by open source users.
  We're not talking about a niche use case here.


 Regarding the proposed interfaces: I can imagine some applications where
 the const refs passed to the visitor methods may be too restrictive - the
 user may instead want to take ownership of the object. e.g., suppose the
 stream is a series of requests, and each of the visitor handlers needs to
 start some asynchronous work. It would be good to hear if users have use
 cases that don't quite fit into this model (or at least if the existing use
 cases will work).


 Interesting point.  In the Reader case, it's creating new objects, so in
 theory it ought to be able to hand off ownership to the Visitor it calls.
  But, the Walker is walking an existing object and thus clearly cannot give
 up ownership.  It seems clear that some use cases need const references,
 which means that the only way we could support ownership passing is by
 adding another parallel set of methods.  I suppose they could have default
 implementations that delegate to the const reference versions, in which case
 only people who wanted to optimize for them would need to override them.
  But I'd like to see that this is really desired first -- it's easy enough
 to add later.


Yeah, there's definitely a need for the const ref versions. It sounds like
nobody is clamoring for mutable access/ownership-passing so let's proceed as
is.


 Also note that my code currently doesn't reuse message objects, but
 improving it to do so would be straightforward.  A Reader could allocate one
 object of each sub-message type for reuse.  But, it seems like that wouldn't
 play well with ownership-passing.


 Perhaps instead of ownership-passing the methods could provide mutable
access so people could Swap() etc. It would defeat the optimization, but at
least be less messy. Anyway, all of this can be revisited later should the
need arise.








 On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda ken...@google.com wrote:

 Hello open source protobuf users,

 *Background*

 Probably the biggest deficiency in the open source protocol buffers
 libraries today is a lack of built-in support for handling streams of
 messages.  True, it's not too hard for users to support it manually, by
 prefixing each message with its size as described here:


 http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming

 However, this is awkward, and typically requires users to reach into the
 low-level CodedInputStream/CodedOutputStream classes and do a lot of work
 manually.  Furthermore, many users want to handle streams
 of heterogeneous message types.  We tell them to wrap their messages in an
 outer type using the union pattern:

   http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

 But this is kind of ugly and has unnecessary overhead.

 These problems never really came up in our internal usage, because inside
 Google we have an RPC system and other utility code which builds on top of
 protocol buffers and provides appropriate abstraction. While we'd like to
 open source this code, a lot of it is large, somewhat messy, and highly
 interdependent with unrelated parts of our environment, and no one has had
 the time to rewrite it all cleanly (as we did with 

Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-05 Thread Frank Chu
Can the naming be

visit_bar()
visit_baz()

then?  It's good to have some consistency.

Frank

On Sat, Feb 5, 2011 at 08:06, Kenton Varda ken...@google.com wrote:

 On Wed, Feb 2, 2011 at 10:13 AM, Henner Zeller 
 henner.zel...@googlemail.com wrote:

 I guess the naming is confusing in the example. The Visit is per
 field-name; but since the typed is named the same as the field in this
 example, it is confusing.


 Yes, sorry.  Better example:

   message MyStream {
 option generate_visitors = true;
 repeated Foo bar = 1;
 repeated Foo baz = 2;
   }

 creates:

   class MyStream::Visitor {
public:
 virtual ~Visitor();

 virtual void VisitBar(const Foo value);
 virtual void VisitBaz(const Foo value);
   };



-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-05 Thread Kenton Varda
Unfortunately, the Google C++ Style Guide prescribes inconsistency.  Only
simple inline methods can use lowercase-with-underscores naming; everything
else is supposed to use capitalized camelcase.

On Sat, Feb 5, 2011 at 8:40 AM, Frank Chu f...@google.com wrote:

 Can the naming be

 visit_bar()
 visit_baz()

 then?  It's good to have some consistency.

 Frank


 On Sat, Feb 5, 2011 at 08:06, Kenton Varda ken...@google.com wrote:

 On Wed, Feb 2, 2011 at 10:13 AM, Henner Zeller 
 henner.zel...@googlemail.com wrote:

 I guess the naming is confusing in the example. The Visit is per
 field-name; but since the typed is named the same as the field in this
 example, it is confusing.


 Yes, sorry.  Better example:

   message MyStream {
 option generate_visitors = true;
 repeated Foo bar = 1;
 repeated Foo baz = 2;
   }

 creates:

   class MyStream::Visitor {
public:
 virtual ~Visitor();

 virtual void VisitBar(const Foo value);
 virtual void VisitBaz(const Foo value);
   };




-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-04 Thread Kenton Varda
On Wed, Feb 2, 2011 at 10:13 AM, Henner Zeller henner.zel...@googlemail.com
 wrote:

 I guess the naming is confusing in the example. The Visit is per
 field-name; but since the typed is named the same as the field in this
 example, it is confusing.


Yes, sorry.  Better example:

  message MyStream {
option generate_visitors = true;
repeated Foo bar = 1;
repeated Foo baz = 2;
  }

creates:

  class MyStream::Visitor {
   public:
virtual ~Visitor();

virtual void VisitBar(const Foo value);
virtual void VisitBaz(const Foo value);
  };

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-02 Thread fpmc
Your proposal has one VisitXXX function for each repeated type.  How
does it handle a message with two repeated fields of the same type?

On Feb 2, 2:30 pm, Kenton Varda ken...@google.com wrote:
 On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh jas...@google.com wrote:
  Conceptually this sounds great, the big question to me is whether this
  should be implemented as an option in the compiler or as a separate plugin.
  I haven't taken a thorough look at the patch, but I'd guess it adds a decent
  amount to the core code generator. I have a preference for the plugin
  approach, but of course I'm primarily an internal protobuf user, so I'm
  willing to be convinced otherwise :-) Would using a plugin, possibly even
  shipped with the standard implementation, make this feature too inconvenient
  to use? Or is there enough demand for this that it warrants implementing as
  an option?

 First of all, note that this feature is off by default.  You have to turn it
 on with the generate_visitors message-level option.  The only new code added
 to the base library is a couple templates in WireFormatLite, which are of
 course never instantiated if you don't generate visitor code.

 There are a few reasons I prefer to make this part of the base code
 generator:

 - If you look at the patch, you'll see that the code generation for the two
 Guide classes actually shares a lot with the code generation for
 MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
 either we'd have to expose parts of the C++ code generator internals
 publicly (eww) or we'd have to reproduce a lot of code (also eww).

 - The Reader and Writer classes directly use WireFormatLite, which is a
 private interface.

 - It seems clear that this feature is widely desired by open source users.
  We're not talking about a niche use case here.

  Regarding the proposed interfaces: I can imagine some applications where
  the const refs passed to the visitor methods may be too restrictive - the
  user may instead want to take ownership of the object. e.g., suppose the
  stream is a series of requests, and each of the visitor handlers needs to
  start some asynchronous work. It would be good to hear if users have use
  cases that don't quite fit into this model (or at least if the existing use
  cases will work).

 Interesting point.  In the Reader case, it's creating new objects, so in
 theory it ought to be able to hand off ownership to the Visitor it calls.
  But, the Walker is walking an existing object and thus clearly cannot give
 up ownership.  It seems clear that some use cases need const references,
 which means that the only way we could support ownership passing is by
 adding another parallel set of methods.  I suppose they could have default
 implementations that delegate to the const reference versions, in which case
 only people who wanted to optimize for them would need to override them.
  But I'd like to see that this is really desired first -- it's easy enough
 to add later.

 Also note that my code currently doesn't reuse message objects, but
 improving it to do so would be straightforward.  A Reader could allocate one
 object of each sub-message type for reuse.  But, it seems like that wouldn't
 play well with ownership-passing.









  On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda ken...@google.com wrote:

  Hello open source protobuf users,

  *Background*

  Probably the biggest deficiency in the open source protocol buffers
  libraries today is a lack of built-in support for handling streams of
  messages.  True, it's not too hard for users to support it manually, by
  prefixing each message with its size as described here:

 http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...

  However, this is awkward, and typically requires users to reach into the
  low-level CodedInputStream/CodedOutputStream classes and do a lot of work
  manually.  Furthermore, many users want to handle streams
  of heterogeneous message types.  We tell them to wrap their messages in an
  outer type using the union pattern:

   http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

  But this is kind of ugly and has unnecessary overhead.

  These problems never really came up in our internal usage, because inside
  Google we have an RPC system and other utility code which builds on top of
  protocol buffers and provides appropriate abstraction. While we'd like to
  open source this code, a lot of it is large, somewhat messy, and highly
  interdependent with unrelated parts of our environment, and no one has had
  the time to rewrite it all cleanly (as we did with protocol buffers 
  itself).

  *Proposed solution:  Generated Visitors*

  I've been wanting to fix this for some time now, but didn't really have a
  good idea how.  CodedInputStream is annoyingly low-level, but I couldn't
  think of much better an interface for reading a stream of messages off the
  wire.

  A couple weeks ago, though, I 

Re: [protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-02 Thread Henner Zeller
On Tue, Feb 1, 2011 at 23:02, fpmc f...@google.com wrote:
 Your proposal has one VisitXXX function for each repeated type.  How
 does it handle a message with two repeated fields of the same type?

I guess the naming is confusing in the example. The Visit is per
field-name; but since the typed is named the same as the field in this
example, it is confusing.


 On Feb 2, 2:30 pm, Kenton Varda ken...@google.com wrote:
 On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh jas...@google.com wrote:
  Conceptually this sounds great, the big question to me is whether this
  should be implemented as an option in the compiler or as a separate plugin.
  I haven't taken a thorough look at the patch, but I'd guess it adds a 
  decent
  amount to the core code generator. I have a preference for the plugin
  approach, but of course I'm primarily an internal protobuf user, so I'm
  willing to be convinced otherwise :-) Would using a plugin, possibly even
  shipped with the standard implementation, make this feature too 
  inconvenient
  to use? Or is there enough demand for this that it warrants implementing as
  an option?

 First of all, note that this feature is off by default.  You have to turn it
 on with the generate_visitors message-level option.  The only new code added
 to the base library is a couple templates in WireFormatLite, which are of
 course never instantiated if you don't generate visitor code.

 There are a few reasons I prefer to make this part of the base code
 generator:

 - If you look at the patch, you'll see that the code generation for the two
 Guide classes actually shares a lot with the code generation for
 MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
 either we'd have to expose parts of the C++ code generator internals
 publicly (eww) or we'd have to reproduce a lot of code (also eww).

 - The Reader and Writer classes directly use WireFormatLite, which is a
 private interface.

 - It seems clear that this feature is widely desired by open source users.
  We're not talking about a niche use case here.

  Regarding the proposed interfaces: I can imagine some applications where
  the const refs passed to the visitor methods may be too restrictive - the
  user may instead want to take ownership of the object. e.g., suppose the
  stream is a series of requests, and each of the visitor handlers needs to
  start some asynchronous work. It would be good to hear if users have use
  cases that don't quite fit into this model (or at least if the existing use
  cases will work).

 Interesting point.  In the Reader case, it's creating new objects, so in
 theory it ought to be able to hand off ownership to the Visitor it calls.
  But, the Walker is walking an existing object and thus clearly cannot give
 up ownership.  It seems clear that some use cases need const references,
 which means that the only way we could support ownership passing is by
 adding another parallel set of methods.  I suppose they could have default
 implementations that delegate to the const reference versions, in which case
 only people who wanted to optimize for them would need to override them.
  But I'd like to see that this is really desired first -- it's easy enough
 to add later.

 Also note that my code currently doesn't reuse message objects, but
 improving it to do so would be straightforward.  A Reader could allocate one
 object of each sub-message type for reuse.  But, it seems like that wouldn't
 play well with ownership-passing.









  On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda ken...@google.com wrote:

  Hello open source protobuf users,

  *Background*

  Probably the biggest deficiency in the open source protocol buffers
  libraries today is a lack of built-in support for handling streams of
  messages.  True, it's not too hard for users to support it manually, by
  prefixing each message with its size as described here:

 http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...

  However, this is awkward, and typically requires users to reach into the
  low-level CodedInputStream/CodedOutputStream classes and do a lot of work
  manually.  Furthermore, many users want to handle streams
  of heterogeneous message types.  We tell them to wrap their messages in an
  outer type using the union pattern:

   http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

  But this is kind of ugly and has unnecessary overhead.

  These problems never really came up in our internal usage, because inside
  Google we have an RPC system and other utility code which builds on top of
  protocol buffers and provides appropriate abstraction. While we'd like to
  open source this code, a lot of it is large, somewhat messy, and highly
  interdependent with unrelated parts of our environment, and no one has had
  the time to rewrite it all cleanly (as we did with protocol buffers 
  itself).

  *Proposed solution:  Generated Visitors*

  I've been wanting to fix this for some 

[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-02 Thread Thingfish
Just want to add my vote for this feature to be added to the base
compiler. I've implemented similar multiplexing patterns over and over
again, and would love for the compiler to free me from writing and
maintaining that code.

On 2 Feb., 19:13, Henner Zeller henner.zel...@googlemail.com wrote:
 On Tue, Feb 1, 2011 at 23:02, fpmc f...@google.com wrote:
  Your proposal has one VisitXXX function for each repeated type.  How
  does it handle a message with two repeated fields of the same type?

 I guess the naming is confusing in the example. The Visit is per
 field-name; but since the typed is named the same as the field in this
 example, it is confusing.









  On Feb 2, 2:30 pm, Kenton Varda ken...@google.com wrote:
  On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh jas...@google.com wrote:
   Conceptually this sounds great, the big question to me is whether this
   should be implemented as an option in the compiler or as a separate 
   plugin.
   I haven't taken a thorough look at the patch, but I'd guess it adds a 
   decent
   amount to the core code generator. I have a preference for the plugin
   approach, but of course I'm primarily an internal protobuf user, so I'm
   willing to be convinced otherwise :-) Would using a plugin, possibly even
   shipped with the standard implementation, make this feature too 
   inconvenient
   to use? Or is there enough demand for this that it warrants implementing 
   as
   an option?

  First of all, note that this feature is off by default.  You have to turn 
  it
  on with the generate_visitors message-level option.  The only new code 
  added
  to the base library is a couple templates in WireFormatLite, which are of
  course never instantiated if you don't generate visitor code.

  There are a few reasons I prefer to make this part of the base code
  generator:

  - If you look at the patch, you'll see that the code generation for the two
  Guide classes actually shares a lot with the code generation for
  MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
  either we'd have to expose parts of the C++ code generator internals
  publicly (eww) or we'd have to reproduce a lot of code (also eww).

  - The Reader and Writer classes directly use WireFormatLite, which is a
  private interface.

  - It seems clear that this feature is widely desired by open source users.
   We're not talking about a niche use case here.

   Regarding the proposed interfaces: I can imagine some applications where
   the const refs passed to the visitor methods may be too restrictive - the
   user may instead want to take ownership of the object. e.g., suppose the
   stream is a series of requests, and each of the visitor handlers needs to
   start some asynchronous work. It would be good to hear if users have use
   cases that don't quite fit into this model (or at least if the existing 
   use
   cases will work).

  Interesting point.  In the Reader case, it's creating new objects, so in
  theory it ought to be able to hand off ownership to the Visitor it calls.
   But, the Walker is walking an existing object and thus clearly cannot give
  up ownership.  It seems clear that some use cases need const references,
  which means that the only way we could support ownership passing is by
  adding another parallel set of methods.  I suppose they could have default
  implementations that delegate to the const reference versions, in which 
  case
  only people who wanted to optimize for them would need to override them.
   But I'd like to see that this is really desired first -- it's easy enough
  to add later.

  Also note that my code currently doesn't reuse message objects, but
  improving it to do so would be straightforward.  A Reader could allocate 
  one
  object of each sub-message type for reuse.  But, it seems like that 
  wouldn't
  play well with ownership-passing.

   On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda ken...@google.com wrote:

   Hello open source protobuf users,

   *Background*

   Probably the biggest deficiency in the open source protocol buffers
   libraries today is a lack of built-in support for handling streams of
   messages.  True, it's not too hard for users to support it manually, by
   prefixing each message with its size as described here:

  http://code.google.com/apis/protocolbuffers/docs/techniques.html#stre...

   However, this is awkward, and typically requires users to reach into the
   low-level CodedInputStream/CodedOutputStream classes and do a lot of 
   work
   manually.  Furthermore, many users want to handle streams
   of heterogeneous message types.  We tell them to wrap their messages in 
   an
   outer type using the union pattern:

    http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

   But this is kind of ugly and has unnecessary overhead.

   These problems never really came up in our internal usage, because 
   inside
   Google we have an RPC system and other utility code which builds 

[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-01 Thread Jason Hsueh
Conceptually this sounds great, the big question to me is whether this
should be implemented as an option in the compiler or as a separate plugin.
I haven't taken a thorough look at the patch, but I'd guess it adds a decent
amount to the core code generator. I have a preference for the plugin
approach, but of course I'm primarily an internal protobuf user, so I'm
willing to be convinced otherwise :-) Would using a plugin, possibly even
shipped with the standard implementation, make this feature too inconvenient
to use? Or is there enough demand for this that it warrants implementing as
an option?

Regarding the proposed interfaces: I can imagine some applications where the
const refs passed to the visitor methods may be too restrictive - the user
may instead want to take ownership of the object. e.g., suppose the stream
is a series of requests, and each of the visitor handlers needs to start
some asynchronous work. It would be good to hear if users have use cases
that don't quite fit into this model (or at least if the existing use cases
will work).

On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda ken...@google.com wrote:

 Hello open source protobuf users,

 *Background*

 Probably the biggest deficiency in the open source protocol buffers
 libraries today is a lack of built-in support for handling streams of
 messages.  True, it's not too hard for users to support it manually, by
 prefixing each message with its size as described here:


 http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming

 However, this is awkward, and typically requires users to reach into the
 low-level CodedInputStream/CodedOutputStream classes and do a lot of work
 manually.  Furthermore, many users want to handle streams
 of heterogeneous message types.  We tell them to wrap their messages in an
 outer type using the union pattern:

   http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

 But this is kind of ugly and has unnecessary overhead.

 These problems never really came up in our internal usage, because inside
 Google we have an RPC system and other utility code which builds on top of
 protocol buffers and provides appropriate abstraction. While we'd like to
 open source this code, a lot of it is large, somewhat messy, and highly
 interdependent with unrelated parts of our environment, and no one has had
 the time to rewrite it all cleanly (as we did with protocol buffers itself).

 *Proposed solution:  Generated Visitors*

 I've been wanting to fix this for some time now, but didn't really have a
 good idea how.  CodedInputStream is annoyingly low-level, but I couldn't
 think of much better an interface for reading a stream of messages off the
 wire.

 A couple weeks ago, though, I realized that I had been failing to consider
 how new kinds of code generation could help this problem.  I was trying to
 think of solutions that would go into the protobuf base library, not
 solutions that were generated by the protocol compiler.

 So then it became pretty clear:  A protobuf message definition can also be
 interpreted as a definition for a streaming protocol.  Each field in the
 message is a kind of item in the stream.

   // A stream of Foo and Bar messages, and also strings.
   message MyStream {
 option generate_visitors = true;  // enables generation of streaming
 classes
 repeated Foo foo = 1;
 repeated Bar bar = 2;
 repeated string baz = 3;
   }

 All we need to do is generate code appropriate for treating MyStream as a
 stream, rather than one big message.

 My approach is to generate two interfaces, each with two provided
 implementations.  The interfaces are Visitor and Guide.
  MyStream::Visitor looks like this:

   class MyStream::Visitor {
public:
 virtual ~Visitor();

 virtual void VisitFoo(const Foo foo);
 virtual void VisitBar(const Bar bar);
 virtual void VisitBaz(const std::string baz);
   };

 The Visitor class has two standard implementations:  Writer and Filler.
  MyStream::Writer writes the visited fields to a CodedOutputStream, using
 the same wire format as would be used to encode MyStream as one big message.
  MyStream::Filler fills in a MyStream message object with the visited
 values.

 Meanwhile, Guides are objects that drive Visitors.

   class MyStream::Guide {
public:
 virtual ~Guide();

 // Call the methods of the visitor on the Guide's data.
 virtual void Accept(MyStream::Visitor* visitor) = 0;

 // Just fill in a message object directly rather than use a visitor.
 virtual void Fill(MyStream* message) = 0;
   };

 The two standard implementations of Guide are Reader and Walker.
  MyStream::Reader reads items from a CodedInputStream and passes them to the
 visitor.  MyStream::Walker walks over a MyStream message object and passes
 all the fields to the visitor.

 To handle a stream of messages, simply attach a Reader to your own Visitor
 implementation.  Your visitor's methods will then be called 

[protobuf] Re: New protobuf feature proposal: Generated classes for streaming / visitors

2011-02-01 Thread Kenton Varda
On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh jas...@google.com wrote:

 Conceptually this sounds great, the big question to me is whether this
 should be implemented as an option in the compiler or as a separate plugin.
 I haven't taken a thorough look at the patch, but I'd guess it adds a decent
 amount to the core code generator. I have a preference for the plugin
 approach, but of course I'm primarily an internal protobuf user, so I'm
 willing to be convinced otherwise :-) Would using a plugin, possibly even
 shipped with the standard implementation, make this feature too inconvenient
 to use? Or is there enough demand for this that it warrants implementing as
 an option?


First of all, note that this feature is off by default.  You have to turn it
on with the generate_visitors message-level option.  The only new code added
to the base library is a couple templates in WireFormatLite, which are of
course never instantiated if you don't generate visitor code.

There are a few reasons I prefer to make this part of the base code
generator:

- If you look at the patch, you'll see that the code generation for the two
Guide classes actually shares a lot with the code generation for
MergeFromCodedStream and SerializeWithCachedSizes.  To make this a plugin,
either we'd have to expose parts of the C++ code generator internals
publicly (eww) or we'd have to reproduce a lot of code (also eww).

- The Reader and Writer classes directly use WireFormatLite, which is a
private interface.

- It seems clear that this feature is widely desired by open source users.
 We're not talking about a niche use case here.


 Regarding the proposed interfaces: I can imagine some applications where
 the const refs passed to the visitor methods may be too restrictive - the
 user may instead want to take ownership of the object. e.g., suppose the
 stream is a series of requests, and each of the visitor handlers needs to
 start some asynchronous work. It would be good to hear if users have use
 cases that don't quite fit into this model (or at least if the existing use
 cases will work).


Interesting point.  In the Reader case, it's creating new objects, so in
theory it ought to be able to hand off ownership to the Visitor it calls.
 But, the Walker is walking an existing object and thus clearly cannot give
up ownership.  It seems clear that some use cases need const references,
which means that the only way we could support ownership passing is by
adding another parallel set of methods.  I suppose they could have default
implementations that delegate to the const reference versions, in which case
only people who wanted to optimize for them would need to override them.
 But I'd like to see that this is really desired first -- it's easy enough
to add later.

Also note that my code currently doesn't reuse message objects, but
improving it to do so would be straightforward.  A Reader could allocate one
object of each sub-message type for reuse.  But, it seems like that wouldn't
play well with ownership-passing.



 On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda ken...@google.com wrote:

 Hello open source protobuf users,

 *Background*

 Probably the biggest deficiency in the open source protocol buffers
 libraries today is a lack of built-in support for handling streams of
 messages.  True, it's not too hard for users to support it manually, by
 prefixing each message with its size as described here:


 http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming

 However, this is awkward, and typically requires users to reach into the
 low-level CodedInputStream/CodedOutputStream classes and do a lot of work
 manually.  Furthermore, many users want to handle streams
 of heterogeneous message types.  We tell them to wrap their messages in an
 outer type using the union pattern:

   http://code.google.com/apis/protocolbuffers/docs/techniques.html#union

 But this is kind of ugly and has unnecessary overhead.

 These problems never really came up in our internal usage, because inside
 Google we have an RPC system and other utility code which builds on top of
 protocol buffers and provides appropriate abstraction. While we'd like to
 open source this code, a lot of it is large, somewhat messy, and highly
 interdependent with unrelated parts of our environment, and no one has had
 the time to rewrite it all cleanly (as we did with protocol buffers itself).

 *Proposed solution:  Generated Visitors*

 I've been wanting to fix this for some time now, but didn't really have a
 good idea how.  CodedInputStream is annoyingly low-level, but I couldn't
 think of much better an interface for reading a stream of messages off the
 wire.

 A couple weeks ago, though, I realized that I had been failing to consider
 how new kinds of code generation could help this problem.  I was trying to
 think of solutions that would go into the protobuf base library, not
 solutions that were generated by the protocol compiler.

 So then it became