Re: ActiveMQ implementation of protobuf

2009-09-19 Thread hi...@hiramchirino.com

I've been founding and contributing to open source projects for over
nine years now, so I understand your situation.

Here are my suggestions for encouraging users to contribute:

1) Folks have language preferences, so ideally the code generator for
a language should be written in the language of the implementation.
Why?  Because then you have a better chance that a user will turn into
a contributor since they will be able to grok and be comfortable with
all the parts of the implementation, including the code generator.
2) Some enhancements require more drastic changes than others.  You
should provide an avenue where folks can research and explore the
bigger, more drastic changes within your project.
3) Be more open to contributor feedback.  Even if an idea seems wacky
at first, encourage the contribution and have it at least go into an
experimental branch.

Regarding the Maven question, let me first explain the build
challenges that most of the projects I participate in experience.  I
spend most of my time working on ActiveMQ, Camel, and ServiceMix.  The
common thread with these projects is that they are integration
technologies.  And since they are integration technologies, their goal
is to integrate and leverage the strengths of as many technologies as
possible.

The build challenge this presents is that the laundry list of
dependencies needed to compile each project is mind-boggling.
Manually installing all the dependencies is a waste of time.  Maven
automates dependency downloading, and this even includes downloading
the Maven plugins that are used to compile a Maven build.

The net result is that users of Maven builds hardly ever have to worry
about having the right prerequisites installed before kicking off the
build.  Having to exec out to protoc would break that concept, since
protoc is a native binary that would have to be installed beforehand.
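
To make the trade-off concrete, here is a minimal Java sketch of what
exec-ing out to protoc from a build plugin could look like.  The class
name ProtocExec, its helper methods, and the example paths are
hypothetical; --proto_path and --java_out are standard protoc options.
The catch block is the point of the example: the build only succeeds
if a protoc binary is already installed and on the PATH.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ProtocExec {
    // Builds the protoc command line (hypothetical helper for this sketch).
    static List<String> buildCommand(String protoDir, String outDir, String protoFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("protoc");
        cmd.add("--proto_path=" + protoDir);
        cmd.add("--java_out=" + outDir);
        cmd.add(protoFile);
        return cmd;
    }

    // Runs the command; returns its exit code, or -1 if the binary could not be started.
    static int run(List<String> cmd) throws InterruptedException {
        try {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            return p.waitFor();
        } catch (IOException e) {
            // Usually means protoc is not installed or not on the PATH.
            System.err.println("Could not start protoc: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Paths below are placeholders, not taken from any real project.
        int exit = run(buildCommand("src/main/proto",
                                    "target/generated-sources",
                                    "src/main/proto/envelope.proto"));
        System.out.println("protoc exit code: " + exit);
    }
}
```

A generator written in Java avoids this external step because it can
be resolved and downloaded like any other build dependency.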

--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://fusesource.com/

On Sep 19, 2:16 am, Kenton Varda ken...@google.com wrote:
 Somehow I missed that message.  Sorry about that.
 I'd definitely like to have lazy parsing (as an option) in the official
 implementation.  The reason I'm stressing is because there's a lot of
 these things that I'd like protocol buffers to have, but I don't have enough
 time to write them all myself, so I need help from contributors.
  Unfortunately it seems that a lot of people would rather write their own
 implementations from scratch than try to contribute to the main one -- you
 aren't the first person who has done this.  That said, having competition is
 a good thing too.

 Regarding maven plugins -- why can't the plugin just invoke protoc using
 Runtime.exec()?  What's the benefit of having the code generator running
 inside the Maven process?  Honest question -- I don't know very much about
 Maven.

 On Fri, Sep 18, 2009 at 7:36 PM, hi...@hiramchirino.com
 chir...@gmail.com wrote:



  Firstly, I want to clarify that I did not write the benchmark that I
  plugged into.  There is no ill intent.  I published the benchmark so
  that folks take the time to look into why my implementation performed
  so much better.  I think it's good to have healthy discussions about
  the pros and cons of alternative implementations which deliver
  different sets of features.

  The main reason I started from scratch is that I wanted to implement a
  Java-based code generator so that it would be easy to embed in a Maven
  plugin or Ant task.  Furthermore, it was just more expedient to start
  from a clean slate and design my ideal object model.
  I did ping this list over a year ago to gauge if there would be any
  interest in collaborating, but did not garner interest. So, I did not
  pursue it further:

 http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea870...

  Perhaps I'm misreading you, but it seems like there have been very few
  ideas that you are actually interested in from my implementation.  So
  I'm not sure why you're stressing about me rolling this out as a new
  implementation.

  Bottom line is, I would LOVE IT if the Google implementation achieves
  feature parity with mine.  That way it's one less code base I need to
  maintain!  Best of luck and if you do change your mind and want to
  poach any of the concepts or code, please feel free to do so.

  Regards,
  Hiram

  On Sep 18, 9:40 pm, Kenton Varda ken...@google.com wrote:
   I think the usual way we would have solved this problem at Google would be
   to have the message payload be encoded separately and embedded in the
   envelope as a bytes field, e.g.:
     message Envelope {
       required string to_address = 1;
       optional string from_address = 2;
       required bytes payload = 3;  // an encoded message
     }

   It's not as transparent as your solution, but it is a whole lot simpler, and
   the behavior is easy to understand.

   That said, again, there's nothing preventing lazy parsing from being added
   to Google's Java protobuf implementation, so I'm not sure why writing
   something

Re: ActiveMQ implementation of protobuf

2009-09-18 Thread hi...@hiramchirino.com

Firstly, I want to clarify that I did not write the benchmark that I
plugged into.  There is no ill intent.  I published the benchmark so
that folks take the time to look into why my implementation performed
so much better.  I think it's good to have healthy discussions about
the pros and cons of alternative implementations which deliver
different sets of features.

The main reason I started from scratch is that I wanted to implement a
Java-based code generator so that it would be easy to embed in a Maven
plugin or Ant task.  Furthermore, it was just more expedient to start
from a clean slate and design my ideal object model.
I did ping this list over a year ago to gauge if there would be any
interest in collaborating, but did not garner interest. So, I did not
pursue it further:

http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea8706b40146f/bdd22ddf89e4a6d3?#bdd22ddf89e4a6d3

Perhaps I'm misreading you, but it seems like there have been very few
ideas that you are actually interested in from my implementation.  So
I'm not sure why you're stressing about me rolling this out as a new
implementation.

Bottom line is, I would LOVE IT if the Google implementation achieves
feature parity with mine.  That way it's one less code base I need to
maintain!  Best of luck and if you do change your mind and want to
poach any of the concepts or code, please feel free to do so.

Regards,
Hiram

On Sep 18, 9:40 pm, Kenton Varda ken...@google.com wrote:
 I think the usual way we would have solved this problem at Google would be
 to have the message payload be encoded separately and embedded in the
 envelope as a bytes field, e.g.:
   message Envelope {
     required string to_address = 1;
     optional string from_address = 2;
     required bytes payload = 3;  // an encoded message
   }

 It's not as transparent as your solution, but it is a whole lot simpler, and
 the behavior is easy to understand.
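
A minimal Java sketch of the envelope approach described above.  This
is plain Java standing in for the Envelope message, not generated
protobuf code; the class and method names are hypothetical, and UTF-8
text stands in for a separately encoded inner message.

```java
import java.nio.charset.StandardCharsets;

class EnvelopeSketch {
    // Plain-Java stand-in for the Envelope message; NOT generated protobuf code.
    static final class Envelope {
        final String toAddress;
        final String fromAddress; // may be null, mirroring the optional field
        final byte[] payload;     // separately encoded inner message

        Envelope(String toAddress, String fromAddress, byte[] payload) {
            this.toAddress = toAddress;
            this.fromAddress = fromAddress;
            this.payload = payload;
        }
    }

    // Server side: routes on the address only and never looks inside the payload.
    static String pickDestination(Envelope e) {
        return "queue:" + e.toAddress;
    }

    // Consumer side: the only place where the payload is decoded.
    // UTF-8 text stands in for parsing a real inner message.
    static String decodePayload(Envelope e) {
        return new String(e.payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] inner = "hello".getBytes(StandardCharsets.UTF_8);
        Envelope e = new Envelope("bob", "alice", inner);
        System.out.println(pickDestination(e));
        System.out.println(decodePayload(e));
    }
}
```

The cost is the explicit second decode step on the consumer side,
which is the loss of transparency mentioned above.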

 That said, again, there's nothing preventing lazy parsing from being added
 to Google's Java protobuf implementation, so I'm not sure why writing
 something completely new was necessary.

 As far as the performance arguments go, I'd again encourage you to create a
 benchmark that actually measures the performance of the case where the
 application code ends up accessing all the fields.  If you really think
 there's no significant overhead, prove it.  :)

 I'd also suggest that you not publish benchmarks implying that your
 implementation is an order of magnitude faster at parsing without explaining
 what is really going on.  It's rather misleading.

 On Fri, Sep 18, 2009 at 5:53 PM, hi...@hiramchirino.com
 chir...@gmail.com wrote:



  Hi Kenton,

  Let me start off by describing my usage scenario.

  I'm interested in using protobuf to implement the messaging protocol
  between clients and servers of a distributed messaging system.  For
  simplicity, let's pretend that the protocol is similar to XMPP and that
  there are servers which handle delivering messages to and from clients.

  In this case, the server clearly is not interested in the meat of the
  messages being sent around.  It is typically only interested in routing
  data.  In this case, deferred decoding provides a substantial win.
  Furthermore, when the server passes on the message to the consumer, it
  does not need to encode the message again.  For important messages,
  the server may be configured to persist those messages as they come
  in, so the server would once again benefit from not having to encode
  the message yet again.

  I don't think the user could implement those optimizations on their
  own without support from the protobuf implementation.  At least not as
  efficiently and elegantly.  You have to realize that the 'free
  encoding' holds true for even nested message structures in the
  message.  So let's say that the user is aggregating data from multiple
  source protobuf messages and is picking data out of them and placing it
  into a new protobuf message that then gets encoded.  Only the outer
  message would need encoding; the inner nested elements which were
  picked from the other buffers would benefit from the 'free encoding'.

  The overhead of the lazy decoding is exactly 1 extra
  if (bean == null) statement, which is probably cheaper than most
  virtual dispatch invocations.  But if you're really trying to milk the
  performance out of your app, you should just call buffer.copy() to get
  the bean backing the buffer.  All get operations on the bean do NOT
  have the overhead.
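
A minimal Java sketch of the lazy-decoding idea described above.  The
terms "buffer", "bean", and copy() come from the message; everything
else (class names, the single text field, the decodeCount counter, and
copy() simply returning the cached bean) is an assumption made for
illustration and not the actual ActiveMQ implementation.

```java
import java.nio.charset.StandardCharsets;

class LazyBufferSketch {
    // Hypothetical decoded form ("bean") of a message with one string field.
    static final class Bean {
        final String text;
        Bean(String text) { this.text = text; }
        String getText() { return text; } // plain field read, no null check
    }

    // Hypothetical "buffer": keeps the encoded bytes and decodes on first access.
    static final class Buffer {
        private final byte[] encoded;
        private Bean bean;   // stays null until a field is first read
        int decodeCount = 0; // instrumentation for this sketch only

        Buffer(byte[] encoded) { this.encoded = encoded.clone(); }

        private Bean bean() {
            if (bean == null) { // the one extra check per accessor
                bean = new Bean(new String(encoded, StandardCharsets.UTF_8));
                decodeCount++;
            }
            return bean;
        }

        String getText() { return bean().getText(); }

        // Assumed behavior: hand out the decoded bean so later reads skip the check.
        Bean copy() { return bean(); }

        // Forwarding or persisting reuses the received bytes; nothing is re-encoded.
        byte[] toByteArray() { return encoded.clone(); }
    }

    public static void main(String[] args) {
        Buffer b = new Buffer("hello".getBytes(StandardCharsets.UTF_8));
        b.toByteArray();                   // routing path: no decode happens
        System.out.println(b.decodeCount); // still 0 here
        System.out.println(b.getText());   // first access triggers the decode
    }
}
```

A server that only forwards or persists the message stays on the
toByteArray() path and never pays for decoding.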

  Regarding threading, since the buffer is immutable and decoding is
  idempotent, you don't really need to worry about thread safety.  Worst
  case scenario is that 2 threads decode the same buffer concurrently
  and then set the bean field of the buffer.  Since the resulting beans
  are equal, in most cases it would not really matter which thread wins
  when they overwrite the bean field.
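
The race described above can be sketched in Java as follows.  All
names are hypothetical.  The sketch assumes the bean is immutable with
only final fields, which is what makes an unsynchronized publish safe
under the Java memory model; a mutable bean would need more care.

```java
import java.nio.charset.StandardCharsets;

class BenignRaceSketch {
    // Immutable bean: the only field is final (an assumption of this sketch).
    static final class Bean {
        final String text;
        Bean(String text) { this.text = text; }
        @Override public boolean equals(Object o) {
            return o instanceof Bean && ((Bean) o).text.equals(text);
        }
        @Override public int hashCode() { return text.hashCode(); }
    }

    static final class Buffer {
        private final byte[] encoded;
        private Bean bean; // deliberately neither volatile nor synchronized

        Buffer(byte[] encoded) { this.encoded = encoded.clone(); }

        Bean bean() {
            Bean b = bean; // read the shared field exactly once
            if (b == null) {
                b = new Bean(new String(encoded, StandardCharsets.UTF_8));
                bean = b;  // last writer wins; every candidate is equal
            }
            return b;
        }
    }

    // Two threads decode the same buffer; returns true if they saw equal beans.
    static boolean raceOnce(String text) {
        Buffer buf = new Buffer(text.getBytes(StandardCharsets.UTF_8));
        Bean[] seen = new Bean[2];
        Thread t1 = new Thread(() -> seen[0] = buf.bean());
        Thread t2 = new Thread(() -> seen[1] = buf.bean());
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return seen[0].equals(seen[1]);
    }

    public static void main(String[] args) {
        System.out.println(raceOnce("hello"));
    }
}
```

The two threads may end up holding different Bean instances, so code
that relies on reference identity rather than equals() would still
notice the race.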

  As for up front validation, in my use case, deferring