Hi Kenton,

Let me start off by describing my usage scenario.

I'm interested in using protobuf to implement the messaging protocol
between clients and servers of a distributed messaging system.  For
simplicity, let's pretend that the protocol is similar to XMPP and that
there are servers which handle delivering messages to and from clients.

In this case, the server clearly is not interested in the meat of the
messages being sent around.  It is typically only interested in the
routing data, so deferred decoding provides a substantial win.
Furthermore, when the server passes the message on to the consumer, it
does not need to encode the message again.  For important messages,
the server may be configured to persist them as they come in, so the
server would once again benefit from not having to encode the message
yet again.
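
To make that concrete, here is a rough Java sketch of the broker loop
I have in mind.  All of the names here (MessageBuffer,
toUnframedByteArray, Journal, Connection, Broker) are made-up
stand-ins for this sketch, not the actual generated API:

import java.util.Map;

// Hypothetical shapes for illustration only; the real generated classes differ.
interface MessageBuffer {
    String getDestination();        // routing header, decoded lazily on first access
    byte[] toUnframedByteArray();   // the original wire bytes, returned without re-encoding
}

interface Journal    { void append(byte[] frame); }
interface Connection { void send(byte[] frame); }

final class Broker {
    private final Journal journal;

    Broker(Journal journal) {
        this.journal = journal;
    }

    // The broker only ever touches the routing header.  The message body is
    // never decoded here, and both persisting and forwarding reuse the exact
    // bytes that arrived on the wire.
    void route(MessageBuffer message, Map<String, Connection> routes) {
        Connection destination = routes.get(message.getDestination()); // lazy decode of one field
        byte[] frame = message.toUnframedByteArray();                  // free: cached wire bytes
        journal.append(frame);                                         // persist without re-encoding
        if (destination != null) {
            destination.send(frame);                                   // forward without re-encoding
        }
    }
}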

I don't think the user could implement those optimizations on their
own without support from the protobuf implementation, at least not as
efficiently and elegantly.  You have to realize that the 'free
encoding' holds true even for nested message structures within the
message.  So let's say the user is aggregating data from multiple
source protobuf messages, picking data out of them and placing it
into a new protobuf message that then gets encoded.  Only the outer
message would need encoding; the inner nested elements which were
picked from the other buffers would benefit from the 'free encoding'.

The overhead of the lazy decoding is exactly one extra "if (bean ==
null)" check, which is probably cheaper than most virtual dispatch
invocations.  But if you're really trying to milk the performance out
of your app, you can just call buffer.copy() to get the bean backing
the buffer.  None of the get operations on the bean incur that
overhead.
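
To sketch the mechanics behind the last two paragraphs, the
buffer/bean split looks roughly like the class below.  LazyBuffer,
Codec, bean() and copy() are made-up names for this sketch; the real
generated code is per-message rather than generic, but the idea is
the same:

import java.util.Arrays;

// Rough sketch of the buffer/bean split; names are hypothetical.
final class LazyBuffer<T> {

    interface Codec<T> {
        T decode(byte[] encoded);   // full parse; may reject invalid input
        byte[] encode(T bean);      // only needed once a bean has actually been modified
    }

    private final byte[] encoded;   // original wire bytes, kept verbatim
    private final Codec<T> codec;
    private T bean;                 // decoded lazily; null until first access

    LazyBuffer(byte[] encoded, Codec<T> codec) {
        this.encoded = encoded;
        this.codec = codec;
    }

    // The only per-access overhead is this one null check.
    T bean() {
        if (bean == null) {
            bean = codec.decode(encoded);
        }
        return bean;
    }

    // Get the bean backing the buffer up front, so later getters skip
    // even the null check.
    T copy() {
        return bean();
    }

    // The 'free encoding': re-serializing an unmodified buffer is just a
    // copy of the bytes it was parsed from; a nested message picked out of
    // another buffer carries its own cached bytes in exactly the same way.
    byte[] toByteArray() {
        return Arrays.copyOf(encoded, encoded.length);
    }
}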

Regarding threading, since the buffer is immutable and decoding is
idempotent, you don't really need to worry about thread safety.  The
worst-case scenario is that two threads decode the same buffer
concurrently and then both set the bean field of the buffer.  Since
the resulting beans are equal, in most cases it does not really matter
which thread wins when they overwrite the bean field.
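
Concretely, it is just the classic racy lazy-initialization pattern.
A stand-alone sketch (with a String standing in for the real decode)
looks like this:

import java.nio.charset.StandardCharsets;

// Sketch of the benign race described above: two threads may both see a
// null bean, both decode the same immutable bytes, and one overwrites the
// other's (equal) result.  The volatile on the cached field is what
// guarantees each thread sees a fully constructed bean.
final class RacyLazyDecode {
    private final byte[] encoded;
    private volatile String bean;   // cached result of the idempotent decode

    RacyLazyDecode(byte[] encoded) {
        this.encoded = encoded;
    }

    String bean() {
        String b = bean;
        if (b == null) {
            b = decode(encoded);    // may run concurrently in several threads
            bean = b;               // last writer wins; all results are equal
        }
        return b;
    }

    private static String decode(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);  // stand-in for the real parse
    }
}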

As for up-front validation, in my use case, deferring validation is a
feature.  The less work the server has to do the better, since it
helps the server scale vertically.  I do agree that in some use cases
it would be desirable to fully validate up front.  I think it should
be up to the application to decide whether it wants up-front
validation or deferred decoding.  For example, the client of the
messaging protocol would likely opt for up-front validation, while the
server would use deferred decoding.  It's definitely a performance
versus consistency trade-off.

I think that once you make 'free encoding' and deferred decoding an
option, users with high-performance use cases will design their
applications so that they can exploit those features as much as
possible.

--
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://fusesource.com/

On Sep 18, 6:43 pm, Kenton Varda <ken...@google.com> wrote:
> Hmm, your bean and buffer classes sound conceptually equivalent to my
> builder and message classes.
> Regarding lazy parsing, this is certainly something we've considered before,
> but it introduces a lot of problems:
>
> 1) Every getter method must now first check whether the message is parsed,
> and parse it if not.  Worse, for proper thread safety it really needs to
> lock a mutex while performing this check.  For a fair comparison of parsing
> speed, you really need another benchmark which measures the speed of
> accessing all the fields of the message.  I think you'll find that parsing a
> message *and* accessing all its fields is significantly slower with the lazy
> approach.  Your approach might be faster in the case of a very deep message
> in which the user only wants to access a few shallow fields, but I think
> this case is relatively uncommon.
>
> 2) What happens if the message is invalid?  The user will probably expect
> that calling simple getter methods will not throw parse exceptions, and
> probably isn't in a good position to handle these exceptions.  You really
> want to detect parse errors at parse time, not later on down the road.
>
> We might add lazy parsing to the official implementation at some point.
>  However, the approach we'd probably take is to use it only on fields which
> are explicitly marked with a "[lazy=true]" option.  Developers would use
> this to indicate fields for which the performance trade-offs favor lazy
> parsing, and they are willing to deal with delayed error-checking.
>
> In your blog post you also mention that encoding the same message object
> multiple times without modifying it in between, or parsing a message and
> then serializing it without modification, is "free"...  but how often does
> this happen in practice?  These seem like unlikely cases, and easy for the
> user to optimize on their own without support from the protobuf
> implementation.
>
> On Fri, Sep 18, 2009 at 3:15 PM, hi...@hiramchirino.com
> <chir...@gmail.com> wrote:
>
>
>
> > Hi Kenton,
>
> > You're right, the reason that one benchmark has those results is because
> > the implementation does lazy decoding.  While lazy decoding is nice, I
> > think that implementation has a couple of other features which are
> > equally as nice.  See more details about them here:
>
> >http://hiramchirino.com/blog/2009/09/activemq-protobuf-implemtation-f...
>
> > It would have been hard to impossible to implement some of the stuff
> > without the completely different class structure it uses.  I'd be
> > happy if its features could be absorbed into the official
> > implementation.  I'm just not sure how you could do that and maintain
> > compatibility with your existing users.
>
> > If you have any suggestions of how we can integrate better please
> > advise.
>
> > Regards,
> > Hiram
>
> > On Sep 18, 12:34 pm, Kenton Varda <ken...@google.com> wrote:
> > > So, his implementation is a little bit faster in two of the benchmarks, and
> > > impossibly faster in the other one.  I don't really believe that it's
> > > possible to improve parsing time by as much as he claims, except by doing
> > > something like lazy parsing, which would just be deferring the work to later
> > > on.  Would have been nice if he'd contributed his optimizations back to the
> > > official implementation rather than write a whole new one...
>
> > > On Fri, Sep 18, 2009 at 1:38 AM, ijuma <ism...@juma.me.uk> wrote:
>
> > > > Hey all,
>
> > > > I ran across the following and thought it may be of interest to this
> > > > list:
>
> > > > http://hiramchirino.com/blog/2009/09/activemq-protobuf-implementation...
>
> > > > Best,
> > > > Ismael