ActiveMQ implementation of protobuf

2009-09-18 Thread ijuma

Hey all,

I ran across the following and thought it may be of interest to this
list:

http://hiramchirino.com/blog/2009/09/activemq-protobuf-implementation-rocks.html

Best,
Ismael
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Success story + issue with reading a limited number of bytes from a stream

2009-09-18 Thread Martin Bravenboer

Hi all,

In case you enjoy success stories: at LogicBlox we have recently
started using Protocol Buffers for a protocol between our Datalog
database server (written in C++) and our Datalog compiler (written in
Java). We use protobuf in two different configurations: over sockets
between separate processes, but also for communication via JNI inside
the same process. Amusingly, the use of protobuf was a performance
improvement over our earlier JNI-based implementation, where we
created Java objects directly from C++ using JNI calls. Apparently,
the serialization overhead of Protocol Buffers was lower than the
overhead of the JNI calls! This probably does not apply in general,
but at least for us it did. Thanks to all the developers of protobuf!

We encountered one issue that might be interesting to others.

We send sequences of protobuf messages over a socket. Some of these
messages might be very small. The size of the message is indicated by
a simple header that precedes the serialized protobuf message.

The problem we had was that there is no way to read a limited number
of bytes from the socket input stream on the C++ side. We tried
several alternatives. The most attractive one would be:


  google::protobuf::io::IstreamInputStream zistream(io);
  if (!msg.ParseFromBoundedZeroCopyStream(&zistream, size))
-

ParseFromBoundedZeroCopyStream is implemented as:

---
  io::CodedInputStream decoder(input);
  decoder.PushLimit(size);
---

Unfortunately, this hangs in the constructor of CodedInputStream:
not enough bytes are yet available, so the Refresh() call made in the
constructor does not return:

---
  // Eagerly Refresh() so buffer space is immediately available.
  Refresh();
---

So, as far as I know there is currently no way to read a limited
number of bytes from a stream when the remaining bytes on that stream
are not yet available. We resorted to reading the message into a
uint8* buffer separately and parsing the message from that buffer.
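That workaround can be sketched in plain C++ (iostream-based and
protobuf-free for illustration; `ReadExact` and `ReadMessageBody` are
hypothetical helper names, not the actual LogicBlox code):

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read exactly `size` bytes from `in`, looping because a socket-backed
// stream may deliver fewer bytes per read() than requested.
// Returns false if EOF or an error occurs first.
bool ReadExact(std::istream& in, char* buf, std::size_t size) {
  std::size_t got = 0;
  while (got < size) {
    in.read(buf + got, static_cast<std::streamsize>(size - got));
    std::size_t n = static_cast<std::size_t>(in.gcount());
    if (n == 0) return false;  // EOF or error before `size` bytes arrived
    got += n;
  }
  return true;
}

// Pull one `size`-byte message body into `out`. With protobuf linked,
// the follow-up would be msg.ParseFromArray(out->data(), size).
bool ReadMessageBody(std::istream& in, std::size_t size, std::string* out) {
  if (size == 0) {
    out->clear();
    return true;
  }
  std::vector<char> buf(size);
  if (!ReadExact(in, buf.data(), size)) return false;
  out->assign(buf.data(), size);
  return true;
}
```

Because `ParseFromArray` works on a buffer whose size is already known,
no stream limit (and hence no eager Refresh()) is involved at all.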

Two emails in the mail archive of this mailing list suggest that using
PushLimit should work, so it seems this issue is not widely known.

http://markmail.org/message/fvmubiw5ihwge7wt
http://markmail.org/message/sdjovyr5ng6tjgpm

Thanks again for all the work!
-- 
Martin Bravenboer
LogicBlox




Re: Success story + issue with reading a limited number of bytes from a stream

2009-09-18 Thread Martin Bravenboer

Hi Kenton,

For the problem I observed, the Refresh() call hung on a message of 2
bytes. I'll try to reproduce it in a small example and get back to you.

Cheers,
Martin


On Fri, Sep 18, 2009 at 1:53 PM, Kenton Varda ken...@google.com wrote:
 That Refresh() call should only block if there are no bytes at all available
 on the stream.  But if you're about to read a message, then you expect there
 to be some bytes, right?  Or is it that you're actually receiving a message
 of zero size?  In that case, you could check if the message size is zero
 before calling ParseFromBoundedZeroCopyStream() and skip the call if so.
  Arguably, ParseFromBoundedZeroCopyStream() should itself check for
 zero-size messages and return immediately in this case, without creating a
 CodedInputStream -- I would accept a patch that makes this change.


-- 
Martin Bravenboer
LogicBlox




serialize message to UDP socket

2009-09-18 Thread jayt0...@gmail.com

Hello all,

I am having trouble figuring out how to serialize data over a socket
using the UDP protocol.  I am in a C++ environment.  When writing to the
socket without protocol buffers, I use the standard sendto() socket
call, which lets me specify the port and IP address of the intended
receiver of my UDP message.  When trying to send a protocol buffers
message, this seems to be the recommended strategy in the Google docs:

        ZeroCopyOutputStream* raw_output   = new FileOutputStream(sock);
        CodedOutputStream*    coded_output = new CodedOutputStream(raw_output);
        coded_output->WriteRaw(send_data, strlen(send_data));

There is no way to specify the port and IP address here, analogous to
the standard sendto() socket call, so my message never gets received by
the intended recipient on the network.  I am aware that this is a raw
message, not a PB message.
Getting this raw message over the network is a first step in
accomplishing the ultimate goal of getting the PB message over the
network.

Is there a way to get all of the bytes of a serialized PB message into
raw form and then send them with sendto()?

Any ideas? Thanks for any help.

Jay
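One answer to Jay's question, as a hedged sketch: serialize the whole
message into a contiguous buffer first (protobuf's `SerializeToString()`
produces exactly that), then hand the buffer to `sendto()` like any
other datagram payload. The loopback demo below uses a literal string
where the serialized message would go, since protobuf itself is not
linked here:

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstring>
#include <netinet/in.h>
#include <string>
#include <sys/socket.h>
#include <unistd.h>

// Round-trips `payload` through a loopback UDP socket pair. With a real
// protobuf message, `payload` would come from msg.SerializeToString().
bool UdpRoundTrip(const std::string& payload, std::string* received) {
  int rx = socket(AF_INET, SOCK_DGRAM, 0);
  int tx = socket(AF_INET, SOCK_DGRAM, 0);
  if (rx < 0 || tx < 0) return false;

  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // let the kernel pick a free port
  if (bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0)
    return false;
  socklen_t addr_len = sizeof(addr);
  getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &addr_len);

  // The key step: sendto() names the destination explicitly, so no
  // stream wrapper is involved -- just the serialized bytes.
  if (sendto(tx, payload.data(), payload.size(), 0,
             reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0)
    return false;

  char buf[2048];
  ssize_t n = recvfrom(rx, buf, sizeof(buf), 0, nullptr, nullptr);
  close(tx);
  close(rx);
  if (n < 0) return false;
  received->assign(buf, static_cast<size_t>(n));
  return true;
}
```

On the receiving side the mirror image applies: `recvfrom()` into a
buffer, then `msg.ParseFromArray(buf, n)`. Each UDP datagram carries one
whole message, so no length prefix is needed the way it is on a stream
socket.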





Re: serialize message to UDP socket

2009-09-18 Thread jayt0...@gmail.com

One other thing I wanted to say: I chose to use CodedOutputStream to
send data because ultimately I have to manually encode a length prefix
in front of my PB message.  In the C++ environment, I understand that
this is the only way to do it (ugh is right; I am sure this is a common
problem when using PB over sockets that remain in use).
I am fully aware that there are methods to serialize directly from the
object but those will not serve my ultimate aim of getting a length
prefix ahead of the data bytes.
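The length-prefix framing described here can be illustrated without
protobuf at all. The helpers below mirror the varint format that
`CodedOutputStream::WriteVarint32()` and
`CodedInputStream::ReadVarint32()` use; `Frame`/`Unframe` are
hypothetical names for this sketch, and the "message" is just an opaque
byte string:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Append `value` to `out` in protobuf varint encoding: 7 bits per
// byte, low groups first, high bit set on all but the last byte.
void AppendVarint32(uint32_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7f) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Read a varint at *pos, advancing *pos past it.
// Returns false on truncated or over-long input.
bool ReadVarint32(const std::string& in, std::size_t* pos, uint32_t* value) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {  // at most 5 bytes
    if (*pos >= in.size()) return false;
    uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
    result |= static_cast<uint32_t>(byte & 0x7f) << shift;
    if (!(byte & 0x80)) {
      *value = result;
      return true;
    }
  }
  return false;
}

// One frame on the wire = varint length, then the payload bytes.
std::string Frame(const std::string& payload) {
  std::string out;
  AppendVarint32(static_cast<uint32_t>(payload.size()), &out);
  out += payload;
  return out;
}

bool Unframe(const std::string& wire, std::size_t* pos, std::string* payload) {
  uint32_t size = 0;
  if (!ReadVarint32(wire, pos, &size)) return false;
  if (wire.size() - *pos < size) return false;  // body not fully present
  payload->assign(wire, *pos, size);
  *pos += size;
  return true;
}
```

With protobuf linked, the same layout comes from writing
`WriteVarint32(msg.ByteSize())` on a CodedOutputStream and then
serializing the message to the same stream.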

Thanks

Jay




Re: ActiveMQ implementation of protobuf

2009-09-18 Thread Kenton Varda
Hmm, your bean and buffer classes sound conceptually equivalent to my
builder and message classes.
Regarding lazy parsing, this is certainly something we've considered before,
but it introduces a lot of problems:

1) Every getter method must now first check whether the message is parsed,
and parse it if not.  Worse, for proper thread safety it really needs to
lock a mutex while performing this check.  For a fair comparison of parsing
speed, you really need another benchmark which measures the speed of
accessing all the fields of the message.  I think you'll find that parsing a
message *and* accessing all its fields is significantly slower with the lazy
approach.  Your approach might be faster in the case of a very deep message
in which the user only wants to access a few shallow fields, but I think
this case is relatively uncommon.

2) What happens if the message is invalid?  The user will probably expect
that calling simple getter methods will not throw parse exceptions, and
probably isn't in a good position to handle these exceptions.  You really
want to detect parse errors at parse time, not later on down the road.

We might add lazy parsing to the official implementation at some point.
 However, the approach we'd probably take is to use it only on fields which
are explicitly marked with a [lazy=true] option.  Developers would use
this to indicate fields for which the performance trade-offs favor lazy
parsing, and they are willing to deal with delayed error-checking.

In your blog post you also mention that encoding the same message object
multiple times without modifying it in between, or parsing a message and
then serializing it without modification, is free...  but how often does
this happen in practice?  These seem like unlikely cases, and easy for the
user to optimize on their own without support from the protobuf
implementation.

On Fri, Sep 18, 2009 at 3:15 PM, hi...@hiramchirino.com
chir...@gmail.com wrote:


 Hi Kenton,

 You're right, the reason that one benchmark has those results is because
 the implementation does lazy decoding.  While lazy decoding is nice, I
 think that implementation has a couple of other features which are
 equally as nice.  See more details about them here:


 http://hiramchirino.com/blog/2009/09/activemq-protobuf-implemtation-features.html

 It would have been hard to impossible to implement some of the stuff
 without the completely different class structure it uses.  I'd be
 happy if its features could be absorbed into the official
 implementation.  I'm just not sure how you could do that and maintain
 compatibility with your existing users.

 If you have any suggestions of how we can integrate better please
 advise.

 Regards,
 Hiram

 On Sep 18, 12:34 pm, Kenton Varda ken...@google.com wrote:
  So, his implementation is a little bit faster in two of the benchmarks, and
  impossibly faster in the other one.  I don't really believe that it's
  possible to improve parsing time by as much as he claims, except by doing
  something like lazy parsing, which would just be deferring the work to
  later on.  Would have been nice if he'd contributed his optimizations back
  to the official implementation rather than write a whole new one...

   On Fri, Sep 18, 2009 at 1:38 AM, ijuma ism...@juma.me.uk wrote:

    Hey all,

    I ran across the following and thought it may be of interest to this
    list:

   http://hiramchirino.com/blog/2009/09/activemq-protobuf-implementation...

    Best,
    Ismael





Re: ActiveMQ implementation of protobuf

2009-09-18 Thread Kenton Varda
I think the usual way we would have solved this problem at Google would be
to have the message payload be encoded separately and embedded in the
envelope as a bytes field, e.g.:
  message Envelope {
    required string to_address = 1;
    optional string from_address = 2;
    required bytes payload = 3;  // an encoded message
  }

It's not as transparent as your solution, but it is a whole lot simpler, and
the behavior is easy to understand.

That said, again, there's nothing preventing lazy parsing from being added
to Google's Java protobuf implementation, so I'm not sure why writing
something completely new was necessary.

As far as the performance arguments go, I'd again encourage you to create a
benchmark that actually measures the performance of the case where the
application code ends up accessing all the fields.  If you really think
there's no significant overhead, prove it.  :)

I'd also suggest that you not publish benchmarks implying that your
implementation is an order of magnitude faster at parsing without explaining
what is really going on.  It's rather misleading.

On Fri, Sep 18, 2009 at 5:53 PM, hi...@hiramchirino.com
chir...@gmail.com wrote:


 Hi Kenton,

 Let me start off by describing my usage scenario.

 I'm interested in using protobuf to implement the messaging protocol
 between clients and servers of a distributed messaging system.  For
 simplicity, let's pretend that the protocol is similar to XMPP and that
 there are servers which handle delivering messages to and from clients.

 In this case, the server clearly is not interested in the meat of the
 messages being sent around; it is typically only interested in the
 routing data, so deferred decoding provides a substantial win.
 Furthermore, when the server passes on the message to the consumer, he
 does not need to encode the message again.  For important messages,
 the server may be configured to persist those messages as they come
 in, so the server would once again benefit from not having to encode
 the message yet again.

 I don't think the user could implement those optimizations on their
 own without support from the protobuf implementation.  At least not as
 efficiently and elegantly.  You have to realize that the 'free
 encoding' holds true even for nested message structures in the
 message.  So let's say that the user is aggregating data from multiple
 source protobuf messages, picking data out of them and placing it
 into a new protobuf message that then gets encoded.  Only the outer
 message would need encoding; the inner nested elements which were
 picked from the other buffers would benefit from the 'free encoding'.

 The overhead of the lazy decoding is exactly 1 extra if (bean ==
 null) statement, which is probably cheaper than most virtual dispatch
 invocations.  But if you're really trying to milk the performance out
 of your app, you should just call buffer.copy() to get the bean
 backing the buffer.  All get operations on the bean do NOT have the
 overhead.

 Regarding threading, since the buffer is immutable and decoding is
 idempotent, you don't really need to worry about thread safety.  Worst
 case scenario is that 2 threads decode the same buffer concurrently
 and then set the bean field of the buffer.  Since the resulting beans
 are equal, in most cases it would not really matter which thread wins
 when they overwrite the bean field.
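That benign-race argument can be made concrete with a small sketch
(illustrative only, not the actual ActiveMQ code; the byte-reversing
"decode" is a stand-in for real wire parsing). Note that in C++, unlike
the Java case described, an unsynchronized field write would be a data
race, so this version publishes the decoded object through
`std::atomic`:

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// A buffer that keeps its raw bytes and decodes lazily on first access.
// Decoding is idempotent, so two racing threads may both decode; the
// compare-and-swap picks one winner and the loser discards its copy.
struct LazyBuffer {
  explicit LazyBuffer(std::string raw) : raw_(std::move(raw)) {}
  ~LazyBuffer() { delete decoded_.load(); }

  const std::string& decoded() {
    std::string* d = decoded_.load(std::memory_order_acquire);
    if (d == nullptr) {
      // Stand-in "decode": reverse the bytes. Real code would parse
      // the protobuf wire format here.
      std::string* fresh = new std::string(raw_.rbegin(), raw_.rend());
      if (decoded_.compare_exchange_strong(d, fresh,
                                           std::memory_order_release,
                                           std::memory_order_acquire)) {
        d = fresh;
      } else {
        delete fresh;  // another thread won; the results are equal anyway
      }
    }
    return *d;
  }

  const std::string raw_;
  std::atomic<std::string*> decoded_{nullptr};
};
```

After the CAS, every caller sees the same decoded object, which matches
the "worst case is redundant work, not inconsistency" reasoning above.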

 As for up-front validation, in my use case, deferring validation is a
 feature.  The less work the server has to do the better, since it will
 help scale vertically.  I do agree that in some use cases it would be
 desirable to fully validate up front.  I think it should be up to the
 application to decide if it wants up front validation or deferred
 decoding.  For example, it would be likely that the client of the
 messaging protocol would opt for up front validation.   On the other
 hand, the server would use deferred decoding.  It's definitely a
 performance versus consistency trade-off.

 I think that once you make 'free encoding', and deferred decoding an
 option, users that have high performance use cases will design their
 application so that they can exploit those features as much as
 possible.

 --
 Regards,
 Hiram

 Blog: http://hiramchirino.com

 Open Source SOA
 http://fusesource.com/


Re: ActiveMQ implementation of protobuf

2009-09-18 Thread hi...@hiramchirino.com

Firstly, I want to clarify that I did not write the benchmark that I
plugged it into.  There is no ill intent.  I published the benchmark so
that folks would take the time to look into why my implementation
performed so much better.  I think it's good to have healthy discussions
about the pros and cons of alternative implementations which deliver
different sets of features.

The main reason I started from scratch is that I wanted to implement a
Java-based code generator so that it would be easy to embed in a Maven
plugin or Ant task.  Furthermore, it was just more expedient to start
from a clean slate and design my ideal object model.
I did ping this list over a year ago to gauge whether there would be any
interest in collaborating, but it did not garner interest, so I did not
pursue it further:

http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea8706b40146f/bdd22ddf89e4a6d3?#bdd22ddf89e4a6d3

Perhaps I'm misreading you, but it seems like there have been very few
ideas from my implementation that you are actually interested in.  So
I'm not sure why you're stressing about me rolling this out as a new
implementation.

Bottom line is, I would LOVE IT if the Google implementation achieves
feature parity with mine.  That way it's one less code base I need to
maintain!  Best of luck, and if you do change your mind and want to
poach any of the concepts or code, please feel free to do so.

Regards,
Hiram

