ActiveMQ implementation of protobuf
Hey all, I ran across the following and thought it may be of interest to this list: http://hiramchirino.com/blog/2009/09/activemq-protobuf-implementation-rocks.html

Best, Ismael

You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
Success story issue with reading limited number of bytes from a stream
Hi all,

In case you enjoy success stories: at LogicBlox we have recently started using Protocol Buffers for a protocol between our Datalog database server (written in C++) and our Datalog compiler (written in Java). We use protobuf in two different configurations: over sockets between separate processes, but also for communication via JNI inside the same process. Amusingly, the use of protobuf was a performance improvement over our earlier JNI-based implementation, where we created Java objects directly from C++ using JNI calls. Apparently, the serialization overhead of Protocol Buffers was lower than the overhead of the JNI calls! This probably does not apply in general, but at least for us it did. Thanks to all the developers of protobuf!

We encountered one issue that might be interesting to others. We send sequences of protobuf messages over a socket. Some of these messages might be very small. The size of each message is indicated by a simple header that precedes the serialized protobuf message. The problem we had was that there is no way to read a limited number of bytes from the socket input stream on the C++ side. We tried several alternatives. The most attractive one would be:

---
google::protobuf::io::IstreamInputStream zistream(io);
if (!msg.ParseFromBoundedZeroCopyStream(&zistream, size))
---

ParseFromBoundedZeroCopyStream is implemented as:

---
io::CodedInputStream decoder(input);
decoder.PushLimit(size);
---

Unfortunately, this hangs in the constructor of CodedInputStream, because insufficient bytes are available for the call to Refresh in the constructor to terminate:

---
// Eagerly Refresh() so buffer space is immediately available.
Refresh();
---

So, as far as I know, there is currently no way to read a limited number of bytes from a stream when the remaining bytes on that stream are not yet available. We resorted to reading the message into a uint8* buffer separately and parsing the message from that buffer.

Two emails in the mail archive of this mailing list suggest that using PushLimit should work, so it seems this issue is not widely known.

http://markmail.org/message/fvmubiw5ihwge7wt
http://markmail.org/message/sdjovyr5ng6tjgpm

Thanks again for all the work!

--
Martin Bravenboer
LogicBlox
Re: Success story issue with reading limited number of bytes from a stream
Hi Kenton,

For the problem I observed, the Refresh() call hung on a message of 2 bytes. I'll try to reproduce a small example and get back to you.

Cheers, Martin

On Fri, Sep 18, 2009 at 1:53 PM, Kenton Varda ken...@google.com wrote:

That Refresh() call should only block if there are no bytes at all available on the stream. But if you're about to read a message, then you expect there to be some bytes, right? Or is it that you're actually receiving a message of zero size? In that case, you could check if the message size is zero before calling ParseFromBoundedZeroCopyStream() and skip the call if so. Arguably, ParseFromBoundedZeroCopyStream() should itself check for zero-size messages and return immediately in this case, without creating a CodedInputStream -- I would accept a patch that makes this change.
serialize message to UDP socket
Hello all,

I am having trouble figuring out how to serialize data over a socket using the UDP protocol. I am in a C++ environment. When writing to the socket without protocol buffers, I use the standard sendto() socket call, which allows me to specify the port and IP address of the intended receiver of my UDP message. When trying to send a protocol buffers message, this seems to be the recommended strategy in the Google docs:

---
ZeroCopyOutputStream* raw_output = new FileOutputStream(sock);
CodedOutputStream* coded_output = new CodedOutputStream(raw_output);
coded_output->WriteRaw(send_data, strlen(send_data));
---

There is no way to specify the port and IP address here, analogous to the standard sendto() socket call, so my message never gets received by the intended recipient on the network. I am aware that this is a raw message, not a PB message. Getting this raw message over the network is a first step toward the ultimate goal of getting the PB message over the network. Is there a way to get all of the bytes of a serialized PB message into raw form and then send them with sendto()? Any ideas?

Thanks for any help.
Jay
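The usual answer to Jay's question is to serialize into your own buffer with msg.SerializeToString(&wire) and then pass wire.data()/wire.size() to the ordinary sendto() call. In the sketch below, send_datagram is a hypothetical stand-in that "sends" by appending to a vector, so the flow stays self-contained; in real code its body would be the sendto() call shown in the comment.

```cpp
#include <string>
#include <vector>

// Stand-in for sending one UDP datagram. With a real socket the body would be:
//   sendto(sock, wire.data(), wire.size(), 0,
//          reinterpret_cast<sockaddr*>(&dest), sizeof(dest));
// where `wire` came from msg.SerializeToString(&wire). UDP preserves message
// boundaries, so the receiver gets exactly wire.size() bytes and can call
// msg.ParseFromArray() on them directly.
bool send_datagram(std::vector<std::string>* network, const std::string& wire) {
    if (wire.size() > 65507) return false;  // max UDP payload over IPv4
    network->push_back(wire);               // stands in for sendto()
    return true;
}
```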
Re: serialize message to UDP socket
One other thing I wanted to say was that I chose to use CodedOutputStream to send data because ultimately I have to manually encode a length prefix in front of my PB message. In the C++ environment, I understand that this is the only way to do this (ugh is right; I am sure this is a common problem when using PB over sockets that remain in use). I am fully aware that there are methods to serialize directly from the object, but those will not serve my ultimate aim of getting a length prefix ahead of the data bytes.

Thanks
Jay
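The length prefix Jay describes is typically written with CodedOutputStream::WriteVarint32(size) followed by WriteRaw() of the message bytes. The standalone encoder below reproduces the same base-128 varint format, so the framing is visible without needing the protobuf headers; it is a sketch of the wire format, not the library's own code.

```cpp
#include <cstdint>
#include <string>

// Append `value` to `out` as a protobuf-style base-128 varint: seven payload
// bits per byte, least-significant group first, high bit set on every byte
// except the last.
void append_varint32(std::string* out, std::uint32_t value) {
    while (value >= 0x80) {
        out->push_back(static_cast<char>((value & 0x7F) | 0x80));  // more bytes follow
        value >>= 7;
    }
    out->push_back(static_cast<char>(value));  // final byte, high bit clear
}
```

For example, a message size of 300 encodes as the two bytes 0xAC 0x02, which is what the receiver's varint decode must undo before reading that many message bytes.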
Re: ActiveMQ implementation of protobuf
Hmm, your bean and buffer classes sound conceptually equivalent to my builder and message classes. Regarding lazy parsing, this is certainly something we've considered before, but it introduces a lot of problems:

1) Every getter method must now first check whether the message is parsed, and parse it if not. Worse, for proper thread safety it really needs to lock a mutex while performing this check. For a fair comparison of parsing speed, you really need another benchmark which measures the speed of accessing all the fields of the message. I think you'll find that parsing a message *and* accessing all its fields is significantly slower with the lazy approach. Your approach might be faster in the case of a very deep message in which the user only wants to access a few shallow fields, but I think this case is relatively uncommon.

2) What happens if the message is invalid? The user will probably expect that calling simple getter methods will not throw parse exceptions, and probably isn't in a good position to handle these exceptions. You really want to detect parse errors at parse time, not later on down the road.

We might add lazy parsing to the official implementation at some point. However, the approach we'd probably take is to use it only on fields which are explicitly marked with a [lazy=true] option. Developers would use this to indicate fields for which the performance trade-offs favor lazy parsing, and they are willing to deal with delayed error-checking.

In your blog post you also mention that encoding the same message object multiple times without modifying it in between, or parsing a message and then serializing it without modification, is free... but how often does this happen in practice? These seem like unlikely cases, and easy for the user to optimize on their own without support from the protobuf implementation.
On Fri, Sep 18, 2009 at 3:15 PM, hi...@hiramchirino.com chir...@gmail.com wrote:

Hi Kenton,

You're right, the reason that one benchmark has those results is because the implementation does lazy decoding. While lazy decoding is nice, I think that implementation has a couple of other features which are equally as nice. See more details about them here: http://hiramchirino.com/blog/2009/09/activemq-protobuf-implemtation-features.html

It would have been hard to impossible to implement some of this without the completely different class structure it uses. I'd be happy if its features could be absorbed into the official implementation. I'm just not sure how you could do that and maintain compatibility with your existing users. If you have any suggestions for how we can integrate better, please advise.

Regards, Hiram

On Sep 18, 12:34 pm, Kenton Varda ken...@google.com wrote:

So, his implementation is a little bit faster in two of the benchmarks, and impossibly faster in the other one. I don't really believe that it's possible to improve parsing time by as much as he claims, except by doing something like lazy parsing, which would just be deferring the work to later on. Would have been nice if he'd contributed his optimizations back to the official implementation rather than write a whole new one...
Re: ActiveMQ implementation of protobuf
I think the usual way we would have solved this problem at Google would be to have the message payload be encoded separately and embedded in the envelope as a bytes field, e.g.:

---
message Envelope {
  required string to_address = 1;
  optional string from_address = 2;
  required bytes payload = 3;  // an encoded message
}
---

It's not as transparent as your solution, but it is a whole lot simpler, and the behavior is easy to understand. That said, again, there's nothing preventing lazy parsing from being added to Google's Java protobuf implementation, so I'm not sure why writing something completely new was necessary.

As far as the performance arguments go, I'd again encourage you to create a benchmark that actually measures the performance of the case where the application code ends up accessing all the fields. If you really think there's no significant overhead, prove it. :) I'd also suggest that you not publish benchmarks implying that your implementation is an order of magnitude faster at parsing without explaining what is really going on. It's rather misleading.

On Fri, Sep 18, 2009 at 5:53 PM, hi...@hiramchirino.com chir...@gmail.com wrote:

Hi Kenton,

Let me start off by describing my usage scenario. I'm interested in using protobuf to implement the messaging protocol between clients and servers of a distributed messaging system. For simplicity, let's pretend that the protocol is similar to XMPP and that there are servers which handle delivering messages to and from clients. In this case, the server clearly is not interested in the meat of the messages being sent around. It is typically only interested in routing data. In this case, deferred decoding provides a substantial win. Furthermore, when the server passes on the message to the consumer, it does not need to encode the message again. For important messages, the server may be configured to persist those messages as they come in, so the server would once again benefit from not having to encode the message yet again.

I don't think the user could implement those optimizations on their own without support from the protobuf implementation. At least not as efficiently and elegantly. You have to realize that the 'free encoding' holds true even for nested message structures in the message. So let's say that the user is aggregating data from multiple source protobuf messages, picking data out of them and placing it into a new protobuf message that then gets encoded. Only the outer message would need encoding; the inner nested elements which were picked from the other buffers would benefit from the 'free encoding'.

The overhead of the lazy decoding is exactly one extra if (bean == null) check, which is probably cheaper than most virtual dispatch invocations. But if you're really trying to milk the performance out of your app, you should just call buffer.copy() to get the bean backing the buffer. All get operations on the bean do NOT have the overhead.

Regarding threading, since the buffer is immutable and decoding is idempotent, you don't really need to worry about thread safety. The worst-case scenario is that two threads decode the same buffer concurrently and then set the bean field of the buffer. Since the resulting beans are equal, in most cases it would not really matter which thread wins when they overwrite the bean field.

As for up-front validation, in my use case, deferring validation is a feature. The less work the server has to do the better, since it will help scale vertically. I do agree that in some use cases it would be desirable to fully validate up front. I think it should be up to the application to decide if it wants up-front validation or deferred decoding. For example, it would be likely that the client of the messaging protocol would opt for up-front validation. On the other hand, the server would use deferred decoding. It's definitely a performance versus consistency trade-off.

I think that once you make 'free encoding' and deferred decoding an option, users that have high-performance use cases will design their application so that they can exploit those features as much as possible.

--
Regards, Hiram
Blog: http://hiramchirino.com
Open Source SOA
http://fusesource.com/
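Hiram's lazy-decoding scheme ("if (bean == null)", with a benign race because decoding is idempotent) can be sketched as the class below. Decoded and Buffer are hypothetical stand-ins for his generated bean and buffer classes, and "decoding" is a trivial copy so the sketch stays self-contained; real code might still prefer std::call_once for clarity under concurrency.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical parsed-message type; stands in for a generated bean class.
struct Decoded { std::string body; };

// Immutable wire bytes plus a lazily filled decoded view. Two threads racing
// through decoded() would each build an equal Decoded, so the race is benign,
// as Hiram argues above.
class Buffer {
public:
    explicit Buffer(std::string raw) : raw_(std::move(raw)) {}

    const Decoded& decoded() {
        if (!bean_) {                          // the single lazy check
            bean_ = std::make_shared<Decoded>();
            bean_->body = raw_;                // stands in for real parsing
        }
        return *bean_;
    }

    // Re-encoding stays "free": the original bytes are returned untouched.
    const std::string& raw() const { return raw_; }

private:
    std::string raw_;                // immutable wire bytes
    std::shared_ptr<Decoded> bean_;  // filled on first access
};
```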
Re: ActiveMQ implementation of protobuf
Firstly, I want to clarify that I did not write the benchmark that I plugged my implementation into. There is no ill intent. I published the benchmark so that folks would take the time to look into why my implementation performed so much better. I think it's good to have healthy discussions about the pros and cons of alternative implementations which deliver different sets of features.

The main reason I started from scratch is that I wanted to implement a Java-based code generator so that it would be easy to embed in a Maven plugin or Ant task. Furthermore, it was just more expedient to start from a clean slate and design my ideal object model. I did ping this list over a year ago to gauge if there would be any interest in collaborating, but did not garner interest, so I did not pursue it further: http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea8706b40146f/bdd22ddf89e4a6d3?#bdd22ddf89e4a6d3

Perhaps I'm misreading you, but it seems like there have been very few ideas that you are actually interested in from my implementation, so I'm not sure why you're stressing about me rolling this out as a new implementation. The bottom line is I would LOVE IT if the Google implementation achieved feature parity with mine. That way it's one less code base I need to maintain! Best of luck, and if you do change your mind and want to poach any of the concepts or code, please feel free to do so.

Regards, Hiram
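The "bytes payload" envelope pattern Kenton proposes can be sketched as below. With real protobuf, Envelope would be the generated class and payload would hold inner_msg.SerializeAsString(); here a plain struct stands in so the routing logic is self-contained. The point of the pattern is that the router touches only the addressing fields and forwards the payload bytes verbatim, never decoding or re-encoding the inner message.

```cpp
#include <string>

// Stand-in for the generated Envelope message from the thread:
//   message Envelope {
//     required string to_address = 1;
//     optional string from_address = 2;
//     required bytes payload = 3;  // an encoded message
//   }
struct Envelope {
    std::string to_address;
    std::string from_address;
    std::string payload;  // opaque pre-encoded bytes, forwarded untouched
};

// A hypothetical router: inspects only the addressing field; the inner
// message stays encoded, which is the deferred-decoding win on the server.
const std::string& route(const Envelope& env) {
    return env.to_address;
}
```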