Re: ActiveMQ implementation of protobuf
I've been founding and contributing to open source projects for over nine years now, so I understand your situation. Here are my suggestions for encouraging users to contribute:

1) Folks have language preferences, so ideally the code generator for a language should be written in the language of the implementation. Why? Because then you have a better chance that a user will turn into a contributor, since they will be able to grok and be comfortable with all the parts of the implementation, including the code generator.
2) Some enhancements require more drastic changes than others. You should provide an avenue where folks can research and explore the bigger, more drastic changes within your project.
3) Be more open to contributor feedback. Even if an idea seems wacky at first, encourage the contribution and have it at least go into an experimental branch.

Regarding the Maven question: let me first explain the build challenges that most of the projects I participate in experience. I spend most of my time working on ActiveMQ, Camel, and ServiceMix. The common thread with these projects is that they are integration technologies, and as such their goal is to integrate and leverage the strengths of as many technologies as possible. The build challenge this presents is that the laundry list of dependencies needed to compile each project is mind-boggling. Manually installing all the dependencies is a waste of time. Maven automates dependency downloading, and this even includes downloading the Maven plugins that are used to compile a Maven build. The net result is that users of Maven builds hardly ever have to worry about having the right prerequisites installed before kicking off the build. Having to exec out to protoc would break that concept.

--
Regards,
Hiram

Blog: http://hiramchirino.com
Open Source SOA http://fusesource.com/

On Sep 19, 2:16 am, Kenton Varda ken...@google.com wrote:

Somehow I missed that message. Sorry about that.
I'd definitely like to have lazy parsing (as an option) in the official implementation. The reason I'm stressing is that there are a lot of things I'd like protocol buffers to have, but I don't have enough time to write them all myself, so I need help from contributors. Unfortunately it seems that a lot of people would rather write their own implementations from scratch than try to contribute to the main one -- you aren't the first person who has done this. That said, having competition is a good thing too.

Regarding Maven plugins -- why can't the plugin just invoke protoc using Runtime.exec()? What's the benefit of having the code generator running inside the Maven process? Honest question -- I don't know very much about Maven.

On Fri, Sep 18, 2009 at 7:36 PM, hi...@hiramchirino.com chir...@gmail.com wrote:

Firstly, I want to clarify that I did not write the benchmark that I plugged into. There is no ill intent. I published the benchmark so that folks take the time to look into why my implementation performed so much better. I think it's good to have healthy discussions about the pros and cons of alternative implementations which deliver different sets of features.

The main reason I started from scratch is that I wanted to implement a Java-based code generator so that it would be easy to embed in a Maven plugin or Ant task. Furthermore, it was just more expedient to start from a clean slate and design my ideal object model. I did ping this list over a year ago to gauge whether there would be any interest in collaborating, but did not garner interest. So, I did not pursue it further: http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea870...

Perhaps I'm misreading you, but it seems like there have been very few ideas from my implementation that you are actually interested in. So I'm not sure why you're stressing about me rolling this out as a new implementation.
Bottom line: I would LOVE IT if the Google implementation achieved feature parity with mine. That way it's one less code base I need to maintain! Best of luck, and if you do change your mind and want to poach any of the concepts or code, please feel free to do so.

Regards,
Hiram

On Sep 18, 9:40 pm, Kenton Varda ken...@google.com wrote:

I think the usual way we would have solved this problem at Google would be to have the message payload be encoded separately and embedded in the envelope as a bytes field, e.g.:

    message Envelope {
      required string to_address = 1;
      optional string from_address = 2;
      required bytes payload = 3;  // an encoded message
    }

It's not as transparent as your solution, but it is a whole lot simpler, and the behavior is easy to understand.

That said, again, there's nothing preventing lazy parsing from being added to Google's Java protobuf implementation, so I'm not sure why writing something
Re: ActiveMQ implementation of protobuf
Firstly, I want to clarify that I did not write the benchmark that I plugged into. There is no ill intent. I published the benchmark so that folks take the time to look into why my implementation performed so much better. I think it's good to have healthy discussions about the pros and cons of alternative implementations which deliver different sets of features.

The main reason I started from scratch is that I wanted to implement a Java-based code generator so that it would be easy to embed in a Maven plugin or Ant task. Furthermore, it was just more expedient to start from a clean slate and design my ideal object model. I did ping this list over a year ago to gauge whether there would be any interest in collaborating, but did not garner interest. So, I did not pursue it further: http://groups.google.com/group/protobuf/browse_thread/thread/fe7ea8706b40146f/bdd22ddf89e4a6d3?#bdd22ddf89e4a6d3

Perhaps I'm misreading you, but it seems like there have been very few ideas from my implementation that you are actually interested in. So I'm not sure why you're stressing about me rolling this out as a new implementation.

Bottom line: I would LOVE IT if the Google implementation achieved feature parity with mine. That way it's one less code base I need to maintain! Best of luck, and if you do change your mind and want to poach any of the concepts or code, please feel free to do so.

Regards,
Hiram

On Sep 18, 9:40 pm, Kenton Varda ken...@google.com wrote:

I think the usual way we would have solved this problem at Google would be to have the message payload be encoded separately and embedded in the envelope as a bytes field, e.g.:

    message Envelope {
      required string to_address = 1;
      optional string from_address = 2;
      required bytes payload = 3;  // an encoded message
    }

It's not as transparent as your solution, but it is a whole lot simpler, and the behavior is easy to understand.
That said, again, there's nothing preventing lazy parsing from being added to Google's Java protobuf implementation, so I'm not sure why writing something completely new was necessary.

As far as the performance arguments go, I'd again encourage you to create a benchmark that actually measures the performance of the case where the application code ends up accessing all the fields. If you really think there's no significant overhead, prove it. :)

I'd also suggest that you not publish benchmarks implying that your implementation is an order of magnitude faster at parsing without explaining what is really going on. It's rather misleading.

On Fri, Sep 18, 2009 at 5:53 PM, hi...@hiramchirino.com chir...@gmail.com wrote:

Hi Kenton,

Let me start off by describing my usage scenario. I'm interested in using protobuf to implement the messaging protocol between clients and servers of a distributed messaging system. For simplicity, let's pretend that the protocol is similar to XMPP and that there are servers which handle delivering messages to and from clients.

In this case, the server clearly is not interested in the meat of the messages being sent around. It is typically only interested in routing data. Here, deferred decoding provides a substantial win. Furthermore, when the server passes the message on to the consumer, it does not need to encode the message again. For important messages, the server may be configured to persist those messages as they come in, so the server would once again benefit from not having to encode the message yet again.

I don't think the user could implement those optimizations on their own without support from the protobuf implementation. At least not as efficiently and elegantly. You have to realize that the 'free encoding' holds true even for nested message structures in the message.
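The deferred decoding and 'free encoding' described above can be sketched roughly as follows. This is only an illustration in plain Java, not code from either protobuf implementation: the class names `ChatBean` and `LazyChatBuffer`, their methods, and the toy "to\nbody" wire format are all assumptions made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical decoded form of a message (the "bean"); immutable, all fields final.
final class ChatBean {
    final String to;
    final String body;

    ChatBean(String to, String body) {
        this.to = to;
        this.body = body;
    }
}

// Hypothetical immutable buffer: keeps the wire bytes and decodes only on demand.
final class LazyChatBuffer {
    private final byte[] encoded; // bytes exactly as they were received
    private ChatBean bean;        // stays null until a field is first accessed

    LazyChatBuffer(byte[] encoded) {
        this.encoded = encoded.clone();
    }

    // Toy wire format for illustration only (NOT the protobuf encoding):
    // "to", a newline, then "body", all as UTF-8.
    static byte[] encode(ChatBean b) {
        return (b.to + "\n" + b.body).getBytes(StandardCharsets.UTF_8);
    }

    private static ChatBean decode(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        int sep = s.indexOf('\n');
        if (sep < 0) {
            throw new IllegalArgumentException("malformed message");
        }
        return new ChatBean(s.substring(0, sep), s.substring(sep + 1));
    }

    // Deferred decoding: one null check per access. Two threads may both decode,
    // but they produce equal immutable beans, so it does not matter which write
    // wins. This relies on ChatBean having only final fields.
    ChatBean bean() {
        ChatBean b = bean; // read the field once
        if (b == null) {
            b = decode(encoded);
            bean = b;
        }
        return b;
    }

    String getTo() {
        return bean().to;
    }

    // "Free encoding": forwarding or persisting reuses the original bytes.
    byte[] toByteArray() {
        return encoded.clone();
    }

    boolean isDecoded() {
        return bean != null;
    }
}
```

In this sketch a routing server would only ever call `toByteArray()` to forward or persist a message, so it never pays for decoding; a consumer that calls `getTo()` triggers the one-time decode.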
So let's say that the user is aggregating data from multiple source protobuf messages, picking data out of them and placing it into a new protobuf message that then gets encoded. Only the outer message would need encoding; the inner nested elements that were picked from the other buffers would benefit from the 'free encoding'.

The overhead of the lazy decoding is exactly 1 extra if (bean == null) statement, which is probably cheaper than most virtual dispatch invocations. But if you're really trying to milk the performance out of your app, you should just call buffer.copy() to get the bean backing the buffer. All get operations on the bean do NOT have the overhead.

Regarding threading, since the buffer is immutable and decoding is idempotent, you don't really need to worry about thread safety. The worst-case scenario is that 2 threads decode the same buffer concurrently and then set the bean field of the buffer. Since the resulting beans are equal, in most cases it would not really matter which thread wins when they overwrite the bean field.

As for up front validation, in my use case, deferring