[protobuf] Customizing protoc to support an existing wire format

2014-05-27 Thread Chris Beams
Hello,

I'd like to see if any prior work has been done in customizing protobuf 
compilation to support message encoding/decoding against a legacy wire 
format.

Put another way, I'm interested in:

 1. specifying an existing protocol using protobuf's .proto file syntax, and
 2. reusing protobuf's .proto file parsing and code generation 
infrastructure, while
 3. replacing protobuf's default encoding algorithm with one that conforms 
to an existing format.

This discussion from 2013 is the closest thing I've found to a similar 
question on this mailing list. Unfortunately it doesn't go into much detail:

 https://groups.google.com/forum/#!topic/protobuf/zvughVLk6BU

Some context will probably be of use. The existing wire format in question 
is that of Bitcoin's peer-to-peer network protocol. These messages and 
their binary representations are defined in this document:

 https://en.bitcoin.it/wiki/Protocol_specification#Message_types
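
To give a sense of how this format differs from protobuf's tag-and-varint encoding, here is a minimal Python sketch of two building blocks described in that specification: the variable-length integer and the 24-byte message header. The function names are chosen for illustration only, the magic value is the main-network one, and the code is a sketch under those assumptions rather than a reference implementation.

```python
import hashlib
import struct

# Main-network magic bytes as they appear on the wire.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def encode_var_int(n: int) -> bytes:
    """Encode a non-negative integer as a Bitcoin variable-length integer."""
    if n < 0:
        raise ValueError("var_int cannot be negative")
    if n < 0xFD:
        return struct.pack("<B", n)
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)
    if n <= 0xFFFFFFFFFFFFFFFF:
        return b"\xff" + struct.pack("<Q", n)
    raise ValueError("var_int does not fit in 64 bits")


def encode_message(command: str, payload: bytes = b"") -> bytes:
    """Prefix a payload with the header: magic, command, length, checksum."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command names are at most 12 bytes")
    # The checksum is the first four bytes of a double SHA-256 of the payload.
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return (
        MAINNET_MAGIC
        + name.ljust(12, b"\x00")        # command, NUL-padded to 12 bytes
        + struct.pack("<I", len(payload))  # payload length, little-endian
        + checksum
        + payload
    )
```

None of these fields carries a field number or wire type, which is why protobuf's default encoder cannot simply be reused.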

Note that protocol buffers were considered for use during Bitcoin's initial 
development, but rejected on concerns around complexity and security:

 https://bitcointalk.org/index.php?topic=632.msg7090#msg7090

Whether or not those concerns were well-founded, Bitcoin's resulting wire 
format works well today, and for this reason, changing it is not considered 
to be an option.

The impetus for this question, then, is that there are an increasing number 
of implementations of the Bitcoin protocol under development today, and in 
order to participate in the peer-to-peer network, each must faithfully 
re-implement handling this custom wire format. Typically this work is done 
through a combination of studying the documentation above and carefully 
transcribing code from the Bitcoin Core reference implementation. This 
creates a significant barrier to entry as well as a potential source of 
bugs that can threaten network stability.

To avoid this tedious and error-prone work, there is a desire to codify the 
message formats in such a way that language-specific bindings may be 
generated rather than hand-coded.

The encoding algorithm and code generation for each specific language would 
of course have to be custom developed, but the idea is to do so within an 
otherwise widely-used framework such as protocol buffers, minimizing the 
need to re-invent as much as possible.

I have not yet looked deeply at the extension points within protocol 
buffers to assess the feasibility of this idea. I have seen that protoc 
supports plugins [1], but don't know whether anyone has gone so far with 
them as to replace fundamental assumptions about wire format. I have also 
noticed Custom Options [2], which may help in expressing particular 
quirks or nuances of the existing protocol within .proto files.

At this point, I'd simply like to see whether anyone has been down this 
road before, and whether there are reasons for dismissing the idea 
completely before digging in too much further.

- Chris

P.S: Please note that in posting this question I am in no way presuming to 
represent the Bitcoin Core development team.

[1]: https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.compiler.plugin.pb
[2]: https://developers.google.com/protocol-buffers/docs/proto#options

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to protobuf+unsubscr...@googlegroups.com.
To post to this group, send email to protobuf@googlegroups.com.
Visit this group at http://groups.google.com/group/protobuf.
For more options, visit https://groups.google.com/d/optout.


Re: [protobuf] Customizing protoc to support an existing wire format

2014-05-27 Thread 'Feng Xiao' via Protocol Buffers
There are two ways to support a custom wire format with protobufs.
  1. Implement the parsing/serializing code as a runtime library. The text
format support in protobuf can be seen as such a library. Support for
JSON/XML is also done using this approach. It relies on protobuf's
reflection support, which allows you to query the type information of a
protobuf message and manipulate arbitrary protobuf messages (like the
reflection support in languages such as Java, but for protobufs).
  2. Override protobuf's code generation behavior to inject code into the
generated classes or to generate completely new custom classes. To do this
you have two choices: build a custom protoc binary, or use plugins. Both
require an implementation of the CodeGenerator interface. Many people who
implement protobufs for languages other than the officially supported C++,
Java and Python build a custom protoc binary, probably because it's easy to
do and the use of plugins is not well documented. Using plugins is the
recommended approach, though.
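
The runtime-library option can be sketched without any protobuf dependency. In the Python below, the hypothetical `FieldInfo` list stands in for the type information that protobuf reflection would provide, and its `encoding` attribute stands in for a per-field hint such as a custom option. None of these names come from the protobuf API; this is an illustration of the idea only.

```python
import struct
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a reflected field descriptor."""
    name: str
    encoding: str  # hypothetical per-field hint, e.g. from a custom option


def _var_str(value: str) -> bytes:
    """Length-prefixed string; this sketch only handles short strings."""
    data = value.encode("utf-8")
    if len(data) >= 0xFD:
        raise ValueError("this sketch only handles strings under 253 bytes")
    return bytes([len(data)]) + data


# One encoder per encoding hint; all integers are little-endian.
ENCODERS: Dict[str, Callable[[Any], bytes]] = {
    "uint32_le": lambda v: struct.pack("<I", v),
    "int64_le": lambda v: struct.pack("<q", v),
    "var_str": _var_str,
}


def serialize(fields: List[FieldInfo], message: Dict[str, Any]) -> bytes:
    """Encode a message by walking its field list in declaration order."""
    return b"".join(ENCODERS[f.encoding](message[f.name]) for f in fields)


# Hypothetical description of a message with three fields.
EXAMPLE_FIELDS = [
    FieldInfo("version", "uint32_le"),
    FieldInfo("timestamp", "int64_le"),
    FieldInfo("user_agent", "var_str"),
]
```

Because every field is looked up at run time, a library like this is quick to try out but does more work per message than generated code would.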

If you are only implementing this wire format for C++ and don't care that
much about performance, approach 1 is probably easier to try out. Otherwise
you'll need the code generation approach 2. For examples, have a look at
the third-party add-ons list:
https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns
The programming languages section covers how to generate new classes, and
the RPC section has examples of how to insert code into existing generated
code. They are not exactly the same as supporting a new wire format, but the
implementation techniques should be no different. In your case, you can
either generate new custom classes to encode/decode the custom wire
format, or generate additional parsing/serializing methods in the existing
classes. Note that support for the latter is limited: it's best
supported in C++ and not very well supported in Java/Python. For other
languages it might not be supported at all.
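
Generating new custom classes might look roughly like the following Python sketch. It is not a protoc plugin: a real one would implement the CodeGenerator interface mentioned above and take its input from protoc, whereas the hypothetical `generate_encoder` function here works from a plain list of (field name, struct format character) pairs and emits Python source text.

```python
from typing import List, Tuple


def generate_encoder(message_name: str, fields: List[Tuple[str, str]]) -> str:
    """Return Python source for a function that packs the given fields.

    Each field is a (name, struct format character) pair; all values are
    written little-endian, without padding, in declaration order.
    """
    fmt = "<" + "".join(code for _, code in fields)
    args = ", ".join(name for name, _ in fields)
    return (
        "import struct\n"
        "\n"
        f"def encode_{message_name}({args}):\n"
        f"    return struct.pack({fmt!r}, {args})\n"
    )


# Hypothetical message with a 32-bit and a 64-bit unsigned field.
SOURCE = generate_encoder("ping", [("version", "I"), ("nonce", "Q")])
```

The generated text in `SOURCE` would normally be written to a file by the build; the layout is fixed at generation time, so the emitted function does no per-field lookups when it runs.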

On Mon, May 26, 2014 at 3:56 AM, Chris Beams ch...@beams.io wrote:

 Hello,

 I'd like to see if any prior work has been done in customizing protobuf
 compilation to support message encoding/decoding against a legacy wire
 format.

 Put another way, I'm interested in:

  1. specifying an existing protocol using protobuf's .proto file syntax,
 and
  2. reusing protobuf's .proto file parsing and code generation
 infrastructure, while
  3. replacing protobuf's default encoding algorithm with one that
 conforms to an existing format.

Plugins allow you to insert code into existing generated code, but you
won't be able to replace existing code. As I mentioned above, this is only
well supported in C++. If that is not a concern for you, I would be happy to
give you more details on how to implement such a plugin.
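
The insert-only behaviour can be pictured with a small Python simulation. Generated C++ files contain marker comments of the form `// @@protoc_insertion_point(NAME)`, and plugin output that names an insertion point is placed immediately above the matching marker, leaving the rest of the file untouched. The `insert_at_point` helper and the sample header text below are hypothetical and only mimic that behaviour; they do not call protoc.

```python
def insert_at_point(generated: str, point: str, extra: str) -> str:
    """Insert `extra` immediately above the marker line for `point`."""
    marker = f"@@protoc_insertion_point({point})"
    out = []
    found = False
    for line in generated.splitlines():
        if marker in line:
            found = True
            # Copy the marker's indentation onto each inserted line.
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + extra_line for extra_line in extra.splitlines())
        out.append(line)
    if not found:
        raise KeyError(f"no insertion point named {point!r}")
    return "\n".join(out) + "\n"


# Hypothetical excerpt of a generated C++ header.
GENERATED = (
    "class Ping {\n"
    " public:\n"
    "  // @@protoc_insertion_point(class_scope:Ping)\n"
    "};\n"
)

# Add a declaration for a hypothetical legacy-format serializer.
PATCHED = insert_at_point(
    GENERATED,
    "class_scope:Ping",
    "bool SerializeToLegacyFormat(std::string* out) const;",
)
```

Note that the existing lines, including the marker itself, survive unchanged: new members can be added to the class, but the default serialization code stays in place.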




Re: [protobuf] Customizing protoc to support an existing wire format

2014-05-27 Thread David Yu
On Mon, May 26, 2014 at 6:56 PM, Chris Beams ch...@beams.io wrote:

 I have not yet looked deeply at the extension points within protocol
 buffers to assess the feasibility of this idea. I have seen that protoc
 supports plugins [1], but don't know whether anyone has gone so far with
 them as to replace fundamental assumptions about wire format. I have also
 noticed Custom Options [2], which may help in expressing particular
 quirks or nuances of the existing protocol within .proto files.

 At this point, I'd simply like to see whether anyone has been down this
 road before, and whether there are reasons for dismissing the idea
 completely before digging in too much further.

Check out https://code.google.com/p/protostuff/
It uses proto files for compilation/code generation but does not really
implement the full proto spec.
Custom options and annotations have been supported from the start (along
with external compiler options) to aid code generation for specific
languages/formats.



-- 
When the cat is away, the mouse is alone.
- David Yu
