Re: Protobuf's Missing Features

Kenton Varda Fri, 07 Nov 2008 19:06:47 -0800

On Fri, Nov 7, 2008 at 5:14 PM, code_monkey_steve <
[EMAIL PROTECTED]> wrote:


>
> After playing with protobuf for the last few months, I've decided that
> it's not quite suitable for my purposes, due to some design decisions
> (which I'm sure seemed like a good idea at the time).  As much as
> I hate reinventing the wheel, I've decided to create my own message
> encoding framework implementing the features below ("And blackjack!
> And hookers!  Ah, who needs the framework"), while maintaining wire-
> level compatibility with protobuf.


Good luck with that.  It's more work than you might expect.

> 1. XML vs. Yet-Another-Proprietary-File-Format
> The arguments against using XML at the wire-level are well documented,
> but why, oh why, couldn't you have made the message definition format
> (.proto) XML-based?


Because XML is too verbose and, frankly, really hard to read.

<message name="Foo">
  <field name="foo" number="1" type="int32" label="optional"/>
  <field name="bar" number="2" type="string" label="repeated"/>
</message>

vs.

message Foo {
  optional int32 foo = 1;
  repeated string bar = 2;
}

> Now every language has to code and debug (!)
> their own parser, and there's no way to add meta-data to the message
> definitions.


Actually, libprotoc allows you to reuse protoc's implementation, so there's
no need for anyone to write their own parser.

http://code.google.com/apis/protocolbuffers/docs/reference/cpp/google.protobuf.compiler.command_line_interface.html

If you can't stand writing your code generator in C++, you can always invoke protoc
with the --descriptor_set_out option to parse the .proto files and convert
them into a FileDescriptorSet, which is itself a protocol buffer (see
src/google/protobuf/descriptor.proto).  You can then parse that in any
language that supports protobufs and generate your code based on it.
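
For orientation, the part of that schema a code generator walks looks
roughly like this.  This is an abridged excerpt written from memory, with
most fields left out, so treat src/google/protobuf/descriptor.proto as the
authority:

```proto
// Abridged excerpt of descriptor.proto; many fields are omitted here.
message FileDescriptorSet {
  repeated FileDescriptorProto file = 1;
}

message FileDescriptorProto {
  optional string name = 1;     // file name, e.g. "foo.proto"
  optional string package = 2;
  repeated DescriptorProto message_type = 4;
}

message DescriptorProto {
  optional string name = 1;     // message name, e.g. "Foo"
  repeated FieldDescriptorProto field = 2;
}

message FieldDescriptorProto {
  optional string name = 1;     // field name, e.g. "bar"
  optional int32 number = 3;    // field number, e.g. 2
  // label, type, type_name and options are omitted from this excerpt
}
```

A generator in another language would parse the descriptor set as a
FileDescriptorSet and loop over file, then message_type, then field.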


> This is my single biggest complaint, and the one reason protobuf is
> unsuitable for my project:  the message definitions need to include
> enough information to dynamically generate the user interface for both
> displaying and composing messages.


You can do this with custom options.  For example, to annotate fields with
descriptions for use in a UI:

  import "google/protobuf/descriptor.proto";
  extend google.protobuf.FieldOptions {
    optional string description = 12345;
  }

  message Foo {
    optional int32 foo = 1 [(description) = "The foo field."];
    repeated string bar = 2 [(description) = "The bar field."];
  }

This is a new feature and I admit it is not adequately documented at the
moment.


> 2. Message Inheritance (vs. Extensions?)
> Are there any languages left that don't support single-inheritance,
> even C?  Reserving a zero'th message field for a base message class
> uses almost no overhead, and allows for a nice message class
> hierarchy.
>
> Perhaps I just don't grok Extensions, but they seem more like a safety
> feature than a re-usability mechanism.


This question is asked so often that I have a canned response ready:

================================

Many people have observed that extensions solve similar problems to
inheritance, and wonder why Protocol Buffers do not implement inheritance
instead. The short answer is that extensions fit better into the Protocol
Buffer model, whereas inheritance creates many difficult questions and
significantly complicates both interface and implementation. The long answer
(copied from an e-mail discussion) follows.

When people talk about protocol buffer inheritance, there are generally
two distinct ways they want to use it: (1) Cases where the consumer of the
message knows exactly which subclass they expect to receive. In this case,
all the user really wants is to be able to define a message which has all
the same fields as some base message plus some extras specific to their app.
(2) Cases where the consumer does not necessarily know which subclass it
will receive, and wants to be able to check what kind of message it has
received after receiving it (like a "dynamic_cast" or "instanceof").

Our feeling about case 1 is that the best way to accomplish it is to simply
embed an instance of the "base" message into your "derived" message. Sure,
we could add a whole lot of code generation which makes this look like
inheritance, but it does not seem worth the effort. Besides, this is
arguably "implementation inheritance", which many believe is not good O-O
design.
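
A minimal sketch of that embedding approach.  The message and field names
here (Base, Derived, app_specific and so on) are made up for illustration:

```proto
// Hypothetical "base" message holding the shared fields.
message Base {
  optional string id = 1;
  optional int64 created_at = 2;
}

// The "derived" message embeds Base as an ordinary field instead of
// inheriting from it, then adds its own app-specific fields.
message Derived {
  optional Base base = 1;
  optional string app_specific = 2;
}
```

Code that only cares about the shared fields can be handed the embedded
Base on its own.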

If we wanted to go further and make the wire format be compatible between
base classes and derived classes (which it seems many people would expect),
it would either add a bunch of complication to the parsing code or would
require that each subclass contain a complete copy of the superclass's
parser, extended with the subclass's additional fields.

Additionally, the descriptor and reflection interfaces would have to be
updated to know about subclassing, etc., which is complicated.

Overall, it just doesn't seem worth the added complexity.

Case 2 is more interesting. This is the case extensions were designed to
address. The previous solution -- MessageSet -- is used a lot at Google, and
we frequently see a single MessageSet containing several messages.

The most obvious problem with using inheritance in this case is that we
would need multiple inheritance even just to cover existing use cases. Many
people object to multiple inheritance for many reasons.

Now, even if we pretended the multiple-extension use cases didn't exist, it
would still be extremely difficult to solve this problem using an
inheritance model. For example, if you don't know what message type you're
receiving, how do you know what class to use to parse it? The wire format
would have to identify this somehow -- before the actual data started --
which would be hacky (if not impossible) to do without breaking
backwards-compatibility. Alternatively, you could put the data to the side
and not actually instantiate the subclass until someone attempts to
"down-cast" the object, but that's awkward.

Add this to all the same design issues listed in case 1 and the fact that
people frequently want a single message to contain multiple extensions and
we see that inheritance just is not the right solution here.
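
As a rough sketch of how extensions cover this case, here is one message
carrying two independent extensions at once.  All names and the extension
number range are hypothetical, chosen only for illustration:

```proto
// An extendable message; field numbers 100 to 199 are reserved for extensions.
message Envelope {
  optional string sender = 1;
  extensions 100 to 199;
}

message TextPayload {
  optional string text = 1;
}

message ImagePayload {
  optional bytes png = 1;
}

// Two unrelated extensions of the same message.  A single Envelope may
// carry either one, both, or neither.
extend Envelope {
  optional TextPayload text_payload = 100;
  optional ImagePayload image_payload = 101;
}
```

The receiver always parses an Envelope and then asks which extensions are
present (in C++, envelope.HasExtension(text_payload)), so it never needs to
know a "subclass" before parsing.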

=============================


> 3. Typedefs
> E.g., "UUID=string", "Timestamp=double", etc.  Syntactic sugar is
> always good.


Even many fully-featured programming languages -- e.g. Java -- don't provide
this.

> 4. Built-in UUID Type
> There are lots of other built-in types I'd like to have, but I think
> this one's a must for a message encoder.


What's wrong with defining your own UUID message?  What would we gain from
having it built-in?
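
A minimal sketch of such a message, assuming the UUID is stored as its 16
raw bytes.  The names UUID, value and Event are made up here:

```proto
// Hypothetical user-defined UUID type.
message UUID {
  optional bytes value = 1;  // the 16 raw bytes of the UUID
}

// Any message can then use it like a built-in type.
message Event {
  optional UUID id = 1;
  optional string name = 2;
}
```

Wrapping a value in a small message like this also gives you a named type
in place of a typedef.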
