Re: Backwards Compatibility of sizes and encodings.

2009-04-16 Thread Kenton Varda
We will absolutely maintain backwards compatibility of the wire format in
future versions.  A version of protocol buffers that wasn't backwards
compatible would be thoroughly useless.
However, our idea of compatibility means that newer versions of the code
can successfully parse messages produced by older versions and vice-versa.
 Although it seems unlikely that the encoded size of a message (containing
exactly the same data) would change in future versions of the serialization
code, this isn't a guarantee I feel comfortable making.  Even if you use
only fixed-width field types, there are many different technically-valid
ways to encode the data which could very well have different sizes (e.g. by
using overlong varints when encoding tags, or by splitting an optional
sub-message into multiple parts).
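That encoding latitude can be illustrated with a small sketch. These are minimal pure-Python varint helpers written for this example (the names are ours, not part of the protobuf library): the decoder happily accepts an overlong varint, so two different byte sequences carry the same value.

```python
# Sketch only: hand-rolled varint helpers, not the protobuf API.
def encode_varint(n, min_bytes=1):
    """Encode n as a varint; min_bytes > 1 forces an overlong encoding."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n or len(out) + 1 < min_bytes:
            out.append(low | 0x80)   # set the continuation bit
        else:
            out.append(low)
            return bytes(out)

def decode_varint(buf):
    """Return (value, bytes_consumed)."""
    result = shift = 0
    for i, b in enumerate(buf):
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return result, i + 1
    raise ValueError("truncated varint")

# Canonical and overlong forms differ in size but decode identically:
# encode_varint(1)              -> b'\x01'      (1 byte)
# encode_varint(1, min_bytes=2) -> b'\x81\x00'  (2 bytes)
```

Both forms decode to 1, which is exactly why a "fixed" serialized size can't be guaranteed across encoder versions.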

But I think assuming that messages of a particular type will always be the
same size is a bad idea anyway, even if you stick with the same version of
protocol buffers.  If you make this assumption, not only do you have to
avoid using variable-width fields, but you can never add new fields to your
message definition.  This defeats one of the most valuable features of
protocol buffers.

I think you should just write the size of your header message to the stream
before the message itself.  If you write it as a varint, this will probably
only cost you a byte, and you'll probably save at least a byte by using
varints inside your message rather than fixed-width fields.
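A minimal sketch of that suggestion, with a hand-rolled varint length prefix over an in-memory stream (the helper names are invented for this example; in practice the payload would be `header.SerializeToString()`):

```python
import io

def write_delimited(stream, payload):
    """Write len(payload) as a varint, then the payload bytes."""
    n = len(payload)
    while n > 0x7F:
        stream.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    stream.write(bytes([n]))
    stream.write(payload)

def read_delimited(stream):
    """Read a varint length, then that many payload bytes."""
    size = shift = 0
    while True:
        b = stream.read(1)[0]
        size |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return stream.read(size)

# Round-trip two records through one stream.
buf = io.BytesIO()
write_delimited(buf, b'first header')
write_delimited(buf, b'second')
buf.seek(0)
```

For payloads under 128 bytes the prefix is the single byte Kenton mentions.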

On Thu, Apr 16, 2009 at 3:55 PM, Chris Brumgard chris.brumg...@gmail.com wrote:


 I have a question regarding the future direction of protocol buffers.
 Is Google planning on adding features or changing the encoding of data
 types in any way that would break backwards compatibility?  I've read
 through the posts and it appears that the developers will try to
 maintain compatibility as much as possible.  My primary concern is
 that I plan on using a header message type that includes various
 fields to describe the next message including type and size.  Because
 I would be using fixed integer sizes (no varints) in the header, I
 will know the size of the header in advance and therefore wouldn't need
 to give the size in the stream.  However, this makes the assumption
 that future versions of Protocol Buffers will not change the size of
 the serialized header or the individual fields.  Since the header has
 more than just size information, I would prefer to use a protocol
 buffer message instead of straight binary: it is easier for languages
 that don't make it simple to convert binary to native data types, and
 it removes concerns about endianness and data type sizes (the work is
 already done for me).  My other option is to use text strings, as
 almost all languages make it simple to convert strings to native data
 types, but I would prefer to keep the wire protocol pure.  Some of
 my fellow developers also have concerns about freezing development to
 one particular version of protocol buffers.  How realistic is it to
 expect the encoding and size of this type of message to remain
 unchanged or infrequently changed?
 





Re: Backwards Compatibility of sizes and encodings.

2009-04-16 Thread Chris Brumgard


Thanks for getting back to me on this.  It's been a while, but I
believe I've seen several posts that use something akin to the
following:

message A
{
  ...
}

message B
{
  ...
}

message wrapper
{
  required fixed32 size = 1;
  required fixed32 type = 2;

  optional A a = 3;
  optional B b = 4;
}


So message wrapper would be used for the actual sending of messages A
or B.  You would peek at the size with an initial ParseFromString()
covering just size and type, then go back and deserialize the whole
message given by size.  Doesn't this still require knowing in advance
how many bytes to read to cover size and type, or have I missed
something?  I assume this would have a similar problem to the one that
you mentioned.  I wanted to do something similar but separate out size
and type (and a few other fields) into their own message.  It also
seems like all of the optional fields would slow down parsing.
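If we assume the encoder emits the two required fixed32 fields in order with single-byte tags (the usual output, though, as noted above, not the only legal encoding), the size/type prefix of wrapper is always 10 bytes, and the peek can be sketched with struct; the helper names here are invented for illustration:

```python
import struct

# Each fixed32 field serializes as a 1-byte tag (field_number << 3 | 5,
# where 5 is the fixed32 wire type) followed by 4 little-endian bytes,
# so fields 1 and 2 together occupy 10 bytes.
HEADER_LEN = 10
TAG_SIZE = (1 << 3) | 5   # field 1 (size): tag byte 0x0D
TAG_TYPE = (2 << 3) | 5   # field 2 (type): tag byte 0x15

def encode_prefix(size, msg_type):
    """Serialize the size/type fields the way a typical encoder would."""
    return struct.pack('<BIBI', TAG_SIZE, size, TAG_TYPE, msg_type)

def peek_prefix(buf):
    """Read size and type from the first HEADER_LEN bytes of buf."""
    tag1, size, tag2, msg_type = struct.unpack('<BIBI', buf[:HEADER_LEN])
    if (tag1, tag2) != (TAG_SIZE, TAG_TYPE):
        raise ValueError("unexpected field order or encoding")
    return size, msg_type
```

The ValueError branch is the catch: a future encoder could legally reorder fields or use overlong tags, which is exactly the fragility discussed above.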


I can agree that it might be poor design in general, but given the
common need for type and size, I'm not sure this case shouldn't be
allowed to break that rule.  As a developer of protocol buffers, what
was the rationale for leaving type and size out of the protocol and
requiring users to specify them themselves?  Was that to avoid breaking
compatibility with 1.0, or for speed?

By the way, how is writing the message size and type as independent
protocol buffer varints any different from declaring message
header { required int32 size = 1; required int32 type = 2; }?  Is this
just a design philosophy?  In truth, you're still creating a message
buffer; it's just implicitly defined and unnamed, as opposed to
explicitly defined in a .proto file somewhere, and you can't add
fields to it either without updating all the pre-existing client code
(I guess you could add an options-count int before the main message to
allow for more fields later, but that gets complicated for something
that should be simple).  I'm not opposed to just the varints.  I can
see how you would do this with the C++ and Java APIs, but how would
you do it in Python?  The OutputStream and InputStream classes in the
internal directory?

Thanks for your patience.



On Apr 16, 8:25 pm, Kenton Varda ken...@google.com wrote: