Thanks for getting back to me on this. It's been a while, but I
believe I've seen several posts that use something akin to the
following:
message A
{
  ...
}

message B
{
  ...
}

message wrapper
{
  required fixed32 size = 1;
  required fixed32 type = 2;
  optional A a = 3;
  optional B b = 4;
}
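If I'm reading the trick right, it depends on fields 1 and 2 being fixed32: assuming the serializer writes minimal single-byte tags in field order, the first ten bytes of a serialized wrapper always hold size and type. A rough Python sketch, decoding those ten bytes by hand rather than through the generated classes (the function name is mine, not a protobuf API):

```python
import struct

# Assumed wire layout for the wrapper message above:
# field 1 (size, fixed32): tag byte (1 << 3) | 5 = 0x0D, then 4 bytes little-endian
# field 2 (type, fixed32): tag byte (2 << 3) | 5 = 0x15, then 4 bytes little-endian
HEADER_LEN = 10

def peek_header(buf):
    """Read size and type from the first 10 bytes of a serialized wrapper."""
    if len(buf) < HEADER_LEN or buf[0] != 0x0D or buf[5] != 0x15:
        raise ValueError("unexpected header layout")
    size = struct.unpack_from("<I", buf, 1)[0]
    type_ = struct.unpack_from("<I", buf, 6)[0]
    return size, type_
```

Of course this bakes in exactly the assumption being questioned in this thread: that the encoder emits fields in numeric order with minimal tags.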
So message wrapper would be used for the actual sending of messages A
or B. You would apparently peek at the size with an initial
ParseFromString() covering just size and type, then go back and
deserialize the whole message given by size. Doesn't this still place
a requirement on knowing how much to read to cover size and type, or
have I missed something? I assume this would have a similar problem to
the one that you mentioned. I wanted to do something similar, but
separating out size and type (and a few others) into a separate
message. It seems like all of the optional fields would slow down
parsing.
I can agree that it might be poor design in general, but given the
common need for type and size, I'm not sure this case shouldn't be
allowed to break that rule. As a developer of protocol buffers, what
was the rationale for leaving type and size out of the protocol and
requiring users to specify them themselves? Was it just to avoid
breaking compatibility with 1.0, or for speed?
By the way, how is writing the message size and type as independent
protocol buffer varints any different than doing something like
message header { required int32 size = 1; required int32 type = 2; }?
Is this just a design philosophy? In truth, you're still creating a
message buffer; it's just implicitly defined and unnamed as opposed to
explicitly defined in a .proto file somewhere, and you can't add
fields to it either without updating all the pre-existing client code.
(I guess you could add an options count int before the main message to
allow for more fields later, but that gets complicated for something
that should be simple.) I'm not opposed to just the varints. I can see
how you would do this with the C++ and Java APIs, but how would you do
it in Python? The OutputStream and InputStream classes in the internal
directory?
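For what it's worth, a varint is simple enough to write by hand in Python without touching anything in the internal directory. A minimal sketch (the function names are mine, not part of the protobuf API):

```python
def encode_varint(n):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(b)         # high bit clear: last byte
            return bytes(out)

def decode_varint(buf, pos=0):
    """Decode a varint from buf starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            return result, pos
```

For example, 300 encodes to the two bytes 0xAC 0x02, matching the protobuf wire-format documentation.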
Thanks for your patience.
On Apr 16, 8:25 pm, Kenton Varda ken...@google.com wrote:
We will absolutely maintain backwards-compatibility of the wire format in
future versions. A version of protocol buffers that wasn't backwards
compatible would be thoroughly useless.
However, our idea of compatibility means that newer versions of the code
can successfully parse messages produced by older versions and vice-versa.
Although it seems unlikely that the encoded size of a message (containing
exactly the same data) would change in future versions of the serialization
code, this isn't a guarantee I feel comfortable making. Even if you use
only fixed-width field types, there are many different technically-valid
ways to encode the data which could very well have different sizes (e.g. by
using overlong varints when encoding tags, or by splitting an optional
sub-message into multiple parts).
But I think assuming that messages of a particular type will always be the
same size is a bad idea anyway, even if you stick with the same version of
protocol buffers. If you make this assumption, not only do you have to
avoid using variable-width fields, but you can never add new fields to your
message definition. This defeats one of the most valuable features of
protocol buffers.
I think you should just write the size of your header message to the stream
before the message itself. If you write it as a varint, this will probably
only cost you a byte, and you'll probably save at least a byte by using
varints inside your message rather than fixed-width fields.
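Putting that suggestion together, the framing pattern might look like the following Python sketch, with the varint length prefix written inline (names are mine; this is one way to do it, not an official protobuf API):

```python
import io

def write_delimited(stream, payload):
    """Write len(payload) as a varint length prefix, then the payload bytes."""
    n = len(payload)
    while n > 0x7F:
        stream.write(bytes([(n & 0x7F) | 0x80]))  # continuation bit set
        n >>= 7
    stream.write(bytes([n]))
    stream.write(payload)

def read_delimited(stream):
    """Read one varint-prefixed payload back from the stream."""
    n = shift = 0
    while True:
        b = stream.read(1)[0]
        n |= (b & 0x7F) << shift
        shift += 7
        if not (b & 0x80):
            break
    return stream.read(n)
```

Here payload would be the bytes returned by SerializeToString() on the header message, and the reader would hand the recovered bytes to ParseFromString().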
On Thu, Apr 16, 2009 at 3:55 PM, Chris Brumgard
chris.brumg...@gmail.com wrote:
I have a question regarding the future direction of protocol buffers.
Is Google planning on adding features or changing the encoding of data
types in any way that would break backwards compatibility? I've read
through the posts and it appears that the developers will try to
maintain compatibility as much as possible. My primary concern is
that I plan on using a header message type that includes various
fields to describe the next message, including type and size. Because
I would be using fixed integer sizes (no varints) in the header, I
would know the size of the header in advance and therefore wouldn't
need to give the size in the stream. However, this makes the
assumption that future versions of Protocol Buffers will not change
the size of the serialized header or the individual fields. Since the
header has more than just size information, I would prefer to use a
protocol buffer message instead of straight binary, as it makes things
easier for languages that do not make it easy to convert binary to
native data types, and it removes concerns about endianness and data
type sizes (the work is already done for me). My other option is