Bryan Duxbury wrote:
I don't mean to discourage you from thinking about more optimal structures,
but man, this doesn't strike you as overcomplicated? I can tell you from
experience that managing a stack of metadata (in the compact protocol) is
non-trivial both in terms of complexity and performance - and that's just
field IDs.
Not really; it's as complicated as it needs to be to satisfy these goals:
- allow fields to be present or absent
- allow for incremental encoding
- allow for a type-less representation.
It allows for a very efficient serializer and deserializer; in
particular it may even make it possible to do in-place/zero-copy
deserialization, something that is a little difficult to do with the
types being interspersed with the data.
[And BTW - do you *REALLY* think this is complicated? I'd estimate that
this is a few hundred lines of code at most - in fact, I'd call this
pretty straightforward, if not trivial]
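To give a feel for the size of such a codec, here is a minimal sketch of one way to meet the three goals above. It is an illustration only, not the encoding proposed in this thread: it assumes both ends share a fixed schema, and SCHEMA, encode() and decode() are hypothetical names. A one-byte presence bitmap says which fields are there, and the values follow with no type tags.

```python
import struct

# Hypothetical schema known to both ends: (field name, struct format), in field order.
SCHEMA = [("id", "<i"), ("score", "<d"), ("flags", "<H")]

def encode(record):
    """Emit a presence bitmap followed by the untagged values of present fields."""
    bitmap = 0
    body = bytearray()
    for bit, (name, fmt) in enumerate(SCHEMA):
        if name in record:
            bitmap |= 1 << bit              # mark the field as present
            body += struct.pack(fmt, record[name])  # appended incrementally
    return bytes([bitmap]) + bytes(body)

def decode(data):
    """Read fields straight out of the buffer at offsets implied by the schema."""
    bitmap = data[0]
    offset = 1
    record = {}
    for bit, (name, fmt) in enumerate(SCHEMA):
        if bitmap & (1 << bit):
            (record[name],) = struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
    return record
```

Because no type bytes sit between the values, the decoder can work out every offset from the bitmap and the schema alone, which is what would make an in-place reader plausible.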
Additionally, including the type information gives us more than just the
ability to skip correctly. It also makes the serialized data fully
described.
That I agree with, though it's described at the duck-typing level, not the
strong-typing level.
This makes it easy to debug it when you might think something is
going wrong, or write a generic tool that can digest serialized Thrift for
some reason.
Again, I agree, but that is NOT the reason that was given for the
fully typed design.
Instead, what was claimed was that it was done this way since "..using
the type-identifier system keeps the TProtocol interface incredibly flat
and obvious...". However, it should be obvious that even holding the
TProtocol interface constant, one can have an alternate serialization
protocol that might yield better marshalling/demarshalling performance
whilst sending fewer bits across the wire.
The bottom line is that the way it works now, nothing is implied, which is
probably suitable for most applications. There's certainly the possibility
that for other applications, this doesn't make as much sense, so perhaps we
should explore those ideas more fully, but *definitely* in another thread
than this one.
One can get the same debuggability benefits by sending the type string
out-of-band, either before or after the data. In a strongly typed
system, in particular, the type string is a constant, so it costs nothing
to generate and there are no copies - it can be a direct argument to
writev() [or its equivalent].
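
As a rough sketch of the writev() idea, assuming a Unix system where Python's os.writev is available: the type string below, its syntax, and the two-byte length prefix are all made up for illustration, and send_with_type() is a hypothetical helper. The constant is handed to the kernel as one buffer of a gather write, so it is never copied into the payload buffer.

```python
import os

# Hypothetical constant type descriptor; in a strongly typed system this
# would be generated once per struct and never rebuilt.
TYPE_STRING = b"struct{1:i32,2:string}"
TYPE_HEADER = len(TYPE_STRING).to_bytes(2, "big")  # assumed framing: 2-byte length

def send_with_type(fd, payload):
    """Send the type string ahead of the payload in a single gather write.

    Returns the number of bytes written; a short write is possible on
    sockets and is left to the caller to handle in this sketch.
    """
    return os.writev(fd, [TYPE_HEADER, TYPE_STRING, payload])
```

A receiver that does not care about the type information can read the length prefix and skip that many bytes before parsing the data.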