Bryan Duxbury wrote:


I don't mean to discourage you from thinking about more optimal structures,
but man, this doesn't strike you as overcomplicated? I can tell you from
experience that managing a stack of metadata (in the compact protocol) is
non-trivial both in terms of complexity and performance - and that's just
field IDs.

Not really; it's as complicated as it needs to be to satisfy the goals of:
- allow fields to be present or absent
- allow for incremental encoding
- allow for a type-less representation.
It allows for a very efficient serializer and deserializer; in particular, it may even make in-place/zero-copy deserialization possible, something that is a little difficult to do when the types are interspersed with the data.
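To make the three goals concrete, here is a minimal sketch of one layout that could satisfy them: a presence bitmap up front, followed by the present values with no per-field type tags. It assumes a hypothetical schema `SCHEMA` of `(field_id, struct format)` pairs that both sides know in advance (for example, from the IDL); `Encoder` and `decode` are illustrative names, not part of Thrift. Encoding is incremental because the bitmap bytes are reserved first and patched as each field is appended.

```python
import struct

# Hypothetical schema: (field_id, struct format) in declaration order.
# Both sides are assumed to know it, so no type tags are written per field.
SCHEMA = [(1, "<i"), (2, "<q"), (3, "<d")]


class Encoder:
    """Incremental encoder: reserves a presence bitmap, then appends values."""

    def __init__(self, schema):
        self.schema = schema
        self.nbytes = (len(schema) + 7) // 8
        self.buf = bytearray(self.nbytes)  # bitmap placeholder, patched per field
        self.next_index = 0

    def write(self, field_id, value):
        # Fields must arrive in schema order; absent fields are simply skipped.
        # An unknown or out-of-order field_id raises IndexError in this sketch.
        while self.schema[self.next_index][0] != field_id:
            self.next_index += 1
        i = self.next_index
        self.buf[i // 8] |= 1 << (i % 8)  # mark the field as present
        self.buf += struct.pack(self.schema[i][1], value)
        self.next_index += 1

    def getvalue(self):
        return bytes(self.buf)


def decode(schema, data):
    """Walk the bitmap; each present value is read at a computed offset."""
    nbytes = (len(schema) + 7) // 8
    bitmap = data[:nbytes]
    offset = nbytes
    result = {}
    for i, (field_id, fmt) in enumerate(schema):
        if bitmap[i // 8] & (1 << (i % 8)):
            # unpack_from reads in place from the buffer at the given offset.
            (result[field_id],) = struct.unpack_from(fmt, data, offset)
            offset += struct.calcsize(fmt)
    return result
```

Writing fields 1 and 3 (leaving field 2 absent) yields a 13-byte message: one bitmap byte, a 4-byte int and an 8-byte double. With fixed-width fields, every value's offset follows from the bitmap alone, which is what makes reading in place plausible.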

[And BTW, do you *REALLY* think this is complicated? I'd estimate that this is a few hundred lines of code at most; in fact, I'd call it pretty straightforward, if not trivial.]

Additionally, including the type information gives us more than just the
ability to skip correctly. It also makes the serialized data fully
described.

That I agree with, though it's described at the duck-typing level, not the strong-typing level.

This makes it easy to debug it when you might think something is
going wrong, or write a generic tool that can digest serialized Thrift for
some reason.

Again, I agree, but that is NOT the reason that was given for the fully typed design.

Instead, what was claimed was that it was done this way since "..using the type-identifier system keeps the TProtocol interface incredibly flat and obvious...". However, it should be obvious that, even holding the TProtocol interface constant, one can have an alternate serialization protocol that might yield better marshalling/demarshalling performance whilst sending fewer bits across the wire.
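The point about holding the interface constant can be illustrated with a small sketch. The two writer classes below are hypothetical and are not the Thrift library; their `writeFieldBegin(name, ttype, fid)` / `writeI32(value)` methods only mirror the general shape of a TProtocol writer. The type code `I32 = 8` is likewise an assumption for illustration. The "generated" function calls the same methods on either writer; only the second one leaves the type byte off the wire, on the assumption that the IDL already implies it.

```python
import struct

# Hypothetical type code for a 32-bit integer field (illustrative only).
I32 = 8


class TaggedWriter:
    """Writes a 1-byte type tag and a 2-byte field id in every field header."""

    def __init__(self):
        self.buf = bytearray()

    def writeFieldBegin(self, name, ttype, fid):
        self.buf += struct.pack(">bh", ttype, fid)

    def writeI32(self, value):
        self.buf += struct.pack(">i", value)


class SchemaAwareWriter(TaggedWriter):
    """Same method signatures; the type is implied by the IDL, so it is not sent."""

    def writeFieldBegin(self, name, ttype, fid):
        self.buf += struct.pack(">h", fid)  # ttype is accepted but ignored


def write_point(proto, x, y):
    # Stand-in for generated code: identical calls against either writer.
    proto.writeFieldBegin("x", I32, 1)
    proto.writeI32(x)
    proto.writeFieldBegin("y", I32, 2)
    proto.writeI32(y)
```

For this two-field example the tagged writer emits 14 bytes and the schema-aware writer 12. The caller is unchanged; only the protocol implementation decides what actually goes on the wire.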

The bottom line is that the way it works now, nothing is implied, which is
probably suitable for most applications. There's certainly the possibility
that for other applications, this doesn't make as much sense, so perhaps we
should explore those ideas more fully, but *definitely* in another thread
than this one.

One can get the same debuggability benefits by sending the type string out-of-band, either before or after the data. In a strongly typed system, in particular, the type string is a constant, so it costs nothing to generate and requires no copies; it can be passed directly as an argument to writev() [or its equivalent].
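A minimal sketch of the out-of-band idea, using Python's `os.writev` (Unix only) as the gather-write. The descriptor text `TYPE_DESCRIPTOR`, its 2-byte length prefix, and the helper names are hypothetical framing choices for illustration, not an existing Thrift wire format. Because the descriptor and its header are constants, the sender hands them to `writev` alongside the payload without building a combined buffer.

```python
import os
import struct

# Hypothetical constant descriptor for a struct with fields 1:i32, 2:i64, 3:double.
# In a strongly typed binding a code generator could emit this once, at build time.
TYPE_DESCRIPTOR = b"1:i32,2:i64,3:double"
# Its length prefix is constant too, so it is computed once at import time.
DESCRIPTOR_HEADER = struct.pack("<H", len(TYPE_DESCRIPTOR))


def send_with_descriptor(fd, payload):
    """Gather-write header, descriptor and payload in a single writev call.

    A production sender would also need to handle short writes.
    """
    return os.writev(fd, [DESCRIPTOR_HEADER, TYPE_DESCRIPTOR, payload])


def read_descriptor(data):
    """Split a received message into (descriptor, payload)."""
    (n,) = struct.unpack_from("<H", data, 0)
    return data[2:2 + n], data[2 + n:]
```

A debugging tool can then interpret the payload by reading the descriptor first, while the payload itself carries no per-field type tags.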

