Re: heterogeneous collections

Mayan Moudgill Tue, 04 May 2010 16:56:16 -0700


Bryan Duxbury wrote:

I don't mean to discourage you from thinking about more optimal structures,
but man, this doesn't strike you as overcomplicated? I can tell you from
experience that managing a stack of metadata (in the compact protocol) is
non-trivial both in terms of complexity and performance - and that's just
field IDs.


Not really; it's as complicated as it needs to be to satisfy the goals of:
- allow fields to be present or absent
- allow for incremental encoding
- allow for a type-less representation.

It allows for a very efficient serializer and deserializer; in particular, it may even make it possible to do in-place/zero-copy deserialization, something that is a little difficult to do with the types being interspersed with the data.
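As a rough illustration of that encoding, here is a minimal Python sketch, assuming a hypothetical three-field schema known to both ends. The writer reserves a presence bitmap up front, appends bare values in schema order (no per-field type tags), and sets each field's bit as the field is written, so encoding is incremental. The reader walks the bitmap and decodes each value straight out of the received buffer with struct.unpack_from, without slicing out per-field copies. SCHEMA, Writer and read are illustrative names, not part of Thrift.

```python
import struct

# Hypothetical schema shared by writer and reader: (field name, struct format).
# Only fixed-width types are used to keep the sketch short.
SCHEMA = [("id", "<i"), ("score", "<d"), ("flags", "<H")]
BITMAP_LEN = (len(SCHEMA) + 7) // 8


class Writer:
    """Incrementally encodes one record: presence bitmap, then bare values."""

    def __init__(self):
        self.buf = bytearray(BITMAP_LEN)  # reserve space for the bitmap
        self.next_field = 0  # fields must arrive in schema order

    def write(self, index, value):
        if index < self.next_field:
            raise ValueError("fields must be written in schema order")
        _, fmt = SCHEMA[index]
        self.buf[index // 8] |= 1 << (index % 8)  # mark the field present
        self.buf += struct.pack(fmt, value)  # the value only, no type tag
        self.next_field = index + 1

    def finish(self):
        return bytes(self.buf)


def read(data):
    """Decodes a record in place; absent fields are simply omitted."""
    view = memoryview(data)
    offset = BITMAP_LEN
    record = {}
    for i, (name, fmt) in enumerate(SCHEMA):
        if view[i // 8] & (1 << (i % 8)):
            # unpack_from reads at an offset in the original buffer.
            record[name] = struct.unpack_from(fmt, view, offset)[0]
            offset += struct.calcsize(fmt)
    return record
```

Writing id=7 and flags=3 but leaving score absent produces 7 bytes: one bitmap byte plus 4 + 2 bytes of values. The types live only in the shared schema, which is what makes the wire form type-less.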

[And BTW - do you *REALLY* think this is complicated? I'd estimate that this is a few hundred lines of code at most - in fact, I'd call this pretty straightforward, if not trivial.]

Additionally, including the type information gives us more than just the
ability to skip correctly. It also makes the serialized data fully
described.

That I agree with - though it's described at the duck-typing level, not the strong-typing level.

This makes it easy to debug it when you might think something is
going wrong, or write a generic tool that can digest serialized Thrift for
some reason.

Again, I agree - but that is NOT the reason that was given for the design with full typing.

Instead, what was claimed was that it was done this way since "...using the type-identifier system keeps the TProtocol interface incredibly flat and obvious...". However, it should be obvious that even holding the TProtocol interface constant, one can have an alternate serialization protocol that might yield better marshalling/demarshalling performance whilst sending fewer bits across the wire.
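To make the point about holding the interface constant concrete, here is a hypothetical Python sketch in which the same calls (field_begin, write_i32) drive two protocol implementations. The tagged one emits a type byte and a 16-bit field id before each value, roughly in the spirit of Thrift's binary protocol; the untagged one assumes both ends share the schema and emits nothing for field_begin. The class and method names are made up for illustration and are not the real TProtocol API.

```python
import struct

I32 = 8  # illustrative type code for a 32-bit integer


class Protocol:
    """Shared interface: callers never know which wire format is in use."""

    def __init__(self):
        self.out = bytearray()

    def field_begin(self, type_id, field_id):
        raise NotImplementedError

    def write_i32(self, v):
        self.out += struct.pack(">i", v)


class TaggedProtocol(Protocol):
    """Writes a type byte and a big-endian 16-bit field id before each value."""

    def field_begin(self, type_id, field_id):
        self.out += struct.pack(">bh", type_id, field_id)


class UntaggedProtocol(Protocol):
    """Schema assumed known to both ends, so no per-field metadata is sent."""

    def field_begin(self, type_id, field_id):
        pass


def write_point(proto, x, y):
    # Stand-in for generated code: identical calls for either protocol.
    proto.field_begin(I32, 1)
    proto.write_i32(x)
    proto.field_begin(I32, 2)
    proto.write_i32(y)
    return bytes(proto.out)
```

For a struct of two i32 fields, the tagged form comes to 14 bytes and the untagged form to 8 (this sketch omits any end-of-struct marker). The generated-code side, write_point, is unchanged between the two.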

The bottom line is that the way it works now, nothing is implied, which is
probably suitable for most applications. There's certainly the possibility
that for other applications, this doesn't make as much sense, so perhaps we
should explore those ideas more fully, but *definitely* in another thread
than this one.

One can get the same debuggability benefits by sending the type string out-of-band, either before or after the data. In a strongly typed system, in particular, the type string is a constant, so it's zero cost to generate and there are no copies - it can be a direct argument to writev() [or its equivalent].
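A minimal sketch of sending the type string out-of-band, assuming a Unix platform where Python exposes os.writev(). The descriptor text, the 4-byte length prefix, and the names send_with_type and recv_with_type are hypothetical. The constant header and descriptor are built once at import time and passed to writev() alongside the payload, so no combined message is assembled in user space.

```python
import os

# Hypothetical type descriptor; in a strongly typed system it is a constant.
POINT_TYPE = b"struct Point { 1: i32 x; 2: i32 y; }"
# Hypothetical framing: 4-byte big-endian length of the descriptor.
TYPE_HEADER = len(POINT_TYPE).to_bytes(4, "big")


def send_with_type(fd, payload):
    """Send the descriptor before the data in a single gather write.

    The two constant buffers are reused on every call; writev() gathers
    them with the payload without first concatenating them into a new buffer.
    """
    return os.writev(fd, [TYPE_HEADER, POINT_TYPE, payload])


def recv_with_type(data):
    """Split a received message back into (descriptor, payload)."""
    n = int.from_bytes(data[:4], "big")
    return data[4:4 + n], data[4 + n:]
```

Sending the descriptor after the data instead only changes the order of the buffers in the writev() list (plus whatever framing the receiver needs to find it).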
