His primary use case is the same as Hadoop recordio: big files with lots of similar records in them. So he wants to be able to put a data description header and then have a big stream of records that conform to that header and which don't need type fields interspersed.
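For illustration, a toy sketch of that layout (this is not Thrift's actual TDenseProtocol wire format; the schema encoding, field names, and struct codes here are all made up):

```python
import json
import struct

# Hypothetical layout: a length-prefixed schema header up front, then a
# dense stream of records with no per-field type tags interspersed.
SCHEMA = [("user_id", "q"), ("score", "d")]  # made-up field names / struct codes
RECORD_FMT = "<" + "".join(code for _, code in SCHEMA)

def write_stream(path, records):
    header = json.dumps(SCHEMA).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))   # header length prefix
        f.write(header)                           # data description header
        for rec in records:                       # values only, no type fields
            f.write(struct.pack(RECORD_FMT, *rec))
```

Each record is just 16 packed bytes; all the type information lives in the header.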
Don't we call this TDenseProtocol?
In addition, he wants it all to work dynamically so that, for example, a Python script used in Hadoop Streaming can read the header and pull fields out of records in the stream without needing to have the generated bindings.
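The dynamic half could look something like this (again a sketch against the toy header format above, not any real Thrift API): the reader learns the field names and layout from the header at runtime, so no generated bindings are involved.

```python
import json
import struct

def read_stream(path):
    # Interpret the data description header at runtime, then decode the
    # dense records generically -- no generated bindings required.
    # (Assumes the toy header format: length-prefixed JSON schema of
    # [name, struct-code] pairs.)
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        schema = json.loads(f.read(hlen))          # discover fields dynamically
        names = [name for name, _ in schema]
        fmt = "<" + "".join(code for _, code in schema)
        size = struct.calcsize(fmt)
        while chunk := f.read(size):
            yield dict(zip(names, struct.unpack(fmt, chunk)))
```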
This is something we don't have, but it would be trivial to add. It'd be weird to pass the out-of-band header in the same stream as the data when using Hadoop Streaming, though, so I'm not sure it's going to be such a no-brainer.
Is someone going to get in touch with Doug directly, or are we just going to try and jump on his mailing list thread?
-Bryan