His primary use case is the same as Hadoop recordio: big files with lots of similar records in them. So he wants to be able to put a data description header first, and then have a big stream of records that conform to that header and don't need type fields interspersed.

Don't we call this TDenseProtocol?

In addition, he wants to have it all to work dynamically so, for example, a Python script used in Hadoop Streaming can read the header and pull fields out of records in the stream without needing to have the generated bindings.
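As a rough illustration of that idea, here's a minimal sketch in Python of reading a self-describing stream: a header listing field names and type codes, followed by records carrying values only. The wire format here is entirely made up for the example (it is not TDenseProtocol or any real Thrift protocol); the point is just that a script can parse records using only the header, with no generated bindings.

```python
import struct, io

# Hypothetical wire format (NOT Thrift's actual encoding):
#   header  = uint16 field count, then (uint16 name length, name,
#             1-byte type code) per field; b"i" = int32, b"s" = string
#   records = values only, in header order, no per-field type tags

def read_header(stream):
    """Read the field-name/type-code list that describes every record."""
    (nfields,) = struct.unpack(">H", stream.read(2))
    fields = []
    for _ in range(nfields):
        (namelen,) = struct.unpack(">H", stream.read(2))
        name = stream.read(namelen).decode("utf-8")
        typecode = stream.read(1)
        fields.append((name, typecode))
    return fields

def read_record(stream, fields):
    """Pull values in header order -- no generated bindings needed."""
    record = {}
    for name, typecode in fields:
        if typecode == b"i":
            (record[name],) = struct.unpack(">i", stream.read(4))
        else:  # b"s": uint16 length prefix, then UTF-8 bytes
            (slen,) = struct.unpack(">H", stream.read(2))
            record[name] = stream.read(slen).decode("utf-8")
    return record

# Round-trip a tiny stream: header written once, then two records.
buf = io.BytesIO()
buf.write(struct.pack(">H", 2))
for name, tc in [("id", b"i"), ("word", b"s")]:
    buf.write(struct.pack(">H", len(name)) + name.encode("utf-8") + tc)
for rid, word in [(1, "foo"), (2, "bar")]:
    buf.write(struct.pack(">i", rid))
    buf.write(struct.pack(">H", len(word)) + word.encode("utf-8"))
buf.seek(0)

fields = read_header(buf)
records = [read_record(buf, fields) for _ in range(2)]
```

Since the reader only ever consults the header, the same script keeps working when fields are added, which is presumably the appeal for Streaming jobs.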

This is something we don't have, but would be trivial to add. It'd be weird to pass the out-of-band header communications in the same stream as the data when using Hadoop streaming though, so I'm not sure that's going to be such a no-brainer.

Is someone going to get in touch with Doug directly, or are we just going to try and jump on his mailing list thread?

-Bryan
