Reading http://thrift.apache.org/static/thrift-20070401.pdf, I find,
in section 4.2:

"Thrift structures are designed to support encoding into a streaming
protocol. The implementation should never need to frame or compute the
entire data length of a structure prior to encoding it."

However, knowing there can be a gap between principle and practice, I
started by asking about this on the IRC channel, and Bryan Duxbury
answered that streaming is not supported.

I am wondering if perhaps I am approaching my problem the wrong way,
so I am writing to describe it.

The job in question is to run a potentially large blob of text through
a set of analytical components and collect the large number of small
items that result. Note that this is an online process; it's not
appropriate to chop the blob into chunks and feed them to a map-reduce
system.

There's no reason to ask Thrift (or something like it) to be involved
in pushing one giant string in one direction. Coming back the other
way, however, what I am looking at is a long sequence of the form
[ type1, struct1, type2, struct2, ... ], and I do not want to hold all
of those items in memory at once.
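To make the shape of the problem concrete, here is a minimal sketch of
the consumption pattern I have in mind. The framing (a one-byte type id
plus a four-byte length prefix per record) is purely hypothetical and
is not Thrift's wire format; the point is only that the reader yields
one (type, struct) pair at a time instead of buffering the whole
response:

```python
import io
import struct

def read_records(stream):
    """Lazily yield (type_id, payload) pairs from a binary stream.

    Hypothetical framing for illustration: each record is a 1-byte
    type id, a 4-byte big-endian payload length, then the payload.
    Only one record is held in memory at a time.
    """
    while True:
        header = stream.read(5)
        if not header:
            return  # clean end of stream
        type_id, length = struct.unpack(">BI", header)
        yield type_id, stream.read(length)

# Usage: write two records into a buffer, then consume them lazily.
buf = io.BytesIO()
for tid, payload in [(1, b"alpha"), (2, b"beta")]:
    buf.write(struct.pack(">BI", tid, len(payload)) + payload)
buf.seek(0)
records = list(read_records(buf))
```

In the real job the stream would be a socket and the payloads would be
serialized structs of several types, but the generator-style interface
is the part I am hoping a Thrift-based service could expose.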

What would the readers of this list suggest as an approach to this?
