On Fri, Aug 26, 2011 at 1:38 PM, Joel Meyer <[email protected]> wrote:
> On Fri, Aug 26, 2011 at 8:01 AM, Benson Margulies
> <[email protected]> wrote:
>
>> Reading http://thrift.apache.org/static/thrift-20070401.pdf, I find,
>> in section 4.2:
>>
>> "Thrift structures are designed to support encoding into a streaming
>> protocol. The implementation should never need to frame or compute the
>> entire data length of a structure prior to encoding it."
>>
>> However, due to the possible gap between principle and practice, I
>> started by asking about this on the IRC channel, and Bryan Duxbury
>> answered that streaming is not supported.
>>
>> I am wondering if perhaps I am approaching my problem the wrong way,
>> so I am writing to describe it.
>>
>> The job in question is to run a potentially large blob of text into a
>> set of analytical components and return a large number of small items
>> that result. Note this is an online process; it's not appropriate to
>> chop the blob into chunks and feed it to a map-reduce system.
>>
>> There's no reason to ask Thrift or something like it to be involved in
>> pushing one giant string in one direction. Coming back the other way,
>> however, what I am looking at is a long sequence of the form [ type1,
>> struct1, type2, struct2 ... ]. I do not want to have all of them in
>> memory at once.
>>
>> What would the readers of this list suggest as an approach to this?
>>
>
> As you noted, there's no reason to use thrift for sending the long string,
> but on receiving end you could have a service definition something like
> this:
>
> service ParsedService {
>   void type1( 1: Struct1 s ),
>   void type2( 2: Struct2 s ),
>   ...
> }
>
> Then when sending back the result of parsing/processing your string you'd
> just end up with a lot of calls like:
>
> ParsedService.type1(struct1);
> ParsedService.type2(struct2);
> ParsedService.type1(struct1);
>

Thanks, this illuminates Bryan's IRC note. How does a service manage
state in this case, or will that be self-evident if I read some
example services?

Ooh, I see: you have reversed the client and the server, with the
server acting as a client. OK, this deserves some thought.

> (You could add some sort of identifier if you need to tie the original
> string back to the parts.) Because the rpc calls are void, you're not
> waiting for a response from the server and it's very much like streaming the
> results back. Anyway, that's just the first approach that comes to mind, no
> doubt there are others that may work even better.
>
> HTH,
> Joel
>
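
For what it's worth, the reversed-roles idea and the state question can be sketched in plain Python without any Thrift code at all. Everything here is an assumption made for illustration: the fields of Struct1 and Struct2, the request_id argument (Joel's "some sort of identifier"), and the analyze() function that stands in for the analytical components. The handler plays the part of the ParsedService implementation on the side that originally sent the text, and it keeps its state in a dictionary keyed by that identifier. One caveat, as far as I understand the IDL: a plain void method still gets an empty reply, so the calls would probably need to be declared `oneway` for the sender to avoid waiting on each one.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class Struct1:          # hypothetical payload: one token
    token: str


@dataclass
class Struct2:          # hypothetical payload: where that token sits in the text
    offset: int
    length: int


class ParsedServiceHandler:
    """Receiving side. State is kept per request_id, not per connection."""

    def __init__(self) -> None:
        self.counts = defaultdict(lambda: {"type1": 0, "type2": 0})

    def type1(self, request_id: str, s: Struct1) -> None:
        # Consume the item immediately; nothing is accumulated in a list.
        self.counts[request_id]["type1"] += 1

    def type2(self, request_id: str, s: Struct2) -> None:
        self.counts[request_id]["type2"] += 1


def analyze(text: str) -> Iterator[Union[Struct1, Struct2]]:
    """Stand-in for the analytical components: yields items one at a time."""
    offset = 0
    for word in text.split():
        start = text.index(word, offset)
        yield Struct1(token=word)
        yield Struct2(offset=start, length=len(word))
        offset = start + len(word)


def push_results(request_id: str, text: str, sink: ParsedServiceHandler) -> None:
    """Analyzer side, acting as the client: one call per result item."""
    for item in analyze(text):
        if isinstance(item, Struct1):
            sink.type1(request_id, item)
        else:
            sink.type2(request_id, item)


handler = ParsedServiceHandler()
push_results("doc-42", "a potentially large blob", handler)
print(dict(handler.counts))
```

In a real deployment the `sink` would be a generated Thrift client rather than the handler object itself, but the shape is the same: the analyzer never holds more than one item, and the receiver decides how much state to retain per identifier.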
