I'm attaching a useful discussion on how to build iterators into thrift. (Sorry -- can't link to the mail archives - they seem to be offline.)
In short, return a reference to an iterator object and have the client
supply the reference in subsequent requests to the server. This will
solve your use case (how to efficiently iterate through an infinite
amount of data). If you leave the TCP connection open between requests,
the overhead is not so large.

Full streaming is not required for your use case -- streaming, I think,
requires that the server push new data to the client as it becomes
available. In most cases it is fine to poll the server to say you are
ready for new data.

The downside of this approach is latency. In a full streaming
implementation, the client may always find more data already waiting in
its network buffers when it looks for it. In the iterator pattern
solution, the client has to poll the server to ask it to send more data,
and then wait for the reply. It's straightforward to work around this on
the client with a worker thread and a buffer, or by starting a future
for the next request while processing the previous one.

Network costs are comparable - in a real streaming protocol the client
still needs to send packets back to the server to acknowledge data or to
say it's full. In a framed RPC protocol, the client needs to send its
request for more data.

- Stuart

NB. It looks totally possible to build streaming on top of thrift. All
the hard work of data serialization is done for you. However, you need
to implement your own protocol and processor objects for the server and
client.

On Wed, Feb 11, 2015 at 12:16 PM, Ben Craig <[email protected]> wrote:
> For C++ servers (and possibly others), it is possible to have per-connection
> state. Assuming your service is named RPCIterator, you would want something
> vaguely like this...
>
> boost::shared_ptr<server::TThreadedServer> server =
>     boost::make_shared<server::TThreadedServer>(
>         boost::make_shared<RPCIteratorProcessorFactory /*code generated*/>(
>             boost::make_shared<RPCIteratorIfCloneFactory /* hand written */>()),
>         boost::make_shared<transport::TServerSocket>(12345 /*your socket number here*/),
>         boost::make_shared<transport::TBufferedTransportFactory>(),
>         boost::make_shared<protocol::TBinaryProtocolFactory>());
>
>
> The implementation of RPCIteratorIfCloneFactory would look something like
> this...
>
> class RPCIteratorIfCloneFactory : public RPCIterator::RPCIteratorIfFactory {
> public:
>   virtual ~RPCIteratorIfCloneFactory() {}
>
>   virtual RPCIterator::RPCIteratorIf* getHandler(
>       const ::apache::thrift::TConnectionInfo& connInfo)
>   {
>     return new RPCIteratorHandler; /* build your handler object here */
>   }
>   virtual void releaseHandler(RPCIterator::RPCIteratorIf* handler)
>   {
>     /* put any crazy cleanup customizations here */
>     delete handler;
>   }
> };
>
>
> This approach gives you a unique instance of your handler class per
> connection. If the connection dies, then releaseHandler will be called on
> it once all processing for that connection has completed.
>
>
>
> From: Stuart Reynolds <[email protected]>
> To: [email protected],
> Date: 02/11/2015 11:48 AM
> Subject: Best patterns for providing an iterator via thrift RPC
> Sent by: [email protected]
> ________________________________
>
>
>
> I'd like to provide access to something like a database iterator via a
> thrift API (e.g. iterate through 1M+ DB records). This requires state
> to be maintained on the server, although only for the duration of the
> thrift TCP connection. (i.e. you could argue that the server might
> still be considered stateless).
>
> Here's my first pass:
>
> ----
> struct RPCIteratorID {
>   int id;
>   int dataTypeId;
> }
>
> // Get an iterator of type T1. An id of 0 means no results.
> RPCIteratorID queryT1( queryParams );
>
> // Supply the iterator id. list.size != maxResults means end.
> list<T1> getNext_T1( RPCIteratorID iter, int maxResults );
>
>
> RPCIteratorID queryT2( queryParams );
> list<T2> getNext_T2( RPCIteratorID iter, int maxResults );
>
>
> RPCIteratorID queryT3( queryParams );
> list<T3> getNext_T3( RPCIteratorID iter, int maxResults );
> ...
> void closeIterator( RPCIteratorID iter );
> ----
>
> If the TCP socket times out or is closed, the server calls closeIterator
> for all open iterators on the connection.
>
> Are there better design patterns for this?
>
> - Stu
>
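
To make the iterator-handle idea from this thread concrete, here is a minimal C++ sketch of the server-side bookkeeping: the server hands out an id, the client passes it back on each request, and a batch shorter than maxResults signals the end. The class IteratorRegistry and its methods (query, getNext, closeIterator, openCount) are hypothetical names invented for illustration, not part of Thrift, and the records are plain ints standing in for real DB rows. The assumption is that one such object would live inside each per-connection handler, so deleting the handler drops every open iterator for that connection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-connection iterator bookkeeping (not a Thrift API).
// Each open iterator is just a cursor position keyed by an integer id.
class IteratorRegistry {
public:
    explicit IteratorRegistry(std::vector<int> records)
        : records_(std::move(records)) {}

    // Opens a cursor over the records. Returns 0 when there are no results,
    // following the "0 means no results" convention from the thread.
    std::int32_t query() {
        if (records_.empty()) return 0;
        const std::int32_t id = nextId_++;
        cursors_[id] = 0;
        return id;
    }

    // Returns up to maxResults records. A batch shorter than maxResults
    // (including an empty one) signals the end of the iteration.
    std::vector<int> getNext(std::int32_t id, std::size_t maxResults) {
        std::vector<int> batch;
        auto it = cursors_.find(id);
        if (it == cursors_.end()) return batch;  // unknown or already closed
        std::size_t& pos = it->second;
        assert(pos <= records_.size());  // a cursor never runs past the data
        while (pos < records_.size() && batch.size() < maxResults) {
            batch.push_back(records_[pos++]);
        }
        return batch;
    }

    // Explicit close. Destroying the registry has the same effect for all
    // cursors, which is what happens when the handler is deleted.
    void closeIterator(std::int32_t id) { cursors_.erase(id); }

    std::size_t openCount() const { return cursors_.size(); }

private:
    std::vector<int> records_;
    std::map<std::int32_t, std::size_t> cursors_;
    std::int32_t nextId_ = 1;  // 0 is reserved for "no results"
};
```

Because the registry's lifetime is tied to the handler, nothing extra is needed to clean up after a dropped connection in this sketch; a real server would also have to release whatever DB cursors sit behind the ids.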

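
The latency workaround mentioned in this thread (start a future for the next batch while processing the current one) might look roughly like the following on the client. This is only a sketch under stated assumptions: FetchFn is a hypothetical stand-in for a blocking getNext call on a generated client, drainWithPrefetch is an invented helper name, and records are plain ints. It uses std::async from the C++11 standard library rather than anything Thrift-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for a blocking getNext RPC: returns up to
// maxResults records, and fewer than maxResults once the data runs out.
using FetchFn = std::function<std::vector<int>(std::size_t maxResults)>;

// Drains the remote iterator, overlapping the fetch of batch N+1 with the
// processing of batch N. Returns the number of records processed.
std::size_t drainWithPrefetch(const FetchFn& fetch, std::size_t maxResults,
                              const std::function<void(int)>& process) {
    if (maxResults == 0) return 0;  // a zero-sized batch could never end

    std::size_t total = 0;
    std::future<std::vector<int>> pending =
        std::async(std::launch::async, fetch, maxResults);

    for (;;) {
        std::vector<int> batch = pending.get();  // wait for the current batch
        const bool last = batch.size() < maxResults;
        if (!last) {
            // Request the next batch before touching the current one.
            pending = std::async(std::launch::async, fetch, maxResults);
        }
        for (int record : batch) {
            process(record);
            ++total;
        }
        if (last) break;
    }
    return total;
}
```

Only one request is ever in flight here, and the processing code never touches the client, which matters because generated Thrift clients are generally not safe to share between threads. On some toolchains this needs to be linked with thread support (e.g. -pthread).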