Re: Streaming support for thrift in java

Stuart Reynolds Tue, 24 Mar 2015 10:07:51 -0700

I'm attaching a useful discussion on how to build iterators into
thrift. (Sorry -- can't link to the mail archives - they seem to be
offline.)

In short, return a reference to an iterator object and have the
client supply the reference in subsequent requests to the server.

This will solve your use case (how to efficiently iterate through an
infinite amount of data).
If you leave the TCP connection open between requests, the overhead is
not so large.
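
To make the pattern concrete, here is a minimal in-memory sketch. It
uses no thrift headers at all: IteratorService stands in for a
generated handler, and every name in it (query, getNext,
closeIterator, drain) is made up for illustration rather than taken
from any thrift API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Thrift handler; all names are illustrative.
class IteratorService {
 public:
  explicit IteratorService(std::vector<int> rows) : rows_(std::move(rows)) {}

  // Opens a server-side cursor and returns an opaque id for it.
  std::int32_t query() {
    const std::int32_t id = next_id_++;
    cursors_[id] = 0;
    return id;
  }

  // Returns up to max_results rows and advances the cursor.
  // A batch shorter than max_results means the data is exhausted.
  std::vector<int> getNext(std::int32_t id, std::size_t max_results) {
    assert(max_results > 0);
    auto it = cursors_.find(id);
    if (it == cursors_.end()) return {};  // unknown or already closed
    const std::size_t begin = it->second;
    const std::size_t end = std::min(rows_.size(), begin + max_results);
    it->second = end;
    return std::vector<int>(
        rows_.begin() + static_cast<std::ptrdiff_t>(begin),
        rows_.begin() + static_cast<std::ptrdiff_t>(end));
  }

  // Frees the server-side state held for one iterator.
  void closeIterator(std::int32_t id) { cursors_.erase(id); }

  std::size_t openIterators() const { return cursors_.size(); }

 private:
  std::vector<int> rows_;
  std::map<std::int32_t, std::size_t> cursors_;  // iterator id -> next row
  std::int32_t next_id_ = 1;  // 0 is left free to mean "no results"
};

// Client-side loop: keep asking until a short batch comes back.
inline std::vector<int> drain(IteratorService& svc, std::size_t batch) {
  std::vector<int> all;
  const std::int32_t id = svc.query();
  for (;;) {
    std::vector<int> chunk = svc.getNext(id, batch);
    all.insert(all.end(), chunk.begin(), chunk.end());
    if (chunk.size() < batch) break;
  }
  svc.closeIterator(id);
  return all;
}
```

In a real service the calls in drain() would go over the wire, and the
cursor map would live in whatever object holds the per-connection
state.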

Full streaming is not required for your use case -- streaming, I
think, requires that the server push new data to the client as it
becomes available. In most cases it is fine to poll the server to say
you are ready for new data.

The downside of this approach shows up when you compare it with a full
streaming implementation, where the client may find that more data is
already waiting in its network buffers whenever it looks for it.
In the iterator pattern solution, the client has to poll the server to
ask it to send more data, which may add latency. It's straightforward
to work around this on the client with a worker thread and a buffer, or
by starting a future for the next request while processing the results
of the previous one.
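
A rough sketch of the future-based workaround, using only the C++
standard library. PrefetchingReader is a made-up helper, and the
fetch callback is an assumption standing in for a blocking RPC such
as getNext(iter, n). It keeps at most one request in flight, so the
underlying client object is never used by two threads at once,
provided nothing else calls it in the meantime.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

// Hypothetical helper: fetch(n) stands in for a blocking RPC that returns
// fewer than n items once the server-side iterator is exhausted.
template <typename T>
class PrefetchingReader {
 public:
  using Fetch = std::function<std::vector<T>(std::size_t)>;

  PrefetchingReader(Fetch fetch, std::size_t batch)
      : fetch_(std::move(fetch)), batch_(batch) {
    assert(batch_ > 0);
    startFetch();  // request the first batch immediately
  }

  // Returns the next batch; an empty vector means the stream has ended.
  std::vector<T> next() {
    if (!pending_.valid()) return {};
    std::vector<T> chunk = pending_.get();
    // A full batch suggests there is more data, so request it now and let
    // the round trip overlap with the caller's processing of this chunk.
    if (chunk.size() == batch_) startFetch();
    return chunk;
  }

 private:
  void startFetch() {
    pending_ = std::async(std::launch::async, fetch_, batch_);
  }

  Fetch fetch_;
  std::size_t batch_;
  std::future<std::vector<T>> pending_;
};
```

The caller just loops on next() until it gets an empty batch. The cost
is one extra batch held in memory on the client.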

Network costs are comparable - in a real streaming protocol the client
still needs to send packets back to the server to acknowledge data or
say it's full. In a framed RPC protocol, the client needs to send its
request for more data.

- Stuart

NB. It looks totally possible to build streaming on top of thrift. All
the hard work of data serialization is done for you. However, you need
to implement your own protocol and processor objects for the server and
client.

On Wed, Feb 11, 2015 at 12:16 PM, Ben Craig <[email protected]> wrote:
> For C++ servers (and possibly others), it is possible to have per-connection
> state.  Assuming your service is named RPCIterator, you would want something
> vaguely like this...
>
>
> boost::shared_ptr<server::TThreadedServer> server =
>    boost::make_shared<server::TThreadedServer>(
>       boost::make_shared<RPCIteratorProcessorFactory /*code generated*/>(
>          boost::make_shared<RPCIteratorIfCloneFactory /*hand written*/>()),
>       boost::make_shared<transport::TServerSocket>(
>          12345 /*your socket number here*/),
>       boost::make_shared<transport::TBufferedTransportFactory>(),
>       boost::make_shared<protocol::TBinaryProtocolFactory>());
>
>
> The implementation of RPCIteratorIfCloneFactory would look something like
> this...
>
> class RPCIteratorIfCloneFactory : public RPCIterator::RPCIteratorIfFactory {
>  public:
>   virtual ~RPCIteratorIfCloneFactory() {}
>
>   virtual RPCIterator::RPCIteratorIf* getHandler(
>       const ::apache::thrift::TConnectionInfo& connInfo)
>   {
>     return new RPCIteratorHandler; /* build your handler object here */
>   }
>   virtual void releaseHandler(RPCIterator::RPCIteratorIf* handler)
>   {
>     /*put any crazy cleanup customizations here */
>     delete handler;
>   }
> };
>
>
> This approach gives you a unique instance of your handler class per
> connection.  If the connection dies, then releaseHandler will be called on
> it once all processing for that connection has completed.
>
>
>
> From:        Stuart Reynolds <[email protected]>
> To:        [email protected],
> Date:        02/11/2015 11:48 AM
> Subject:        Best patterns for providing an iterator via thrift RPC
> Sent by:        [email protected]
> ________________________________
>
>
>
> I'd like to provide access to something like a database iterator via a
> thrift API (e.g. iterate through 1M+ DB records). This requires state
> to be maintained on the server, although only for the duration of the
> thrift TCP connection. (i.e. you could argue that the server might
> still be considered stateless).
>
> Here's my first pass:
>
> ----
> struct RPCIteratorID {
>   int id;
>   int dataTypeId;
> }
>
> // Get an iterator of type T1. An id of 0 means no results.
> RPCIteratorID queryT1( queryParams );
>
> // Supply the iterator id. list.size != maxResults means the end.
> list<T1> getNext_T1( RPCIteratorID iter, int maxResults );
>
>
> RPCIteratorID queryT2( queryParams );
> list<T2> getNext_T2( RPCIteratorID iter, int maxResults );
>
>
> RPCIteratorID queryT3( queryParams );
> list<T3> getNext_T3( RPCIteratorID iter, int maxResults );
> ...
> void closeIterator( RPCIteratorID iter );
> ----
>
> If the TCP socket times out or is closed, the server calls closeIterator
> for all open iterators on the connection.
>
> Are there better design patterns for this?
>
> - Stu
>
