Re: Best patterns for providing an iterator via thrift RPC

Wilm Schumacher Wed, 11 Feb 2015 10:48:59 -0800

I think you made a small typo in

list<T> getNext_T2( RPCIteratorID iter, int maxResults );


where the "_T2" is wrong?!?! I guess it should be getNext_T1, returning
list<T1>. Just for clarification.

Furthermore one could argue that the dataTypeId shouldn't be accessible
by the client and should only be used on the server. And when you make
something like getNext_T2( int id, int length ), add a "throws
WrongType" or so. You have to check this anyway, as the client could
change the "type".
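
A minimal sketch of that server-side check, in plain Python rather than
generated thrift code. The names IteratorRegistry, WrongType, open and
get_next are made up for illustration; in a real service WrongType would
be an exception declared in the IDL.

```python
import itertools


class WrongType(Exception):
    """Raised when an iterator id is used with the wrong getNext variant."""


class IteratorRegistry:
    """Hypothetical server-side table: iterator id -> (type name, iterator)."""

    def __init__(self):
        self._ids = itertools.count(1)  # 0 stays reserved for "no results"
        self._open = {}

    def open(self, type_name, rows):
        it_id = next(self._ids)
        self._open[it_id] = (type_name, iter(rows))
        return it_id  # the client only ever sees this plain id

    def get_next(self, it_id, expected_type, max_results):
        type_name, it = self._open[it_id]
        if type_name != expected_type:
            # The type lives only on the server, so the client cannot fake it.
            raise WrongType(
                f"iterator {it_id} yields {type_name}, not {expected_type}"
            )
        return list(itertools.islice(it, max_results))

    def close(self, it_id):
        self._open.pop(it_id, None)


registry = IteratorRegistry()
t1 = registry.open("T1", ["a", "b", "c"])
first_batch = registry.get_next(t1, "T1", 2)
last_batch = registry.get_next(t1, "T1", 2)  # shorter than max_results: end
```

With this the struct is not needed at all; a plain int id is enough on
the wire.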

Furthermore I personally think that when you fetch 1M+ records, your
plan is to somehow find something or calculate something, a sum or a max
or so. And I personally would do that on the server. So you would only have
»int getSum( ... )«
as a function, where only one int is returned.
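
Roughly like this, as a plain Python sketch. The function get_sum and
the "amount" column are assumptions for illustration, and the list of
dicts just stands in for whatever the database cursor yields.

```python
def get_sum(rows, column):
    """Aggregate on the server; only a single int goes back to the client."""
    return sum(row[column] for row in rows)


# Stand-in for a database cursor (hypothetical data).
rows = [{"id": i, "amount": i} for i in range(10)]
total = get_sum(rows, "amount")
```

The records never leave the server, so no iterator state has to be kept
for the client.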

On the other hand: if you want to write a thrift gateway to a system
which provides such big numbers of records (e.g. you are trying to
create a mysql thrift gateway, which would be cool ;) ) then you have to
do something like that.

Lastly, one could argue that this design is not very elegant, as you
would have to add more and more functions and structs as the number of
"types" grows. And by this you always have to change the code in the
heart of the application. You could avoid this by using a blob type in
thrift to send the "blob" from the "database" to the client, which parses
the real structure. Kind of like mysql is doing. But this would kill the
advantage of using thrift (typing etc.). So this argument is listed just
for completeness.
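
To make the blob variant concrete, a small Python sketch. JSON as the
blob encoding and the names get_next_blob and parse_t1 are my
assumptions; in thrift each blob would travel as a "binary" value.

```python
import json


def get_next_blob(rows, max_results):
    """Server side: one generic call, each record shipped as opaque bytes."""
    return [json.dumps(row).encode("utf-8") for row in rows[:max_results]]


def parse_t1(blob):
    """Client side: the client has to know the real structure itself."""
    record = json.loads(blob.decode("utf-8"))
    return record["id"], record["name"]


# Hypothetical records standing in for rows of type T1.
blobs = get_next_blob([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], 10)
records = [parse_t1(b) for b in blobs]
```

A new "type" then only needs a new parser on the client, but a wrong or
missing field shows up as a runtime error in the parser instead of being
caught by the thrift types.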

In my opinion this shouldn't be necessary by design. There is something
wrong with the design of the whole application if something like that is
necessary. But I do not know your requirements (which are often given by
others and you cannot change broken designs). Perhaps you could tell us
a little more why this is needed?!?

But if you are forced to supply an "iterator" for a very small number of
types, I would do it in a similar way (without the dataTypeId, see above).

Just my first thoughts.

Best

Wilm

ps: btw. ... cool idea to use it like that! :D

On 11.02.2015 at 18:47, Stuart Reynolds wrote:
> I'd like to provide access to something like a database iterator via a
> thrift API (e.g. iterate through 1M+ DB records). This requires state
> to be maintained on the server, although only for the duration of the
> thrift TCP connection. (ie. you could argue that the server might
> still be considered stateless).
>
> Here's my first pass:
>
> ----
> struct RPCIteratorID {
>    int id;
>    int dataTypeId;
> }
>
> // Get an iterator of type T1. 0 = means no results.
> RPCIteratorID queryT1( queryParams );
>
> // supply the iterator id. list.size!=maxResults == end.
> list<T> getNext_T2( RPCIteratorID iter, int maxResults );
>
>
> RPCIteratorID queryT2( queryParams );
> list<T2> getNext_T2( RPCIteratorID iter, int maxResults );
>
>
> RPCIteratorID queryT3( queryParams );
> list<T3> getNext_T3( RPCIteratorID iter, int maxResults );
> ...
> void closeIterator( RPCIteratorID iter );
> ----
>
> If the TCP socket times out or is closed, the server calls closeIterator
> for all open iterators on the connection.
>
> Are there better design patterns for this?
>
> - Stu
