Detecting failures in a distributed system can be very tricky. In fact, it's 
not always possible. Suppose your Thrift RPC is just taking a really long time: 
at some point it will time out (based on a config parameter that you set). 
However, the timeout is client-side, so the server may have completed the 
request and been just about to respond when the client gave up. Or maybe the 
request failed. Or maybe the server is still working on it. But that 
discussion goes well beyond the scope of the original question in this thread :)
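
To make the timeout ambiguity concrete, here is a minimal sketch in plain Python. It does not use Thrift at all: a worker thread stands in for the server, and slow_server_call, call_with_timeout, and the request id "req-1" are hypothetical names made up for illustration. The client gives up after its own timeout, yet the server side still finishes the work.

```python
import concurrent.futures
import threading
import time

# Server-side state: ids of requests the simulated server actually completed.
applied = []
applied_lock = threading.Lock()


def slow_server_call(request_id, work_seconds):
    """Hypothetical stand-in for a slow RPC handler (not a Thrift API)."""
    time.sleep(work_seconds)  # the server is slow, not dead
    with applied_lock:
        applied.append(request_id)  # the side effect happens regardless
    return "ok"


def call_with_timeout(pool, request_id, work_seconds, timeout_seconds):
    """Client side: wait up to timeout_seconds, then give up."""
    future = pool.submit(slow_server_call, request_id, work_seconds)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The client only knows that it stopped waiting. It cannot tell
        # whether the request completed, failed, or is still running.
        return "timeout"


pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
outcome = call_with_timeout(pool, "req-1", work_seconds=0.2, timeout_seconds=0.05)
pool.shutdown(wait=True)  # let the simulated server finish its work

print(outcome)  # what the client saw
print(applied)  # what the server actually did
```

The client's view and the server's state disagree here, which is why a timeout on its own cannot be treated as a failure.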

-- Ilya

On Jan 25, 2011, at 7:09, Ted Dunning <[email protected]> wrote:

> On Tue, Jan 25, 2011 at 12:01 AM, Phillip B Oldham <[email protected]> wrote:
> 
>> I suppose it would be left up to the client then to test whether a
>> failed response actually completed... adding a fair amount of work to
>> the client.
>> 
> 
> Yes. It does.  And if you don't design the system well, then you may not
> even be able to tell if it has completed.
> 
>> 
>> Would zookeeper be able to "buffer" requests? For instance, if there
>> were two nodes behind it and they were both momentarily unresponsive,
>> could zookeeper (& the client) keep the connection active and wait for
>> a node to signal itself available and complete the request?
>> 
> 
> ZK isn't really involved in your request.  It is merely helping you
> coordinate things.  It is completely reasonable to keep "last completed
> transaction id" in ZK, but that doesn't really solve the problem.  You don't
> want to wait forever, after all.
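
A rough sketch of the "last completed transaction id" idea with a bounded wait, in plain Python. The CoordinationStore class is a hypothetical in-memory stand-in, not the ZooKeeper API, and the sketch assumes transaction ids are integers that complete in increasing order. Neither assumption comes from the thread.

```python
import time


class CoordinationStore:
    """Hypothetical in-memory stand-in for a coordination service."""

    def __init__(self):
        self._last_completed = 0

    def record_completed(self, txn_id):
        # Assumes ids complete in increasing order, so one number is enough.
        self._last_completed = max(self._last_completed, txn_id)

    def last_completed(self):
        return self._last_completed


def wait_for_completion(store, txn_id, deadline_seconds, poll_seconds=0.01):
    """Poll until txn_id is recorded as completed or the deadline passes.

    Returns True if completion was observed, False if the deadline passed.
    False means "not known to be completed", not "failed".
    """
    deadline = time.monotonic() + deadline_seconds
    while True:
        if store.last_completed() >= txn_id:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


store = CoordinationStore()
store.record_completed(41)

known = wait_for_completion(store, 41, deadline_seconds=0.05)
unknown = wait_for_completion(store, 42, deadline_seconds=0.05)

print(known)    # transaction 41 was recorded as completed
print(unknown)  # transaction 42 was not seen before the deadline
```

The deadline keeps the client from waiting forever, but a False result still leaves it unsure what happened to the request, which is the part the recorded id does not solve.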
