In our case the zk client had been silently (well, apart from lots of
log messages) failing for 7 hours or so. So not exactly academic, but an
unusual situation to be sure.
On Feb 3, 2011 4:08 PM, "Ted Dunning" <[email protected]> wrote:
> Patrick,
>
> Is it really impossible for the client to say that soooo much time has
> passed in disconnected state that the session MUST have expired by now?
>
> I have heard this assertion before and it always irked me a bit, but
> Ryan's scenario is a great thought experiment (well, a thought
> experiment for US, not for him). Why can't those clients decide the
> session is expired after 3 days when the timeout is 3 minutes?
>
> On Thu, Feb 3, 2011 at 4:01 PM, Patrick Hunt <[email protected]> wrote:
>
>> On Thu, Feb 3, 2011 at 2:57 PM, Ryan Rawson <[email protected]> wrote:
>> > The result was the client never realized that its session was
>> > actually timed out, and the HBase processes continued to run. Kill -9
>> > and a restart fixed it.
>>
>> Hi Ryan,
>>
>> There are two issues at play here: session timeout and session
>> expiration. Correct me if I'm wrong, but I think you meant to say "the
>> client never realized that its session was actually _expired_", which
>> is correct behavior. Clients can only determine whether a session has
>> expired once they reconnect to the cluster. Session timeout, on the
>> other hand, happens when the server heartbeat is not received by the
>> client within the session timeout period. Clients that are disconnected
>> from the cluster will keep trying to reconnect until they succeed.
>> When a client is disconnected, its watchers are notified of the
>> disconnect (and likewise of expiration).
>>
>> See questions 1 & 2 in the FAQ, specifically "Example state
>> transitions" in question 2:
>> https://cwiki.apache.org/confluence/display/ZOOKEEPER/FAQ
>> Your clients were stuck between steps 4 and 5 (which they will never
>> reach in your scenario).
>>
>> Does that help?
>>
>> Patrick
>>
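
For what it's worth, Ted's idea of a client deciding for itself that
"this much time disconnected means the session must be gone" can be
sketched roughly as below. This is only a sketch under assumptions:
DisconnectTracker and its methods are hypothetical names, not part of
the ZooKeeper API. The assumption is that the application's own Watcher
would call onDisconnected() and onConnected() when it sees the
Disconnected and SyncConnected states. The result is a local guess only;
as Patrick says, the client cannot actually learn that the session
expired until it reconnects.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper, NOT part of the ZooKeeper client library.
 * Tracks how long the client has been disconnected and reports when that
 * exceeds the negotiated session timeout.
 */
final class DisconnectTracker {
    private final long sessionTimeoutMs;
    private final LongSupplier clockMs;   // injected so the logic is testable
    private boolean disconnected = false;
    private long disconnectedAtMs = 0L;

    DisconnectTracker(long sessionTimeoutMs, LongSupplier clockMs) {
        this.sessionTimeoutMs = sessionTimeoutMs;
        this.clockMs = clockMs;
    }

    /** Assumed to be called when the watcher sees a Disconnected event. */
    synchronized void onDisconnected() {
        if (!disconnected) {              // keep the time of the FIRST disconnect
            disconnected = true;
            disconnectedAtMs = clockMs.getAsLong();
        }
    }

    /** Assumed to be called when the watcher sees a SyncConnected event. */
    synchronized void onConnected() {
        disconnected = false;
    }

    /** True if we have been disconnected for longer than the session timeout. */
    synchronized boolean presumedExpired() {
        return disconnected
                && clockMs.getAsLong() - disconnectedAtMs > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long[] now = {0L};                // fake clock, in milliseconds
        DisconnectTracker t =
                new DisconnectTracker(3 * 60 * 1000L, () -> now[0]); // 3 minute timeout

        t.onDisconnected();
        now[0] = 60 * 1000L;              // 1 minute later
        System.out.println(t.presumedExpired());

        now[0] = 3L * 24 * 60 * 60 * 1000; // 3 days later
        System.out.println(t.presumedExpired());
    }
}
```

In a real process the clock would want to be monotonic (something like
() -> System.nanoTime() / 1_000_000 rather than wall-clock time), and
what to do once presumedExpired() turns true (abort, alert, close the
handle and start over) is an application decision this sketch does not
make.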