ah dang, i should have said "generate a close request for the session and push that through the system."


On 09/10/2010 01:01 PM, Benjamin Reed wrote:
the problem is that followers don't track session timeouts. they track
when they last heard from the sessions that are connected to them and
they periodically propagate this information to the leader. the leader
is the one that expires the session. your technique only works when the
client is connected to the leader.
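To make that division of labor concrete, here is a minimal Java sketch. The classes `Follower` and `Leader` and their methods are hypothetical and are not ZooKeeper's real session tracker. The sketch only models the idea above: a follower records when it last heard from each session and reports that upstream, while the leader alone decides expiry. So shortening a timeout on a follower's local view would not change when the leader expires the session.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, NOT ZooKeeper's actual classes: followers only record
// activity, and the leader is the single place where sessions expire.
class SessionTrackingSketch {

    // Follower side: remembers when it last heard from each locally connected session.
    static class Follower {
        private final Map<Long, Long> lastHeard = new HashMap<>();

        void heardFrom(long sessionId, long nowMs) {
            lastHeard.put(sessionId, nowMs);
        }

        // Called periodically: hand the accumulated activity to the leader and reset.
        Map<Long, Long> drainTouches() {
            Map<Long, Long> snapshot = new HashMap<>(lastHeard);
            lastHeard.clear();
            return snapshot;
        }
    }

    // Leader side: owns the timeouts and makes every expiration decision.
    static class Leader {
        private final Map<Long, Long> lastSeen = new HashMap<>();
        private final Map<Long, Long> timeoutMs = new HashMap<>();

        void createSession(long sessionId, long timeout, long nowMs) {
            timeoutMs.put(sessionId, timeout);
            lastSeen.put(sessionId, nowMs);
        }

        // Fold in a follower's report, keeping the most recent time per session.
        void applyTouches(Map<Long, Long> touches) {
            touches.forEach((id, t) -> lastSeen.merge(id, t, Long::max));
        }

        boolean isExpired(long sessionId, long nowMs) {
            return nowMs - lastSeen.get(sessionId) > timeoutMs.get(sessionId);
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        Follower follower = new Follower();
        leader.createSession(1L, 10_000, 0);
        follower.heardFrom(1L, 8_000); // the client pings a follower, not the leader
        System.out.println(leader.isExpired(1L, 12_000)); // true: report not yet propagated
        leader.applyTouches(follower.drainTouches());
        System.out.println(leader.isExpired(1L, 12_000)); // false
    }
}
```

The timeout and timestamps are made-up values; the point is only that the expiry decision reads the leader's state, which a follower influences solely through what it reports.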

one thing you can do is generate a close request for the socket and push
that through the system. that will cause it to get propagated through
the followers and processed at the leader. it would also allow you to
get your functionality without touching the processing pipeline.

the thing that worries me about this functionality in general is that
network anomalies can cause a whole raft of sessions to get expired in
this way. for example, you have 3 servers with load spread well; there
is a networking glitch that causes clients to abandon a server; suddenly
1/3 of your clients will get expired sessions.


On 09/10/2010 12:17 PM, Fournier, Camille F. [Tech] wrote:
Ben, could you explain a bit more why you think this won't work? I'm trying to 
decide if I should put in the work to take the POC I wrote and complete it, but 
I don't really want to waste my time if there's a fundamental reason it's a bad 
idea.


-----Original Message-----
From: Benjamin Reed [mailto:br...@yahoo-inc.com]
Sent: Wednesday, September 08, 2010 4:03 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting for timeout

unfortunately, that only works on the standalone server.


On 09/08/2010 12:52 PM, Fournier, Camille F. [Tech] wrote:
This would be the ideal solution to this problem I think.
Poking around the (3.3) code to figure out how hard it would be to implement, I 
figure one way to do it would be to modify the session timeout to the min 
session timeout and touch the connection before calling close when you get 
certain exceptions in NIOServerCnxn.doIO. I did this (removing the code in 
touch session that returns if the tickTime is greater than the expire time) and 
it worked (in the standalone server anyway). Interesting solution, or total 
hack that will not work beyond the most basic test case?


(forgive lack of actual code in this email)
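The modification described above can be illustrated with a simplified, hypothetical model of a bucketed expiry tracker in Java. This is not the 3.3 source: the class, the `touch` method (loosely mirroring the touch-session logic mentioned above), the 2 s bucket, the 30 s session timeout and the 4 s "minimum" timeout are all assumptions for illustration. It shows why the early return had to be removed: without that change, touching with a shorter timeout can never pull an expiry earlier.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified bucketed expiry tracker (not the actual ZooKeeper source).
class ExpiryTrackerSketch {
    private final long expirationInterval;
    private final boolean allowShortening; // true models the modification described above
    private final Map<Long, Long> expiryBySession = new HashMap<>();

    ExpiryTrackerSketch(long expirationInterval, boolean allowShortening) {
        this.expirationInterval = expirationInterval;
        this.allowShortening = allowShortening;
    }

    // Round up to the next bucket boundary so sessions expire in batches.
    private long roundToInterval(long time) {
        return (time / expirationInterval + 1) * expirationInterval;
    }

    void touch(long sessionId, long nowMs, long timeoutMs) {
        long newExpiry = roundToInterval(nowMs + timeoutMs);
        Long current = expiryBySession.get(sessionId);
        // Unmodified behaviour: a touch never moves an expiry earlier.
        if (!allowShortening && current != null && current >= newExpiry) {
            return;
        }
        expiryBySession.put(sessionId, newExpiry);
    }

    long expiryOf(long sessionId) {
        return expiryBySession.get(sessionId);
    }

    public static void main(String[] args) {
        for (boolean shorten : new boolean[] {false, true}) {
            ExpiryTrackerSketch t = new ExpiryTrackerSketch(2_000, shorten);
            t.touch(1L, 0, 30_000);    // normal activity with a 30 s session timeout
            t.touch(1L, 1_000, 4_000); // socket error: touch with a hypothetical 4 s minimum timeout
            System.out.println("shortening=" + shorten + " expiry=" + t.expiryOf(1L));
        }
    }
}
```

With the early return in place the expiry stays at 32000; with it removed the session is rescheduled to 6000. Note that this models a single server's tracker only.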

-----Original Message-----
From: Ted Dunning [mailto:ted.dunn...@gmail.com]
Sent: Tuesday, September 07, 2010 1:11 PM
To: zookeeper-user@hadoop.apache.org
Cc: Benjamin Reed
Subject: Re: closing session on socket close vs waiting for timeout

This really is, just as Ben says, a problem of false positives and false
negatives in detecting session

On the other hand, the current algorithm isn't really using all the
information available. The current algorithm uses the time since the last
client-initiated heartbeat. The new proposal is somewhat worse in that it
proposes to use just the boolean "has-TCP-disconnect-happened".

Perhaps it would be better to use multiple features in order to decrease
both false positives and false negatives.

For instance, I could imagine that we use the following features:

- time since last client heartbeat or disconnect or reconnect

- what was the last event? (a heartbeat or a disconnect or a reconnect)

Then the expiration algorithm could use a relatively long time since last
heartbeat and a relatively short time since last disconnect to mark a
session as disconnected.

Wouldn't this avoid expiration during GC and cluster partition and cause
expiration quickly after a client disconnect?
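A minimal Java sketch of that two-feature rule, assuming the features are exactly the two listed above (which event came last, and how long ago). The class name, enum and the 30 s / 2 s thresholds are hypothetical illustrations, not an existing ZooKeeper API.

```java
// Hypothetical sketch of the two-feature expiration rule; thresholds are illustrative.
class MultiFeatureExpiry {

    enum LastEvent { HEARTBEAT, DISCONNECT, RECONNECT }

    private final long heartbeatTimeoutMs; // relatively long: tolerates GC pauses and partitions
    private final long disconnectGraceMs;  // relatively short: reacts quickly to a real disconnect

    MultiFeatureExpiry(long heartbeatTimeoutMs, long disconnectGraceMs) {
        this.heartbeatTimeoutMs = heartbeatTimeoutMs;
        this.disconnectGraceMs = disconnectGraceMs;
    }

    // Feature 1: which event happened last. Feature 2: how long ago it happened.
    boolean shouldExpire(LastEvent last, long msSinceLastEvent) {
        long limit = (last == LastEvent.DISCONNECT) ? disconnectGraceMs : heartbeatTimeoutMs;
        return msSinceLastEvent > limit;
    }

    public static void main(String[] args) {
        MultiFeatureExpiry rule = new MultiFeatureExpiry(30_000, 2_000);
        // Long GC pause: no disconnect seen, 20 s of silence -> keep the session.
        System.out.println(rule.shouldExpire(LastEvent.HEARTBEAT, 20_000)); // false
        // Socket disconnected 3 s ago and no reconnect since -> expire quickly.
        System.out.println(rule.shouldExpire(LastEvent.DISCONNECT, 3_000)); // true
        // Client reconnected 3 s ago -> back on the long heartbeat timeout.
        System.out.println(rule.shouldExpire(LastEvent.RECONNECT, 3_000)); // false
    }
}
```

The design choice is simply that a disconnect switches the session onto a much shorter clock, while heartbeats and reconnects keep it on the long one.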

On Mon, Sep 6, 2010 at 11:26 PM, Patrick Hunt <ph...@apache.org> wrote:

That's a good point, however with suitable documentation, warnings and such
it seems like a reasonable feature to provide for those users who require
it. Used in moderation it seems fine to me. Perhaps we also make it
configurable at the server level for those administrators/ops who don't
want to deal with it (disable the feature entirely, or only enable it on particular
servers, etc...).


On Mon, Sep 6, 2010 at 2:10 PM, Benjamin Reed <br...@yahoo-inc.com> wrote:

if this mechanism were used very often, we would get a huge number of
session expirations when a server fails. you are trading fast error
detection for the ability to tolerate temporary network and server


to be honest this seems like something that in theory sounds like it will
work in practice, but once deployed we start getting session expirations


cases that we really do not want or expect.


On 09/01/2010 12:47 PM, Patrick Hunt wrote:

Ben, in this case the session would be tied directly to the connection,
we'd explicitly deny session re-establishment for this session type (so
step 4 would fail). Would that address your concern, or others'?


On 09/01/2010 10:03 AM, Benjamin Reed wrote:

i'm a bit skeptical that this is going to work out properly. a server
may receive a socket reset even though the client is still alive:

1) client sends a request to a server
2) client is partitioned from the server
3) server starts trying to send response
4) client reconnects to a different server
5) partition heals
6) server gets a reset from client

at step 6 i don't think you want to delete the ephemeral nodes.
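Replaying the six steps in code makes the hazard explicit, along with the connection-bound session type proposed in the reply above (where step 4 fails). This is a hypothetical Java model, not ZooKeeper code: the method name, the server labels "A" and "B", and the naive "expire on socket reset" policy at step 6 are assumptions for illustration.

```java
// Hypothetical replay (not ZooKeeper code) of the six-step race.
class ResetRaceSketch {

    // Returns true if the session (and its ephemeral nodes) is expired at step 6
    // while the client still believes it holds a live session on server B.
    static boolean expiredUnderLiveClient(boolean connectionBound) {
        // 1) client sends a request to server A
        boolean clientHasSession = true;
        String sessionServer = "A";
        // 2) client is partitioned from A; 3) A starts trying to send the response
        // 4) client reconnects to a different server, B
        if (connectionBound) {
            clientHasSession = false; // re-establishment denied: client knows its session is gone
        } else {
            sessionServer = "B";      // default behaviour: the session resumes on B
        }
        // 5) partition heals; 6) A gets a reset from the client.
        // A naive "expire on socket reset" policy at A fires here,
        // without knowing that the session has moved to B.
        boolean expiredAtStep6 = true;
        return expiredAtStep6 && clientHasSession && sessionServer.equals("B");
    }

    public static void main(String[] args) {
        System.out.println(expiredUnderLiveClient(false)); // true: ephemerals vanish under a live client
        System.out.println(expiredUnderLiveClient(true));  // false: the client was told at step 4
    }
}
```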


On 08/31/2010 01:41 PM, Fournier, Camille F. [Tech] wrote:

Yes that's right. Which network issues can cause the socket to close
without the initiating process closing the socket? In my limited
experience in this area network issues were more prone to leave dead
sockets open rather than vice versa so I don't know what to look out


-----Original Message-----
From: Dave Wright [mailto:wrig...@gmail.com]
Sent: Tuesday, August 31, 2010 1:14 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: closing session on socket close vs waiting for timeout

I think he's saying that if the socket closes because of a crash (i.e. not a
normal ZooKeeper close request), then the session stays alive until the
session timeout, which is of course true, since ZK allows reconnection and
resumption of the session in case of a disconnect due to network issues.

-Dave Wright

On Tue, Aug 31, 2010 at 1:03 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

That doesn't sound right to me.

Is there a ZooKeeper expert in the house?

On Tue, Aug 31, 2010 at 8:58 AM, Fournier, Camille F. [Tech]
<camille.fourn...@gs.com> wrote:

I foolishly did not investigate the ZK code closely enough, and it seems
that closing the socket still waits for the session timeout to
remove the
