Thanks for the explanation! I suggest this goes on the wiki.

<quote>
the client only finds out about session expiration events when the client
reconnects to the cluster. if zk tells a client that its session is expired,
the ephemerals that correspond to that session will already be cleaned up.

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner
</quote>

So session expiration means two things here: the server view (ephemeral
cleanup) and the client view (event delivery), and there is no guarantee
on how long it will take in between, correct?

I guess the confusion arises from the documentation, which doesn't distinguish
these two concepts, e.g. in the javadoc
http://hadoop.apache.org/zookeeper/docs/r3.3.1/api/index.html

An ephemeral node will be removed by the ZooKeeper automatically when the
session associated with the creation of the node expires.

It is actually referring to the server view, not the client view.
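
To make the client view concrete, here is a minimal sketch of the safe-mode
rule Ted describes in the quoted message below: EnterSafeMode on connection
loss, ExitSafeMode on reconnect, RelinquishMaster on session expiration. It
does not talk to ZooKeeper at all. ZkEvent, LeaderCandidate and
matches_grammar are hypothetical names made up for illustration, and the
regex is only my translation of Ted's grammar for completed event traces.

```python
import re
from enum import Enum


class ZkEvent(Enum):
    # Hypothetical stand-ins for the connection-state events a client sees.
    DISCONNECTED = "Disconnected"
    RECONNECTED = "SyncConnected"
    EXPIRED = "Expired"


class LeaderCandidate:
    """Client view only; the ephemeral node itself lives on the server."""

    def __init__(self):
        self.is_master = False
        self.safe_mode = False
        self.trace = []  # client events, named as in the quoted grammar

    def on_ephemeral_created(self):
        # BecomeMaster is allowed only after our ephemeral node was created.
        self.is_master = True
        self.safe_mode = False
        self.trace.append("BecomeMaster")

    def on_zk_event(self, event):
        if not self.is_master:
            return
        if event is ZkEvent.DISCONNECTED and not self.safe_mode:
            # "I don't know": stop accepting changes immediately.
            self.safe_mode = True
            self.trace.append("EnterSafeMode")
        elif event is ZkEvent.RECONNECTED and self.safe_mode:
            # Session survived, so the ephemeral node still exists.
            self.safe_mode = False
            self.trace.append("ExitSafeMode")
        elif event is ZkEvent.EXPIRED:
            # Delivered only after reconnecting; by then the server has
            # already deleted the ephemeral node.
            self.is_master = False
            self.safe_mode = False
            self.trace.append("RelinquishMaster")

    def may_accept_writes(self):
        return self.is_master and not self.safe_mode


# One letter per client event, so the grammar becomes a plain regex.
_CODES = {
    "BecomeMaster": "B",
    "EnterSafeMode": "E",
    "ExitSafeMode": "X",
    "RelinquishMaster": "R",
    "Crash": "C",
}
_GRAMMAR = re.compile(r"(?:B(?:EX)*E?[RC]|C)*")


def matches_grammar(trace):
    """True if a completed trace of client events fits the quoted grammar."""
    encoded = "".join(_CODES[name] for name in trace)
    return _GRAMMAR.fullmatch(encoded) is not None
```

The point of the sketch is that the client never decides on its own that the
session has expired. Between DISCONNECTED and either RECONNECTED or EXPIRED it
only knows that it does not know, so it refuses writes.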



On Wed, Aug 18, 2010 at 1:12 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Uncharacteristically, I think that Ben's comments could use a little bit of
> amplification.
>
> First, ZK is designed with certain guarantees in mind and almost all
> operational characteristics flow logically from these guarantees.
>
> The guarantee that Ben mentioned here in passing is that if a client gets
> session expiration, it is *guaranteed* that the ephemerals have been cleaned
> up.  This guarantee is what drives the notification of session expiration
> after reconnection since while the client is disconnected, it cannot know if
> the cluster is operating correctly or not and thus cannot know if the
> ephemerals have been cleaned up yet.  The only way to have certain knowledge
> that the cluster has cleaned up the ephemerals is to get back in touch with
> an operating cluster.
>
> The client is not completely in the dark.  As Ben implied, it can know that
> the cluster is unavailable (it got a ConnectionLoss event, after all).
> While the cluster is unavailable and before it gets a session expiration
> notification, the client can go into safe mode.
>
> The moral of this story is that to get the most out of ZK, it is best to
> adopt the same guarantee based design process that drove ZK in the first
> place.  The first step is that you have to decide what guarantees that you
> want to provide and then work from ZK's guarantees to get to yours.
>
> In the classic leader-election use of ZK, the key guarantee that we want is:
>
> - the number of leaders is less than or equal to 1
>
> Note that you can't guarantee that the number == 1, because other stuff
> could happen.  This has nothing to do with ZK.
>
> The pertinent ZK guarantees are:
>
> - an ephemeral file can only be created by a single session
>
> - deletion of an ephemeral file due to loss of client connection will occur
> after the client gets a connection loss
>
> - deletion of an ephemeral file will precede delivery of a session
> expiration event to the owner
>
> Phrased in terms of CSP-like constructs, the client has events BecomeMaster,
> EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
> according to this grammar:
>
> client := (
>    (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
>  | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
>  | Crash
>  )*
>
> To get the guarantees that we want, we can require the client to only do
> BecomeMaster after it creates an ephemeral file and require it to either
> Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
> deleted.  The only way that we can do that is to immediately do
> EnterSafeMode on connection loss and then do RelinquishMaster on session
> expiration or ExitSafeMode on connection restored.  It is involved, but you
> can actually do a proof of correctness from this that shows that your
> guarantee will be honored even in the presence of ZK or the client crashing
> or being partitioned.
>
>
>
> On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed <br...@yahoo-inc.com> wrote:
>
> > there are two things to keep in mind when thinking about this issue:
> >
> > 1) if a zk client is disconnected from the cluster, the client is
> > essentially in limbo. because the client cannot talk to a server it cannot
> > know if its session is still alive. it also cannot close its session.
> >
> > 2) the client only finds out about session expiration events when the
> > client reconnects to the cluster. if zk tells a client that its session is
> > expired, the ephemerals that correspond to that session will already be
> > cleaned up.
> >
> > one of the main design points about zk is that zk only gives correct
> > information. if zk cannot give correct information, it basically says "i
> > don't know". connection loss exceptions and disconnected states are
> > basically "i don't know".
> >
> > generally applications we design go into a "safe" mode, meaning they may
> > serve reads but reject changes, when disconnected from zk and only kill
> > themselves when they find out their session has expired.
> >
> > ben
> >
> > ps - session information is replicated to all zk servers, so if a leader
> > dies, all replicas know the sessions that are currently active and their
> > timeouts.
> >
> > On 08/16/2010 09:03 PM, Ted Dunning wrote:
> >
> >> Ben or somebody else will have to repeat some of the detailed logic for
> >> this, but it has to do with the fact that you can't be sure what has
> >> happened during the network partition.  One possibility is the one you
> >> describe, but another is that the partition happened because a majority
> >> of the ZK cluster lost power and you can't see the remaining nodes.
> >> Those nodes will continue to serve any files in a read-only fashion.  If
> >> the partition involves you losing contact with the entire cluster at the
> >> same time a partition of the cluster into a quorum and a minority
> >> happens, then your ephemeral files could continue to exist at least
> >> until the breach in the cluster itself is healed.
> >>
> >> Suffice it to say that there are only a few strategies that leave you
> >> with a coherent picture of the universe.  Importantly, you shouldn't
> >> assume that the ephemerals will disappear at the same time as the
> >> session expiration event is delivered.
> >>
> >> On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan<qing...@gmail.com>  wrote:
> >>
> >>
> >>
> >>> Ouch, is this the current ZK behavior? This is unexpected. If the
> >>> client gets partitioned from the ZK cluster, it should get notified
> >>> and take some action (e.g. commit suicide); otherwise, how can one
> >>> tell whether an ephemeral node is really up or down? Zombies can
> >>> create synchronization nightmares.
> >>>
> >>>
> >>>
> >>> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright<wrig...@gmail.com> wrote:
> >>>
> >>>> Another possible cause for this that I ran into recently with the C
> >>>> client - you don't get the session expired notification until you are
> >>>> reconnected to the quorum and it informs you the session is lost.  If
> >>>> you get disconnected and can't reconnect you won't get the
> >>>> notification.  Personally I think the client api should track the
> >>>> session expiration time locally and inform you once it's expired.
> >>>>
> >>>> On Aug 16, 2010 2:09 AM, "Qing Yan"<qing...@gmail.com>  wrote:
> >>>>
> >>>> Hi Ted,
> >>>>
> >>>>  Do you mean GC problem can prevent delivery of SESSION EXPIRE event?
> >>>> Hum...so you have met this problem before?
> >>>> I didn't see any OOM though, will look into it more.
> >>>>
> >>>>
> >>>> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning<ted.dunn...@gmail.com>
> >>>>
> >>>>
> >>> wrote:
> >>>
> >>>
> >>>> I am assuming that y...
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
>
