Uncharacteristically, I think that Ben's comments could use a little bit of
amplification.

First, ZK is designed with certain guarantees in mind, and almost all of its
operational characteristics flow logically from those guarantees.

The guarantee that Ben mentioned in passing is that if a client receives a
session expiration, it is *guaranteed* that its ephemerals have already been
cleaned up.  This guarantee is why session expiration is reported only after
reconnection: while the client is disconnected, it cannot know whether the
cluster is operating correctly, and therefore cannot know whether the
ephemerals have been cleaned up yet.  The only way to know for certain that
the cluster has cleaned up the ephemerals is to get back in touch with an
operating cluster.

The client is not completely in the dark.  As Ben implied, it can know that
the cluster is unavailable (it got a ConnectionLoss event, after all).
While the cluster is unavailable and before it gets a session expiration
notification, the client can go into safe mode (serving reads but rejecting
changes, as Ben describes).

The moral of this story is that to get the most out of ZK, it is best to
adopt the same guarantee-based design process that drove ZK in the first
place.  The first step is to decide what guarantees you want to provide and
then work from ZK's guarantees to get to yours.

In the classic leader-election use of ZK, the key guarantee that we want is:

- the number of leaders is less than or equal to 1

Note that you can't guarantee that the number is exactly 1, because other
things can happen (every candidate might crash, for example).  This has
nothing to do with ZK.

The pertinent ZK guarantees are:

- an ephemeral file can only be created by a single session

- deletion of an ephemeral file due to loss of client connection will occur
after the client gets a connection loss

- deletion of an ephemeral file will precede delivery of a session
expiration event to the owner

Phrased in terms of CSP-like constructs, the client has events BecomeMaster,
EnterSafeMode, ExitSafeMode, RelinquishMaster and Crash that must occur
according to this grammar:

client := (
     (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; RelinquishMaster)
   | (BecomeMaster; (EnterSafeMode; ExitSafeMode)*; EnterSafeMode?; Crash)
   | Crash
   )*
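To make the grammar concrete, here is a minimal Python sketch (not anything that ships with ZK) that checks whether a recorded trace of client events is a complete word of the grammar. It assumes the grammar is equivalent to a three-state automaton: not master, master, and master in safe mode. The names Event, State, TRANSITIONS and accepts are illustrative only.

```python
from enum import Enum, auto


class Event(Enum):
    BECOME_MASTER = auto()
    ENTER_SAFE_MODE = auto()
    EXIT_SAFE_MODE = auto()
    RELINQUISH_MASTER = auto()
    CRASH = auto()


class State(Enum):
    IDLE = auto()    # not master: the start state, and after RelinquishMaster or Crash
    MASTER = auto()  # master and connected
    SAFE = auto()    # master, but in safe mode


# Transition table equivalent to the grammar; any (state, event) pair
# that is not listed here is illegal.
TRANSITIONS = {
    (State.IDLE, Event.BECOME_MASTER): State.MASTER,
    (State.IDLE, Event.CRASH): State.IDLE,
    (State.MASTER, Event.ENTER_SAFE_MODE): State.SAFE,
    (State.MASTER, Event.RELINQUISH_MASTER): State.IDLE,
    (State.MASTER, Event.CRASH): State.IDLE,
    (State.SAFE, Event.EXIT_SAFE_MODE): State.MASTER,
    (State.SAFE, Event.RELINQUISH_MASTER): State.IDLE,
    (State.SAFE, Event.CRASH): State.IDLE,
}


def accepts(trace):
    """Return True if trace is a complete word of the client grammar."""
    state = State.IDLE
    for event in trace:
        state = TRANSITIONS.get((state, event))
        if state is None:
            return False  # event not allowed in the current state
    # The grammar is a repetition of complete rounds, so a full trace
    # must end with the client no longer master.
    return state is State.IDLE
```

For example, accepts([Event.BECOME_MASTER, Event.ENTER_SAFE_MODE, Event.EXIT_SAFE_MODE, Event.RELINQUISH_MASTER]) returns True, while a trace that ends with the client still master returns False, because the grammar only describes complete rounds.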

To get the guarantees that we want, we can require the client to do
BecomeMaster only after it has created an ephemeral file, and require it to
Crash, RelinquishMaster or EnterSafeMode before that ephemeral file is
deleted.  The only way we can do that is to do EnterSafeMode immediately on
connection loss, and then do RelinquishMaster on session expiration or
ExitSafeMode when the connection is restored.  It is involved, but you can
actually construct a proof of correctness from this showing that your
guarantee will be honored even if ZK or the client crashes or is
partitioned.
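Here is a hedged Python sketch of those rules as a small state holder. It does not use a real ZK client: SessionEvent is a hypothetical stand-in whose members are named after the Disconnected, SyncConnected and Expired session states that the Java client reports, and LeaderRole, on_ephemeral_created and on_session_event are made-up names. Wiring it to an actual client's watcher is left out.

```python
from enum import Enum, auto


class SessionEvent(Enum):
    # Hypothetical stand-ins named after ZK's session states.
    DISCONNECTED = auto()    # connection loss: ZK is saying "I don't know"
    SYNC_CONNECTED = auto()  # reconnected before the session timed out
    EXPIRED = auto()         # session gone; its ephemerals are already deleted


class LeaderRole:
    """Master only after creating the ephemeral, safe mode immediately on
    connection loss, and give up mastership on session expiration."""

    def __init__(self):
        self.is_master = False
        self.safe_mode = False
        self.log = []  # client-side events, in the order they happened

    def on_ephemeral_created(self):
        # BecomeMaster is allowed only once our ephemeral file exists.
        if not self.is_master:
            self.is_master = True
            self.log.append("BecomeMaster")

    def on_session_event(self, event):
        if not self.is_master:
            return  # nothing to protect if we are not master
        if event is SessionEvent.DISCONNECTED and not self.safe_mode:
            # Must happen before the ephemeral could possibly be deleted.
            self.safe_mode = True
            self.log.append("EnterSafeMode")
        elif event is SessionEvent.SYNC_CONNECTED and self.safe_mode:
            self.safe_mode = False
            self.log.append("ExitSafeMode")
        elif event is SessionEvent.EXPIRED:
            # The ephemeral is already gone, so someone else may be master.
            self.is_master = False
            self.safe_mode = False
            self.log.append("RelinquishMaster")

    def can_write(self):
        # Safe mode: keep serving reads, but reject changes.
        return self.is_master and not self.safe_mode
```

Checking can_write() before every change is one way to get the safe mode Ben describes: the client keeps serving reads but rejects changes until the connection is restored, and it gives up mastership for good once expiration is reported.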



On Tue, Aug 17, 2010 at 9:26 AM, Benjamin Reed <br...@yahoo-inc.com> wrote:

> there are two things to keep in mind when thinking about this issue:
>
> 1) if a zk client is disconnected from the cluster, the client is
> essentially in limbo. because the client cannot talk to a server it cannot
> know if its session is still alive. it also cannot close its session.
>
> 2) the client only finds out about session expiration events when the
> client reconnects to the cluster. if zk tells a client that its session is
> expired, the ephemerals that correspond to that session will already be
> cleaned up.
>
> one of the main design points about zk is that zk only gives correct
> information. if zk cannot give correct information, it basically says "i
> don't know". connection loss exceptions and disconnected states are
> basically "i don't know".
>
> generally applications we design go into a "safe" mode, meaning they may
> serve reads but reject changes, when disconnected from zk and only kill
> themselves when they find out their session has expired.
>
> ben
>
> ps - session information is replicated to all zk servers, so if a leader
> dies, all replicas know the sessions that are currently active and their
> timeouts.
>
> On 08/16/2010 09:03 PM, Ted Dunning wrote:
>
>> Ben or somebody else will have to repeat some of the detailed logic for
>> this, but it has to do with the fact that you can't be sure what has
>> happened during the network partition.  One possibility is the one you
>> describe, but another is that the partition happened because a majority of
>> the ZK cluster lost power and you can't see the remaining nodes.  Those
>> nodes will continue to serve any files in a read-only fashion.  If the
>> partition involves you losing contact with the entire cluster at the same
>> time a partition of the cluster into a quorum and a minority happens, then
>> your ephemeral files could continue to exist at least until the breach in
>> the cluster itself is healed.
>>
>> Suffice it to say that there are only a few strategies that leave you with
>> a coherent picture of the universe.  Importantly, you shouldn't assume that
>> the ephemerals will disappear at the same time as the session expiration
>> event is delivered.
>>
>> On Mon, Aug 16, 2010 at 8:31 PM, Qing Yan<qing...@gmail.com>  wrote:
>>
>>> Ouch, is this the current ZK behavior? This is unexpected. If the client
>>> gets partitioned from the ZK cluster, he should get notified and take some
>>> action (e.g. commit suicide); otherwise, how can anyone tell whether an
>>> ephemeral node is really up or down? Zombies can create synchronization
>>> nightmares.
>>>
>>> On Mon, Aug 16, 2010 at 7:22 PM, Dave Wright<wrig...@gmail.com>  wrote:
>>>
>>>
>>>> Another possible cause for this that I ran into recently with the C
>>>> client: you don't get the session expired notification until you are
>>>> reconnected to the quorum and it informs you the session is lost.  If you
>>>> get disconnected and can't reconnect, you won't get the notification.
>>>> Personally, I think the client API should track the session expiration
>>>> time locally and inform you once it's expired.
>>>>
>>>> On Aug 16, 2010 2:09 AM, "Qing Yan"<qing...@gmail.com>  wrote:
>>>>
>>>> Hi Ted,
>>>>
>>>> Do you mean a GC problem can prevent delivery of the SESSION EXPIRE event?
>>>> Hmm... so you have run into this problem before?
>>>> I didn't see any OOM, though; I will look into it more.
>>>>
>>>> On Mon, Aug 16, 2010 at 12:46 PM, Ted Dunning<ted.dunn...@gmail.com> wrote:
>>>>
>>>> I am assuming that y...