> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
> have been elected leader and have started running another process.

There's no way around this, right? ZK is not a transactional system, so this edge case is unsolvable.
> The way around the problem is to either ensure that no work is done by
> you once you are no longer the leader

You only release leadership when your work is done. If the cluster becomes unstable, then you cancel your work. Leadership is denoted by a ZNode. Curator has a top-level watcher that notifies on cluster instability. How does the fence make this better?

-JZ

On Dec 8, 2012, at 9:30 PM, Henry Robinson <[email protected]> wrote:

> On 8 December 2012 21:18, Jordan Zimmerman <[email protected]> wrote:
>
>> If your ConnectionStateListener gets SUSPENDED or LOST, you've lost
>> connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
>> connection to manage a node that denotes the process is running or not.
>> Only 1 VM at a time will be running the process. That process can watch
>> for SUSPENDED/LOST and wind down the task.
>
> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
> have been elected leader and have started running another process.
>
> It's a classic problem - you need some mechanism to fence a node that
> thinks it's the leader, but isn't and hasn't got the memo yet. The way
> around the problem is to either ensure that no work is done by you once
> you are no longer the leader (perhaps by checking every time you want to
> do work), or that the work you do does not affect the system (e.g. by
> using idempotent work units).
>
> ZK itself solves this internally by checking that it has a quorum for
> every operation, which forces an ordering between the disconnection event
> and trying to do something that relies upon being the leader. Other
> systems forcibly terminate old leaders before allowing a new leader to
> take the throne.
>
> Henry
>
>>> You can't assume that the notification is received locally before
>>> another leader election finishes elsewhere
>>
>> Which notification? The ConnectionStateListener is an abstraction on
>> ZooKeeper's watcher mechanism.
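[Editor's note: the fencing mechanism Henry describes above is usually implemented with a monotonically increasing fencing token: each elected leader carries the epoch at which it won, and the protected resource rejects any operation stamped with a stale epoch. A minimal, self-contained sketch of that idea follows; the class and method names are illustrative, not Curator or ZooKeeper APIs.]

```java
import java.util.concurrent.atomic.AtomicLong;

// A store that rejects writes carrying a stale fencing token. In a real
// deployment this check lives in the resource being protected (a database,
// a storage service), not in the leader-election library itself.
class FencedStore {
    private final AtomicLong highestTokenSeen = new AtomicLong(-1);

    // Accept the write only if its token is >= every token seen so far.
    boolean write(long fencingToken, String data) {
        long current;
        do {
            current = highestTokenSeen.get();
            if (fencingToken < current) {
                return false; // stale leader: fenced off
            }
        } while (!highestTokenSeen.compareAndSet(current, fencingToken));
        // ... apply `data` here ...
        return true;
    }
}

public class FencingDemo {
    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        // Leader A holds token 1; a new leader B is elected with token 2.
        boolean aFirst = store.write(1, "from A");  // accepted
        boolean b = store.write(2, "from B");       // accepted, raises the bar
        boolean aStale = store.write(1, "late A");  // rejected: A was fenced
        System.out.println(aFirst + " " + b + " " + aStale);
    }
}
```

This is the ordering guarantee the thread is after: even if the deposed leader never sees SUSPENDED/LOST in time, its writes cannot land after the new leader's.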
>> It's only significant for the VM that is the leader. Non-leaders don't
>> need to be concerned.
>>
>> -JZ
>>
>> On Dec 8, 2012, at 9:12 PM, Henry Robinson <[email protected]> wrote:
>>
>>> You can't assume that the notification is received locally before
>>> another leader election finishes elsewhere (particularly if you are
>>> running slowly for some reason!), so it's not sufficient to guarantee
>>> that the process that is running locally has finished before someone
>>> else starts another.
>>>
>>> It's usually best - if possible - to restructure the system so that
>>> processes are idempotent to work around these kinds of problems, in
>>> conjunction with using the kind of primitives that Curator provides.
>>>
>>> Henry
>>>
>>> On 8 December 2012 21:04, Jordan Zimmerman <[email protected]> wrote:
>>>
>>>> This is why you need a ConnectionStateListener. You'll get a notice
>>>> that the connection has been suspended, and you should assume all
>>>> locks/leaders are invalid.
>>>>
>>>> -JZ
>>>>
>>>> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[email protected]> wrote:
>>>>
>>>>> What about a network disconnection? Presumably leadership is revoked
>>>>> when the leader appears to have failed, which can happen for more
>>>>> reasons than a VM crash (VM running slow, network event, GC pause,
>>>>> etc.).
>>>>>
>>>>> Henry
>>>>>
>>>>> On 8 December 2012 21:00, Jordan Zimmerman <[email protected]> wrote:
>>>>>
>>>>>> The leader latch lock is the equivalent of task in progress. I
>>>>>> assume the task is running in the same VM as the leader lock. The
>>>>>> only reason the VM would lose leadership is if it crashes, in which
>>>>>> case the process would die anyway.
>>>>>>
>>>>>> -JZ
>>>>>>
>>>>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]> wrote:
>>>>>>
>>>>>>> If I recall correctly, it was Henry Robinson that gave me the
>>>>>>> advice to have a "task in progress" check.
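[Editor's note: Henry's advice to restructure the system around idempotent processes means that a work unit applied twice — once by a deposed leader that hadn't gotten the memo, once by its successor — leaves the system in the same state as applying it once. A minimal sketch, using a task id and put-if-absent; all names here are illustrative.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// An idempotent work unit: each task has a stable id, and applying it a
// second time (e.g. by a leader that didn't yet know it had been deposed)
// is a harmless no-op rather than duplicated work.
class IdempotentProcessor {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Returns true only for the first application of a given task id.
    boolean process(String taskId, String result) {
        return results.putIfAbsent(taskId, result) == null;
    }

    String resultOf(String taskId) {
        return results.get(taskId);
    }
}

public class IdempotencyDemo {
    public static void main(String[] args) {
        IdempotentProcessor p = new IdempotentProcessor();
        boolean first = p.process("task-42", "done");        // real work
        boolean replay = p.process("task-42", "done-again"); // no-op replay
        System.out.println(first + " " + replay + " " + p.resultOf("task-42"));
    }
}
```

With this structure, the SUSPENDED/LOST race discussed above becomes benign: overlapping leaders may both attempt a task, but only one attempt takes effect.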
>>>>>>> -- Eric
>>>>>>>
>>>>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am using Curator LeaderLatch :)
>>>>>>>>
>>>>>>>> -- Eric
>>>>>>>>
>>>>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> You might check your leader implementation. Writing a correct
>>>>>>>>> leader recipe is actually quite challenging due to edge cases.
>>>>>>>>> Have a look at Curator (disclosure: I wrote it) for an example.
>>>>>>>>>
>>>>>>>>> -JZ
>>>>>>>>>
>>>>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Actually, I had the same thought and didn't consider having to
>>>>>>>>>> do this until I talked about my project at a Zookeeper User
>>>>>>>>>> Group a month or so ago, and I was given this advice.
>>>>>>>>>>
>>>>>>>>>> I know that I do see leadership being lost/transferred when one
>>>>>>>>>> of the ZK servers is restarted (not the whole ensemble). And it
>>>>>>>>>> seems like I've seen it happen even when the ensemble stays
>>>>>>>>>> totally stable (though I am not 100% sure, as it's been a while
>>>>>>>>>> since I have worked on this particular application).
>>>>>>>>>>
>>>>>>>>>> -- Eric
>>>>>>>>>>
>>>>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Why would it lose leadership? The only reason I can think of is
>>>>>>>>>>> if the ZK cluster goes down. In normal use, the ZK cluster
>>>>>>>>>>> won't go down (I assume you're running 3 or 5 instances).
>>>>>>>>>>> -JZ
>>>>>>>>>>>
>>>>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> During the time the task is running, a cluster member could
>>>>>>>>>>>> lose its leadership.

>>>>> --
>>>>> Henry Robinson
>>>>> Software Engineer
>>>>> Cloudera
>>>>> 415-994-6679
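[Editor's note: the pattern Jordan recommends throughout the thread — treat all locks and leadership as invalid the instant the connection is SUSPENDED or LOST, and have the running task wind down — can be sketched as below. The enum and callback mirror the general shape of Curator's ConnectionStateListener mechanism, but this is plain Java so the sketch stands alone; all names are illustrative, not actual Curator APIs.]

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A simplified stand-in for a client connection-state notification.
enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

// A leader task that stops doing work as soon as the session is in doubt,
// rather than trusting a leadership grant that may already be stale.
class LeaderTask {
    private final AtomicBoolean mayLead = new AtomicBoolean(false);

    void onStateChanged(ConnState state) {
        switch (state) {
            case CONNECTED:
            case RECONNECTED:
                mayLead.set(true);   // (a real system must also re-win election)
                break;
            case SUSPENDED:
            case LOST:
                mayLead.set(false);  // assume leadership is gone; wind down
                break;
        }
    }

    // Each unit of work re-checks leadership instead of checking once up front.
    boolean tryDoWorkUnit() {
        return mayLead.get();
    }
}

public class StateListenerDemo {
    public static void main(String[] args) {
        LeaderTask task = new LeaderTask();
        task.onStateChanged(ConnState.CONNECTED);
        boolean before = task.tryDoWorkUnit();  // leading: work proceeds
        task.onStateChanged(ConnState.SUSPENDED);
        boolean after = task.tryDoWorkUnit();   // in doubt: work refused
        System.out.println(before + " " + after);
    }
}
```

Note that, as Henry points out above, this check alone cannot close the race — the notification may arrive after a new leader has already started — which is why it is best combined with fencing or idempotent work units.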
