Also sometimes the app leadership (via LeaderLatch) will get lost - I will follow up about this on the Curator list: https://gist.github.com/4247226
So back to my previous question, what is the best way to implement the "fence"?

-- Eric

On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[email protected]> wrote:

> The irony is that I am using leader election to convert non-idempotent
> operations into idempotent operations :) For example, once a night a
> report is emailed out to a set of addresses. We don't want the report to
> go to the same person more than once.
>
> Prior to using leader election one of the cluster members was designated
> as the scheduled task "leader" using a system property. But if that
> cluster member crashed it required a manual operation to failover the
> "leader" responsibility to another cluster member. I considered using
> app-specific techniques to make the scheduled tasks idempotent (for
> example using "select for update" / database locking) but I wanted a
> general solution and I needed clustering support for other reasons
> (cluster membership, etc).
>
> Anyway, here is the code that I'm using.
>
> Application startup (using Curator LeaderLatch):
> https://gist.github.com/3936162
> https://gist.github.com/3935895
> https://gist.github.com/3935889
>
> ClusterStatus:
> https://gist.github.com/3943149
> https://gist.github.com/3935861
>
> Scheduled task:
> https://gist.github.com/4246388
>
> In the last gist the "distribute" scheduled task is run every 30 seconds.
> It checks clusterStatus.isLeader to see if the current cluster member is
> the leader before running the real method (which sends email).
> clusterStatus() calls methods on LeaderLatch.
>
> Here is the output that I am seeing if I kill the ZK quorum leader and
> the app cluster member that was the leader loses its LeaderLatch
> leadership to another cluster member:
> https://gist.github.com/4247058
>
> -- Eric
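Here is a simplified sketch of the pattern described above: a scheduled "distribute" method that only does real work while this cluster member holds the LeaderLatch. The real code is in the gists; the class and method names here (ClusterStatus, ReportDistributor) are illustrative stand-ins, not the actual implementation.

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;

    // Illustrative wrapper around LeaderLatch (the real ClusterStatus is in the gists above).
    class ClusterStatus {
        private final LeaderLatch leaderLatch;

        ClusterStatus(CuratorFramework client, String latchPath) throws Exception {
            this.leaderLatch = new LeaderLatch(client, latchPath);
            this.leaderLatch.start();
        }

        boolean isLeader() {
            return leaderLatch.hasLeadership();
        }
    }

    // Illustrative scheduled task: the scheduler invokes distribute() every 30 seconds
    // on every cluster member, but only the current leader sends the report.
    class ReportDistributor {
        private final ClusterStatus clusterStatus;

        ReportDistributor(ClusterStatus clusterStatus) {
            this.clusterStatus = clusterStatus;
        }

        void distribute() {
            if (!clusterStatus.isLeader()) {
                return; // not the leader: skip this run
            }
            sendReportEmails();
        }

        private void sendReportEmails() { /* send the nightly report */ }
    }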
> On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <[email protected]> wrote:
>
>> On 8 December 2012 21:18, Jordan Zimmerman <[email protected]> wrote:
>>
>> > If your ConnectionStateListener gets SUSPENDED or LOST you've lost
>> > connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
>> > connection to manage a node that denotes the process is running or not.
>> > Only 1 VM at a time will be running the process. That process can watch
>> > for SUSPENDED/LOST and wind down the task.
>>
>> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
>> have been elected leader and have started running another process.
>>
>> It's a classic problem - you need some mechanism to fence a node that
>> thinks it's the leader, but isn't and hasn't got the memo yet. The way
>> around the problem is to either ensure that no work is done by you once
>> you are no longer the leader (perhaps by checking every time you want to
>> do work), or that the work you do does not affect the system (e.g. by
>> idempotent work units).
>>
>> ZK itself solves this internally by checking that it has a quorum for
>> every operation, which forces an ordering between the disconnection event
>> and trying to do something that relies upon being the leader. Other
>> systems forcibly terminate old leaders before allowing a new leader to
>> take the throne.
>>
>> Henry
>>
>> > > You can't assume that the notification is received locally before
>> > > another leader election finishes elsewhere
>> > Which notification? The ConnectionStateListener is an abstraction on
>> > ZooKeeper's watcher mechanism. It's only significant for the VM that is
>> > the leader. Non-leaders don't need to be concerned.
>> >
>> > -JZ
>> >
>> > On Dec 8, 2012, at 9:12 PM, Henry Robinson <[email protected]> wrote:
>> >
>> > > You can't assume that the notification is received locally before
>> > > another leader election finishes elsewhere (particularly if you are
>> > > running slowly for some reason!), so it's not sufficient to guarantee
>> > > that the process that is running locally has finished before someone
>> > > else starts another.
>> > >
>> > > It's usually best - if possible - to restructure the system so that
>> > > processes are idempotent to work around these kinds of problems, in
>> > > conjunction with using the kind of primitives that Curator provides.
>> > >
>> > > Henry
>> > >
>> > > On 8 December 2012 21:04, Jordan Zimmerman <[email protected]> wrote:
>> > >
>> > >> This is why you need a ConnectionStateListener. You'll get a notice
>> > >> that the connection has been suspended and you should assume all
>> > >> locks/leaders are invalid.
>> > >>
>> > >> -JZ
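A minimal sketch of the ConnectionStateListener approach Jordan describes: on SUSPENDED or LOST, assume all locks and leadership are invalid and let the running task wind down. The ConnectionStateListener and ConnectionState types are Curator's; the LeadershipGuard class and the flag convention are assumptions made for illustration only.

    import java.util.concurrent.atomic.AtomicBoolean;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.framework.state.ConnectionStateListener;

    // Illustrative guard: cleared as soon as the ZooKeeper connection is SUSPENDED or LOST,
    // so the running task can notice and wind down between units of work.
    class LeadershipGuard implements ConnectionStateListener {
        private final AtomicBoolean connectionHealthy = new AtomicBoolean(false);

        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState) {
            switch (newState) {
                case SUSPENDED:
                case LOST:
                    // Assume all locks/leadership are invalid from this point on.
                    connectionHealthy.set(false);
                    break;
                case CONNECTED:
                case RECONNECTED:
                    // The connection is back, but leadership still has to be
                    // re-confirmed via the LeaderLatch before doing any work.
                    connectionHealthy.set(true);
                    break;
                default:
                    break;
            }
        }

        boolean isSafeToAct() {
            return connectionHealthy.get();
        }
    }

    // Registration, assuming 'client' is the CuratorFramework instance:
    //   client.getConnectionStateListenable().addListener(new LeadershipGuard());

The scheduled task would then consult both this guard and LeaderLatch.hasLeadership() before each unit of work.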
>> > >> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[email protected]> wrote:
>> > >>
>> > >>> What about a network disconnection? Presumably leadership is revoked
>> > >>> when the leader appears to have failed, which can be for more reasons
>> > >>> than a VM crash (VM running slow, network event, GC pause etc).
>> > >>>
>> > >>> Henry
>> > >>>
>> > >>> On 8 December 2012 21:00, Jordan Zimmerman <[email protected]> wrote:
>> > >>>
>> > >>>> The leader latch lock is the equivalent of task in progress. I assume
>> > >>>> the task is running in the same VM as the leader lock. The only reason
>> > >>>> the VM would lose leadership is if it crashes in which case the
>> > >>>> process would die anyway.
>> > >>>>
>> > >>>> -JZ
>> > >>>>
>> > >>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>
>> > >>>>> If I recall correctly it was Henry Robinson that gave me the advice
>> > >>>>> to have a "task in progress" check.
>> > >>>>>
>> > >>>>> -- Eric
>> > >>>>>
>> > >>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>>
>> > >>>>>> I am using Curator LeaderLatch :)
>> > >>>>>>
>> > >>>>>> -- Eric
>> > >>>>>>
>> > >>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <[email protected]> wrote:
>> > >>>>>>
>> > >>>>>>> You might check your leader implementation. Writing a correct
>> > >>>>>>> leader recipe is actually quite challenging due to edge cases.
>> > >>>>>>> Have a look at Curator (disclosure: I wrote it) for an example.
>> > >>>>>>>
>> > >>>>>>> -JZ
>> > >>>>>>>
>> > >>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>>>>
>> > >>>>>>>> Actually I had the same thought and didn't consider having to do
>> > >>>>>>>> this until I talked about my project at a Zookeeper User Group a
>> > >>>>>>>> month or so ago and I was given this advice.
>> > >>>>>>>>
>> > >>>>>>>> I know that I do see leadership being lost/transferred when one
>> > >>>>>>>> of the ZK servers is restarted (not the whole ensemble). And it
>> > >>>>>>>> seems like I've seen it happen even when the ensemble stays
>> > >>>>>>>> totally stable (though I am not 100% sure as it's been a while
>> > >>>>>>>> since I have worked on this particular application).
>> > >>>>>>>>
>> > >>>>>>>> -- Eric
>> > >>>>>>>>
>> > >>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <[email protected]> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Why would it lose leadership? The only reason I can think of is
>> > >>>>>>>>> if the ZK cluster goes down. In normal use, the ZK cluster won't
>> > >>>>>>>>> go down (I assume you're running 3 or 5 instances).
>> > >>>>>>>>>
>> > >>>>>>>>> -JZ
>> > >>>>>>>>>
>> > >>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> During the time the task is running a cluster member could lose
>> > >>>>>>>>>> its leadership.
>> > >>>
>> > >>> --
>> > >>> Henry Robinson
>> > >>> Software Engineer
>> > >>> Cloudera
>> > >>> 415-994-6679
>> > >
>> > > --
>> > > Henry Robinson
>> > > Software Engineer
>> > > Cloudera
>> > > 415-994-6679
>>
>> --
>> Henry Robinson
>> Software Engineer
>> Cloudera
>> 415-994-6679
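Coming back to the original question about the "fence", here is a rough sketch of Henry's first suggestion (check every time you want to do work): re-verify leadership and connection health before each unit of work, here each recipient, and stop as soon as either is lost. This only narrows the window rather than providing a hard fence, which is why the complementary suggestion is to make the work itself idempotent, e.g. recording that a given report/recipient pair was already sent. The ClusterStatus and LeadershipGuard classes are the illustrative ones sketched earlier, not real library types.

    import java.util.List;

    // Illustrative only: uses the ClusterStatus and LeadershipGuard sketched earlier.
    class FencedDistributor {
        private final ClusterStatus clusterStatus;
        private final LeadershipGuard guard;

        FencedDistributor(ClusterStatus clusterStatus, LeadershipGuard guard) {
            this.clusterStatus = clusterStatus;
            this.guard = guard;
        }

        // Re-check leadership before every unit of work (here, every recipient)
        // so a member that has lost leadership stops as soon as it finds out.
        void distribute(List<String> recipients) {
            for (String recipient : recipients) {
                if (!guard.isSafeToAct() || !clusterStatus.isLeader()) {
                    return; // leadership can no longer be confirmed: stop doing work
                }
                sendReportEmail(recipient); // ideally idempotent, e.g. keyed by (report, recipient)
            }
        }

        private void sendReportEmail(String recipient) { /* send one email */ }
    }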
