Re: leader election, scheduled tasks, losing leadership

Eric Pederson Sun, 09 Dec 2012 13:44:06 -0800

The irony is that I am using leader election to convert non-idempotent
operations into idempotent operations :)   For example, once a night a
report is emailed out to a set of addresses.   We don't want the report to
go to the same person more than once.


Prior to using leader election one of the cluster members was designated as
the scheduled task "leader" using a system property.  But if that cluster
member crashed it required a manual operation to failover the "leader"
responsibility to another cluster member.   I considered using app-specific
techniques to make the scheduled tasks idempotent (for example using
"select for update" / database locking) but I wanted a general solution and
I needed clustering support for other reasons (cluster membership, etc).

Anyway, here is the code that I'm using.

Application startup (using Curator LeaderLatch):
https://gist.github.com/3936162
https://gist.github.com/3935895
https://gist.github.com/3935889

ClusterStatus:
https://gist.github.com/3943149
https://gist.github.com/3935861

Scheduled task:
https://gist.github.com/4246388

In the last gist the "distribute" scheduled task is run every 30 seconds.
It checks clusterStatus.isLeader to see if the current cluster member is
the leader before running the real method (which sends email).
clusterStatus() calls methods on LeaderLatch.

Here is the output that I am seeing if I kill the ZK quorum leader and the
app cluster member that was the leader loses its LeaderLatch leadership to
another cluster member:
https://gist.github.com/4247058


-- Eric



On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <[email protected]> wrote:

> On 8 December 2012 21:18, Jordan Zimmerman <[email protected]
> >wrote:
>
> > If your ConnectionStateListener gets SUSPENDED or LOST you've lost
> > connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
> > connection to manage a node that denotes the process is running or not.
> > Only 1 VM at a time will be running the process. That process can watch
> for
> > SUSPENDED/LOST and wind down the task.
> >
> >
> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
> have been elected leader and have started running another process.
>
> It's a classic problem - you need some mechanism to fence a node that
> thinks its the leader, but isn't and hasn't got the memo yet. The way
> around the problem is to either ensure that no work is done by you once you
> are no longer the leader (perhaps by checking every time you want to do
> work), or that the work you do does not affect the system (e.g. by
> idempotent work units).
>
> ZK itself solves this internally by checking with that it has a quorum for
> every operation, which forces an ordering between the disconnection event
> and trying to do something that relies upon being the leader. Other systems
> forcibly terminate old leaders before allowing a new leader to take the
> throne.
>
> Henry
>
>
> > > You can't assume that the notification is received locally before
> another
> > > leader election finishes elsewhere
> > Which notification? The ConnectionStateListener is an abstraction on
> > ZooKeeper's watcher mechanism. It's only significant for the VM that is
> the
> > leader. Non-leaders don't need to be concerned.
>
>
> > -JZ
> >
> > On Dec 8, 2012, at 9:12 PM, Henry Robinson <[email protected]> wrote:
> >
> > > You can't assume that the notification is received locally before
> another
> > > leader election finishes elsewhere (particularly if you are running
> > slowly
> > > for some reason!), so it's not sufficient to guarantee that the process
> > > that is running locally has finished before someone else starts
> another.
> > >
> > > It's usually best - if possible - to restructure the system so that
> > > processes are idempotent to work around these kinds of problem, in
> > > conjunction with using the kind of primitives that Curator provides.
> > >
> > > Henry
> > >
> > > On 8 December 2012 21:04, Jordan Zimmerman <[email protected]
> > >wrote:
> > >
> > >> This is why you need a ConnectionStateListener. You'll get a notice
> that
> > >> the connection has been suspended and you should assume all
> > locks/leaders
> > >> are invalid.
> > >>
> > >> -JZ
> > >>
> > >> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[email protected]>
> wrote:
> > >>
> > >>> What about a network disconnection? Presumably leadership is revoked
> > when
> > >>> the leader appears to have failed, which can be for more reasons
> than a
> > >> VM
> > >>> crash (VM running slow, network event, GC pause etc).
> > >>>
> > >>> Henry
> > >>>
> > >>> On 8 December 2012 21:00, Jordan Zimmerman <
> [email protected]
> > >>> wrote:
> > >>>
> > >>>> The leader latch lock is the equivalent of task in progress. I
> assume
> > >> the
> > >>>> task is running in the same VM as the leader lock. The only reason
> the
> > >> VM
> > >>>> would lose leadership is if it crashes in which case the process
> would
> > >> die
> > >>>> anyway.
> > >>>>
> > >>>> -JZ
> > >>>>
> > >>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]>
> wrote:
> > >>>>
> > >>>>> If I recall correctly it was Henry Robinson that gave me the advice
> > to
> > >>>> have
> > >>>>> a "task in progress" check.
> > >>>>>
> > >>>>>
> > >>>>> -- Eric
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[email protected]>
> > >>>> wrote:
> > >>>>>
> > >>>>>> I am using Curator LeaderLatch :)
> > >>>>>>
> > >>>>>>
> > >>>>>> -- Eric
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <
> > >>>>>> [email protected]> wrote:
> > >>>>>>
> > >>>>>>> You might check your leader implementation. Writing a correct
> > leader
> > >>>>>>> recipe is actually quite challenging due to edge cases. Have a
> look
> > >> at
> > >>>>>>> Curator (disclosure: I wrote it) for an example.
> > >>>>>>>
> > >>>>>>> -JZ
> > >>>>>>>
> > >>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[email protected]>
> > wrote:
> > >>>>>>>
> > >>>>>>>> Actually I had the same thought and didn't consider having to do
> > >> this
> > >>>>>>> until
> > >>>>>>>> I talked about my project at a Zookeeper User Group a month or
> so
> > >> ago
> > >>>>>>> and I
> > >>>>>>>> was given this advice.
> > >>>>>>>>
> > >>>>>>>> I know that I do see leadership being lost/transferred when one
> of
> > >> the
> > >>>>>>> ZK
> > >>>>>>>> servers is restarted (not the whole ensemble).   And it seems
> like
> > >>>> I've
> > >>>>>>>> seen it happen even when the ensemble stays totally stable
> > (though I
> > >>>> am
> > >>>>>>> not
> > >>>>>>>> 100% sure as it's been a while since I have worked on this
> > >> particular
> > >>>>>>>> application).
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> -- Eric
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <
> > >>>>>>>> [email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> Why would it lose leadership? The only reason I can think of is
> > if
> > >>>> the
> > >>>>>>> ZK
> > >>>>>>>>> cluster goes down. In normal use, the ZK cluster won't go down
> (I
> > >>>>>>> assume
> > >>>>>>>>> you're running 3 or 5 instances).
> > >>>>>>>>>
> > >>>>>>>>> -JZ
> > >>>>>>>>>
> > >>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[email protected]>
> > >> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> During the time the task is running a cluster member could
> lose
> > >> its
> > >>>>>>>>>> leadership.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Henry Robinson
> > >>> Software Engineer
> > >>> Cloudera
> > >>> 415-994-6679
> > >>
> > >>
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679
> >
> >
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>

Re: leader election, scheduled tasks, losing leadership

Reply via email to