I am asking because you have this "at most once" vs "at least once" problem.
I don't think you can have "exactly once" unless your jobs are transactional
and you can synchronize your transaction commits with ZooKeeper (better yet
with two-phase commit). So you need to decide which of the two you can live
with.
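
To make the trade-off concrete, here is a rough sketch of the database-side
variant (the scheduled_task table, its status values and the sendReport()
hook are made up for illustration, not something from this thread): only
what commits in the same transaction as the status flip is exactly-once; the
mail that actually leaves the box is not.

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.sql.DataSource;

public class NightlyReportTask {
    private final DataSource dataSource;

    public NightlyReportTask(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Claim and run the task inside one database transaction. */
    public void runOnce() throws Exception {
        try (Connection c = dataSource.getConnection()) {
            c.setAutoCommit(false);
            try (PreparedStatement ps = c.prepareStatement(
                    "UPDATE scheduled_task SET status = 'DONE' " +
                    "WHERE name = ? AND status = 'NEW'")) {
                ps.setString(1, "nightly-report");
                if (ps.executeUpdate() == 1) {
                    // We won the claim.  Rows written to the same database here
                    // commit atomically with the status flip, but the mail itself
                    // is an external effect and is not covered by the transaction.
                    sendReport(c);
                    c.commit();
                } else {
                    c.rollback();   // another node already claimed or finished this run
                }
            } catch (Exception e) {
                c.rollback();
                throw e;
            }
        }
    }

    private void sendReport(Connection c) { /* build and email the report */ }
}

If the process dies between sendReport() and commit(), the status flip rolls
back and the next run sends the mail again (at least once); flip the order
and a crash right after commit means the mail never goes out (at most once).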

What I'd recommend is a queue-like architecture, not a lock-based one. This
way you can:
a) Do parallel task processing.
b) Increase the timeout to be larger than the maximum task time, e.g. set it
    to one hour. This would mean that a task is restarted within an hour if
    the client working on it fails.

But this would mean moving task status/queueing from the database to
ZooKeeper. In my opinion that would be good, as the database is a SPOF for
you. A rough sketch of the idea is below.
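
Here is a minimal sketch of that queue-like, lease-based idea against the
plain ZooKeeper API. The /tasks layout, the one-hour lease and the runTask()
hook are illustrative assumptions only, and error handling and races between
competing claimers are glossed over.

import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class LeaseBasedTaskQueue {

    private static final String TASKS = "/tasks";            // one persistent child per pending task
    private static final long LEASE_MS = 60L * 60L * 1000L;  // "timeout larger than maximum task time"

    private final ZooKeeper zk;

    public LeaseBasedTaskQueue(ZooKeeper zk) {
        this.zk = zk;
    }

    /** Run periodically on every node: claim whatever pending tasks we can and process them. */
    public void pollOnce() throws Exception {
        for (String task : zk.getChildren(TASKS, false)) {
            String taskPath = TASKS + "/" + task;
            if (tryClaim(taskPath + "/claim")) {
                runTask(task);                 // the actual work (send the report, etc.)
                deleteRecursively(taskPath);   // done: remove the task from the queue
            }
        }
    }

    /** Claim a task by creating its "claim" node; steal claims older than the lease. */
    private boolean tryClaim(String claimPath) throws Exception {
        try {
            zk.create(claimPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            return true;                                       // we claimed it first
        } catch (KeeperException.NodeExistsException alreadyClaimed) {
            Stat stat = zk.exists(claimPath, false);
            if (stat == null) {
                return tryClaim(claimPath);                    // claim vanished, try again
            }
            if (System.currentTimeMillis() - stat.getCtime() > LEASE_MS) {
                // The previous worker presumably died; take the task over once the lease expires.
                zk.delete(claimPath, stat.getVersion());
                return tryClaim(claimPath);
            }
            return false;                                      // someone else is (still) working on it
        }
    }

    private void runTask(String task) {
        // application-specific work goes here
    }

    private void deleteRecursively(String path) throws Exception {
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            deleteRecursively(path + "/" + child);
        }
        zk.delete(path, -1);
    }
}

The lease plays the role of the long timeout: if the worker that claimed a
task dies, the claim goes stale and another worker picks the task up again,
so you get parallel, at-least-once processing without a single leader.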

Best regards, Vitalii Tymchyshyn


2012/12/10 Eric Pederson <[email protected]>

> It depends on the scheduled task.  Some have status fields in the database
> that indicate new/in-progress/done, but others do not.
>
>
> -- Eric
>
>
>
> On Mon, Dec 10, 2012 at 1:49 AM, Vitalii Tymchyshyn <[email protected]
> >wrote:
>
> > How are you going to ensure atomicity? I mean, if your processor dies in
> > the middle of the operation, how do you know if it is done or not?
> >
> > --
> > Best regards,
> > Vitalii Tymchyshyn
> > 10 Dec 2012 00:11, "Eric Pederson" <[email protected]> wrote:
> >
> > > Also sometimes the app leadership (via LeaderLatch) will get lost - I
> > > will follow up about this on the Curator list:
> > > https://gist.github.com/4247226
> > >
> > > So back to my previous question, what is the best way to implement the
> > > "fence"?
> > >
> > > -- Eric
> > >
> > >
> > >
> > > On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[email protected]>
> wrote:
> > >
> > > > The irony is that I am using leader election to convert
> > > > non-idempotent operations into idempotent operations :)  For example,
> > > > once a night a report is emailed out to a set of addresses.  We don't
> > > > want the report to go to the same person more than once.
> > > >
> > > > Prior to using leader election one of the cluster members was
> > > > designated as the scheduled task "leader" using a system property.
> > > > But if that cluster member crashed it required a manual operation to
> > > > failover the "leader" responsibility to another cluster member.  I
> > > > considered using app-specific techniques to make the scheduled tasks
> > > > idempotent (for example using "select for update" / database locking)
> > > > but I wanted a general solution and I needed clustering support for
> > > > other reasons (cluster membership, etc).
> > > >
> > > > Anyway, here is the code that I'm using.
> > > >
> > > > Application startup (using Curator LeaderLatch):
> > > > https://gist.github.com/3936162
> > > > https://gist.github.com/3935895
> > > > https://gist.github.com/3935889
> > > >
> > > > ClusterStatus:
> > > > https://gist.github.com/3943149
> > > > https://gist.github.com/3935861
> > > >
> > > > Scheduled task:
> > > > https://gist.github.com/4246388
> > > >
> > > > In the last gist the "distribute" scheduled task is run every 30
> > > > seconds.  It checks clusterStatus.isLeader to see if the current
> > > > cluster member is the leader before running the real method (which
> > > > sends email).  clusterStatus() calls methods on LeaderLatch.
> > > >
> > > > Here is the output that I am seeing if I kill the ZK quorum leader and
> > > > the app cluster member that was the leader loses its LeaderLatch
> > > > leadership to another cluster member:
> > > > https://gist.github.com/4247058
> > > >
> > > >
> > > > -- Eric
> > > >
> > > >
> > > >
> > > > On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <[email protected]
> > > >wrote:
> > > >
> > > >> On 8 December 2012 21:18, Jordan Zimmerman <
> > [email protected]
> > > >> >wrote:
> > > >>
> > > >> > If your ConnectionStateListener gets SUSPENDED or LOST you've
> > > >> > lost connection to ZooKeeper. Therefore you cannot use that same
> > > >> > ZooKeeper connection to manage a node that denotes the process is
> > > >> > running or not. Only 1 VM at a time will be running the process.
> > > >> > That process can watch for SUSPENDED/LOST and wind down the task.
> > > >> >
> > > >> >
> > > >> My point is that by the time that VM sees SUSPENDED/LOST, another
> > > >> VM may have been elected leader and have started running another
> > > >> process.
> > > >>
> > > >> It's a classic problem - you need some mechanism to fence a node that
> > > >> thinks it's the leader, but isn't and hasn't got the memo yet. The
> > > >> way around the problem is to either ensure that no work is done by
> > > >> you once you are no longer the leader (perhaps by checking every time
> > > >> you want to do work), or that the work you do does not affect the
> > > >> system (e.g. by making the work units idempotent).
> > > >>
> > > >> ZK itself solves this internally by checking that it has a quorum for
> > > >> every operation, which forces an ordering between the disconnection
> > > >> event and trying to do something that relies upon being the leader.
> > > >> Other systems forcibly terminate old leaders before allowing a new
> > > >> leader to take the throne.
> > > >>
> > > >> Henry
> > > >>
> > > >>
> > > >> > > You can't assume that the notification is received locally
> > > >> > > before another leader election finishes elsewhere
> > > >> > Which notification? The ConnectionStateListener is an abstraction
> > > >> > on ZooKeeper's watcher mechanism. It's only significant for the VM
> > > >> > that is the leader. Non-leaders don't need to be concerned.
> > > >>
> > > >>
> > > >> > -JZ
> > > >> >
> > > >> > On Dec 8, 2012, at 9:12 PM, Henry Robinson <[email protected]>
> > > wrote:
> > > >> >
> > > >> > > You can't assume that the notification is received locally
> > > >> > > before another leader election finishes elsewhere (particularly
> > > >> > > if you are running slowly for some reason!), so it's not
> > > >> > > sufficient to guarantee that the process that is running locally
> > > >> > > has finished before someone else starts another.
> > > >> > >
> > > >> > > It's usually best - if possible - to restructure the system so
> > > >> > > that processes are idempotent to work around these kinds of
> > > >> > > problems, in conjunction with using the kind of primitives that
> > > >> > > Curator provides.
> > > >> > >
> > > >> > > Henry
> > > >> > >
> > > >> > > On 8 December 2012 21:04, Jordan Zimmerman <
> > > >> [email protected]
> > > >> > >wrote:
> > > >> > >
> > > >> > >> This is why you need a ConnectionStateListener. You'll get a
> > > >> > >> notice that the connection has been suspended and you should
> > > >> > >> assume all locks/leaders are invalid.
> > > >> > >>
> > > >> > >> -JZ
> > > >> > >>
> > > >> > >> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[email protected]
> >
> > > >> wrote:
> > > >> > >>
> > > >> > >>> What about a network disconnection? Presumably leadership is
> > > >> > >>> revoked when the leader appears to have failed, which can be
> > > >> > >>> for more reasons than a VM crash (VM running slow, network
> > > >> > >>> event, GC pause etc).
> > > >> > >>>
> > > >> > >>> Henry
> > > >> > >>>
> > > >> > >>> On 8 December 2012 21:00, Jordan Zimmerman <
> > > >> [email protected]
> > > >> > >>> wrote:
> > > >> > >>>
> > > >> > >>>> The leader latch lock is the equivalent of task in progress.
> > > >> > >>>> I assume the task is running in the same VM as the leader
> > > >> > >>>> lock. The only reason the VM would lose leadership is if it
> > > >> > >>>> crashes in which case the process would die anyway.
> > > >> > >>>>
> > > >> > >>>> -JZ
> > > >> > >>>>
> > > >> > >>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]
> >
> > > >> wrote:
> > > >> > >>>>
> > > >> > >>>>> If I recall correctly it was Henry Robinson that gave me
> > > >> > >>>>> the advice to have a "task in progress" check.
> > > >> > >>>>>
> > > >> > >>>>>
> > > >> > >>>>> -- Eric
> > > >> > >>>>>
> > > >> > >>>>>
> > > >> > >>>>>
> > > >> > >>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <
> > > [email protected]
> > > >> >
> > > >> > >>>> wrote:
> > > >> > >>>>>
> > > >> > >>>>>> I am using Curator LeaderLatch :)
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>> -- Eric
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>>
> > > >> > >>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <
> > > >> > >>>>>> [email protected]> wrote:
> > > >> > >>>>>>
> > > >> > >>>>>>> You might check your leader implementation. Writing a
> > > >> > >>>>>>> correct leader recipe is actually quite challenging due to
> > > >> > >>>>>>> edge cases. Have a look at Curator (disclosure: I wrote it)
> > > >> > >>>>>>> for an example.
> > > >> > >>>>>>>
> > > >> > >>>>>>> -JZ
> > > >> > >>>>>>>
> > > >> > >>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <
> > [email protected]>
> > > >> > wrote:
> > > >> > >>>>>>>
> > > >> > >>>>>>>> Actually I had the same thought and didn't consider
> > > >> > >>>>>>>> having to do this until I talked about my project at a
> > > >> > >>>>>>>> Zookeeper User Group a month or so ago and I was given this
> > > >> > >>>>>>>> advice.
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> I know that I do see leadership being lost/transferred when
> > > >> > >>>>>>>> one of the ZK servers is restarted (not the whole
> > > >> > >>>>>>>> ensemble).  And it seems like I've seen it happen even when
> > > >> > >>>>>>>> the ensemble stays totally stable (though I am not 100%
> > > >> > >>>>>>>> sure as it's been a while since I have worked on this
> > > >> > >>>>>>>> particular application).
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> -- Eric
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>
> > > >> > >>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <
> > > >> > >>>>>>>> [email protected]> wrote:
> > > >> > >>>>>>>>
> > > >> > >>>>>>>>> Why would it lose leadership? The only reason I can
> > > >> > >>>>>>>>> think of is if the ZK cluster goes down. In normal use,
> > > >> > >>>>>>>>> the ZK cluster won't go down (I assume you're running 3
> > > >> > >>>>>>>>> or 5 instances).
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> -JZ
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <
> > > [email protected]>
> > > >> > >> wrote:
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>> During the time the task is running a cluster member
> > > >> > >>>>>>>>>> could lose its leadership.
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>>>
> > > >> > >>>>>>>
> > > >> > >>>>>>>
> > > >> > >>>>>>
> > > >> > >>>>
> > > >> > >>>>
> > > >> > >>>
> > > >> > >>>
> > > >> > >>> --
> > > >> > >>> Henry Robinson
> > > >> > >>> Software Engineer
> > > >> > >>> Cloudera
> > > >> > >>> 415-994-6679
> > > >> > >>
> > > >> > >>
> > > >> > >
> > > >> > >
> > > >> > > --
> > > >> > > Henry Robinson
> > > >> > > Software Engineer
> > > >> > > Cloudera
> > > >> > > 415-994-6679
> > > >> >
> > > >> >
> > > >>
> > > >>
> > > >> --
> > > >> Henry Robinson
> > > >> Software Engineer
> > > >> Cloudera
> > > >> 415-994-6679
> > > >>
> > > >
> > > >
> > >
> >
>



-- 
Best regards,
 Vitalii Tymchyshyn
