Also sometimes the app leadership (via LeaderLatch) will get lost - I will follow up about this on the Curator list: https://gist.github.com/4247226
So back to my previous question, what is the best way to implement the "fence"?

-- Eric

On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[email protected]> wrote:

> The irony is that I am using leader election to convert non-idempotent
> operations into idempotent operations :) For example, once a night a
> report is emailed out to a set of addresses. We don't want the report to
> go to the same person more than once.
>
> Prior to using leader election one of the cluster members was designated
> as the scheduled task "leader" using a system property. But if that
> cluster member crashed it required a manual operation to failover the
> "leader" responsibility to another cluster member. I considered using
> app-specific techniques to make the scheduled tasks idempotent (for
> example using "select for update" / database locking) but I wanted a
> general solution and I needed clustering support for other reasons
> (cluster membership, etc).
>
> Anyway, here is the code that I'm using.
>
> Application startup (using Curator LeaderLatch):
> https://gist.github.com/3936162
> https://gist.github.com/3935895
> https://gist.github.com/3935889
>
> ClusterStatus:
> https://gist.github.com/3943149
> https://gist.github.com/3935861
>
> Scheduled task:
> https://gist.github.com/4246388
>
> In the last gist the "distribute" scheduled task is run every 30 seconds.
> It checks clusterStatus.isLeader to see if the current cluster member is
> the leader before running the real method (which sends email).
> clusterStatus() calls methods on LeaderLatch.
>
> Here is the output that I am seeing if I kill the ZK quorum leader and
> the app cluster member that was the leader loses its LeaderLatch
> leadership to another cluster member:
> https://gist.github.com/4247058
>
> -- Eric
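Here is a simplified sketch of the pattern described above: a scheduled "distribute" method that only does real work while this cluster member holds the LeaderLatch. The real code is in the gists; the class and method names here (ClusterStatus, ReportDistributor) are illustrative stand-ins, not the actual implementation.

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.leader.LeaderLatch;

    // Illustrative wrapper around LeaderLatch (the real ClusterStatus is in the gists above).
    class ClusterStatus {
        private final LeaderLatch leaderLatch;

        ClusterStatus(CuratorFramework client, String latchPath) throws Exception {
            this.leaderLatch = new LeaderLatch(client, latchPath);
            this.leaderLatch.start();
        }

        boolean isLeader() {
            return leaderLatch.hasLeadership();
        }
    }

    // Illustrative scheduled task: the scheduler invokes distribute() every 30 seconds
    // on every cluster member, but only the current leader sends the report.
    class ReportDistributor {
        private final ClusterStatus clusterStatus;

        ReportDistributor(ClusterStatus clusterStatus) {
            this.clusterStatus = clusterStatus;
        }

        void distribute() {
            if (!clusterStatus.isLeader()) {
                return; // not the leader: skip this run
            }
            sendReportEmails();
        }

        private void sendReportEmails() { /* send the nightly report */ }
    }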
> On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <[email protected]> wrote:
>
>> On 8 December 2012 21:18, Jordan Zimmerman <[email protected]> wrote:
>>
>> > If your ConnectionStateListener gets SUSPENDED or LOST you've lost
>> > connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
>> > connection to manage a node that denotes the process is running or not.
>> > Only 1 VM at a time will be running the process. That process can watch
>> > for SUSPENDED/LOST and wind down the task.
>>
>> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
>> have been elected leader and have started running another process.
>>
>> It's a classic problem - you need some mechanism to fence a node that
>> thinks it's the leader, but isn't and hasn't got the memo yet. The way
>> around the problem is to either ensure that no work is done by you once
>> you are no longer the leader (perhaps by checking every time you want to
>> do work), or that the work you do does not affect the system (e.g. by
>> idempotent work units).
>>
>> ZK itself solves this internally by checking that it has a quorum for
>> every operation, which forces an ordering between the disconnection event
>> and trying to do something that relies upon being the leader. Other
>> systems forcibly terminate old leaders before allowing a new leader to
>> take the throne.
>>
>> Henry
>>
>> > > You can't assume that the notification is received locally before
>> > > another leader election finishes elsewhere
>> > Which notification? The ConnectionStateListener is an abstraction on
>> > ZooKeeper's watcher mechanism. It's only significant for the VM that is
>> > the leader. Non-leaders don't need to be concerned.
>> >
>> > -JZ
>> >
>> > On Dec 8, 2012, at 9:12 PM, Henry Robinson <[email protected]> wrote:
>> >
>> > > You can't assume that the notification is received locally before
>> > > another leader election finishes elsewhere (particularly if you are
>> > > running slowly for some reason!), so it's not sufficient to guarantee
>> > > that the process that is running locally has finished before someone
>> > > else starts another.
>> > >
>> > > It's usually best - if possible - to restructure the system so that
>> > > processes are idempotent to work around these kinds of problems, in
>> > > conjunction with using the kind of primitives that Curator provides.
>> > >
>> > > Henry
>> > >
>> > > On 8 December 2012 21:04, Jordan Zimmerman <[email protected]> wrote:
>> > >
>> > >> This is why you need a ConnectionStateListener. You'll get a notice
>> > >> that the connection has been suspended and you should assume all
>> > >> locks/leaders are invalid.
>> > >>
>> > >> -JZ
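A minimal sketch of the ConnectionStateListener approach Jordan describes: on SUSPENDED or LOST, assume all locks and leadership are invalid and let the running task wind down. The ConnectionStateListener and ConnectionState types are Curator's; the LeadershipGuard class and the flag convention are assumptions made for illustration only.

    import java.util.concurrent.atomic.AtomicBoolean;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.state.ConnectionState;
    import org.apache.curator.framework.state.ConnectionStateListener;

    // Illustrative guard: cleared as soon as the ZooKeeper connection is SUSPENDED or LOST,
    // so the running task can notice and wind down between units of work.
    class LeadershipGuard implements ConnectionStateListener {
        private final AtomicBoolean connectionHealthy = new AtomicBoolean(false);

        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState) {
            switch (newState) {
                case SUSPENDED:
                case LOST:
                    // Assume all locks/leadership are invalid from this point on.
                    connectionHealthy.set(false);
                    break;
                case CONNECTED:
                case RECONNECTED:
                    // The connection is back, but leadership still has to be
                    // re-confirmed via the LeaderLatch before doing any work.
                    connectionHealthy.set(true);
                    break;
                default:
                    break;
            }
        }

        boolean isSafeToAct() {
            return connectionHealthy.get();
        }
    }

    // Registration, assuming 'client' is the CuratorFramework instance:
    //   client.getConnectionStateListenable().addListener(new LeadershipGuard());

The scheduled task would then consult both this guard and LeaderLatch.hasLeadership() before each unit of work.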
>> > >> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[email protected]> wrote:
>> > >>
>> > >>> What about a network disconnection? Presumably leadership is revoked
>> > >>> when the leader appears to have failed, which can be for more reasons
>> > >>> than a VM crash (VM running slow, network event, GC pause etc).
>> > >>>
>> > >>> Henry
>> > >>>
>> > >>> On 8 December 2012 21:00, Jordan Zimmerman <[email protected]> wrote:
>> > >>>
>> > >>>> The leader latch lock is the equivalent of task in progress. I assume
>> > >>>> the task is running in the same VM as the leader lock. The only reason
>> > >>>> the VM would lose leadership is if it crashes in which case the
>> > >>>> process would die anyway.
>> > >>>>
>> > >>>> -JZ
>> > >>>>
>> > >>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>
>> > >>>>> If I recall correctly it was Henry Robinson that gave me the advice
>> > >>>>> to have a "task in progress" check.
>> > >>>>>
>> > >>>>> -- Eric
>> > >>>>>
>> > >>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>>
>> > >>>>>> I am using Curator LeaderLatch :)
>> > >>>>>>
>> > >>>>>> -- Eric
>> > >>>>>>
>> > >>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <[email protected]> wrote:
>> > >>>>>>
>> > >>>>>>> You might check your leader implementation. Writing a correct
>> > >>>>>>> leader recipe is actually quite challenging due to edge cases.
>> > >>>>>>> Have a look at Curator (disclosure: I wrote it) for an example.
>> > >>>>>>>
>> > >>>>>>> -JZ
>> > >>>>>>>
>> > >>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>>>>
>> > >>>>>>>> Actually I had the same thought and didn't consider having to do
>> > >>>>>>>> this until I talked about my project at a Zookeeper User Group a
>> > >>>>>>>> month or so ago and I was given this advice.
>> > >>>>>>>>
>> > >>>>>>>> I know that I do see leadership being lost/transferred when one
>> > >>>>>>>> of the ZK servers is restarted (not the whole ensemble). And it
>> > >>>>>>>> seems like I've seen it happen even when the ensemble stays
>> > >>>>>>>> totally stable (though I am not 100% sure as it's been a while
>> > >>>>>>>> since I have worked on this particular application).
>> > >>>>>>>>
>> > >>>>>>>> -- Eric
>> > >>>>>>>>
>> > >>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <[email protected]> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> Why would it lose leadership? The only reason I can think of is
>> > >>>>>>>>> if the ZK cluster goes down. In normal use, the ZK cluster won't
>> > >>>>>>>>> go down (I assume you're running 3 or 5 instances).
>> > >>>>>>>>>
>> > >>>>>>>>> -JZ
>> > >>>>>>>>>
>> > >>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[email protected]> wrote:
>> > >>>>>>>>>
>> > >>>>>>>>>> During the time the task is running a cluster member could lose
>> > >>>>>>>>>> its leadership.
>> > >>>
>> > >>> --
>> > >>> Henry Robinson
>> > >>> Software Engineer
>> > >>> Cloudera
>> > >>> 415-994-6679
>> > >
>> > > --
>> > > Henry Robinson
>> > > Software Engineer
>> > > Cloudera
>> > > 415-994-6679
>>
>> --
>> Henry Robinson
>> Software Engineer
>> Cloudera
>> 415-994-6679
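Coming back to the original question about the "fence", here is a rough sketch of Henry's first suggestion (check every time you want to do work): re-verify leadership and connection health before each unit of work, here each recipient, and stop as soon as either is lost. This only narrows the window rather than providing a hard fence, which is why the complementary suggestion is to make the work itself idempotent, e.g. recording that a given report/recipient pair was already sent. The ClusterStatus and LeadershipGuard classes are the illustrative ones sketched earlier, not real library types.

    import java.util.List;

    // Illustrative only: uses the ClusterStatus and LeadershipGuard sketched earlier.
    class FencedDistributor {
        private final ClusterStatus clusterStatus;
        private final LeadershipGuard guard;

        FencedDistributor(ClusterStatus clusterStatus, LeadershipGuard guard) {
            this.clusterStatus = clusterStatus;
            this.guard = guard;
        }

        // Re-check leadership before every unit of work (here, every recipient)
        // so a member that has lost leadership stops as soon as it finds out.
        void distribute(List<String> recipients) {
            for (String recipient : recipients) {
                if (!guard.isSafeToAct() || !clusterStatus.isLeader()) {
                    return; // leadership can no longer be confirmed: stop doing work
                }
                sendReportEmail(recipient); // ideally idempotent, e.g. keyed by (report, recipient)
            }
        }

        private void sendReportEmail(String recipient) { /* send one email */ }
    }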
