> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
> have been elected leader and have started running another process.

There's no way around this, right? ZK is not a transactional system, so this edge case is unsolvable.
> The way around the problem is to either ensure that no work is done by
> you once you are no longer the leader

You only release leadership when your work is done. If the cluster becomes unstable, then you cancel your work. Leadership is denoted by a ZNode. Curator has a top-level watcher that notifies on cluster instability. How does the fence make this better?

-JZ

On Dec 8, 2012, at 9:30 PM, Henry Robinson <[email protected]> wrote:

> On 8 December 2012 21:18, Jordan Zimmerman <[email protected]> wrote:
>
>> If your ConnectionStateListener gets SUSPENDED or LOST, you've lost
>> connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
>> connection to manage a node that denotes the process is running or not.
>> Only 1 VM at a time will be running the process. That process can watch
>> for SUSPENDED/LOST and wind down the task.
>
> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
> have been elected leader and have started running another process.
>
> It's a classic problem - you need some mechanism to fence a node that
> thinks it's the leader, but isn't and hasn't got the memo yet. The way
> around the problem is to either ensure that no work is done by you once
> you are no longer the leader (perhaps by checking every time you want to
> do work), or that the work you do does not affect the system (e.g. by
> using idempotent work units).
>
> ZK itself solves this internally by checking that it has a quorum for
> every operation, which forces an ordering between the disconnection event
> and trying to do something that relies upon being the leader. Other
> systems forcibly terminate old leaders before allowing a new leader to
> take the throne.
>
> Henry
>
>>> You can't assume that the notification is received locally before
>>> another leader election finishes elsewhere
>>
>> Which notification? The ConnectionStateListener is an abstraction on
>> ZooKeeper's watcher mechanism.
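[Editor's note: the fencing mechanism Henry describes above is usually implemented with a monotonically increasing fencing token: each elected leader carries the epoch at which it won, and the protected resource rejects any operation stamped with a stale epoch. A minimal, self-contained sketch of that idea follows; the class and method names are illustrative, not Curator or ZooKeeper APIs.]

```java
import java.util.concurrent.atomic.AtomicLong;

// A store that rejects writes carrying a stale fencing token. In a real
// deployment this check lives in the resource being protected (a database,
// a storage service), not in the leader-election library itself.
class FencedStore {
    private final AtomicLong highestTokenSeen = new AtomicLong(-1);

    // Accept the write only if its token is >= every token seen so far.
    boolean write(long fencingToken, String data) {
        long current;
        do {
            current = highestTokenSeen.get();
            if (fencingToken < current) {
                return false; // stale leader: fenced off
            }
        } while (!highestTokenSeen.compareAndSet(current, fencingToken));
        // ... apply `data` here ...
        return true;
    }
}

public class FencingDemo {
    public static void main(String[] args) {
        FencedStore store = new FencedStore();
        // Leader A holds token 1; a new leader B is elected with token 2.
        boolean aFirst = store.write(1, "from A");  // accepted
        boolean b = store.write(2, "from B");       // accepted, raises the bar
        boolean aStale = store.write(1, "late A");  // rejected: A was fenced
        System.out.println(aFirst + " " + b + " " + aStale);
    }
}
```

This is the ordering guarantee the thread is after: even if the deposed leader never sees SUSPENDED/LOST in time, its writes cannot land after the new leader's.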
>> It's only significant for the VM that is the leader. Non-leaders don't
>> need to be concerned.
>>
>> -JZ
>>
>> On Dec 8, 2012, at 9:12 PM, Henry Robinson <[email protected]> wrote:
>>
>>> You can't assume that the notification is received locally before
>>> another leader election finishes elsewhere (particularly if you are
>>> running slowly for some reason!), so it's not sufficient to guarantee
>>> that the process that is running locally has finished before someone
>>> else starts another.
>>>
>>> It's usually best - if possible - to restructure the system so that
>>> processes are idempotent to work around these kinds of problems, in
>>> conjunction with using the kind of primitives that Curator provides.
>>>
>>> Henry
>>>
>>> On 8 December 2012 21:04, Jordan Zimmerman <[email protected]> wrote:
>>>
>>>> This is why you need a ConnectionStateListener. You'll get a notice
>>>> that the connection has been suspended, and you should assume all
>>>> locks/leaders are invalid.
>>>>
>>>> -JZ
>>>>
>>>> On Dec 8, 2012, at 9:02 PM, Henry Robinson <[email protected]> wrote:
>>>>
>>>>> What about a network disconnection? Presumably leadership is revoked
>>>>> when the leader appears to have failed, which can happen for more
>>>>> reasons than a VM crash (VM running slow, network event, GC pause,
>>>>> etc.).
>>>>>
>>>>> Henry
>>>>>
>>>>> On 8 December 2012 21:00, Jordan Zimmerman <[email protected]> wrote:
>>>>>
>>>>>> The leader latch lock is the equivalent of task in progress. I
>>>>>> assume the task is running in the same VM as the leader lock. The
>>>>>> only reason the VM would lose leadership is if it crashes, in which
>>>>>> case the process would die anyway.
>>>>>>
>>>>>> -JZ
>>>>>>
>>>>>> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]> wrote:
>>>>>>
>>>>>>> If I recall correctly, it was Henry Robinson that gave me the
>>>>>>> advice to have a "task in progress" check.
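[Editor's note: Henry's advice to restructure the system around idempotent processes means that a work unit applied twice — once by a deposed leader that hadn't gotten the memo, once by its successor — leaves the system in the same state as applying it once. A minimal sketch, using a task id and put-if-absent; all names here are illustrative.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// An idempotent work unit: each task has a stable id, and applying it a
// second time (e.g. by a leader that didn't yet know it had been deposed)
// is a harmless no-op rather than duplicated work.
class IdempotentProcessor {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Returns true only for the first application of a given task id.
    boolean process(String taskId, String result) {
        return results.putIfAbsent(taskId, result) == null;
    }

    String resultOf(String taskId) {
        return results.get(taskId);
    }
}

public class IdempotencyDemo {
    public static void main(String[] args) {
        IdempotentProcessor p = new IdempotentProcessor();
        boolean first = p.process("task-42", "done");        // real work
        boolean replay = p.process("task-42", "done-again"); // no-op replay
        System.out.println(first + " " + replay + " " + p.resultOf("task-42"));
    }
}
```

With this structure, the SUSPENDED/LOST race discussed above becomes benign: overlapping leaders may both attempt a task, but only one attempt takes effect.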
>>>>>>> -- Eric
>>>>>>>
>>>>>>> On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[email protected]> wrote:
>>>>>>>
>>>>>>>> I am using Curator LeaderLatch :)
>>>>>>>>
>>>>>>>> -- Eric
>>>>>>>>
>>>>>>>> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> You might check your leader implementation. Writing a correct
>>>>>>>>> leader recipe is actually quite challenging due to edge cases.
>>>>>>>>> Have a look at Curator (disclosure: I wrote it) for an example.
>>>>>>>>>
>>>>>>>>> -JZ
>>>>>>>>>
>>>>>>>>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Actually, I had the same thought and didn't consider having to
>>>>>>>>>> do this until I talked about my project at a Zookeeper User
>>>>>>>>>> Group a month or so ago, and I was given this advice.
>>>>>>>>>>
>>>>>>>>>> I know that I do see leadership being lost/transferred when one
>>>>>>>>>> of the ZK servers is restarted (not the whole ensemble). And it
>>>>>>>>>> seems like I've seen it happen even when the ensemble stays
>>>>>>>>>> totally stable (though I am not 100% sure, as it's been a while
>>>>>>>>>> since I have worked on this particular application).
>>>>>>>>>>
>>>>>>>>>> -- Eric
>>>>>>>>>>
>>>>>>>>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Why would it lose leadership? The only reason I can think of is
>>>>>>>>>>> if the ZK cluster goes down. In normal use, the ZK cluster
>>>>>>>>>>> won't go down (I assume you're running 3 or 5 instances).
>>>>>>>>>>> -JZ
>>>>>>>>>>>
>>>>>>>>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> During the time the task is running, a cluster member could
>>>>>>>>>>>> lose its leadership.

>>>>> --
>>>>> Henry Robinson
>>>>> Software Engineer
>>>>> Cloudera
>>>>> 415-994-6679
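[Editor's note: the pattern Jordan recommends throughout the thread — treat all locks and leadership as invalid the instant the connection is SUSPENDED or LOST, and have the running task wind down — can be sketched as below. The enum and callback mirror the general shape of Curator's ConnectionStateListener mechanism, but this is plain Java so the sketch stands alone; all names are illustrative, not actual Curator APIs.]

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A simplified stand-in for a client connection-state notification.
enum ConnState { CONNECTED, SUSPENDED, LOST, RECONNECTED }

// A leader task that stops doing work as soon as the session is in doubt,
// rather than trusting a leadership grant that may already be stale.
class LeaderTask {
    private final AtomicBoolean mayLead = new AtomicBoolean(false);

    void onStateChanged(ConnState state) {
        switch (state) {
            case CONNECTED:
            case RECONNECTED:
                mayLead.set(true);   // (a real system must also re-win election)
                break;
            case SUSPENDED:
            case LOST:
                mayLead.set(false);  // assume leadership is gone; wind down
                break;
        }
    }

    // Each unit of work re-checks leadership instead of checking once up front.
    boolean tryDoWorkUnit() {
        return mayLead.get();
    }
}

public class StateListenerDemo {
    public static void main(String[] args) {
        LeaderTask task = new LeaderTask();
        task.onStateChanged(ConnState.CONNECTED);
        boolean before = task.tryDoWorkUnit();  // leading: work proceeds
        task.onStateChanged(ConnState.SUSPENDED);
        boolean after = task.tryDoWorkUnit();   // in doubt: work refused
        System.out.println(before + " " + after);
    }
}
```

Note that, as Henry points out above, this check alone cannot close the race — the notification may arrive after a new leader has already started — which is why it is best combined with fencing or idempotent work units.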
