What about a network disconnection? Presumably leadership is revoked when the leader appears to have failed, which can be for more reasons than a VM crash (VM running slow, network event, GC pause etc).
Henry

On 8 December 2012 21:00, Jordan Zimmerman <[email protected]> wrote:

> The leader latch lock is the equivalent of task in progress. I assume the
> task is running in the same VM as the leader lock. The only reason the VM
> would lose leadership is if it crashes in which case the process would die
> anyway.
>
> -JZ
>
> On Dec 8, 2012, at 8:56 PM, Eric Pederson <[email protected]> wrote:
>
> > If I recall correctly it was Henry Robinson that gave me the advice to
> > have a "task in progress" check.
> >
> > -- Eric
> >
> > On Sat, Dec 8, 2012 at 11:54 PM, Eric Pederson <[email protected]> wrote:
> >
> >> I am using Curator LeaderLatch :)
> >>
> >> -- Eric
> >>
> >> On Sat, Dec 8, 2012 at 11:52 PM, Jordan Zimmerman
> >> <[email protected]> wrote:
> >>
> >>> You might check your leader implementation. Writing a correct leader
> >>> recipe is actually quite challenging due to edge cases. Have a look at
> >>> Curator (disclosure: I wrote it) for an example.
> >>>
> >>> -JZ
> >>>
> >>> On Dec 8, 2012, at 8:49 PM, Eric Pederson <[email protected]> wrote:
> >>>
> >>>> Actually I had the same thought and didn't consider having to do this
> >>>> until I talked about my project at a Zookeeper User Group a month or
> >>>> so ago and I was given this advice.
> >>>>
> >>>> I know that I do see leadership being lost/transferred when one of the
> >>>> ZK servers is restarted (not the whole ensemble). And it seems like
> >>>> I've seen it happen even when the ensemble stays totally stable
> >>>> (though I am not 100% sure as it's been a while since I have worked on
> >>>> this particular application).
> >>>>
> >>>> -- Eric
> >>>>
> >>>> On Sat, Dec 8, 2012 at 11:25 PM, Jordan Zimmerman
> >>>> <[email protected]> wrote:
> >>>>
> >>>>> Why would it lose leadership? The only reason I can think of is if
> >>>>> the ZK cluster goes down. In normal use, the ZK cluster won't go down
> >>>>> (I assume you're running 3 or 5 instances).
> >>>>>
> >>>>> -JZ
> >>>>>
> >>>>> On Dec 8, 2012, at 8:17 PM, Eric Pederson <[email protected]> wrote:
> >>>>>
> >>>>>> During the time the task is running a cluster member could lose its
> >>>>>> leadership.

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
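
For anyone following the thread, here is a minimal sketch of the "task in progress" style check discussed above: the task is split into steps and leadership is re-checked before each one. LeadershipGuard and its method names are hypothetical and are not part of Curator or ZooKeeper. In a real application the revoke callback would presumably be driven by a Curator ConnectionStateListener reacting to SUSPENDED or LOST; that wiring is assumed here, not shown.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical helper (not a Curator or ZooKeeper class): tracks whether
 * this process still believes it is leader and stops a multi-step task
 * as soon as that belief is revoked.
 */
public class LeadershipGuard {
    private final AtomicBoolean leader = new AtomicBoolean(false);

    /** Call when the leader recipe reports that leadership was acquired. */
    public void onLeadershipAcquired() {
        leader.set(true);
    }

    /** Call on a disconnect or session loss; assumed to be wired to a connection listener. */
    public void onConnectionSuspendedOrLost() {
        leader.set(false);
    }

    public boolean isLeader() {
        return leader.get();
    }

    /**
     * Runs the steps in order, re-checking leadership before each one.
     * Returns the number of steps that were actually run.
     */
    public int runWhileLeader(List<Runnable> steps) {
        int completed = 0;
        for (Runnable step : steps) {
            if (!leader.get()) {
                break; // leadership revoked: do not start another step
            }
            step.run();
            completed++;
        }
        return completed;
    }
}
```

Note that this only narrows the window. A step that is already running when the connection drops (or when the VM is stuck in a GC pause) can still overlap with a new leader, so steps should be kept short or made safe to run twice.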
