This sounds like a variation of https://issues.apache.org/jira/browse/CURATOR-54. The next release of Curator (later this week) provides a more robust way of canceling leadership that doesn’t require thread interruption.
-Jordan

On Nov 5, 2013, at 1:47 AM, Henrik Nordvik <[email protected]> wrote:

> Hi,
>
> I'm getting some strange behaviour when stopping zookeeper in one environment
> that I can't reproduce locally.
> The result is that the leader selector "quits" even though it is set as
> auto-requeue. (I think that happens because the retry loop inside
> LeaderSelector checks the interrupt-flag, which is set again even when I
> cleared it).
>
> I think it boils down to getting
>
> 2013-11-04 18:22:32,501 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: LOST
> 2013-11-04 18:22:32,501 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
> 2013-11-04 18:22:32,503 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: SUSPENDED
> 2013-11-04 18:22:32,504 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
>
> ... then I handle the interrupt in the leader thread.
>
> Then I get this:
> 2013-11-04 18:22:36,465 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: LOST
> 2013-11-04 18:22:36,465 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: SUSPENDED
> 2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - StateChanged: LOST
> 2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
> 2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - StateChanged: SUSPENDED
> 2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
>
>
> Full log is here: https://gist.github.com/zerd/7316258
>
> The code follows the old leader selector example pretty well:
>
> @Override
> public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
>     ourThread = Thread.currentThread();
>     logger.debug(format("(%s) Got leadership", ourThread));
>     try {
>         waitForAndPerformWork();
>     } catch (InterruptedException e) {
>         logger.debug(format("(%s) Interrupted ", ourThread), e);
>     } finally {
>         logger.debug(format("(%s) No longer leader", ourThread));
>     }
> }
>
> @Override
> public void stateChanged(CuratorFramework curatorFramework, ConnectionState newState) {
>     logger.debug("StateChanged: " + newState);
>
>     if ((newState == ConnectionState.LOST) || (newState == ConnectionState.SUSPENDED)) {
>         if (ourThread != null) {
>             logger.debug("Interrupting thread " + ourThread);
>             ourThread.interrupt();
>         } else {
>             logger.debug("Thread is null");
>         }
>     }
> }
>
> Is it supposed to go back and forth from lost to suspended?
> My goal is to get it to resume trying to get the leadership when zookeeper
> comes back. Do I have to requeue it manually when this happens?
> Would upgrading to latest curator with CancelLeadershipException fix this?
>
> Thank you very much for your time.
>
> --
> Henrik Nordvik
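
The interrupt-flag hypothesis in the quoted message (a second interrupt re-setting the flag after the first one was already handled) can be illustrated without Curator or ZooKeeper. The following is only a minimal sketch in plain Java: the class name, method name, latches, and thread name are all hypothetical, and it does not reproduce LeaderSelector's actual internals. It shows only that a thread which catches one InterruptedException can still end up with its interrupt flag set if a second interrupt arrives afterwards, as the two back-to-back "Interrupting thread" log lines suggest.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-alone demo; none of these names come from Curator.
class DoubleInterruptDemo {

    /** Returns true if the worker's interrupt flag is set again after it handled the first interrupt. */
    static boolean flagSetAfterSecondInterrupt() throws InterruptedException {
        CountDownLatch firstHandled = new CountDownLatch(1); // worker has caught the first interrupt
        CountDownLatch secondSent = new CountDownLatch(1);   // caller has sent the second interrupt
        AtomicBoolean flagStillSet = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for waitForAndPerformWork()
            } catch (InterruptedException e) {
                // Throwing InterruptedException clears the interrupt flag: first interrupt handled.
            }
            firstHandled.countDown();
            // Spin instead of blocking, so the second interrupt is not consumed by another exception.
            while (secondSent.getCount() > 0) {
                Thread.yield();
            }
            // Any loop guarded by isInterrupted() would observe the flag here and exit.
            flagStillSet.set(Thread.currentThread().isInterrupted());
        }, "LeaderSelector-demo");

        worker.start();
        worker.interrupt();     // plays the role of the listener reacting to LOST
        firstHandled.await();
        worker.interrupt();     // plays the role of the listener reacting to SUSPENDED
        secondSent.countDown();
        worker.join();
        return flagStillSet.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flag set after handling first interrupt: "
                + flagSetAfterSecondInterrupt());
    }
}
```

Under these assumptions the method should report true: the first interrupt is consumed by the catch block, while the second leaves the flag set for whatever code runs next on that thread. Whether this is what actually happens inside LeaderSelector in the reported environment is not confirmed by the thread.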
