Hi, I'm getting some strange behaviour when stopping ZooKeeper in one environment, and I can't reproduce it locally. The result is that the leader selector "quits" even though it is set to auto-requeue. (I think that happens because the retry loop inside LeaderSelector checks the interrupt flag, which gets set again even after I have cleared it.)
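To illustrate the interrupt-flag part of that suspicion, here is a minimal sketch using only plain JDK threads (no Curator). The class and method names are made up for illustration, and the sleep merely stands in for the leader's work. It only shows that a second interrupt() arriving after the first InterruptedException has been handled leaves the flag set again; whether LeaderSelector's retry loop really reacts to that flag is still my assumption.

```java
import java.util.concurrent.CountDownLatch;

public class InterruptFlagDemo {

    // Hypothetical helper: returns the worker's interrupt flag as seen
    // after it has already handled one InterruptedException and a second
    // interrupt() has been delivered.
    static boolean flagSetAfterSecondInterrupt() throws InterruptedException {
        CountDownLatch firstHandled = new CountDownLatch(1);
        CountDownLatch secondSent = new CountDownLatch(1);
        boolean[] flagAfterSecond = new boolean[1];

        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for the leader's work
            } catch (InterruptedException e) {
                // Throwing InterruptedException clears the flag.
            }
            firstHandled.countDown();
            // Wait without an interruptible call so the flag is not consumed.
            while (secondSent.getCount() > 0) {
                Thread.yield();
            }
            flagAfterSecond[0] = Thread.currentThread().isInterrupted();
        }, "LeaderSelector-demo");
        worker.start();

        worker.interrupt();       // like the interrupt sent on LOST
        firstHandled.await();     // worker has caught and "handled" it
        worker.interrupt();       // like the interrupt sent on SUSPENDED
        secondSent.countDown();
        worker.join();            // join makes the worker's write visible
        return flagAfterSecond[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flag set again: " + flagSetAfterSecondInterrupt());
    }
}
```

If that matches what happens in my listener, then any code that checks the flag after takeLeadership() returns would see the thread as interrupted, even though the first interrupt was handled.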
I think it boils down to getting:

2013-11-04 18:22:32,501 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: LOST
2013-11-04 18:22:32,501 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
2013-11-04 18:22:32,503 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: SUSPENDED
2013-11-04 18:22:32,504 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
...

then I handle the interrupt in the leader thread. Then I get this:

2013-11-04 18:22:36,465 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: LOST
2013-11-04 18:22:36,465 INFO [main-EventThread ] c.n.c.f.state.ConnectionStateManager - State change: SUSPENDED
2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - StateChanged: LOST
2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]
2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - StateChanged: SUSPENDED
2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener - Interrupting thread Thread[LeaderSelector-0,5,main]

Full log is here: https://gist.github.com/zerd/7316258

The code follows the old leader selector example pretty well:

    @Override
    public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
        ourThread = Thread.currentThread();
        logger.debug(format("(%s) Got leadership", ourThread));
        try {
            waitForAndPerformWork();
        } catch (InterruptedException e) {
            logger.debug(format("(%s) Interrupted ", ourThread), e);
        } finally {
            logger.debug(format("(%s) No longer leader", ourThread));
        }
    }

    @Override
    public void stateChanged(CuratorFramework curatorFramework, ConnectionState newState) {
        logger.debug("StateChanged: " + newState);
        if ((newState == ConnectionState.LOST) || (newState == ConnectionState.SUSPENDED)) {
            if (ourThread != null) {
                logger.debug("Interrupting thread " + ourThread);
                ourThread.interrupt();
            } else {
                logger.debug("Thread is null");
            }
        }
    }

Is it supposed to go back and forth from lost to suspended?

My goal is to get it to resume trying to get the leadership when zookeeper comes back. Do I have to requeue it manually when this happens? Would upgrading to latest curator with CancelLeadershipException fix this?

Thank you very much for your time.

-- Henrik Nordvik
