This sounds like a variation of
https://issues.apache.org/jira/browse/CURATOR-54. The next release of Curator
(later this week) provides a more robust way of canceling leadership that
doesn’t require thread interruption.

-Jordan

On Nov 5, 2013, at 1:47 AM, Henrik Nordvik <[email protected]> wrote:

> Hi,
> 
> I'm getting some strange behaviour when stopping ZooKeeper in one environment
> that I can't reproduce locally.
> The result is that the leader selector "quits" even though it is set to
> auto-requeue. (I think that happens because the retry loop inside
> LeaderSelector checks the interrupt flag, which gets set again even after I
> have cleared it.)
> 
> I think it boils down to getting this:
> 
> 2013-11-04 18:22:32,501 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: LOST
> 2013-11-04 18:22:32,501 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener                   - Interrupting thread Thread[LeaderSelector-0,5,main]
> 2013-11-04 18:22:32,503 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: SUSPENDED
> 2013-11-04 18:22:32,504 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener                   - Interrupting thread Thread[LeaderSelector-0,5,main]
> 
> ... then I handle the interrupt in the leader thread.
> 
> Then I get this:
> 2013-11-04 18:22:36,465 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: LOST
> 2013-11-04 18:22:36,465 INFO  [main-EventThread    ] c.n.c.f.state.ConnectionStateManager      - State change: SUSPENDED
> 2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener                   - StateChanged: LOST
> 2013-11-04 18:22:36,465 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener                   - Interrupting thread Thread[LeaderSelector-0,5,main]
> 2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener                   - StateChanged: SUSPENDED
> 2013-11-04 18:22:36,466 DEBUG [ectionStateManager-0] s.f.s.a.feed.MyListener                   - Interrupting thread Thread[LeaderSelector-0,5,main]
> 
> 
> Full log is here: https://gist.github.com/zerd/7316258
> 
> The code follows the old leader selector example pretty well:
> 
>     @Override
>     public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
>         ourThread = Thread.currentThread();
>         logger.debug(format("(%s) Got leadership", ourThread));
>         try {
>             waitForAndPerformWork();
>         } catch (InterruptedException e) {
>             logger.debug(format("(%s) Interrupted ", ourThread), e);
>         } finally {
>             logger.debug(format("(%s) No longer leader", ourThread));
>         }
>     }
> 
>     @Override
>     public void stateChanged(CuratorFramework curatorFramework, ConnectionState newState) {
>         logger.debug("StateChanged: " + newState);
> 
>         if ((newState == ConnectionState.LOST) || (newState == ConnectionState.SUSPENDED)) {
>             if (ourThread != null) {
>                 logger.debug("Interrupting thread " + ourThread);
>                 ourThread.interrupt();
>             } else {
>                 logger.debug("Thread is null");
>             }
>         }
>     }
> 
> Is it supposed to go back and forth between LOST and SUSPENDED?
> My goal is to get it to resume trying to acquire leadership when ZooKeeper
> comes back. Do I have to requeue it manually when this happens?
> Would upgrading to the latest Curator with CancelLeadershipException fix this?
> 
> Thank you very much for your time.
> 
> --
> Henrik Nordvik
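
For anyone following along, the suspicion in the quoted message (the interrupt
flag ends up set again after it has been cleared once) can be reproduced with
plain JDK threads. This is only a minimal sketch and does not use Curator at
all: the class name InterruptFlagDemo and the method
flagSetAfterSecondInterrupt are made up for illustration, and the two
interrupt() calls merely stand in for the LOST and SUSPENDED notifications
seen in the log.

```java
import java.util.concurrent.CountDownLatch;

class InterruptFlagDemo {

    // Hypothetical helper: returns true if the worker's interrupt flag is set
    // after a second interrupt() arrives once the first has been handled.
    static boolean flagSetAfterSecondInterrupt() {
        CountDownLatch firstHandled = new CountDownLatch(1);
        CountDownLatch secondSent = new CountDownLatch(1);
        boolean[] flagSeen = new boolean[1];

        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for waitForAndPerformWork()
            } catch (InterruptedException e) {
                // Throwing InterruptedException cleared the interrupt flag.
            }
            firstHandled.countDown();
            // Poll without a blocking call, so the second interrupt is not
            // consumed by another InterruptedException.
            while (secondSent.getCount() > 0) {
                Thread.yield();
            }
            flagSeen[0] = Thread.currentThread().isInterrupted();
        }, "LeaderSelector-demo");

        worker.start();
        worker.interrupt(); // first notification (stand-in for LOST)
        try {
            firstHandled.await();
            worker.interrupt(); // second notification (stand-in for SUSPENDED)
            secondSent.countDown();
            worker.join(); // join() makes flagSeen visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return flagSeen[0];
    }

    public static void main(String[] args) {
        System.out.println("flag set again: " + flagSetAfterSecondInterrupt());
    }
}
```

If LeaderSelector behaves the way the quoted message guesses, this would
explain why handling the first interrupt is not enough: the second state
change interrupts the thread again after takeLeadership has already caught
the first InterruptedException, so any later check of the flag still sees it
set.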
