The process of leadership loss due to network issues is this: * The current leader's ZooKeeper client will try to send its heartbeat * The send will fail due to network failure * The ZK client will pass KeeperState.Disconnected and cause Curator to go into SUSPENDED mode. Any users of leader selectors, etc. should see this and consider themselves no longer leader * Because the heartbeat is 2/3 of a session, there will be no other leader at this point * After session expiration, the ZK server will delete the ephemeral node and cause the process of leader selection, etc.
So, as you can see, if you properly monitor the connection state you never have to worry about proper leadership. -Jordan On January 27, 2015 at 4:21:40 AM, Ricardo Ferreira ([email protected]) wrote: Hello all, I'm using Zookeeper (through Curator) for leadership election on an application I'm currently developing. Some operations can only be written by one of the members of the cluster, so I'm using a node's leadership status to this effect. For example, let's say a message arrives to all nodes but only one should do something about it. What I do is to check if the node is leader (through LeaderSelector#hasLeadership()), and if it is then it does whatever it needs to do with the message. If it's not, it simply ignores it. My question is this: How can I be sure a node really is the leader? What if the leader node is actually down but this hasn't been detected by Zookeeper yet? Then the non-leader nodes will ignore the message and it will never be processed. How is this usually dealt with? Regards
