This is CAP. In a distributed system, each node must be able to deal with 
network partitions. If you are partitioned, you cannot know the status of the 
cluster. Therefore, your code must be able to deal with a situation where it’s 
not possible to know leaders, locks, etc.

If an action which is the leader's responsibility to execute (e.g. write to DB) 
must be performed during that time it will be lost.
If you can’t determine the leader, your application must go into a suspended 
state until the partition is repaired. There is no way around this.

-JZ


On January 27, 2015 at 12:50:43 PM, Ricardo Ferreira ([email protected]) 
wrote:

Jordan,

That is exactly the problem I'm facing: at a given moment you can be 
leader-less.

If an action which is the leader's responsibility to execute (e.g. write to DB) 
must be performed during that time it will be lost.

And from what you described, it gets worse: If a non-leader dies, leadership 
election will take place and a new leader might be elected, right?

On Tue, Jan 27, 2015 at 5:23 PM, Jordan Zimmerman <[email protected]> 
wrote:
The process of leadership loss due to network issues is this:

* The current leader's ZooKeeper client will try to send its heartbeat
* The send will fail due to network failure
* The ZK client will pass KeeperState.Disconnected and cause Curator to go into 
SUSPENDED mode. Any users of leader selectors, etc. should see this and 
consider themselves no longer leader
* Because the heartbeat is 2/3 of a session, there will be no other leader at 
this point
* After session expiration, the ZK server will delete the ephemeral node and 
cause the process of leader selection, etc.

So, as you can see, if you properly monitor the connection state you never have 
to worry about proper leadership.

-Jordan


On January 27, 2015 at 4:21:40 AM, Ricardo Ferreira ([email protected]) 
wrote:

Hello all,


I'm using Zookeeper (through Curator) for leadership election on an application 
I'm currently developing.

Some operations can only be written by one of the members of the cluster, so 
I'm using a node's leadership
status to this effect. 

For example, let's say a message arrives to all nodes but only one should do 
something about it. What I do
is to check if the node is leader (through LeaderSelector#hasLeadership()), and 
if it is then it does whatever
it needs to do with the message. If it's not, it simply ignores it.

My question is this: How can I be sure a node really is the leader? What if the 
leader node is actually down but
this hasn't been detected by Zookeeper yet? Then the non-leader nodes will 
ignore the message and it will 
never be processed.

How is this usually dealt with?


Regards

Reply via email to