See here: http://curator.apache.org/errors.html

You should set a ConnectionStateListener. If you get SUSPENDED, assume that 
your lock is lost and cancel whatever activity you are doing.
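
A minimal sketch of that advice, under stated assumptions: LockGuard and its
method names are hypothetical helpers, not part of Curator. The class only
tracks whether the lock can still be trusted, so it compiles without Curator on
the classpath. The wiring to Curator is shown in the leading comment and uses
getConnectionStateListenable().addListener(...) with the ConnectionState enum.

```java
// Sketch only: LockGuard is a hypothetical helper, not Curator API.
// Possible wiring (needs curator-framework on the classpath):
//   client.getConnectionStateListenable().addListener(
//       (c, newState) -> guard.onStateChanged(newState.name()));
final class LockGuard {
    // Written by the listener callback thread, read by the worker thread.
    private volatile boolean lockTrusted = false;

    /** Call right after lock.acquire(...) returns true. */
    void onLockAcquired() {
        lockTrusted = true;
    }

    /** Call after releasing the lock or abandoning the task. */
    void onLockReleased() {
        lockTrusted = false;
    }

    /** Pass ConnectionState.name() from the listener callback. */
    void onStateChanged(String stateName) {
        if ("SUSPENDED".equals(stateName) || "LOST".equals(stateName)) {
            // Connection trouble: assume the lock is gone.
            lockTrusted = false;
        }
        // RECONNECTED deliberately does not restore trust;
        // the worker should acquire the lock again instead.
    }

    /** The task polls this between steps and stops when it is false. */
    boolean mayContinue() {
        return lockTrusted;
    }
}

class LockGuardDemo {
    public static void main(String[] args) {
        LockGuard guard = new LockGuard();
        guard.onLockAcquired();
        System.out.println(guard.mayContinue()); // true
        guard.onStateChanged("SUSPENDED");
        System.out.println(guard.mayContinue()); // false
        guard.onStateChanged("RECONNECTED");
        System.out.println(guard.mayContinue()); // false
    }
}
```

The worker would check mayContinue() between steps of the task and, once it
returns false, stop and try to acquire the lock again on the next cycle.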

-Jordan


On February 11, 2015 at 2:32:31 AM, Jipeng Tan ([email protected]) wrote:

Hi All,

I would like to ask about the best practice for handling leader re-election
when using InterProcessSemaphoreMutex.

Here is the background:

I am using Curator as a task coordinator for a list of concurrently running
jobs.
Every minute, each worker (each worker runs on a separate VM) tries to acquire
a task (lock) from ZooKeeper by calling:
    
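    // task is the ZooKeeper path used as the lock for one task
    InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(zkClient, task);
    // Wait up to one second; returns false if the lock was not obtained
    boolean hasLock = lock.acquire(1, TimeUnit.SECONDS);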

If the worker gets the lock, it performs the task.

The issue is that the ZooKeeper quorum is not stable, so a leader re-election
happens about twice a week. When a leader re-election happens, all the VMs lose
their connection to the ZooKeeper quorum and have to reconnect. Sometimes I have
observed one VM owning two locks at the same time.

So I am wondering what I should do to avoid the impact of leader
re-election.

Thanks,
Jipeng
