Hi,

We have a ZooKeeper ensemble that spans multiple data centers (each
participant is in a different data center). Recently, we ran into an issue when
trying to support a low session timeout (5 seconds). We set tickTime to 2
seconds and syncLimit to 25.
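
To put those settings in numbers, here is a small Python sketch. It assumes
minSessionTimeout and maxSessionTimeout are left at their documented defaults
(2 * tickTime and 20 * tickTime). It also treats syncLimit * tickTime as roughly
how long the leader tolerates a silent follower before dropping it. That is a
liveness timeout, not a bound on how stale a follower's data tree may be.

```python
TICK_TIME_MS = 2000          # tickTime from our setup
SYNC_LIMIT_TICKS = 25        # syncLimit from our setup
REQUESTED_TIMEOUT_MS = 5000  # session timeout the client asks for


def negotiated_session_timeout(requested_ms, tick_ms, min_ms=None, max_ms=None):
    """Clamp the requested timeout the way the server does, assuming the
    default minSessionTimeout / maxSessionTimeout unless overridden."""
    min_ms = 2 * tick_ms if min_ms is None else min_ms
    max_ms = 20 * tick_ms if max_ms is None else max_ms
    return max(min_ms, min(requested_ms, max_ms))


session_ms = negotiated_session_timeout(REQUESTED_TIMEOUT_MS, TICK_TIME_MS)
# Approximate window the leader allows a follower to stay unresponsive.
follower_silence_allowed_ms = SYNC_LIMIT_TICKS * TICK_TIME_MS

print(session_ms, follower_silence_allowed_ms)  # prints: 5000 50000
```

So the requested 5 s timeout is accepted as-is, while a follower can go up to
50 s, ten times longer, without the leader giving up on it.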

The use case is a single master: we can only have one master at any given
time. The active master creates an ephemeral node. The backup master watches
for this ephemeral node to be deleted before it takes over the master role.
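
For reference, here is a toy single-replica model of that pattern in Python.
ToyReplica and all of its methods are hypothetical stand-ins, not the ZooKeeper
client API. The only ZooKeeper behaviors it tries to mirror are that closing a
session deletes that session's ephemeral nodes and that watches fire once.

```python
from collections import defaultdict


class ToyReplica:
    """Hypothetical single-replica stand-in; NOT the ZooKeeper client API."""

    def __init__(self):
        self.nodes = {}                          # path -> owning session id
        self.delete_watches = defaultdict(list)  # path -> one-shot callbacks
        self.expiry_callbacks = {}               # session id -> callback

    def open_session(self, session_id, on_expired):
        self.expiry_callbacks[session_id] = on_expired

    def create_ephemeral(self, path, session_id):
        """Return True if the node was created, False if it already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = session_id
        return True

    def watch_delete(self, path, callback):
        self.delete_watches[path].append(callback)

    def close_session(self, session_id):
        # With a single replica, notifying the owner and deleting its
        # ephemeral nodes happen in one step, so the roles never overlap.
        self.expiry_callbacks.pop(session_id)()
        owned = [p for p, owner in self.nodes.items() if owner == session_id]
        for path in owned:
            del self.nodes[path]
            # Watches are one-shot: pop them so each fires only once.
            for callback in self.delete_watches.pop(path, []):
                callback(path)


replica = ToyReplica()
roles = {}
replica.open_session(1, on_expired=lambda: roles.update({1: "expired"}))
replica.open_session(2, on_expired=lambda: roles.update({2: "expired"}))

roles[1] = "active" if replica.create_ephemeral("/master", 1) else "backup"
roles[2] = "active" if replica.create_ephemeral("/master", 2) else "backup"


def take_over(path):
    # Backup retries the create once it sees the old master's node go away.
    if replica.create_ephemeral(path, 2):
        roles[2] = "active"


replica.watch_delete("/master", take_over)
replica.close_session(1)  # e.g. the leader expires session 1
print(roles)  # {1: 'expired', 2: 'active'}
```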

The active master is connected to the follower (F1) in its data center. We
believe that a network delay between F1 and the leader caused the touchTable
not to propagate in a timely manner, so the leader decided to close the session
due to timeout. The ephemeral node delete event reached the other follower (F2)
before the close session event reached F1. The backup master, which is
connected to F2, got the ephemeral node delete event and assumed the role of
active master.
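
The race can be reproduced with a toy model in which every follower applies
the same committed log, but each one lags the leader by its own fixed delay.
Everything here is an illustrative assumption: the lag is uniform per follower,
the closeSession transaction is modeled as the delete of /master, and the old
master is assumed to keep acting as master until F1 has applied that
transaction. The 14 s lag for F1 mirrors the gap in our log; the other times
are made up.

```python
DELETE_COMMIT_MS = 20_000  # illustrative: when the leader commits closeSession
COMMITS = [
    (0, "create", "/master"),                 # active master's ephemeral node
    (DELETE_COMMIT_MS, "delete", "/master"),  # closeSession removes it
]


def node_exists(path, now_ms, lag_ms):
    """Replay the commits a follower lagging by lag_ms has applied by now_ms."""
    exists = False
    for commit_ms, op, p in COMMITS:
        if p == path and commit_ms + lag_ms <= now_ms:
            exists = op == "create"
    return exists


def dual_master_window(f1_lag_ms, f2_lag_ms, start_ms, end_ms, step_ms=100):
    """Return (start, end) of the span where the old master (served by F1) has
    not yet seen its session close while the backup (served by F2) already sees
    /master deleted, or None if there is no such span."""
    hits = [t for t in range(start_ms, end_ms, step_ms)
            if node_exists("/master", t, f1_lag_ms)
            and not node_exists("/master", t, f2_lag_ms)]
    return (hits[0], hits[-1] + step_ms) if hits else None


# Start sampling after both followers have applied the create.
print(dual_master_window(f1_lag_ms=14_000, f2_lag_ms=0,
                         start_ms=15_000, end_ms=40_000))  # (20000, 34000)
```

With equal lags the window is empty (None); any positive difference between
F1's and F2's lag turns directly into a period with two masters.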

From our log, the active master saw the session expired event 14 seconds after
the backup master received the ephemeral node delete event.

I tried to look at the code, and from my current understanding, we don't have
logic that enforces an upper bound on how far a particular follower can lag
behind (in terms of data tree processing). This means some parts of the system
may see the lock as released before the previous owner has released it.

Another issue I saw in this case is that the client maintains an internal
clock of when its session should expire, based on its connectivity with the
follower. However, the leader's internal clock (the session tracker) uses
information that is relayed from the follower via the touchTable. As a result,
the two parties may decide differently when the session has expired if there
are network issues between the follower and the leader.
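
The two-clock problem can be sketched the same way. This toy assumes each
client heartbeat produces a touch that F1 relays to the leader separately (in
practice touches are batched in the touchTable). It also ignores the
tick-granularity expiry buckets of the real session tracker. The stall window
and the 50 ms relay delay are invented for illustration.

```python
SESSION_TIMEOUT_MS = 5000


def leader_expiry_time(arrivals_ms, timeout_ms, horizon_ms):
    """When the leader's tracker would expire the session, given the times at
    which F1's relayed touches reach the leader (None if still alive at
    horizon_ms). Ignores the real tracker's tick-sized expiry buckets."""
    arrivals = sorted(arrivals_ms)
    deadline = arrivals[0] + timeout_ms
    for t in arrivals[1:]:
        if t > deadline:
            return deadline
        deadline = t + timeout_ms
    return deadline if deadline <= horizon_ms else None


# The client heartbeats to F1 every second, and F1 answers every one of them.
heartbeats = list(range(0, 20_001, 1000))
# F1 -> leader relay normally takes 50 ms, but the link stalls between 6 s and
# 13 s, so touches recorded in that window only reach the leader at 13 s.
arrivals = [13_000 if 6_000 <= t < 13_000 else t + 50 for t in heartbeats]

client_max_gap = max(b - a for a, b in zip(heartbeats, heartbeats[1:]))
print(client_max_gap)  # 1000: the client never goes quiet for 5 s
print(leader_expiry_time(arrivals, SESSION_TIMEOUT_MS, horizon_ms=20_000))  # 10050
```

Every heartbeat the client sends is answered, so from its side the session
looks healthy, yet the leader's tracker expires it at about 10 s.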

Our internal ZooKeeper is based on 3.4.3.

--
Thawan Kooburat
