Yes. It is related to session expiration. The key trade-off is how close to real time your client heartbeats can be versus how many false-positive failure detections you can tolerate. If your client can deliver heartbeats very reliably, then you can detect failure reliably and very quickly. In a normal Linux environment, however, it is very hard to avoid false positives with an expiration time of less than about 2 seconds. Remember that it isn't just your application that has to be precise here; so do the ZK server your client is connected to and the ZK leader. If an unfortunate combination of short delays lines up for whatever reason, you can blow out a short deadline pretty easily.
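To make the "unfortunate combination of short delays" point concrete, here is a minimal Python sketch, not taken from ZooKeeper itself. It treats the end-to-end heartbeat latency as the sum of independent delays at the client, the ZK server it talks to, and the leader, and counts how often a heartbeat would miss a given expiration deadline. The delay distribution, the pause probability, the one-third heartbeat interval, and all function names are hypothetical assumptions chosen for illustration, not measurements.

```python
import random

# Hypothetical model: each hop (client, the ZK server it is connected to,
# the ZK leader) adds a small random delay plus an occasional long pause
# (e.g. a GC pause or a scheduling hiccup). All numbers are illustrative
# assumptions, not measurements of a real deployment.
STAGES = ("client", "zk_server", "zk_leader")


def stage_delay(rng: random.Random, pause_prob: float = 0.01) -> float:
    """Return one stage's delay in seconds."""
    delay = rng.expovariate(1 / 0.005)  # typical delay, mean 5 ms
    if rng.random() < pause_prob:
        delay += rng.uniform(0.2, 1.5)  # rare long pause
    return delay


def sample_latencies(n: int, seed: int = 42) -> list[float]:
    """End-to-end heartbeat latency = sum of the per-stage delays."""
    rng = random.Random(seed)
    return [sum(stage_delay(rng) for _ in STAGES) for _ in range(n)]


def false_positive_rate(latencies: list[float], timeout: float,
                        heartbeat_interval: float) -> float:
    """Fraction of heartbeats that would miss the deadline.

    Simplification: a heartbeat counts as late when the interval since the
    previous heartbeat plus its own end-to-end latency exceeds the timeout.
    """
    late = sum(1 for lat in latencies if heartbeat_interval + lat > timeout)
    return late / len(latencies)


if __name__ == "__main__":
    lats = sample_latencies(100_000)
    for timeout in (0.5, 1.0, 2.0, 4.0):
        # Assumption: heartbeats are sent at one third of the timeout.
        rate = false_positive_rate(lats, timeout, timeout / 3)
        print(f"timeout={timeout:.1f}s  false-positive rate={rate:.5f}")
```

Because every delay sample is reused for each timeout, the false-positive rate can only fall as the timeout grows; with rare pauses on the order of a second, deadlines well under 2 seconds are where they start to bite.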
I do not know what lower bound ZooKeeper places on the session expiration time. In general, if you need sub-second fail-over, you probably need an active-active architecture. For instance, if you are recording streamed data, the master can write the data normally while the non-master buffers several seconds of data. On failure, the non-master can become master and write as much of the buffered data as necessary to complete the record.

If you need real-time response from a master, you generally have to go with something like duplicate masters. In such a system, each request goes to multiple machines and they all reply as quickly as possible. Duplicate replies are dropped on the floor. On failure, you run with the survivor while you recruit a new duplicate master.

On Tue, Jul 19, 2011 at 9:30 AM, zookenthu <[email protected]> wrote:

> Is it related to session timeout? if so what is the recommended minimum for
> session timeout?
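The duplicate-master approach from the reply above can be sketched on the client side as "send the request to every master, keep the first good answer, drop the rest". This is a minimal asyncio sketch under stated assumptions: `replica`, `first_reply`, and the master names are hypothetical, the network call is simulated with `asyncio.sleep`, and ZooKeeper itself is not involved.

```python
import asyncio
from typing import Awaitable, Iterable


async def replica(name: str, delay: float, fail: bool = False) -> str:
    """Hypothetical master: answers after `delay` seconds, or fails."""
    await asyncio.sleep(delay)
    if fail:
        raise ConnectionError(f"{name} is down")
    return f"reply from {name}"


async def first_reply(calls: Iterable[Awaitable[str]]) -> str:
    """Send the same request to every master; return the first success.

    Failed masters are skipped, so the client keeps running on the
    survivor. Later (duplicate) replies are simply discarded.
    """
    tasks = [asyncio.ensure_future(c) for c in calls]
    try:
        for fut in asyncio.as_completed(tasks):
            try:
                return await fut
            except ConnectionError:
                continue  # that master failed; wait for the next one
        raise RuntimeError("all masters failed")
    finally:
        for t in tasks:
            t.cancel()  # no-op for tasks that already finished
        # Drain every task so late duplicates and errors are dropped cleanly.
        await asyncio.gather(*tasks, return_exceptions=True)


async def main() -> None:
    # master-a fails; the duplicate master-b answers and the client continues.
    print(await first_reply([replica("master-a", 0.01, fail=True),
                             replica("master-b", 0.02)]))


if __name__ == "__main__":
    asyncio.run(main())
```

Taking the first successful answer is what hides a failure from the caller: as long as one duplicate survives, the request completes with no detection delay at all, which is the point of paying for two masters.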
