Failure Detector Model

                 Key: ZOOKEEPER-702
             Project: Zookeeper
          Issue Type: Wish
            Reporter: Henry Robinson

Failure Detector Module
Possible Mentor
Henry Robinson (henry at apache dot org)

Requirements
Java, some distributed systems knowledge, and comfort implementing distributed 
systems protocols

ZooKeeper servers detect the failure of other servers and clients by counting 
the number of 'ticks' for which they receive no heartbeat from those machines. 
This is the 'timeout' method of failure detection and works very well; however, 
it may be too aggressive, and it is not easily tuned for some more unusual 
ZooKeeper installations (such as in a wide-area network, or even in a mobile 
ad-hoc network).
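
To make the 'timeout' method concrete, here is a minimal sketch of tick-based 
failure detection. The class and method names (TickTimeoutDetector, heartbeat, 
tick, isFailed) are hypothetical and chosen for illustration; this is not 
ZooKeeper's actual code, only one way the idea could look.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not ZooKeeper's real classes.
class TickTimeoutDetector {
    // Number of consecutive missed ticks tolerated before a peer is considered failed.
    private final int tickLimit;
    // Ticks elapsed since the last heartbeat, per monitored peer.
    private final Map<Long, Integer> missedTicks = new HashMap<>();

    TickTimeoutDetector(int tickLimit) {
        this.tickLimit = tickLimit;
    }

    // A heartbeat (re)registers the peer and resets its missed-tick counter.
    void heartbeat(long peerId) {
        missedTicks.put(peerId, 0);
    }

    // Called once per tick: every monitored peer ages by one tick.
    void tick() {
        missedTicks.replaceAll((id, missed) -> missed + 1);
    }

    // A peer is failed once it has been silent for more than tickLimit ticks.
    boolean isFailed(long peerId) {
        Integer missed = missedTicks.get(peerId);
        return missed != null && missed > tickLimit;
    }
}
```

The only tuning knob in this sketch is the single tickLimit value, which is 
why such a detector can be hard to adapt to links with highly variable latency.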

This project would abstract the notion of failure detection into a dedicated 
Java module, and implement several failure detectors to compare and contrast 
their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
phi-accrual failure detector, which is much more tunable and has some very 
interesting properties. This is a great project if you are interested in 
distributed algorithms, or want to help refactor some of ZooKeeper's internal 
code.
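
As a rough illustration of what such a module might look like, the sketch 
below defines a hypothetical FailureDetector interface and a simplified 
phi-accrual implementation behind it. All names are assumptions made for this 
example, not an existing ZooKeeper or Cassandra API. The implementation also 
assumes that heartbeat intervals are exponentially distributed, which is a 
simplification: under that model the suspicion level is 
phi = elapsed / (mean interval * ln 10).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical abstraction; not an existing ZooKeeper interface.
interface FailureDetector {
    void heartbeat(long nowMillis);
    boolean isSuspected(long nowMillis);
}

// Simplified phi-accrual detector, assuming exponentially distributed intervals.
class PhiAccrualDetector implements FailureDetector {
    private final double threshold;   // suspicion level above which the peer is suspected
    private final int windowSize;     // number of recent intervals kept for the mean
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0;
    private long lastHeartbeat = -1;  // -1 means no heartbeat seen yet

    PhiAccrualDetector(double threshold, int windowSize) {
        this.threshold = threshold;
        this.windowSize = windowSize;
    }

    @Override
    public void heartbeat(long nowMillis) {
        if (lastHeartbeat >= 0) {
            long interval = nowMillis - lastHeartbeat;
            intervals.addLast(interval);
            intervalSum += interval;
            // Keep a sliding window of the most recent intervals.
            if (intervals.size() > windowSize) {
                intervalSum -= intervals.removeFirst();
            }
        }
        lastHeartbeat = nowMillis;
    }

    // phi = -log10(P(interval > elapsed)) = elapsed / (mean * ln 10) for an exponential model.
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0; // not enough history to suspect anything
        }
        double mean = (double) intervalSum / intervals.size();
        if (mean <= 0) {
            return 0.0; // avoid dividing by zero on degenerate history
        }
        double elapsed = nowMillis - lastHeartbeat;
        return elapsed / (mean * Math.log(10.0));
    }

    @Override
    public boolean isSuspected(long nowMillis) {
        return phi(nowMillis) > threshold;
    }
}
```

Unlike a fixed tick count, the threshold here is applied to a suspicion level 
that adapts to the observed heartbeat rate, so the same setting can behave 
sensibly on both fast and slow links.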
