[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909119#action_12909119 ]
Flavio Junqueira commented on ZOOKEEPER-702:
--------------------------------------------

Hi Abmar,

I believe my comments from Aug 18 have not been addressed. In particular, I still see the same javadoc issues and no test exercising the failure detectors with a running ensemble. I believe the default failure detector is naturally exercised in various tests, but it would be good to also have tests that exercise the other failure detectors in a running ensemble. Don't you agree?

There are still several references to "heartbeat" in the javadocs and in the documentation. Should we replace them with "ping" for consistency?

I'm also seeing some tests fail, like ObserverTest and NioNettySuiteHammerTest, but I'm not sure whether this is related to this patch. I will explore a little further.

> GSoC 2010: Failure Detector Model
> ---------------------------------
>
>                 Key: ZOOKEEPER-702
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
>             Project: Zookeeper
>          Issue Type: Wish
>            Reporter: Henry Robinson
>            Assignee: Abmar Barros
>         Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt,
> chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt,
> ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch,
> ZOOKEEPER-702.patch, ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by
> counting the number of 'ticks' for which they don't get a heartbeat from
> other machines.
> This is the 'timeout' method of failure detection and works very well;
> however, it may be too aggressive and not easily tuned for some more
> unusual ZooKeeper installations (such as in a wide-area network, or even
> in a mobile ad-hoc network).
> This project would abstract the notion of failure detection into a dedicated
> Java module, and implement several failure detectors to compare and contrast
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf), which
> is much more tunable and has some very interesting properties. This is a
> great project if you are interested in distributed algorithms, or want to
> help refactor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
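The description contrasts the tick-counting 'timeout' detector with the phi-accrual detector. The Java below is a minimal sketch of that contrast, assuming a hypothetical `FailureDetector` interface with `pingReceived`/`isSuspected` methods; these names and the parameters `tickMillis`, `maxTicks`, `windowSize` and `threshold` are illustrative and are not taken from the ZOOKEEPER-702 patches. The phi-accrual class also assumes exponentially distributed ping inter-arrival times, a simplification of the normal-distribution model in the paper linked above, so that phi reduces to elapsed / (mean * ln 10).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical pluggable failure-detector interface (illustrative only,
 * not the API from the ZOOKEEPER-702 patches). Times are in milliseconds.
 */
interface FailureDetector {
    /** Records that a ping from the monitored peer arrived at nowMillis. */
    void pingReceived(long nowMillis);

    /** Returns true if the peer should be suspected as failed at nowMillis. */
    boolean isSuspected(long nowMillis);
}

/** Timeout detector: suspect the peer after maxTicks whole ticks without a ping. */
final class TimeoutFailureDetector implements FailureDetector {
    private final long tickMillis; // hypothetical tick length
    private final int maxTicks;    // hypothetical number of silent ticks tolerated
    private long lastPingMillis;

    TimeoutFailureDetector(long tickMillis, int maxTicks, long startMillis) {
        this.tickMillis = tickMillis;
        this.maxTicks = maxTicks;
        this.lastPingMillis = startMillis; // treat start-up as an implicit ping
    }

    @Override
    public void pingReceived(long nowMillis) {
        lastPingMillis = nowMillis;
    }

    @Override
    public boolean isSuspected(long nowMillis) {
        // Integer division counts only whole ticks that elapsed since the last ping.
        long missedTicks = (nowMillis - lastPingMillis) / tickMillis;
        return missedTicks >= maxTicks;
    }
}

/**
 * Phi-accrual detector, ASSUMING exponentially distributed inter-arrival
 * times: P_later(t) = exp(-t / mean), so phi = -log10(P_later(t)) = t / (mean * ln 10).
 */
final class PhiAccrualFailureDetector implements FailureDetector {
    private final int windowSize;    // hypothetical sliding-window length
    private final double threshold;  // hypothetical suspicion threshold on phi
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long intervalSum = 0;
    private long lastPingMillis = -1; // -1 means no ping seen yet

    PhiAccrualFailureDetector(int windowSize, double threshold) {
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    @Override
    public void pingReceived(long nowMillis) {
        if (lastPingMillis >= 0) {
            long interval = nowMillis - lastPingMillis;
            intervals.addLast(interval);
            intervalSum += interval;
            if (intervals.size() > windowSize) {
                intervalSum -= intervals.removeFirst(); // slide the window
            }
        }
        lastPingMillis = nowMillis;
    }

    /** Suspicion level; 0 until at least one inter-arrival interval is known. */
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0;
        }
        // Clamp the mean to 1 ms so simultaneous pings cannot cause division by zero.
        double mean = Math.max(1.0, (double) intervalSum / intervals.size());
        double elapsed = nowMillis - lastPingMillis;
        return elapsed / (mean * Math.log(10.0));
    }

    @Override
    public boolean isSuspected(long nowMillis) {
        return phi(nowMillis) >= threshold;
    }
}
```

With pings arriving every 1000 ms, phi grows by about 0.434 (that is, 1 / ln 10) per additional second of silence, so a threshold of 8 is crossed after roughly 18.4 s without a ping. The timeout detector instead flips from "alive" to "suspected" abruptly once `maxTicks` ticks elapse. Because phi is a continuous suspicion level, an application can choose its own threshold, which is the tunability the description refers to.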