[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

Abmar Barros (JIRA) Tue, 22 Jun 2010 20:59:21 -0700

     [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Abmar Barros updated ZOOKEEPER-702:
-----------------------------------

    Attachment: ZOOKEEPER-702.patch

I agree with what Flavio said about the second point: the application 
scheduling interval is indeed much shorter than the FD pinging interval.

I have attached implementations of all the initially suggested FDs, along with 
their unit tests. I have also incorporated Flavio's suggestions concerning 
package naming and method scope.

Since the Phi Accrual implementation needs to compute the normal distribution 
CDF, I have used the Apache Commons Math API; however, it still has to be added 
as an Ivy dependency.
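
As a rough illustration of where the normal CDF enters the phi computation, 
here is a minimal sketch. It is not the code in the attached patch: the class 
and method names are hypothetical, the window of inter-arrival samples and the 
1 ms floor on the standard deviation are assumptions, and the normal tail is 
approximated with the Abramowitz-Stegun formula 7.1.26 as a stand-in for the 
Commons Math call so that the example has no external dependency.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a phi-accrual suspicion level; not the patch's class.
class PhiAccrualSketch {
    private final Deque<Long> intervals = new ArrayDeque<>();
    private final int windowSize;          // assumed: bounded sample window
    private long lastHeartbeatMillis = -1; // -1 means no heartbeat seen yet

    PhiAccrualSketch(int windowSize) {
        this.windowSize = windowSize;
    }

    // Record a heartbeat arrival and the interval since the previous one.
    void heartbeat(long nowMillis) {
        if (lastHeartbeatMillis >= 0) {
            intervals.addLast(nowMillis - lastHeartbeatMillis);
            if (intervals.size() > windowSize) {
                intervals.removeFirst();
            }
        }
        lastHeartbeatMillis = nowMillis;
    }

    // phi = -log10(P(next heartbeat arrives later than the elapsed time)),
    // with inter-arrival times modelled as normally distributed.
    double phi(long nowMillis) {
        if (intervals.isEmpty()) {
            return 0.0; // not enough samples to estimate anything
        }
        double mean = 0.0;
        for (long i : intervals) {
            mean += i;
        }
        mean /= intervals.size();
        double variance = 0.0;
        for (long i : intervals) {
            variance += (i - mean) * (i - mean);
        }
        variance /= intervals.size();
        // Assumed floor of 1 ms so perfectly regular samples do not divide by zero.
        double stdDev = Math.max(Math.sqrt(variance), 1.0);
        double z = ((nowMillis - lastHeartbeatMillis) - mean) / stdDev;
        double pLater = Math.max(upperTail(z), Double.MIN_VALUE);
        return -Math.log10(pLater);
    }

    // Upper tail of the standard normal distribution: 1 - CDF(z).
    static double upperTail(double z) {
        return 0.5 * erfc(z / Math.sqrt(2.0));
    }

    // Complementary error function, Abramowitz-Stegun 7.1.26 approximation.
    static double erfc(double x) {
        if (x < 0) {
            return 2.0 - erfc(-x);
        }
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }
}
```

A caller would invoke heartbeat() on every arrival and compare phi() against a 
tunable threshold; the threshold value itself is a deployment choice and is not 
specified here.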

So, before experimenting, there are a few steps that need to be accomplished:

* Create a receiveAppHeartbeat() method on the interface so that we can keep 
using ZooKeeper messages as heartbeats, analyze their impact, and adapt the 
method to each FD implementation
* Adapt server side code to use the proposed FD interface
* Add comments to the pseudocode files
* Expand unit tests
* Enhance code documentation
* Add the Commons Math (commons-math) dependency to Ivy
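
To make the first step above concrete, here is one possible shape for the 
interface, shown with a simple fixed-timeout detector. This is only a sketch 
under assumptions: apart from receiveAppHeartbeat(), every name below is 
hypothetical and does not come from the attached patch, and timestamps are 
passed in explicitly to keep the example deterministic.

```java
// Hypothetical interface; only receiveAppHeartbeat() is named in this message.
interface FailureDetectorSketch {
    // A dedicated failure-detector ping was received.
    void receivePing(long nowMillis);

    // An ordinary ZooKeeper message was received and is counted as a heartbeat.
    void receiveAppHeartbeat(long nowMillis);

    // True if the monitored peer should currently be suspected as failed.
    boolean isSuspected(long nowMillis);
}

// Fixed-timeout detector: suspects the peer once nothing has been heard
// from it for longer than timeoutMillis.
class FixedTimeoutSketch implements FailureDetectorSketch {
    private final long timeoutMillis;
    private long lastHeardMillis;

    FixedTimeoutSketch(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeardMillis = startMillis;
    }

    @Override
    public void receivePing(long nowMillis) {
        lastHeardMillis = Math.max(lastHeardMillis, nowMillis);
    }

    @Override
    public void receiveAppHeartbeat(long nowMillis) {
        // This simple detector treats application messages exactly like pings;
        // other detectors might weight or sample them differently.
        lastHeardMillis = Math.max(lastHeardMillis, nowMillis);
    }

    @Override
    public boolean isSuspected(long nowMillis) {
        return nowMillis - lastHeardMillis > timeoutMillis;
    }
}
```

Keeping pings and application heartbeats as separate entry points would let 
each FD implementation decide for itself how much the application traffic 
should influence its estimate.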

> GSoC 2010: Failure Detector Model
> ---------------------------------
>
>                 Key: ZOOKEEPER-702
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
>             Project: Zookeeper
>          Issue Type: Wish
>            Reporter: Henry Robinson
>            Assignee: Abmar Barros
>         Attachments: bertier-pseudo.txt, chen-pseudo.txt, 
> phiaccrual-pseudo.txt, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch
>
>
> Failure Detector Module
> Possible Mentor
> Henry Robinson (henry at apache dot org)
> Requirements
> Java, some distributed systems knowledge, comfort implementing distributed 
> systems protocols
> Description
> ZooKeeper servers detect the failure of other servers and clients by 
> counting the number of 'ticks' for which they don't get a heartbeat from 
> other machines. This is the 'timeout' method of failure detection and works 
> very well; however it is possible that it is too aggressive and not easily 
> tuned for some more unusual ZooKeeper installations (such as in a wide-area 
> network, or even in a mobile ad-hoc network).
> This project would abstract the notion of failure detection to a dedicated 
> Java module, and implement several failure detectors to compare and contrast 
> their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
> phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
> is much more tunable and has some very interesting properties. This is a 
> great project if you are interested in distributed algorithms, or want to 
> help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
