[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-11-23 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Open  (was: Patch Available)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-11-23 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

This patch:
* Fixes the findbugs warnings
* Fixes the testSessionTimeout() failure on the SessionTest 

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-11-23 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Patch Available  (was: Open)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-11-02 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

Hi Flavio, Thanks for the comments. In this patch I have removed the normal 
distribution option for the PhiAccrual, and also the math-commons dependency in 
the ivy.xml file. With this modification, the PhiTimeoutEvaluator (and its 
implementations) are no longer needed.

I also have refactored some of the common code regarding the ping sampling 
window. This made the code of the adaptive FDs clearer.

Regarding my last experiments, I will post the results on the wiki asap.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-11-02 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

* Updated with trunk.
* Fixed the standard deviation calculus in the InterArrivalSamplingWindow.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-10-28 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-702:
---

Status: Open  (was: Patch Available)

Hi Abmar, Thanks for the addition to the patch. I was wondering if it is really 
a good idea to have both options, normal and exponential, implemented. Since 
your experiments have shown that exponential performs better, why don't use it 
only? Also, I was wondering if you have posted expertimental numbers showing 
that exponential performs better. 

In the case we go with exponential only, then we don't need the modification to 
ivy.xml, right?

And last comment, it doesn't look like the classes implementing 
PhiTimeoutEvaluator need to be public. Is this right?  

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-10-26 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-702:


Fix Version/s: 3.4.0

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Fix For: 3.4.0

 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-10-19 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

After making some more experiments with the Phi Accrual, I have noticed that 
the exponential distribution fits the ping inter-arrival sampling window 
better. 
Then, I have added a new option for the PhiAccrual called 'dist', that is the 
distribution used to model the inter-arrivals. 
Two possible values for this parameter are 'norm' and 'exp', and the default is 
'exp'. When we set the PhiAccrual to use the exponential distribution, it will 
work similar to the Cassandra's failure detector.


 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-10-19 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Patch Available  (was: Open)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-28 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Patch Available  (was: Open)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-28 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-702:
---

Status: Open  (was: Patch Available)

I forgot to mention that the patch does not apply cleanly. I had to delete the 
first two lines (generated by eclipse), but once I did it applied cleanly. 
Abmar, could you upload a new patch? My +1 still holds...

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-28 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

Fixed the issue on the patch generation that Flavio mentioned.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-26 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

Thanks for the feedback Flavio, I have fixed the issue you mentioned. Only the 
new quorum hammer tests were not working propertly, they were creating the 
default failure detector in all the test cases. This patch addresses this issue.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-25 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-702:
---

Status: Open  (was: Patch Available)

Thanks for the updated patch, Abmar. The new tests, however, are working as 
expected. More specifically, the methods in QuorumBase (createLearnersFD and 
createSessionsFD) are not being overridden as expected, which affects all new 
hammer tests. I haven't checked the other tests, but I suspect they suffer from 
the same problem.

I'm canceling the patch for now. 

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-23 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Patch Available  (was: Open)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-21 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch:
* Replaced 'heartbeat' for 'ping' in the forrest documentation
* Revised javadocs
* Expanded the hammer (client and quorum) and recovery tests to exercise all 
failure detectors implementations.

Flavio, I have run the tests you mentioned and they seem to fail in the trunk 
also.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-13 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch:
* Added support for declaring new failure detectors, as Ivan suggested.
* Updated documentation to reflect this.
* Replaced all 'heartbeat' mentions in the API and tests for 'ping'.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-09-07 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira updated ZOOKEEPER-702:
---

Status: Open  (was: Patch Available)

Cancelling patch until comments are addressed.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-08-30 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

I this patch I have addressed all the issues Ivan and Flavio raised, but the 
reflection code on the FaiilureDetectorFactory, in which I am working.

What do you think is the most user-friendly way of configuring the failure 
detector? (using reflection on the FailureDetectorFactory). 

Should we require the fully classified class name in the fd.classname option 
and the parameters should be like fd.param1, fd.param2...? In this sense, 
should we force the failure detectors classes to have methods like 
setParam1(String param1), setParam2(String param2)...?

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-08-15 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch
ZOOKEEPER-702-code.patch
ZOOKEEPER-702-doc.patch

In this patch:
* Fixed a delay in client heartbeat sending, that was causing unit tests to run 
slowly.
* Added a new parameter to phi accrual FD - the minimum window size to begin 
timeout adaptation.
* Added a moderation step to Bertier FD, to comply with second level task 
described in his paper.
* Made Chen's alpha parameter configurable, and not a quarter of the timeout.
* Removed ^M characters from the patch lines.

I also have splitted the patch in two parts: one containing the failure 
detector module source code (ZOOKEEPER-702-code.patch) and another containing 
the documentation (ZOOKEEPER-702-doc.patch).

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-08-14 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch:
* Added forrest docs
* Modified some initialization parameters of the algorithms

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-08-10 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

* Synchronized patch with updates from trunk
* Put all changes from subtasks together in the this patch

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-08-10 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Status: Patch Available  (was: Open)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-29 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch I removed some tests modifications that were not intended to be 
committed.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-28 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch:
 * Refactored SessionTracker and Heartbeat tracking on learners.
 * Fixed an issue concerning server recovery.

Patches of the subtasks use this patch as a baseline. 

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-19 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch: 
* Learners report clients heartbeat sample information (mean and standard 
deviation) to Leaders. 

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-07 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

In this patch:

* Adapted failure detectors to work on both client and server sides of the 
client-server monitoring.
* Refactored server-side code to use the FailureDetector interface.
* Created a new FailureDetector, which groups monitoreds by their tick time, 
similar to what the SessionTracking does.
* Made the failure detector (and its parameters) options of the ZooKeeper 
configuration file.
* Corrected indentation (spaces instead of tabs).

Next steps are:
* As suggested by Flavio, create sub tasks in this Jira for:
** Experimentation.
** Refactor server-server monitoring to use FailureDetector interface and make 
it configurable.
** Forrest documentation.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, 
 ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-02 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: bertier-pseudo.txt
chen-pseudo.txt
phiaccrual-pseudo.txt

Added more comments to pseudo codes.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-07-02 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

Patch changes are:
* Created appMessageReceived() and appMessageSent() methods. These methods 
allow the Failure Detector to use application messages as heartbeats, which 
reflects the ZooKeeper case.
* Added command line options to configure failure detector method and its 
parameters.
* Unit tests expanded to comply with new methods.
* Enhanced javadocs for each Failure Detector implementation
* Added commons-math dependency to ivy.xml

Next things to work on are:
* Adapt server side code to use the proposed FD interface
* Start experimenting

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, 
 chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, 
 ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-06-22 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

Agreed with what Flavio said about the second point. The application scheduling 
interval is, indeed, much lower than the FD pinging interval.

Attached the implementations of all initially suggested FDs and their unit 
tests. Also have included the suggestions Flavio gave concerning package naming 
and method scope. 

Once the Phi Accrual implementation needs to compute the Normal Distribution 
cdf, I have used the math-commons API, however, it still has to be added as an 
Ivy dependency.

So, before experimenting, there are a few steps that need to be accomplished:

* Create a receiveAppHeartbeat() method on the interface so that we can keep 
using zookeeper messages as heartbeats and analyze its impact and adapt it to 
each FD implementation
* Adapt server side code to use the proposed FD interface
* Add comments to pseudo codes
* Expand unit tests
* Enhance code documentation
* Add math-commons dependency to Ivy

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, chen-pseudo.txt, 
 phiaccrual-pseudo.txt, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-06-11 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: bertier-pseudo.txt
chen-pseudo.txt
phiaccrual-pseudo.txt

Attached the pseudo-codes for the failure detectors I have proposed. It seems I 
need to include the heartbeat sequence number in the receiveHeartbeat method of 
the FailureDetector interface in order to implement some of these FDs. 

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: bertier-pseudo.txt, chen-pseudo.txt, 
 phiaccrual-pseudo.txt, ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-06-05 Thread Abmar Barros (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abmar Barros updated ZOOKEEPER-702:
---

Attachment: ZOOKEEPER-702.patch

This patch proposes a first version of the FailureDetector interface. Its 
implementations are intended to run in the same thread of the application. 
Also, I have implemented a simple failure detector based on the ClientCnxn 
class and adapted this class to use this FD implementation. 
This is a preliminary patch, just for analysis and feedback, it is not intended 
to be committed.

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson
Assignee: Abmar Barros
 Attachments: ZOOKEEPER-702.patch


 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model

2010-03-12 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-702:
-

Summary: GSoC 2010: Failure Detector Model  (was: Failure Detector Model)

 GSoC 2010: Failure Detector Model
 -

 Key: ZOOKEEPER-702
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
 Project: Zookeeper
  Issue Type: Wish
Reporter: Henry Robinson

 Failure Detector Module
 Possible Mentor
 Henry Robinson (henry at apache dot org)
 Requirements
 Java, some distributed systems knowledge, comfort implementing distributed 
 systems protocols
 Description
 ZooKeeper servers detects the failure of other servers and clients by 
 counting the number of 'ticks' for which it doesn't get a heartbeat from 
 other machines. This is the 'timeout' method of failure detection and works 
 very well; however it is possible that it is too aggressive and not easily 
 tuned for some more unusual ZooKeeper installations (such as in a wide-area 
 network, or even in a mobile ad-hoc network).
 This project would abstract the notion of failure detection to a dedicated 
 Java module, and implement several failure detectors to compare and contrast 
 their appropriateness for ZooKeeper. For example, Apache Cassandra uses a 
 phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which 
 is much more tunable and has some very interesting properties. This is a 
 great project if you are interested in distributed algorithms, or want to 
 help re-factor some of ZooKeeper's internal code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.