[jira] Commented: (ZOOKEEPER-855) clientPortBindAddress should be clientPortAddress
[ https://issues.apache.org/jira/browse/ZOOKEEPER-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922512#action_12922512 ]

Hudson commented on ZOOKEEPER-855:

Integrated in ZooKeeper-trunk #971 (See [https://hudson.apache.org/hudson/job/ZooKeeper-trunk/971/])
ZOOKEEPER-855. clientPortBindAddress should be clientPortAddress (Jared Cantwell via fpj)

clientPortBindAddress should be clientPortAddress
-------------------------------------------------
    Key: ZOOKEEPER-855
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-855
    Project: Zookeeper
    Issue Type: Bug
    Components: documentation
    Affects Versions: 3.3.0, 3.3.1
    Reporter: Jared Cantwell
    Assignee: Jared Cantwell
    Priority: Trivial
    Fix For: 3.3.2, 3.4.0
    Attachments: ZOOKEEPER-855.patch, ZOOKEEPER-855.patch

The server documentation states that the configuration parameter for binding to a specific IP address is clientPortBindAddress. The code expects the parameter to be named clientPortAddress. The documentation for the 3.3.x versions needs to be changed to reflect the correct parameter name. This parameter was added in ZOOKEEPER-635.

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
    Status: Open (was: Patch Available)

Missing a test.

ZooKeeper high cpu usage when invalid requests
----------------------------------------------
    Key: ZOOKEEPER-893
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-893
    Project: Zookeeper
    Issue Type: Bug
    Components: server
    Affects Versions: 3.3.1
    Environment: Linux 2.6.16
                 4x Intel(R) Xeon(R) CPU X3320 @ 2.50GHz
                 java version 1.6.0_17
                 Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
                 Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
    Reporter: Thijs Terlouw
    Assignee: Thijs Terlouw
    Priority: Critical
    Fix For: 3.3.2, 3.4.0
    Attachments: ZOOKEEPER-893.patch
    Original Estimate: 1h
    Remaining Estimate: 1h

When ZooKeeper receives certain illegally formed messages on the internal communication port (:4181 by default), it can enter an infinite loop that causes 100% CPU usage. This is related to ZOOKEEPER-427, but that patch does not resolve all issues.

From src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java, the two affected parts:

===
int length = msgLength.getInt();
if (length <= 0) {
    throw new IOException("Invalid packet length: " + length);
}
===

===
while (message.hasRemaining()) {
    temp_numbytes = channel.read(message);
    if (temp_numbytes < 0) {
        throw new IOException("Channel eof before end");
    }
    numbytes += temp_numbytes;
}
===

How to replicate this bug:

Perform an nmap port scan against your ZooKeeper server:

    nmap -sV -n your.ip.here -p4181

Wait a while until you see some messages in the log file; you will then see 100% CPU usage. The server does not recover from this situation. With my patch, it no longer occurs.
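For readers unfamiliar with the pattern at issue, here is a minimal Python sketch of defensive length-prefixed reading. It is not the ZOOKEEPER-893 patch and does not reproduce QuorumCnxManager; the names read_exact, read_frame and MAX_PACKET_LENGTH (including its value) are assumptions made for illustration. It shows the two checks the report points at: rejecting a non-positive or implausibly large length, and treating end-of-stream as an error rather than retrying forever.

```python
import struct

# Assumed upper bound for illustration only; the issue does not state a limit.
MAX_PACKET_LENGTH = 512 * 1024


def read_exact(stream, count):
    """Read exactly `count` bytes, or raise EOFError if the stream ends early."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # An empty read means end of stream: fail instead of looping again.
            raise EOFError("stream ended with %d bytes still expected" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(stream):
    """Read one frame: a 4-byte big-endian signed length, then the payload."""
    (length,) = struct.unpack(">i", read_exact(stream, 4))
    if length <= 0 or length > MAX_PACKET_LENGTH:
        raise ValueError("invalid packet length: %d" % length)
    return read_exact(stream, length)
```

Any object with a blocking read(n) method works as the stream, for example a file returned by socket.makefile("rb"). Random bytes from a port scanner will usually decode to a negative or huge length and be rejected before any payload buffer is allocated.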
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
    Attachment: ZOOKEEPER-893.patch

Adding a test and removing an if statement from RecvWorker.run() that became unnecessary with this patch. I'll be adding a patch for the 3.3 branch shortly.
[jira] Commented: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922539#action_12922539 ]

Thijs Terlouw commented on ZOOKEEPER-893:

Thanks Flavio! I have been too busy to add a testcase and yours looks great!
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
    Attachment: ZOOKEEPER-893-3.3.patch

Thanks, Thijs. Adding 3.3 patch.
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Flavio Junqueira updated ZOOKEEPER-893:
    Status: Patch Available (was: Open)
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
    Hadoop Flags: [Reviewed]

I just committed this to origin/branch-3.3 and origin/trunk. Thanks both!

c-client / zkpython: Double free corruption on node watcher
-----------------------------------------------------------
    Key: ZOOKEEPER-888
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-888
    Project: Zookeeper
    Issue Type: Bug
    Components: c client, contrib-bindings
    Affects Versions: 3.3.1
    Reporter: Lukas
    Assignee: Lukas
    Priority: Critical
    Fix For: 3.3.2, 3.4.0
    Attachments: resume-segfault.py, ZOOKEEPER-888-3.3.patch, ZOOKEEPER-888.patch

The c-client / zkpython wrapper invokes an already freed watcher callback.

Steps to reproduce:
0. Start a ZooKeeper server on your machine.
1. Run the attached Python script.
2. Suspend the ZooKeeper server process (e.g. using `pkill -STOP -f org.apache.zookeeper.server.quorum.QuorumPeerMain`).
3. Wait until the connection and the node observer have fired with a session event.
4. Resume the ZooKeeper server process (e.g. using `pkill -CONT -f org.apache.zookeeper.server.quorum.QuorumPeerMain`).

Result: the client tries to dispatch the node observer function again, but it was already freed, which causes a double free corruption.
[jira] Updated: (ZOOKEEPER-888) c-client / zkpython: Double free corruption on node watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-888:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)
[jira] Created: (ZOOKEEPER-904) super digest is not actually acting as a full superuser
super digest is not actually acting as a full superuser
-------------------------------------------------------
    Key: ZOOKEEPER-904
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-904
    Project: Zookeeper
    Issue Type: Bug
    Components: server
    Affects Versions: 3.3.1
    Reporter: Camille Fournier

The documentation states:

"New in 3.2: Enables a ZooKeeper ensemble administrator to access the znode hierarchy as a super user. In particular no ACL checking occurs for a user authenticated as super."

However, if a super user does something like:

    zk.setACL("/", Ids.READ_ACL_UNSAFE, -1);

the super user is now bound by the read-only ACL. This is not what I would expect to see given the documentation. It can be fixed by moving the check for the super authId in PrepRequestProcessor.checkACL to before the for(ACL a : acl) loop.
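The ordering problem described above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the PrepRequestProcessor code: the Id and ACL classes, the check_acl function and the permission constants are all hypothetical stand-ins. The point is only that the super check runs before the per-entry loop, so a restrictive ACL can no longer lock the super user out.

```python
from dataclasses import dataclass

# Illustrative permission bits for this model.
PERM_READ = 1
PERM_WRITE = 2


@dataclass(frozen=True)
class Id:
    scheme: str
    id: str


@dataclass(frozen=True)
class ACL:
    perms: int
    id: Id


ANYONE = Id("world", "anyone")


def check_acl(acl, perm, auth_ids):
    """Return True if any of `auth_ids` may perform `perm` under `acl`."""
    # Super check first: an authenticated super user bypasses every ACL entry.
    if any(auth.scheme == "super" for auth in auth_ids):
        return True
    for entry in acl:
        if (entry.perms & perm) == 0:
            continue
        if entry.id == ANYONE or entry.id in auth_ids:
            return True
    return False
```

With a read-only ACL for everyone, a caller authenticated under the "super" scheme is still allowed to write, while an ordinary caller is not.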
[jira] Created: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
enhance zkServer.sh for easier zookeeper automation-izing
---------------------------------------------------------
    Key: ZOOKEEPER-905
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-905
    Project: Zookeeper
    Issue Type: Improvement
    Components: scripts
    Affects Versions: 3.3.1
    Reporter: Nicholas Harteau
    Priority: Minor
    Attachments: zkServer.sh.diff

zkServer.sh is good at starting zookeeper and figuring out the right options to pass along. Unfortunately, if you want to wrap zookeeper startup/shutdown in any significant way, you have to reimplement a bunch of the logic there. The attached patch addresses a few simple issues:

1. Add a 'start-foreground' option to zkServer.sh. This allows things that expect to manage a foregrounded process (daemontools, launchd, etc.) to use zkServer.sh instead of rolling their own to launch zookeeper.
2. Add a 'print-cmd' option to zkServer.sh. Rather than launching zookeeper from the script, just give me the command you'd normally use to exec zookeeper. I found this useful when writing automation to start/stop zookeeper as part of smoke testing zookeeper-based applications.
3. Deal more gracefully with supplying alternate configuration files to zookeeper. Currently the script assumes all config files reside in $ZOOCFGDIR. This is also useful for smoke testing.
4. Communicate extra info (JMX enabled) about zookeeper on STDERR rather than STDOUT (necessary for #2).
5. Fix an issue on macOS where readlink doesn't have the '-f' option.
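Points 2 and 4 depend on each other: a caller can only capture the printed command cleanly if diagnostics go to a different stream. The Python sketch below illustrates that separation. It is not the attached zkServer.sh patch; build_command, print_cmd, the default classpath and the diagnostic text are hypothetical and chosen for illustration.

```python
import shlex
import sys


def build_command(config_path, java="java", classpath="zookeeper.jar"):
    """Assemble a launch command (argument values are illustrative placeholders)."""
    return [
        java,
        "-cp",
        classpath,
        "org.apache.zookeeper.server.quorum.QuorumPeerMain",
        config_path,
    ]


def print_cmd(config_path, out=None, err=None):
    """Write only the command to `out`; send diagnostics to `err`."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    print("JMX enabled", file=err)  # extra info stays off stdout
    print(shlex.join(build_command(config_path)), file=out)
```

Because stdout carries nothing but the shell-quoted command, a wrapper can capture it with command substitution and exec it, while the diagnostic line still reaches the terminal or a log.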
[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Harteau updated ZOOKEEPER-905:
    Attachment: zkServer.sh.diff

patch to bin/zkserver...@r1024408
[jira] Updated: (ZOOKEEPER-905) enhance zkServer.sh for easier zookeeper automation-izing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Harteau updated ZOOKEEPER-905:
    Affects Version/s: (was: 3.3.1)
    Release Note: hm, is it easier to attach a patch here?
    Status: Patch Available (was: Open)

patch against zkserver...@r1024408
[jira] Updated: (ZOOKEEPER-893) ZooKeeper high cpu usage when invalid requests
[ https://issues.apache.org/jira/browse/ZOOKEEPER-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-893:
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

+1 Great work, thanks!
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abmar Barros updated ZOOKEEPER-702:
    Attachment: ZOOKEEPER-702.patch

After running some more experiments with the Phi Accrual detector, I noticed that the exponential distribution fits the ping inter-arrival sampling window better. I have therefore added a new option for the PhiAccrual called 'dist', which selects the distribution used to model the inter-arrivals. The two possible values for this parameter are 'norm' and 'exp', and the default is 'exp'. When the PhiAccrual is set to use the exponential distribution, it works similarly to Cassandra's failure detector.

GSoC 2010: Failure Detector Model
---------------------------------
    Key: ZOOKEEPER-702
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-702
    Project: Zookeeper
    Issue Type: Wish
    Reporter: Henry Robinson
    Assignee: Abmar Barros
    Attachments: bertier-pseudo.txt, bertier-pseudo.txt, chen-pseudo.txt, chen-pseudo.txt, phiaccrual-pseudo.txt, phiaccrual-pseudo.txt, ZOOKEEPER-702-code.patch, ZOOKEEPER-702-doc.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch, ZOOKEEPER-702.patch

Failure Detector Module

Possible Mentor
Henry Robinson (henry at apache dot org)

Requirements
Java, some distributed systems knowledge, comfort implementing distributed systems protocols

Description
ZooKeeper servers detect the failure of other servers and clients by counting the number of 'ticks' for which they don't get a heartbeat from other machines. This is the 'timeout' method of failure detection and works very well; however, it may be too aggressive and is not easily tuned for some more unusual ZooKeeper installations (such as in a wide-area network, or even in a mobile ad-hoc network).

This project would abstract the notion of failure detection to a dedicated Java module, and implement several failure detectors to compare and contrast their appropriateness for ZooKeeper. For example, Apache Cassandra uses a phi-accrual failure detector (http://ddsg.jaist.ac.jp/pub/HDY+04.pdf) which is much more tunable and has some very interesting properties.

This is a great project if you are interested in distributed algorithms, or want to help re-factor some of ZooKeeper's internal code.
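To make the 'dist' option concrete, here is a minimal Python sketch of a phi-accrual detector with both distributions. It is not taken from the attached patches: the class name, method names, window size and clamping constants are assumptions; only the option name 'dist' and its values 'exp' and 'norm' come from the comment above. Phi is defined as -log10 of the probability that a heartbeat arrives later than the time already elapsed since the last one. For the exponential model that probability is exp(-t / mean), so phi reduces to t / (mean * ln 10).

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Sliding-window phi-accrual detector (illustrative sketch)."""

    def __init__(self, window_size=100, dist="exp"):
        if dist not in ("exp", "norm"):
            raise ValueError("dist must be 'exp' or 'norm'")
        self.dist = dist
        self.intervals = deque(maxlen=window_size)  # recent inter-arrival times
        self.last_arrival = None

    def heartbeat(self, now):
        """Record a heartbeat received at time `now` (seconds)."""
        if self.last_arrival is not None:
            self.intervals.append(now - self.last_arrival)
        self.last_arrival = now

    def phi(self, now):
        """Suspicion level at time `now`; higher means more likely failed."""
        if not self.intervals:
            return 0.0
        elapsed = now - self.last_arrival
        mean = max(sum(self.intervals) / len(self.intervals), 1e-9)
        if self.dist == "exp":
            # P(gap > elapsed) = exp(-elapsed / mean); phi = -log10 of that.
            return elapsed / (mean * math.log(10))
        variance = sum((x - mean) ** 2 for x in self.intervals) / len(self.intervals)
        std = max(math.sqrt(variance), 1e-9)
        # Normal tail probability P(gap > elapsed) via the complementary error function.
        p_later = 0.5 * math.erfc((elapsed - mean) / (std * math.sqrt(2)))
        return -math.log10(max(p_later, 1e-300))
```

A caller would compare phi(now) against a threshold of its choosing and suspect the peer once the threshold is exceeded; with the exponential model, a phi of 1 corresponds to a silence about 2.3 times the mean heartbeat interval.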
[jira] Updated: (ZOOKEEPER-702) GSoC 2010: Failure Detector Model
[ https://issues.apache.org/jira/browse/ZOOKEEPER-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abmar Barros updated ZOOKEEPER-702:
    Status: Patch Available (was: Open)
[jira] Commented: (ZOOKEEPER-835) Refactoring Zookeeper Client Code
[ https://issues.apache.org/jira/browse/ZOOKEEPER-835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12922813#action_12922813 ]

Benjamin Reed commented on ZOOKEEPER-835:

how do you see any of these things as related to ZOOKEEPER-22?

Refactoring Zookeeper Client Code
---------------------------------
    Key: ZOOKEEPER-835
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-835
    Project: Zookeeper
    Issue Type: Improvement
    Components: java client
    Affects Versions: 3.3.1
    Reporter: Patrick Datko
    Assignee: Thomas Koch

Thomas Koch asked me to file individual issues for the points raised in his mail to zookeeper-dev:
[Mail of Thomas Koch| http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-dev/201008.mbox/%3c20100845.17507.tho...@koch.ro%3e ]

He described several problems that are present in the current zookeeper client, so refactoring the code would make things easier for other developers working with zookeeper.
[jira] Created: (ZOOKEEPER-906) Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
Improve C client connection reliability by making it sleep between reconnect attempts as in Java Client
-------------------------------------------------------------------------------------------------------
    Key: ZOOKEEPER-906
    URL: https://issues.apache.org/jira/browse/ZOOKEEPER-906
    Project: Zookeeper
    Issue Type: Improvement
    Components: c client
    Affects Versions: 3.3.1
    Reporter: Radu Marin

Currently, when a C client gets disconnected, it retries a couple of hosts (not all) with no delay between attempts, and if it doesn't succeed it sleeps for 1/3 of the session expiration timeout before trying again. In the worst case the disconnect event occurs after 2/3 of the session expiration timeout has passed, and sleeping for another 1/3 of the session timeout will cause a session loss most of the time.

A better approach is to check all hosts, but with a random delay between reconnect attempts. The delay must also be independent of the session timeout, so that increasing the session timeout also increases the number of available attempts.

This improvement covers the case when the C client experiences network problems for a short period of time and is not able to reach any zookeeper hosts. The Java client already uses this logic and it works very well.
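The proposed reconnect behaviour can be sketched in Python as follows. This is neither the C client nor the Java client code: the function name reconnect, its parameters and the default max_delay are assumptions for illustration. It shows the two properties requested above: every host is tried in each round, and the pause between attempts is a random delay that does not depend on the session timeout.

```python
import random


def reconnect(hosts, try_connect, sleep, max_delay=1.0, max_rounds=None, rng=random):
    """Try every host in shuffled order with a random pause between attempts.

    try_connect(host) returns True on success; sleep(seconds) pauses the caller.
    Returns the connected host, or None if max_rounds rounds all failed.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        order = list(hosts)
        rng.shuffle(order)  # avoid every client hitting the same host first
        for host in order:
            if try_connect(host):
                return host
            # The delay is bounded by max_delay, not by the session timeout,
            # so a longer session timeout simply allows more attempts.
            sleep(rng.uniform(0, max_delay))
        rounds += 1
    return None
```

In real use, sleep would be time.sleep and the caller would stop once the session has expired; passing both functions in as parameters keeps the sketch testable without a network.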