[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738964#action_12738964 ]

Hudson commented on ZOOKEEPER-481:
----------------------------------

Integrated in ZooKeeper-trunk #404 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

Add lastMessageSent to QuorumCnxManager
---------------------------------------

Key: ZOOKEEPER-481
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481
Project: Zookeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.1.1, 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-481-branch3.2.patch, ZOOKEEPER-481-branch3.2.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch

Currently we rely on TCP for reliable delivery of FLE messages. However, as we concurrently drop and create new connections, it is possible that a message is sent but never received. With this patch, cnx manager keeps a list of last messages sent, and resends the last one sent. Receiving multiple copies is harmless.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
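The idea described in ZOOKEEPER-481 (remember the last message sent to each peer and send it again once the connection is re-established) can be sketched roughly as follows. This is not the code from the patch: LastMessageCache, Transport, deliver, and onReconnect are hypothetical names invented for the illustration, and the real QuorumCnxManager is organized differently. The sketch relies on the property stated in the issue, namely that receiving duplicates is harmless.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: remembers the last message sent to each peer so it can be re-sent. */
class LastMessageCache {
    /** Hypothetical transport hook; stands in for writing to a peer's socket. */
    interface Transport {
        void deliver(long serverId, ByteBuffer message);
    }

    // server id -> last message handed to the transport for that server
    private final Map<Long, ByteBuffer> lastMessageSent = new ConcurrentHashMap<>();
    private final Transport transport;

    LastMessageCache(Transport transport) {
        this.transport = transport;
    }

    /** Sends a message and records it as the most recent one for that peer. */
    void send(long serverId, ByteBuffer message) {
        // duplicate() gives each holder its own read position over the same bytes
        lastMessageSent.put(serverId, message.duplicate());
        transport.deliver(serverId, message.duplicate());
    }

    /** Called after a connection to serverId is re-established: re-sends the last message, if any. */
    void onReconnect(long serverId) {
        ByteBuffer last = lastMessageSent.get(serverId);
        if (last != null) {
            transport.deliver(serverId, last.duplicate());
        }
    }
}

class LastMessageDemo {
    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        LastMessageCache cache = new LastMessageCache(
            (id, msg) -> received.add(id + ":" + StandardCharsets.UTF_8.decode(msg)));

        cache.send(2L, ByteBuffer.wrap("vote-1".getBytes(StandardCharsets.UTF_8)));
        cache.send(2L, ByteBuffer.wrap("vote-2".getBytes(StandardCharsets.UTF_8)));
        cache.onReconnect(2L); // only the most recent message is repeated

        System.out.println(received); // prints [2:vote-1, 2:vote-2, 2:vote-2]
    }
}
```

Keeping only the most recent message per peer is enough under the assumption that a newer election message supersedes older ones; a receiver that sees the same message twice simply processes it again.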
[jira] Commented: (ZOOKEEPER-479) QuorumHierarchical does not count groups correctly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738966#action_12738966 ]

Hudson commented on ZOOKEEPER-479:
----------------------------------

Integrated in ZooKeeper-trunk #404 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
QuorumHierarchical does not count groups correctly (flavio via mahadev)

QuorumHierarchical does not count groups correctly
--------------------------------------------------

Key: ZOOKEEPER-479
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-479
Project: Zookeeper
Issue Type: Bug
Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-479-branch3.2.patch, ZOOKEEPER-479.patch, ZOOKEEPER-479.patch, ZOOKEEPER-479.patch

QuorumHierarchical::containsQuorum should not verify if all groups represented in the input set have more than half of the total weight. Instead, it should check only for an overall majority of groups.
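The check described in ZOOKEEPER-479 can be illustrated with a small sketch. It is an assumption-laden reading of the issue text, not the patched QuorumHierarchical code: the class name HierarchicalQuorumSketch, its fields, and the example layout (three groups of three servers, weight 1 each) are all made up for the illustration. A group counts toward the quorum when the input set holds more than half of that group's weight, and the set is a quorum when such groups form a majority of all groups. Groups that are represented without a majority do not disqualify the set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a hierarchical quorum check; not the ZooKeeper implementation. */
class HierarchicalQuorumSketch {
    private final Map<Long, Long> serverWeight;                  // server id -> weight
    private final Map<Long, Long> serverGroup;                   // server id -> group id
    private final Map<Long, Long> groupWeight = new HashMap<>(); // group id -> total weight

    HierarchicalQuorumSketch(Map<Long, Long> serverWeight, Map<Long, Long> serverGroup) {
        this.serverWeight = serverWeight;
        this.serverGroup = serverGroup;
        for (Map.Entry<Long, Long> e : serverGroup.entrySet()) {
            groupWeight.merge(e.getValue(), serverWeight.getOrDefault(e.getKey(), 0L), Long::sum);
        }
    }

    /** True when the set holds a weight majority in a majority of all groups. */
    boolean containsQuorum(Set<Long> servers) {
        // Sum the weight the input set contributes to each group.
        Map<Long, Long> ackedWeight = new HashMap<>();
        for (Long sid : servers) {
            Long gid = serverGroup.get(sid);
            if (gid != null) {
                ackedWeight.merge(gid, serverWeight.getOrDefault(sid, 0L), Long::sum);
            }
        }
        // Count groups in which the set holds more than half of the group's weight.
        int majorityGroups = 0;
        for (Map.Entry<Long, Long> e : ackedWeight.entrySet()) {
            if (e.getValue() > groupWeight.get(e.getKey()) / 2) {
                majorityGroups++;
            }
        }
        // Only an overall majority of groups is required.
        return majorityGroups > groupWeight.size() / 2;
    }
}

class HierarchicalQuorumDemo {
    public static void main(String[] args) {
        Map<Long, Long> weight = new HashMap<>();
        Map<Long, Long> group = new HashMap<>();
        for (long sid = 1; sid <= 9; sid++) {
            weight.put(sid, 1L);
            group.put(sid, (sid - 1) / 3 + 1); // servers 1-3 -> group 1, 4-6 -> group 2, 7-9 -> group 3
        }
        HierarchicalQuorumSketch q = new HierarchicalQuorumSketch(weight, group);

        // Majorities in groups 1 and 2; group 3 is represented without a majority.
        System.out.println(q.containsQuorum(new HashSet<>(Arrays.asList(1L, 2L, 4L, 5L, 7L)))); // true
        // Five of nine servers, but a majority in group 1 only.
        System.out.println(q.containsQuorum(new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L, 7L)))); // false
    }
}
```

The first call shows the case the issue is about: a stricter check that required every represented group to have a majority would reject that set because of the lone server from group 3.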
[jira] Resolved: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower
[ https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar resolved ZOOKEEPER-480.
------------------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]

I just committed this. thanks flavio.

FLE should perform leader check when node is not leading and add vote of follower
---------------------------------------------------------------------------------

Key: ZOOKEEPER-480
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
Project: Zookeeper
Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch

As a server may join leader election while others have already elected a leader, it is necessary that a server handles some special cases of leader election when notifications are from servers that are either LEADING or FOLLOWING. In such special cases, we check if we have received a message from the leader to declare a leader elected. This check does not consider the case that the process performing the check might be a recently elected leader, and consequently the check fails.

This patch also adds a new case, which corresponds to adding a vote to recvset when the notification is from a process LEADING or FOLLOWING. This fixes the case raised in ZOOKEEPER-475.
[jira] Updated: (ZOOKEEPER-491) Prevent zero-weight servers from being elected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-491:
-----------------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

+1 for the patch. I just committed this. thanks flavio!

Prevent zero-weight servers from being elected
----------------------------------------------

Key: ZOOKEEPER-491
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
Project: Zookeeper
Issue Type: New Feature
Components: leaderElection
Affects Versions: 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch

This is a fix to prevent zero-weight servers from being elected leaders. In wide-area scenarios, this makes it possible to restrict the set of servers that can lead the ensemble.
[jira] Resolved: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar resolved ZOOKEEPER-475.
------------------------------------

Resolution: Fixed

Given that ZOOKEEPER-479, ZOOKEEPER-480, and ZOOKEEPER-481 have been fixed, this should be fixed.

FLENewEpochTest failed on nightly builds.
-----------------------------------------

Key: ZOOKEEPER-475
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
Project: Zookeeper
Issue Type: Bug
Components: quorum
Affects Versions: 3.2.0
Reporter: Mahadev konar
Assignee: Flavio Paiva Junqueira
Priority: Blocker
Fix For: 3.2.1, 3.3.0
Attachments: ZOOKEEPER-475.patch, ZOOKEEPER-475.patch

The FLENewEpochTest failed on one of the nightly builds - http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.
[jira] Updated: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-368:
------------------------------------

Attachment: observers.patch
            obs-refactor.patch

Here are both a slightly modified version of the refactor patch and a patch containing the new code for Observers. I have included some tests now as well.

The Observer implementation is simplified from previous patches. I have added new methods to QuorumPeer to get at the entire view of the ensemble, the voting view (containing Followers), and the observing view.

To use an Observer, in the ensemble config file append :observer to the description for any server you want to be an Observer. So for example write:

    server.3:localhost:2181:3181:observer

In the Observer's own config file, add a line with the option

    peerType=observer

I will probably remove these slightly redundant specifications in the future, but for now you will need both.

You must apply the patches in order; the refactor patch first. Both patches apply cleanly for me using patch -p0 against a clean checkout of trunk as of tonight (Aug 4th).

Observers
---------

Key: ZOOKEEPER-368
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
Project: Zookeeper
Issue Type: New Feature
Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
Attachments: obs-refactor.patch, observer-refactor.patch, observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch

Currently, all servers of an ensemble participate actively in reaching agreement on the order of ZooKeeper transactions. That is, all followers receive proposals, acknowledge them, and receive commit messages from the leader. A leader issues commit messages once it receives acknowledgments from a quorum of followers. For cross-colo operation, it would be useful to have a third role: observer. Using Paxos terminology, observers are similar to learners. An observer does not participate actively in the agreement step of the atomic broadcast protocol. Instead, it only commits proposals that have been accepted by some quorum of followers.

One simple solution to implement observers is to have the leader forward commit messages not only to followers but also to observers, and have observers apply transactions according to the order followers agreed upon. In the current implementation of the protocol, however, commit messages do not carry their corresponding transaction payload, because all servers other than the leader are followers, and followers receive such a payload first through a proposal message. Just forwarding commit messages as they currently are to an observer is consequently not sufficient. We have a couple of options:

1- Include the transaction payload along in commit messages to observers;
2- Send proposals to observers as well.

Number 2 is simpler to implement because it doesn't require changing the protocol implementation, but it increases traffic slightly. The performance impact due to such an increase might be insignificant, though.

For scalability purposes, we may consider having followers also forward commit messages to observers. With this option, observers can connect to followers and receive messages from followers. This choice is important to avoid increasing the load on the leader with the number of observers.
RE: Unending Leader Elections in WAN deploy
Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still calling this 3.2.0. Should this be rev'd, and am I correct in calling this release 3.2.1?

2. Build targets

The package target fails b/c the create-cppunit-configure target fails due to various problems w/ respect to autoconf. Are these dependencies documented somewhere? I'd like to have a fully building system.

create-cppunit-configure:
     [exec] Can't exec libtoolize: No such file or directory at /usr/bin/autoreconf line 188.
     [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
     [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
     [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
     [exec] If this token and others are legitimate, please use m4_pattern_allow.
     [exec] See the Autoconf documentation.
     [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
     [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1

3. Sync failure: This is still failing.

svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist

-Todd

-----Original Message-----
From: Todd Greenwood
Sent: Tuesday, August 04, 2009 11:26 AM
To: 'zookeeper-u...@hadoop.apache.org'
Subject: RE: Unending Leader Elections in WAN deploy

Great news. Thank you Mahadev. I'll report our findings later today.

-Todd

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 11:20 AM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
I just committed 480 and 491. You can checkout the 3.2 branch now.

Thanks
mahadev

On 8/3/09 4:29 PM, Todd Greenwood to...@audiencescience.com wrote:

That'd be perfect. Thanks!

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Monday, August 03, 2009 4:24 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
Most of the patches that you mention should be in the branch 3.2 by tomm or so. 481, 479 are already in. 480 and 491 should be in by tomm. Would that suffice for you?

Thanks
mahadev

On 8/3/09 4:21 PM, Todd Greenwood to...@audiencescience.com wrote:

Another problem...I've reverted to the latest versions of the patches that are not specific to branch-3.2, and I'm getting two compilation errors:

build-generated:
    [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes

compile-main:
    [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
    [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
    [javac] public String[] getQuorumPeers();
    [javac] ^
    [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
    [javac] public String getServerState();
    [javac] ^
    [javac] 2 errors

My build process is pretty simple:

1. copy the branch-3.2 source to a temp directory (src/patched/branch-3.2)
2. apply the ZOOKEEPER patches in my patches directory
3. build zookeeper in the temp directory

-Todd

-----Original Message-----
From: Todd Greenwood [mailto:to...@audiencescience.com]
Sent: Monday, August 03, 2009 4:09 PM
To: zookeeper-u...@hadoop.apache.org
Subject: RE: Unending Leader Elections in WAN deploy

Flavio,

I notice that you've updated the patches referenced for the WAN deployment. There appears to be an order dependency w/ respect to these four patches...

ZOOKEEPER-473.patch
ZOOKEEPER-479-branch3.2.patch
ZOOKEEPER-481-branch3.2.patch
ZOOKEEPER-491.patch

473 - 479 (479 fails)

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 ../patches/ZOOKEEPER-479-branch3.2.patch
patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
patching file
Re: Unending Leader Elections in WAN deploy
Hi todd,
comments in line

On 8/4/09 12:38 PM, Todd Greenwood to...@audiencescience.com wrote:

> Mahadev,
>
> Some quick questions:
>
> 1. Version
>
> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> calling this 3.2.0. Should this be rev'd, and am I correct in calling
> this release 3.2.1?

Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the release.

> 2. Build targets
>
> The package target fails b/c the create-cppunit-configure target fails
> due to various problems w/ respect to autoconf. Are these dependencies
> documented somewhere? I'd like to have a fully building system.
>
> create-cppunit-configure:
>      [exec] Can't exec libtoolize: No such file or directory at /usr/bin/autoreconf line 188.
>      [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
>      [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
>      [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
>      [exec] If this token and others are legitimate, please use m4_pattern_allow.
>      [exec] See the Autoconf documentation.
>      [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
>      [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1

You need auto tools to run this. Please read the README for building c client library at src/c/ for the installation requirements.

> 3. Sync failure: This is still failing.
>
> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist

Yes this hasn't been fixed yet!

Thanks
mahadev

> -Todd
>
> -----Original Message-----
> From: Todd Greenwood
> Sent: Tuesday, August 04, 2009 11:26 AM
> To: 'zookeeper-u...@hadoop.apache.org'
> Subject: RE: Unending Leader Elections in WAN deploy
>
> Great news. Thank you Mahadev. I'll report our findings later today.
RE: Unending Leader Elections in WAN deploy
Mahadev,

I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev)
  ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev)
  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt)
  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
  ZOOKEEPER-479. QuorumHierarchical does not count groups correctly (flavio via mahadev)
  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert (Chris Darroch via phunt)
  ZOOKEEPER-480. FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev)
  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via mahadev)

What can I do to assist you with this issue?
-Todd
Re: Unending Leader Elections in WAN deploy
Hi Todd,

What is the synclimit you are using? Can you post your config? For WAN's you will have to use much bigger values for synclimit and others.

Thanks
mahadev
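For readers wondering why syncLimit matters on a WAN: in ZooKeeper's configuration, initLimit and syncLimit are expressed in ticks, so the time a follower is actually given is tickTime multiplied by the limit. The sketch below only does that arithmetic. The class QuorumTimeouts and the numbers in it are illustrative assumptions, not values taken from this thread or from Todd's config (which had not been posted at this point).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper: converts tick-based limits from a zoo.cfg-style file into milliseconds. */
class QuorumTimeouts {
    /** Milliseconds allowed by a limit that is expressed in ticks (tickTime is in milliseconds). */
    static long limitMillis(Properties cfg, String limitKey) {
        long tickTime = Long.parseLong(cfg.getProperty("tickTime").trim());
        long ticks = Long.parseLong(cfg.getProperty(limitKey).trim());
        return tickTime * ticks;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative values only; a WAN deployment would likely need larger limits.
        String conf = "tickTime=2000\ninitLimit=10\nsyncLimit=5\n";
        Properties cfg = new Properties();
        cfg.load(new StringReader(conf));

        System.out.println("sync window (ms): " + limitMillis(cfg, "syncLimit")); // 10000
        System.out.println("init window (ms): " + limitMillis(cfg, "initLimit")); // 20000
    }
}
```

If the round trips between sites cannot reliably fit inside the sync window, followers may be dropped and a new election may start, which would be consistent with the repeated elections reported in this thread. Whether that is the actual cause here is not established by the messages shown.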
Re: Unending Leader Elections in WAN deploy
It would be better to create a JIRA with configs as well as logs.

Patrick
RE: Unending Leader Elections in WAN deploy
Will do. -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Tuesday, August 04, 2009 1:34 PM To: zookeeper-dev@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy It would be better to create a JIRA with configs as well as logs. Patrick
[jira] Created: (ZOOKEEPER-497) api and forrest docs should mention if classes are thread safe
api and forrest docs should mention if classes are thread safe -- Key: ZOOKEEPER-497 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-497 Project: Zookeeper Issue Type: Bug Components: documentation Affects Versions: 3.2.0 Reporter: Patrick Hunt Priority: Minor Fix For: 3.3.0 the api (c/java clients) and the forrest docs should talk about thread safety - in particular we don't mention that ZooKeeper class is thread safe (etc...) Docs should be updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Priority: Critical Fix For: 3.2.1 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
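For anyone trying to reason about the topology in this report, here is a minimal Java sketch of a weighted, grouped quorum check. It is not ZooKeeper's QuorumHierarchical code; the class name `WeightedQuorumSketch`, the method signature, and the server ids are hypothetical. It follows the rule stated in ZOOKEEPER-479 (a quorum needs a majority of groups, and a group is won with more than half of its weight) and assumes that a group whose servers all have weight 0 does not count as a voting group.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not ZooKeeper's QuorumHierarchical class.
final class WeightedQuorumSketch {
    private final Map<Long, Long> serverGroup;   // server id -> group id
    private final Map<Long, Long> serverWeight;  // server id -> voting weight

    WeightedQuorumSketch(Map<Long, Long> serverGroup, Map<Long, Long> serverWeight) {
        this.serverGroup = serverGroup;
        this.serverWeight = serverWeight;
    }

    /** True if the acknowledging servers win a majority of the voting groups. */
    boolean containsQuorum(Set<Long> acks) {
        Map<Long, Long> totalWeight = new HashMap<>();
        Map<Long, Long> ackedWeight = new HashMap<>();
        for (Map.Entry<Long, Long> entry : serverGroup.entrySet()) {
            Long sid = entry.getKey();
            Long gid = entry.getValue();
            // Assumption: a server with no explicit weight counts as weight 1.
            long weight = serverWeight.getOrDefault(sid, 1L);
            totalWeight.merge(gid, weight, Long::sum);
            if (acks.contains(sid)) {
                ackedWeight.merge(gid, weight, Long::sum);
            }
        }
        int votingGroups = 0;
        int wonGroups = 0;
        for (Map.Entry<Long, Long> entry : totalWeight.entrySet()) {
            long total = entry.getValue();
            if (total == 0) {
                continue; // Assumption: a group with only zero-weight servers never votes.
            }
            votingGroups++;
            long acked = ackedWeight.getOrDefault(entry.getKey(), 0L);
            if (2 * acked > total) {
                wonGroups++; // more than half of this group's weight acknowledged
            }
        }
        return votingGroups > 0 && 2 * wonGroups > votingGroups;
    }

    public static void main(String[] args) {
        Map<Long, Long> group = new HashMap<>();
        Map<Long, Long> weight = new HashMap<>();
        for (long id = 1; id <= 5; id++) { group.put(id, 1L); weight.put(id, 1L); } // DC servers
        for (long id = 6; id <= 9; id++) { group.put(id, 2L); weight.put(id, 0L); } // pod servers
        WeightedQuorumSketch quorum = new WeightedQuorumSketch(group, weight);
        Set<Long> threeDcServers = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> twoDcPlusPods = new HashSet<>(Arrays.asList(1L, 2L, 6L, 7L, 8L, 9L));
        System.out.println(quorum.containsQuorum(threeDcServers));
        System.out.println(quorum.containsQuorum(twoDcPlusPods));
    }
}
```

Under these assumptions, three of the five DC servers form a quorum on their own, while two DC servers plus every pod server do not, which matches the expectation that only the DC decides elections.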
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Greenwood-Geer updated ZOOKEEPER-498: -- Attachment: zoo.cfg pod-zook-logs-01.tar.gz dc-zook-logs-01.tar.gz Zookeeper Logs and configuration files: dc1-zook01.log dc1-zook02.log dc1-zook03.log dc1-zook04.log dc1-zook05.log pd1-zook01.log pd1-zook02.log pd4-zook01.log pd4-zook02.log zoo.cfg Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Priority: Critical Fix For: 3.2.1 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-447: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1, thanks Henry! Committed to 3.2.1 and 3.3 zkServer.sh doesn't allow different config files to be specified on the command line Key: ZOOKEEPER-447 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447 Project: Zookeeper Issue Type: Improvement Affects Versions: 3.1.1, 3.2.0 Reporter: Henry Robinson Assignee: Henry Robinson Priority: Minor Fix For: 3.2.1, 3.3.0 Attachments: ZOOKEEPER-447.patch Unless I'm missing something, you can change the directory that the zoo.cfg file is in by setting ZOOCFGDIR but not the name of the file itself. I find it convenient myself to specify the config file on the command line, but we should also let it be specified by environment variable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-490: -- Assignee: Patrick Hunt the java docs for session creation are misleading/incomplete Key: ZOOKEEPER-490 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490 Project: Zookeeper Issue Type: Bug Affects Versions: 3.1.1, 3.2.0 Reporter: Patrick Hunt Assignee: Patrick Hunt Fix For: 3.2.1, 3.3.0 The javadoc for the ZooKeeper constructor says: * The client object will pick an arbitrary server and try to connect to it. * If failed, it will try the next one in the list, until a connection is * established, or all the servers have been tried. The "or all the servers have been tried" phrase is misleading; it should indicate that we retry until success, the connection is closed, or the session expires. We also need to mention that the connection is asynchronous: the constructor returns immediately, and you need to look for the connection event in the watcher. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-485: --- Fix Version/s: (was: 3.2.1) need ops documentation that details supervision of ZK server processes -- Key: ZOOKEEPER-485 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485 Project: Zookeeper Issue Type: Bug Components: documentation, server Reporter: Patrick Hunt Fix For: 3.3.0 We need ops documentation detailing what to do if the ZK server VM fails - by fail I mean the jvm process exits/dies/crashes/etc... In general a supervisor process should be used to start/stop/restart/etc... the ZK server vm. Something like daemontools http://cr.yp.to/daemontools.html could be used, or more simply a wrapper script should monitor the status of the pid and restart if the jvm fails. It's up to the operator, if this is not done automatically then it will have to be done manually, by operator restarting the ZK server jvm The inherent behavior of ZK wrt to failures - ie that it automatically recovers as long as quorum is maintained - fits into this nicely. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
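As a rough illustration of the wrapper idea in this issue, here is a minimal Java sketch of a supervisor loop that relaunches a child process when it exits abnormally. This is not an existing ZooKeeper tool; `SupervisorSketch`, `processLauncher`, `supervise`, the restart limit, and the backoff are all hypothetical, and it assumes that exit code 0 means an intentional shutdown that should not trigger a restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;

// Hypothetical sketch of a supervisor wrapper; not part of ZooKeeper.
final class SupervisorSketch {

    /** Launches the command once, waits for it to exit, and returns its exit code. */
    static IntSupplier processLauncher(List<String> command) {
        return () -> {
            try {
                return new ProcessBuilder(command).inheritIO().start().waitFor();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for child", e);
            }
        };
    }

    /**
     * Runs the launcher until it returns 0, restarting after each non-zero exit.
     * Returns the number of restarts performed; throws once maxRestarts is exceeded.
     */
    static int supervise(IntSupplier launcher, int maxRestarts, long backoffMillis) {
        int restarts = 0;
        while (true) {
            int exitCode = launcher.getAsInt();
            if (exitCode == 0) {
                return restarts; // Assumption: exit code 0 is a deliberate stop.
            }
            if (restarts >= maxRestarts) {
                throw new IllegalStateException(
                        "giving up after " + restarts + " restarts, last exit code " + exitCode);
            }
            restarts++;
            try {
                Thread.sleep(backoffMillis); // pause so a crash loop does not spin the CPU
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during backoff", e);
            }
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: SupervisorSketch <command> [args...]");
            return;
        }
        int restarts = supervise(processLauncher(Arrays.asList(args)), 10, 5_000L);
        System.out.println("child exited cleanly after " + restarts + " restart(s)");
    }
}
```

The launcher is passed in as an `IntSupplier` so the restart policy can be exercised without starting a real JVM; in practice a tool such as daemontools, as mentioned above, covers the same ground.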
[jira] Updated: (ZOOKEEPER-493) patch for command line setquota
[ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-493: --- Attachment: ZOOKEEPER-493.patch updated patch to cleanup a bit in addition to fix. ZOOKEEPER-493.patch supersedes previous patch (fixed naming of patch file) patch for command line setquota Key: ZOOKEEPER-493 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.0 Reporter: steve bendiola Assignee: steve bendiola Priority: Minor Fix For: 3.2.1, 3.3.0 Attachments: quotafix.patch, ZOOKEEPER-493.patch the command line setquota tries to use argument 3 as both a path and a value -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-493) patch for command line setquota
[ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-493: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1, thanks Steve! Applied to 3.2.1 and 3.3 patch for command line setquota Key: ZOOKEEPER-493 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493 Project: Zookeeper Issue Type: Bug Components: java client Affects Versions: 3.2.0 Reporter: steve bendiola Assignee: steve bendiola Priority: Minor Fix For: 3.2.1, 3.3.0 Attachments: quotafix.patch, ZOOKEEPER-493.patch the command line setquota tries to use argument 3 as both a path and a value -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Unending Leader Elections in WAN deploy
Todd, Mahadev and I looked at this and it turns out to be a regression. Ironically a patch I created for 3.2 branch to add quorum tests actually broke the quorum config -- a default value for a config parameter was lost. I'm going to submit a patch asap to get the default back, but for the time being you can set: electionAlg=3 in each of your config files. You should see reference to FastLeaderElection in your log files if this parameter is set correctly. Sorry for the trouble, Patrick
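To make the "lost default" failure mode concrete, here is a small Java sketch of reading `electionAlg` from properties-style config text with a fallback. It is not ZooKeeper's actual config parser; `ElectionAlgDefaultSketch` and its method are hypothetical, and the fallback of 3 is taken only from the workaround in this message (electionAlg=3 selecting FastLeaderElection).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch; not ZooKeeper's configuration code.
final class ElectionAlgDefaultSketch {
    // Assumption drawn from this thread: 3 selects FastLeaderElection.
    static final int DEFAULT_ELECTION_ALG = 3;

    /** Returns the configured electionAlg, or the default when the key is absent. */
    static int electionAlg(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty("electionAlg");
        // Dropping this fallback is the kind of regression described above:
        // a config without the key would silently select a different algorithm.
        return raw == null ? DEFAULT_ELECTION_ALG : Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        System.out.println(electionAlg("tickTime=2000\nsyncLimit=5\n"));
        System.out.println(electionAlg("tickTime=2000\nelectionAlg=0\n"));
    }
}
```

Until the default is restored, setting electionAlg=3 explicitly in every config file, as suggested above, sidesteps the question of what the fallback is.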
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt updated ZOOKEEPER-498: --- Fix Version/s: 3.3.0 Assignee: Patrick Hunt Unending Leader Elections : WAN configuration - Key: ZOOKEEPER-498 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498 Project: Zookeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.2.0 Environment: Each machine: CentOS 5.2 64-bit 2GB ram java version 1.6.0_13 Java(TM) SE Runtime Environment (build 1.6.0_13-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed Network Topology: DC : central data center POD(N): remote data center Zookeeper Topology: Leaders may be elected only in DC (weight = 1) Only followers are elected in PODS (weight = 0) Reporter: Todd Greenwood-Geer Assignee: Patrick Hunt Priority: Critical Fix For: 3.2.1, 3.3.0 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0. What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: Unending Leader Elections in WAN deploy
Patrick, thanks! I'll forward on to IT and I'll report back to you shortly... -Original Message- From: Patrick Hunt [mailto:ph...@apache.org] Sent: Tuesday, August 04, 2009 3:55 PM To: zookeeper-dev@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy Todd, Mahadev and I looked at this and it turns out to be a regression. Ironically a patch I created for 3.2 branch to add quorum tests actually broke the quorum config -- a default value for a config parameter was lost. I'm going to submit a patch asap to get the default back, but for the time being you can set: electionAlg=3 in each of your config files. You should see reference to FastLeaderElection in your log files if this parameter is set correctly. Sorry for the trouble, Patrick
Re: Unending Leader Elections in WAN deploy
Hi Todd, Can you attach the files to the jira? I will take a look at this and will get back to you by end of day today. Thanks mahadev On 8/4/09 4:56 PM, Todd Greenwood to...@audiencescience.com wrote: Looks like we're not getting *any* leader elected now Logs attached. -Original Message- From: Todd Greenwood [mailto:to...@audiencescience.com] Sent: Tuesday, August 04, 2009 4:07 PM To: zookeeper-dev@hadoop.apache.org Subject: RE: Unending Leader Elections in WAN deploy Patrick, thanks! I'll forward on to IT and I'll report back to you shortly...
Re: Unending Leader Elections in WAN deploy
Mahadev/Flavio -- looks like 0 weight is still busted. fle0weighttest is actually failing on my machine, but it's reported as success:

- Standard Error -
Exception in thread Thread-108 junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
- ---

This is probably because the test calls assert in a thread other than the main test thread, which junit will not track/know about.

One problem I see with these tests (the 0 weight test I looked at) -- it doesn't have a client attempt to connect to the various servers as part of declaring success. Really we should only consider a test successful (i.e. assert that) if a client can connect to each server in the cluster and change/see changes. As part of fixing this we really need to do a sanity check by testing the various command lines and checking that a client can connect.

I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New epoch seems to just thrash...

Also, I tried 3 and 5 server quorums by hand from the command line with 0 weight and they see similar issues to what Todd is seeing. I'm using the latest code in mainline btw.

Patrick

Mahadev Konar wrote:

Hi todd,
I see a lot of

java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect(Native Method)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
    at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
    at java.lang.Thread.run(Thread.java:619)

Is it possible that there is some firewall? Can all the servers 1-9 connect to all the others using the ports that you specified in zoo.cfg, i.e. 2888/3888?

Thanks
mahadev

On 8/4/09 4:56 PM, Todd Greenwood to...@audiencescience.com wrote:

Looks like we're not getting *any* leader elected now. Logs attached.

-Original Message-
From: Todd Greenwood [mailto:to...@audiencescience.com]
Sent: Tuesday, August 04, 2009 4:07 PM
To: zookeeper-dev@hadoop.apache.org
Subject: RE: Unending Leader Elections in WAN deploy

Patrick, thanks! I'll forward on to IT and I'll report back to you shortly...

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Tuesday, August 04, 2009 3:55 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Todd, Mahadev and I looked at this and it turns out to be a regression. Ironically, a patch I created for the 3.2 branch to add quorum tests actually broke the quorum config -- a default value for a config parameter was lost. I'm going to submit a patch asap to get the default back, but for the time being you can set:

electionAlg=3

in each of your config files. You should see a reference to FastLeaderElection in your log files if this parameter is set correctly.

Sorry for the trouble,
Patrick

Todd Greenwood wrote:

Mahadev, I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev)
  ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev)
  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt)
  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
  ZOOKEEPER-479. QuorumHierarchical does not
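
On Mahadev's question in this thread about whether every server can reach the others on the ports given in zoo.cfg: a rough way to check is a plain TCP connect with a timeout, run from each box against its peers. The sketch below is only that. The class name QuorumPortCheck and the canConnect helper are made up for illustration and are not part of ZooKeeper, and the default peer "localhost" is a placeholder for your own server list. Keep in mind that a refused connection on the first port (2888 here) may just mean that peer is not the leader at the moment, so the election port (3888 here) is probably the more telling one while no leader is elected.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class QuorumPortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts, and unresolvable hosts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Peer host names come from the command line; "localhost" is a placeholder.
        String[] peers = args.length > 0 ? args : new String[] {"localhost"};
        int[] ports = {2888, 3888}; // the two ports named in the thread
        for (String peer : peers) {
            for (int port : ports) {
                String state = canConnect(peer, port, 2000) ? "reachable" : "NOT reachable";
                System.out.println(peer + ":" + port + " " + state);
            }
        }
    }
}
```

Running it on every server with the other servers as arguments gives a quick reachability matrix. It says nothing about whether the election itself converges, only whether a firewall or a dead listener is in the way.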
Re: Unending Leader Elections in WAN deploy
(I see the same error in fle0weighttest using latest 3.2 btw)
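
For the "reported as success" part of the fle0weighttest problem, one common way to make a failure in a worker thread visible to the test runner is to record whatever the worker throws and rethrow it from the main test thread after join(). The sketch below assumes nothing about how FLEZeroWeightTest is actually structured. WorkerFailureCapture, CheckedThread and joinAndRethrow are illustrative names, and it uses plain AssertionError so it runs without junit on the classpath.

```java
import java.util.concurrent.atomic.AtomicReference;

public class WorkerFailureCapture {
    /** Runs the body in its own thread and records anything it throws. */
    static final class CheckedThread extends Thread {
        private final Runnable body;
        private final AtomicReference<Throwable> failure = new AtomicReference<>();

        CheckedThread(Runnable body) {
            this.body = body;
        }

        @Override
        public void run() {
            try {
                body.run();
            } catch (Throwable t) {
                // Without this, the error would only show up on stderr.
                failure.set(t);
            }
        }

        /** Joins the worker, then rethrows any recorded failure in the caller's thread. */
        void joinAndRethrow(long timeoutMs) throws InterruptedException {
            join(timeoutMs);
            if (isAlive()) {
                throw new AssertionError("worker did not finish within " + timeoutMs + " ms");
            }
            Throwable t = failure.get();
            if (t != null) {
                throw new AssertionError("worker thread failed: " + t.getMessage(), t);
            }
        }
    }

    /** Returns true if a failure raised inside the worker reaches the calling thread. */
    static boolean failureReachesCaller() throws InterruptedException {
        CheckedThread worker = new CheckedThread(() -> {
            throw new AssertionError("Elected zero-weight server");
        });
        worker.start();
        try {
            worker.joinAndRethrow(5000);
            return false;
        } catch (AssertionError expected) {
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failure propagated: " + failureReachesCaller());
    }
}
```

In a real test the main test method would call joinAndRethrow on each election thread, so an assertion tripped inside a worker fails the test instead of being printed and ignored.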