Hi Todd,
What syncLimit are you using? Can you post your config? For WANs you will need much larger values for syncLimit and the other timing settings.
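As a rough illustration only, these are the zoo.cfg timing settings that typically need raising when servers are spread across a WAN. The numbers are assumed placeholders, not values from this deployment and not tested recommendations. Both limits are counted in ticks, so the effective timeout is the limit multiplied by tickTime.

```
# Illustrative values only (assumptions, not taken from this thread).
# Length of one tick, in milliseconds.
tickTime=2000
# Ticks a follower may take to connect to the leader and sync initially.
initLimit=30
# Ticks that may pass between a request to a follower and its acknowledgement.
syncLimit=15
```

With these placeholder numbers a follower would get 60 seconds for the initial sync and 30 seconds per acknowledgement; the right values depend on the measured round-trip time and snapshot size between sites.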
Thanks
mahadev

On 8/4/09 1:24 PM, "Todd Greenwood" <[email protected]> wrote:

> Mahadev,
>
> I just heard from IT that this build behaves in exactly the same way as
> previous versions, e.g. we get continuous leader elections that
> disconnect the followers and then get re-elected, and disconnect...etc.
>
> This is from a fresh sync to the 3.2 branch:
>
> svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2
>
> CHANGES.TXT show the various fixes included:
>
> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
> Release 3.2.1
>
> Backward compatibile changes:
>
> BUGFIXES:
>   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
>
>   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
>
>   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
>
>   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
>
>   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev)
>
>   ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
>
>   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev)
>
>   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
>
>   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt)
>
>   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
>
>   ZOOKEEPER-479. QuorumHierarchical does not count groups correctly (flavio via mahadev)
>
>   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert (Chris Darroch via phunt)
>
>   ZOOKEEPER-480. FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev)
>
>   ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via mahadev)
>
> What can I do to assist you with this issue?
>
> -Todd
>
>> -----Original Message-----
>> From: Mahadev Konar [mailto:[email protected]]
>> Sent: Tuesday, August 04, 2009 12:43 PM
>> To: [email protected]
>> Subject: Re: Unending Leader Elections in WAN deploy
>>
>> Hi todd,
>> comments in line
>>
>>
>> On 8/4/09 12:38 PM, "Todd Greenwood" <[email protected]> wrote:
>>
>>> Mahadev,
>>>
>>> Some quick questions:
>>>
>>> 1. Version
>>>
>>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
>>> calling this 3.2.0. Should this be rev'd, and am I correct in calling
>>> this release 3.2.1?
>> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
>> release.
>>
>>>
>>> 2. Build targets
>>>
>>> The package target fails b/c the create-cppunit-configure target fails
>>> due to various problems w/ respect to autoconf. Are these dependencies
>>> documented somewhere ? I'd like to have a fully building system.
>>>
>>> create-cppunit-configure:
>>>      [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
>>>      [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
>>>      [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
>>>      [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
>>>      [exec] If this token and others are legitimate, please use m4_pattern_allow.
>>>      [exec] See the Autoconf documentation.
>>>      [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
>>>      [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
>>>
>> You need auto tools to run this. Please read the README for building c
>> client library at src/c/ for the installation requirements.
>>>
>>> 3. Sync failure:
>>>
>>> This is still failing.
>>>
>>> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
>>>
>>
>> Yes this hasn't been fixed yet!
>>
>> Thanks
>> mahadev
>>> -Todd
>>>
>>>> -----Original Message-----
>>>> From: Todd Greenwood
>>>> Sent: Tuesday, August 04, 2009 11:26 AM
>>>> To: '[email protected]'
>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>
>>>> Great news. Thank you Mahadev. I'll report our findings later today.
>>>> -Todd
>>>>
>>>>> -----Original Message-----
>>>>> From: Mahadev Konar [mailto:[email protected]]
>>>>> Sent: Tuesday, August 04, 2009 11:20 AM
>>>>> To: [email protected]
>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>
>>>>> Hi Todd,
>>>>> I just committed 480 and 491. You can checkout the 3.2 branch now.
>>>>>
>>>>> Thanks
>>>>> mahadev
>>>>>
>>>>>
>>>>> On 8/3/09 4:29 PM, "Todd Greenwood" <[email protected]> wrote:
>>>>>
>>>>>> That'd be perfect. Thanks!
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Mahadev Konar [mailto:[email protected]]
>>>>>>> Sent: Monday, August 03, 2009 4:24 PM
>>>>>>> To: [email protected]
>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>
>>>>>>> Hi Todd,
>>>>>>> Most of the patches that you mention should be in the branch 3.2 by
>>>>>>> tomm or so. 481, 479 are already in. 480 and 491 should be in by tomm.
>>>>>>> Would that suffice for you?
>>>>>>>
>>>>>>> Thanks
>>>>>>> mahadev
>>>>>>>
>>>>>>>
>>>>>>> On 8/3/09 4:21 PM, "Todd Greenwood" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Another problem...I've reverted to the latest versions of the patches
>>>>>>>> that are not specific to branch-3.2, and I'm getting two compilation
>>>>>>>> errors:
>>>>>>>>
>>>>>>>> build-generated:
>>>>>>>>     [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>>>>>>
>>>>>>>> compile-main:
>>>>>>>>     [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
>>>>>>>>     [javac] public String[] getQuorumPeers();
>>>>>>>>     [javac] ^
>>>>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
>>>>>>>>     [javac] public String getServerState();
>>>>>>>>     [javac] ^
>>>>>>>>     [javac] 2 errors
>>>>>>>>
>>>>>>>> My build process is pretty simple:
>>>>>>>>
>>>>>>>> 1. copy the branch-3.2 source to a temp directory
>>>>>>>>    (src/patched/branch-3.2)
>>>>>>>> 2. apply the ZOOKEEPER patches in my patches directory
>>>>>>>> 3. build zookeeper in the temp directory
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Todd Greenwood [mailto:[email protected]]
>>>>>>>>> Sent: Monday, August 03, 2009 4:09 PM
>>>>>>>>> To: [email protected]
>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>>>>>>
>>>>>>>>> Flavio,
>>>>>>>>> I notice that you've updated the patches referenced for the WAN
>>>>>>>>> deployment. There appears to be an order dependency w/ respect to
>>>>>>>>> these four patches...
>>>>>>>>>
>>>>>>>>> ZOOKEEPER-473.patch ZOOKEEPER-479-branch3.2.patch
>>>>>>>>> ZOOKEEPER-481-branch3.2.patch ZOOKEEPER-491.patch
>>>>>>>>>
>>>>>>>>> 473 -> 479 (479 fails)
>>>>>>>>>
>>>>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
>>>>>>>>> patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
>>>>>>>>> Hunk #1 FAILED at 93.
>>>>>>>>> Hunk #2 FAILED at 145.
>>>>>>>>> 2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
>>>>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ h ../patches/
>>>>>>>>>
>>>>>>>>> Could you advise as to which patches I need to apply, and in what
>>>>>>>>> order?
>>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>>>>>>>> Sent: Friday, July 31, 2009 9:51 PM
>>>>>>>>>> To: [email protected]
>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>>
>>>>>>>>>> Perfect! Thanks for the update, Todd.
>>>>>>>>>>
>>>>>>>>>> -Flavio
>>>>>>>>>>
>>>>>>>>>> On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks. You were right, I had a stale version of 479. Compilation
>>>>>>>>>>> succeeds and all tests pass on branch-3.2 with the latest patches
>>>>>>>>>>> 473, 479, 481, and 491.
>>>>>>>>>>>
>>>>>>>>>>> -Todd
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:48 PM
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>>>>
>>>>>>>>>>>> It should be in 479. Perhaps you have a stale version of the
>>>>>>>>>>>> patch.
>>>>>>>>>>>>
>>>>>>>>>>>> -Flavio
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Flavio,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm getting a compilation error for patch 491:
>>>>>>>>>>>>
>>>>>>>>>>>> compile-main:
>>>>>>>>>>>>     [javac] Compiling 1 source file to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>>>>>>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
>>>>>>>>>>>>     [javac] symbol  : method getWeight(long)
>>>>>>>>>>>>     [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
>>>>>>>>>>>>     [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>>>>>>     [javac] ^
>>>>>>>>>>>>     [javac] 1 error
>>>>>>>>>>>>
>>>>>>>>>>>> I see references to getWeight in both patch 491 and
>>>>>>>>>>>> FastLeaderElection.java:
>>>>>>>>>>>>
>>>>>>>>>>>> patches/ZOOKEEPER-491.patch:+ if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>>>>>> src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java: if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>>>>>>
>>>>>>>>>>>> However, I don't see a reference to this method in patches 473,
>>>>>>>>>>>> 479, or 481. I also don't see a reference to this method in the
>>>>>>>>>>>> trunk...
>>>>>>>>>>>>
>>>>>>>>>>>> -Todd
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Todd Greenwood [mailto:[email protected]]
>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:30 PM
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, I'll apply that patch and report back.
>>>>>>>>>>>> -Todd
>>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Flavio Junqueira [mailto:[email protected]]
>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:18 PM
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>>>>
>>>>>>>>>>>> You're missing 491 from your set of patches.
>>>>>>>>>>>>
>>>>>>>>>>>> -Flavio
>>>>>>>>>>>>
>>>>>>>>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This repro's in both branch-3.2, and branch-3.2+patches(473,
>>>>>>>>>>>> 479, 481).
>>>>>>>>>>>>
>>>>>>>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be
>>>>>>>>>>>> the leader. However, pd4-zook02 seems to realize it's not
>>>>>>>>>>>> supposed to be and then disconnects everyone. Then they re-elect
>>>>>>>>>>>> it again, and it loops over and over.
>>>>>>>>>>>>
>>>>>>>>>>>> -------------
>>>>>>>>>>>> Server config
>>>>>>>>>>>> -------------
>>>>>>>>>>>>
>>>>>>>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
>>>>>>>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
>>>>>>>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
>>>>>>>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
>>>>>>>>>>>>
>>>>>>>>>>>> group.1:1:2:3:4:5
>>>>>>>>>>>> weight.1=1
>>>>>>>>>>>> weight.2=1
>>>>>>>>>>>> weight.3=1
>>>>>>>>>>>> weight.4=1
>>>>>>>>>>>> weight.5=1
>>>>>>>>>>>>
>>>>>>>>>>>> group.2:6:7:8:9
>>>>>>>>>>>> weight.6=0
>>>>>>>>>>>> weight.7=0
>>>>>>>>>>>> weight.8=0
>>>>>>>>>>>> weight.9=0
>>>>>>>>>>>>
>>>>>>>>>>>> Note that we have 2 groups, composed of machines in 3 different
>>>>>>>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in
>>>>>>>>>>>> dc1 have voting rights, and the ability to become a leader. The
>>>>>>>>>>>> machines in the pods all have a weight of zero, and are not
>>>>>>>>>>>> expected to become leaders, or to vote on transactions.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know what I can do to help resolve this issue.
>>>>>>>>>>>>
>>>>>>>>>>>> -Todd
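The intent of the group/weight config quoted above (only the dc1 servers vote or lead, the pod servers carry weight zero) can be sketched as a small weighted-majority check. This is a simplified illustration and not the code in ZooKeeper's QuorumHierarchical or FastLeaderElection: the class `HierarchicalQuorumSketch` and its methods are hypothetical names, and the rule that a group whose total weight is zero is left out of the group count is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified weighted-group quorum check (not ZooKeeper's own class). */
final class HierarchicalQuorumSketch {
    private final Map<Long, Long> serverGroup = new HashMap<>();  // server id -> group id
    private final Map<Long, Long> serverWeight = new HashMap<>(); // server id -> weight
    private final Map<Long, Long> groupWeight = new HashMap<>();  // group id -> total weight

    void addServer(long sid, long gid, long weight) {
        serverGroup.put(sid, gid);
        serverWeight.put(sid, weight);
        groupWeight.merge(gid, weight, Long::sum);
    }

    /** A zero-weight (or unknown) server should never be accepted as leader. */
    boolean canLead(long sid) {
        return serverWeight.getOrDefault(sid, 0L) != 0;
    }

    /** True if the acknowledging servers hold a strict majority of weight in a strict majority of voting groups. */
    boolean containsQuorum(Set<Long> acks) {
        Map<Long, Long> acked = new HashMap<>(); // group id -> acknowledged weight
        for (long sid : acks) {
            Long gid = serverGroup.get(sid);
            if (gid != null) {
                acked.merge(gid, serverWeight.get(sid), Long::sum);
            }
        }
        int votingGroups = 0;
        int majorityGroups = 0;
        for (Map.Entry<Long, Long> e : groupWeight.entrySet()) {
            long total = e.getValue();
            if (total == 0) {
                continue; // assumption: all-zero-weight groups do not count as voting groups
            }
            votingGroups++;
            if (2 * acked.getOrDefault(e.getKey(), 0L) > total) {
                majorityGroups++;
            }
        }
        return 2 * majorityGroups > votingGroups;
    }

    /** Small helper to build a set of server ids. */
    static Set<Long> ids(long... sids) {
        Set<Long> out = new HashSet<>();
        for (long sid : sids) {
            out.add(sid);
        }
        return out;
    }

    /** Mirrors the quoted config: servers 1-5 in group 1 (weight 1), servers 6-9 in group 2 (weight 0). */
    static HierarchicalQuorumSketch fromThreadConfig() {
        HierarchicalQuorumSketch q = new HierarchicalQuorumSketch();
        for (long sid = 1; sid <= 5; sid++) {
            q.addServer(sid, 1L, 1L);
        }
        for (long sid = 6; sid <= 9; sid++) {
            q.addServer(sid, 2L, 0L);
        }
        return q;
    }

    public static void main(String[] args) {
        HierarchicalQuorumSketch q = fromThreadConfig();
        System.out.println("acks {1,2,3} quorum: " + q.containsQuorum(ids(1, 2, 3)));
        System.out.println("acks {6,7,8,9} quorum: " + q.containsQuorum(ids(6, 7, 8, 9)));
        System.out.println("server 9 (pd4-zook02) may lead: " + q.canLead(9));
    }
}
```

Under these assumptions any three dc1 servers form a quorum, the four pod servers alone never do, and server 9 (pd4-zook02) is not eligible to lead, which is the behaviour the config is meant to express.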
