Will do.
> -----Original Message-----
> From: Patrick Hunt [mailto:ph...@apache.org]
> Sent: Tuesday, August 04, 2009 1:34 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
>
> It would be better to create a JIRA with configs as well as logs.
>
> Patrick
>
> Mahadev Konar wrote:
> > Hi Todd,
> >
> > What is the synclimit you are using? Can you post your config? For WAN's
> > you will have to use much bigger values for synclimit and others.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/4/09 1:24 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >
> >> Mahadev,
> >>
> >> I just heard from IT that this build behaves in exactly the same way as
> >> previous versions, e.g. we get continuous leader elections that
> >> disconnect the followers and then get re-elected, and disconnect...etc.
> >>
> >> This is from a fresh sync to the 3.2 branch:
> >>
> >> svn co
> >> http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
> >> ./branch-3.2
> >>
> >> CHANGES.TXT show the various fixes included:
> >>
> >> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
> >> Release 3.2.1
> >>
> >> Backward compatibile changes:
> >>
> >> BUGFIXES:
> >> ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via
> >> flavio)
> >>
> >> ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via
> >> mahadev)
> >>
> >> ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
> >>
> >> ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via
> >> mahadev)
> >>
> >> ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
> >> (giri via mahadev)
> >>
> >> ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
> >>
> >> ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
> >> failure. (chris via mahadev)
> >>
> >> ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via
> >> phunt)
> >>
> >> ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
> >> embedded clients (ryan rawson via phunt)
> >>
> >> ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via
> >> mahadev)
> >>
> >> ZOOKEEPER-479. QuorumHierarchical does not count groups correctly
> >> (flavio via mahadev)
> >>
> >> ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
> >> (Chris Darroch via phunt)
> >>
> >> ZOOKEEPER-480. FLE should perform leader check when node is not leading and
> >> add vote of follower (flavio via mahadev)
> >>
> >> ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
> >> mahadev)
> >>
> >> What can I do to assist you with this issue?
> >>
> >> -Todd
> >>
> >>> -----Original Message-----
> >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>> Sent: Tuesday, August 04, 2009 12:43 PM
> >>> To: zookeeper-dev@hadoop.apache.org
> >>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>
> >>> Hi todd,
> >>> comments in line
> >>>
> >>>
> >>> On 8/4/09 12:38 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>> Mahadev,
> >>>>
> >>>> Some quick questions:
> >>>>
> >>>> 1. Version
> >>>>
> >>>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> >>>> calling this 3.2.0. Should this be rev'd, and am I correct in calling
> >>>> this release 3.2.1?
> >>> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
> >>> release.
> >>>
> >>>> 2. Build targets
> >>>>
> >>>> The package target fails b/c the create-cppunit-configure target fails
> >>>> due to various problems w/ respect to autoconf. Are these dependencies
> >>>> documented somewhere ? I'd like to have a fully building system.
> >>>>
> >>>> create-cppunit-configure:
> >>>> [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
> >>>> [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
> >>>> [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
> >>>> [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
> >>>> [exec] If this token and others are legitimate, please use m4_pattern_allow.
> >>>> [exec] See the Autoconf documentation.
> >>>> [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
> >>>> [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
> >>>>
> >>> You need auto tools to run this. Please read the README for building c
> >>> client library at src/c/ for the installation requirements.
> >>>> 3. Sync failure:
> >>>>
> >>>> This is still failing.
> >>>>
> >>>> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
> >>>>
> >>> Yes this hasn't been fixed yet!
> >>>
> >>> Thanks
> >>> mahadev
> >>>> -Todd
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Todd Greenwood
> >>>>> Sent: Tuesday, August 04, 2009 11:26 AM
> >>>>> To: 'zookeeper-u...@hadoop.apache.org'
> >>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>
> >>>>> Great news. Thank you Mahadev. I'll report our findings later today.
> >>>>> -Todd
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>>>>> Sent: Tuesday, August 04, 2009 11:20 AM
> >>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>
> >>>>>> Hi Todd,
> >>>>>> I just committed 480 and 491. You can checkout the 3.2 branch now.
> >>>>>> Thanks
> >>>>>> mahadev
> >>>>>>
> >>>>>>
> >>>>>> On 8/3/09 4:29 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>>>>> That'd be perfect. Thanks!
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>>>>>>> Sent: Monday, August 03, 2009 4:24 PM
> >>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>
> >>>>>>>> Hi Todd,
> >>>>>>>> Most of the patches that you mention should be in the branch 3.2 by tomm
> >>>>>>>> or so. 481, 479 are already in. 480 and 491 should be in by tomm. Would that
> >>>>>>>> suffice for you?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> mahadev
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 8/3/09 4:21 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>>>>>>> Another problem...I've reverted to the latest versions of the patches
> >>>>>>>>> that are not specific to branch-3.2, and I'm getting two compilation
> >>>>>>>>> errors:
> >>>>>>>>>
> >>>>>>>>> build-generated:
> >>>>>>>>> [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>>
> >>>>>>>>> compile-main:
> >>>>>>>>> [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
> >>>>>>>>> [javac] public String[] getQuorumPeers();
> >>>>>>>>> [javac] ^
> >>>>>>>>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
> >>>>>>>>> [javac] public String getServerState();
> >>>>>>>>> [javac] ^
> >>>>>>>>> [javac] 2 errors
> >>>>>>>>>
> >>>>>>>>> My build process is pretty simple:
> >>>>>>>>>
> >>>>>>>>> 1. copy the branch-3.2 source to a temp directory
> >>>>>>>>> (src/patched/branch-3.2)
> >>>>>>>>> 2. apply the ZOOKEEPER patches in my patches directory
> >>>>>>>>> 3. build zookeeper in the temp directory
> >>>>>>>>>
> >>>>>>>>> -Todd
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >>>>>>>>>> Sent: Monday, August 03, 2009 4:09 PM
> >>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>>>>>>
> >>>>>>>>>> Flavio,
> >>>>>>>>>> I notice that you've updated the patches referenced for the WAN
> >>>>>>>>>> deployment. There appears to be an order dependency w/ respect to these
> >>>>>>>>>> four patches...
> >>>>>>>>>>
> >>>>>>>>>> ZOOKEEPER-473.patch ZOOKEEPER-479-branch3.2.patch
> >>>>>>>>>> ZOOKEEPER-481-branch3.2.patch ZOOKEEPER-491.patch
> >>>>>>>>>>
> >>>>>>>>>> 473 -> 479 (479 fails)
> >>>>>>>>>>
> >>>>>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
> >>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
> >>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
> >>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
> >>>>>>>>>> patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
> >>>>>>>>>> Hunk #1 FAILED at 93.
> >>>>>>>>>> Hunk #2 FAILED at 145.
> >>>>>>>>>> 2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
> >>>>>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ h ../patches/
> >>>>>>>>>>
> >>>>>>>>>> Could you advise as to which patches I need to apply, and in what order?
> >>>>>>>>>> -Todd
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>> Sent: Friday, July 31, 2009 9:51 PM
> >>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>
> >>>>>>>>>>> Perfect! Thanks for the update, Todd.
> >>>>>>>>>>>
> >>>>>>>>>>> -Flavio
> >>>>>>>>>>>
> >>>>>>>>>>> On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks. You were right, I had a stale version of 479. Compilation
> >>>>>>>>>>>> succeeds and all tests pass on branch-3.2 with the latest patches 473,
> >>>>>>>>>>>> 479, 481, and 491.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Todd
> >>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:48 PM
> >>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It should be in 479. Perhaps you have a stale version of the patch.
> >>>>>>>>>>>>> -Flavio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Flavio,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm getting a compilation error for patch 491:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> compile-main:
> >>>>>>>>>>>>> [javac] Compiling 1 source file to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>>>>>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
> >>>>>>>>>>>>> [javac] symbol : method getWeight(long)
> >>>>>>>>>>>>> [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
> >>>>>>>>>>>>> [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>>> [javac] ^
> >>>>>>>>>>>>> [javac] 1 error
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I see a reference to getWeight in both FastLeaderElection.java in
> >>>>>>>>>>>>> patch 491:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> patches/ZOOKEEPER-491.patch:+ if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>>> src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java: if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, I don't see a reference to this method in patches 473, 479, or
> >>>>>>>>>>>>> 481. I also don't see a reference to this method in the trunk...
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:30 PM
> >>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ok, I'll apply that patch and report back.
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:18 PM
> >>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You're missing 491 from your set of patches.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Flavio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This repro's in both branch-3.2, and branch-3.2+patches(473, 479,
> >>>>>>>>>>>>> 481).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be the
> >>>>>>>>>>>>> leader. However, pd4-zook02 seems to realize it's not supposed to be and
> >>>>>>>>>>>>> then disconnects everyone. Then they re-elect it again, and it loops
> >>>>>>>>>>>>> over and over.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -------------
> >>>>>>>>>>>>> Server config
> >>>>>>>>>>>>> -------------
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
> >>>>>>>>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> group.1:1:2:3:4:5
> >>>>>>>>>>>>> weight.1=1
> >>>>>>>>>>>>> weight.2=1
> >>>>>>>>>>>>> weight.3=1
> >>>>>>>>>>>>> weight.4=1
> >>>>>>>>>>>>> weight.5=1
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> group.2:6:7:8:9
> >>>>>>>>>>>>> weight.6=0
> >>>>>>>>>>>>> weight.7=0
> >>>>>>>>>>>>> weight.8=0
> >>>>>>>>>>>>> weight.9=0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Note that we have 2 groups, composed of machines in 3 different
> >>>>>>>>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in dc1
> >>>>>>>>>>>>> have voting rights, and the ability to become a leader. The machines in
> >>>>>>>>>>>>> the pods all have a weight of zero, and are not expected to become
> >>>>>>>>>>>>> leaders, or to vote on transactions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let me know what I can do to help resolve this issue.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >
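
For anyone checking a similar setup before filing logs: the intent described in the quoted config (only the dc1 servers carry weight, so only they should vote or lead) can be sanity-checked with a small script. This is only a sketch and not part of ZooKeeper. The helper names `parse_quorum_config` and `eligible_leaders` are made up for illustration, the parser accepts either `:` (as posted above) or `=` after `group.N`, and it assumes a server with no `weight.N` line has weight 1.

```python
import re

# The server/group/weight lines exactly as posted in the thread.
CONFIG = """\
server.1=dc1-zook01.dc01.revsci.net:2888:3888
server.2=dc1-zook02.dc01.revsci.net:2888:3888
server.3=dc1-zook03.dc01.revsci.net:2888:3888
server.4=dc1-zook04.dc01.revsci.net:2888:3888
server.5=dc1-zook05.dc01.revsci.net:2888:3888
server.6=pd1-zook01.pd01.revsci.net:2888:3888
server.7=pd1-zook02.pd01.revsci.net:2888:3888
server.8=pd4-zook01.iad1.audsci.net:2888:3888
server.9=pd4-zook02.iad1.audsci.net:2888:3888

group.1:1:2:3:4:5
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1

group.2:6:7:8:9
weight.6=0
weight.7=0
weight.8=0
weight.9=0
"""

# Matches "server.N=...", "group.N=..." / "group.N:...", and "weight.N=...".
LINE_RE = re.compile(r"(server|group|weight)\.(\d+)\s*[=:]\s*(.+)$")


def parse_quorum_config(text):
    """Hypothetical helper: return (servers, groups, weights) keyed by integer id."""
    servers, groups, weights = {}, {}, {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # blank lines, comments, unrelated settings
        kind, ident, value = match.group(1), int(match.group(2)), match.group(3).strip()
        if kind == "server":
            servers[ident] = value.split(":")[0]  # keep the hostname only
        elif kind == "group":
            groups[ident] = [int(sid) for sid in value.split(":")]
        else:
            weights[ident] = int(value)
    return servers, groups, weights


def eligible_leaders(servers, weights):
    """Server ids with non-zero weight (assumption: a missing weight means 1)."""
    return sorted(sid for sid in servers if weights.get(sid, 1) != 0)


servers, groups, weights = parse_quorum_config(CONFIG)
for sid in sorted(servers):
    role = "may vote/lead" if sid in eligible_leaders(servers, weights) else "zero weight"
    print(f"server.{sid} {servers[sid]}: {role}")
```

With the config above, server.9 (pd4-zook02) comes out as zero weight, which is the server the thread reports being elected repeatedly.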