Will do.

> -----Original Message-----
> From: Patrick Hunt [mailto:ph...@apache.org]
> Sent: Tuesday, August 04, 2009 1:34 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
> 
> It would be better to create a JIRA with configs as well as logs.
> 
> Patrick
> 
> Mahadev Konar wrote:
> > Hi Todd,
> >
> >   What is the syncLimit you are using? Can you post your config? For WANs
> > you will have to use much bigger values for syncLimit and others.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/4/09 1:24 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >
> >> Mahadev,
> >>
> >> I just heard from IT that this build behaves in exactly the same way as
> >> previous versions, i.e. we get continuous leader elections: the leader
> >> disconnects the followers, gets re-elected, disconnects them again, etc.
> >>
> >> This is from a fresh sync to the 3.2 branch:
> >>
> >> svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2
> >>
> >> CHANGES.txt shows the various fixes included:
> >>
> >> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
> >> Release 3.2.1
> >>
> >> Backward compatibile changes:
> >>
> >> BUGFIXES:
> >>   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
> >>
> >>   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
> >>
> >>   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
> >>
> >>   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
> >>
> >>   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
> >>   (giri via mahadev)
> >>
> >>   ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)
> >>
> >>   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
> >>   failure. (chris via mahadev)
> >>
> >>   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
> >>
> >>   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
> >>   embedded clients (ryan rawson via phunt)
> >>
> >>   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
> >>
> >>   ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
> >>   (flavio via mahadev)
> >>
> >>   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
> >>   (Chris Darroch via phunt)
> >>
> >>   ZOOKEEPER-480. FLE should perform leader check when node is not leading and
> >>   add vote of follower (flavio via mahadev)
> >>
> >>   ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
> >>   mahadev)
> >>
> >> What can I do to assist you with this issue?
> >>
> >> -Todd
> >>
> >>> -----Original Message-----
> >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>> Sent: Tuesday, August 04, 2009 12:43 PM
> >>> To: zookeeper-dev@hadoop.apache.org
> >>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>
> >>> Hi todd,
> >>>  comments in line
> >>>
> >>>
> >>> On 8/4/09 12:38 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>> Mahadev,
> >>>>
> >>>> Some quick questions:
> >>>>
> >>>> 1. Version
> >>>>
> >>>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> >>>> calling this 3.2.0. Should this be rev'd, and am I correct in calling
> >>>> this release 3.2.1?
> >>> Yes, the release is 3.2.1. The build.xml will be fixed as soon as we tag
> >>> the release.
> >>>
> >>>> 2. Build targets
> >>>>
> >>>> The package target fails b/c the create-cppunit-configure target fails
> >>>> due to various problems w/ respect to autoconf. Are these dependencies
> >>>> documented somewhere? I'd like to have a fully building system.
> >>>>
> >>>> create-cppunit-configure:
> >>>>      [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
> >>>>      [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
> >>>>      [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
> >>>>      [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
> >>>>      [exec]       If this token and others are legitimate, please use m4_pattern_allow.
> >>>>      [exec]       See the Autoconf documentation.
> >>>>      [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
> >>>>      [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
> >>>>
> >>> You need the autotools to run this. Please read the README for building the C
> >>> client library at src/c/ for the installation requirements.
> >>>> 3. Sync failure:
> >>>>
> >>>> This is still failing.
> >>>>
> >>>> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
> >>>>
> >>> Yes this hasn't been fixed yet!
> >>>
> >>> Thanks
> >>> mahadev
> >>>> -Todd
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Todd Greenwood
> >>>>> Sent: Tuesday, August 04, 2009 11:26 AM
> >>>>> To: 'zookeeper-u...@hadoop.apache.org'
> >>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>
> >>>>> Great news. Thank you Mahadev. I'll report our findings later today.
> >>>>> -Todd
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>>>>> Sent: Tuesday, August 04, 2009 11:20 AM
> >>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>
> >>>>>> Hi Todd,
> >>>>>>  I just committed 480 and 491. You can check out the 3.2 branch now.
> >>>>>> Thanks
> >>>>>> mahadev
> >>>>>>
> >>>>>>
> >>>>>> On 8/3/09 4:29 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>>>>> That'd be perfect. Thanks!
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>>>>>>> Sent: Monday, August 03, 2009 4:24 PM
> >>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>
> >>>>>>>> Hi Todd,
> >>>>>>>>   Most of the patches that you mention should be in the branch 3.2 by
> >>>>>>>> tomorrow or so. 481 and 479 are already in. 480 and 491 should be in by
> >>>>>>>> tomorrow. Would that suffice for you?
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>> mahadev
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 8/3/09 4:21 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>>>>>>> Another problem...I've reverted to the latest versions of the patches
> >>>>>>>>> that are not specific to branch-3.2, and I'm getting two compilation
> >>>>>>>>> errors:
> >>>>>>>>>
> >>>>>>>>> build-generated:
> >>>>>>>>>     [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>>
> >>>>>>>>> compile-main:
> >>>>>>>>>     [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
> >>>>>>>>>     [javac]         public String[] getQuorumPeers();
> >>>>>>>>>     [javac]                         ^
> >>>>>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
> >>>>>>>>>     [javac]         public String getServerState();
> >>>>>>>>>     [javac]                       ^
> >>>>>>>>>     [javac] 2 errors
> >>>>>>>>>
> >>>>>>>>> My build process is pretty simple:
> >>>>>>>>>
> >>>>>>>>> 1. copy the branch-3.2 source to a temp directory
> >>>>>>>>> (src/patched/branch-3.2)
> >>>>>>>>> 2. apply the ZOOKEEPER patches in my patches directory
> >>>>>>>>> 3. build zookeeper in the temp directory
> >>>>>>>>>
> >>>>>>>>> -Todd
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >>>>>>>>>> Sent: Monday, August 03, 2009 4:09 PM
> >>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>>>>>>
> >>>>>>>>>> Flavio,
> >>>>>>>>>> I notice that you've updated the patches referenced for the WAN
> >>>>>>>>>> deployment. There appears to be an order dependency w/ respect to these
> >>>>>>>>>> four patches...
> >>>>>>>>>>
> >>>>>>>>>> ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
> >>>>>>>>>> ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch
> >>>>>>>>>>
> >>>>>>>>>> 473 -> 479 (479 fails)
> >>>>>>>>>>
> >>>>>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
> >>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
> >>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
> >>>>>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
> >>>>>>>>>> patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
> >>>>>>>>>> Hunk #1 FAILED at 93.
> >>>>>>>>>> Hunk #2 FAILED at 145.
> >>>>>>>>>> 2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
> >>>>>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ h ../patches/
> >>>>>>>>>>
> >>>>>>>>>> Could you advise as to which patches I need to apply, and in what
> >>>>>>>>>> order?
> >>>>>>>>>> -Todd
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>> Sent: Friday, July 31, 2009 9:51 PM
> >>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>
> >>>>>>>>>>> Perfect! Thanks for the update, Todd.
> >>>>>>>>>>>
> >>>>>>>>>>> -Flavio
> >>>>>>>>>>>
> >>>>>>>>>>> On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Thanks. You were right, I had a stale version of 479. Compilation
> >>>>>>>>>>>> succeeds and all tests pass on branch-3.2 with the latest patches 473,
> >>>>>>>>>>>> 479, 481, and 491.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Todd
> >>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:48 PM
> >>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> It should be in 479. Perhaps you have a stale version of the patch.
> >>>>>>>>>>>>> -Flavio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Flavio,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm getting a compilation error for patch 491:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> compile-main:
> >>>>>>>>>>>>>   [javac] Compiling 1 source file to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>>>>>>   [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
> >>>>>>>>>>>>>   [javac] symbol  : method getWeight(long)
> >>>>>>>>>>>>>   [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
> >>>>>>>>>>>>>   [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>>>   [javac]                            ^
> >>>>>>>>>>>>>   [javac] 1 error
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I see a reference to getWeight in both patch 491 and
> >>>>>>>>>>>>> FastLeaderElection.java:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> patches/ZOOKEEPER-491.patch:+ if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>>> src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java: if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, I don't see a reference to this method in patches 473, 479, or
> >>>>>>>>>>>>> 481. I also don't see a reference to this method in the trunk...
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:30 PM
> >>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Ok, I'll apply that patch and report back.
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:18 PM
> >>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> You're missing 491 from your set of patches.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Flavio
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This repros in both branch-3.2 and branch-3.2+patches (473, 479, 481).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be the
> >>>>>>>>>>>>> leader. However, pd4-zook02 seems to realize it's not supposed to be and
> >>>>>>>>>>>>> then disconnects everyone. Then they re-elect it again, and it loops
> >>>>>>>>>>>>> over and over.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -------------
> >>>>>>>>>>>>> Server config
> >>>>>>>>>>>>> -------------
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
> >>>>>>>>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
> >>>>>>>>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> group.1:1:2:3:4:5
> >>>>>>>>>>>>> weight.1=1
> >>>>>>>>>>>>> weight.2=1
> >>>>>>>>>>>>> weight.3=1
> >>>>>>>>>>>>> weight.4=1
> >>>>>>>>>>>>> weight.5=1
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> group.2:6:7:8:9
> >>>>>>>>>>>>> weight.6=0
> >>>>>>>>>>>>> weight.7=0
> >>>>>>>>>>>>> weight.8=0
> >>>>>>>>>>>>> weight.9=0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Note that we have 2 groups, composed of machines in 3 different
> >>>>>>>>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in dc1
> >>>>>>>>>>>>> have voting rights, and the ability to become a leader. The machines in
> >>>>>>>>>>>>> the pods all have a weight of zero, and are not expected to become
> >>>>>>>>>>>>> leaders, or to vote on transactions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let me know what I can do to help resolve this issue.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Todd
> >>>>>>>>>>>>>
> >
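
The intent of the group/weight configuration quoted above can be sketched in a few lines of Python. This is a simplified illustration, not ZooKeeper's QuorumHierarchical code: the function names contains_quorum and can_lead are hypothetical, the rule used (more than half of a group's weight, in a majority of the groups) is the usual description of hierarchical quorums, and leaving zero-weight groups out of the group count is an assumption that matches what the thread expects. The leader check mirrors the getWeight(n.sid) != 0 test quoted from patch 491.

```python
from typing import Dict, Iterable, List

# Groups and weights copied from the server config quoted in the thread.
GROUPS: Dict[int, List[int]] = {1: [1, 2, 3, 4, 5], 2: [6, 7, 8, 9]}
WEIGHTS: Dict[int, int] = {sid: 1 for sid in GROUPS[1]}
WEIGHTS.update({sid: 0 for sid in GROUPS[2]})


def contains_quorum(acks: Iterable[int]) -> bool:
    """Return True if the acknowledging server ids form a hierarchical quorum."""
    ack_set = set(acks)
    # Assumption: groups whose total weight is zero do not take part in the count.
    voting = [m for m in GROUPS.values() if sum(WEIGHTS[s] for s in m) > 0]
    satisfied = 0
    for members in voting:
        total = sum(WEIGHTS[s] for s in members)
        got = sum(WEIGHTS[s] for s in members if s in ack_set)
        if 2 * got > total:  # strictly more than half of the group's weight
            satisfied += 1
    # A quorum needs a strict majority of the voting groups.
    return 2 * satisfied > len(voting)


def can_lead(sid: int) -> bool:
    """A server with zero weight should never be elected leader."""
    return WEIGHTS.get(sid, 0) != 0
```

Under these assumptions, contains_quorum([1, 2, 3]) holds, while all four pod servers together (6 to 9) never form a quorum and can_lead(9) is false, which is the behavior the configuration was meant to produce for pd4-zook02.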
