Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, i.e. we get continuous leader elections: the leader is
elected, disconnects the followers, gets re-elected, disconnects them
again, and so on.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via
  flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via
  mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via
  mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)

  ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via
  phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via
  mahadev)

  ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty
  cert (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not
  leading and add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio
  via mahadev)

What can I do to assist you with this issue?

-Todd
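
P.S. To make the intent of the weights concrete, here is a rough sketch
of what we expect from the group/weight settings in the server config
quoted further down. This is not ZooKeeper's QuorumHierarchical code;
the function names (eligible_leader, contains_quorum) are made up for
illustration, and the rule it encodes (a majority of weight within a
majority of the non-zero-weight groups) is my reading of how
hierarchical quorums are meant to work. The leader check mirrors the
getWeight(n.sid) != 0 test that appears in patch 491.

```python
# Illustrative sketch only; NOT ZooKeeper's implementation.
# Group membership and weights are copied from the quoted server config.
GROUPS = {1: [1, 2, 3, 4, 5], 2: [6, 7, 8, 9]}
WEIGHTS = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0}


def eligible_leader(sid, weights=WEIGHTS):
    """A server with weight zero should never be accepted as leader."""
    return weights.get(sid, 0) != 0


def contains_quorum(acks, groups=GROUPS, weights=WEIGHTS):
    """Return True if the set of server ids in `acks` forms a quorum.

    Assumed rule: groups whose total weight is zero are ignored; a group
    is "won" when the acked weight exceeds half of its total weight; a
    quorum needs more than half of the remaining groups to be won.
    """
    totals = {gid: sum(weights[s] for s in members)
              for gid, members in groups.items()}
    voting = [gid for gid, total in totals.items() if total > 0]
    won = 0
    for gid in voting:
        acked = sum(weights[s] for s in groups[gid] if s in acks)
        if acked > totals[gid] / 2:
            won += 1
    return won > len(voting) / 2


if __name__ == "__main__":
    print(eligible_leader(9))                    # pd4-zook02
    print(contains_quorum({1, 2, 3}))            # three dc1 servers
    print(contains_quorum({1, 2, 6, 7, 8, 9}))   # two dc1 servers + all pods
```

Under these assumptions only the dc1 servers count toward a quorum, and
server 9 (pd4-zook02) should be rejected as a leader candidate, which is
the opposite of what we are seeing.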

> -----Original Message-----
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Tuesday, August 04, 2009 12:43 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
> 
> Hi todd,
>  comments in line
> 
> 
> On 8/4/09 12:38 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> 
> > Mahadev,
> >
> > Some quick questions:
> >
> > 1. Version
> >
> > I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> > calling this 3.2.0. Should this be rev'd, and am I correct in calling
> > this release 3.2.1?
> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag
> the release.
> 
> >
> > 2. Build targets
> >
> > The package target fails b/c the create-cppunit-configure target fails
> > due to various problems w/ respect to autoconf. Are these dependencies
> > documented somewhere? I'd like to have a fully building system.
> >
> > create-cppunit-configure:
> >      [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
> >      [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
> >      [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
> >      [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
> >      [exec]       If this token and others are legitimate, please use m4_pattern_allow.
> >      [exec]       See the Autoconf documentation.
> >      [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
> >      [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
> >
> You need the autotools (autoconf, automake, libtool) to run this. Please
> read the README for building the C client library at src/c/ for the
> installation requirements.
> >
> > 3. Sync failure:
> >
> > This is still failing.
> >
> > svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
> >
> 
> Yes this hasn't been fixed yet!
> 
> Thanks
> mahadev
> > -Todd
> >
> >> -----Original Message-----
> >> From: Todd Greenwood
> >> Sent: Tuesday, August 04, 2009 11:26 AM
> >> To: 'zookeeper-u...@hadoop.apache.org'
> >> Subject: RE: Unending Leader Elections in WAN deploy
> >>
> >> Great news. Thank you Mahadev. I'll report our findings later today.
> >> -Todd
> >>
> >>> -----Original Message-----
> >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>> Sent: Tuesday, August 04, 2009 11:20 AM
> >>> To: zookeeper-u...@hadoop.apache.org
> >>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>
> >>> Hi Todd,
> >>>  I just committed 480 and 491. You can checkout the 3.2 branch now.
> >>>
> >>> Thanks
> >>> mahadev
> >>>
> >>>
> >>> On 8/3/09 4:29 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>
> >>>> That'd be perfect. Thanks!
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>>>> Sent: Monday, August 03, 2009 4:24 PM
> >>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>
> >>>>> Hi Todd,
> >>>>>   Most of the patches that you mention should be in the branch 3.2 by
> >>>>> tomorrow or so. 481, 479 are already in. 480 and 491 should be in by
> >>>>> tomorrow. Would that suffice for you?
> >>>>>
> >>>>> Thanks
> >>>>> mahadev
> >>>>>
> >>>>>
> >>>>> On 8/3/09 4:21 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
> >>>>>
> >>>>>> Another problem...I've reverted to the latest versions of the patches
> >>>>>> that are not specific to branch-3.2, and I'm getting two compilation
> >>>>>> errors:
> >>>>>>
> >>>>>> build-generated:
> >>>>>>     [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>
> >>>>>> compile-main:
> >>>>>>     [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
> >>>>>>     [javac]         public String[] getQuorumPeers();
> >>>>>>     [javac]                         ^
> >>>>>>     [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
> >>>>>>     [javac]         public String getServerState();
> >>>>>>     [javac]                       ^
> >>>>>>     [javac] 2 errors
> >>>>>>
> >>>>>> My build process is pretty simple:
> >>>>>>
> >>>>>> 1. copy the branch-3.2 source to a temp directory
> >>>>>> (src/patched/branch-3.2)
> >>>>>> 2. apply the ZOOKEEPER patches in my patches directory
> >>>>>> 3. build zookeeper in the temp directory
> >>>>>>
> >>>>>> -Todd
> >>>>>>> -----Original Message-----
> >>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >>>>>>> Sent: Monday, August 03, 2009 4:09 PM
> >>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>>>
> >>>>>>> Flavio,
> >>>>>>> I notice that you've updated the patches referenced for the WAN
> >>>>>>> deployment. There appears to be an order dependency w/ respect to
> >>>>>>> these four patches...
> >>>>>>>
> >>>>>>> ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
> >>>>>>> ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch
> >>>>>>>
> >>>>>>> 473 -> 479 (479 fails)
> >>>>>>>
> >>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
> >>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
> >>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
> >>>>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
> >>>>>>> patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
> >>>>>>> Hunk #1 FAILED at 93.
> >>>>>>> Hunk #2 FAILED at 145.
> >>>>>>> 2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
> >>>>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ h ../patches/
> >>>>>>>
> >>>>>>> Could you advise as to which patches I need to apply, and in what
> >>>>>>> order?
> >>>>>>>
> >>>>>>> -Todd
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>> Sent: Friday, July 31, 2009 9:51 PM
> >>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>
> >>>>>>>> Perfect! Thanks for the update, Todd.
> >>>>>>>>
> >>>>>>>> -Flavio
> >>>>>>>>
> >>>>>>>> On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:
> >>>>>>>>
> >>>>>>>>> Thanks. You were right, I had a stale version of 479. Compilation
> >>>>>>>>> succeeds and all tests pass on branch-3.2 with the latest patches
> >>>>>>>>> 473, 479, 481, and 491.
> >>>>>>>>>
> >>>>>>>>> -Todd
> >>>>>>>>>
> >>>>>>>>>> -----Original Message-----
> >>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>> Sent: Friday, July 31, 2009 7:48 PM
> >>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>
> >>>>>>>>>> It should be in 479. Perhaps you have a stale version of the patch.
> >>>>>>>>>>
> >>>>>>>>>> -Flavio
> >>>>>>>>>>
> >>>>>>>>>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Flavio,
> >>>>>>>>>>>
> >>>>>>>>>>> I'm getting a compilation error for patch 491:
> >>>>>>>>>>>
> >>>>>>>>>>> compile-main:
> >>>>>>>>>>>   [javac] Compiling 1 source file to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> >>>>>>>>>>>   [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
> >>>>>>>>>>>   [javac] symbol  : method getWeight(long)
> >>>>>>>>>>>   [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
> >>>>>>>>>>>   [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>   [javac] ^
> >>>>>>>>>>>   [javac] 1 error
> >>>>>>>>>>>
> >>>>>>>>>>> I see a reference to getWeight in both patch 491 and the patched
> >>>>>>>>>>> FastLeaderElection.java:
> >>>>>>>>>>>
> >>>>>>>>>>> patches/ZOOKEEPER-491.patch:+ if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>> src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java: if(self.getQuorumVerifier().getWeight(n.sid) != 0)
> >>>>>>>>>>>
> >>>>>>>>>>> However, I don't see a reference to this method in patches 473,
> >>>>>>>>>>> 479, or 481. I also don't see a reference to this method in the
> >>>>>>>>>>> trunk...
> >>>>>>>>>>>
> >>>>>>>>>>> -Todd
> >>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
> >>>>>>>>>>>> Sent: Friday, July 31, 2009 7:30 PM
> >>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>
> >>>>>>>>>>>> Ok, I'll apply that patch and report back.
> >>>>>>>>>>>> -Todd
> >>>>>>>>>>>>
> >>>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
> >>>>>>>>>>>> Sent: Friday, July 31, 2009 7:18 PM
> >>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>>>>>>>>
> >>>>>>>>>>>> You're missing 491 from your set of patches.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Flavio
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> This repro's in both branch-3.2, and branch-3.2+patches(473, 479,
> >>>>>>>>>>>> 481).
> >>>>>>>>>>>>
> >>>>>>>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be
> >>>>>>>>>>>> the leader. However, pd4-zook02 seems to realize it's not supposed
> >>>>>>>>>>>> to be and then disconnects everyone. Then they re-elect it again,
> >>>>>>>>>>>> and it loops over and over.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -------------
> >>>>>>>>>>>> Server config
> >>>>>>>>>>>> -------------
> >>>>>>>>>>>>
> >>>>>>>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
> >>>>>>>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
> >>>>>>>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
> >>>>>>>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
> >>>>>>>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
> >>>>>>>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
> >>>>>>>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
> >>>>>>>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
> >>>>>>>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
> >>>>>>>>>>>>
> >>>>>>>>>>>> group.1:1:2:3:4:5
> >>>>>>>>>>>> weight.1=1
> >>>>>>>>>>>> weight.2=1
> >>>>>>>>>>>> weight.3=1
> >>>>>>>>>>>> weight.4=1
> >>>>>>>>>>>> weight.5=1
> >>>>>>>>>>>>
> >>>>>>>>>>>> group.2:6:7:8:9
> >>>>>>>>>>>> weight.6=0
> >>>>>>>>>>>> weight.7=0
> >>>>>>>>>>>> weight.8=0
> >>>>>>>>>>>> weight.9=0
> >>>>>>>>>>>>
> >>>>>>>>>>>> Note that we have 2 groups, composed of machines in 3 different
> >>>>>>>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in
> >>>>>>>>>>>> dc1 have voting rights, and the ability to become a leader. The
> >>>>>>>>>>>> machines in the pods all have a weight of zero, and are not
> >>>>>>>>>>>> expected to become leaders, or to vote on transactions.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Let me know what I can do to help resolve this issue.
> >>>>>>>>>>>>
> >>>>>>>>>>>> -Todd
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>>>
> >
