Hi Todd,

  What syncLimit are you using? Can you post your config? For WANs
you will need much larger values for syncLimit and the other settings.
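
As a rough illustration of what "much larger" could look like, here is a sketch of the timing section of a zoo.cfg. tickTime, initLimit and syncLimit are the standard ZooKeeper settings, but the numbers below are placeholders assumed for the example, not tested recommendations; they would need to be tuned against the measured latency between the sites.

```
# Length of one tick in milliseconds (2000 is the value in the sample config).
tickTime=2000
# Ticks a follower may take to connect to and sync with the leader.
# Placeholder value: 30 ticks * 2000 ms = 60 s.
initLimit=30
# Ticks a follower may fall behind the leader before it is dropped.
# Placeholder value: 20 ticks * 2000 ms = 40 s.
syncLimit=20
```

Both limits are expressed in ticks, so raising tickTime stretches them as well.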

Thanks
mahadev


On 8/4/09 1:24 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:

> Mahadev,
> 
> I just heard from IT that this build behaves in exactly the same way as
> previous versions, e.g. we get continuous leader elections that
> disconnect the followers and then get re-elected, and disconnect...etc.
> 
> This is from a fresh sync to the 3.2 branch:
> 
> svn co
> http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
> ./branch-3.2
> 
> CHANGES.txt shows the various fixes included:
> 
> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper
> /src/original$ head -n 50 branch-3.2/CHANGES.txt
> Release 3.2.1
> 
> Backward compatibile changes:
> 
> BUGFIXES:
>   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via
> flavio)
> 
>   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via
> mahadev)
> 
>   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
> 
>   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via
> mahadev)
> 
>   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
>   (giri via mahadev)
>   
>   ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)
> 
>   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
>   failure. (chris via mahadev)
> 
>   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via
> phunt)
> 
>   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and
> other)
>   embedded clients (ryan rawson via phunt)
> 
>   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via
> mahadev)
> 
>   ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
>   (flavio via mahadev)
> 
>   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty
> cert
>   (Chris Darroch via phunt)
> 
>   ZOOKEEPER-480. FLE should perform leader check when node is not
> leading and
>   add vote of follower (flavio via mahadev)
> 
>   ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio
> via
>   mahadev)
> 
> What can I do to assist you with this issue?
> 
> -Todd
> 
>> -----Original Message-----
>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>> Sent: Tuesday, August 04, 2009 12:43 PM
>> To: zookeeper-dev@hadoop.apache.org
>> Subject: Re: Unending Leader Elections in WAN deploy
>> 
>> Hi todd,
>>  comments in line
>> 
>> 
>> On 8/4/09 12:38 PM, "Todd Greenwood" <to...@audiencescience.com>
> wrote:
>> 
>>> Mahadev,
>>> 
>>> Some quick questions:
>>> 
>>> 1. Version
>>> 
>>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
>>> calling this 3.2.0. Should this be rev'd, and am I correct in calling
>>> this release 3.2.1?
>> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag
>> the release.
>> 
>>> 
>>> 2. Build targets
>>> 
>>> The package target fails b/c the create-cppunit-configure target fails
>>> due to various problems w/ respect to autoconf. Are these dependencies
>>> documented somewhere? I'd like to have a fully building system.
>>> 
>>> create-cppunit-configure:
>>>      [exec] Can't exec "libtoolize": No such file or directory at
>>> /usr/bin/autoreconf line 188.
>>>      [exec] Use of uninitialized value $libtoolize in pattern match
>>> (m//) at /usr/bin/autoreconf line 188.
>>>      [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not
> found
>>> in library
>>>      [exec] configure.ac:33: error: possibly undefined macro:
>>> AM_PATH_CPPUNIT
>>>      [exec]       If this token and others are legitimate, please
> use
>>> m4_pattern_allow.
>>>      [exec]       See the Autoconf documentation.
>>>      [exec] configure.ac:53: error: possibly undefined macro:
>>> AC_PROG_LIBTOOL
>>>      [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
>>> 
>> You need auto tools to run this. Please read the README for building c
>> client library at src/c/ for the installation requirements.
>>> 
>>> 3. Sync failure:
>>> 
>>> This is still failing.
>>> 
>>> svn: URL
>>> 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
>>> doesn't exist
>>> 
>> 
>> Yes this hasn't been fixed yet!
>> 
>> Thanks
>> mahadev
>>> -Todd
>>> 
>>>> -----Original Message-----
>>>> From: Todd Greenwood
>>>> Sent: Tuesday, August 04, 2009 11:26 AM
>>>> To: 'zookeeper-u...@hadoop.apache.org'
>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>> 
>>>> Great news. Thank you Mahadev. I'll report our findings later today.
>>>> -Todd
>>>> 
>>>>> -----Original Message-----
>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>>>>> Sent: Tuesday, August 04, 2009 11:20 AM
>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>> 
>>>>> Hi Todd,
>>>>>  I just committed 480 and 491. You can checkout the 3.2 branch now.
>>>>> 
>>>>> Thanks
>>>>> mahadev
>>>>> 
>>>>> 
>>>>> On 8/3/09 4:29 PM, "Todd Greenwood" <to...@audiencescience.com>
>>> wrote:
>>>>> 
>>>>>> That'd be perfect. Thanks!
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>>>>>>> Sent: Monday, August 03, 2009 4:24 PM
>>>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>> 
>>>>>>> Hi Todd,
>>>>>>>   Most of the patches that you mention should be in the branch 3.2 by
>>>>>>> tomorrow or so. 481, 479 are already in. 480 and 491 should be in by
>>>>>>> tomorrow. Would that suffice for you?
>>>>>>> 
>>>>>>> Thanks
>>>>>>> mahadev
>>>>>>> 
>>>>>>> 
>>>>>>> On 8/3/09 4:21 PM, "Todd Greenwood" <to...@audiencescience.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Another problem...I've reverted to the latest versions of the patches
>>>>>>>> that are not specific to branch-3.2, and I'm getting two compilation
>>>>>>>> errors:
>>>>>>>> 
>>>>>>>> build-generated:
>>>>>>>>     [javac] Compiling 44 source files to
>>>>>>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>>>>>> 
>>>>>>>> compile-main:
>>>>>>>>     [javac] Compiling 2 source files to
>>>>>>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>>>>>>     [javac]
>>>>>>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30:
>>>>>>>> name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
>>>>>>>>     [javac]         public String[] getQuorumPeers();
>>>>>>>>     [javac]                         ^
>>>>>>>>     [javac]
>>>>>>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31:
>>>>>>>> name clash: getServerState() and getServerState() have the same erasure
>>>>>>>>     [javac]         public String getServerState();
>>>>>>>>     [javac]                       ^
>>>>>>>>     [javac] 2 errors
>>>>>>>> 
>>>>>>>> My build process is pretty simple:
>>>>>>>> 
>>>>>>>> 1. copy the branch-3.2 source to a temp directory
>>>>>>>> (src/patched/branch-3.2)
>>>>>>>> 2. apply the ZOOKEEPER patches in my patches directory
>>>>>>>> 3. build zookeeper in the temp directory
>>>>>>>> 
>>>>>>>> -Todd
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
>>>>>>>>> Sent: Monday, August 03, 2009 4:09 PM
>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>>>>>> 
>>>>>>>>> Flavio,
>>>>>>>>> I notice that you've updated the patches referenced for the WAN
>>>>>>>>> deployment. There appears to be an order dependency w/ respect to
>>>>>>>>> these four patches...
>>>>>>>>> 
>>>>>>>>> ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
>>>>>>>>> ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch
>>>>>>>>> 
>>>>>>>>> 473 -> 479 (479 fails)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper
>>>>>>>>> /src/patched/branch-3.2$ patch -p0 <
>>>>>>>>> ../patches/ZOOKEEPER-479-branch3.2.patch
>>>>>>>>> patching file
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
> src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarch
>>>>>>>>> ical.java
>>>>>>>>> patching file
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
> src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
>>>>>>>>> patching file
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
> src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier
>>>>>>>>> .java
>>>>>>>>> patching file
>>>>>>>>> 
>>> src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
>>>>>>>>> Hunk #1 FAILED at 93.
>>>>>>>>> Hunk #2 FAILED at 145.
>>>>>>>>> 2 out of 2 hunks FAILED -- saving rejects to file
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
> src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper
>>>>>>>>> /src/patched/branch-3.2$ h ../patches/
>>>>>>>>> 
>>>>>>>>> Could you advise as to which patches I need to apply, and in what
>>>>>>>>> order?
>>>>>>>>> 
>>>>>>>>> -Todd
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
>>>>>>>>>> Sent: Friday, July 31, 2009 9:51 PM
>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>> 
>>>>>>>>>> Perfect! Thanks for the update, Todd.
>>>>>>>>>> 
>>>>>>>>>> -Flavio
>>>>>>>>>> 
>>>>>>>>>> On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:
>>>>>>>>>> 
>>>>>>>>>>> Thanks. You were right, I had a stale version of 479. Compilation
>>>>>>>>>>> succeeds and all tests pass on branch-3.2 with the latest patches
>>>>>>>>>>> 473, 479, 481, and 491.
>>>>>>>>>>> 
>>>>>>>>>>> -Todd
>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:48 PM
>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>>>> 
>>>>>>>>>>>> It should be in 479. Perhaps you have a stale version of the patch.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Flavio
>>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Flavio,
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm getting a compilation error for patch 491:
>>>>>>>>>>>> 
>>>>>>>>>>>> compile-main:
>>>>>>>>>>>>   [javac] Compiling 1 source file to
>>>>>>>>>>>> 
>>>>>>>>> 
>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/
>>>>>>>>>>>> src/p
>>>>>>>>>>>> atched/branch-3.2/build/classes
>>>>>>>>>>>>   [javac]
>>>>>>>>>>>> 
>>>>>>>>> 
>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/
>>>>>>>>>>>> src/p
>>>>>>>>>>>> 
>>>>>>>>> 
>>> atched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/
>>>>>>>>>>>> FastL
>>>>>>>>>>>> eaderElection.java:601: cannot find symbol
>>>>>>>>>>>>   [javac] symbol  : method getWeight(long)
>>>>>>>>>>>>   [javac] location: interface
>>>>>>>>>>>> org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
>>>>>>>>>>>>   [javac]
>>>>>>>>>>>> if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>>>>>>   [javac]
>>> ^
>>>>>>>>>>>>   [javac] 1 error
>>>>>>>>>>>> 
>>>>>>>>>>>> I see a reference to getWeight in both patch 491 and
>>>>>>>>>>>> FastLeaderElection.java:
>>>>>>>>>>>> 
>>>>>>>>>>>> patches/ZOOKEEPER-491.patch:+
>>>>>>>>>>>> if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>>>>>> src/java/main/org/apache/zookeeper/server/quorum/
>>>>>>>>>>>> FastLeaderElection.java
>>>>>>>>>>>> :
>>>>>>>>>>>> if(self.getQuorumVerifier().getWeight(n.sid) !=
>>>>>>>>>>>> 0)
>>>>>>>>>>>> 
>>>>>>>>>>>> However, I don't see a reference to this method in patches 473, 479,
>>>>>>>>>>>> or 481. I also don't see a reference to this method in the trunk...
>>>>>>>>>>>> 
>>>>>>>>>>>> -Todd
>>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:30 PM
>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>>>>>>>>> 
>>>>>>>>>>>> Ok, I'll apply that patch and report back.
>>>>>>>>>>>> -Todd
>>>>>>>>>>>> 
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
>>>>>>>>>>>> Sent: Friday, July 31, 2009 7:18 PM
>>>>>>>>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>>>> 
>>>>>>>>>>>> You're missing 491 from your set of patches.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Flavio
>>>>>>>>>>>> 
>>>>>>>>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> This repro's in both branch-3.2, and branch-3.2+patches(473, 479,
>>>>>>>>>>>> 481).
>>>>>>>>>>>> 
>>>>>>>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be the
>>>>>>>>>>>> leader. However, pd4-zook02 seems to realize it's not supposed to be
>>>>>>>>>>>> and then disconnects everyone. Then they re-elect it again, and it
>>>>>>>>>>>> loops over and over.
>>>>>>>>>>>> 
>>>>>>>>>>>> -------------
>>>>>>>>>>>> Server config
>>>>>>>>>>>> -------------
>>>>>>>>>>>> 
>>>>>>>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
>>>>>>>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
>>>>>>>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
>>>>>>>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
>>>>>>>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
>>>>>>>>>>>> 
>>>>>>>>>>>> group.1:1:2:3:4:5
>>>>>>>>>>>> weight.1=1
>>>>>>>>>>>> weight.2=1
>>>>>>>>>>>> weight.3=1
>>>>>>>>>>>> weight.4=1
>>>>>>>>>>>> weight.5=1
>>>>>>>>>>>> 
>>>>>>>>>>>> group.2:6:7:8:9
>>>>>>>>>>>> weight.6=0
>>>>>>>>>>>> weight.7=0
>>>>>>>>>>>> weight.8=0
>>>>>>>>>>>> weight.9=0
>>>>>>>>>>>> 
>>>>>>>>>>>> Note that we have 2 groups, composed of machines in 3 different
>>>>>>>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in dc1
>>>>>>>>>>>> have voting rights, and the ability to become a leader. The machines
>>>>>>>>>>>> in the pods all have a weight of zero, and are not expected to
>>>>>>>>>>>> become leaders, or to vote on transactions.
>>>>>>>>>>>> 
>>>>>>>>>>>> Let me know what I can do to help resolve this issue.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Todd
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>> 
> 
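
For readers following the group/weight configuration quoted at the bottom of this thread, the intended behaviour can be sketched roughly as below. This is a simplified illustration, not the actual QuorumHierarchical or FastLeaderElection code: the class name WeightedQuorumSketch and its methods are hypothetical, and the rule that a group whose total weight is zero is ignored is an assumption taken from Todd's description of the intent. Only the `getWeight(n.sid) != 0` style check comes from the thread itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the weighted, grouped quorum described in the thread.
final class WeightedQuorumSketch {
    // Server id -> group id and server id -> weight, mirroring the quoted config.
    static final Map<Long, Integer> GROUP = new HashMap<>();
    static final Map<Long, Long> WEIGHT = new HashMap<>();
    static {
        for (long sid = 1; sid <= 5; sid++) { GROUP.put(sid, 1); WEIGHT.put(sid, 1L); }
        for (long sid = 6; sid <= 9; sid++) { GROUP.put(sid, 2); WEIGHT.put(sid, 0L); }
    }

    // Same shape as the check quoted from patch 491: zero weight means
    // the server should not be accepted as leader.
    static boolean eligibleLeader(long sid) {
        return WEIGHT.getOrDefault(sid, 0L) != 0;
    }

    // Assumed rule: a group is satisfied when the acking servers hold more than
    // half of its weight, and a quorum needs a majority of the groups that have
    // non-zero total weight.
    static boolean containsQuorum(Set<Long> acks) {
        Map<Integer, Long> total = new HashMap<>();
        Map<Integer, Long> acked = new HashMap<>();
        for (Map.Entry<Long, Integer> e : GROUP.entrySet()) {
            long w = WEIGHT.get(e.getKey());
            total.merge(e.getValue(), w, Long::sum);
            if (acks.contains(e.getKey())) {
                acked.merge(e.getValue(), w, Long::sum);
            }
        }
        int votingGroups = 0;
        int satisfied = 0;
        for (Map.Entry<Integer, Long> e : total.entrySet()) {
            if (e.getValue() == 0) {
                continue; // zero-weight group: no say in the outcome
            }
            votingGroups++;
            if (acked.getOrDefault(e.getKey(), 0L) > e.getValue() / 2) {
                satisfied++;
            }
        }
        return votingGroups > 0 && satisfied > votingGroups / 2;
    }

    public static void main(String[] args) {
        Set<Long> dc1Majority = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> mostlyPods = new HashSet<>(Arrays.asList(1L, 2L, 6L, 7L, 8L, 9L));
        System.out.println(containsQuorum(dc1Majority)); // three of five dc1 servers
        System.out.println(containsQuorum(mostlyPods));  // only two dc1 servers
        System.out.println(eligibleLeader(9L));          // pd4-zook02, weight 0
    }
}
```

Under these assumptions the pod servers (ids 6 to 9) never contribute to a quorum and fail the leader check, which is the behaviour the configuration was meant to produce.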
