It would be better to create a JIRA with configs as well as logs.

Patrick

Mahadev Konar wrote:
Hi Todd,

  What syncLimit are you using? Can you post your config? For WANs
you will have to use much bigger values for syncLimit and the other limits.

Thanks
mahadev
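
For context on the advice above: ZooKeeper expresses initLimit and syncLimit in ticks, so the wall-clock allowance is tickTime multiplied by the limit. The sketch below only illustrates that arithmetic. The helper names are hypothetical, and the tick time, limits, and 15-second budget are made-up figures, not values from this deployment.

```python
import math


def limit_ms(tick_time_ms: int, limit_ticks: int) -> int:
    """Wall-clock allowance for a limit given in ticks (initLimit or syncLimit)."""
    return tick_time_ms * limit_ticks


def min_ticks_for(budget_ms: int, tick_time_ms: int) -> int:
    """Smallest tick count whose allowance covers budget_ms."""
    return math.ceil(budget_ms / tick_time_ms)


# Illustrative numbers only (assumptions, not taken from the thread):
# a 2000 ms tick with syncLimit=2 gives followers 4 seconds.
lan_allowance_ms = limit_ms(2000, 2)

# If cross-site syncs were assumed to need up to 15 seconds, the same
# tick would call for a syncLimit of at least this many ticks.
wan_sync_limit = min_ticks_for(15000, 2000)
```

A limit that is comfortable inside one data center can therefore be far too tight once the round trip crosses sites, which is one possible reading of why larger values are suggested for WAN deployments.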


On 8/4/09 1:24 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:

Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, i.e. we get continuous leader elections: the leader
disconnects the followers, gets re-elected, disconnects them again, etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info().
  (chris via flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten
  (chris via mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests
  (chris via mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)

  ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers
  (mahadev via phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager.
  (flavio via mahadev)

  ZOOKEEPER-479. QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
  (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not leading
  and add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected
  (flavio via mahadev)

What can I do to assist you with this issue?

-Todd

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 12:43 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
 Comments inline.


On 8/4/09 12:38 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
calling this 3.2.0. Should this be rev'd, and am I correct in calling
this release 3.2.1?

Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag
the release.

2. Build targets

The package target fails b/c the create-cppunit-configure target fails
due to various problems w/ respect to autoconf. Are these dependencies
documented somewhere? I'd like to have a fully building system.

create-cppunit-configure:
     [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
     [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
     [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
     [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
     [exec]       If this token and others are legitimate, please use m4_pattern_allow.
     [exec]       See the Autoconf documentation.
     [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
     [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1

You need the autotools to run this. Please read the README for building
the C client library at src/c/ for the installation requirements.

3. Sync failure:

This is still failing.

svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist

Yes this hasn't been fixed yet!

Thanks
mahadev
-Todd

-----Original Message-----
From: Todd Greenwood
Sent: Tuesday, August 04, 2009 11:26 AM
To: 'zookeeper-u...@hadoop.apache.org'
Subject: RE: Unending Leader Elections in WAN deploy

Great news. Thank you, Mahadev. I'll report our findings later today.
-Todd

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 11:20 AM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
 I just committed 480 and 491. You can check out the 3.2 branch now.
Thanks
mahadev


On 8/3/09 4:29 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
That'd be perfect. Thanks!

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Monday, August 03, 2009 4:24 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
  Most of the patches that you mention should be in the 3.2 branch by
tomorrow or so. 481 and 479 are already in. 480 and 491 should be in by
tomorrow. Would that suffice for you?

Thanks
mahadev


On 8/3/09 4:21 PM, "Todd Greenwood" <to...@audiencescience.com> wrote:
Another problem...I've reverted to the latest versions of the patches
that are not specific to branch-3.2, and I'm getting two compilation
errors:

build-generated:
    [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes

compile-main:
    [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
    [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
    [javac]         public String[] getQuorumPeers();
    [javac]                         ^
    [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
    [javac]         public String getServerState();
    [javac]                       ^
    [javac] 2 errors

My build process is pretty simple:

1. copy the branch-3.2 source to a temp directory (src/patched/branch-3.2)
2. apply the ZOOKEEPER patches in my patches directory
3. build zookeeper in the temp directory

-Todd
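
The three build steps listed above could be scripted roughly as follows. This is a sketch under stated assumptions, not the script actually used: the function names are hypothetical, applying patches in ascending issue-number order is a guess (the thread does not confirm the required order), and a bare `ant` invocation assumes the default target is the one wanted.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import List


def issue_number(patch: Path) -> int:
    """Extract the issue number from a name like ZOOKEEPER-479-branch3.2.patch."""
    match = re.search(r"ZOOKEEPER-(\d+)", patch.name)
    if match is None:
        raise ValueError(f"not a ZOOKEEPER patch: {patch.name}")
    return int(match.group(1))


def ordered_patches(patch_dir: Path) -> List[Path]:
    """Patches sorted by issue number (assumed order, see lead-in)."""
    return sorted(patch_dir.glob("*.patch"), key=issue_number)


def build_patched(source: Path, work: Path, patch_dir: Path) -> None:
    # Step 1: copy the pristine branch to a scratch directory.
    shutil.copytree(source, work)
    # Step 2: apply each patch from the scratch root; check=True stops at
    # the first patch that fails instead of building a half-patched tree.
    for patch in ordered_patches(patch_dir):
        subprocess.run(
            ["patch", "-p0", "-i", str(patch.resolve())],
            cwd=work,
            check=True,
        )
    # Step 3: build in the scratch directory (default ant target assumed).
    subprocess.run(["ant"], cwd=work, check=True)
```

Starting from a fresh copy each time avoids building on a tree where a patch has already been applied.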
-----Original Message-----
From: Todd Greenwood [mailto:to...@audiencescience.com]
Sent: Monday, August 03, 2009 4:09 PM
To: zookeeper-u...@hadoop.apache.org
Subject: RE: Unending Leader Elections in WAN deploy

Flavio,
I notice that you've updated the patches referenced for the WAN
deployment. There appears to be an order dependency w/ respect to these
four patches...

ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch

473 -> 479 (479 fails)


to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
Hunk #1 FAILED at 93.
Hunk #2 FAILED at 145.
2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ h ../patches/

Could you advise as to which patches I need to apply, and in what order?
-Todd

-----Original Message-----
From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
Sent: Friday, July 31, 2009 9:51 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Perfect! Thanks for the update, Todd.

-Flavio

On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:

Thanks. You were right, I had a stale version of 479. Compilation
succeeds and all tests pass on branch-3.2 with the latest patches 473,
479, 481, and 491.

-Todd

-----Original Message-----
From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
Sent: Friday, July 31, 2009 7:48 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

It should be in 479. Perhaps you have a stale version of the patch.
-Flavio

On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:

Flavio,

I'm getting a compilation error for patch 491:

compile-main:
  [javac] Compiling 1 source file to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
  [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
  [javac] symbol  : method getWeight(long)
  [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
  [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
  [javac] ^
  [javac] 1 error

I see a reference to getWeight in both patch 491 and
FastLeaderElection.java:

patches/ZOOKEEPER-491.patch:+ if(self.getQuorumVerifier().getWeight(n.sid) != 0)
src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java: if(self.getQuorumVerifier().getWeight(n.sid) != 0)

However, I don't see a reference to this method in patches 473, 479, or
481. I also don't see a reference to this method in the trunk...
-Todd

-----Original Message-----
From: Todd Greenwood [mailto:to...@audiencescience.com]
Sent: Friday, July 31, 2009 7:30 PM
To: zookeeper-u...@hadoop.apache.org
Subject: RE: Unending Leader Elections in WAN deploy

Ok, I'll apply that patch and report back.
-Todd

-----Original Message-----
From: Flavio Junqueira [mailto:f...@yahoo-inc.com]
Sent: Friday, July 31, 2009 7:18 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

You're missing 491 from your set of patches.

-Flavio

On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:

This repro's in both branch-3.2, and branch-3.2+patches(473, 479, 481).

Basically, it seems like the nodes are electing pd4-zook02 to be the
leader. However, pd4-zook02 seems to realize it's not supposed to be and
then disconnects everyone. Then they re-elect it again, and it loops
over and over.

-------------
Server config
-------------

server.1=dc1-zook01.dc01.revsci.net:2888:3888
server.2=dc1-zook02.dc01.revsci.net:2888:3888
server.3=dc1-zook03.dc01.revsci.net:2888:3888
server.4=dc1-zook04.dc01.revsci.net:2888:3888
server.5=dc1-zook05.dc01.revsci.net:2888:3888
server.6=pd1-zook01.pd01.revsci.net:2888:3888
server.7=pd1-zook02.pd01.revsci.net:2888:3888
server.8=pd4-zook01.iad1.audsci.net:2888:3888
server.9=pd4-zook02.iad1.audsci.net:2888:3888

group.1:1:2:3:4:5
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1

group.2:6:7:8:9
weight.6=0
weight.7=0
weight.8=0
weight.9=0

Note that we have 2 groups, composed of machines in 3 different
locations (dc1, pd1, and pd4). The idea is that only machines in dc1
have voting rights, and the ability to become a leader. The machines in
the pods all have a weight of zero, and are not expected to become
leaders, or to vote on transactions.

Let me know what I can do to help resolve this issue.

-Todd
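
The intent of the group and weight settings above can be modelled in a few lines. This is a simplified sketch of the hierarchical-quorum idea, not the QuorumHierarchical source: the function names are hypothetical, and skipping groups whose total weight is zero is an assumption about the intended behaviour. The non-zero weight test for leader eligibility mirrors the getWeight(n.sid) != 0 check quoted earlier in the thread.

```python
from typing import Dict, List, Set

# Server ids, groups, and weights copied from the config in this message.
GROUPS: Dict[int, List[int]] = {1: [1, 2, 3, 4, 5], 2: [6, 7, 8, 9]}
WEIGHTS: Dict[int, int] = {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 0, 7: 0, 8: 0, 9: 0}


def contains_quorum(voters: Set[int],
                    groups: Dict[int, List[int]],
                    weights: Dict[int, int]) -> bool:
    """True if voters hold a weight majority in a majority of counted groups."""
    counted_groups = 0
    majorities = 0
    for members in groups.values():
        total = sum(weights[sid] for sid in members)
        if total == 0:
            continue  # assumption: zero-weight groups are not counted
        counted_groups += 1
        held = sum(weights[sid] for sid in members if sid in voters)
        if held > total / 2:
            majorities += 1
    return majorities > counted_groups / 2


def electable(weights: Dict[int, int]) -> List[int]:
    """Servers with non-zero weight, i.e. the only intended leader candidates."""
    return sorted(sid for sid, weight in weights.items() if weight != 0)
```

Under this model any three dc1 servers form a quorum, the pod servers never do on their own, and server 9 (pd4-zook02) is not an eligible leader, which matches the expectation described above.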

