[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager

2009-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738964#action_12738964
 ] 

Hudson commented on ZOOKEEPER-481:
--

Integrated in ZooKeeper-trunk #404 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)


 Add lastMessageSent to QuorumCnxManager
 ---

 Key: ZOOKEEPER-481
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.1.1, 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-481-branch3.2.patch, 
 ZOOKEEPER-481-branch3.2.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, 
 ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch


 Currently we rely on TCP for reliable delivery of FLE messages. However, as
 we concurrently drop and create new connections, it is possible that a
 message is sent but never received. With this patch, the connection manager
 keeps a list of the last messages sent and resends the last one sent.
 Receiving multiple copies is harmless.
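
The idea in this description can be sketched roughly as follows. This is a toy model in Python, not the QuorumCnxManager code from the patch; the class `CnxManager`, its methods, and the `connection_up` flag are hypothetical names used only for illustration. It remembers the last message sent to each peer and resends it when a connection is re-established, and the receiver only looks at the most recent message, which is why duplicates do no harm.

```python
class CnxManager:
    """Toy connection manager (hypothetical): remembers the last message per peer."""

    def __init__(self):
        self.last_message_sent = {}  # peer id -> last message handed to the wire
        self.delivered = {}          # peer id -> messages that reached the peer

    def send(self, peer_id, message, connection_up=True):
        # Record the message before attempting delivery, so it can be resent
        # if the connection is dropped while the message is in flight.
        self.last_message_sent[peer_id] = message
        if connection_up:
            self.delivered.setdefault(peer_id, []).append(message)

    def on_reconnect(self, peer_id):
        # A new connection replaced the old one: resend the last message.
        # The peer may now see it twice, which the receiver tolerates.
        message = self.last_message_sent.get(peer_id)
        if message is not None:
            self.delivered.setdefault(peer_id, []).append(message)


def latest_vote(messages):
    """Receiver side: only the most recent message matters, so duplicates are harmless."""
    return messages[-1] if messages else None
```

In this model a message sent with `connection_up=False` is lost, and a later call to `on_reconnect` delivers it; calling `on_reconnect` again merely produces a duplicate.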

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (ZOOKEEPER-479) QuorumHierarchical does not count groups correctly

2009-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738966#action_12738966
 ] 

Hudson commented on ZOOKEEPER-479:
--

Integrated in ZooKeeper-trunk #404 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
.  QuorumHierarchical does not count groups correctly (flavio via mahadev)


 QuorumHierarchical does not count groups correctly
 --

 Key: ZOOKEEPER-479
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-479
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-479-branch3.2.patch, ZOOKEEPER-479.patch, 
 ZOOKEEPER-479.patch, ZOOKEEPER-479.patch


 QuorumHierarchical::containsQuorum should not verify if all groups 
 represented in the input set have more than half of the total weight. 
 Instead, it should check only for an overall majority of groups. 
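
A minimal sketch of the check described above, assuming a set counts as a quorum when it holds more than half of the weight in a majority of all groups. The function name `contains_quorum` and its arguments are hypothetical; this is not the QuorumHierarchical code, and it ignores details such as groups whose total weight is zero.

```python
def contains_quorum(acked, group_of, weight_of):
    """Return True if `acked` holds a weight majority in a majority of groups.

    acked     -- set of server ids that voted or acknowledged
    group_of  -- dict: server id -> group id (all configured servers)
    weight_of -- dict: server id -> weight
    """
    # Total weight of each group over all configured servers.
    total = {}
    for server, group in group_of.items():
        total[group] = total.get(group, 0) + weight_of[server]

    # Weight contributed to each group by the servers in the input set.
    got = {}
    for server in acked:
        group = group_of[server]
        got[group] = got.get(group, 0) + weight_of[server]

    # Count groups in which the set holds more than half of the weight,
    # then require a majority of all groups. A group that is represented
    # without a majority does not disqualify the set.
    majority_groups = sum(1 for group, w in got.items() if w > total[group] / 2)
    return majority_groups > len(total) / 2
```

With three groups of three equally weighted servers, a set with two servers from each of two groups plus one server from the third is a quorum under this check, even though the third group is represented without a majority.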




[jira] Resolved: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-480.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

I just committed this. thanks flavio.

 FLE should perform leader check when node is not leading and add vote of 
 follower
 -

 Key: ZOOKEEPER-480
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-480-3.2branch.patch, 
 ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, 
 ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch


 As a server may join leader election while others have already elected a 
 leader, it is necessary that a server handles some special cases of leader 
 election when notifications are from servers that are either LEADING or 
 FOLLOWING. In such special cases, we check if we have received a message from 
 the leader to declare a leader elected. This check does not consider the case 
 that the process performing the check might be a recently elected leader, and 
 consequently the check fails.
 This patch also adds a new case, which corresponds to adding a vote to 
 recvset when the notification is from a process LEADING or FOLLOWING. This 
 fixes the case raised in ZOOKEEPER-475.




[jira] Updated: (ZOOKEEPER-491) Prevent zero-weight servers from being elected

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-491:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1 for the patch. I just committed this. thanks flavio!

 Prevent zero-weight servers from being elected
 --

 Key: ZOOKEEPER-491
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
 Project: Zookeeper
  Issue Type: New Feature
  Components: leaderElection
Affects Versions: 3.2.0
Reporter: Flavio Paiva Junqueira
Assignee: Flavio Paiva Junqueira
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch


 This is a fix to prevent zero-weight servers from being elected leader. In
 wide-area scenarios, this makes it possible to restrict the set of servers
 that can lead the ensemble.
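
One way to picture the effect of this fix, as a sketch only: the source does not show how the patch implements it, so the function `pick_leader` and the `(zxid, server_id)` candidate tuples below are assumptions made for illustration. Servers with weight zero are filtered out before the best candidate is chosen.

```python
def pick_leader(candidates, weight_of):
    """Pick the best candidate among servers with non-zero weight.

    candidates -- list of (zxid, server_id) tuples (hypothetical representation)
    weight_of  -- dict: server id -> weight
    Returns the chosen server id, or None if no server is eligible.
    """
    # Zero-weight servers may still vote and follow, but are never chosen.
    eligible = [c for c in candidates if weight_of[c[1]] != 0]
    if not eligible:
        return None
    # Tuples compare by zxid first, then by server id.
    return max(eligible)[1]
```

For example, giving weight zero to every server in a remote data center would keep leadership inside the local one in this model.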




[jira] Resolved: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-475.
-

Resolution: Fixed

Given that ZOOKEEPER-479, ZOOKEEPER-480, and ZOOKEEPER-481 have been fixed, this
should be fixed as well.

 FLENewEpochTest failed on nightly builds.
 -

 Key: ZOOKEEPER-475
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
 Project: Zookeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.2.0
Reporter: Mahadev konar
Assignee: Flavio Paiva Junqueira
Priority: Blocker
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-475.patch, ZOOKEEPER-475.patch


 The FLENewEpochTest failed on one of the nightly builds:
 http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.




[jira] Updated: (ZOOKEEPER-368) Observers

2009-08-04 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-368:
-

Attachment: observers.patch
obs-refactor.patch

Here are a slightly modified version of the refactor patch and a patch
containing the new code for Observers. I have now included some tests as well.
The Observer implementation is simplified from previous patches.

I have added new methods to QuorumPeer to get at the entire view of the
ensemble, the voting view (containing Followers), and the observing view.

To use an Observer, in the ensemble config file append :observer to the 
description for any server you want to be an Observer. So for example write:

server.3:localhost:2181:3181:observer

In the Observer's own config file, add a line with the option

peerType=observer

I will probably remove these slightly redundant specifications in the future,
but for now you will need both.

You must apply the patches in order; the refactor patch first. Both patches 
apply cleanly for me using patch -p0 against a clean checkout of trunk as of 
tonight (Aug 4th).



 Observers
 -

 Key: ZOOKEEPER-368
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Reporter: Flavio Paiva Junqueira
Assignee: Henry Robinson
 Attachments: obs-refactor.patch, observer-refactor.patch, 
 observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
 ZOOKEEPER-368.patch


 Currently, all servers of an ensemble participate actively in reaching 
 agreement on the order of ZooKeeper transactions. That is, all followers 
 receive proposals, acknowledge them, and receive commit messages from the 
 leader. A leader issues commit messages once it receives acknowledgments from 
 a quorum of followers. For cross-colo operation, it would be useful to have a 
 third role: observer. Using Paxos terminology, observers are similar to 
 learners. An observer does not participate actively in the agreement step of 
 the atomic broadcast protocol. Instead, it only commits proposals that have 
 been accepted by some quorum of followers.
 One simple way to implement observers is to have the leader forward
 commit messages not only to followers but also to observers, and have
 observers apply transactions in the order the followers agreed upon.
 In the current implementation of the protocol, however, commit messages do 
 not carry their corresponding transaction payload because all servers 
 different from the leader are followers and followers receive such a payload 
 first through a proposal message. Just forwarding commit messages as they 
 currently are to an observer consequently is not sufficient. We have a couple 
 of options:
 1- Include the transaction payload along in commit messages to observers;
 2- Send proposals to observers as well.
 Number 2 is simpler to implement because it doesn't require changing the 
 protocol implementation, but it increases traffic slightly. The performance 
 impact due to such an increase might be insignificant, though.
 For scalability purposes, we may consider having followers also forward
 commit messages to observers. With this option, observers can connect to 
 followers, and receive messages from followers. This choice is important to 
 avoid increasing the load on the leader with the number of observers. 
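
Option 2 above (sending proposals to observers as well) can be sketched as a toy model. The classes `Leader` and `Observer`, their methods, and the explicit `quorum_size` parameter are hypothetical and only illustrate the message flow; followers are not modelled beyond their acknowledgments, and this is not the code from the attached patches.

```python
class Observer:
    """Receives proposals and commits but never acknowledges (no vote)."""

    def __init__(self):
        self.pending = {}  # zxid -> payload, learned from the proposal
        self.applied = []  # payloads applied, in commit order

    def on_proposal(self, zxid, payload):
        # Buffer the payload; unlike a follower, send no acknowledgment.
        self.pending[zxid] = payload

    def on_commit(self, zxid):
        # The commit message carries no payload, so look it up
        # from the proposal received earlier.
        self.applied.append(self.pending.pop(zxid))


class Leader:
    def __init__(self, quorum_size, observers):
        self.quorum_size = quorum_size  # follower acks needed to commit (assumed given)
        self.observers = observers
        self.acks = {}                  # zxid -> number of follower acks so far
        self.committed = set()

    def propose(self, zxid, payload):
        self.acks[zxid] = 0
        for observer in self.observers:
            observer.on_proposal(zxid, payload)

    def on_follower_ack(self, zxid):
        self.acks[zxid] += 1
        # Commit exactly once, when a quorum of followers has acknowledged.
        if self.acks[zxid] >= self.quorum_size and zxid not in self.committed:
            self.committed.add(zxid)
            for observer in self.observers:
                observer.on_commit(zxid)
```

The observer applies a transaction only after the commit arrives, so it never applies anything the followers have not agreed upon.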




RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
calling this 3.2.0. Should this be rev'd, and am I correct in calling
this release 3.2.1? 

2. Build targets

The package target fails b/c the create-cppunit-configure target fails
due to various problems w/ respect to autoconf. Are these dependencies
documented somewhere ? I'd like to have a fully building system.

create-cppunit-configure:
 [exec] Can't exec libtoolize: No such file or directory at
/usr/bin/autoreconf line 188.
 [exec] Use of uninitialized value $libtoolize in pattern match
(m//) at /usr/bin/autoreconf line 188.
 [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
in library
 [exec] configure.ac:33: error: possibly undefined macro:
AM_PATH_CPPUNIT
 [exec]   If this token and others are legitimate, please use
m4_pattern_allow.
 [exec]   See the Autoconf documentation.
 [exec] configure.ac:53: error: possibly undefined macro:
AC_PROG_LIBTOOL
 [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1


3. Sync failure:

This is still failing.

svn: URL
'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
doesn't exist

-Todd

 -Original Message-
 From: Todd Greenwood
 Sent: Tuesday, August 04, 2009 11:26 AM
 To: 'zookeeper-u...@hadoop.apache.org'
 Subject: RE: Unending Leader Elections in WAN deploy
 
 Great news. Thank you Mahadev. I'll report our findings later today.
 -Todd
 
  -Original Message-
  From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
  Sent: Tuesday, August 04, 2009 11:20 AM
  To: zookeeper-u...@hadoop.apache.org
  Subject: Re: Unending Leader Elections in WAN deploy
 
  Hi Todd,
   I just committed 480 and 491. You can checkout the 3.2 branch now.
 
  Thanks
  mahadev
 
 
  On 8/3/09 4:29 PM, Todd Greenwood to...@audiencescience.com
wrote:
 
   That'd be perfect. Thanks!
  
   -Original Message-
   From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
   Sent: Monday, August 03, 2009 4:24 PM
   To: zookeeper-u...@hadoop.apache.org
   Subject: Re: Unending Leader Elections in WAN deploy
  
   Hi Todd,
   Most of the patches that you mention should be in the branch 3.2 by
   tomorrow or so. 481, 479 are already in. 480 and 491 should be in by
   tomorrow. Would that suffice for you?
  
   Thanks
   mahadev
  
  
   On 8/3/09 4:21 PM, Todd Greenwood to...@audiencescience.com
 wrote:
  
   Another problem...I've reverted to the latest versions of the patches
   that are not specific to branch-3.2, and I'm getting two compilation
   errors:
  
   build-generated:
   [javac] Compiling 44 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes

   compile-main:
   [javac] Compiling 2 source files to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
   [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
   [javac] public String[] getQuorumPeers();
   [javac] ^
   [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
   [javac] public String getServerState();
   [javac]   ^
   [javac] 2 errors
  
   My build process is pretty simple:
  
   1. copy the branch-3.2 source to a temp directory
   (src/patched/branch-3.2)
   2. apply the ZOOKEEPER patches in my patches directory
   3. build zookeeper in the temp directory
  
   -Todd
   -Original Message-
   From: Todd Greenwood [mailto:to...@audiencescience.com]
   Sent: Monday, August 03, 2009 4:09 PM
   To: zookeeper-u...@hadoop.apache.org
   Subject: RE: Unending Leader Elections in WAN deploy
  
   Flavio,
   I notice that you've updated the patches referenced for the WAN
   deployment. There appears to be an order dependency w/ respect to these
   four patches...
  
   ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
   ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch
  
   473 - 479 (479 fails)
  
  
  
  

   to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
   patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
   patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
   patching file

Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi todd, 
 comments in line


On 8/4/09 12:38 PM, Todd Greenwood to...@audiencescience.com wrote:

 Mahadev,
 
 Some quick questions:
 
 1. Version
 
 I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
 calling this 3.2.0. Should this be rev'd, and am I correct in calling
 this release 3.2.1?
Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
release.

 
 2. Build targets
 
 The package target fails b/c the create-cppunit-configure target fails
 due to various problems w/ respect to autoconf. Are these dependencies
 documented somewhere ? I'd like to have a fully building system.
 
 create-cppunit-configure:
  [exec] Can't exec libtoolize: No such file or directory at
 /usr/bin/autoreconf line 188.
  [exec] Use of uninitialized value $libtoolize in pattern match
 (m//) at /usr/bin/autoreconf line 188.
  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
 in library
  [exec] configure.ac:33: error: possibly undefined macro:
 AM_PATH_CPPUNIT
  [exec]   If this token and others are legitimate, please use
 m4_pattern_allow.
  [exec]   See the Autoconf documentation.
  [exec] configure.ac:53: error: possibly undefined macro:
 AC_PROG_LIBTOOL
  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
 
You need the autotools to run this. Please read the README for building the C
client library at src/c/ for the installation requirements.
 
 3. Sync failure:
 
 This is still failing.
 
 svn: URL
 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
 doesn't exist
 

Yes this hasn't been fixed yet!

Thanks
mahadev
 -Todd
 

RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, i.e. we get continuous leader elections: the leader
disconnects the followers, gets re-elected, disconnects them again, etc.

This is from a fresh sync to the 3.2 branch:

svn co
http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper
/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via
flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via
mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via
mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)
  
  ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev) 

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via
phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and
other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via
mahadev)

  ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty
cert
  (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not
leading and
  add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio
via
  mahadev)

What can I do to assist you with this issue?

-Todd


Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi Todd,

  What syncLimit are you using? Can you post your config? For WANs
you will have to use much bigger values for syncLimit and other settings.
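
For context on the setting mentioned above: in zoo.cfg, `initLimit` and `syncLimit` are counted in ticks, and one tick lasts `tickTime` milliseconds, so the effective timeout is the product of the two. The small calculation below uses illustrative values only; they are not Todd's configuration and not a recommendation for WAN deployments.

```python
def limit_in_ms(tick_time_ms, limit_ticks):
    """Convert a limit expressed in ticks into milliseconds."""
    return tick_time_ms * limit_ticks


# Illustrative values (assumed, not taken from this thread).
config = {"tickTime": 2000, "initLimit": 10, "syncLimit": 5}

sync_timeout_ms = limit_in_ms(config["tickTime"], config["syncLimit"])
init_timeout_ms = limit_in_ms(config["tickTime"], config["initLimit"])
print(sync_timeout_ms, init_timeout_ms)  # prints "10000 20000"
```

If followers on a slow link cannot keep up within the sync timeout, raising `syncLimit` (or `tickTime`) lengthens that window.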

Thanks
mahadev



Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt

It would be better to create a JIRA with configs as well as logs.

Patrick

Mahadev Konar wrote:

Hi Todd,

  What is the synclimit you are using? Can you post your config? For WAN's
you will have to use much bigger values for synclimit and others.

Thanks
mahadev


On 8/4/09 1:24 PM, Todd Greenwood to...@audiencescience.com wrote:


Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, e.g. we get continuous leader elections that
disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co
http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
./branch-3.2

CHANGES.TXT show the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper
/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)

  ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

  ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
  (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not leading and
  add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
  mahadev)

What can I do to assist you with this issue?

-Todd


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 12:43 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi todd,
 comments in line


On 8/4/09 12:38 PM, Todd Greenwood to...@audiencescience.com wrote:

Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
calling this 3.2.0. Should this be rev'd, and am I correct in calling
this release 3.2.1?

Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag
the release.

2. Build targets

The package target fails b/c the create-cppunit-configure target fails
due to various problems w/ respect to autoconf. Are these dependencies
documented somewhere? I'd like to have a fully building system.

create-cppunit-configure:
 [exec] Can't exec libtoolize: No such file or directory at /usr/bin/autoreconf line 188.
 [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
 [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
 [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
 [exec]   If this token and others are legitimate, please use m4_pattern_allow.
 [exec]   See the Autoconf documentation.
 [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
 [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1

You need auto tools to run this. Please read the README for building c
client library at src/c/ for the installation requirements.

3. Sync failure:

This is still failing.

svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist

Yes this hasn't been fixed yet!

Thanks
mahadev

-Todd


-Original Message-
From: Todd Greenwood
Sent: Tuesday, August 04, 2009 11:26 AM
To: 'zookeeper-u...@hadoop.apache.org'
Subject: RE: Unending Leader Elections in WAN deploy

Great news. Thank you Mahadev. I'll report our findings later today.

-Todd


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 11:20 AM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
 I just committed 480 and 491. You can checkout the 3.2 branch now.

Thanks
mahadev


On 8/3/09 4:29 PM, Todd Greenwood to...@audiencescience.com wrote:

That'd be perfect. Thanks!


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Monday, August 03, 2009 4:24 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
  Most of the patches that you mention should be 

RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Will do.

 -Original Message-
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Tuesday, August 04, 2009 1:34 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: Re: Unending Leader Elections in WAN deploy
 
 It would be better to create a JIRA with configs as well as logs.
 
 Patrick
 

[jira] Created: (ZOOKEEPER-497) api and forrest docs should mention if classes are thread safe

2009-08-04 Thread Patrick Hunt (JIRA)
api and forrest docs should mention if classes are thread safe
--

 Key: ZOOKEEPER-497
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-497
 Project: Zookeeper
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Priority: Minor
 Fix For: 3.3.0


The API (C/Java clients) and the Forrest docs should discuss thread safety; 
in particular, we don't mention that the ZooKeeper class is thread safe (etc.). 
The docs should be updated.





[jira] Created: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-04 Thread Todd Greenwood-Geer (JIRA)
Unending Leader Elections : WAN configuration
-

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:

CentOS 5.2 64-bit
2GB ram
java version 1.6.0_13
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 

Network Topology:
DC : central data center
POD(N): remote data center

Zookeeper Topology:
Leaders may be elected only in DC (weight = 1)
Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Priority: Critical
 Fix For: 3.2.1


In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
central DC group of ZK servers that have a voting weight = 1, and a group of 
servers in remote pods with a voting weight of 0.

What we expect to see is leaders elected only in the DC, and the pods to 
contain only followers. What we are seeing is a continuous cycling of leaders. 
We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 
479, 481, 491), and now release 3.2.1.
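
For readers unfamiliar with weighted groups, here is a minimal sketch of the quorum rule as the hierarchical quorum documentation describes it: a set of servers forms a quorum when it holds more than half of the weight in a majority of the groups that have non-zero weight. This is an illustration written for this note, not the QuorumHierarchical source. The class name, the server ids, and the five-DC plus four-pod layout in main are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class HierarchicalQuorumSketch {
    /**
     * groupOf maps server id to group id, weightOf maps server id to weight
     * (weight 1 is assumed when a server has no entry), acks is the set of
     * server ids that voted.
     */
    static boolean containsQuorum(Map<Long, Long> groupOf,
                                  Map<Long, Long> weightOf,
                                  Set<Long> acks) {
        Map<Long, Long> totalWeight = new HashMap<>();
        Map<Long, Long> ackedWeight = new HashMap<>();
        for (Map.Entry<Long, Long> e : groupOf.entrySet()) {
            long sid = e.getKey();
            long gid = e.getValue();
            long w = weightOf.getOrDefault(sid, 1L);
            totalWeight.merge(gid, w, Long::sum);
            if (acks.contains(sid)) {
                ackedWeight.merge(gid, w, Long::sum);
            }
        }
        int nonZeroGroups = 0;
        int wonGroups = 0;
        for (Map.Entry<Long, Long> e : totalWeight.entrySet()) {
            long groupWeight = e.getValue();
            if (groupWeight == 0) {
                continue; // zero-weight groups never count toward a quorum
            }
            nonZeroGroups++;
            if (2 * ackedWeight.getOrDefault(e.getKey(), 0L) > groupWeight) {
                wonGroups++;
            }
        }
        return 2 * wonGroups > nonZeroGroups;
    }

    public static void main(String[] args) {
        // Hypothetical layout: servers 1-5 in the DC (weight 1), 6-9 in pods (weight 0).
        Map<Long, Long> groupOf = new HashMap<>();
        Map<Long, Long> weightOf = new HashMap<>();
        for (long sid = 1; sid <= 5; sid++) { groupOf.put(sid, 1L); weightOf.put(sid, 1L); }
        for (long sid = 6; sid <= 9; sid++) { groupOf.put(sid, 2L); weightOf.put(sid, 0L); }

        Set<Long> threeDc = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> twoDcPlusPods = new HashSet<>(Arrays.asList(1L, 2L, 6L, 7L, 8L, 9L));
        System.out.println("three DC servers: " + containsQuorum(groupOf, weightOf, threeDc));
        System.out.println("two DC servers plus all pods: "
                + containsQuorum(groupOf, weightOf, twoDcPlusPods));
    }
}
```

Under this rule the pod servers can follow and serve clients but cannot help form a quorum, which matches the expectation stated above that only DC servers decide leadership.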




[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-04 Thread Todd Greenwood-Geer (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Greenwood-Geer updated ZOOKEEPER-498:
--

Attachment: zoo.cfg
pod-zook-logs-01.tar.gz
dc-zook-logs-01.tar.gz

Zookeeper Logs and configuration files:

dc1-zook01.log
dc1-zook02.log
dc1-zook03.log
dc1-zook04.log
dc1-zook05.log
pd1-zook01.log
pd1-zook02.log
pd4-zook01.log
pd4-zook02.log
zoo.cfg


 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Priority: Critical
 Fix For: 3.2.1

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.




[jira] Updated: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-447:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1, thanks Henry! Committed to 3.2.1 and 3.3

 zkServer.sh doesn't allow different config files to be specified on the 
 command line
 

 Key: ZOOKEEPER-447
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
 Project: Zookeeper
  Issue Type: Improvement
Affects Versions: 3.1.1, 3.2.0
Reporter: Henry Robinson
Assignee: Henry Robinson
Priority: Minor
 Fix For: 3.2.1, 3.3.0

 Attachments: ZOOKEEPER-447.patch


 Unless I'm missing something, you can change the directory that the zoo.cfg 
 file is in by setting ZOOCFGDIR but not the name of the file itself.
 I find it convenient myself to specify the config file on the command line, 
 but we should also let it be specified by environment variable.




[jira] Assigned: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-490:
--

Assignee: Patrick Hunt

 the java docs for session creation are misleading/incomplete
 

 Key: ZOOKEEPER-490
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
 Project: Zookeeper
  Issue Type: Bug
Affects Versions: 3.1.1, 3.2.0
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.2.1, 3.3.0


 the javadoc for ZooKeeper constructor says:
  * The client object will pick an arbitrary server and try to connect to 
 it.
  * If failed, it will try the next one in the list, until a connection is
  * established, or all the servers have been tried.
 The "or all the servers have been tried" phrase is misleading; it should indicate that we 
 retry until success, the connection is closed, or the session expires. 
 We also need to mention that the connection is asynchronous: the constructor returns 
 immediately, and you need to look for the connection event in the watcher.




[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-485:
---

Fix Version/s: (was: 3.2.1)

 need ops documentation that details supervision of ZK server processes
 --

 Key: ZOOKEEPER-485
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
 Project: Zookeeper
  Issue Type: Bug
  Components: documentation, server
Reporter: Patrick Hunt
 Fix For: 3.3.0


 We need ops documentation detailing what to do if the ZK server VM fails; by 
 "fails" I mean the JVM process exits, dies, crashes, etc.
 In general a supervisor process should be used to start, stop, and restart 
 the ZK server JVM.
 Something like daemontools (http://cr.yp.to/daemontools.html) could be used, or 
 more simply a wrapper script should monitor the status of the pid and restart 
 the JVM if it fails. It's up to the operator: if this is not done 
 automatically, it will have to be done manually, by the operator restarting 
 the ZK server JVM.
 The inherent behavior of ZK with respect to failures (i.e. that it automatically 
 recovers as long as quorum is maintained) fits into this nicely.
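
The wrapper idea described above can be sketched as a small restart loop. This is an assumption-laden illustration, not a recommended production supervisor: the class name, the restart cap, and the choice to treat exit code 0 as an intentional stop are all made up for the example, and the command passed in main is whatever starts the server in the foreground in a given deployment.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntSupplier;

final class ZkSupervisorSketch {
    /** Starts the command, waits for it to exit, and returns its exit code (-1 on failure to run). */
    static int runOnce(List<String> command) {
        try {
            Process p = new ProcessBuilder(command).inheritIO().start();
            return p.waitFor();
        } catch (IOException e) {
            return -1; // the process could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    /**
     * Runs the task, restarting it after every non-zero exit. Stops after a
     * clean exit (code 0), after maxRestarts restarts, or when interrupted.
     * Returns the number of launches performed.
     */
    static int supervise(IntSupplier task, int maxRestarts, long backoffMillis) {
        int launches = 0;
        while (true) {
            int exitCode = task.getAsInt();
            launches++;
            if (exitCode == 0 || launches > maxRestarts) {
                return launches;
            }
            try {
                Thread.sleep(backoffMillis); // brief pause so a crash loop does not spin
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return launches;
            }
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: ZkSupervisorSketch <command that runs the server in the foreground>");
            return;
        }
        List<String> command = Arrays.asList(args);
        int launches = supervise(() -> runOnce(command), 5, 1000L);
        System.out.println("launches: " + launches);
    }
}
```

Separating the loop from the process launch keeps the restart policy easy to exercise without starting a real JVM. A tool such as daemontools covers the same ground with logging and signal handling included.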




[jira] Updated: (ZOOKEEPER-493) patch for command line setquota

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-493:
---

Attachment: ZOOKEEPER-493.patch

Updated the patch to clean up a bit in addition to the fix.

ZOOKEEPER-493.patch supersedes previous patch (fixed naming of patch file)

 patch for command line setquota 
 

 Key: ZOOKEEPER-493
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.2.0
Reporter: steve bendiola
Assignee: steve bendiola
Priority: Minor
 Fix For: 3.2.1, 3.3.0

 Attachments: quotafix.patch, ZOOKEEPER-493.patch


 the command line setquota tries to use argument 3 as both a path and a value




[jira] Updated: (ZOOKEEPER-493) patch for command line setquota

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-493:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1, thanks Steve! Applied to 3.2.1 and 3.3

 patch for command line setquota 
 

 Key: ZOOKEEPER-493
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
 Project: Zookeeper
  Issue Type: Bug
  Components: java client
Affects Versions: 3.2.0
Reporter: steve bendiola
Assignee: steve bendiola
Priority: Minor
 Fix For: 3.2.1, 3.3.0

 Attachments: quotafix.patch, ZOOKEEPER-493.patch


 the command line setquota tries to use argument 3 as both a path and a value




Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt
Todd, Mahadev and I looked at this and it turns out to be a regression. 
Ironically a patch I created for 3.2 branch to add quorum tests actually 
broke the quorum config -- a default value for a config parameter was 
lost. I'm going to submit a patch asap to get the default back, but for 
the time being you can set:


electionAlg=3

in each of your config files.

You should see reference to FastLeaderElection in your log files if this 
parameter is set correctly.


Sorry for the trouble,

Patrick
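
The workaround above has two parts that can be checked mechanically: the config sets electionAlg=3, and the server log mentions FastLeaderElection. A minimal sketch of both checks follows. The class name and command-line arguments are assumptions, and the log check is a plain substring match because the thread does not show the exact log line.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;

final class ElectionAlgCheck {
    /** True if the config sets electionAlg to 3, the workaround value from this thread. */
    static boolean hasWorkaround(Properties cfg) {
        return "3".equals(cfg.getProperty("electionAlg", "").trim());
    }

    /** True if any log line mentions FastLeaderElection. */
    static boolean mentionsFastLeaderElection(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains("FastLeaderElection")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ElectionAlgCheck <zoo.cfg> <server log>");
            return;
        }
        Properties cfg = new Properties();
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]))) {
            cfg.load(in);
        }
        System.out.println("electionAlg=3 set: " + hasWorkaround(cfg));
        System.out.println("FastLeaderElection in log: "
                + mentionsFastLeaderElection(Files.readAllLines(Paths.get(args[1]))));
    }
}
```

Running it once per server against that server's zoo.cfg and log would show quickly whether any node was missed when the setting was rolled out.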

Todd Greenwood wrote:

Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, e.g. we get continuous leader elections that
disconnect the followers and then get re-elected, and disconnect...etc.


[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-498:
---

Fix Version/s: 3.3.0
 Assignee: Patrick Hunt

 Unending Leader Elections : WAN configuration
 -

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:
 CentOS 5.2 64-bit
 2GB ram
 java version 1.6.0_13
 Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
 Network Topology:
 DC : central data center
 POD(N): remote data center
 Zookeeper Topology:
 Leaders may be elected only in DC (weight = 1)
 Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Assignee: Patrick Hunt
Priority: Critical
 Fix For: 3.2.1, 3.3.0

 Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg


 In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
 re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
 central DC group of ZK servers that have a voting weight = 1, and a group of 
 servers in remote pods with a voting weight of 0.
 What we expect to see is leaders elected only in the DC, and the pods to 
 contain only followers. What we are seeing is a continuous cycling of 
 leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
 patches (473, 479, 481, 491), and now release 3.2.1.




RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Patrick, thanks! I'll forward on to IT and I'll report back to you
shortly...

 -Original Message-
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Tuesday, August 04, 2009 3:55 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: Re: Unending Leader Elections in WAN deploy
 
 Todd, Mahadev and I looked at this and it turns out to be a regression.
 Ironically a patch I created for 3.2 branch to add quorum tests actually
 broke the quorum config -- a default value for a config parameter was
 lost. I'm going to submit a patch asap to get the default back, but for
 the time being you can set:
 
 electionAlg=3
 
 in each of your config files.
 
 You should see reference to FastLeaderElection in your log files if this
 parameter is set correctly.
 
 Sorry for the trouble,
 
 Patrick
 

Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi Todd,
 Can you attach the files to the JIRA? I will take a look at this and will
get back to you by end of day today.

Thanks
mahadev


On 8/4/09 4:56 PM, Todd Greenwood to...@audiencescience.com wrote:

 Looks like we're not getting *any* leader elected now. Logs attached.
 
 -Original Message-
 From: Todd Greenwood [mailto:to...@audiencescience.com]
 Sent: Tuesday, August 04, 2009 4:07 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: RE: Unending Leader Elections in WAN deploy
 
 Patrick, thanks! I'll forward on to IT and I'll report back to you
 shortly...
 
 -Original Message-
 From: Patrick Hunt [mailto:ph...@apache.org]
 Sent: Tuesday, August 04, 2009 3:55 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: Re: Unending Leader Elections in WAN deploy
 
 Todd, Mahadev and I looked at this and it turns out to be a
 regression.
 Ironically a patch I created for 3.2 branch to add quorum tests
 actually
 broke the quorum config -- a default value for a config parameter
 was
 lost. I'm going to submit a patch asap to get the default back, but
 for
 the time being you can set:
 
 electionAlg=3
 
 in each of your config files.
 
 You should see reference to FastLeaderElection in your log files if
 this
 parameter is set correctly.
 
 Sorry for the trouble,
 
 Patrick
 
 Todd Greenwood wrote:
 Mahadev,
 
 I just heard from IT that this build behaves in exactly the same
 way
 as
 previous versions, e.g. we get continuous leader elections that
 disconnect the followers and then get re-elected, and
 disconnect...etc.
 
 This is from a fresh sync to the 3.2 branch:
 
 svn co
 
 http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
 ./branch-3.2
 
 CHANGES.TXT show the various fixes included:
 
 
 
 to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper
 /src/original$ head -n 50 branch-3.2/CHANGES.txt
 Release 3.2.1
 
 Backward compatibile changes:
 
 BUGFIXES:
   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via
   flavio)
 
   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via
   mahadev)
 
   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
 
   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via
   mahadev)
 
   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
   (giri via mahadev)
 
   ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)
 
   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
   failure. (chris via mahadev)
 
   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via
   phunt)
 
   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
   embedded clients (ryan rawson via phunt)
 
   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via
   mahadev)
 
   ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
   (flavio via mahadev)
 
   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty
   cert (Chris Darroch via phunt)
 
   ZOOKEEPER-480. FLE should perform leader check when node is not
   leading and add vote of follower (flavio via mahadev)
 
   ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio
   via mahadev)
 
 What can I do to assist you with this issue?
 
 -Todd
 
 -Original Message-
 From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
 Sent: Tuesday, August 04, 2009 12:43 PM
 To: zookeeper-dev@hadoop.apache.org
 Subject: Re: Unending Leader Elections in WAN deploy
 
 Hi todd,
  comments in line
 
 
 On 8/4/09 12:38 PM, Todd Greenwood to...@audiencescience.com wrote:
 Mahadev,
 
 Some quick questions:
 
 1. Version
 
 I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
 calling this 3.2.0. Should this be rev'd, and am I correct in calling
 this release 3.2.1?
 
 Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag
 the release.
 
 2. Build targets
 
 The package target fails b/c the create-cppunit-configure target fails
 due to various problems w/ respect to autoconf. Are these dependencies
 documented somewhere? I'd like to have a fully building system.
 
 create-cppunit-configure:
  [exec] Can't exec libtoolize: No such file or directory at /usr/bin/autoreconf line 188.
  [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
  [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
  [exec]   If this token and others are legitimate, please use m4_pattern_allow.
  [exec]   See the Autoconf documentation.
  [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
 
 You need auto tools to run this. Please read the README for building c
 client library at src/c/ for the installation 

Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt
Mahadev/Flavio -- looks like 0 weight is still busted, fle0weighttest is 
actually failing on my machine, however it's reported as success:

- Standard Error -
Exception in thread Thread-108 junit.framework.AssertionFailedError: Elected zero-weight server
	at junit.framework.Assert.fail(Assert.java:47)
	at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)

-  ---

This is probably because the test is calling assert in a thread other
than the main test thread, which junit will not track/know about.
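
To illustrate the point about asserts in non-test threads, here is a
minimal sketch (plain Java, no junit dependency) of one way a test could
capture a failure thrown on a worker thread and surface it on the main
thread. This is not what FLEZeroWeightTest does today; the class and
method names (ThreadFailureCapture, runAndCapture) are made up for the
example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of the ZooKeeper test suite.
public class ThreadFailureCapture {

    /**
     * Runs body on a new thread, waits for it to finish, and returns
     * whatever Throwable escaped from it (or null if it ended normally).
     */
    public static Throwable runAndCapture(Runnable body) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(body, "le-thread");
        // Record the uncaught error instead of only printing it to stderr.
        worker.setUncaughtExceptionHandler((thread, error) -> failure.set(error));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return ie;
        }
        return failure.get();
    }

    public static void main(String[] args) {
        Throwable failure = runAndCapture(() -> {
            // Stands in for an assertion failing inside the election thread.
            throw new AssertionError("Elected zero-weight server");
        });
        // A real test would rethrow here, on the thread the framework watches.
        if (failure != null) {
            System.out.println("worker failed: " + failure.getMessage());
        }
    }
}
```

In a junit test the main thread would rethrow the captured Throwable (or
call fail) after the join, so the failure is reported instead of only
showing up under Standard Error.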


One problem I see with these tests (0weight test I looked at) -- they
don't have a client attempt to connect to the various servers as part
of declaring success. Really we should only consider a test successful
(i.e. assert that) if a client can connect to each server in the cluster
and change/see changes. As part of fixing this we really need to do a
sanity check by testing the various command lines and checking that a
client can connect.


I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New
epoch seems to just thrash...


Also I tried 3 and 5 server quorums by hand from the command line with 0
weight and they see similar issues to what Todd is seeing.


I'm using the latest code in mainline btw.

Patrick

Mahadev Konar wrote:

Hi todd,
  I see a lot of

java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.connect(Native Method)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
	at java.lang.Thread.run(Thread.java:619)


Is it possible that there is some firewall? Can all the servers 1-9 connect
to all the others using the ports that you specified in zoo.cfg, i.e. 2888/3888?
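
One quick way to answer that from each box is a plain TCP connect test
against the other servers. The sketch below is only an illustration: the
class name QuorumPortCheck, the 2 second timeout, and passing hostnames
on the command line are all assumptions, and the ports are just the
2888/3888 pair mentioned above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic, not part of ZooKeeper.
public class QuorumPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the zoo.cfg discussion above; adjust to your config.
        int[] ports = {2888, 3888};
        for (String host : args) {
            for (int port : ports) {
                String status = canConnect(host, port, 2000) ? "reachable" : "NOT reachable";
                System.out.println(host + ":" + port + " " + status);
            }
        }
    }
}
```

Run it on every server with the other servers' hostnames as arguments.
Keep in mind a port only answers while the peer is listening on it, so a
"NOT reachable" result can mean either a firewall or a server that is
down or not currently serving that port.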


Thanks
mahadev


On 8/4/09 4:56 PM, Todd Greenwood to...@audiencescience.com wrote:


Looks like we're not getting *any* leader elected now. Logs attached.



Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt

(I see the same error in fle0weighttest using latest 3.2 btw)
