Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt

(I see the same error in fle0weighttest using latest 3.2 btw)


Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt
Mahadev/Flavio -- looks like 0 weight is still busted, fle0weighttest is 
actually failing on my machine, however it's reported as success:

- Standard Error -
Exception in thread "Thread-108" junit.framework.AssertionFailedError: Elected zero-weight server
	at junit.framework.Assert.fail(Assert.java:47)
	at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)

-  ---

This is probably because the test is calling assert in a thread 
other than the main test thread, which JUnit will not track/know about.
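A minimal sketch of one way around this, assuming the check runs on a worker thread like `LEThread`: the worker records its failure instead of dying with it, and the test thread re-throws that failure after `join()`, so JUnit actually sees it. `WorkerFailures` and `runAndPropagate` are hypothetical names, not part of ZooKeeper or JUnit.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper (not part of ZooKeeper or JUnit): runs a check on a
// worker thread, then re-throws any failure on the calling test thread.
final class WorkerFailures {
    private WorkerFailures() {}

    static void runAndPropagate(Runnable check) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                check.run();
            } catch (Throwable t) {
                // Record the failure instead of letting it die with the thread.
                failure.set(t);
            }
        }, "election-check-worker");
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to failure visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        Throwable t = failure.get();
        if (t instanceof Error) {
            throw (Error) t; // e.g. AssertionFailedError or AssertionError
        }
        if (t instanceof RuntimeException) {
            throw (RuntimeException) t;
        }
        if (t != null) {
            throw new RuntimeException(t);
        }
    }
}
```

Called directly from the test method, an assertion failure raised in the worker then fails the test instead of only being printed to standard error.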


One problem I see with these tests (the 0weight test I looked at) -- it 
doesn't have a client attempt to connect to the various servers as part 
of declaring success. Really we should only consider a test successful 
(i.e. assert that) if a client can connect to each server in the cluster 
and change/see changes. As part of fixing this we really need to do a 
sanity check by testing the various command lines and checking that a 
client can connect.
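The success criterion above (every server accepts a client, and a change made through any server is visible through all of them) could be sketched like this. `ServerClient` and `QuorumSanityCheck` are hypothetical; a real test would back each `ServerClient` with a ZooKeeper client handle connected to one server. ZooKeeper reads served by a follower may lag the leader, so a real check would call `sync()` or retry before reading.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical client abstraction, one instance per server in the ensemble.
interface ServerClient {
    boolean connect();                       // true if a session was established
    void write(String path, String value);   // e.g. create/setData via that server
    Optional<String> read(String path);      // e.g. getData via that server
}

final class QuorumSanityCheck {
    private QuorumSanityCheck() {}

    // Success only if every server accepts a client AND a change written
    // through each server is visible through every server.
    static boolean allServersUsable(List<ServerClient> clients) {
        for (ServerClient c : clients) {
            if (!c.connect()) {
                return false;
            }
        }
        for (int i = 0; i < clients.size(); i++) {
            String path = "/sanity-" + i;
            String value = "written-via-server-" + i;
            clients.get(i).write(path, value);
            for (ServerClient reader : clients) {
                // A real ZooKeeper check would sync() or retry first,
                // since follower reads may lag behind the leader.
                boolean seen = reader.read(path).map(value::equals).orElse(false);
                if (!seen) {
                    return false;
                }
            }
        }
        return true;
    }
}
```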


I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New 
epoch seems to just thrash...


Also I tried 3 & 5 server quorums "by hand from the command line" with 0 
weight and they see similar issues to what Todd is seeing.


I'm using the latest code in mainline btw.

Patrick


Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi todd,
  I see a lot of 

java.net.ConnectException: Connection refused
	at sun.nio.ch.Net.connect(Native Method)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
	at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
	at java.lang.Thread.run(Thread.java:619)


Is it possible that there is some firewall? Can all the servers 1-9 connect
to all the others using the ports that you specified in zoo.cfg, i.e. 2888/3888?
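One way to answer this from each host is a plain TCP probe of every server's quorum and election ports. This is a hypothetical helper (`PortCheck` is not a ZooKeeper class), and the host names you pass in are placeholders for the `server.N` entries in zoo.cfg. Keep in mind that only the current leader listens on the quorum port (2888 here), so a refusal on that port from a follower is expected; the election port (3888) should be reachable on every server.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical connectivity probe; not part of ZooKeeper.
final class PortCheck {
    private PortCheck() {}

    // True if a TCP connection to host:port is accepted within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // "Connection refused", timeouts (e.g. a firewall dropping packets),
            // and unresolvable host names all end up here.
            return false;
        }
    }

    // Probes the quorum and election ports of each host; keys are "host:port".
    static Map<String, Boolean> checkAll(List<String> hosts, int quorumPort,
                                         int electionPort, int timeoutMs) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String host : hosts) {
            results.put(host + ":" + quorumPort, canConnect(host, quorumPort, timeoutMs));
            results.put(host + ":" + electionPort, canConnect(host, electionPort, timeoutMs));
        }
        return results;
    }
}
```

Running something like `PortCheck.checkAll(List.of("dc-zk1", "pod1-zk1"), 2888, 3888, 2000)` (hypothetical host names) from every machine would show which server pairs are blocked.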


Thanks
mahadev


On 8/4/09 4:56 PM, "Todd Greenwood"  wrote:

> Looks like we're not getting *any* leader elected now. Logs attached.

[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-484:


Status: Patch Available  (was: Open)

> Clients get SESSION MOVED exception when switching from follower to a leader.
> -
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
>
> When a client is connected to a follower, gets disconnected, and then connects to 
> the leader, it gets a SESSION MOVED exception. This is because of a bug in the new 
> feature of ZOOKEEPER-417 that we added in 3.2. All the releases before 3.2 DO 
> NOT have this problem. The fix is to make sure the ownership of a connection 
> gets changed when a session moves from a follower to the leader. The workaround 
> in 3.2.0 would be to switch off connections from clients to the leader; 
> take a look at the *leaderServes* Java property in 
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.
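The fix described above (move ownership of the session when the client reconnects to a different server) can be illustrated with a small owner table. This is a hypothetical sketch loosely following the description, not the code in ZOOKEEPER-484.patch: `SessionOwners`, `setOwner`, and `checkSession` are illustrative names, and the owner is an opaque object standing in for the server-side connection.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of per-session ownership; names are illustrative only.
final class SessionOwners {
    static final class SessionMovedException extends RuntimeException {
        SessionMovedException(long sessionId) {
            super("session moved: 0x" + Long.toHexString(sessionId));
        }
    }

    // sessionId -> object representing the connection that currently owns it
    private final Map<Long, Object> owners = new ConcurrentHashMap<>();

    // Called when a client (re)connects: the new connection becomes the owner.
    // In this sketch, skipping this on a follower-to-leader move reproduces the bug.
    void setOwner(long sessionId, Object owner) {
        owners.put(sessionId, owner);
    }

    // Rejects requests arriving through a connection that does not own the session.
    void checkSession(long sessionId, Object owner) {
        Object current = owners.get(sessionId);
        if (current != null && current != owner) {
            throw new SessionMovedException(sessionId);
        }
    }
}
```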

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-484:


Attachment: ZOOKEEPER-484.patch

This patch fixes the issue by assigning the right owner when a session moves from 
a follower to the leader. It also updates the tests to check for this; the tests 
fail without the patch.



Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi Todd,
 Can you attach the files to the jira? I will take a look at this and will
get back to you by end of day today.

Thanks
mahadev


On 8/4/09 4:56 PM, "Todd Greenwood"  wrote:

> Looks like we're not getting *any* leader elected now. Logs attached.

RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Patrick, thanks! I'll forward on to IT and I'll report back to you
shortly...


[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-04 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-498:
---

Fix Version/s: 3.3.0
 Assignee: Patrick Hunt

> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> PODs contain only followers (weight = 0)
>Reporter: Todd Greenwood-Geer
>Assignee: Patrick Hunt
>Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.




Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt
Todd, Mahadev and I looked at this and it turns out to be a regression. 
Ironically a patch I created for 3.2 branch to add quorum tests actually 
broke the quorum config -- a default value for a config parameter was 
lost. I'm going to submit a patch asap to get the default back, but for 
the time being you can set:


electionAlg=3

in each of your config files.

You should see reference to FastLeaderElection in your log files if this 
parameter is set correctly.


Sorry for the trouble,

Patrick
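Since zoo.cfg uses Java properties syntax (key=value lines), a quick sanity check that the workaround line is present on every server might look like the following. `ElectionAlgCheck` is a hypothetical helper; it only confirms that `electionAlg` is explicitly set to 3, and the log files mentioning FastLeaderElection, as noted above, remain the real confirmation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: does this zoo.cfg text explicitly select electionAlg=3?
final class ElectionAlgCheck {
    private ElectionAlgCheck() {}

    static boolean usesFastLeaderElection(String zooCfgText) {
        Properties cfg = new Properties();
        try {
            cfg.load(new StringReader(zooCfgText));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory string
        }
        // A missing key counts as "not set", because the default is what regressed.
        return "3".equals(cfg.getProperty("electionAlg", "").trim());
    }
}
```

In practice you would read each server's zoo.cfg (for example with `Files.readString`) and run the check on its contents.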

Todd Greenwood wrote:

Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, i.e. we get continuous leader elections that
disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)

  ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

  ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
  (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not leading and
  add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
  mahadev)

What can I do to assist you with this issue?

-Todd


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 12:43 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi todd,
 comments in line


On 8/4/09 12:38 PM, "Todd Greenwood" 

wrote:

Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
calling this 3.2.0. Should this be rev'd, and am I correct in calling
this release 3.2.1?

Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
release.


2. Build targets

The package target fails b/c the create-cppunit-configure target fails
due to various problems w/ respect to autoconf. Are these dependencies
documented somewhere ? I'd like to have a fully building system.

create-cppunit-configure:
 [exec] Can't exec "libtoolize": No such file or directory at
/usr/bin/autoreconf line 188.
 [exec] Use of uninitialized value $libtoolize in pattern match
(m//) at /usr/bin/autoreconf line 188.
 [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
in library
 [exec] configure.ac:33: error: possibly undefined macro:
AM_PATH_CPPUNIT
 [exec]   If this token and others are legitimate, please use
m4_pattern_allow.
 [exec]   See the Autoconf documentation.
 [exec] configure.ac:53: error: possibly undefined macro:
AC_PROG_LIBTOOL
 [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1


You need auto tools to run this. Please read the README for building c
client library at src/c/ for the installation requirements.

3. Sync failure:

This is still failing.

svn: URL
'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
doesn't exist


Yes this hasn't been fixed yet!

Thanks
mahadev

-Todd


-Original Message-
From: Todd Greenwood
Sent: Tuesday, August 04, 2009 11:26 AM
To: 'zookeeper-u...@hadoop.apache.org'
Subject: RE: Unending Leader Elections in WAN deploy

Great news. Thank you Mahadev. I'll report our findings later today.

-Todd


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 11:20 AM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
 I just committed 480 and 491. You can checkout the 3.2 branch now.

Thanks
mahadev


On 8/3/09 4:29 PM, "Todd Greenwood" wrote:

That'd be perfect. Thanks!


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Monday, August 03, 2009 4:24 PM
To

[jira] Updated: (ZOOKEEPER-493) patch for command line setquota

2009-08-04 Thread Patrick Hunt (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-493:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1, thanks Steve! Applied to 3.2.1 and 3.3

> patch for command line setquota 
> 
>
> Key: ZOOKEEPER-493
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
> Project: Zookeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.2.0
>Reporter: steve bendiola
>Assignee: steve bendiola
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: quotafix.patch, ZOOKEEPER-493.patch
>
>
> the command line "setquota" tries to use argument 3 as both a path and a value
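The one-line description above is all the JIRA says about the bug. A minimal sketch of argument handling that keeps the value and the path in separate arguments follows, assuming the shell syntax `setquota -n|-b val path` and a command line split on spaces. The class is hypothetical and is neither the attached patch nor the ZooKeeperMain code.

```java
// Hypothetical sketch of parsing "setquota -n|-b val path" so that the value
// and the path are read from different arguments. Not the ZooKeeperMain code.
final class SetQuotaArgs {
    final boolean byteQuota; // true for -b (bytes), false for -n (node count)
    final long value;
    final String path;

    private SetQuotaArgs(boolean byteQuota, long value, String path) {
        this.byteQuota = byteQuota;
        this.value = value;
        this.path = path;
    }

    /** Expects args = {"setquota", "-n" or "-b", value, path}. */
    static SetQuotaArgs parse(String[] args) {
        if (args.length != 4 || !"setquota".equals(args[0])) {
            throw new IllegalArgumentException("usage: setquota -n|-b val path");
        }
        boolean bytes;
        if ("-b".equals(args[1])) {
            bytes = true;
        } else if ("-n".equals(args[1])) {
            bytes = false;
        } else {
            throw new IllegalArgumentException("expected -n or -b, got " + args[1]);
        }
        long value = Long.parseLong(args[2]); // the value is at index 2
        String path = args[3];                // the path is at index 3, never reused as the value
        return new SetQuotaArgs(bytes, value, path);
    }

    public static void main(String[] args) {
        SetQuotaArgs q = parse("setquota -n 10 /app".split(" "));
        System.out.println((q.byteQuota ? "bytes" : "nodes") + " quota " + q.value + " on " + q.path);
    }
}
```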

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-493) patch for command line setquota

2009-08-04 Thread Patrick Hunt (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-493:
---

Attachment: ZOOKEEPER-493.patch

updated patch to clean up a bit in addition to the fix.

ZOOKEEPER-493.patch supersedes previous patch (fixed naming of patch file)





[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes

2009-08-04 Thread Patrick Hunt (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
---

Fix Version/s: (was: 3.2.1)

> need ops documentation that details supervision of ZK server processes
> --
>
> Key: ZOOKEEPER-485
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
> Project: Zookeeper
>  Issue Type: Bug
>  Components: documentation, server
>Reporter: Patrick Hunt
> Fix For: 3.3.0
>
>
> We need ops documentation detailing what to do if the ZK server VM fails - by
> fail I mean the jvm process exits/dies/crashes/etc...
> In general a supervisor process should be used to start/stop/restart/etc...
> the ZK server vm.
> Something like daemontools http://cr.yp.to/daemontools.html could be used, or
> more simply a wrapper script should monitor the status of the pid and restart
> the jvm if it fails. It's up to the operator; if this is not done
> automatically then it will have to be done manually, by the operator
> restarting the ZK server jvm.
> The inherent behavior of ZK wrt failures - ie that it automatically
> recovers as long as quorum is maintained - fits into this nicely.
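The "wrapper script that restarts the jvm" option can be sketched in Java as well. This is an illustration under stated assumptions, not ZooKeeper tooling: the Launcher interface, the restart cap and the back-off delay are hypothetical, and the supervised command is whatever is passed on the command line.

```java
// Sketch of the "wrapper that restarts the JVM" idea from ZOOKEEPER-485.
// The Launcher interface, the restart cap and the back-off are assumptions;
// daemontools or a similar tool is the more robust option.
final class Supervisor {

    /** Starts the supervised process once and blocks until it exits. */
    interface Launcher {
        int runUntilExit() throws Exception;
    }

    /**
     * Runs the launcher, restarting after each non-zero exit. Stops after a
     * clean exit (code 0) or once maxRestarts restarts have been used.
     * Returns the total number of starts.
     */
    static int supervise(Launcher launcher, int maxRestarts, long backoffMillis) throws Exception {
        int starts = 0;
        while (true) {
            starts++;
            int code = launcher.runUntilExit();
            if (code == 0 || starts > maxRestarts) {
                return starts;
            }
            System.err.println("process exited with code " + code + ", restarting");
            Thread.sleep(backoffMillis);
        }
    }

    public static void main(final String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java Supervisor <command> [args...]");
            return;
        }
        Launcher launcher = new Launcher() {
            public int runUntilExit() throws Exception {
                // inheritIO() (Java 7+) lets the child write straight to this console.
                return new ProcessBuilder(args).inheritIO().start().waitFor();
            }
        };
        System.out.println("started " + supervise(launcher, 5, 1000L) + " time(s)");
    }
}
```

Treating exit code 0 as a deliberate stop is a design choice; a supervisor in the daemontools style would restart unconditionally and leave stopping to the operator.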




[jira] Assigned: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete

2009-08-04 Thread Patrick Hunt (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-490:
--

Assignee: Patrick Hunt

> the java docs for session creation are misleading/incomplete
> 
>
> Key: ZOOKEEPER-490
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Patrick Hunt
>Assignee: Patrick Hunt
> Fix For: 3.2.1, 3.3.0
>
>
> the javadoc for ZooKeeper constructor says:
>  * The client object will pick an arbitrary server and try to connect to it.
>  * If failed, it will try the next one in the list, until a connection is
>  * established, or all the servers have been tried.
> the "or all server tried" phrase is misleading; it should indicate that we
> retry until success, connection closed, or session expired.
> we also need to mention that connection is async, that the constructor returns
> immediately and you need to look for the connection event in the watcher.
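The point about asynchronous connection is easy to miss. With the real Java client, the usual pattern is for the Watcher passed to the constructor to count down a latch when it receives a WatchedEvent whose state is Watcher.Event.KeeperState.SyncConnected, and for the caller to wait on that latch before issuing requests. So that the sketch runs without a server, it swaps the ZooKeeper client for a hypothetical FakeClient whose constructor returns at once and reports the connection later from a background thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustration of "the constructor returns immediately; wait for the connected
// event". FakeClient and ConnectionListener are hypothetical stand-ins, not
// part of the ZooKeeper API.
final class AsyncConnectDemo {

    interface ConnectionListener {
        void onConnected();
    }

    /** Returns from its constructor at once and connects later on another thread. */
    static final class FakeClient {
        FakeClient(final ConnectionListener listener, final long connectDelayMillis) {
            Thread connector = new Thread(new Runnable() {
                public void run() {
                    try {
                        Thread.sleep(connectDelayMillis); // simulated network handshake
                    } catch (InterruptedException e) {
                        return;
                    }
                    listener.onConnected();
                }
            });
            connector.setDaemon(true);
            connector.start();
        }
    }

    /** Creates a client and waits up to timeoutMillis for its connected callback. */
    static boolean connectAndWait(long connectDelayMillis, long timeoutMillis) throws InterruptedException {
        final CountDownLatch connected = new CountDownLatch(1);
        new FakeClient(new ConnectionListener() {
            public void onConnected() {
                connected.countDown();
            }
        }, connectDelayMillis);
        // The constructor has already returned here, but the session may not be usable yet.
        return connected.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("connected in time: " + connectAndWait(50L, 2000L));
        System.out.println("connected in time: " + connectAndWait(2000L, 50L));
    }
}
```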




[jira] Updated: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line

2009-08-04 Thread Patrick Hunt (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-447:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1, thanks Henry! Committed to 3.2.1 and 3.3

> zkServer.sh doesn't allow different config files to be specified on the command line
> 
>
> Key: ZOOKEEPER-447
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
> Project: Zookeeper
>  Issue Type: Improvement
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Henry Robinson
>Assignee: Henry Robinson
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-447.patch
>
>
> Unless I'm missing something, you can change the directory that the zoo.cfg 
> file is in by setting ZOOCFGDIR but not the name of the file itself.
> I find it convenient myself to specify the config file on the command line, 
> but we should also let it be specified by environment variable.
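The request is to make the config file name, not just ZOOCFGDIR, selectable from the command line or the environment. The lookup order below (explicit argument, then an environment variable naming the file, then zoo.cfg) is one possible reading, sketched in Java. The variable name ZOOCFG and the fallback directory "conf" are assumptions, not necessarily what the committed zkServer.sh change uses.

```java
import java.io.File;
import java.util.Map;

// Hypothetical config-file lookup order: explicit argument, then an
// environment variable naming the file (ZOOCFG, assumed), then zoo.cfg.
// Relative names are resolved against ZOOCFGDIR, or "conf" if it is unset.
final class ConfigPath {

    static String resolve(String[] args, Map<String, String> env) {
        if (args.length > 0 && args[0].length() > 0) {
            return args[0]; // the command line always wins
        }
        String file = env.get("ZOOCFG");
        if (file == null || file.length() == 0) {
            file = "zoo.cfg";
        }
        if (new File(file).isAbsolute()) {
            return file;
        }
        String dir = env.get("ZOOCFGDIR");
        if (dir == null || dir.length() == 0) {
            dir = "conf";
        }
        return new File(dir, file).getPath();
    }

    public static void main(String[] args) {
        System.out.println(resolve(args, System.getenv()));
    }
}
```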




[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-04 Thread Todd Greenwood-Geer (JIRA)

 [ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Greenwood-Geer updated ZOOKEEPER-498:
--

Attachment: zoo.cfg
pod-zook-logs-01.tar.gz
dc-zook-logs-01.tar.gz

Zookeeper Logs and configuration files:

dc1-zook01.log
dc1-zook02.log
dc1-zook03.log
dc1-zook04.log
dc1-zook05.log
pd1-zook01.log
pd1-zook02.log
pd4-zook01.log
pd4-zook02.log
zoo.cfg


> Unending Leader Elections : WAN configuration
> -
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.2.0
> Environment: Each machine:
> CentOS 5.2 64-bit
> 2GB ram
> java version "1.6.0_13"
> Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
> Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 
> Network Topology:
> DC : central data center
> POD(N): remote data center
> Zookeeper Topology:
> Leaders may be elected only in DC (weight = 1)
> Only followers are elected in PODS (weight = 0)
>Reporter: Todd Greenwood-Geer
>Priority: Critical
> Fix For: 3.2.1
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
> central DC group of ZK servers that have a voting weight = 1, and a group of 
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to 
> contain only followers. What we are seeing is a continuous cycling of 
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended 
> patches (473, 479, 481, 491), and now release 3.2.1.




[jira] Created: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration

2009-08-04 Thread Todd Greenwood-Geer (JIRA)
Unending Leader Elections : WAN configuration
-

 Key: ZOOKEEPER-498
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
 Project: Zookeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.2.0
 Environment: Each machine:

CentOS 5.2 64-bit
2GB ram
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed 

Network Topology:
DC : central data center
POD(N): remote data center

Zookeeper Topology:
Leaders may be elected only in DC (weight = 1)
Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Priority: Critical
 Fix For: 3.2.1


In a WAN configuration, ZooKeeper is endlessly electing, terminating, and 
re-electing a ZooKeeper leader. The WAN configuration involves two groups, a 
central DC group of ZK servers that have a voting weight = 1, and a group of 
servers in remote pods with a voting weight of 0.

What we expect to see is leaders elected only in the DC, and the pods to 
contain only followers. What we are seeing is a continuous cycling of leaders. 
We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 
479, 481, 491), and now release 3.2.1.
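The setup described above (a weighted DC group plus zero-weight pod groups) relies on hierarchical quorums; see also ZOOKEEPER-479 and ZOOKEEPER-491 in the CHANGES list quoted in this thread. A simplified sketch of such a quorum test follows, assuming this rule: a set of servers is a quorum if, in a majority of the groups that carry non-zero weight, the acknowledging servers hold more than half of that group's weight. It is written for illustration and is not the QuorumHierarchical class. The layout in main() only loosely mirrors the attached log names (five dc1 servers, two pd1 and two pd4); the ids and group numbers are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified hierarchical (grouped, weighted) quorum check; illustration only.
final class HierarchicalQuorum {
    private final Map<Long, Long> serverGroup = new HashMap<Long, Long>();
    private final Map<Long, Long> serverWeight = new HashMap<Long, Long>();
    private final Map<Long, Long> groupWeight = new HashMap<Long, Long>();

    void addServer(long sid, long gid, long weight) {
        serverGroup.put(sid, gid);
        serverWeight.put(sid, weight);
        Long total = groupWeight.get(gid);
        groupWeight.put(gid, (total == null ? 0L : total) + weight);
    }

    /** True if the acknowledging servers hold a weight majority in a majority of weighted groups. */
    boolean containsQuorum(Set<Long> acks) {
        Map<Long, Long> ackedWeight = new HashMap<Long, Long>();
        for (Long sid : acks) {
            Long gid = serverGroup.get(sid);
            if (gid == null) {
                continue; // unknown server: contributes nothing
            }
            Long sum = ackedWeight.get(gid);
            ackedWeight.put(gid, (sum == null ? 0L : sum) + serverWeight.get(sid));
        }
        int weightedGroups = 0;
        int groupsWithMajority = 0;
        for (Map.Entry<Long, Long> e : groupWeight.entrySet()) {
            long total = e.getValue();
            if (total == 0L) {
                continue; // zero-weight groups (the pods) are not counted at all
            }
            weightedGroups++;
            Long acked = ackedWeight.get(e.getKey());
            if (acked != null && 2 * acked > total) {
                groupsWithMajority++;
            }
        }
        return weightedGroups > 0 && 2 * groupsWithMajority > weightedGroups;
    }

    public static void main(String[] args) {
        HierarchicalQuorum q = new HierarchicalQuorum();
        for (long sid = 1; sid <= 5; sid++) {
            q.addServer(sid, 1, 1); // five DC servers, weight 1, group 1
        }
        q.addServer(6, 2, 0); q.addServer(7, 2, 0); // pod group 2, weight 0
        q.addServer(8, 3, 0); q.addServer(9, 3, 0); // pod group 3, weight 0
        System.out.println(q.containsQuorum(new HashSet<Long>(Arrays.asList(1L, 2L, 3L))));
        System.out.println(q.containsQuorum(new HashSet<Long>(Arrays.asList(1L, 2L, 6L, 7L, 8L, 9L))));
    }
}
```

Under this rule the pods can never contribute to a quorum, so any leader must be backed by a majority of the DC servers, which matches the expectation stated in the report.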




[jira] Created: (ZOOKEEPER-497) api and forrest docs should mention if classes are thread safe

2009-08-04 Thread Patrick Hunt (JIRA)
api and forrest docs should mention if classes are thread safe
--

 Key: ZOOKEEPER-497
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-497
 Project: Zookeeper
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Priority: Minor
 Fix For: 3.3.0


the api (c/java clients) and the forrest docs should talk about thread safety -
in particular we don't mention that the ZooKeeper class is thread safe (etc...).
Docs should be updated.





RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Will do.

> -Original Message-
> From: Patrick Hunt [mailto:ph...@apache.org]
> Sent: Tuesday, August 04, 2009 1:34 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
> 
> It would be better to create a JIRA with configs as well as logs.
> 
> Patrick
> 
> Mahadev Konar wrote:
> > Hi Todd,
> >
> >   What is the synclimit you are using? Can you post your config? For WAN's
> > you will have to use much bigger values for synclimit and others.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/4/09 1:24 PM, "Todd Greenwood" wrote:
> >
> >> Mahadev,
> >>
> >> I just heard from IT that this build behaves in exactly the same way as
> >> previous versions, e.g. we get continuous leader elections that
> >> disconnect the followers and then get re-elected, and disconnect...etc.
> >>
> >> This is from a fresh sync to the 3.2 branch:
> >>
> >> svn co
> >> http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
> >> ./branch-3.2
> >>
> >> CHANGES.txt shows the various fixes included:
> >>
> >> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
> >> Release 3.2.1
> >>
> >> Backward compatibile changes:
> >>
> >> BUGFIXES:
> >>   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
> >>
> >>   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
> >>
> >>   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
> >>
> >>   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
> >>
> >>   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
> >>   (giri via mahadev)
> >>
> >>   ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)
> >>
> >>   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
> >>   failure. (chris via mahadev)
> >>
> >>   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
> >>
> >>   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
> >>   embedded clients (ryan rawson via phunt)
> >>
> >>   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
> >>
> >>   ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
> >>   (flavio via mahadev)
> >>
> >>   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
> >>   (Chris Darroch via phunt)
> >>
> >>   ZOOKEEPER-480. FLE should perform leader check when node is not leading and
> >>   add vote of follower (flavio via mahadev)
> >>
> >>   ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
> >>   mahadev)
> >>
> >> What can I do to assist you with this issue?
> >>
> >> -Todd
> >>
> >>> -Original Message-
> >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>> Sent: Tuesday, August 04, 2009 12:43 PM
> >>> To: zookeeper-dev@hadoop.apache.org
> >>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>
> >>> Hi todd,
> >>>  comments in line
> >>>
> >>>
> >>> On 8/4/09 12:38 PM, "Todd Greenwood" wrote:
> >>>> Mahadev,
> >>>>
> >>>> Some quick questions:
> >>>>
> >>>> 1. Version
> >>>>
> >>>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> >>>> calling this 3.2.0. Should this be rev'd, and am I correct in calling
> >>>> this release 3.2.1?
> >>> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
> >>> release.
> >>>
> >>>> 2. Build targets
> >>>>
> >>>> The package target fails b/c the create-cppunit-configure target fails
> >>>> due to various problems w/ respect to autoconf. Are these dependencies
> >>>> documented somewhere ? I'd like to have a fully building system.
> >>>>
> >>>> create-cppunit-configure:
> >>>>  [exec] Can't exec "libtoolize": No such file or directory at
> >>>> /usr/bin/autoreconf line 188.
> >>>>  [exec] Use of uninitialized value $libtoolize in pattern match
> >>>> (m//) at /usr/bin/autoreconf line 188.
> >>>>  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
> >>>> in library
> >>>>  [exec] configure.ac:33: error: possibly undefined macro:
> >>>> AM_PATH_CPPUNIT
> >>>>  [exec]   If this token and others are legitimate, please use
> >>>> m4_pattern_allow.
> >>>>  [exec]   See the Autoconf documentation.
> >>>>  [exec] configure.ac:53: error: possibly undefined macro:
> >>>> AC_PROG_LIBTOOL
> >>>>  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
> >>>>
> >>> You need auto tools to run this. Please read the README for building c
> >>> client library at src/c/ for the installation requirements.
> >>>> 3. Sync failure:
> >>>>
> >>>> This is still failing.
> >>>>
> >>>> svn: URL
> >>>> 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
> >>>> doesn't exist
> >>>>
> >>> Yes this hasn't been fixed yet!
> >>>
> >>> Thanks
>

Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Patrick Hunt

It would be better to create a JIRA with configs as well as logs.

Patrick

Mahadev Konar wrote:

Hi Todd,

  What is the synclimit you are using? Can you post your config? For WAN's
you will have to use much bigger values for synclimit and others.

Thanks
mahadev


On 8/4/09 1:24 PM, "Todd Greenwood"  wrote:


Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, e.g. we get continuous leader elections that
disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co
http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)

  ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

  ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
  (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not leading and
  add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
  mahadev)

What can I do to assist you with this issue?

-Todd


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 12:43 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi todd,
 comments in line


On 8/4/09 12:38 PM, "Todd Greenwood" wrote:

Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
calling this 3.2.0. Should this be rev'd, and am I correct in calling
this release 3.2.1?

Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
release.


2. Build targets

The package target fails b/c the create-cppunit-configure target fails
due to various problems w/ respect to autoconf. Are these dependencies
documented somewhere ? I'd like to have a fully building system.

create-cppunit-configure:
 [exec] Can't exec "libtoolize": No such file or directory at
/usr/bin/autoreconf line 188.
 [exec] Use of uninitialized value $libtoolize in pattern match
(m//) at /usr/bin/autoreconf line 188.
 [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
in library
 [exec] configure.ac:33: error: possibly undefined macro:
AM_PATH_CPPUNIT
 [exec]   If this token and others are legitimate, please use
m4_pattern_allow.
 [exec]   See the Autoconf documentation.
 [exec] configure.ac:53: error: possibly undefined macro:
AC_PROG_LIBTOOL
 [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1


You need auto tools to run this. Please read the README for building c
client library at src/c/ for the installation requirements.

3. Sync failure:

This is still failing.

svn: URL
'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
doesn't exist


Yes this hasn't been fixed yet!

Thanks
mahadev

-Todd


-Original Message-
From: Todd Greenwood
Sent: Tuesday, August 04, 2009 11:26 AM
To: 'zookeeper-u...@hadoop.apache.org'
Subject: RE: Unending Leader Elections in WAN deploy

Great news. Thank you Mahadev. I'll report our findings later today.

-Todd


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Tuesday, August 04, 2009 11:20 AM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
 I just committed 480 and 491. You can checkout the 3.2 branch now.

Thanks
mahadev


On 8/3/09 4:29 PM, "Todd Greenwood" wrote:

That'd be perfect. Thanks!


-Original Message-
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Monday, August 03, 2009 4:24 PM
To: zookeeper-u...@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Hi Todd,
  Most of the patches that you mention should be in the branch 3.2 by
tomorrow or so. 481, 479 are already in. 480 and

Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi Todd,

  What is the synclimit you are using? Can you post your config? For WAN's
you will have to use much bigger values for synclimit and others.
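To make the syncLimit advice concrete: in zoo.cfg, tickTime is given in milliseconds, while initLimit and syncLimit are counted in ticks, so the time a follower may lag the leader is roughly syncLimit * tickTime. The sketch below sizes a limit from a worst-case round-trip time and a safety margin. The 1500 ms RTT, the factor of 5 and initLimit=10 are placeholders rather than measurements from this deployment, and the sizing rule itself is only a rule of thumb assumed here.

```java
// Back-of-the-envelope sizing of ZooKeeper timing settings for a WAN.
// tickTime is in milliseconds; initLimit and syncLimit are counted in ticks.
// The RTT, safety factor and initLimit used in main() are placeholders.
final class WanTimeouts {

    /** Milliseconds a follower may fall behind the leader before it is dropped. */
    static long syncWindowMillis(long tickTime, int syncLimit) {
        return tickTime * syncLimit;
    }

    /** Milliseconds followers get to connect to and sync with a new leader. */
    static long initWindowMillis(long tickTime, int initLimit) {
        return tickTime * initLimit;
    }

    /** Smallest limit (in ticks) whose window covers worstRttMillis * safetyFactor. */
    static int minLimitTicks(long tickTime, long worstRttMillis, int safetyFactor) {
        long needed = worstRttMillis * safetyFactor;
        return (int) ((needed + tickTime - 1) / tickTime); // ceiling division
    }

    public static void main(String[] args) {
        long tickTime = 2000L;  // hypothetical zoo.cfg value
        long worstRtt = 1500L;  // hypothetical worst DC <-> pod round trip, ms
        int syncLimit = minLimitTicks(tickTime, worstRtt, 5);
        int initLimit = 10;     // hypothetical
        System.out.println("syncLimit=" + syncLimit + " (" + syncWindowMillis(tickTime, syncLimit) + " ms)");
        System.out.println("initLimit=" + initLimit + " (" + initWindowMillis(tickTime, initLimit) + " ms)");
    }
}
```

Measuring the actual round-trip times between the DC and each pod, rather than guessing, is what makes this kind of estimate useful.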

Thanks
mahadev


On 8/4/09 1:24 PM, "Todd Greenwood"  wrote:

> Mahadev,
> 
> I just heard from IT that this build behaves in exactly the same way as
> previous versions, e.g. we get continuous leader elections that
> disconnect the followers and then get re-elected, and disconnect...etc.
> 
> This is from a fresh sync to the 3.2 branch:
> 
> svn co
> http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
> ./branch-3.2
> 
> CHANGES.txt shows the various fixes included:
>
> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
> Release 3.2.1
>
> Backward compatibile changes:
>
> BUGFIXES:
>   ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
>
>   ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
>
>   ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
>
>   ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
>
>   ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
>   (giri via mahadev)
>
>   ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)
>
>   ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
>   failure. (chris via mahadev)
>
>   ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
>
>   ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
>   embedded clients (ryan rawson via phunt)
>
>   ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
>
>   ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
>   (flavio via mahadev)
>
>   ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
>   (Chris Darroch via phunt)
>
>   ZOOKEEPER-480. FLE should perform leader check when node is not leading and
>   add vote of follower (flavio via mahadev)
>
>   ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
>   mahadev)
> 
> What can I do to assist you with this issue?
> 
> -Todd
> 
>> -Original Message-
>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>> Sent: Tuesday, August 04, 2009 12:43 PM
>> To: zookeeper-dev@hadoop.apache.org
>> Subject: Re: Unending Leader Elections in WAN deploy
>> 
>> Hi todd,
>>  comments in line
>> 
>> 
>> On 8/4/09 12:38 PM, "Todd Greenwood" wrote:
>> 
>>> Mahadev,
>>> 
>>> Some quick questions:
>>> 
>>> 1. Version
>>> 
>>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
>>> calling this 3.2.0. Should this be rev'd, and am I correct in calling
>>> this release 3.2.1?
>> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
>> release.
>> 
>>> 
>>> 2. Build targets
>>> 
>>> The package target fails b/c the create-cppunit-configure target fails
>>> due to various problems w/ respect to autoconf. Are these dependencies
>>> documented somewhere ? I'd like to have a fully building system.
>>> 
>>> create-cppunit-configure:
>>>  [exec] Can't exec "libtoolize": No such file or directory at
>>> /usr/bin/autoreconf line 188.
>>>  [exec] Use of uninitialized value $libtoolize in pattern match
>>> (m//) at /usr/bin/autoreconf line 188.
>>>  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
>>> in library
>>>  [exec] configure.ac:33: error: possibly undefined macro:
>>> AM_PATH_CPPUNIT
>>>  [exec]   If this token and others are legitimate, please use
>>> m4_pattern_allow.
>>>  [exec]   See the Autoconf documentation.
>>>  [exec] configure.ac:53: error: possibly undefined macro:
>>> AC_PROG_LIBTOOL
>>>  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
>>> 
>> You need auto tools to run this. Please read the README for building c
>> client library at src/c/ for the installation requirements.
>>> 
>>> 3. Sync failure:
>>> 
>>> This is still failing.
>>> 
>>> svn: URL
>>> 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
>>> doesn't exist
>>> 
>> 
>> Yes this hasn't been fixed yet!
>> 
>> Thanks
>> mahadev
>>> -Todd
>>> 
>>>> -Original Message-
>>>> From: Todd Greenwood
>>>> Sent: Tuesday, August 04, 2009 11:26 AM
>>>> To: 'zookeeper-u...@hadoop.apache.org'
>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>
>>>> Great news. Thank you Mahadev. I'll report our findings later today.
>>>> -Todd
>>>>
>>>>> -Original Message-
>>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>>>>> Sent: Tuesday, August 04, 2009 11:20 AM
>>>>> To: zookeeper-u...@hadoop.apache.org
>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>
>>>>> Hi Todd,
>>>>>  I just committed 480 and 491. You can checkout the 3.2 branch now.
>>>>>
>>>>> Thanks
>>>>> mahadev
>>>>>
>>>>>
>>>>> On 8/

RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Mahadev,

I just heard from IT that this build behaves in exactly the same way as
previous versions, e.g. we get continuous leader elections that
disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co
http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2
./branch-3.2

CHANGES.txt shows the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure)
  (giri via mahadev)

  ZOOKEEPER-467.  Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate
  failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other)
  embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

  ZOOKEEPER-479.  QuorumHierarchical does not count groups correctly
  (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert
  (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not leading and
  add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via
  mahadev)

What can I do to assist you with this issue?

-Todd

> -Original Message-
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Tuesday, August 04, 2009 12:43 PM
> To: zookeeper-dev@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
> 
> Hi todd,
>  comments in line
> 
> 
> On 8/4/09 12:38 PM, "Todd Greenwood" wrote:
> 
> > Mahadev,
> >
> > Some quick questions:
> >
> > 1. Version
> >
> > I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> > calling this 3.2.0. Should this be rev'd, and am I correct in calling
> > this release 3.2.1?
> Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the
> release.
> 
> >
> > 2. Build targets
> >
> > The package target fails b/c the create-cppunit-configure target fails
> > due to various problems w/ respect to autoconf. Are these dependencies
> > documented somewhere ? I'd like to have a fully building system.
> >
> > create-cppunit-configure:
> >  [exec] Can't exec "libtoolize": No such file or directory at
> > /usr/bin/autoreconf line 188.
> >  [exec] Use of uninitialized value $libtoolize in pattern match
> > (m//) at /usr/bin/autoreconf line 188.
> >  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
> > in library
> >  [exec] configure.ac:33: error: possibly undefined macro:
> > AM_PATH_CPPUNIT
> >  [exec]   If this token and others are legitimate, please use
> > m4_pattern_allow.
> >  [exec]   See the Autoconf documentation.
> >  [exec] configure.ac:53: error: possibly undefined macro:
> > AC_PROG_LIBTOOL
> >  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
> >
> You need auto tools to run this. Please read the README for building c
> client library at src/c/ for the installation requirements.
> >
> > 3. Sync failure:
> >
> > This is still failing.
> >
> > svn: URL
> > 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
> > doesn't exist
> >
> 
> Yes this hasn't been fixed yet!
> 
> Thanks
> mahadev
> > -Todd
> >
> >> -Original Message-
> >> From: Todd Greenwood
> >> Sent: Tuesday, August 04, 2009 11:26 AM
> >> To: 'zookeeper-u...@hadoop.apache.org'
> >> Subject: RE: Unending Leader Elections in WAN deploy
> >>
> >> Great news. Thank you Mahadev. I'll report our findings later today.
> >> -Todd
> >>
> >>> -Original Message-
> >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>> Sent: Tuesday, August 04, 2009 11:20 AM
> >>> To: zookeeper-u...@hadoop.apache.org
> >>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>
> >>> Hi Todd,
> >>>  I just committed 480 and 491. You can checkout the 3.2 branch now.
> >>>
> >>> Thanks
> >>> mahadev
> >>>
> >>>
> >>> On 8/3/09 4:29 PM, "Todd Greenwood" wrote:
> >>>
> >>>> That'd be perfect. Thanks!
> >>>>
> >>>>> -Original Message-
> >>>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> >>>>> Sent: Monday, August 03, 2009 4:24 PM
> >>>>> To: zookeeper-u...@hadoop.apache.org
> >>>>> Subject: Re: Unending Leader Elections in WAN deploy
> >>>>>
> >>>>> Hi Todd,
> >>>>>   Most of the patches that you mention shou

Re: Unending Leader Elections in WAN deploy

2009-08-04 Thread Mahadev Konar
Hi todd, 
 comments in line


On 8/4/09 12:38 PM, "Todd Greenwood"  wrote:

> Mahadev,
> 
> Some quick questions:
> 
> 1. Version
> 
> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> calling this 3.2.0. Should this be rev'd, and am I correct in calling
> this release 3.2.1?
Yes, the release is 3.2.1. The build.xml will be fixed as soon as we tag the
release.

> 
> 2. Build targets
> 
> The package target fails b/c the create-cppunit-configure target fails
> due to various problems w/ respect to autoconf. Are these dependencies
> documented somewhere ? I'd like to have a fully building system.
> 
> create-cppunit-configure:
>  [exec] Can't exec "libtoolize": No such file or directory at
> /usr/bin/autoreconf line 188.
>  [exec] Use of uninitialized value $libtoolize in pattern match
> (m//) at /usr/bin/autoreconf line 188.
>  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
> in library
>  [exec] configure.ac:33: error: possibly undefined macro:
> AM_PATH_CPPUNIT
>  [exec]   If this token and others are legitimate, please use
> m4_pattern_allow.
>  [exec]   See the Autoconf documentation.
>  [exec] configure.ac:53: error: possibly undefined macro:
> AC_PROG_LIBTOOL
>  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
> 
You need autotools to run this. Please read the README for building the C
client library at src/c/ for the installation requirements.
> 
> 3. Sync failure:
> 
> This is still failing.
> 
> svn: URL
> 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
> doesn't exist
> 

Yes, this hasn't been fixed yet!

Thanks
mahadev
> -Todd
> 
>> -Original Message-
>> From: Todd Greenwood
>> Sent: Tuesday, August 04, 2009 11:26 AM
>> To: 'zookeeper-u...@hadoop.apache.org'
>> Subject: RE: Unending Leader Elections in WAN deploy
>> 
>> Great news. Thank you Mahadev. I'll report our findings later today.
>> -Todd
>> 
>>> -Original Message-
>>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
>>> Sent: Tuesday, August 04, 2009 11:20 AM
>>> To: zookeeper-u...@hadoop.apache.org
>>> Subject: Re: Unending Leader Elections in WAN deploy
>>> 
>>> Hi Todd,
>>>  I just committed 480 and 491. You can checkout the 3.2 branch now.
>>> 
>>> Thanks
>>> mahadev
>>> 
>>> 
>>> On 8/3/09 4:29 PM, "Todd Greenwood" wrote:
>>> 
 That'd be perfect. Thanks!
 
> -Original Message-
> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> Sent: Monday, August 03, 2009 4:24 PM
> To: zookeeper-u...@hadoop.apache.org
> Subject: Re: Unending Leader Elections in WAN deploy
> 
> Hi Todd,
>   Most of the patches that you mention should be in the branch 3.2 by
> tomm or so. 481, 479 are already in. 480 and 491 should be in by tomm.
> Would that suffice for you?
> 
> Thanks
> mahadev
> 
> 
> On 8/3/09 4:21 PM, "Todd Greenwood" wrote:
> 
>> Another problem...I've reverted to the latest versions of the patches
>> that are not specific to branch-3.2, and I'm getting two compilation
>> errors:
>> 
>> build-generated:
>> [javac] Compiling 44 source files to
>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>
>> compile-main:
>> [javac] Compiling 2 source files to
>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
>> [javac] public String[] getQuorumPeers();
>> [javac] ^
>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
>> [javac] public String getServerState();
>> [javac]   ^
>> [javac] 2 errors
>> 
>> My build process is pretty simple:
>> 
>> 1. copy the branch-3.2 source to a temp directory
>> (src/patched/branch-3.2)
>> 2. apply the ZOOKEEPER patches in my patches directory
>> 3. build zookeeper in the temp directory
>> 
>> -Todd
>>> -Original Message-
>>> From: Todd Greenwood [mailto:to...@audiencescience.com]
>>> Sent: Monday, August 03, 2009 4:09 PM
>>> To: zookeeper-u...@hadoop.apache.org
>>> Subject: RE: Unending Leader Elections in WAN deploy
>>> 
>>> Flavio,
>>> I

RE: Unending Leader Elections in WAN deploy

2009-08-04 Thread Todd Greenwood
Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
calling this 3.2.0. Should this be rev'd, and am I correct in calling
this release 3.2.1? 

2. Build targets

The package target fails b/c the create-cppunit-configure target fails
due to various problems w/ respect to autoconf. Are these dependencies
documented somewhere? I'd like to have a fully building system.

create-cppunit-configure:
 [exec] Can't exec "libtoolize": No such file or directory at
/usr/bin/autoreconf line 188.
 [exec] Use of uninitialized value $libtoolize in pattern match
(m//) at /usr/bin/autoreconf line 188.
 [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found
in library
 [exec] configure.ac:33: error: possibly undefined macro:
AM_PATH_CPPUNIT
 [exec]   If this token and others are legitimate, please use
m4_pattern_allow.
 [exec]   See the Autoconf documentation.
 [exec] configure.ac:53: error: possibly undefined macro:
AC_PROG_LIBTOOL
 [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1


3. Sync failure:

This is still failing.

svn: URL
'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
doesn't exist

-Todd

> -Original Message-
> From: Todd Greenwood
> Sent: Tuesday, August 04, 2009 11:26 AM
> To: 'zookeeper-u...@hadoop.apache.org'
> Subject: RE: Unending Leader Elections in WAN deploy
> 
> Great news. Thank you Mahadev. I'll report our findings later today.
> -Todd
> 
> > -Original Message-
> > From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> > Sent: Tuesday, August 04, 2009 11:20 AM
> > To: zookeeper-u...@hadoop.apache.org
> > Subject: Re: Unending Leader Elections in WAN deploy
> >
> > Hi Todd,
> >  I just committed 480 and 491. You can checkout the 3.2 branch now.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/3/09 4:29 PM, "Todd Greenwood" wrote:
> >
> > > That'd be perfect. Thanks!
> > >
> > >> -Original Message-
> > >> From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> > >> Sent: Monday, August 03, 2009 4:24 PM
> > >> To: zookeeper-u...@hadoop.apache.org
> > >> Subject: Re: Unending Leader Elections in WAN deploy
> > >>
> > >> Hi Todd,
> > >>   Most of the patches that you mention should be in the branch 3.2 by
> > >> tomm or so. 481, 479 are already in. 480 and 491 should be in by tomm.
> > >> Would that suffice for you?
> > >>
> > >> Thanks
> > >> mahadev
> > >>
> > >>
> > >> On 8/3/09 4:21 PM, "Todd Greenwood" wrote:
> > >>
> > >>> Another problem...I've reverted to the latest versions of the patches
> > >>> that are not specific to branch-3.2, and I'm getting two compilation
> > >>> errors:
> > >>>
> > >>> build-generated:
> > >>> [javac] Compiling 44 source files to
> > >>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> > >>>
> > >>> compile-main:
> > >>> [javac] Compiling 2 source files to
> > >>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
> > >>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
> > >>> [javac] public String[] getQuorumPeers();
> > >>> [javac] ^
> > >>> [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
> > >>> [javac] public String getServerState();
> > >>> [javac]   ^
> > >>> [javac] 2 errors
> > >>>
> > >>> My build process is pretty simple:
> > >>>
> > >>> 1. copy the branch-3.2 source to a temp directory
> > >>> (src/patched/branch-3.2)
> > >>> 2. apply the ZOOKEEPER patches in my patches directory
> > >>> 3. build zookeeper in the temp directory
> > >>>
> > >>> -Todd
> >  -Original Message-
> >  From: Todd Greenwood [mailto:to...@audiencescience.com]
> >  Sent: Monday, August 03, 2009 4:09 PM
> >  To: zookeeper-u...@hadoop.apache.org
> >  Subject: RE: Unending Leader Elections in WAN deploy
> > 
> >  Flavio,
> >  I notice that you've updated the patches referenced for the WAN
> >  deployment. There appears to be an order dependency w/ respect to
> >  these four patches...
> > 
> >  ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
> >  ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch
> > 
> >  473 -> 479 (479 fails)
> > 
> > 
> > >>>
> > >
>
to...@toddg01lt:~/asi/workspaces/ma

[jira] Resolved: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-475.
-

Resolution: Fixed

Given that ZOOKEEPER-479, ZOOKEEPER-480 and ZOOKEEPER-481 have been fixed, this 
should now be fixed as well.

> FLENewEpochTest failed on nightly builds.
> -
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.2.0
>Reporter: Mahadev konar
>Assignee: Flavio Paiva Junqueira
>Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-475.patch, ZOOKEEPER-475.patch
>
>
> The FLENewEpochTest failed on one of the nightly builds:
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-368) Observers

2009-08-04 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated ZOOKEEPER-368:
-

Attachment: observers.patch
obs-refactor.patch

Here are both a slightly modified version of the refactor patch and a patch 
containing the new code for Observers. I have included some tests now as well. 
The Observer implementation is simplified compared to previous patches. 

I have added new methods to QuorumPeer to get at the entire view of the 
ensemble, the voting view (containing Followers), and the observing view. 

To use an Observer, append :observer in the ensemble config file to the 
description of any server you want to be an Observer. For example, write:

server.3:localhost:2181:3181:observer

In the Observer's own config file, add a line with the option

peerType=observer

I will probably remove these slightly redundant specifications in the future, 
but for now you will need both. 

You must apply the patches in order; the refactor patch first. Both patches 
apply cleanly for me using patch -p0 against a clean checkout of trunk as of 
tonight (Aug 4th).
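As a rough illustration of the two config conventions above, the following Java sketch splits `server.N` entries into a voting view and an observing view and checks the `peerType` option. The class `EnsembleViews` and its methods are hypothetical, not QuorumPeer's actual API. It assumes, as a simplification, that the file is loaded with `java.util.Properties`, which accepts either `=` or `:` between key and value; that is why `server.3:localhost:2181:3181:observer` still parses with the key `server.3`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not QuorumPeer's API: splits "server.N" entries into a
// voting view and an observing view based on an optional ":observer" suffix.
class EnsembleViews {
    final SortedMap<Long, String> voting = new TreeMap<>();
    final SortedMap<Long, String> observing = new TreeMap<>();

    private static Properties load(String config) {
        Properties props = new Properties();
        try {
            // The first unescaped '=', ':' or whitespace ends the key, so
            // "server.3:localhost:2181:3181:observer" yields key "server.3".
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static EnsembleViews parse(String config) {
        Properties props = load(config);
        EnsembleViews views = new EnsembleViews();
        for (String key : props.stringPropertyNames()) {
            if (!key.startsWith("server.")) {
                continue; // other options (tickTime, peerType, ...) are ignored here
            }
            long id = Long.parseLong(key.substring("server.".length()));
            String[] parts = props.getProperty(key).trim().split(":");
            if (parts.length < 3) {
                throw new IllegalArgumentException("expected host:port:port for " + key);
            }
            String address = parts[0] + ":" + parts[1] + ":" + parts[2];
            boolean isObserver = parts.length > 3 && parts[3].equals("observer");
            SortedMap<Long, String> view = isObserver ? views.observing : views.voting;
            view.put(id, address);
        }
        return views;
    }

    /** True if a server's own config file contains peerType=observer. */
    static boolean isObserverConfig(String ownConfig) {
        return "observer".equals(load(ownConfig).getProperty("peerType", "").trim());
    }

    public static void main(String[] args) {
        EnsembleViews v = parse(String.join("\n",
                "server.1=localhost:2888:3888",
                "server.2=localhost:2889:3889",
                "server.3:localhost:2181:3181:observer"));
        System.out.println("voting=" + v.voting.keySet() + " observing=" + v.observing.keySet());
    }
}
```

Running `main` prints `voting=[1, 2] observing=[3]`; the entire view of the ensemble is simply the union of the two maps.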



> Observers
> -
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Henry Robinson
> Attachments: obs-refactor.patch, observer-refactor.patch, 
> observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, 
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching 
> agreement on the order of ZooKeeper transactions. That is, all followers 
> receive proposals, acknowledge them, and receive commit messages from the 
> leader. A leader issues commit messages once it receives acknowledgments from 
> a quorum of followers. For cross-colo operation, it would be useful to have a 
> third role: observer. Using Paxos terminology, observers are similar to 
> learners. An observer does not participate actively in the agreement step of 
> the atomic broadcast protocol. Instead, it only commits proposals that have 
> been accepted by some quorum of followers.
> One simple solution to implement observers is to have the leader forward 
> commit messages not only to followers but also to observers, and have 
> observers apply transactions according to the order followers agreed upon. 
> In the current implementation of the protocol, however, commit messages do 
> not carry their corresponding transaction payload, because all servers other 
> than the leader are followers, and followers first receive that payload 
> through a proposal message. Consequently, just forwarding commit messages as 
> they currently are to an observer is not sufficient. We have a couple of 
> options:
> 1- Include the transaction payload in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the 
> protocol implementation, but it increases traffic slightly. The performance 
> impact due to such an increase might be insignificant, though.
> For scalability purposes, we may consider having followers also forward 
> commit messages to observers. With this option, observers can connect to 
> followers, and receive messages from followers. This choice is important to 
> avoid increasing the load on the leader with the number of observers. 
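Option 2 above can be sketched in Java as follows: the leader sends each proposal, and later the commit, to followers and observers alike, but counts only acknowledgments from voting servers. `ProposalTracker` is a hypothetical name, not ZooKeeper's Leader class, and the sketch assumes a simple majority quorum with the leader counted as a voter.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of option 2: proposals and commits go to every learner,
// but only ACKs from voting servers count toward the commit quorum.
class ProposalTracker {
    private final long leaderId;
    private final Set<Long> voters;    // leader plus followers
    private final Set<Long> observers; // receive proposals, never vote
    private final Set<Long> acks = new HashSet<>();
    private boolean committed = false;

    ProposalTracker(long leaderId, Set<Long> voters, Set<Long> observers) {
        this.leaderId = leaderId;
        this.voters = voters;
        this.observers = observers;
    }

    /** Servers the leader sends the proposal (and later the commit) to. */
    Set<Long> recipients() {
        Set<Long> all = new HashSet<>(voters);
        all.addAll(observers);
        all.remove(leaderId);
        return all;
    }

    /**
     * Records an ACK from server sid. Returns true exactly once: when a simple
     * majority of voters has acknowledged, i.e. when the leader should commit.
     */
    boolean ack(long sid) {
        if (!voters.contains(sid) || committed) {
            return false; // observer ACKs and ACKs after the commit change nothing
        }
        acks.add(sid);
        if (2 * acks.size() > voters.size()) {
            committed = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProposalTracker p = new ProposalTracker(1, Set.of(1L, 2L, 3L), Set.of(4L));
        System.out.println(p.recipients()); // followers 2, 3 and observer 4
        System.out.println(p.ack(4));       // false: observers do not vote
        System.out.println(p.ack(1));       // false: 1 of 3 voters
        System.out.println(p.ack(2));       // true: 2 of 3 voters, commit now
    }
}
```

Because observers never enter the quorum count, adding observers increases the leader's fan-out but not the number of acknowledgments a commit waits for.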




[jira] Updated: (ZOOKEEPER-491) Prevent zero-weight servers from being elected

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated ZOOKEEPER-491:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

+1 for the patch. I just committed this. thanks flavio!

> Prevent zero-weight servers from being elected
> --
>
> Key: ZOOKEEPER-491
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
> Project: Zookeeper
>  Issue Type: New Feature
>  Components: leaderElection
>Affects Versions: 3.2.0
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch
>
>
> This is a fix to prevent zero-weight servers from being elected leader. In 
> wide-area scenarios, this allows restricting the set of servers that can 
> lead the ensemble.
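A minimal sketch of the idea in Java, assuming votes are ordered first by zxid and then by server id: a zero-weight candidate never wins a comparison, and any eligible candidate replaces a vote for an ineligible one. `VoteOrder` is hypothetical, and both the second rule and the default weight of 1 for unlisted servers are our own assumptions; this is not the committed ZOOKEEPER-491 patch.

```java
import java.util.Map;

// Hypothetical sketch: order votes by (zxid, server id), but never let a
// zero-weight server win the comparison. Not the committed patch.
class VoteOrder {
    private final Map<Long, Long> weights; // server id -> weight

    VoteOrder(Map<Long, Long> weights) {
        this.weights = weights;
    }

    private long weight(long sid) {
        return weights.getOrDefault(sid, 1L); // assumption: unlisted servers weigh 1
    }

    /** True if the proposed vote (newId, newZxid) should replace (curId, curZxid). */
    boolean supersedes(long newId, long newZxid, long curId, long curZxid) {
        if (weight(newId) == 0) {
            return false; // a zero-weight server is never adopted as leader
        }
        if (weight(curId) == 0) {
            return true;  // any eligible candidate beats an ineligible one
        }
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }

    public static void main(String[] args) {
        VoteOrder order = new VoteOrder(Map.of(1L, 1L, 2L, 1L, 3L, 0L));
        System.out.println(order.supersedes(3, 100, 1, 5)); // false: 3 has weight 0
        System.out.println(order.supersedes(2, 5, 1, 5));   // true: same zxid, higher id
        System.out.println(order.supersedes(1, 1, 3, 100)); // true: 3 cannot lead
    }
}
```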




[jira] Resolved: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower

2009-08-04 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar resolved ZOOKEEPER-480.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

I just committed this. thanks flavio.

> FLE should perform leader check when node is not leading and add vote of 
> follower
> -
>
> Key: ZOOKEEPER-480
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
> Project: Zookeeper
>  Issue Type: Bug
>Affects Versions: 3.2.0
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-480-3.2branch.patch, 
> ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, 
> ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch
>
>
> As a server may join leader election after the others have already elected a 
> leader, a server must handle some special cases of leader election when 
> notifications come from servers that are either LEADING or FOLLOWING. In such 
> special cases, we check whether we have received a message from the leader 
> before declaring a leader elected. This check does not consider the case in 
> which the process performing the check is itself the recently elected leader, 
> and consequently the check fails.
> This patch also adds a new case, which corresponds to adding a vote to 
> recvset when the notification is from a process LEADING or FOLLOWING. This 
> fixes the case raised in ZOOKEEPER-475.
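The two changes described above can be sketched as follows. `LeaderCheck`, its `State` enum, and `onDecidedPeer` are hypothetical names rather than FastLeaderElection's code, and, as a simplification, only votes received from other servers are counted toward the majority.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not FastLeaderElection's code) of handling notifications
// from servers that are already FOLLOWING or LEADING.
class LeaderCheck {
    enum State { LOOKING, FOLLOWING, LEADING }

    private final long selfId;
    private final int ensembleSize;
    private final Map<Long, Long> recvset = new HashMap<>(); // sender -> leader it supports
    private final Map<Long, State> states = new HashMap<>(); // sender -> reported state

    LeaderCheck(long selfId, int ensembleSize) {
        this.selfId = selfId;
        this.ensembleSize = ensembleSize;
    }

    /** Declare 'leader' elected only if it is us, or it has told us it is LEADING. */
    boolean checkLeader(long leader) {
        if (leader == selfId) {
            return true; // we will never receive a LEADING message from ourselves
        }
        return states.get(leader) == State.LEADING;
    }

    /**
     * Handles a notification from a FOLLOWING or LEADING server: the vote is
     * added to recvset, and the elected leader id is returned, or -1 if none yet.
     */
    long onDecidedPeer(long sender, long votedLeader, State senderState) {
        recvset.put(sender, votedLeader);
        states.put(sender, senderState);
        long support = recvset.values().stream().filter(v -> v == votedLeader).count();
        if (2 * support > ensembleSize && checkLeader(votedLeader)) {
            return votedLeader;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Server 1 is the one the others elected: without the leader == selfId
        // case it would wait forever for a LEADING message from itself.
        LeaderCheck s1 = new LeaderCheck(1, 3);
        s1.onDecidedPeer(2, 1, State.FOLLOWING);
        System.out.println(s1.onDecidedPeer(3, 1, State.FOLLOWING)); // 1

        // Server 3 joins late and waits until leader 1 reports LEADING.
        LeaderCheck s3 = new LeaderCheck(3, 3);
        System.out.println(s3.onDecidedPeer(2, 1, State.FOLLOWING)); // -1
        System.out.println(s3.onDecidedPeer(1, 1, State.LEADING));   // 1
    }
}
```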




[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager

2009-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738964#action_12738964
 ] 

Hudson commented on ZOOKEEPER-481:
--

Integrated in ZooKeeper-trunk #404 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)


> Add lastMessageSent to QuorumCnxManager
> ---
>
> Key: ZOOKEEPER-481
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481
> Project: Zookeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.1.1, 3.2.0
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-481-branch3.2.patch, 
> ZOOKEEPER-481-branch3.2.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, 
> ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch
>
>
> Currently we rely on TCP for reliable delivery of FLE messages. However, as 
> we concurrently drop and create new connections, it is possible that a 
> message is sent but never received. With this patch, the cnx manager keeps a list 
> of the last messages sent and resends the last one sent. Receiving multiple 
> copies is harmless. 
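A minimal Java sketch of the resend idea, assuming one remembered message per peer (our reading of "a list of last messages sent") and a newly established connection as the trigger for resending. `LastMessageResender` and `PeerConnection` are hypothetical stand-ins for QuorumCnxManager and its socket handling.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the lastMessageSent idea (not QuorumCnxManager's code):
// remember the last message sent to each peer and resend it whenever a new
// connection to that peer comes up, since the old connection may have lost it.
class LastMessageResender {
    /** Stand-in for a connection to one peer. */
    interface PeerConnection {
        void send(ByteBuffer msg);
    }

    private final Map<Long, ByteBuffer> lastMessageSent = new ConcurrentHashMap<>();

    void send(long sid, ByteBuffer msg, PeerConnection conn) {
        lastMessageSent.put(sid, msg.duplicate()); // own position/limit, shared bytes
        conn.send(msg.duplicate());
    }

    /** Called when a (re)connection to peer sid has been established. */
    void onConnect(long sid, PeerConnection conn) {
        ByteBuffer last = lastMessageSent.get(sid);
        if (last != null) {
            conn.send(last.duplicate()); // duplicates are harmless to the receiver
        }
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b).toString();
    }

    public static void main(String[] args) {
        LastMessageResender mgr = new LastMessageResender();
        List<String> oldLink = new ArrayList<>();
        List<String> newLink = new ArrayList<>();
        mgr.send(2, utf8("vote-A"), m -> oldLink.add(decode(m)));
        mgr.send(2, utf8("vote-B"), m -> oldLink.add(decode(m)));
        // Suppose the old connection dropped "vote-B" in flight; on reconnect it is resent.
        mgr.onConnect(2, m -> newLink.add(decode(m)));
        System.out.println(newLink); // [vote-B]
    }
}
```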




[jira] Commented: (ZOOKEEPER-479) QuorumHierarchical does not count groups correctly

2009-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738966#action_12738966
 ] 

Hudson commented on ZOOKEEPER-479:
--

Integrated in ZooKeeper-trunk #404 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
.  QuorumHierarchical does not count groups correctly (flavio via mahadev)


> QuorumHierarchical does not count groups correctly
> --
>
> Key: ZOOKEEPER-479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-479
> Project: Zookeeper
>  Issue Type: Bug
>  Components: quorum
>Reporter: Flavio Paiva Junqueira
>Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-479-branch3.2.patch, ZOOKEEPER-479.patch, 
> ZOOKEEPER-479.patch, ZOOKEEPER-479.patch
>
>
> QuorumHierarchical::containsQuorum should not verify if all groups 
> represented in the input set have more than half of the total weight. 
> Instead, it should check only for an overall majority of groups. 
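The corrected check can be sketched in Java as follows: count the groups in which the input set holds more than half of the group's total weight, and require that a majority of all groups pass. `HierarchicalQuorum` is a hypothetical class, not QuorumHierarchical itself, and every server is assumed to appear in both the group map and the weight map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a hierarchical quorum check (not QuorumHierarchical):
// a set is a quorum if a majority of groups each contribute a weighted majority.
class HierarchicalQuorum {
    private final Map<Long, Long> groupOf;  // server id -> group id
    private final Map<Long, Long> weightOf; // server id -> weight
    private final Map<Long, Long> groupWeight = new HashMap<>(); // group id -> total weight

    HierarchicalQuorum(Map<Long, Long> groupOf, Map<Long, Long> weightOf) {
        this.groupOf = groupOf;
        this.weightOf = weightOf;
        for (Map.Entry<Long, Long> e : groupOf.entrySet()) {
            groupWeight.merge(e.getValue(), weightOf.get(e.getKey()), Long::sum);
        }
    }

    boolean containsQuorum(Set<Long> set) {
        Map<Long, Long> weightInSet = new HashMap<>();
        for (long sid : set) {
            Long group = groupOf.get(sid);
            if (group != null) {
                weightInSet.merge(group, weightOf.get(sid), Long::sum);
            }
        }
        int majorityGroups = 0;
        for (Map.Entry<Long, Long> e : weightInSet.entrySet()) {
            // Groups where the set holds a minority are simply not counted;
            // they do not veto the quorum.
            if (2 * e.getValue() > groupWeight.get(e.getKey())) {
                majorityGroups++;
            }
        }
        return 2 * majorityGroups > groupWeight.size();
    }

    public static void main(String[] args) {
        Map<Long, Long> group = new HashMap<>();
        Map<Long, Long> weight = new HashMap<>();
        for (long sid = 1; sid <= 9; sid++) {
            group.put(sid, (sid - 1) / 3 + 1); // groups {1,2,3}, {4,5,6}, {7,8,9}
            weight.put(sid, 1L);
        }
        HierarchicalQuorum q = new HierarchicalQuorum(group, weight);
        System.out.println(q.containsQuorum(Set.of(1L, 2L, 4L, 5L, 7L))); // true
        System.out.println(q.containsQuorum(Set.of(1L, 2L, 4L)));         // false
    }
}
```

Under the old behavior described in the issue, the first set would be rejected because group 3 is represented only by a minority (server 7).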




[jira] Commented: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert

2009-08-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738965#action_12738965
 ] 

Hudson commented on ZOOKEEPER-466:
--

Integrated in ZooKeeper-trunk #404 (See 
[http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
. crash on zookeeper_close() when using auth with empty cert


> crash on zookeeper_close() when using auth with empty cert
> --
>
> Key: ZOOKEEPER-466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466
> Project: Zookeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.2.0
>Reporter: Chris Darroch
>Assignee: Chris Darroch
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-466.patch
>
>
> The free_auth_info() function calls deallocate_Buffer(&auth->auth) on every 
> element in the auth list; that function frees any memory pointed to by 
> auth->auth.buff if that field is non-NULL.
> In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set 
> to 0, but then not assigned to authinfo->auth when auth.buff is NULL.  The 
> result is uninitialized data in auth->auth.buff in free_auth_info(), and 
> potential crashes.
> The attached patch adds a test which attempts to duplicate this error; it 
> works for me but may not always work on all systems, as it depends on the 
> uninitialized data being non-zero; there's not really a simple way I can see 
> to trigger this in the current test framework.  The patch also fixes the 
> problem, I believe.
