Re: Unending Leader Elections in WAN deploy
(I see the same error in fle0weighttest using the latest 3.2, btw.)

Patrick Hunt wrote:

Mahadev/Flavio -- looks like 0 weight is still busted. fle0weighttest is actually failing on my machine, however it's reported as success:

- Standard Error -
Exception in thread "Thread-108" junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
- ---

This is probably because the test is calling assert in a thread other than the main test thread, which junit will not track/know about.

One problem I see with these tests (the 0weight test I looked at): it doesn't have a client attempt to connect to the various servers as part of declaring success. Really we should only consider it a "success"ful test (i.e. assert that) if a client can connect to each server in the cluster and change/see changes. As part of fixing this we really need to do a sanity check by testing the various command lines and checking that a client can connect.

I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New epoch seems to just thrash...

Also I tried 3 & 5 server quorums "by hand from the command line" with 0 weight and they see similar issues to what Todd is seeing.

I'm using the latest code in mainline btw.

Patrick

Mahadev Konar wrote:

Hi todd,

I see a lot of

java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect(Native Method)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
    at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
    at java.lang.Thread.run(Thread.java:619)

Is it possible that there is some firewall? Can all the servers 1-9 connect to all the others using the ports that you specified in zoo.cfg, i.e. 2888/3888?

Thanks
mahadev

On 8/4/09 4:56 PM, "Todd Greenwood" wrote:

Looks like we're not getting *any* leader elected now. Logs attached.

-Original Message-
From: Todd Greenwood [mailto:to...@audiencescience.com]
Sent: Tuesday, August 04, 2009 4:07 PM
To: zookeeper-dev@hadoop.apache.org
Subject: RE: Unending Leader Elections in WAN deploy

Patrick, thanks! I'll forward on to IT and I'll report back to you shortly...

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Tuesday, August 04, 2009 3:55 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Todd, Mahadev and I looked at this and it turns out to be a regression. Ironically a patch I created for 3.2 branch to add quorum tests actually broke the quorum config -- a default value for a config parameter was lost. I'm going to submit a patch asap to get the default back, but for the time being you can set:

electionAlg=3

in each of your config files.

You should see reference to FastLeaderElection in your log files if this parameter is set correctly.

Sorry for the trouble,

Patrick

Todd Greenwood wrote:

Mahadev,

I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.TXT show the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev)
ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev)
ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt)
ZOOKEEPER-481. Add lastMessageSent to QuorumCnxMan
Re: Unending Leader Elections in WAN deploy
Mahadev/Flavio -- looks like 0 weight is still busted. fle0weighttest is actually failing on my machine, however it's reported as success:

- Standard Error -
Exception in thread "Thread-108" junit.framework.AssertionFailedError: Elected zero-weight server
    at junit.framework.Assert.fail(Assert.java:47)
    at org.apache.zookeeper.test.FLEZeroWeightTest$LEThread.run(FLEZeroWeightTest.java:138)
- ---

This is probably because the test is calling assert in a thread other than the main test thread, which junit will not track/know about.

One problem I see with these tests (the 0weight test I looked at): it doesn't have a client attempt to connect to the various servers as part of declaring success. Really we should only consider it a "success"ful test (i.e. assert that) if a client can connect to each server in the cluster and change/see changes. As part of fixing this we really need to do a sanity check by testing the various command lines and checking that a client can connect.

I'm not even sure FLEnewepochtest/fletest/etc... are passing either. New epoch seems to just thrash...

Also I tried 3 & 5 server quorums "by hand from the command line" with 0 weight and they see similar issues to what Todd is seeing.

I'm using the latest code in mainline btw.

Patrick

Mahadev Konar wrote:

Hi todd,

I see a lot of

java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect(Native Method)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
    at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
    at java.lang.Thread.run(Thread.java:619)

Is it possible that there is some firewall? Can all the servers 1-9 connect to all the others using the ports that you specified in zoo.cfg, i.e. 2888/3888?

Thanks
mahadev

On 8/4/09 4:56 PM, "Todd Greenwood" wrote:

Looks like we're not getting *any* leader elected now. Logs attached.

-Original Message-
From: Todd Greenwood [mailto:to...@audiencescience.com]
Sent: Tuesday, August 04, 2009 4:07 PM
To: zookeeper-dev@hadoop.apache.org
Subject: RE: Unending Leader Elections in WAN deploy

Patrick, thanks! I'll forward on to IT and I'll report back to you shortly...

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Tuesday, August 04, 2009 3:55 PM
To: zookeeper-dev@hadoop.apache.org
Subject: Re: Unending Leader Elections in WAN deploy

Todd, Mahadev and I looked at this and it turns out to be a regression. Ironically a patch I created for 3.2 branch to add quorum tests actually broke the quorum config -- a default value for a config parameter was lost. I'm going to submit a patch asap to get the default back, but for the time being you can set:

electionAlg=3

in each of your config files.

You should see reference to FastLeaderElection in your log files if this parameter is set correctly.

Sorry for the trouble,

Patrick

Todd Greenwood wrote:

Mahadev,

I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.TXT show the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)
ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)
ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)
ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)
ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev)
ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)
ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev)
ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)
ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt)
ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)
ZOOKEEPER-479. QuorumHierarchical does not count groups corre
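
Patrick's point at the top of this message, that an assertion failing in a worker thread is printed to standard error but never fails the test, can be illustrated with a small sketch. This is not the actual fix for FLEZeroWeightTest; the class and method names are hypothetical. The idea is to record the worker's failure and rethrow it on the thread that runs the test method, which is the only place the test runner looks.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of ZooKeeper or JUnit: runs a check in a
// worker thread and rethrows any failure on the calling (test) thread.
class WorkerFailureCapture {
    static void runAndRethrow(Runnable check) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                check.run();
            } catch (Throwable t) {
                // Remember the failure instead of letting it die with the worker.
                failure.compareAndSet(null, t);
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for worker", e);
        }
        Throwable t = failure.get();
        if (t != null) {
            // Thrown on the test thread, so the runner reports a failure.
            throw new AssertionError("worker thread failed: " + t.getMessage(), t);
        }
    }
}
```

With this shape, a check such as "Elected zero-weight server" raised inside the worker surfaces as a failure of the calling test rather than as a stack trace in the log.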
Re: Unending Leader Elections in WAN deploy
Hi todd,

I see a lot of

java.net.ConnectException: Connection refused
    at sun.nio.ch.Net.connect(Native Method)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
    at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:324)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:304)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:317)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:290)
    at java.lang.Thread.run(Thread.java:619)

Is it possible that there is some firewall? Can all the servers 1-9 connect to all the others using the ports that you specified in zoo.cfg, i.e. 2888/3888?

Thanks
mahadev

On 8/4/09 4:56 PM, "Todd Greenwood" wrote:

> Looks like we're not getting *any* leader elected now. Logs attached.
>
>> -Original Message-
>> From: Todd Greenwood [mailto:to...@audiencescience.com]
>> Sent: Tuesday, August 04, 2009 4:07 PM
>> To: zookeeper-dev@hadoop.apache.org
>> Subject: RE: Unending Leader Elections in WAN deploy
>>
>> Patrick, thanks! I'll forward on to IT and I'll report back to you
>> shortly...
>>
>>> -Original Message-
>>> From: Patrick Hunt [mailto:ph...@apache.org]
>>> Sent: Tuesday, August 04, 2009 3:55 PM
>>> To: zookeeper-dev@hadoop.apache.org
>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>
>>> Todd, Mahadev and I looked at this and it turns out to be a regression.
>>> Ironically a patch I created for 3.2 branch to add quorum tests actually
>>> broke the quorum config -- a default value for a config parameter was
>>> lost. I'm going to submit a patch asap to get the default back, but for
>>> the time being you can set:
>>>
>>> electionAlg=3
>>>
>>> in each of your config files.
>>> >>> You should see reference to FastLeaderElection in your log files if >> this >>> parameter is set correctly. >>> >>> Sorry for the trouble, >>> >>> Patrick >>> >>> Todd Greenwood wrote: Mahadev, I just heard from IT that this build behaves in exactly the same > way >> as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and >> disconnect...etc. This is from a fresh sync to the 3.2 branch: svn co > http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2 CHANGES.TXT show the various fixes included: >> > to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper /src/original$ head -n 50 branch-3.2/CHANGES.txt Release 3.2.1 Backward compatibile changes: BUGFIXES: ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris >> via flavio) ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris >> via mahadev) ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via > mahadev) ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris > via mahadev) ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev) ZOOKEEPER-467. Change log level in BookieHandle (flavio via >> mahadev) ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent >> immediate failure. (chris via mahadev) ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev >> via phunt) ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt) ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio >> via mahadev) ZOOKEEPER-479. QuorumHierarchical does not count groups > correctly (flavio via mahadev) ZOOKEEPER-466. crash on zookeeper_close() when using auth with >> empty cert (Chris Darroch via phunt) ZOOKEEPER-480. FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev) ZOOKEEPER-491. 
Prevent zero-weight servers from being elected >> (flavio via mahadev) What can I do to assist you with this issue? -Todd > -Original Message- > From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > Sent: Tuesday, August 04, 2009 12:43 PM > To: zookeeper-dev@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > Hi todd, > comments in line > > > On 8/4/09 12:38 PM, "Todd Greenwood" wrote: >> Mahadev, >> >> Some quick questions: >> >> 1. Version >> >> I see that the CHANGES.txt calls this 3.2.1, but the bu
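
Mahadev's question in this message, whether every server can reach every other server on the ports from zoo.cfg, can be checked with a plain TCP connect. The sketch below is an illustration only: the class name and timeout are assumptions and nothing here comes from the ZooKeeper code base.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical diagnostic helper: reports whether a TCP connection to
// host:port can be opened within the given timeout.
class PortCheck {
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts and unresolved hosts.
            return false;
        }
    }
}
```

Run from each server against each peer for both ports mentioned in the thread (2888 and 3888), a `false` result between two machines would point to a firewall or to a peer that is not listening. Note that a peer which is not listening also produces "Connection refused", so this check alone cannot tell the two cases apart.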
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-484:

    Status: Patch Available (was: Open)

> Clients get SESSION MOVED exception when switching from follower to a leader.
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
> When a client is connected to a follower, gets disconnected, and then
> connects to a leader, it gets a SESSION MOVED exception. This is because of
> a bug in the new feature of ZOOKEEPER-417 that we added in 3.2. All the
> releases before 3.2 DO NOT have this problem. The fix is to make sure the
> ownership of a connection gets changed when a session moves from follower
> to the leader. The workaround for it in 3.2.0 would be to switch off
> connections from clients to the leader. Take a look at the *leaderServes*
> java property in
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-484) Clients get SESSION MOVED exception when switching from follower to a leader.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-484:

    Attachment: ZOOKEEPER-484.patch

This patch fixes the issue, assigning the right owner when a session moves from follower to the leader. Also, updated the tests to check for this. The tests fail without the patch.

> Clients get SESSION MOVED exception when switching from follower to a leader.
>
> Key: ZOOKEEPER-484
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-484
> Project: Zookeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Mahadev konar
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: sessionTest.patch, ZOOKEEPER-484.patch
>
> When a client is connected to a follower, gets disconnected, and then
> connects to a leader, it gets a SESSION MOVED exception. This is because of
> a bug in the new feature of ZOOKEEPER-417 that we added in 3.2. All the
> releases before 3.2 DO NOT have this problem. The fix is to make sure the
> ownership of a connection gets changed when a session moves from follower
> to the leader. The workaround for it in 3.2.0 would be to switch off
> connections from clients to the leader. Take a look at the *leaderServes*
> java property in
> http://hadoop.apache.org/zookeeper/docs/r3.2.0/zookeeperAdmin.html.
Re: Unending Leader Elections in WAN deploy
Hi Todd,

Can you attach the files to the jira? I will take a look at this and will get back to you by end of day today.

Thanks
mahadev

On 8/4/09 4:56 PM, "Todd Greenwood" wrote:

> Looks like we're not getting *any* leader elected now. Logs attached.
>
>> -Original Message-
>> From: Todd Greenwood [mailto:to...@audiencescience.com]
>> Sent: Tuesday, August 04, 2009 4:07 PM
>> To: zookeeper-dev@hadoop.apache.org
>> Subject: RE: Unending Leader Elections in WAN deploy
>>
>> Patrick, thanks! I'll forward on to IT and I'll report back to you
>> shortly...
>>
>>> -Original Message-
>>> From: Patrick Hunt [mailto:ph...@apache.org]
>>> Sent: Tuesday, August 04, 2009 3:55 PM
>>> To: zookeeper-dev@hadoop.apache.org
>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>
>>> Todd, Mahadev and I looked at this and it turns out to be a regression.
>>> Ironically a patch I created for 3.2 branch to add quorum tests actually
>>> broke the quorum config -- a default value for a config parameter was
>>> lost. I'm going to submit a patch asap to get the default back, but for
>>> the time being you can set:
>>>
>>> electionAlg=3
>>>
>>> in each of your config files.
>>>
>>> You should see reference to FastLeaderElection in your log files if this
>>> parameter is set correctly.
>>>
>>> Sorry for the trouble,
>>>
>>> Patrick
>>>
>>> Todd Greenwood wrote:
>>>> Mahadev,
>>>>
>>>> I just heard from IT that this build behaves in exactly the same way as
>>>> previous versions, e.g. we get continuous leader elections that
>>>> disconnect the followers and then get re-elected, and disconnect...etc.
>>>>
>>>> This is from a fresh sync to the 3.2 branch:
>>>>
>>>> svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2
>>>>
>>>> CHANGES.TXT show the various fixes included:
>>>>
>>>> to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
>>>> Release 3.2.1
>>>>
>>>> Backward compatibile changes:
>>>>
>>>> BUGFIXES:
>>>> ZOOKEEPER-468.
avoid compile warning in send_auth_info(). (chris >> via flavio) ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris >> via mahadev) ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via > mahadev) ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris > via mahadev) ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev) ZOOKEEPER-467. Change log level in BookieHandle (flavio via >> mahadev) ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent >> immediate failure. (chris via mahadev) ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev >> via phunt) ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt) ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio >> via mahadev) ZOOKEEPER-479. QuorumHierarchical does not count groups > correctly (flavio via mahadev) ZOOKEEPER-466. crash on zookeeper_close() when using auth with >> empty cert (Chris Darroch via phunt) ZOOKEEPER-480. FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev) ZOOKEEPER-491. Prevent zero-weight servers from being elected >> (flavio via mahadev) What can I do to assist you with this issue? -Todd > -Original Message- > From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > Sent: Tuesday, August 04, 2009 12:43 PM > To: zookeeper-dev@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > Hi todd, > comments in line > > > On 8/4/09 12:38 PM, "Todd Greenwood" wrote: >> Mahadev, >> >> Some quick questions: >> >> 1. Version >> >> I see that the CHANGES.txt calls this 3.2.1, but the build.xml > is still >> calling this 3.2.0. Should this be rev'd, and am I correct in calling >> this release 3.2.1? > Yes the release is 3.2.1. The build.xml will be fixed as soon as > we tag > the > release. > >> 2. 
Build targets >> >> The package target fails b/c the create-cppunit-configure target fails >> due to various problems w/ respect to autoconf. Are these dependencies >> documented somewhere ? I'd like to have a fully building system. >> >> create-cppunit-configure: >> [exec] Can't exec "libtoolize": No such file or directory > at >> /usr/bin/autoreconf line 188. >> [exec] Use of uninitialized value $libtoolize in pattern >> match >> (m//) at /usr/bin/autoreconf line 188. >>
RE: Unending Leader Elections in WAN deploy
Patrick, thanks! I'll forward on to IT and I'll report back to you shortly... > -Original Message- > From: Patrick Hunt [mailto:ph...@apache.org] > Sent: Tuesday, August 04, 2009 3:55 PM > To: zookeeper-dev@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > Todd, Mahadev and I looked at this and it turns out to be a regression. > Ironically a patch I created for 3.2 branch to add quorum tests actually > broke the quorum config -- a default value for a config parameter was > lost. I'm going to submit a patch asap to get the default back, but for > the time being you can set: > > electionAlg=3 > > in each of your config files. > > You should see reference to FastLeaderElection in your log files if this > parameter is set correctly. > > Sorry for the trouble, > > Patrick > > Todd Greenwood wrote: > > Mahadev, > > > > I just heard from IT that this build behaves in exactly the same way as > > previous versions, e.g. we get continuous leader elections that > > disconnect the followers and then get re-elected, and disconnect...etc. > > > > This is from a fresh sync to the 3.2 branch: > > > > svn co > > http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 > > ./branch-3.2 > > > > CHANGES.TXT show the various fixes included: > > > > to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper > > /src/original$ head -n 50 branch-3.2/CHANGES.txt > > Release 3.2.1 > > > > Backward compatibile changes: > > > > BUGFIXES: > > ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via > > flavio) > > > > ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via > > mahadev) > > > > ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev) > > > > ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via > > mahadev) > > > > ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) > > (giri via mahadev) > > > > ZOOKEEPER-467. 
Change log level in BookieHandle (flavio via mahadev) > > > > ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate > > failure. (chris via mahadev) > > > > ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via > > phunt) > > > > ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and > > other) > > embedded clients (ryan rawson via phunt) > > > > ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via > > mahadev) > > > > ZOOKEEPER-479. QuorumHierarchical does not count groups correctly > > (flavio via mahadev) > > > > ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty > > cert > > (Chris Darroch via phunt) > > > > ZOOKEEPER-480. FLE should perform leader check when node is not > > leading and > > add vote of follower (flavio via mahadev) > > > > ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio > > via > > mahadev) > > > > What can I do to assist you with this issue? > > > > -Todd > > > >> -Original Message- > >> From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > >> Sent: Tuesday, August 04, 2009 12:43 PM > >> To: zookeeper-dev@hadoop.apache.org > >> Subject: Re: Unending Leader Elections in WAN deploy > >> > >> Hi todd, > >> comments in line > >> > >> > >> On 8/4/09 12:38 PM, "Todd Greenwood" > > wrote: > >>> Mahadev, > >>> > >>> Some quick questions: > >>> > >>> 1. Version > >>> > >>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is > > still > >>> calling this 3.2.0. Should this be rev'd, and am I correct in > > calling > >>> this release 3.2.1? > >> Yes the release is 3.2.1. The build.xml will be fixed as soon as we > > tag > >> the > >> release. > >> > >>> 2. Build targets > >>> > >>> The package target fails b/c the create-cppunit-configure target > > fails > >>> due to various problems w/ respect to autoconf. Are these > > dependencies > >>> documented somewhere ? I'd like to have a fully building system. 
> >>> > >>> create-cppunit-configure: > >>> [exec] Can't exec "libtoolize": No such file or directory at > >>> /usr/bin/autoreconf line 188. > >>> [exec] Use of uninitialized value $libtoolize in pattern match > >>> (m//) at /usr/bin/autoreconf line 188. > >>> [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not > > found > >>> in library > >>> [exec] configure.ac:33: error: possibly undefined macro: > >>> AM_PATH_CPPUNIT > >>> [exec] If this token and others are legitimate, please > > use > >>> m4_pattern_allow. > >>> [exec] See the Autoconf documentation. > >>> [exec] configure.ac:53: error: possibly undefined macro: > >>> AC_PROG_LIBTOOL > >>> [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1 > >>> > >> You need auto tools to run this. Please read the README for building c > >> client library at src/c/ for the installation requirements. > >>> 3. Sync failure: > >>> > >>> This is still failing. > >>> > >>> svn: URL > >>> 'http://svn.apache.org/r
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-498:

    Fix Version/s: 3.3.0
    Assignee: Patrick Hunt

> Unending Leader Elections : WAN configuration
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.2.0
> Environment: Each machine:
>   CentOS 5.2 64-bit
>   2GB ram
>   java version "1.6.0_13"
>   Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
>   Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
>   Network Topology:
>     DC : central data center
>     POD(N): remote data center
>   Zookeeper Topology:
>     Leaders may be elected only in DC (weight = 1)
>     Only followers are elected in PODS (weight = 0)
> Reporter: Todd Greenwood-Geer
> Assignee: Patrick Hunt
> Priority: Critical
> Fix For: 3.2.1, 3.3.0
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a
> central DC group of ZK servers that have a voting weight = 1, and a group of
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to
> contain only followers. What we are seeing is a continuous cycling of
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended
> patches (473, 479, 481, 491), and now release 3.2.1.
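
The expectation stated in this issue (DC servers with weight 1 decide, pod servers with weight 0 only follow) can be sketched as a weighted-majority check. This is a deliberately simplified illustration that ignores groups, so it is not ZooKeeper's QuorumHierarchical logic; the class name, method names and server ids are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of weighted voting (single group only).
class WeightedQuorum {
    // True if the acknowledging servers hold more than half of the total weight.
    static boolean isQuorum(Map<Long, Long> weights, Set<Long> acked) {
        long total = 0;
        long got = 0;
        for (Map.Entry<Long, Long> entry : weights.entrySet()) {
            total += entry.getValue();
            if (acked.contains(entry.getKey())) {
                got += entry.getValue();
            }
        }
        return 2 * got > total;
    }

    // A server with zero (or no configured) weight should never be chosen as leader.
    static boolean mayLead(Map<Long, Long> weights, long serverId) {
        return weights.getOrDefault(serverId, 0L) > 0;
    }
}
```

Under this model, three DC servers with weight 1 and any number of pod servers with weight 0 need two DC acknowledgements for a quorum, and no combination of pod servers can form one.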
Re: Unending Leader Elections in WAN deploy
Todd, Mahadev and I looked at this and it turns out to be a regression. Ironically a patch I created for 3.2 branch to add quorum tests actually broke the quorum config -- a default value for a config parameter was lost. I'm going to submit a patch asap to get the default back, but for the time being you can set: electionAlg=3 in each of your config files. You should see reference to FastLeaderElection in your log files if this parameter is set correctly. Sorry for the trouble, Patrick Todd Greenwood wrote: Mahadev, I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc. This is from a fresh sync to the 3.2 branch: svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2 CHANGES.TXT show the various fixes included: to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper /src/original$ head -n 50 branch-3.2/CHANGES.txt Release 3.2.1 Backward compatibile changes: BUGFIXES: ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio) ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev) ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev) ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev) ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev) ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev) ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev) ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt) ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt) ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev) ZOOKEEPER-479. QuorumHierarchical does not count groups correctly (flavio via mahadev) ZOOKEEPER-466. 
crash on zookeeper_close() when using auth with empty cert (Chris Darroch via phunt) ZOOKEEPER-480. FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev) ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via mahadev) What can I do to assist you with this issue? -Todd -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Tuesday, August 04, 2009 12:43 PM To: zookeeper-dev@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy Hi todd, comments in line On 8/4/09 12:38 PM, "Todd Greenwood" wrote: Mahadev, Some quick questions: 1. Version I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still calling this 3.2.0. Should this be rev'd, and am I correct in calling this release 3.2.1? Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the release. 2. Build targets The package target fails b/c the create-cppunit-configure target fails due to various problems w/ respect to autoconf. Are these dependencies documented somewhere ? I'd like to have a fully building system. create-cppunit-configure: [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188. [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188. [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT [exec] If this token and others are legitimate, please use m4_pattern_allow. [exec] See the Autoconf documentation. [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1 You need auto tools to run this. Please read the README for building c client library at src/c/ for the installation requirements. 3. Sync failure: This is still failing. 
svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist Yes this hasn't been fixed yet! Thanks mahadev -Todd -Original Message- From: Todd Greenwood Sent: Tuesday, August 04, 2009 11:26 AM To: 'zookeeper-u...@hadoop.apache.org' Subject: RE: Unending Leader Elections in WAN deploy Great news. Thank you Mahadev. I'll report our findings later today. -Todd -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Tuesday, August 04, 2009 11:20 AM To: zookeeper-u...@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy Hi Todd, I just committed 480 and 491. You can checkout the 3.2 branch now. Thanks mahadev On 8/3/09 4:29 PM, "Todd Greenwood" wrote: That'd be perfect. Thanks! -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Monday, August 03, 2009 4:24 PM To
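
Patrick's workaround in this message is to set electionAlg=3 explicitly in every server's config file, because the default was lost. Since zoo.cfg uses Java properties syntax, a quick pre-flight check can be sketched as below. The class and method names are hypothetical and the check is only an illustration, not something shipped with ZooKeeper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical pre-flight check for the workaround described in the thread.
class ElectionAlgCheck {
    // True only if the config text sets electionAlg=3 explicitly.
    // A missing key counts as "not set", since the default cannot be relied on here.
    static boolean setsFastLeaderElection(String cfgText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfgText));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse config text", e);
        }
        return "3".equals(props.getProperty("electionAlg", "").trim());
    }
}
```

As Patrick notes, the log files should then mention FastLeaderElection once the servers are restarted with the parameter in place.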
[jira] Updated: (ZOOKEEPER-493) patch for command line setquota
[ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-493:
----------------------------------
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

+1, thanks Steve! Applied to 3.2.1 and 3.3

> patch for command line setquota
> -------------------------------
>
> Key: ZOOKEEPER-493
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.2.0
> Reporter: steve bendiola
> Assignee: steve bendiola
> Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: quotafix.patch, ZOOKEEPER-493.patch
>
>
> the command line "setquota" tries to use argument 3 as both a path and a value

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-493) patch for command line setquota
[ https://issues.apache.org/jira/browse/ZOOKEEPER-493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-493:
----------------------------------
    Attachment: ZOOKEEPER-493.patch

updated patch to cleanup a bit in addition to fix.
ZOOKEEPER-493.patch supersedes previous patch (fixed naming of patch file)

> patch for command line setquota
> -------------------------------
>
> Key: ZOOKEEPER-493
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-493
> Project: Zookeeper
> Issue Type: Bug
> Components: java client
> Affects Versions: 3.2.0
> Reporter: steve bendiola
> Assignee: steve bendiola
> Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: quotafix.patch, ZOOKEEPER-493.patch
>
>
> the command line "setquota" tries to use argument 3 as both a path and a value
[jira] Updated: (ZOOKEEPER-485) need ops documentation that details supervision of ZK server processes
[ https://issues.apache.org/jira/browse/ZOOKEEPER-485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-485:
----------------------------------
    Fix Version/s: (was: 3.2.1)

> need ops documentation that details supervision of ZK server processes
> ----------------------------------------------------------------------
>
> Key: ZOOKEEPER-485
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-485
> Project: Zookeeper
> Issue Type: Bug
> Components: documentation, server
> Reporter: Patrick Hunt
> Fix For: 3.3.0
>
>
> We need ops documentation detailing what to do if the ZK server VM fails - by
> fail I mean the jvm process exits/dies/crashes/etc...
> In general a supervisor process should be used to start/stop/restart/etc...
> the ZK server vm.
> Something like daemontools http://cr.yp.to/daemontools.html could be used, or
> more simply a wrapper script should monitor the status of the pid and restart
> if the jvm fails. It's up to the operator: if this is not done automatically
> then it will have to be done manually, by the operator restarting the ZK
> server jvm.
> The inherent behavior of ZK with respect to failures - ie that it
> automatically recovers as long as quorum is maintained - fits into this nicely.
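The "wrapper script that monitors the pid and restarts" idea from ZOOKEEPER-485 can be sketched roughly as below. This is only a minimal illustration, not anything shipped with ZooKeeper: the function name `supervise`, the restart cap, and the stand-in command `DEMO_CMD` are all assumptions made for the example. A real supervisor (daemontools or similar) would normally restart without a cap and would log each exit.

```python
import subprocess
import sys
import time

# Stand-in for the real server launch command (assumption for this sketch):
# a child process that exits immediately with status 1.
DEMO_CMD = [sys.executable, "-c", "import sys; sys.exit(1)"]


def supervise(cmd, max_restarts=3, backoff_seconds=1.0, sleep=time.sleep):
    """Run cmd and restart it each time it exits, at most max_restarts times.

    Returns the list of exit codes seen, one per run. The cap exists only so
    that the sketch terminates; an operational supervisor would loop forever.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        child = subprocess.Popen(cmd)
        exit_codes.append(child.wait())  # block until the child process exits
        if attempt < max_restarts:
            sleep(backoff_seconds)  # pause so a crash loop does not spin
    return exit_codes
```

With the stand-in command, `supervise(DEMO_CMD, max_restarts=2, backoff_seconds=0.1)` runs the child three times and returns `[1, 1, 1]`. Because ZK recovers on its own while quorum holds, restarting the one failed JVM is all the supervisor has to do.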
[jira] Assigned: (ZOOKEEPER-490) the java docs for session creation are misleading/incomplete
[ https://issues.apache.org/jira/browse/ZOOKEEPER-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt reassigned ZOOKEEPER-490:
-------------------------------------
    Assignee: Patrick Hunt

> the java docs for session creation are misleading/incomplete
> ------------------------------------------------------------
>
> Key: ZOOKEEPER-490
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-490
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.1.1, 3.2.0
> Reporter: Patrick Hunt
> Assignee: Patrick Hunt
> Fix For: 3.2.1, 3.3.0
>
>
> the javadoc for ZooKeeper constructor says:
>  * The client object will pick an arbitrary server and try to connect to it.
>  * If failed, it will try the next one in the list, until a connection is
>  * established, or all the servers have been tried.
> the "or all server tried" phrase is misleading, it should indicate that we
> retry until success, connection closed, or session expired.
> we also need to mention that connection is async, that the constructor returns
> immediately and you need to look for the connection event in the watcher
[jira] Updated: (ZOOKEEPER-447) zkServer.sh doesn't allow different config files to be specified on the command line
[ https://issues.apache.org/jira/browse/ZOOKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-447:
----------------------------------
    Resolution: Fixed
    Hadoop Flags: [Reviewed]
    Status: Resolved (was: Patch Available)

+1, thanks Henry! Committed to 3.2.1 and 3.3

> zkServer.sh doesn't allow different config files to be specified on the command line
> -------------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-447
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-447
> Project: Zookeeper
> Issue Type: Improvement
> Affects Versions: 3.1.1, 3.2.0
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-447.patch
>
>
> Unless I'm missing something, you can change the directory that the zoo.cfg
> file is in by setting ZOOCFGDIR but not the name of the file itself.
> I find it convenient myself to specify the config file on the command line,
> but we should also let it be specified by environment variable.
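The lookup order asked for in ZOOKEEPER-447 (command line first, then environment, then the default) could look roughly like the sketch below. It is not the logic of the committed patch or of zkServer.sh, which is a shell script. `ZOOCFGDIR` comes from the issue text, while the `ZOOCFG` variable name, the `conf` default directory, and the function name `resolve_config` are assumptions made for illustration.

```python
import os


def resolve_config(argv, env, default_dir="conf", default_name="zoo.cfg"):
    """Return the config file path to use.

    Precedence (assumed for this sketch): an explicit command-line argument
    wins, otherwise the directory and file name come from the environment,
    falling back to the defaults.
    """
    if len(argv) > 1 and argv[1]:
        return argv[1]  # config file given directly on the command line
    cfg_dir = env.get("ZOOCFGDIR", default_dir)  # variable named in the issue
    cfg_name = env.get("ZOOCFG", default_name)   # hypothetical variable name
    return os.path.join(cfg_dir, cfg_name)
```

Called as `resolve_config(sys.argv, os.environ)`, this lets two servers on one host share a config directory while each is started with its own file.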
[jira] Updated: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
[ https://issues.apache.org/jira/browse/ZOOKEEPER-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Greenwood-Geer updated ZOOKEEPER-498:
-----------------------------------------
    Attachment: zoo.cfg
                pod-zook-logs-01.tar.gz
                dc-zook-logs-01.tar.gz

Zookeeper Logs and configuration files:

dc1-zook01.log
dc1-zook02.log
dc1-zook03.log
dc1-zook04.log
dc1-zook05.log
pd1-zook01.log
pd1-zook02.log
pd4-zook01.log
pd4-zook02.log
zoo.cfg

> Unending Leader Elections : WAN configuration
> ---------------------------------------------
>
> Key: ZOOKEEPER-498
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.2.0
> Environment: Each machine:
>   CentOS 5.2 64-bit
>   2GB ram
>   java version "1.6.0_13"
>   Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
>   Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
> Network Topology:
>   DC : central data center
>   POD(N): remote data center
> Zookeeper Topology:
>   Leaders may be elected only in DC (weight = 1)
>   Only followers are elected in PODS (weight = 0)
> Reporter: Todd Greenwood-Geer
> Priority: Critical
> Fix For: 3.2.1
>
> Attachments: dc-zook-logs-01.tar.gz, pod-zook-logs-01.tar.gz, zoo.cfg
>
>
> In a WAN configuration, ZooKeeper is endlessly electing, terminating, and
> re-electing a ZooKeeper leader. The WAN configuration involves two groups, a
> central DC group of ZK servers that have a voting weight = 1, and a group of
> servers in remote pods with a voting weight of 0.
> What we expect to see is leaders elected only in the DC, and the pods to
> contain only followers. What we are seeing is a continuous cycling of
> leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended
> patches (473, 479, 481, 491), and now release 3.2.1.
[jira] Created: (ZOOKEEPER-498) Unending Leader Elections : WAN configuration
Unending Leader Elections : WAN configuration
---------------------------------------------

Key: ZOOKEEPER-498
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-498
Project: Zookeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.2.0
Environment: Each machine:
  CentOS 5.2 64-bit
  2GB ram
  java version "1.6.0_13"
  Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
  Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed
Network Topology:
  DC : central data center
  POD(N): remote data center
Zookeeper Topology:
  Leaders may be elected only in DC (weight = 1)
  Only followers are elected in PODS (weight = 0)
Reporter: Todd Greenwood-Geer
Priority: Critical
Fix For: 3.2.1

In a WAN configuration, ZooKeeper is endlessly electing, terminating, and re-electing a ZooKeeper leader. The WAN configuration involves two groups, a central DC group of ZK servers that have a voting weight = 1, and a group of servers in remote pods with a voting weight of 0.

What we expect to see is leaders elected only in the DC, and the pods to contain only followers. What we are seeing is a continuous cycling of leaders. We have seen this consistently with 3.2.0, 3.2.0 + recommended patches (473, 479, 481, 491), and now release 3.2.1.
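To make the weight = 1 / weight = 0 topology in this report concrete, here is a rough sketch of the weighted-majority rule that hierarchical quorums are documented to follow: a set of servers counts as a quorum when it holds more than half the weight in more than half of the groups that have non-zero weight. It is written from that description, not copied from QuorumHierarchical. The server ids and the group layout (five DC servers, two servers in each of two pods, guessed from the log file names) are assumptions, since the attached zoo.cfg is not quoted in this thread.

```python
def contains_quorum(voters, server_group, server_weight):
    """Return True if the servers in voters form a weighted quorum.

    server_group maps server id -> group id; server_weight maps server id ->
    voting weight. Groups whose total weight is zero are ignored.
    """
    # Total weight available in each group.
    group_total = {}
    for sid, gid in server_group.items():
        group_total[gid] = group_total.get(gid, 0) + server_weight[sid]
    active_groups = [g for g, total in group_total.items() if total > 0]

    # Weight actually contributed to each group by the voting servers.
    voted = {}
    for sid in voters:
        gid = server_group[sid]
        voted[gid] = voted.get(gid, 0) + server_weight[sid]

    # A group is "won" when the voters hold more than half of its weight.
    won = sum(1 for g in active_groups if voted.get(g, 0) > group_total[g] / 2)
    return won > len(active_groups) / 2


# Hypothetical layout: servers 1-5 in the DC (weight 1), 6-9 in pods (weight 0).
GROUPS = {1: "dc", 2: "dc", 3: "dc", 4: "dc", 5: "dc",
          6: "pod1", 7: "pod1", 8: "pod4", 9: "pod4"}
WEIGHTS = {sid: (1 if gid == "dc" else 0) for sid, gid in GROUPS.items()}
```

Under these assumptions `contains_quorum({1, 2, 3}, GROUPS, WEIGHTS)` is true, while `contains_quorum({1, 2, 6, 7, 8, 9}, GROUPS, WEIGHTS)` is false: the pod servers add no weight, so only the DC servers' votes should decide the election.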
[jira] Created: (ZOOKEEPER-497) api and forrest docs should mention if classes are thread safe
api and forrest docs should mention if classes are thread safe
--------------------------------------------------------------

Key: ZOOKEEPER-497
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-497
Project: Zookeeper
Issue Type: Bug
Components: documentation
Affects Versions: 3.2.0
Reporter: Patrick Hunt
Priority: Minor
Fix For: 3.3.0

the api (c/java clients) and the forrest docs should talk about thread safety - in particular we don't mention that ZooKeeper class is thread safe (etc...)

Docs should be updated.
RE: Unending Leader Elections in WAN deploy
Will do. > -Original Message- > From: Patrick Hunt [mailto:ph...@apache.org] > Sent: Tuesday, August 04, 2009 1:34 PM > To: zookeeper-dev@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > It would be better to create a JIRA with configs as well as logs. > > Patrick > > Mahadev Konar wrote: > > Hi Todd, > > > > What is the synclimit you are using? Can you post your config? For > WAN's > > you will have to use much bigger values for synclimit and others. > > > > Thanks > > mahadev > > > > > > On 8/4/09 1:24 PM, "Todd Greenwood" wrote: > > > >> Mahadev, > >> > >> I just heard from IT that this build behaves in exactly the same way as > >> previous versions, e.g. we get continuous leader elections that > >> disconnect the followers and then get re-elected, and disconnect...etc. > >> > >> This is from a fresh sync to the 3.2 branch: > >> > >> svn co > >> http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 > >> ./branch-3.2 > >> > >> CHANGES.TXT show the various fixes included: > >> > >> > to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper > >> /src/original$ head -n 50 branch-3.2/CHANGES.txt > >> Release 3.2.1 > >> > >> Backward compatibile changes: > >> > >> BUGFIXES: > >> ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via > >> flavio) > >> > >> ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via > >> mahadev) > >> > >> ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev) > >> > >> ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via > >> mahadev) > >> > >> ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) > >> (giri via mahadev) > >> > >> ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev) > >> > >> ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate > >> failure. (chris via mahadev) > >> > >> ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via > >> phunt) > >> > >> ZOOKEEPER-457. 
Make ZookeeperMain public, support for HBase (and > >> other) > >> embedded clients (ryan rawson via phunt) > >> > >> ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via > >> mahadev) > >> > >> ZOOKEEPER-479. QuorumHierarchical does not count groups correctly > >> (flavio via mahadev) > >> > >> ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty > >> cert > >> (Chris Darroch via phunt) > >> > >> ZOOKEEPER-480. FLE should perform leader check when node is not > >> leading and > >> add vote of follower (flavio via mahadev) > >> > >> ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio > >> via > >> mahadev) > >> > >> What can I do to assist you with this issue? > >> > >> -Todd > >> > >>> -Original Message- > >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > >>> Sent: Tuesday, August 04, 2009 12:43 PM > >>> To: zookeeper-dev@hadoop.apache.org > >>> Subject: Re: Unending Leader Elections in WAN deploy > >>> > >>> Hi todd, > >>> comments in line > >>> > >>> > >>> On 8/4/09 12:38 PM, "Todd Greenwood" > >> wrote: > Mahadev, > > Some quick questions: > > 1. Version > > I see that the CHANGES.txt calls this 3.2.1, but the build.xml is > >> still > calling this 3.2.0. Should this be rev'd, and am I correct in > >> calling > this release 3.2.1? > >>> Yes the release is 3.2.1. The build.xml will be fixed as soon as we > >> tag > >>> the > >>> release. > >>> > 2. Build targets > > The package target fails b/c the create-cppunit-configure target > >> fails > due to various problems w/ respect to autoconf. Are these > >> dependencies > documented somewhere ? I'd like to have a fully building system. > > create-cppunit-configure: > [exec] Can't exec "libtoolize": No such file or directory at > /usr/bin/autoreconf line 188. > [exec] Use of uninitialized value $libtoolize in pattern match > (m//) at /usr/bin/autoreconf line 188. 
> [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not > >> found > in library > [exec] configure.ac:33: error: possibly undefined macro: > AM_PATH_CPPUNIT > [exec] If this token and others are legitimate, please > >> use > m4_pattern_allow. > [exec] See the Autoconf documentation. > [exec] configure.ac:53: error: possibly undefined macro: > AC_PROG_LIBTOOL > [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1 > > >>> You need auto tools to run this. Please read the README for building c > >>> client library at src/c/ for the installation requirements. > 3. Sync failure: > > This is still failing. > > svn: URL > 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' > doesn't exist > > >>> Yes this hasn't been fixed yet! > >>> > >>> Thanks >
Re: Unending Leader Elections in WAN deploy
It would be better to create a JIRA with configs as well as logs. Patrick Mahadev Konar wrote: Hi Todd, What is the synclimit you are using? Can you post your config? For WAN's you will have to use much bigger values for synclimit and others. Thanks mahadev On 8/4/09 1:24 PM, "Todd Greenwood" wrote: Mahadev, I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc. This is from a fresh sync to the 3.2 branch: svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2 CHANGES.TXT show the various fixes included: to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper /src/original$ head -n 50 branch-3.2/CHANGES.txt Release 3.2.1 Backward compatibile changes: BUGFIXES: ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio) ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev) ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev) ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev) ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev) ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev) ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev) ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt) ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt) ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev) ZOOKEEPER-479. QuorumHierarchical does not count groups correctly (flavio via mahadev) ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert (Chris Darroch via phunt) ZOOKEEPER-480. 
FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev) ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via mahadev) What can I do to assist you with this issue? -Todd -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Tuesday, August 04, 2009 12:43 PM To: zookeeper-dev@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy Hi todd, comments in line On 8/4/09 12:38 PM, "Todd Greenwood" wrote: Mahadev, Some quick questions: 1. Version I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still calling this 3.2.0. Should this be rev'd, and am I correct in calling this release 3.2.1? Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the release. 2. Build targets The package target fails b/c the create-cppunit-configure target fails due to various problems w/ respect to autoconf. Are these dependencies documented somewhere ? I'd like to have a fully building system. create-cppunit-configure: [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188. [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188. [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT [exec] If this token and others are legitimate, please use m4_pattern_allow. [exec] See the Autoconf documentation. [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1 You need auto tools to run this. Please read the README for building c client library at src/c/ for the installation requirements. 3. Sync failure: This is still failing. svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist Yes this hasn't been fixed yet! 
Thanks mahadev -Todd -Original Message- From: Todd Greenwood Sent: Tuesday, August 04, 2009 11:26 AM To: 'zookeeper-u...@hadoop.apache.org' Subject: RE: Unending Leader Elections in WAN deploy Great news. Thank you Mahadev. I'll report our findings later today. -Todd -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Tuesday, August 04, 2009 11:20 AM To: zookeeper-u...@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy Hi Todd, I just committed 480 and 491. You can checkout the 3.2 branch now. Thanks mahadev On 8/3/09 4:29 PM, "Todd Greenwood" wrote: That'd be perfect. Thanks! -Original Message- From: Mahadev Konar [mailto:maha...@yahoo-inc.com] Sent: Monday, August 03, 2009 4:24 PM To: zookeeper-u...@hadoop.apache.org Subject: Re: Unending Leader Elections in WAN deploy Hi Todd, Most of the patches that you mention should be in the branch 3.2 by tomm or so. 481, 479 are already in. 480 and
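On the syncLimit question quoted above: initLimit and syncLimit are counted in ticks, so the time a follower actually gets is the limit multiplied by tickTime. The sketch below just does that arithmetic for a config fragment. The values in `SAMPLE_CFG` are the usual sample-config numbers and are assumed here for illustration; Todd's real zoo.cfg is attached to the JIRA and not reproduced in this thread. The helper names are made up for this example.

```python
# Illustrative values only (assumption), not the settings from the WAN deploy.
SAMPLE_CFG = """\
tickTime=2000
initLimit=10
syncLimit=5
"""


def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def limits_in_ms(cfg):
    """Convert initLimit and syncLimit from ticks to milliseconds."""
    tick_ms = int(cfg["tickTime"])
    return {name: tick_ms * int(cfg[name]) for name in ("initLimit", "syncLimit")}
```

For the sample values this gives 20000 ms for initLimit and 10000 ms for syncLimit. If a pod follower cannot sync with the DC leader inside that window over the WAN link it gets dropped, which is why larger values are suggested for WAN deployments.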
Re: Unending Leader Elections in WAN deploy
Hi Todd, What is the synclimit you are using? Can you post your config? For WAN's you will have to use much bigger values for synclimit and others. Thanks mahadev On 8/4/09 1:24 PM, "Todd Greenwood" wrote: > Mahadev, > > I just heard from IT that this build behaves in exactly the same way as > previous versions, e.g. we get continuous leader elections that > disconnect the followers and then get re-elected, and disconnect...etc. > > This is from a fresh sync to the 3.2 branch: > > svn co > http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 > ./branch-3.2 > > CHANGES.TXT show the various fixes included: > > to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper > /src/original$ head -n 50 branch-3.2/CHANGES.txt > Release 3.2.1 > > Backward compatibile changes: > > BUGFIXES: > ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via > flavio) > > ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via > mahadev) > > ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev) > > ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via > mahadev) > > ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) > (giri via mahadev) > > ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev) > > ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate > failure. (chris via mahadev) > > ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via > phunt) > > ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and > other) > embedded clients (ryan rawson via phunt) > > ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via > mahadev) > > ZOOKEEPER-479. QuorumHierarchical does not count groups correctly > (flavio via mahadev) > > ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty > cert > (Chris Darroch via phunt) > > ZOOKEEPER-480. 
FLE should perform leader check when node is not > leading and > add vote of follower (flavio via mahadev) > > ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio > via > mahadev) > > What can I do to assist you with this issue? > > -Todd > >> -Original Message- >> From: Mahadev Konar [mailto:maha...@yahoo-inc.com] >> Sent: Tuesday, August 04, 2009 12:43 PM >> To: zookeeper-dev@hadoop.apache.org >> Subject: Re: Unending Leader Elections in WAN deploy >> >> Hi todd, >> comments in line >> >> >> On 8/4/09 12:38 PM, "Todd Greenwood" > wrote: >> >>> Mahadev, >>> >>> Some quick questions: >>> >>> 1. Version >>> >>> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is > still >>> calling this 3.2.0. Should this be rev'd, and am I correct in > calling >>> this release 3.2.1? >> Yes the release is 3.2.1. The build.xml will be fixed as soon as we > tag >> the >> release. >> >>> >>> 2. Build targets >>> >>> The package target fails b/c the create-cppunit-configure target > fails >>> due to various problems w/ respect to autoconf. Are these > dependencies >>> documented somewhere ? I'd like to have a fully building system. >>> >>> create-cppunit-configure: >>> [exec] Can't exec "libtoolize": No such file or directory at >>> /usr/bin/autoreconf line 188. >>> [exec] Use of uninitialized value $libtoolize in pattern match >>> (m//) at /usr/bin/autoreconf line 188. >>> [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not > found >>> in library >>> [exec] configure.ac:33: error: possibly undefined macro: >>> AM_PATH_CPPUNIT >>> [exec] If this token and others are legitimate, please > use >>> m4_pattern_allow. >>> [exec] See the Autoconf documentation. >>> [exec] configure.ac:53: error: possibly undefined macro: >>> AC_PROG_LIBTOOL >>> [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1 >>> >> You need auto tools to run this. Please read the README for building c >> client library at src/c/ for the installation requirements. 
>>> >>> 3. Sync failure: >>> >>> This is still failing. >>> >>> svn: URL >>> 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' >>> doesn't exist >>> >> >> Yes this hasn't been fixed yet! >> >> Thanks >> mahadev >>> -Todd >>> -Original Message- From: Todd Greenwood Sent: Tuesday, August 04, 2009 11:26 AM To: 'zookeeper-u...@hadoop.apache.org' Subject: RE: Unending Leader Elections in WAN deploy Great news. Thank you Mahadev. I'll report our findings later > today. -Todd > -Original Message- > From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > Sent: Tuesday, August 04, 2009 11:20 AM > To: zookeeper-u...@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > Hi Todd, > I just committed 480 and 491. You can checkout the 3.2 branch > now. > > Thanks > mahadev > > > On 8/
RE: Unending Leader Elections in WAN deploy
Mahadev,

I just heard from IT that this build behaves in exactly the same way as previous versions, e.g. we get continuous leader elections that disconnect the followers and then get re-elected, and disconnect...etc.

This is from a fresh sync to the 3.2 branch:

svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 ./branch-3.2

CHANGES.TXT show the various fixes included:

to...@toddg01lt:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/original$ head -n 50 branch-3.2/CHANGES.txt
Release 3.2.1

Backward compatibile changes:

BUGFIXES:
  ZOOKEEPER-468. avoid compile warning in send_auth_info(). (chris via flavio)

  ZOOKEEPER-469. make sure CPPUNIT_CFLAGS isn't overwritten (chris via mahadev)

  ZOOKEEPER-471. update zkperl for 3.2.x branch. (chris via mahadev)

  ZOOKEEPER-470. include unistd.h for sleep() in c tests (chris via mahadev)

  ZOOKEEPER-460. bad testRetry in cppunit tests (hudson failure) (giri via mahadev)

  ZOOKEEPER-467. Change log level in BookieHandle (flavio via mahadev)

  ZOOKEEPER-482. ignore sigpipe in testRetry to avoid silent immediate failure. (chris via mahadev)

  ZOOKEEPER-487. setdata on root (/) crashes the servers (mahadev via phunt)

  ZOOKEEPER-457. Make ZookeeperMain public, support for HBase (and other) embedded clients (ryan rawson via phunt)

  ZOOKEEPER-481. Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

  ZOOKEEPER-479. QuorumHierarchical does not count groups correctly (flavio via mahadev)

  ZOOKEEPER-466. crash on zookeeper_close() when using auth with empty cert (Chris Darroch via phunt)

  ZOOKEEPER-480. FLE should perform leader check when node is not leading and add vote of follower (flavio via mahadev)

  ZOOKEEPER-491. Prevent zero-weight servers from being elected (flavio via mahadev)

What can I do to assist you with this issue?
-Todd > -Original Message- > From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > Sent: Tuesday, August 04, 2009 12:43 PM > To: zookeeper-dev@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > Hi todd, > comments in line > > > On 8/4/09 12:38 PM, "Todd Greenwood" wrote: > > > Mahadev, > > > > Some quick questions: > > > > 1. Version > > > > I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still > > calling this 3.2.0. Should this be rev'd, and am I correct in calling > > this release 3.2.1? > Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag > the > release. > > > > > 2. Build targets > > > > The package target fails b/c the create-cppunit-configure target fails > > due to various problems w/ respect to autoconf. Are these dependencies > > documented somewhere ? I'd like to have a fully building system. > > > > create-cppunit-configure: > > [exec] Can't exec "libtoolize": No such file or directory at > > /usr/bin/autoreconf line 188. > > [exec] Use of uninitialized value $libtoolize in pattern match > > (m//) at /usr/bin/autoreconf line 188. > > [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found > > in library > > [exec] configure.ac:33: error: possibly undefined macro: > > AM_PATH_CPPUNIT > > [exec] If this token and others are legitimate, please use > > m4_pattern_allow. > > [exec] See the Autoconf documentation. > > [exec] configure.ac:53: error: possibly undefined macro: > > AC_PROG_LIBTOOL > > [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1 > > > You need auto tools to run this. Please read the README for building c > client library at src/c/ for the installation requirements. > > > > 3. Sync failure: > > > > This is still failing. > > > > svn: URL > > 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' > > doesn't exist > > > > Yes this hasn't been fixed yet! 
> > Thanks > mahadev > > -Todd > > > >> -Original Message- > >> From: Todd Greenwood > >> Sent: Tuesday, August 04, 2009 11:26 AM > >> To: 'zookeeper-u...@hadoop.apache.org' > >> Subject: RE: Unending Leader Elections in WAN deploy > >> > >> Great news. Thank you Mahadev. I'll report our findings later today. > >> -Todd > >> > >>> -Original Message- > >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > >>> Sent: Tuesday, August 04, 2009 11:20 AM > >>> To: zookeeper-u...@hadoop.apache.org > >>> Subject: Re: Unending Leader Elections in WAN deploy > >>> > >>> Hi Todd, > >>> I just committed 480 and 491. You can checkout the 3.2 branch now. > >>> > >>> Thanks > >>> mahadev > >>> > >>> > >>> On 8/3/09 4:29 PM, "Todd Greenwood" > > wrote: > >>> > That'd be perfect. Thanks! > > > -Original Message- > > From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > > Sent: Monday, August 03, 2009 4:24 PM > > To: zookeeper-u...@hadoop.apache.org > > Subject: Re: Unending Leader Elections in WAN deploy > > > > Hi Todd, > > Most of the patches that you mention shou
Re: Unending Leader Elections in WAN deploy
Hi todd,
 comments in line

On 8/4/09 12:38 PM, "Todd Greenwood" wrote:

> Mahadev,
>
> Some quick questions:
>
> 1. Version
>
> I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still
> calling this 3.2.0. Should this be rev'd, and am I correct in calling
> this release 3.2.1?

Yes the release is 3.2.1. The build.xml will be fixed as soon as we tag the release.

>
> 2. Build targets
>
> The package target fails b/c the create-cppunit-configure target fails
> due to various problems w/ respect to autoconf. Are these dependencies
> documented somewhere ? I'd like to have a fully building system.
>
> create-cppunit-configure:
>      [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
>      [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
>      [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
>      [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
>      [exec]       If this token and others are legitimate, please use m4_pattern_allow.
>      [exec]       See the Autoconf documentation.
>      [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
>      [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1
>

You need auto tools to run this. Please read the README for building c client library at src/c/ for the installation requirements.

>
> 3. Sync failure:
>
> This is still failing.
>
> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch'
> doesn't exist
>

Yes this hasn't been fixed yet!

Thanks
mahadev

> -Todd
>
>> -Original Message-
>> From: Todd Greenwood
>> Sent: Tuesday, August 04, 2009 11:26 AM
>> To: 'zookeeper-u...@hadoop.apache.org'
>> Subject: RE: Unending Leader Elections in WAN deploy
>>
>> Great news. Thank you Mahadev. I'll report our findings later today.
>> -Todd >> >>> -Original Message- >>> From: Mahadev Konar [mailto:maha...@yahoo-inc.com] >>> Sent: Tuesday, August 04, 2009 11:20 AM >>> To: zookeeper-u...@hadoop.apache.org >>> Subject: Re: Unending Leader Elections in WAN deploy >>> >>> Hi Todd, >>> I just committed 480 and 491. You can checkout the 3.2 branch now. >>> >>> Thanks >>> mahadev >>> >>> >>> On 8/3/09 4:29 PM, "Todd Greenwood" > wrote: >>> That'd be perfect. Thanks! > -Original Message- > From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > Sent: Monday, August 03, 2009 4:24 PM > To: zookeeper-u...@hadoop.apache.org > Subject: Re: Unending Leader Elections in WAN deploy > > Hi Todd, > Most of the patches that you mention should be in the branch > 3.2 by tomm > or so. 481, 479 are already in. 480 and 491 should be in by tomm. Would > that > suffice for you? > > Thanks > mahadev > > > On 8/3/09 4:21 PM, "Todd Greenwood" >> wrote: > >> Another problem...I've reverted to the latest versions of the patches >> that are not specific to branch-3.2, and I'm getting two > compilation >> errors: >> >> build-generated: >> [javac] Compiling 44 source files to >> >> > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p >> atched/branch-3.2/build/classes >> >> compile-main: >> [javac] Compiling 2 source files to >> >> > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p >> atched/branch-3.2/build/classes >> [javac] >> >> > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p >> atched/branch- >> 3.2/src/java/main/org/apache/zookeeper/server/quorum/Quoru >> mStats.java:30: name clash: getQuorumPeers() and > getQuorumPeers() have >> the same erasure >> [javac] public String[] getQuorumPeers(); >> [javac] ^ >> [javac] >> >> > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p >> atched/branch- >> 3.2/src/java/main/org/apache/zookeeper/server/quorum/Quoru >> mStats.java:31: name clash: getServerState() and > getServerState() have >> 
the same erasure >> [javac] public String getServerState(); >> [javac] ^ >> [javac] 2 errors >> >> My build process is pretty simple: >> >> 1. copy the branch-3.2 source to a temp directory >> (src/patched/branch-3.2) >> 2. apply the ZOOKEEPER patches in my patches directory >> 3. build zookeeper in the temp directory >> >> -Todd >>> -Original Message- >>> From: Todd Greenwood [mailto:to...@audiencescience.com] >>> Sent: Monday, August 03, 2009 4:09 PM >>> To: zookeeper-u...@hadoop.apache.org >>> Subject: RE: Unending Leader Elections in WAN deploy >>> >>> Flavio, >>> I
RE: Unending Leader Elections in WAN deploy
Mahadev,

Some quick questions:

1. Version

I see that the CHANGES.txt calls this 3.2.1, but the build.xml is still calling this 3.2.0. Should this be rev'd, and am I correct in calling this release 3.2.1?

2. Build targets

The package target fails b/c the create-cppunit-configure target fails due to various problems w/ respect to autoconf. Are these dependencies documented somewhere? I'd like to have a fully building system.

create-cppunit-configure:
  [exec] Can't exec "libtoolize": No such file or directory at /usr/bin/autoreconf line 188.
  [exec] Use of uninitialized value $libtoolize in pattern match (m//) at /usr/bin/autoreconf line 188.
  [exec] configure.ac:33: warning: macro `AM_PATH_CPPUNIT' not found in library
  [exec] configure.ac:33: error: possibly undefined macro: AM_PATH_CPPUNIT
  [exec] If this token and others are legitimate, please use m4_pattern_allow.
  [exec] See the Autoconf documentation.
  [exec] configure.ac:53: error: possibly undefined macro: AC_PROG_LIBTOOL
  [exec] autoreconf: /usr/bin/autoconf failed with exit status: 1

3. Sync failure:

This is still failing.

svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist

-Todd

> -----Original Message-----
> From: Todd Greenwood
> Sent: Tuesday, August 04, 2009 11:26 AM
> To: 'zookeeper-u...@hadoop.apache.org'
> Subject: RE: Unending Leader Elections in WAN deploy
>
> Great news. Thank you Mahadev. I'll report our findings later today.
> -Todd
>
> > -----Original Message-----
> > From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
> > Sent: Tuesday, August 04, 2009 11:20 AM
> > To: zookeeper-u...@hadoop.apache.org
> > Subject: Re: Unending Leader Elections in WAN deploy
> >
> > Hi Todd,
> > I just committed 480 and 491. You can checkout the 3.2 branch now.
> >
> > Thanks
> > mahadev
> >
> >
> > On 8/3/09 4:29 PM, "Todd Greenwood" wrote:
> >
> > > That'd be perfect. Thanks!
> > > > > >> -Original Message- > > >> From: Mahadev Konar [mailto:maha...@yahoo-inc.com] > > >> Sent: Monday, August 03, 2009 4:24 PM > > >> To: zookeeper-u...@hadoop.apache.org > > >> Subject: Re: Unending Leader Elections in WAN deploy > > >> > > >> Hi Todd, > > >> Most of the patches that you mention should be in the branch 3.2 by > > > tomm > > >> or so. 481, 479 are already in. 480 and 491 should be in by tomm. > > > Would > > >> that > > >> suffice for you? > > >> > > >> Thanks > > >> mahadev > > >> > > >> > > >> On 8/3/09 4:21 PM, "Todd Greenwood" > wrote: > > >> > > >>> Another problem...I've reverted to the latest versions of the > > > patches > > >>> that are not specific to branch-3.2, and I'm getting two compilation > > >>> errors: > > >>> > > >>> build-generated: > > >>> [javac] Compiling 44 source files to > > >>> > > > > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p > > >>> atched/branch-3.2/build/classes > > >>> > > >>> compile-main: > > >>> [javac] Compiling 2 source files to > > >>> > > > > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p > > >>> atched/branch-3.2/build/classes > > >>> [javac] > > >>> > > > > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p > > >>> > > > atched/branch- > 3.2/src/java/main/org/apache/zookeeper/server/quorum/Quoru > > >>> mStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() > > > have > > >>> the same erasure > > >>> [javac] public String[] getQuorumPeers(); > > >>> [javac] ^ > > >>> [javac] > > >>> > > > > /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/p > > >>> > > > atched/branch- > 3.2/src/java/main/org/apache/zookeeper/server/quorum/Quoru > > >>> mStats.java:31: name clash: getServerState() and getServerState() > > > have > > >>> the same erasure > > >>> [javac] public String getServerState(); > > >>> [javac] ^ > > >>> [javac] 2 errors > > >>> > > >>> My build process is pretty simple: > > >>> > > 
>>> 1. copy the branch-3.2 source to a temp directory > > >>> (src/patched/branch-3.2) > > >>> 2. apply the ZOOKEEPER patches in my patches directory > > >>> 3. build zookeeper in the temp directory > > >>> > > >>> -Todd > > -Original Message- > > From: Todd Greenwood [mailto:to...@audiencescience.com] > > Sent: Monday, August 03, 2009 4:09 PM > > To: zookeeper-u...@hadoop.apache.org > > Subject: RE: Unending Leader Elections in WAN deploy > > > > Flavio, > > I notice that you've updated the patches referenced for the WAN > > deployment. There appears to be an order dependency w/ respect to > > >>> these > > four patches... > > > > ZOOKEEPER-473.patch ZOOKEEPER-479-branch3.2.patch > > ZOOKEEPER-481-branch3.2.patch ZOOKEEPER-491.patch > > > > 473 -> 479 (479 fails) > > > > > > >>> > > > > to...@toddg01lt:~/asi/workspaces/ma
[jira] Resolved: (ZOOKEEPER-475) FLENewEpochTest failed on nightly builds.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar resolved ZOOKEEPER-475.
-------------------------------------
Resolution: Fixed

Given that ZOOKEEPER-479, ZOOKEEPER-480, and ZOOKEEPER-481 have been fixed, this should be fixed.

> FLENewEpochTest failed on nightly builds.
> -----------------------------------------
>
> Key: ZOOKEEPER-475
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-475
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.2.0
> Reporter: Mahadev konar
> Assignee: Flavio Paiva Junqueira
> Priority: Blocker
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-475.patch, ZOOKEEPER-475.patch
>
>
> The FLENewEpochTest failed on one of the nightly builds:
> http://hudson.zones.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/377.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (ZOOKEEPER-368) Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henry Robinson updated ZOOKEEPER-368:
-------------------------------------
Attachment: observers.patch
            obs-refactor.patch

Here are both a slightly modified version of the refactor patch and a patch containing the new code for Observers. I have included some tests now as well.

The Observer implementation is simplified from previous patches. I have added new methods to QuorumPeer to get at the entire view of the ensemble, the voting view (containing Followers), and the observing view.

To use an Observer, in the ensemble config file append :observer to the description for any server you want to be an Observer. So for example write:

  server.3:localhost:2181:3181:observer

In the Observer's own config file, add a line with the option

  peerType=observer

I will probably remove these slightly redundant specifications in the future, but for now you will need both.

You must apply the patches in order: the refactor patch first. Both patches apply cleanly for me using patch -p0 against a clean checkout of trunk as of tonight (Aug 4th).

> Observers
> ---------
>
> Key: ZOOKEEPER-368
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-368
> Project: Zookeeper
> Issue Type: New Feature
> Components: quorum
> Reporter: Flavio Paiva Junqueira
> Assignee: Henry Robinson
> Attachments: obs-refactor.patch, observer-refactor.patch,
> observers.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch,
> ZOOKEEPER-368.patch, ZOOKEEPER-368.patch, ZOOKEEPER-368.patch,
> ZOOKEEPER-368.patch
>
>
> Currently, all servers of an ensemble participate actively in reaching
> agreement on the order of ZooKeeper transactions. That is, all followers
> receive proposals, acknowledge them, and receive commit messages from the
> leader. A leader issues commit messages once it receives acknowledgments from
> a quorum of followers. For cross-colo operation, it would be useful to have a
> third role: observer. Using Paxos terminology, observers are similar to
> learners. An observer does not participate actively in the agreement step of
> the atomic broadcast protocol. Instead, it only commits proposals that have
> been accepted by some quorum of followers.
>
> One simple way to implement observers is to have the leader forward
> commit messages not only to followers but also to observers, and have
> observers apply transactions in the order the followers agreed upon.
> In the current implementation of the protocol, however, commit messages do
> not carry their corresponding transaction payload, because all servers
> other than the leader are followers, and followers receive the payload
> first through a proposal message. Just forwarding commit messages as they
> currently are to an observer is consequently not sufficient. We have a couple
> of options:
> 1- Include the transaction payload in commit messages to observers;
> 2- Send proposals to observers as well.
> Number 2 is simpler to implement because it doesn't require changing the
> protocol implementation, but it increases traffic slightly. The performance
> impact due to such an increase might be insignificant, though.
>
> For scalability purposes, we may consider having followers also forward
> commit messages to observers. With this option, observers can connect to
> followers and receive messages from followers. This choice is important to
> avoid increasing the load on the leader with the number of observers.
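Editor's note: option 2 from the issue description can be pictured with a small in-memory model. This is only a sketch under stated assumptions, not the code in observers.patch: `ObserverSketch`, `Peer`, `broadcast`, and the `reachable` set are hypothetical names, and the transaction string doubles as its own identifier. The leader sends each proposal to every reachable peer, counts acknowledgments from followers only (plus its own), and sends the commit to followers and observers alike once a majority of voting members has acknowledged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of "send proposals to observers as well" (option 2).
final class ObserverSketch {
    enum PeerType { FOLLOWER, OBSERVER }

    static final class Peer {
        final long id;
        final PeerType type;
        final List<String> pending = new ArrayList<>();   // payloads received via proposals
        final List<String> committed = new ArrayList<>(); // payloads applied after a commit

        Peer(long id, PeerType type) {
            this.id = id;
            this.type = type;
        }

        // Every peer stores the payload; only followers acknowledge it.
        boolean onProposal(String txn) {
            pending.add(txn);
            return type == PeerType.FOLLOWER;
        }

        // The commit carries no payload, so it is matched against a stored proposal.
        void onCommit(String txn) {
            if (pending.remove(txn)) {
                committed.add(txn);
            }
        }
    }

    // Returns true if the transaction was committed by a majority of voting members.
    static boolean broadcast(List<Peer> peers, Set<Long> reachable, String txn) {
        int voters = 1; // the leader itself
        int acks = 1;   // the leader's own acknowledgment
        for (Peer p : peers) {
            if (p.type == PeerType.FOLLOWER) {
                voters++;
            }
            if (reachable.contains(p.id) && p.onProposal(txn)) {
                acks++;
            }
        }
        if (2 * acks <= voters) {
            return false; // observers never help reach the quorum
        }
        for (Peer p : peers) {
            if (reachable.contains(p.id)) {
                p.onCommit(txn);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Peer> peers = new ArrayList<>();
        peers.add(new Peer(1, PeerType.FOLLOWER));
        peers.add(new Peer(2, PeerType.FOLLOWER));
        peers.add(new Peer(3, PeerType.OBSERVER));
        Set<Long> reachable = new HashSet<>(Arrays.asList(1L, 3L));
        System.out.println("committed: " + broadcast(peers, reachable, "txn-1"));
        System.out.println("observer applied: " + peers.get(2).committed);
    }
}
```

The point of the model is that adding observers changes neither the size of the voting set nor the number of acknowledgments the leader waits for, which is why the description expects only a slight increase in traffic.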
[jira] Updated: (ZOOKEEPER-491) Prevent zero-weight servers from being elected
[ https://issues.apache.org/jira/browse/ZOOKEEPER-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated ZOOKEEPER-491:
------------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

+1 for the patch. I just committed this. Thanks, Flavio!

> Prevent zero-weight servers from being elected
> ----------------------------------------------
>
> Key: ZOOKEEPER-491
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-491
> Project: Zookeeper
> Issue Type: New Feature
> Components: leaderElection
> Affects Versions: 3.2.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-491-3.2branch.patch, ZOOKEEPER-491.patch
>
>
> This is a fix to prevent zero-weight servers from being elected leaders. In
> wide-area scenarios, this makes it possible to restrict the set of servers
> that can lead the ensemble.
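Editor's note: one way to picture the rule is as a guard in the comparison that decides whether a newly received vote replaces the current one. The sketch below is an assumption about the shape of such a guard, not the committed patch: `ZeroWeightSketch`, `prefers`, and the `weights` map are hypothetical, servers missing from the map are assumed to have weight 1, and the ordering used (higher zxid first, then higher server id) is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical vote comparison that never prefers a zero-weight candidate.
final class ZeroWeightSketch {
    // Returns true if the vote (newId, newZxid) should replace (curId, curZxid).
    static boolean prefers(Map<Long, Long> weights,
                           long newId, long newZxid,
                           long curId, long curZxid) {
        // Assumed rule: a server with weight 0 is never a better candidate.
        if (weights.getOrDefault(newId, 1L) == 0L) {
            return false;
        }
        // Simplified ordering: most recent zxid wins, ties go to the higher id.
        return newZxid > curZxid || (newZxid == curZxid && newId > curId);
    }

    public static void main(String[] args) {
        Map<Long, Long> weights = new HashMap<>();
        weights.put(1L, 1L);
        weights.put(2L, 1L);
        weights.put(3L, 0L); // e.g. a remote-site server that must not lead
        System.out.println(prefers(weights, 3L, 100L, 1L, 5L));
        System.out.println(prefers(weights, 2L, 5L, 1L, 5L));
    }
}
```

With a guard like this, a zero-weight server can still take part in the ensemble, but its vote for itself never displaces a weighted candidate, however recent its zxid is.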
[jira] Resolved: (ZOOKEEPER-480) FLE should perform leader check when node is not leading and add vote of follower
[ https://issues.apache.org/jira/browse/ZOOKEEPER-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar resolved ZOOKEEPER-480.
-------------------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]

I just committed this. Thanks, Flavio.

> FLE should perform leader check when node is not leading and add vote of
> follower
> -------------------------------------------------------------------------
>
> Key: ZOOKEEPER-480
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-480
> Project: Zookeeper
> Issue Type: Bug
> Affects Versions: 3.2.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-480-3.2branch.patch,
> ZOOKEEPER-480-3.2branch.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch,
> ZOOKEEPER-480.patch, ZOOKEEPER-480.patch, ZOOKEEPER-480.patch
>
>
> As a server may join leader election while others have already elected a
> leader, it is necessary that a server handles some special cases of leader
> election when notifications are from servers that are either LEADING or
> FOLLOWING. In such special cases, we check if we have received a message from
> the leader to declare a leader elected. This check does not consider the case
> that the process performing the check might be a recently elected leader, and
> consequently the check fails.
>
> This patch also adds a new case, which corresponds to adding a vote to
> recvset when the notification is from a process LEADING or FOLLOWING. This
> fixes the case raised in ZOOKEEPER-475.
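Editor's note: the first half of the description, the leader check that fails when the checking process is itself the newly elected leader, can be sketched as below. This is a guess at the shape of the check under stated assumptions, not the committed patch: `LeaderCheckSketch`, `checkLeader`, and the `heardFrom` map (server id to the state that server last reported) are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical leader check that only requires a message from the leader
// when the checking server is not the proposed leader itself.
final class LeaderCheckSketch {
    enum State { LOOKING, FOLLOWING, LEADING }

    static boolean checkLeader(Map<Long, State> heardFrom, long leader, long selfId) {
        if (leader == selfId) {
            // A server never receives a notification from itself, so the
            // "have we heard from the leader" test is skipped in this case.
            return true;
        }
        // Otherwise the proposed leader must have reported that it is LEADING.
        return heardFrom.get(leader) == State.LEADING;
    }

    public static void main(String[] args) {
        Map<Long, State> heardFrom = new HashMap<>();
        heardFrom.put(2L, State.LEADING);
        heardFrom.put(3L, State.FOLLOWING);
        System.out.println(checkLeader(heardFrom, 2L, 1L));
        System.out.println(checkLeader(heardFrom, 1L, 1L));
    }
}
```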
[jira] Commented: (ZOOKEEPER-481) Add lastMessageSent to QuorumCnxManager
[ https://issues.apache.org/jira/browse/ZOOKEEPER-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738964#action_12738964 ]

Hudson commented on ZOOKEEPER-481:
-----------------------------------

Integrated in ZooKeeper-trunk #404 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
Add lastMessageSent to QuorumCnxManager. (flavio via mahadev)

> Add lastMessageSent to QuorumCnxManager
> ---------------------------------------
>
> Key: ZOOKEEPER-481
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-481
> Project: Zookeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.1.1, 3.2.0
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-481-branch3.2.patch,
> ZOOKEEPER-481-branch3.2.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch,
> ZOOKEEPER-481.patch, ZOOKEEPER-481.patch, ZOOKEEPER-481.patch
>
>
> Currently we rely on TCP for reliable delivery of FLE messages. However, as
> we concurrently drop and create new connections, it is possible that a
> message is sent but never received. With this patch, cnx manager keeps a list
> of last messages sent, and resends the last one sent. Receiving multiple
> copies is harmless.
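Editor's note: the idea of remembering the last message per destination and replaying it on a fresh connection can be sketched as follows. This is a simplified model, not the QuorumCnxManager code: `LastMessageSketch`, `onReconnect`, and the `wire` map (an in-memory stand-in for the sockets) are hypothetical, and only the `lastMessageSent` name comes from the issue title.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: remember the last message per server and resend it
// whenever a new connection to that server is established.
final class LastMessageSketch {
    private final Map<Long, ByteBuffer> lastMessageSent = new HashMap<>();
    // Stand-in for the network: everything "transmitted" to each server id.
    final Map<Long, List<ByteBuffer>> wire = new HashMap<>();

    void send(long sid, ByteBuffer msg) {
        lastMessageSent.put(sid, msg.duplicate()); // remember before sending
        transmit(sid, msg);
    }

    // Called after a connection to sid is (re)created. The previous message
    // may have been lost with the old connection, so it is sent again.
    void onReconnect(long sid) {
        ByteBuffer last = lastMessageSent.get(sid);
        if (last != null) {
            transmit(sid, last);
        }
    }

    private void transmit(long sid, ByteBuffer msg) {
        wire.computeIfAbsent(sid, k -> new ArrayList<>()).add(msg.duplicate());
    }

    public static void main(String[] args) {
        LastMessageSketch mgr = new LastMessageSketch();
        mgr.send(2L, ByteBuffer.wrap(new byte[] {1}));
        mgr.send(2L, ByteBuffer.wrap(new byte[] {2}));
        mgr.onReconnect(2L); // replays the second message only
        System.out.println("messages on the wire to server 2: " + mgr.wire.get(2L).size());
    }
}
```

Resending is safe only because, as the description says, receiving multiple copies is harmless: the receiver may see the last notification twice when the first copy was in fact delivered.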
[jira] Commented: (ZOOKEEPER-479) QuorumHierarchical does not count groups correctly
[ https://issues.apache.org/jira/browse/ZOOKEEPER-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738966#action_12738966 ]

Hudson commented on ZOOKEEPER-479:
-----------------------------------

Integrated in ZooKeeper-trunk #404 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
QuorumHierarchical does not count groups correctly (flavio via mahadev)

> QuorumHierarchical does not count groups correctly
> --------------------------------------------------
>
> Key: ZOOKEEPER-479
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-479
> Project: Zookeeper
> Issue Type: Bug
> Components: quorum
> Reporter: Flavio Paiva Junqueira
> Assignee: Flavio Paiva Junqueira
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-479-branch3.2.patch, ZOOKEEPER-479.patch,
> ZOOKEEPER-479.patch, ZOOKEEPER-479.patch
>
>
> QuorumHierarchical::containsQuorum should not verify if all groups
> represented in the input set have more than half of the total weight.
> Instead, it should check only for an overall majority of groups.
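Editor's note: the corrected counting rule can be illustrated with a standalone function. It is a sketch under stated assumptions, not the QuorumHierarchical source: `HierarchicalQuorumSketch` and its parameter maps are hypothetical, a server missing from `serverWeight` is assumed to weigh 1, and groups whose total weight is zero are assumed not to count toward the number of groups. A group is "won" when the acknowledging servers hold more than half of that group's weight, and the set is a quorum when more than half of the groups are won, regardless of how other represented groups fare.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical hierarchical quorum check: a majority of groups, each of which
// is won by more than half of its own weight.
final class HierarchicalQuorumSketch {
    static boolean containsQuorum(Map<Long, Long> serverGroup,
                                  Map<Long, Long> serverWeight,
                                  Set<Long> acks) {
        Map<Long, Long> groupTotal = new HashMap<>();
        Map<Long, Long> groupAcked = new HashMap<>();
        for (Map.Entry<Long, Long> e : serverGroup.entrySet()) {
            long sid = e.getKey();
            long gid = e.getValue();
            long weight = serverWeight.getOrDefault(sid, 1L);
            groupTotal.merge(gid, weight, Long::sum);
            if (acks.contains(sid)) {
                groupAcked.merge(gid, weight, Long::sum);
            }
        }
        int groups = 0;
        int wonGroups = 0;
        for (Map.Entry<Long, Long> e : groupTotal.entrySet()) {
            long total = e.getValue();
            if (total == 0) {
                continue; // assumption: zero-weight groups are ignored
            }
            groups++;
            if (2 * groupAcked.getOrDefault(e.getKey(), 0L) > total) {
                wonGroups++;
            }
        }
        // Only an overall majority of groups is required; a represented group
        // that holds a minority of its weight does not veto the quorum.
        return 2 * wonGroups > groups;
    }

    public static void main(String[] args) {
        Map<Long, Long> group = new HashMap<>();
        Map<Long, Long> weight = new HashMap<>();
        for (long sid = 1; sid <= 9; sid++) {
            group.put(sid, (sid - 1) / 3 + 1); // three groups of three servers
            weight.put(sid, 1L);
        }
        Set<Long> acks = new HashSet<>(Arrays.asList(1L, 2L, 4L, 5L, 7L));
        System.out.println(containsQuorum(group, weight, acks));
    }
}
```

In the `main` example, server 7 represents its group with only a minority of that group's weight. Under the rule the issue calls incorrect, that would have prevented a quorum even though two of the three groups are won.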
[jira] Commented: (ZOOKEEPER-466) crash on zookeeper_close() when using auth with empty cert
[ https://issues.apache.org/jira/browse/ZOOKEEPER-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738965#action_12738965 ]

Hudson commented on ZOOKEEPER-466:
-----------------------------------

Integrated in ZooKeeper-trunk #404 (See [http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/404/])
crash on zookeeper_close() when using auth with empty cert

> crash on zookeeper_close() when using auth with empty cert
> -----------------------------------------------------------
>
> Key: ZOOKEEPER-466
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-466
> Project: Zookeeper
> Issue Type: Bug
> Components: c client
> Affects Versions: 3.2.0
> Reporter: Chris Darroch
> Assignee: Chris Darroch
> Fix For: 3.2.1, 3.3.0
>
> Attachments: ZOOKEEPER-466.patch
>
>
> The free_auth_info() function calls deallocate_Buffer(&auth->auth) on every
> element in the auth list; that function frees any memory pointed to by
> auth->auth.buff if that field is non-NULL.
>
> In zoo_add_auth(), when certLen is zero (or cert is NULL), auth.buff is set
> to 0, but then not assigned to authinfo->auth when auth.buff is NULL. The
> result is uninitialized data in auth->auth.buff in free_auth_info(), and
> potential crashes.
>
> The attached patch adds a test which attempts to duplicate this error; it
> works for me but may not work on all systems, as it depends on the
> uninitialized data being non-zero. There's not really a simple way I can see
> to trigger this in the current test framework. The patch also fixes the
> problem, I believe.