[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434923#comment-16434923 ]

ASF GitHub Bot commented on ZOOKEEPER-2959:
-------------------------------------------

Github user anmolnar commented on the issue:

    https://github.com/apache/zookeeper/pull/500

    Given that this change affects leader election I think it'd be very beneficial if @fpj could take a look by any chance.

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch
> ------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2959
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.10, 3.5.3
>            Reporter: xiangyq000
>            Assignee: Bogdan Kanivets
>            Priority: Blocker
>
> Once the ZooKeeper cluster finishes the election for a new leader, all learners report their accepted epoch to the leader for the computation of the new cluster epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
>     private final HashSet<Long> connectingFollowers = new HashSet<Long>();
>     public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
>         synchronized(connectingFollowers) {
>             if (!waitingForNewEpoch) {
>                 return epoch;
>             }
>             if (lastAcceptedEpoch >= epoch) {
>                 epoch = lastAcceptedEpoch+1;
>             }
>             // sid is added without checking whether it belongs to a participant
>             connectingFollowers.add(sid);
>             QuorumVerifier verifier = self.getQuorumVerifier();
>             if (connectingFollowers.contains(self.getId()) &&
>                     verifier.containsQuorum(connectingFollowers)) {
>                 waitingForNewEpoch = false;
>                 self.setAcceptedEpoch(epoch);
>                 connectingFollowers.notifyAll();
>             } else {
>                 // wait up to initLimit ticks for a quorum to connect
>                 long start = Time.currentElapsedTime();
>                 long cur = start;
>                 long end = start + self.getInitLimit()*self.getTickTime();
>                 while(waitingForNewEpoch && cur < end) {
>                     connectingFollowers.wait(end - cur);
>                     cur = Time.currentElapsedTime();
>                 }
>                 if (waitingForNewEpoch) {
>                     throw new InterruptedException("Timeout while waiting for epoch from quorum");
>                 }
>             }
>             return epoch;
>         }
>     }
> {code}
> The computation will get an outcome once:
> # The leader has called the method "getEpochToPropose"
> # The number of all reporters is greater than half of the participants.
> The problem is that an observer server will also send its accepted epoch to the leader, while this procedure treats observers as participants.
> Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, and that the leader and the observer have reported their accepted epochs while neither of the followers has. The connectingFollowers set then consists of two elements, so its size is 2, which is greater than half of the 3 participants (1.5). QuorumVerifier#containsQuorum will then return true, because it does not check whether the elements of its parameter are participants.
> The same flaw exists in org.apache.zookeeper.server.quorum.Leader#waitForEpochAck

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
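The scenario in the description can be reproduced without a running ensemble. The following is a minimal sketch, not ZooKeeper code: `ObserverQuorumSketch`, `containsQuorum` (a size-only majority check, assumed here to mirror the behaviour the reporter describes) and `containsParticipantQuorum` (a hypothetical guard that discards non-participant ids before counting) are illustrative names, and the actual change in the pull request may be structured differently.

```java
import java.util.HashSet;
import java.util.Set;

public class ObserverQuorumSketch {

    /** Size-only majority check: every id in the set is counted (assumed behaviour). */
    static boolean containsQuorum(Set<Long> acks, int participantCount) {
        return acks.size() > participantCount / 2;
    }

    /** Hypothetical guard: ids that are not voting participants are dropped first. */
    static boolean containsParticipantQuorum(Set<Long> acks, Set<Long> participants) {
        Set<Long> voting = new HashSet<>(acks);
        voting.retainAll(participants);
        return voting.size() > participants.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> participants = new HashSet<>();
        participants.add(1L); // leader
        participants.add(2L); // follower
        participants.add(3L); // follower
        long observerId = 4L; // not a participant

        // Only the leader and the observer have reported so far.
        Set<Long> connecting = new HashSet<>();
        connecting.add(1L);
        connecting.add(observerId);

        System.out.println(containsQuorum(connecting, participants.size()));     // true
        System.out.println(containsParticipantQuorum(connecting, participants)); // false
    }
}
```

With ids 1 to 3 as participants and id 4 as the observer, the size-only check accepts the set {1, 4}, while the filtered check keeps waiting for a follower.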
[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434910#comment-16434910 ]

ASF GitHub Bot commented on ZOOKEEPER-2959:
-------------------------------------------

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/500#discussion_r180793047

    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
    @@ -245,6 +245,180 @@ public void testLastAcceptedEpoch() throws Exception {
                 recursiveDelete(tmpDir);
             }
         }
    +
    +    @Test
    +    public void testGetEpochToProposeWithObserver() throws Exception {
    +        File tmpDir = File.createTempFile("test", "dir", testData);
    +        tmpDir.delete();
    +        tmpDir.mkdir();
    +        Leader leader = null;
    +        try {
    +            QuorumPeer peer = createQuorumPeer(tmpDir);
    +            long participantId = 1;
    +            long observerId = peer.quorumPeers.size();
    +            peer.quorumPeers.put(observerId, new QuorumServer(observerId, "0.0.0.0", 33225,
    --- End diff --

    I think to be consistent with `createQuorumPeer()` method this should be something like:

    ```
    peers.put(observerId, new QuorumServer(observerId,
            new InetSocketAddress("127.0.0.1", PortAssignment.unique()),
            new InetSocketAddress("127.0.0.1", PortAssignment.unique()),
            new InetSocketAddress("127.0.0.1", PortAssignment.unique()),
            QuorumPeer.LearnerType.OBSERVER));
    ```
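One reason to avoid a fixed port such as 33225 in a test is that another test, or another build on the same host, may already hold it. As a rough illustration only, and not a description of how `PortAssignment.unique()` works, the hypothetical helper `findFreePort` below asks the operating system for a currently free port by binding to port 0.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

public class FreePortSketch {

    /** Hypothetical helper: bind to port 0 so the OS picks a free port, then release it. */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The port is free at the time of the call; another process could still
        // claim it before the test binds it again.
        System.out.println("free port: " + findFreePort());
    }
}
```

The returned port is only known to have been free at the moment of the call, so this approach narrows the window for "Address already in use" errors without closing it.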
[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434911#comment-16434911 ]

ASF GitHub Bot commented on ZOOKEEPER-2959:
-------------------------------------------

Github user anmolnar commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/500#discussion_r180789703

    --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
    @@ -245,6 +245,180 @@ public void testLastAcceptedEpoch() throws Exception {
                 recursiveDelete(tmpDir);
             }
         }
    +
    +    @Test
    +    public void testGetEpochToProposeWithObserver() throws Exception {
    +        File tmpDir = File.createTempFile("test", "dir", testData);
    --- End diff --

    Have you considered using ClientBase.createEmptyTestDir() instead?
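For comparison, the create, delete, mkdir sequence in the diff can be collapsed into a single call with the JDK's `java.nio.file.Files.createTempDirectory` (Java 7 and later). This is a sketch with a hypothetical helper name, `createEmptyDir`; it is not the implementation of `ClientBase.createEmptyTestDir()`, which may differ, for example in where it places the directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

public class TempDirSketch {

    /** Hypothetical helper: create a fresh, empty directory under the default temp location. */
    static File createEmptyDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix).toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = createEmptyDir("test");
        System.out.println(dir.isDirectory()); // true
        dir.delete(); // clean up the empty directory
    }
}
```

Creating the directory in one step avoids the short gap between `delete()` and `mkdir()` in which another process could create a file with the same name.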
ZooKeeper-trunk - Build # 3796 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk/3796/

###################################################################
################## LAST 60 LINES OF THE CONSOLE ###################
[...truncated 259.92 KB...]
     [exec] Zookeeper_simpleSystem::testNullData : elapsed 1038 : OK
     [exec] Zookeeper_simpleSystem::testIPV6 : elapsed 1025 : OK
     [exec] Zookeeper_simpleSystem::testCreate : elapsed 2314 : OK
     [exec] Zookeeper_simpleSystem::testPath : elapsed 4130 : OK
     [exec] Zookeeper_simpleSystem::testPathValidation : elapsed 1185 : OK
     [exec] Zookeeper_simpleSystem::testPing : elapsed 18242 : OK
     [exec] Zookeeper_simpleSystem::testAcl : elapsed 1291 : OK
     [exec] Zookeeper_simpleSystem::testChroot : elapsed 3098 : OK
     [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started ZooKeeper server process failed ZooKeeper server NOT started : assertion : elapsed 42157
     [exec] Zookeeper_simpleSystem::testHangingClient : elapsed 1001 : OK
     [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal : assertion : elapsed 1001
     [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithLocal : assertion : elapsed 1000
     [exec] Zookeeper_simpleSystem::testGetChildren2 : assertion : elapsed 1000
     [exec] Zookeeper_simpleSystem::testLastZxid : assertion : elapsed 2005
     [exec] Zookeeper_simpleSystem::testRemoveWatchers : assertion : elapsed 1001
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/zkServer.sh: line 62: kill: (13001) - No such process
     [exec] Zookeeper_readOnly::testReadOnly : elapsed 4047 : OK
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/TestClient.cc:679: Assertion: assertion failed [Expression: ctx5.waitForConnected(zk)]
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/TestClient.cc:1243: Assertion: equality assertion failed [Expected: 0, Actual : -4]
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/TestClient.cc:1262: Assertion: equality assertion failed [Expected: 0, Actual : -4]
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/TestClient.cc:716: Assertion: equality assertion failed [Expected: 0, Actual : -4]
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/TestClient.cc:1291: Assertion: equality assertion failed [Expected: 0, Actual : -4]
     [exec] /home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/src/c/tests/TestClient.cc:1336: Assertion: equality assertion failed [Expected: 0, Actual : -4]
     [exec] Failures !!!
     [exec] Run: 74 Failure total: 6 Failures: 6 Errors: 0
     [exec] FAIL: zktest-mt
     [exec] ==========================================
     [exec] 1 of 2 tests failed
     [exec] Please report to u...@zookeeper.apache.org
     [exec] ==========================================
     [exec] Makefile:1744: recipe for target 'check-TESTS' failed
     [exec] make[1]: Leaving directory '/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit'
     [exec] Makefile:2000: recipe for target 'check-am' failed
     [exec] make[1]: *** [check-TESTS] Error 1
     [exec] make: *** [check-am] Error 2

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1395: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1355: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1365: exec returned: 2

Total time: 19 minutes 53 seconds
Build step 'Execute shell' marked build as failure
[FINDBUGS] Skipping publisher since build result is FAILURE
[WARNINGS] Skipping publisher since build result is FAILURE
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording fingerprints
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Publishing Javadoc
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7

###################################################################
###################### FAILED TESTS (if any) ######################
All tests passed
Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1
Hi,

The PR for ZK-2959 already has a +1. Can the PR be merged?

Thanks
ZooKeeper_branch34_openjdk7 - Build # 1877 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1877/

###################################################################
################## LAST 60 LINES OF THE CONSOLE ###################
[...truncated 39.58 KB...]
    [junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
    [junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.544 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.505 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.533 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.484 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthFailTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.587 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.483 sec
    [junit] Running org.apache.zookeeper.test.SaslClientTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.083 sec
    [junit] Running org.apache.zookeeper.test.SessionInvalidationTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.47 sec
    [junit] Running org.apache.zookeeper.test.SessionTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 32.85 sec
    [junit] Running org.apache.zookeeper.test.StandaloneTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.872 sec
    [junit] Running org.apache.zookeeper.test.StatTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.597 sec
    [junit] Running org.apache.zookeeper.test.StaticHostProviderTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.316 sec
    [junit] Running org.apache.zookeeper.test.SyncCallTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.539 sec
    [junit] Running org.apache.zookeeper.test.TruncateTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.991 sec
    [junit] Running org.apache.zookeeper.test.UpgradeTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.258 sec
    [junit] Running org.apache.zookeeper.test.WatchedEventTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.092 sec
    [junit] Running org.apache.zookeeper.test.WatcherFuncTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.78 sec
    [junit] Running org.apache.zookeeper.test.WatcherTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.051 sec
    [junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.245 sec
    [junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.586 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1382: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1385: Tests failed!

Total time: 28 minutes 13 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Recording test results
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/

###################################################################
###################### FAILED TESTS (if any) ######################
8 tests failed.
FAILED: org.apache.zookeeper.server.ZxidRolloverTest.testMultipleRollover

Error Message:
java.net.BindException: Address already in use

Stack Trace:
java.lang.RuntimeException: java.net.BindException: Address already in use
    at org.apache.zookeeper.test.QuorumUtil.<init>(QuorumUtil.java:116)
    at org.apache.zookeeper.test.QuorumUtil.<init>(QuorumUtil.java:121)
    at org.apache.zookeeper.server.ZxidRolloverTest.setUp(ZxidRolloverTest.java:63)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)