[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306483#comment-16306483 ]

Hadoop QA commented on ZOOKEEPER-1621:
--

-1 overall.  GitHub Pull Request Build
    +1 @author.  The patch does not contain any @author tags.
    +1 tests included.  The patch appears to include 3 new or modified tests.
    +1 javadoc.  The javadoc tool did not generate any warning messages.
    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit.  The applied patch does not increase the total number of release audit warnings.
    -1 core tests.  The patch failed core unit tests.
    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//console

This message is automatically generated.

> ZooKeeper does not recover from crash when disk was full
>
> Key: ZOOKEEPER-1621
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1621
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.3
> Environment: Ubuntu 12.04, Amazon EC2 instance
> Reporter: David Arthur
> Assignee: Michi Mutsuzaki
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-1621.2.patch, ZOOKEEPER-1621.patch, zookeeper.log.gz
>
> The disk that ZooKeeper was using filled up. During a snapshot write, I got
> the following exception:
> 2013-01-16 03:11:14,098 - ERROR [SyncThread:0:SyncRequestProcessor@151] - Severe unrecoverable error, exiting
> java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:282)
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:309)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:306)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
>     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:162)
>     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> Then many subsequent exceptions like:
> 2013-01-16 15:02:23,984 - ERROR [main:Util@239] - Last transaction was partial.
> 2013-01-16 15:02:23,985 - ERROR [main:ZooKeeperServerMain@63] - Unexpected exception, exiting abnormally
> java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:64)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:558)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:577)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:543)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:625)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:529)
>     at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:504)
>     at org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:341)
>     at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:130)
>     at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
>     at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:259)
>     at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:386)
>     at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:138)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:112)
>     at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:86)
>     at
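The second trace fails inside FileHeader.deserialize, that is, while reading the header of a txn log that was cut short when the disk filled up. The Java sketch below (not ZooKeeper code) reproduces that failure mode with plain java.io calls. It assumes the header is an int magic, an int version and a long dbid, read in that order; the MAGIC and VERSION values and the class name TruncatedHeaderDemo are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo: a txn log header that was only partially written.
// Assumed layout: int magic, int version, long dbid (16 bytes in total).
class TruncatedHeaderDemo {
    static final int MAGIC = 0x5A4B4C47; // "ZKLG" as a big-endian int (assumed value)
    static final int VERSION = 2;        // assumed format version

    // Serializes a complete 16-byte header.
    static byte[] writeHeader(long dbid) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);
        out.writeInt(VERSION);
        out.writeLong(dbid);
        out.flush();
        return bytes.toByteArray();
    }

    // Reads the header back. DataInputStream.readInt/readLong throw
    // EOFException when the data ends before the field is complete.
    static long readDbid(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int magic = in.readInt();
        int version = in.readInt();
        if (magic != MAGIC || version != VERSION) {
            throw new IOException("unexpected header");
        }
        return in.readLong();
    }

    // True if reading the (possibly truncated) header ends in EOFException.
    static boolean failsWithEof(byte[] data) throws IOException {
        try {
            readDbid(data);
            return false;
        } catch (EOFException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] full = writeHeader(0L);
        // Simulate the disk filling up after only 2 of the 16 header bytes.
        byte[] partial = Arrays.copyOf(full, 2);
        System.out.println("complete header, dbid = " + readDbid(full));
        System.out.println("partial header ends in EOFException: " + failsWithEof(partial));
    }
}
```

Any truncation point inside the header, including an empty file, leads to the same EOFException type that aborts startup in the log above.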
Failed: ZOOKEEPER- PreCommit Build #1392
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 78.91 MB...]
     [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
     [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] -1 core tests.  The patch failed core unit tests.
     [exec] +1 contrib tests.  The patch passed contrib unit tests.
     [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//testReport/
     [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
     [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1392//console
     [exec] This message is automatically generated.
     [exec] ==
     [exec] ==
     [exec] Adding comment to Jira.
     [exec] ==
     [exec] ==
     [exec] Comment added.
     [exec] 08fad879f3b64947d84a45928f5731150566b0a1 logged out
     [exec] ==
     [exec] ==
     [exec] Finished build.
     [exec] ==
     [exec] ==
     [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722: exec returned: 1

Total time: 19 minutes 0 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-1621
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7

### ## FAILED TESTS (if any) ##
1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig

Error Message:
waiting for server 3 being up

Stack Trace:
junit.framework.AssertionFailedError: waiting for server 3 being up
    at org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig(ReconfigRecoveryTest.java:224)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
[jira] [Commented] (ZOOKEEPER-1621) ZooKeeper does not recover from crash when disk was full
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306470#comment-16306470 ]

ASF GitHub Bot commented on ZOOKEEPER-1621:
---

GitHub user abhishekrai opened a pull request:

    https://github.com/apache/zookeeper/pull/439

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.

    When the ZooKeeper server encounters a txn log with an incomplete header, the old behavior was to crash due to the resulting EOFException. The new behavior is to catch the exception and skip the txn log. Additionally, the txn log is deleted so that it does not mislead future loads or PurgeTxnLog into believing that it is the only txn log before the following snapshot that they need to load or retain.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/abhishekrai/zookeeper ZOOKEEPER-1621

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #439

commit 6b457a069ccdb01e1ee77537b02db80f3005f5b1
Author: Abhishek Rai
Date:   2017-12-29T17:38:52Z

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.

    When the ZooKeeper server encounters a txn log with an incomplete header, the old behavior was to crash due to the resulting EOFException. The new behavior is to catch the exception and skip the txn log. Additionally, the txn log is deleted so that it does not mislead future loads or PurgeTxnLog into believing that it is the only txn log before the following snapshot that they need to load or retain.
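PR #439 describes the fix as catching the EOFException, skipping the txn log and deleting it. A minimal sketch of that policy follows, assuming a log counts as incomplete exactly when its 16-byte header (int, int, long) cannot be read in full. SkipIncompleteLogs and readableLogs are hypothetical names, not the patch's actual code, and the sketch does not validate the magic or version fields.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "delete and skip txn log with incomplete header".
class SkipIncompleteLogs {
    static final int HEADER_BYTES = 16; // assumed: int magic + int version + long dbid

    // Returns the logs whose header could be read completely. Logs whose
    // header ends early are deleted so that later loads or purges do not
    // mistake them for the log that precedes the next snapshot.
    static List<Path> readableLogs(List<Path> logs) throws IOException {
        List<Path> kept = new ArrayList<>();
        for (Path log : logs) {
            try (DataInputStream in = new DataInputStream(Files.newInputStream(log))) {
                in.readInt();  // magic
                in.readInt();  // version
                in.readLong(); // dbid
                kept.add(log);
            } catch (EOFException e) {
                // The stream is already closed here (try-with-resources),
                // so the file can be deleted safely.
                Files.delete(log);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("txnlogs");
        Path complete = Files.write(dir.resolve("log.1"), new byte[HEADER_BYTES]);
        Path truncated = Files.write(dir.resolve("log.2"), new byte[3]);
        List<Path> kept = readableLogs(Arrays.asList(complete, truncated));
        System.out.println("kept: " + kept);
        System.out.println("truncated log still exists: " + Files.exists(truncated));
    }
}
```

Returning the surviving logs instead of throwing lets the caller continue replaying from the remaining files, which is the behavior change the PR describes.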
[GitHub] zookeeper pull request #439: ZOOKEEPER-1621: Delete and skip txn log with in...
GitHub user abhishekrai opened a pull request:

    https://github.com/apache/zookeeper/pull/439

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.

    When the ZooKeeper server encounters a txn log with an incomplete header, the old behavior was to crash due to the resulting EOFException. The new behavior is to catch the exception and skip the txn log. Additionally, the txn log is deleted so that it does not mislead future loads or PurgeTxnLog into believing that it is the only txn log before the following snapshot that they need to load or retain.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/abhishekrai/zookeeper ZOOKEEPER-1621

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/439.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #439

commit 6b457a069ccdb01e1ee77537b02db80f3005f5b1
Author: Abhishek Rai
Date:   2017-12-29T17:38:52Z

    ZOOKEEPER-1621: Delete and skip txn log with incomplete header

    Based on the patch by Michi Mutsuzaki.

    When the ZooKeeper server encounters a txn log with an incomplete header, the old behavior was to crash due to the resulting EOFException. The new behavior is to catch the exception and skip the txn log. Additionally, the txn log is deleted so that it does not mislead future loads or PurgeTxnLog into believing that it is the only txn log before the following snapshot that they need to load or retain.

---
[jira] [Commented] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306451#comment-16306451 ]

ASF GitHub Bot commented on ZOOKEEPER-2901:
---

Github user Randgalt commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/377#discussion_r159091495

    --- Diff: src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml ---
    @@ -949,14 +949,15 @@ server.3=zoo3:2888:3888
    -ttlNodesEnabled
    +zookeeper.extendedTypesEnabled
    --- End diff --

    I only thought that we might be able to use the setting for other things in the future. But, I'm OK either way. Let me know.

> Session ID that is negative causes mis-calculation of Ephemeral Type
>
> Key: ZOOKEEPER-2901
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2901
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.3
> Environment: Running 3.5.3-beta in Docker container
> Reporter: Mark Johnson
> Assignee: Jordan Zimmerman
> Priority: Blocker
>
> In the code that determines the EphemeralType it is looking at the owner
> (which is the client ID or connection ID):
> EphemeralType.java:
>    public static EphemeralType get(long ephemeralOwner) {
>        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
>            return CONTAINER;
>        }
>        if (ephemeralOwner < 0) {
>            return TTL;
>        }
>        return (ephemeralOwner == 0) ? VOID : NORMAL;
>    }
> However my connection ID is:
> header.getClientId(): -720548323429908480
> This causes the code to think this is a TTL Ephemeral node instead of a
> NORMAL Ephemeral node.
> This also explains why this is random - if my client ID is non-negative
> then the node gets added correctly.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
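To make the reported misclassification concrete, the sketch below copies the branch order of the quoted EphemeralType.get into a standalone class. The enum and the CONTAINER_EPHEMERAL_OWNER value (assumed here to be Long.MIN_VALUE) are stand-ins, not ZooKeeper's own definitions.

```java
// Hypothetical standalone copy of the classification quoted in ZOOKEEPER-2901.
class EphemeralTypeDemo {
    enum EphemeralType { VOID, NORMAL, CONTAINER, TTL }

    // Assumed sentinel for container nodes.
    static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;

    // Same branch order as the quoted EphemeralType.get.
    static EphemeralType get(long ephemeralOwner) {
        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
            return EphemeralType.CONTAINER;
        }
        if (ephemeralOwner < 0) {
            // Every other negative owner lands here, including a real session ID.
            return EphemeralType.TTL;
        }
        return (ephemeralOwner == 0) ? EphemeralType.VOID : EphemeralType.NORMAL;
    }

    public static void main(String[] args) {
        long sessionId = -720548323429908480L;        // client ID from the report
        System.out.println(get(sessionId));           // prints TTL
        System.out.println(get(720548323429908480L)); // prints NORMAL
    }
}
```

Because only the sign of the owner decides between TTL and NORMAL, a given ephemeral node is misclassified whenever its session happens to have a negative ID, which fits the "random" behavior described in the report.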
[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...
Github user Randgalt commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/377#discussion_r159091495

    --- Diff: src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml ---
    @@ -949,14 +949,15 @@ server.3=zoo3:2888:3888
    -ttlNodesEnabled
    +zookeeper.extendedTypesEnabled
    --- End diff --

    I only thought that we might be able to use the setting for other things in the future. But, I'm OK either way. Let me know.

---
[jira] [Commented] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306450#comment-16306450 ]

ASF GitHub Bot commented on ZOOKEEPER-2901:
---

Github user Randgalt commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/377#discussion_r159091284

    --- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
    @@ -476,9 +474,12 @@ public ZooKeeperServerListener getZooKeeperServerListener() {
     return listener;
     }

    +// Visible for testing
    +static volatile int serverId = 1;
    --- End diff --

    @phunt I'm not sure what you mean about quorum peer. I could find another way to set this value. Also, it's only for testing and clearly marked. What do you suggest?
[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...
Github user Randgalt commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/377#discussion_r159091284

    --- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java ---
    @@ -476,9 +474,12 @@ public ZooKeeperServerListener getZooKeeperServerListener() {
     return listener;
     }

    +// Visible for testing
    +static volatile int serverId = 1;
    --- End diff --

    @phunt I'm not sure what you mean about quorum peer. I could find another way to set this value. Also, it's only for testing and clearly marked. What do you suggest?

---
Failed: ZOOKEEPER- PreCommit Build #1391
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 78.61 MB...]
     [exec] +0 tests included.  The patch appears to be a documentation patch that doesn't require tests.
     [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
     [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] -1 core tests.  The patch failed core unit tests.
     [exec] +1 contrib tests.  The patch passed contrib unit tests.
     [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//testReport/
     [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
     [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//console
     [exec] This message is automatically generated.
     [exec] ==
     [exec] ==
     [exec] Adding comment to Jira.
     [exec] ==
     [exec] ==
     [exec] Comment added.
     [exec] 97ab23202f713250cf36a5ddd3a7d6f87839b26b logged out
     [exec] ==
     [exec] ==
     [exec] Finished build.
     [exec] ==
     [exec] ==
     [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722: exec returned: 1

Total time: 17 minutes 52 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2959
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7

### ## FAILED TESTS (if any) ##
1 tests failed.
FAILED:  org.apache.zookeeper.test.AsyncHammerTest.testHammer

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
    at org.apache.zookeeper.test.AsyncHammerTest.testHammer(AsyncHammerTest.java:185)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306154#comment-16306154 ]

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request Build
    +1 @author.  The patch does not contain any @author tags.
    +0 tests included.  The patch appears to be a documentation patch that doesn't require tests.
    +1 javadoc.  The javadoc tool did not generate any warning messages.
    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
    +1 release audit.  The applied patch does not increase the total number of release audit warnings.
    -1 core tests.  The patch failed core unit tests.
    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//console

This message is automatically generated.

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected
> leader computes new epoch
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10, 3.5.3
> Reporter: xiangyq000
>
> Once the ZooKeeper cluster finishes the election for a new leader, all learners
> report their accepted epoch to the leader for the computation of the new cluster
> epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet<Long> connectingFollowers = new HashSet<Long>();
> public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
>     synchronized(connectingFollowers) {
>         if (!waitingForNewEpoch) {
>             return epoch;
>         }
>         if (lastAcceptedEpoch >= epoch) {
>             epoch = lastAcceptedEpoch+1;
>         }
>         connectingFollowers.add(sid);
>         QuorumVerifier verifier = self.getQuorumVerifier();
>         if (connectingFollowers.contains(self.getId()) &&
>                 verifier.containsQuorum(connectingFollowers)) {
>             waitingForNewEpoch = false;
>             self.setAcceptedEpoch(epoch);
>             connectingFollowers.notifyAll();
>         } else {
>             long start = Time.currentElapsedTime();
>             long cur = start;
>             long end = start + self.getInitLimit()*self.getTickTime();
>             while(waitingForNewEpoch && cur < end) {
>                 connectingFollowers.wait(end - cur);
>                 cur = Time.currentElapsedTime();
>             }
>             if (waitingForNewEpoch) {
>                 throw new InterruptedException("Timeout while waiting for epoch from quorum");
>             }
>         }
>         return epoch;
>     }
> }
> {code}
> The computation will get an outcome once:
> # The leader has called method "getEpochToPropose"
> # The number of all reporters is greater than half of the participants.
> The problem is that an observer server will also send its accepted epoch to the
> leader, while this procedure treats observers as participants.
> Suppose that the cluster consists of 1 leader, 2 followers and 1 observer,
> and now the leader and the observer have reported their accepted epochs while
> neither of the followers has. Thus, the connectingFollowers set consists of
> two elements, resulting in a size of 2, which is greater than half of the 3
> participants and therefore meets the quorum size of 2. Then
> QuorumVerifier#containsQuorum will return true, because it does not check
> whether the elements of the parameter are participants.
> The same flaw exists in
> org.apache.zookeeper.server.quorum.Leader#waitForEpochAck

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
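The scenario in the description (1 leader, 2 followers, 1 observer) can be checked with a simplified model of a majority verifier. The sketch assumes a majority rule in which a set is a quorum when its size exceeds half the number of voting members, and, as the report describes for QuorumVerifier#containsQuorum, it never checks whether the set's elements are voters. ObserverQuorumDemo is a hypothetical name; the leader-presence test from getEpochToPropose is omitted because the leader is in the set in this scenario anyway.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a majority check that counts every reporter.
class ObserverQuorumDemo {
    // Quorum if the set is larger than half the voting members.
    // Membership is NOT checked, so observers are counted too.
    static boolean containsQuorum(Set<Long> votingMembers, Set<Long> ackSet) {
        int half = votingMembers.size() / 2; // 3 voters -> half = 1
        return ackSet.size() > half;
    }

    public static void main(String[] args) {
        // Server 1 is the leader, 2 and 3 are followers, 4 is an observer.
        Set<Long> voters = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        // Only the leader and the observer have reported their epochs.
        Set<Long> connectingFollowers = new HashSet<>(Arrays.asList(1L, 4L));
        // Prints true: 2 > 1, although only one voter (the leader) has reported.
        System.out.println(containsQuorum(voters, connectingFollowers));
    }
}
```

The leader would therefore stop waiting and fix the new epoch before any follower has reported, which is the inconsistency the issue describes.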
[jira] [Updated] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiangyq000 updated ZOOKEEPER-2959:
--
    Summary: ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch  (was: ignore epoch proposal and ack from observers when a newly elected leader computes new epoch)
[jira] [Commented] (ZOOKEEPER-2959) ignore epoch proposal and ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306144#comment-16306144 ] ASF GitHub Bot commented on ZOOKEEPER-2959: --- GitHub user xyq000 opened a pull request: https://github.com/apache/zookeeper/pull/438 ZOOKEEPER-2959: ignore accepted epoch and ack from observers https://issues.apache.org/jira/browse/ZOOKEEPER-2959 After a round of elections completes, followers and observers send their accepted epochs to the leader to determine a final epoch. Since `QuorumVerifier#containsQuorum(Set set)` does not check whether the elements of argument `set` exactly represent participants, this pull request is intended to ignore reported epochs and acks from observers for logical consistency. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xyq000/zookeeper ZOOKEEPER-2959 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/438.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #438 commit 647061aa7ba1182b83b44b7f2671508012a30b4c Author: Yongqiang XiangDate: 2017-12-29T08:20:06Z ignore accepted epoch and ack from observers > ignore epoch proposal and ack from observers when a newly elected leader > computes new epoch > --- > > Key: ZOOKEEPER-2959 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959 > Project: ZooKeeper > Issue Type: Bug >Affects Versions: 3.4.10, 3.5.3 >Reporter: xiangyq000 > > Once the ZooKeeper cluster finishes the election for new leader, all learners > report their accepted epoch to the leader for the computation of new cluster > epoch. 
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose > {code:java} > private final HashSet connectingFollowers = new HashSet(); > public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws > InterruptedException, IOException { > synchronized(connectingFollowers) { > if (!waitingForNewEpoch) { > return epoch; > } > if (lastAcceptedEpoch >= epoch) { > epoch = lastAcceptedEpoch+1; > } > connectingFollowers.add(sid); > QuorumVerifier verifier = self.getQuorumVerifier(); > if (connectingFollowers.contains(self.getId()) && > > verifier.containsQuorum(connectingFollowers)) { > waitingForNewEpoch = false; > self.setAcceptedEpoch(epoch); > connectingFollowers.notifyAll(); > } else { > long start = Time.currentElapsedTime(); > long cur = start; > long end = start + self.getInitLimit()*self.getTickTime(); > while(waitingForNewEpoch && cur < end) { > connectingFollowers.wait(end - cur); > cur = Time.currentElapsedTime(); > } > if (waitingForNewEpoch) { > throw new InterruptedException("Timeout while waiting for > epoch from quorum"); > } > } > return epoch; > } > } > {code} > The computation will get an outcome once : > # The leader has call method "getEpochToPropose" > # The number of all reporters is greater than half of participants. > The problem is, an observer server will also send its accepted epoch to the > leader, while this procedure treat observers as participants. > Supposed that the cluster consists of 1 leader, 2 followers and 1 observer, > and now the leader and the observer have reported their accepted epochs while > neither of the followers has. Thus, the connectingFollowers set consists of > two elements, resulting in a size of 2, which is greater than half quorum, > namely, 2. Then QuorumVerifier#containsQuorum will return true, because it > does not check whether the elements of the parameter are participants. 
> The same flaw exists in org.apache.zookeeper.server.quorum.Leader#waitForEpochAck. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
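The counting problem described in the issue can be reproduced with a toy majority check. The sketch below is a hypothetical model, not ZooKeeper's `QuorumVerifier` or `QuorumMaj`: the class `QuorumCheckDemo` and both methods are illustrative names, it assumes servers 1 to 3 are the voting participants (1 being the leader) and server 4 is the observer, and it compares a check that counts every reporter with one that counts only voting members. It uses `Set.of`, so it assumes Java 9 or later.

```java
import java.util.Set;

/**
 * Hypothetical, simplified model of a majority quorum check (not ZooKeeper's
 * QuorumMaj). Ids 1..3 are voting participants, 1 is the leader, 4 is an observer.
 */
public class QuorumCheckDemo {

    // Voting participants only; the observer (id 4) is deliberately absent.
    static final Set<Long> VOTERS = Set.of(1L, 2L, 3L);

    // Size-only check: an observer id in the set counts toward the majority.
    static boolean naiveContainsQuorum(Set<Long> reporters) {
        return reporters.size() > VOTERS.size() / 2;
    }

    // Membership-aware check: only ids of voting participants are counted.
    static boolean votingContainsQuorum(Set<Long> reporters) {
        long votes = reporters.stream().filter(VOTERS::contains).count();
        return votes > VOTERS.size() / 2;
    }

    public static void main(String[] args) {
        // The leader (1) and the observer (4) have reported; neither follower has.
        Set<Long> reporters = Set.of(1L, 4L);
        System.out.println("naive:  " + naiveContainsQuorum(reporters));  // naive:  true
        System.out.println("voting: " + votingContainsQuorum(reporters)); // voting: false
    }
}
```

With only the leader and the observer reported, the size-only check already sees a majority (2 > 3 / 2), while the membership-aware check counts a single vote and keeps waiting.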
[GitHub] zookeeper pull request #438: ZOOKEEPER-2959: ignore accepted epoch and ack f...
GitHub user xyq000 opened a pull request: https://github.com/apache/zookeeper/pull/438 ZOOKEEPER-2959: ignore accepted epoch and ack from observers https://issues.apache.org/jira/browse/ZOOKEEPER-2959 After a round of elections completes, followers and observers send their accepted epochs to the leader to determine a final epoch. Since `QuorumVerifier#containsQuorum(Set<Long> set)` does not check whether the elements of argument `set` are all participants, this pull request ignores reported epochs and acks from observers for logical consistency. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xyq000/zookeeper ZOOKEEPER-2959 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/438.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #438 commit 647061aa7ba1182b83b44b7f2671508012a30b4c Author: Yongqiang Xiang Date: 2017-12-29T08:20:06Z ignore accepted epoch and ack from observers ---
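A minimal, single-threaded sketch of the epoch computation with the pull request's idea applied follows. It is a hypothetical model, not the merged patch: the class `EpochProposalSketch`, its `report` method and the fixed `voters` set are assumptions for illustration. The wait/notify and `initLimit * tickTime` timeout of `Leader#getEpochToPropose` are left out, and a simple majority stands in for `QuorumVerifier#containsQuorum`. It also assumes that "ignore" means an observer is not counted toward the quorum, while its reported epoch may still raise the proposed epoch as in the original method.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical, single-threaded model of the leader's epoch proposal.
 * Blocking and the initLimit timeout of the real method are omitted.
 */
public class EpochProposalSketch {
    private final long leaderId;
    private final Set<Long> voters; // voting participants, leader included
    private final Set<Long> connectingFollowers = new HashSet<>();
    private long epoch = -1;
    private boolean waitingForNewEpoch = true;

    EpochProposalSketch(long leaderId, Set<Long> voters) {
        this.leaderId = leaderId;
        this.voters = voters;
    }

    /** Records one server's report; returns true once the new epoch is settled. */
    boolean report(long sid, long lastAcceptedEpoch) {
        if (!waitingForNewEpoch) {
            return true; // epoch already decided
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1; // propose one past the highest epoch seen
        }
        if (voters.contains(sid)) {
            connectingFollowers.add(sid); // observers are not counted toward the quorum
        }
        long half = voters.size() / 2;
        if (connectingFollowers.contains(leaderId) && connectingFollowers.size() > half) {
            waitingForNewEpoch = false;
        }
        return !waitingForNewEpoch;
    }

    long epoch() {
        return epoch;
    }

    public static void main(String[] args) {
        // Leader 1, followers 2 and 3, observer 4.
        EpochProposalSketch leader = new EpochProposalSketch(1L, Set.of(1L, 2L, 3L));
        System.out.println(leader.report(1L, 5L)); // false: only the leader so far
        System.out.println(leader.report(4L, 6L)); // false: the observer is not counted
        System.out.println(leader.report(2L, 5L)); // true: leader and follower 2 form a majority
        System.out.println(leader.epoch());        // 7
    }
}
```

Without the `voters.contains(sid)` guard, the observer's report in `main` would bring the set to two members including the leader, and the epoch would be settled before either follower had reported.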
[jira] [Updated] (ZOOKEEPER-2959) ignore epoch proposal and ack from observers when a newly elected leader computes new epoch
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xiangyq000 updated ZOOKEEPER-2959: -- Description:
Once the ZooKeeper cluster finishes electing a new leader, all learners report their accepted epochs to the leader so it can compute the new cluster epoch.
org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
private final HashSet<Long> connectingFollowers = new HashSet<Long>();
public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
    synchronized(connectingFollowers) {
        if (!waitingForNewEpoch) {
            return epoch;
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch+1;
        }
        connectingFollowers.add(sid);
        QuorumVerifier verifier = self.getQuorumVerifier();
        if (connectingFollowers.contains(self.getId()) &&
                verifier.containsQuorum(connectingFollowers)) {
            waitingForNewEpoch = false;
            self.setAcceptedEpoch(epoch);
            connectingFollowers.notifyAll();
        } else {
            long start = Time.currentElapsedTime();
            long cur = start;
            long end = start + self.getInitLimit()*self.getTickTime();
            while(waitingForNewEpoch && cur < end) {
                connectingFollowers.wait(end - cur);
                cur = Time.currentElapsedTime();
            }
            if (waitingForNewEpoch) {
                throw new InterruptedException("Timeout while waiting for epoch from quorum");
            }
        }
        return epoch;
    }
}
{code}
The computation completes once:
# the leader itself has called getEpochToPropose, and
# the number of reporters is greater than half of the participants.
The problem is that an observer also sends its accepted epoch to the leader, and this procedure treats observers as participants.
Suppose the cluster consists of 1 leader, 2 followers and 1 observer, and the leader and the observer have reported their accepted epochs while neither follower has. The connectingFollowers set then contains two elements, and a size of 2 is greater than half of the 3 participants. QuorumVerifier#containsQuorum will therefore return true, because it does not check whether the elements of its argument are participants.
The same flaw exists in org.apache.zookeeper.server.quorum.Leader#waitForEpochAck.
was:
Once the ZooKeeper cluster finishes electing a new leader, all learners report their accepted epochs to the leader so it can compute the new cluster epoch.
org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
{code:java}
private final HashSet<Long> connectingFollowers = new HashSet<Long>();
public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws InterruptedException, IOException {
    synchronized(connectingFollowers) {
        if (!waitingForNewEpoch) {
            return epoch;
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch+1;
        }
        connectingFollowers.add(sid);
        QuorumVerifier verifier = self.getQuorumVerifier();
        if (connectingFollowers.contains(self.getId()) &&
                verifier.containsQuorum(connectingFollowers)) {
            waitingForNewEpoch = false;
            self.setAcceptedEpoch(epoch);
            connectingFollowers.notifyAll();
        } else {
            long start = Time.currentElapsedTime();
            long cur = start;
            long end = start + self.getInitLimit()*self.getTickTime();
            while(waitingForNewEpoch && cur < end) {
                connectingFollowers.wait(end - cur);
                cur = Time.currentElapsedTime();
            }
            if (waitingForNewEpoch) {
                throw new InterruptedException("Timeout while waiting for epoch from quorum");
            }
        }
        return epoch;
    }
}
{code}
The computation completes once:
# the leader itself has called getEpochToPropose, and
# the number of reporters is greater than half of the participants.
The problem is that an observer also sends its accepted epoch to the leader, and this procedure treats observers as participants.
Suppose the cluster consists of 1 leader, 2 followers and 1 observer, and the leader and the observer have reported their accepted epochs while neither follower has.
The connectingFollowers set then contains two elements, and a size of 2 is greater than half of the 3 participants. Then