[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474807#comment-16474807
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1684//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1684//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1684//console

This message is automatically generated.

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.4.10, 3.5.3
> Reporter: xiangyq000
> Assignee: Bogdan Kanivets
> Priority: Blocker
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> Once the ZooKeeper cluster finishes electing a new leader, all learners
> report their accepted epochs to the leader so that it can compute the new
> cluster epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet<Long> connectingFollowers = new HashSet<Long>();
>
> public long getEpochToPropose(long sid, long lastAcceptedEpoch)
>         throws InterruptedException, IOException {
>     synchronized (connectingFollowers) {
>         if (!waitingForNewEpoch) {
>             return epoch;
>         }
>         if (lastAcceptedEpoch >= epoch) {
>             epoch = lastAcceptedEpoch + 1;
>         }
>         // Every reporting sid is added here, observers included.
>         connectingFollowers.add(sid);
>         QuorumVerifier verifier = self.getQuorumVerifier();
>         if (connectingFollowers.contains(self.getId())
>                 && verifier.containsQuorum(connectingFollowers)) {
>             waitingForNewEpoch = false;
>             self.setAcceptedEpoch(epoch);
>             connectingFollowers.notifyAll();
>         } else {
>             // Wait up to initLimit ticks for a quorum to report.
>             long start = Time.currentElapsedTime();
>             long cur = start;
>             long end = start + self.getInitLimit() * self.getTickTime();
>             while (waitingForNewEpoch && cur < end) {
>                 connectingFollowers.wait(end - cur);
>                 cur = Time.currentElapsedTime();
>             }
>             if (waitingForNewEpoch) {
>                 throw new InterruptedException("Timeout while waiting for epoch from quorum");
>             }
>         }
>         return epoch;
>     }
> }
> {code}
> The computation completes once both of the following hold:
> # The leader has called getEpochToPropose.
> # The number of servers that have reported is greater than half the number of participants.
> The problem is that an observer also sends its accepted epoch to the leader,
> and this procedure treats observers as participants.
> Suppose the cluster consists of 1 leader, 2 followers and 1 observer, and the
> leader and the observer have reported their accepted epochs while neither of
> the followers has. The connectingFollowers set then contains two elements,
> and a size of 2 is greater than half of the 3 participants.
> QuorumVerifier#containsQuorum therefore returns true, because it does not
> check whether the elements of its parameter are participants.
> The same flaw exists in
> org.apache.zookeeper.server.quorum.Leader#waitForEpochAck
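
The scenario in the description can be reproduced with a small standalone model. This is only a sketch under stated assumptions: the class and method names (EpochQuorumSketch, sizeOnlyQuorum, participantOnlyQuorum) and the sids are hypothetical, not ZooKeeper APIs, and the second method merely illustrates one way to ignore observers, not necessarily how the committed patch does it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of the quorum check; not ZooKeeper code.
class EpochQuorumSketch {

    // Majority check that looks only at the size of the reporting set,
    // as the description says containsQuorum does.
    static boolean sizeOnlyQuorum(Set<Long> reporters, int participantCount) {
        return reporters.size() > participantCount / 2;
    }

    // Same majority check, but sids that are not voting participants
    // (for example observers) are dropped before counting.
    static boolean participantOnlyQuorum(Set<Long> reporters, Set<Long> participants) {
        Set<Long> voters = new HashSet<>(reporters);
        voters.retainAll(participants);
        return voters.size() > participants.size() / 2;
    }

    public static void main(String[] args) {
        // Assumed sids: 1 = leader, 2 and 3 = followers, 4 = observer.
        Set<Long> participants = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        // Only the leader and the observer have reported so far.
        Set<Long> reporters = new HashSet<>(Arrays.asList(1L, 4L));

        System.out.println(sizeOnlyQuorum(reporters, participants.size())); // true
        System.out.println(participantOnlyQuorum(reporters, participants)); // false
    }
}
```

With only the leader and the observer reporting, the size-only check already claims a quorum, while the participant-aware check keeps waiting until a follower reports.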



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469973#comment-16469973
 ] 

Hudson commented on ZOOKEEPER-2959:
---

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #18 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/18/])
ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers 
(ashraer: rev 088dfdf188663f6bad79b0e87b710737b318537d)
* (edit) src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
* (edit) src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/Leader.java
* (add) src/java/test/org/apache/zookeeper/server/quorum/ZabUtils.java
* (add) src/java/test/org/apache/zookeeper/server/quorum/LeaderWithObserverTest.java




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-09 Thread Bogdan Kanivets (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16469516#comment-16469516
 ] 

Bogdan Kanivets commented on ZOOKEEPER-2959:


I think this is ready to merge. There are 3 PRs for 3.4, 3.5 and master.

Steps to reproduce the bug:

Start with 3 servers. Config:

 
{code:java}
clientPort=2181
leaderServes=yes
server.1=:2888:3888
server.2=:2888:3888
server.3=:2888:3888:observer
{code}
 

On server.2 block follower port from server.1 to server.2:
{code:java}
sudo iptables -A INPUT -s  -p tcp --destination-port 2888 -j DROP
{code}
Start server.1, server.2 and server.3
Wait for server.2 to declare itself a leader and then fail in 
waitForNewLeaderAck

 
{code:java}
2018-04-16 20:56:25,990 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@361] - LEADING - LEADER ELECTION TOOK - 3903
2018-04-16 20:56:27,275 [myid:2] - INFO [LearnerHandler-/:29223:LearnerHandler@329] - Follower sid: 3 : info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@136ca5bc
2018-04-16 20:56:27,281 [myid:2] - INFO [LearnerHandler-/:29223:LearnerHandler@384] - Synchronizing with Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
2018-04-16 20:56:27,281 [myid:2] - INFO [LearnerHandler-/:29223:LearnerHandler@393] - leader and follower are in sync, zxid=0x0
2018-04-16 20:56:27,282 [myid:2] - INFO [LearnerHandler-/:29223:LearnerHandler@458] - Sending DIFF
2018-04-16 20:56:27,291 [myid:2] - INFO [LearnerHandler-/:29223:LearnerHandler@518] - Received NEWLEADER-ACK message from 3
2018-04-16 20:56:47,283 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@502] - Shutting down
2018-04-16 20:56:47,284 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@508] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Waiting for a quorum of followers, only synced with sids: [ 2 ]
    at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:508)
    at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:406)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:859)
{code}

On server.2, check the currentEpoch file: the value of currentEpoch has been
incremented. This is the bug. The epoch is incremented in getEpochToPropose
because server.3, which is an observer, is counted in connectingFollowers.
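
The check on the currentEpoch file can be scripted. The sketch below assumes the file contains a single decimal number as text; the class name EpochFileSketch and the command-line argument are hypothetical, and the file's location depends on the server's data directory configuration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; assumes the epoch file holds one decimal number as text.
class EpochFileSketch {

    // Reads the whole file, trims surrounding whitespace and parses it as a long.
    static long readEpoch(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return Long.parseLong(text);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: EpochFileSketch <path-to-currentEpoch-file>");
            return;
        }
        System.out.println("currentEpoch = " + readEpoch(Paths.get(args[0])));
    }
}
```

Running it before and after the failed leadership attempt on server.2 shows whether the stored value changed.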


[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467905#comment-16467905
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16459482#comment-16459482
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1658//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1658//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1658//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457266#comment-16457266
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1646//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1646//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1646//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447661#comment-16447661
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1619//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1619//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1619//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446678#comment-16446678
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1610//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1610//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1610//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445066#comment-16445066
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1605//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1605//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1605//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445067#comment-16445067
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1606//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1606//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1606//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445063#comment-16445063
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

GitHub user lavacat opened a pull request:

https://github.com/apache/zookeeper/pull/503

ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers

https://issues.apache.org/jira/browse/ZOOKEEPER-2959
- added getVotingMembers check for id in getEpochToPropose and 
waitForEpochAck
- removed unused learnerType param in waitForNewLeaderAck
- unit tests
- refactored common test helpers into ZabUtils

credit: Xiang Yongqiang (https://github.com/xyq000) for original PR and 
reporting the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lavacat/zookeeper branch-3.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/503.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #503


commit d7181d65f66adcfc0fecda2670580e2d2b8ddccb
Author: Bogdan Kanivets 
Date:   2018-04-20T00:02:59Z

ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers

https://issues.apache.org/jira/browse/ZOOKEEPER-2959
- added getVotingMembers check for id in getEpochToPropose and 
waitForEpochAck
- removed unused learnerType param in waitForNewLeaderAck
- unit tests
- refactored common test helpers into ZabUtils

credit: Xiang Yongqiang (https://github.com/xyq000) for original PR and 
reporting the issue
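
As an illustration of the first bullet, the following is a minimal, non-blocking sketch of a voting-member check applied before a server id is counted. It is not the code from the pull request: the class, its fields and the `report` method are hypothetical, the wait/notify and timeout handling of getEpochToPropose is left out, and ignoring the observer's accepted epoch as well as its id is an assumption taken from the issue title.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical simplification; not the Leader class from ZooKeeper.
class EpochProposalSketch {
    private final long leaderSid;
    private final Set<Long> votingMembers;
    private final Set<Long> connectingFollowers = new HashSet<>();
    private long epoch = -1;
    private boolean waitingForNewEpoch = true;

    EpochProposalSketch(long leaderSid, Set<Long> votingMembers) {
        this.leaderSid = leaderSid;
        this.votingMembers = new HashSet<>(votingMembers);
    }

    // Records one report and returns true once the new epoch is settled.
    synchronized boolean report(long sid, long lastAcceptedEpoch) {
        if (!waitingForNewEpoch) {
            return true;
        }
        if (!votingMembers.contains(sid)) {
            return false; // observer: neither its epoch nor its id is counted
        }
        if (lastAcceptedEpoch >= epoch) {
            epoch = lastAcceptedEpoch + 1;
        }
        connectingFollowers.add(sid);
        if (connectingFollowers.contains(leaderSid)
                && connectingFollowers.size() > votingMembers.size() / 2) {
            waitingForNewEpoch = false;
        }
        return !waitingForNewEpoch;
    }

    synchronized long getEpoch() {
        return epoch;
    }

    public static void main(String[] args) {
        EpochProposalSketch s =
                new EpochProposalSketch(1L, new HashSet<>(Arrays.asList(1L, 2L, 3L)));
        System.out.println(s.report(1L, 5L)); // leader alone: not settled
        System.out.println(s.report(4L, 9L)); // observer: ignored
        System.out.println(s.report(2L, 5L)); // leader + follower: settled
        System.out.println(s.getEpoch());
    }
}
```

In this sketch the epoch is settled only after a second voting member reports, regardless of how many observers report in between.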






[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445056#comment-16445056
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r182922147
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -900,9 +902,10 @@ public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws Interrupt
 return epoch;
 }
 }
-
-private HashSet electingFollowers = new HashSet();
-private boolean electionFinished = false;
+// VisibleForTesting
+protected HashSet electingFollowers = new HashSet();
--- End diff --

updated




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444884#comment-16444884
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1604//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1604//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1604//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444826#comment-16444826
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r182889043
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -900,9 +902,10 @@ public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws Interrupt
 return epoch;
 }
 }
-
-private HashSet electingFollowers = new HashSet();
-private boolean electionFinished = false;
+// VisibleForTesting
+protected HashSet electingFollowers = new HashSet();
--- End diff --

Can't use Set, because QuorumVerifier uses HashSet param. 
QuorumVerifier.containsQuorum(HashSet set);

I can refactor it all, but then I'll need to touch QuorumVerifier.java, 
QuorumMaj.java and QuorumHierarchical.java
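
The constraint can be shown in isolation. The sketch below is hypothetical: both methods are made-up stand-ins with a plain majority rule, not the QuorumVerifier interface, and `half` is passed in only to keep the example self-contained.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of how a HashSet parameter constrains callers.
class SetParamSketch {
    // Narrow signature, mirroring the HashSet parameter mentioned above.
    static boolean containsQuorumNarrow(HashSet<Long> acks, int half) {
        return acks.size() > half;
    }

    // Widened signature: accepts any Set implementation.
    static boolean containsQuorumWide(Set<Long> acks, int half) {
        return acks.size() > half;
    }

    public static void main(String[] args) {
        Set<Long> acks = new TreeSet<>();
        acks.add(1L);
        acks.add(2L);
        // containsQuorumNarrow(acks, 1); // does not compile: Set<Long> is not a HashSet<Long>
        System.out.println(containsQuorumNarrow(new HashSet<>(acks), 1)); // needs a copy
        System.out.println(containsQuorumWide(acks, 1));                  // accepts the Set directly
    }
}
```

A field declared as Set would need a copy or a cast at every call to the narrow signature, which is why widening the field alone does not work without also changing the verifier implementations.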




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16444818#comment-16444818
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r182887771
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/ZabUtils.java ---
@@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.quorum;
+
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.ZKDatabase;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.apache.zookeeper.server.quorum.flexible.QuorumMaj;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.net.InetSocketAddress;
+import java.util.HashMap;
+
+public class ZabUtils {
+    public static final int SYNC_LIMIT = 2;
+
+    public static QuorumPeer createQuorumPeer(File tmpDir) throws IOException{
+        QuorumPeer peer = new QuorumPeer();
+        peer.syncLimit = 2;
+        peer.initLimit = 2;
+        peer.tickTime = 2000;
+        peer.quorumPeers = new HashMap();
+        peer.quorumPeers.put(0L, new QuorumPeer.QuorumServer(0, "127.0.0.1", PortAssignment.unique(), 0, null));
+        peer.quorumPeers.put(1L, new QuorumPeer.QuorumServer(1, "127.0.0.1", PortAssignment.unique(), 0, null));
+        peer.quorumPeers.put(2L, new QuorumPeer.QuorumServer(2, "127.0.0.1", PortAssignment.unique(), 0, null));
+        peer.setQuorumVerifier(new QuorumMaj(peer.quorumPeers.size()));
+        peer.setCnxnFactory(new NullServerCnxnFactory());
+        File version2 = new File(tmpDir, "version-2");
+        version2.mkdir();
+        FileOutputStream fos;
+        fos = new FileOutputStream(new File(version2, "currentEpoch"));
+        fos.write("0\n".getBytes());
+        fos.close();
+        fos = new FileOutputStream(new File(version2, "acceptedEpoch"));
+        fos.write("0\n".getBytes());
+        fos.close();
+        return peer;
+    }
+
+    public static Leader createLeader(File tmpDir, QuorumPeer peer)
+            throws IOException, NoSuchFieldException, IllegalAccessException{
+        LeaderZooKeeperServer zk = prepareLeader(tmpDir, peer);
+        return new Leader(peer, zk);
+    }
+
+    public static MockLeader createMockLeader(File tmpDir, QuorumPeer peer)
+            throws IOException, NoSuchFieldException, IllegalAccessException{
+        LeaderZooKeeperServer zk = prepareLeader(tmpDir, peer);
+        return new MockLeader(peer, zk);
+    }
+
+    private static LeaderZooKeeperServer prepareLeader(File tmpDir, QuorumPeer peer)
+            throws IOException, NoSuchFieldException, IllegalAccessException {
+        FileTxnSnapLog logFactory = new FileTxnSnapLog(tmpDir, tmpDir);
+        peer.setTxnFactory(logFactory);
+        Field addrField = peer.getClass().getDeclaredField("myQuorumAddr");
+        addrField.setAccessible(true);
+        addrField.set(peer, new InetSocketAddress(PortAssignment.unique()));
+        ZKDatabase zkDb = new ZKDatabase(logFactory);
+        return new LeaderZooKeeperServer(logFactory, peer, new ZooKeeperServer.BasicDataTreeBuilder(), zkDb);
+    }
+
+    private static final class NullServerCnxnFactory extends ServerCnxnFactory {
+        public void startup(ZooKeeperServer zkServer) throws IOException,
+                InterruptedException {
+        }
+        public void start() {
+        }
+        public void 

[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441297#comment-16441297
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user edwardoliveira commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r182180406
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -161,8 +127,8 @@ public void testLeaderInConnectingFollowers() throws Exception {
 tmpDir.mkdir();
 Leader leader = null;
 try {
-QuorumPeer peer = createQuorumPeer(tmpDir);
-leader = createLeader(tmpDir, peer);
+QuorumPeer peer = ZabUtils.createQuorumPeer(tmpDir);
--- End diff --

Yup, you're right. Sorry about that. :)




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16441069#comment-16441069
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r182128913
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -161,8 +127,8 @@ public void testLeaderInConnectingFollowers() throws Exception {
 tmpDir.mkdir();
 Leader leader = null;
 try {
-QuorumPeer peer = createQuorumPeer(tmpDir);
-leader = createLeader(tmpDir, peer);
+QuorumPeer peer = ZabUtils.createQuorumPeer(tmpDir);
--- End diff --

Agreed, but please don't use an asterisk (*) import. We avoid wildcard imports 
in the ZK project.




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438512#comment-16438512
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181267252
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java 
---
@@ -161,8 +127,8 @@ public void testLeaderInConnectingFollowers() throws 
Exception {
 tmpDir.mkdir();
 Leader leader = null;
 try {
-QuorumPeer peer = createQuorumPeer(tmpDir);
-leader = createLeader(tmpDir, peer);
+QuorumPeer peer = ZabUtils.createQuorumPeer(tmpDir);
--- End diff --

Add `import static org.apache.zookeeper.server.quorum.ZabUtils.*;`, then you can 
simplify the method invocation by using `createQuorumPeer(tmpDir);` instead of 
`ZabUtils.createQuorumPeer(tmpDir);`




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438514#comment-16438514
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181267836
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -868,8 +868,8 @@ synchronized public long startForwarding(LearnerHandler 
handler,
 
 return lastProposed;
 }
-
-private HashSet<Long> connectingFollowers = new HashSet<Long>();
+// VisibleForTesting
+protected HashSet<Long> connectingFollowers = new HashSet<Long>();
--- End diff --

`protected Set<Long> connectingFollowers = new HashSet<>();`






[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438516#comment-16438516
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181267704
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -900,9 +902,10 @@ public long getEpochToPropose(long sid, long 
lastAcceptedEpoch) throws Interrupt
 return epoch;
 }
 }
-
-private HashSet<Long> electingFollowers = new HashSet<Long>();
-private boolean electionFinished = false;
+// VisibleForTesting
+protected HashSet<Long> electingFollowers = new HashSet<Long>();
--- End diff --

`protected Set<Long> electingFollowers = new HashSet<>();`




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438513#comment-16438513
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181268803
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/ZabUtils.java ---
@@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.quorum;
+
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.ZKDatabase;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.apache.zookeeper.server.quorum.flexible.QuorumMaj;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.net.InetSocketAddress;
+import java.util.HashMap;
+
+public class ZabUtils {
+public static final int SYNC_LIMIT = 2;
+
+public static QuorumPeer createQuorumPeer(File tmpDir) throws 
IOException{
+QuorumPeer peer = new QuorumPeer();
+peer.syncLimit = 2;
+peer.initLimit = 2;
+peer.tickTime = 2000;
+peer.quorumPeers = new HashMap();
+peer.quorumPeers.put(0L, new QuorumPeer.QuorumServer(0, 
"127.0.0.1", PortAssignment.unique(), 0, null));
+peer.quorumPeers.put(1L, new QuorumPeer.QuorumServer(1, 
"127.0.0.1", PortAssignment.unique(), 0, null));
+peer.quorumPeers.put(2L, new QuorumPeer.QuorumServer(2, 
"127.0.0.1", PortAssignment.unique(), 0, null));
+peer.setQuorumVerifier(new QuorumMaj(peer.quorumPeers.size()));
+peer.setCnxnFactory(new NullServerCnxnFactory());
+File version2 = new File(tmpDir, "version-2");
+version2.mkdir();
+FileOutputStream fos;
+fos = new FileOutputStream(new File(version2, "currentEpoch"));
--- End diff --

Could join lines 52 and 53 (declare and assign `fos` in a single statement).
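For illustration, a minimal sketch of the joined form. It goes one step further than the comment asks and uses try-with-resources so the stream is closed even if the write fails; that part is an assumption, not something requested in the review. The class `EpochFileSketch` and its methods `writeEpoch` and `roundTrip` are hypothetical names, not part of the patch.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper for illustration; not part of the patch.
class EpochFileSketch {
    // Declares and assigns the stream in one statement; try-with-resources closes it.
    static void writeEpoch(File dir, String name, long epoch) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(new File(dir, name))) {
            fos.write((epoch + "\n").getBytes(StandardCharsets.UTF_8));
        }
    }

    // Writes an epoch file into a fresh temporary directory and reads it back.
    static String roundTrip(String name, long epoch) {
        try {
            File dir = Files.createTempDirectory("version-2").toFile();
            writeEpoch(dir, name, epoch);
            byte[] bytes = Files.readAllBytes(new File(dir, name).toPath());
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("currentEpoch", 0).trim());
    }
}
```

The same helper could serve both the `currentEpoch` and `acceptedEpoch` files, which would also remove the reuse of a single `fos` variable for two streams.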



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438515#comment-16438515
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181564184
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -916,7 +919,9 @@ public void waitForEpochAck(long id, StateSummary ss) 
throws IOException, Interr
 + 
leaderStateSummary.getLastZxid()
 + " (last zxid)");
 }
-electingFollowers.add(id);
+if (self.getVotingView().containsKey(id)) {
--- End diff --

I would suggest encapsulating `self.getVotingView().containsKey(id)` 
in a private method as below, if nothing else for the sake of readability:

```
private boolean isParticipant(long sid) {
    return self.getVotingView().containsKey(sid);
}
```
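
To make the effect of such a guard concrete, here is a self-contained sketch under stated assumptions. It is not ZooKeeper code: `EpochQuorumSketch`, `reportUnfiltered` and `reportFiltered` are hypothetical names, `votingView` stands in for `self.getVotingView()`, and the majority test is the size-only check described in the issue. The real method also requires the leader itself to be in the set, which the sketch leaves out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the leader's epoch bookkeeping; not ZooKeeper code.
class EpochQuorumSketch {
    // Plays the role of self.getVotingView().keySet(): ids of voting members only.
    private final Set<Long> votingView = new HashSet<>();
    // Plays the role of Leader#connectingFollowers.
    private final Set<Long> connectingFollowers = new HashSet<>();

    EpochQuorumSketch(long... votingIds) {
        for (long id : votingIds) {
            votingView.add(id);
        }
    }

    private boolean isParticipant(long sid) {
        return votingView.contains(sid);
    }

    // Size-only majority test, as the issue describes for containsQuorum.
    private boolean containsQuorum() {
        return connectingFollowers.size() > votingView.size() / 2;
    }

    // Behaviour described in the issue: every reporter is counted.
    boolean reportUnfiltered(long sid) {
        connectingFollowers.add(sid);
        return containsQuorum();
    }

    // Guarded variant: reports from non-participants are ignored.
    boolean reportFiltered(long sid) {
        if (isParticipant(sid)) {
            connectingFollowers.add(sid);
        }
        return containsQuorum();
    }

    public static void main(String[] args) {
        // Participants 1 (leader), 2 and 3; server 4 is an observer.
        EpochQuorumSketch buggy = new EpochQuorumSketch(1L, 2L, 3L);
        buggy.reportUnfiltered(1L);
        System.out.println("unfiltered, leader + observer: " + buggy.reportUnfiltered(4L));

        EpochQuorumSketch guarded = new EpochQuorumSketch(1L, 2L, 3L);
        guarded.reportFiltered(1L);
        System.out.println("filtered, leader + observer: " + guarded.reportFiltered(4L));
        System.out.println("filtered, leader + follower: " + guarded.reportFiltered(2L));
    }
}
```

With three participants and one observer, the unfiltered variant reports a quorum after only the leader and the observer have answered, while the guarded variant waits until a second participant reports.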




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438511#comment-16438511
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181564082
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/ZabUtils.java ---
@@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.quorum;
+
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.ZKDatabase;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.apache.zookeeper.server.quorum.flexible.QuorumMaj;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.net.InetSocketAddress;
+import java.util.HashMap;
+
+public class ZabUtils {
+public static final int SYNC_LIMIT = 2;
+
+public static QuorumPeer createQuorumPeer(File tmpDir) throws 
IOException{
+QuorumPeer peer = new QuorumPeer();
+peer.syncLimit = 2;
+peer.initLimit = 2;
+peer.tickTime = 2000;
+peer.quorumPeers = new HashMap();
+peer.quorumPeers.put(0L, new QuorumPeer.QuorumServer(0, 
"127.0.0.1", PortAssignment.unique(), 0, null));
+peer.quorumPeers.put(1L, new QuorumPeer.QuorumServer(1, 
"127.0.0.1", PortAssignment.unique(), 0, null));
+peer.quorumPeers.put(2L, new QuorumPeer.QuorumServer(2, 
"127.0.0.1", PortAssignment.unique(), 0, null));
+peer.setQuorumVerifier(new QuorumMaj(peer.quorumPeers.size()));
+peer.setCnxnFactory(new NullServerCnxnFactory());
+File version2 = new File(tmpDir, "version-2");
+version2.mkdir();
+FileOutputStream fos;
+fos = new FileOutputStream(new File(version2, "currentEpoch"));
+fos.write("0\n".getBytes());
+fos.close();
+fos = new FileOutputStream(new File(version2, "acceptedEpoch"));
+fos.write("0\n".getBytes());
+fos.close();
+return peer;
+}
+
+public static Leader createLeader(File tmpDir, QuorumPeer peer)
+throws IOException, NoSuchFieldException, 
IllegalAccessException{
+LeaderZooKeeperServer zk = prepareLeader(tmpDir, peer);
+return new Leader(peer, zk);
+}
+
+public static MockLeader createMockLeader(File tmpDir, QuorumPeer peer)
+throws IOException, NoSuchFieldException, 
IllegalAccessException{
+LeaderZooKeeperServer zk = prepareLeader(tmpDir, peer);
+return new MockLeader(peer, zk);
+}
+
+private static LeaderZooKeeperServer prepareLeader(File tmpDir, 
QuorumPeer peer)
+throws IOException, NoSuchFieldException, 
IllegalAccessException {
+FileTxnSnapLog logFactory = new FileTxnSnapLog(tmpDir, tmpDir);
+peer.setTxnFactory(logFactory);
+Field addrField = peer.getClass().getDeclaredField("myQuorumAddr");
+addrField.setAccessible(true);
+addrField.set(peer, new 
InetSocketAddress(PortAssignment.unique()));
+ZKDatabase zkDb = new ZKDatabase(logFactory);
+return new LeaderZooKeeperServer(logFactory, peer, new 
ZooKeeperServer.BasicDataTreeBuilder(), zkDb);
+}
+
+private static final class NullServerCnxnFactory extends 
ServerCnxnFactory {
+public void startup(ZooKeeperServer zkServer) throws IOException,
+InterruptedException {
+}
+public void start() {
+}
+public void 

[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438510#comment-16438510
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181266640
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/ZabUtils.java ---
@@ -0,0 +1,140 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.quorum;
+
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.ZKDatabase;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.apache.zookeeper.server.persistence.FileTxnSnapLog;
+import org.apache.zookeeper.server.quorum.flexible.QuorumMaj;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.net.InetSocketAddress;
+import java.util.HashMap;
+
+public class ZabUtils {
+public static final int SYNC_LIMIT = 2;
+
--- End diff --

If this is a helper class that doesn't require instantiation, then add a 
private constructor: this prevents instantiation and, because no subclass 
can call the constructor, makes the class effectively final.
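
A minimal sketch of that pattern, assuming a helper with only static members. `ExampleZabUtils` and `syncTimeoutMillis` are hypothetical names chosen for illustration, and the explicit `final` modifier is optional once the constructor is private.

```java
// Hypothetical helper class for illustration; the names are not from the patch.
final class ExampleZabUtils {
    static final int SYNC_LIMIT = 2;

    // Private constructor: no code outside this class can instantiate or subclass it.
    private ExampleZabUtils() {
        throw new AssertionError("utility class, do not instantiate");
    }

    // Example static helper: sync limit (in ticks) multiplied by the tick time in milliseconds.
    static int syncTimeoutMillis(int tickTime) {
        return SYNC_LIMIT * tickTime;
    }

    public static void main(String[] args) {
        System.out.println(syncTimeoutMillis(2000));
    }
}
```

Throwing from the constructor is a defensive extra: it also rejects instantiation attempts made from inside the class or through reflection.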



[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436644#comment-16436644
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on the issue:

https://github.com/apache/zookeeper/pull/500
  
@anmolnar added ZabUtils


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Assignee: Bogdan Kanivets
>Priority: Blocker
>
> Once the ZooKeeper cluster finishes the election for new leader, all learners 
> report their accepted epoch to the leader for the computation of new cluster 
> epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet connectingFollowers = new HashSet();
> public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws 
> InterruptedException, IOException {
> synchronized(connectingFollowers) {
> if (!waitingForNewEpoch) {
> return epoch;
> }
> if (lastAcceptedEpoch >= epoch) {
> epoch = lastAcceptedEpoch+1;
> }
> connectingFollowers.add(sid);
> QuorumVerifier verifier = self.getQuorumVerifier();
> if (connectingFollowers.contains(self.getId()) &&
> 
> verifier.containsQuorum(connectingFollowers)) {
> waitingForNewEpoch = false;
> self.setAcceptedEpoch(epoch);
> connectingFollowers.notifyAll();
> } else {
> long start = Time.currentElapsedTime();
> long cur = start;
> long end = start + self.getInitLimit()*self.getTickTime();
> while(waitingForNewEpoch && cur < end) {
> connectingFollowers.wait(end - cur);
> cur = Time.currentElapsedTime();
> }
> if (waitingForNewEpoch) {
> throw new InterruptedException("Timeout while waiting for 
> epoch from quorum");
> }
> }
> return epoch;
> }
> }
> {code}
> The computation produces a result once:
> # The leader has called "getEpochToPropose"
> # The number of reporters is greater than half of the participants.
> The problem is that an observer also sends its accepted epoch to the 
> leader, and this procedure treats observers as participants.
> Suppose the cluster consists of 1 leader, 2 followers and 1 observer, 
> and the leader and the observer have reported their accepted epochs while 
> neither of the followers has. The connectingFollowers set then contains 
> two elements, and a size of 2 is greater than half of the 3 participants. 
> QuorumVerifier#containsQuorum therefore returns true, because it 
> does not check whether the elements of its parameter are participants.
> The same flaw exists in 
> org.apache.zookeeper.server.quorum.Leader#waitForEpochAck
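
The scenario described above can be reproduced in isolation with a minimal sketch. It assumes a simple-majority verifier that only compares the size of the reporting set against half the participant count; the class name `MajoritySketch` and the server ids (1 = leader, 2 and 3 = followers, 4 = observer) are hypothetical and are not taken from the ZooKeeper source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a simple-majority quorum verifier (not the ZooKeeper class).
class MajoritySketch {
    // Half of the participant count, using integer division.
    private final int half;

    MajoritySketch(int participantCount) {
        this.half = participantCount / 2;
    }

    // Counts every id in the set, without checking that the id is a participant.
    boolean containsQuorum(Set<Long> ids) {
        return ids.size() > half;
    }

    public static void main(String[] args) {
        // Assumed cluster: 3 participants (ids 1, 2, 3) and one observer (id 4).
        MajoritySketch verifier = new MajoritySketch(3);
        // Only the leader (1) and the observer (4) have reported so far.
        Set<Long> connecting = new HashSet<>(Arrays.asList(1L, 4L));
        System.out.println("quorum reached: " + verifier.containsQuorum(connecting));
    }
}
```

With these assumed ids the check succeeds even though only one participant, the leader itself, has reported.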



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435177#comment-16435177
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/500
  
@lavacat I think either moving these methods/classes to a base class or 
creating a separate `ZabUtils` makes sense in this case to get cleaner code.




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435154#comment-16435154
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on the issue:

https://github.com/apache/zookeeper/pull/500
  
Moved these 3 new tests into a new class, LeaderWithObserverTest. Had to 
make createLeader and createQuorumPeer 'public static' in Zab1_0Test. Happy to 
refactor into a common base class.




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435153#comment-16435153
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181002193
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -245,6 +245,180 @@ public void testLastAcceptedEpoch() throws Exception {
 recursiveDelete(tmpDir);
 }
 }
+
+@Test
+public void testGetEpochToProposeWithObserver() throws Exception {
+File tmpDir = File.createTempFile("test", "dir", testData);
--- End diff --

Refactored




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435149#comment-16435149
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user lavacat commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r181001854
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -245,6 +245,180 @@ public void testLastAcceptedEpoch() throws Exception {
 recursiveDelete(tmpDir);
 }
 }
+
+@Test
+public void testGetEpochToProposeWithObserver() throws Exception {
+File tmpDir = File.createTempFile("test", "dir", testData);
+tmpDir.delete();
+tmpDir.mkdir();
+Leader leader = null;
+try {
+QuorumPeer peer = createQuorumPeer(tmpDir);
+long participantId = 1;
+long observerId = peer.quorumPeers.size();
+peer.quorumPeers.put(observerId, new QuorumServer(observerId, "0.0.0.0", 33225,
--- End diff --

Do you mean using PortAssignment.unique() and "127.0.0.1"? Changed it.




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434923#comment-16434923
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/500
  
Given that this change affects leader election I think it'd be very 
beneficial if @fpj could take a look by any chance.




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434910#comment-16434910
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r180793047
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -245,6 +245,180 @@ public void testLastAcceptedEpoch() throws Exception {
 recursiveDelete(tmpDir);
 }
 }
+
+@Test
+public void testGetEpochToProposeWithObserver() throws Exception {
+File tmpDir = File.createTempFile("test", "dir", testData);
+tmpDir.delete();
+tmpDir.mkdir();
+Leader leader = null;
+try {
+QuorumPeer peer = createQuorumPeer(tmpDir);
+long participantId = 1;
+long observerId = peer.quorumPeers.size();
+peer.quorumPeers.put(observerId, new QuorumServer(observerId, "0.0.0.0", 33225,
--- End diff --

I think that, to be consistent with the `createQuorumPeer()` method, this 
should be something like:
```
peers.put(observerId, new QuorumServer(observerId,
   new InetSocketAddress("127.0.0.1", PortAssignment.unique()),
   new InetSocketAddress("127.0.0.1", PortAssignment.unique()),
   new InetSocketAddress("127.0.0.1", PortAssignment.unique()),
   QuorumPeer.LearnerType.OBSERVER));
```





[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434911#comment-16434911
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/500#discussion_r180789703
  
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java ---
@@ -245,6 +245,180 @@ public void testLastAcceptedEpoch() throws Exception {
 recursiveDelete(tmpDir);
 }
 }
+
+@Test
+public void testGetEpochToProposeWithObserver() throws Exception {
+File tmpDir = File.createTempFile("test", "dir", testData);
--- End diff --

Have you considered using ClientBase.createEmptyTestDir() instead?




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429801#comment-16429801
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user shralex commented on the issue:

https://github.com/apache/zookeeper/pull/500
  
I'm +1. Thanks Bogdan for making the PR.




[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429690#comment-16429690
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

GitHub user lavacat opened a pull request:

https://github.com/apache/zookeeper/pull/500

ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers

https://issues.apache.org/jira/browse/ZOOKEEPER-2959
- add getVotingView check for id in getEpochToPropose and waitForEpochAck
- refactor waitForNewLeaderAck to use getVotingView
- unit tests

credit: Xiang Yongqiang (https://github.com/xyq000) for original PR and 
reporting the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lavacat/zookeeper branch-3.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #500


commit 98c40dac60951c61b3b1922d0038461d81b843a1
Author: Bogdan Kanivets 
Date:   2018-04-08T08:46:37Z

ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers

https://issues.apache.org/jira/browse/ZOOKEEPER-2959
- add getVotingView check for id in getEpochToPropose and waitForEpochAck
- refactor waitForNewLeaderAck to use getVotingView
- unit tests

credit: Xiang Yongqiang (https://github.com/xyq000) for original PR and 
reporting the issue






[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427413#comment-16427413
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/438#discussion_r179558826
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -1183,8 +1183,10 @@ public long getEpochToPropose(long sid, long 
lastAcceptedEpoch) throws Interrupt
 if (lastAcceptedEpoch >= epoch) {
 epoch = lastAcceptedEpoch+1;
 }
-connectingFollowers.add(sid);
 QuorumVerifier verifier = self.getQuorumVerifier();
+if(verifier.getVotingMembers().containsKey(sid)) {
--- End diff --

+1 makes sense.


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Assignee: Bogdan Kanivets
>Priority: Blocker
>





[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426533#comment-16426533
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user shralex commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/438#discussion_r179363930
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -1183,8 +1183,10 @@ public long getEpochToPropose(long sid, long 
lastAcceptedEpoch) throws Interrupt
 if (lastAcceptedEpoch >= epoch) {
 epoch = lastAcceptedEpoch+1;
 }
-connectingFollowers.add(sid);
 QuorumVerifier verifier = self.getQuorumVerifier();
+if(verifier.getVotingMembers().containsKey(sid)) {
--- End diff --

If I recall correctly, this wasn't done because of concerns about the 
impact on performance: containsQuorum is called every time an ACK is received 
for every operation proposal. So if you need 3 ACKs to commit an operation, 
we'll be doing these checks (figuring out who's a participant and who's not) 
for {ACK1}, for {ACK1, ACK2} and for {ACK1, ACK2, ACK3}, compared with 
comparing two ints as it stands now. So this is why it wasn't done...
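
One way to picture the trade-off raised here is to test membership once, when a sid is recorded, so that the quorum check itself stays a comparison of two ints. The diff above appears to take this route for getEpochToPropose by guarding the add. The sketch below assumes a simple majority rule; `AckTrackerSketch`, `addAck` and `hasQuorum` are hypothetical names, not ZooKeeper APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: filter non-voters at insertion time so the
// quorum check remains a plain size comparison.
final class AckTrackerSketch {
    private final Set<Long> voters;
    private final Set<Long> acks = new HashSet<>();

    AckTrackerSketch(Set<Long> voters) {
        this.voters = new HashSet<>(voters);
    }

    // Records the ACK only if the sender is a voting member, then
    // reports whether a majority of voters has acknowledged.
    boolean addAck(long sid) {
        if (voters.contains(sid)) {
            acks.add(sid);
        }
        return hasQuorum();
    }

    // Two-int comparison; no per-call membership scan is needed
    // because the set can only ever contain voters.
    boolean hasQuorum() {
        return acks.size() > voters.size() / 2;
    }
}
```

The cost is one hash lookup per received ACK instead of a scan of the whole ACK set on every quorum check.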


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Priority: Major
>





[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-01-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16318513#comment-16318513
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/438#discussion_r160422756
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -1183,8 +1183,10 @@ public long getEpochToPropose(long sid, long 
lastAcceptedEpoch) throws Interrupt
 if (lastAcceptedEpoch >= epoch) {
 epoch = lastAcceptedEpoch+1;
 }
-connectingFollowers.add(sid);
 QuorumVerifier verifier = self.getQuorumVerifier();
+if(verifier.getVotingMembers().containsKey(sid)) {
--- End diff --

+1


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>





[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317361#comment-16317361
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user afine commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/438#discussion_r160286100
  
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Leader.java ---
@@ -1183,8 +1183,10 @@ public long getEpochToPropose(long sid, long 
lastAcceptedEpoch) throws Interrupt
 if (lastAcceptedEpoch >= epoch) {
 epoch = lastAcceptedEpoch+1;
 }
-connectingFollowers.add(sid);
 QuorumVerifier verifier = self.getQuorumVerifier();
+if(verifier.getVotingMembers().containsKey(sid)) {
--- End diff --

I'm wondering if this logic is best suited for the `QuorumVerifier`. In 
other words, the quorum verifier should be able to determine if a quorum is 
present from a set of ids while taking into account which sids represent voting 
members.
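
The alternative suggested here, letting the verifier itself ignore non-voting sids, could look roughly like the following. It is a sketch only: `VoterAwareMajoritySketch` is a hypothetical class, its `containsQuorum` method merely borrows the name used in this thread, and a simple majority rule is assumed.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical verifier that counts only voting members when deciding
// whether a set of sids forms a quorum. Not the real QuorumVerifier.
final class VoterAwareMajoritySketch {
    private final Set<Long> votingMembers;

    VoterAwareMajoritySketch(Set<Long> votingMembers) {
        this.votingMembers = new HashSet<>(votingMembers);
    }

    // Observers (or any unknown sid) in ackSet are skipped, so callers
    // do not need to filter before asking.
    boolean containsQuorum(Set<Long> ackSet) {
        int count = 0;
        for (Long sid : ackSet) {
            if (votingMembers.contains(sid)) {
                count++;
            }
        }
        return count > votingMembers.size() / 2;
    }
}
```

This keeps callers such as getEpochToPropose unchanged, at the price of a membership scan on every call.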


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>





[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316906#comment-16316906
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2959:
---

Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/438
  
Hi @xyq000 
Thanks for the contribution. I think fixing this issue makes sense. Would 
you please add at least one unit test to reproduce the problem?


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>





[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2017-12-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306154#comment-16306154
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1391//console

This message is automatically generated.

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>


