[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/432#discussion_r156793172 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception { output[0], 2); } +/** + * This test validates that if a quorum member determines that it is leader without the support of the rest of the + * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower. + * + * @throws IOException + * @throws InterruptedException + */ +@Test +public void testElectionFraud() throws IOException, InterruptedException { +// capture QuorumPeer logging +Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout(); +ByteArrayOutputStream os = new ByteArrayOutputStream(); +WriterAppender appender = new WriterAppender(layout, os); +appender.setThreshold(Level.INFO); +Logger qlogger = Logger.getLogger(QuorumPeer.class); +qlogger.addAppender(appender); + +int numServers = 3; + +// used for assertions later +boolean foundLeading = false; +boolean foundLooking = false; +boolean foundFollowing = false; + +try { + // spin up a quorum, we use a small ticktime to make the test run faster + Servers servers = LaunchServers(numServers, 500); + + // find the leader + int trueLeader = -1; + for (int i = 0; i < numServers; i++) { +if (servers.mt[i].main.quorumPeer.leader != null) { + trueLeader = i; +} + } + Assert.assertTrue("There should be a leader", trueLeader >= 0); + + // find a follower + int falseLeader = (trueLeader + 1) % numServers; + Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null); + + // to keep the quorum peer running and force it to go into the looking state, we kill leader election + // and close the connection to the leader + servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown(); + servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close(); + + // wait for the falseLeader to disconnect + waitForOne(servers.zk[falseLeader], States.CONNECTING); + + // convince falseLeader that it is the leader + servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING); + + // provide time for the falseleader to realize no followers have connected + // (this is twice the timeout used in Leader#getEpochToPropose) + Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime); + + // Restart leader election + servers.mt[falseLeader].main.quorumPeer.startLeaderElection(); --- End diff -- I see. Hence the shutdown() call a few lines before. Makes sense. ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/432#discussion_r156781040 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception { output[0], 2); } +/** + * This test validates that if a quorum member determines that it is leader without the support of the rest of the + * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower. + * + * @throws IOException + * @throws InterruptedException + */ +@Test +public void testElectionFraud() throws IOException, InterruptedException { +// capture QuorumPeer logging +Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout(); +ByteArrayOutputStream os = new ByteArrayOutputStream(); +WriterAppender appender = new WriterAppender(layout, os); +appender.setThreshold(Level.INFO); +Logger qlogger = Logger.getLogger(QuorumPeer.class); +qlogger.addAppender(appender); + +int numServers = 3; + +// used for assertions later +boolean foundLeading = false; +boolean foundLooking = false; +boolean foundFollowing = false; + +try { + // spin up a quorum, we use a small ticktime to make the test run faster + Servers servers = LaunchServers(numServers, 500); + + // find the leader + int trueLeader = -1; + for (int i = 0; i < numServers; i++) { +if (servers.mt[i].main.quorumPeer.leader != null) { + trueLeader = i; +} + } + Assert.assertTrue("There should be a leader", trueLeader >= 0); + + // find a follower + int falseLeader = (trueLeader + 1) % numServers; + Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null); + + // to keep the quorum peer running and force it to go into the looking state, we kill leader election + // and close the connection to the leader + servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown(); + servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close(); + + // wait for the falseLeader to disconnect + waitForOne(servers.zk[falseLeader], States.CONNECTING); + + // convince falseLeader that it is the leader + servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING); + + // provide time for the falseleader to realize no followers have connected + // (this is twice the timeout used in Leader#getEpochToPropose) + Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime); + + // Restart leader election + servers.mt[falseLeader].main.quorumPeer.startLeaderElection(); --- End diff -- Stopping and starting leader election is necessary here to prevent a race condition. It is possible that after the server is disconnected from the leader it becomes a follower before the test hits `servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING);` and falseLeader will never try to `lead`, defeating the purpose of the test. ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/432#discussion_r156780239 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception { output[0], 2); } +/** + * This test validates that if a quorum member determines that it is leader without the support of the rest of the + * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower. + * + * @throws IOException + * @throws InterruptedException + */ +@Test +public void testElectionFraud() throws IOException, InterruptedException { +// capture QuorumPeer logging +Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout(); +ByteArrayOutputStream os = new ByteArrayOutputStream(); +WriterAppender appender = new WriterAppender(layout, os); +appender.setThreshold(Level.INFO); +Logger qlogger = Logger.getLogger(QuorumPeer.class); +qlogger.addAppender(appender); + +int numServers = 3; + +// used for assertions later +boolean foundLeading = false; +boolean foundLooking = false; +boolean foundFollowing = false; + +try { + // spin up a quorum, we use a small ticktime to make the test run faster + Servers servers = LaunchServers(numServers, 500); --- End diff -- Fixed, forgot to put that in this branch. ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/432#discussion_r156658012 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception { output[0], 2); } +/** + * This test validates that if a quorum member determines that it is leader without the support of the rest of the + * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower. + * + * @throws IOException + * @throws InterruptedException + */ +@Test +public void testElectionFraud() throws IOException, InterruptedException { +// capture QuorumPeer logging +Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout(); +ByteArrayOutputStream os = new ByteArrayOutputStream(); +WriterAppender appender = new WriterAppender(layout, os); +appender.setThreshold(Level.INFO); +Logger qlogger = Logger.getLogger(QuorumPeer.class); +qlogger.addAppender(appender); + +int numServers = 3; + +// used for assertions later +boolean foundLeading = false; +boolean foundLooking = false; +boolean foundFollowing = false; + +try { + // spin up a quorum, we use a small ticktime to make the test run faster + Servers servers = LaunchServers(numServers, 500); + + // find the leader + int trueLeader = -1; + for (int i = 0; i < numServers; i++) { +if (servers.mt[i].main.quorumPeer.leader != null) { + trueLeader = i; +} + } + Assert.assertTrue("There should be a leader", trueLeader >= 0); + + // find a follower + int falseLeader = (trueLeader + 1) % numServers; + Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null); --- End diff -- Minor: I believe that it's generally a good idea to add a short message to asserts to describe what could be the problem when assertion fails. ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/432#discussion_r156660196 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception { output[0], 2); } +/** + * This test validates that if a quorum member determines that it is leader without the support of the rest of the + * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower. + * + * @throws IOException + * @throws InterruptedException + */ +@Test +public void testElectionFraud() throws IOException, InterruptedException { +// capture QuorumPeer logging +Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout(); +ByteArrayOutputStream os = new ByteArrayOutputStream(); +WriterAppender appender = new WriterAppender(layout, os); +appender.setThreshold(Level.INFO); +Logger qlogger = Logger.getLogger(QuorumPeer.class); +qlogger.addAppender(appender); + +int numServers = 3; + +// used for assertions later +boolean foundLeading = false; +boolean foundLooking = false; +boolean foundFollowing = false; + +try { + // spin up a quorum, we use a small ticktime to make the test run faster + Servers servers = LaunchServers(numServers, 500); + + // find the leader + int trueLeader = -1; + for (int i = 0; i < numServers; i++) { +if (servers.mt[i].main.quorumPeer.leader != null) { + trueLeader = i; +} + } + Assert.assertTrue("There should be a leader", trueLeader >= 0); + + // find a follower + int falseLeader = (trueLeader + 1) % numServers; + Assert.assertTrue(servers.mt[falseLeader].main.quorumPeer.follower != null); + + // to keep the quorum peer running and force it to go into the looking state, we kill leader election + // and close the connection to the leader + servers.mt[falseLeader].main.quorumPeer.electionAlg.shutdown(); + servers.mt[falseLeader].main.quorumPeer.follower.getSocket().close(); + + // wait for the falseLeader to disconnect + waitForOne(servers.zk[falseLeader], States.CONNECTING); + + // convince falseLeader that it is the leader + servers.mt[falseLeader].main.quorumPeer.setPeerState(QuorumPeer.ServerState.LEADING); + + // provide time for the falseleader to realize no followers have connected + // (this is twice the timeout used in Leader#getEpochToPropose) + Thread.sleep(2 * servers.mt[falseLeader].main.quorumPeer.initLimit * servers.mt[falseLeader].main.quorumPeer.tickTime); + + // Restart leader election + servers.mt[falseLeader].main.quorumPeer.startLeaderElection(); --- End diff -- I wonder if this one has to be called explicitly. The peer should automatically realise that it's no longer the leader, stop leading and start leader election (which is basically looking). That's what this test is intended to validate and shouldn't be started explicitly. Correct me if I'm wrong. ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user anmolnar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/432#discussion_r156657072 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java --- @@ -335,6 +336,100 @@ public void testHighestZxidJoinLate() throws Exception { output[0], 2); } +/** + * This test validates that if a quorum member determines that it is leader without the support of the rest of the + * quorum (the other members do not believe it to be the leader) it will stop attempting to lead and become a follower. + * + * @throws IOException + * @throws InterruptedException + */ +@Test +public void testElectionFraud() throws IOException, InterruptedException { +// capture QuorumPeer logging +Layout layout = Logger.getRootLogger().getAppender("CONSOLE").getLayout(); +ByteArrayOutputStream os = new ByteArrayOutputStream(); +WriterAppender appender = new WriterAppender(layout, os); +appender.setThreshold(Level.INFO); +Logger qlogger = Logger.getLogger(QuorumPeer.class); +qlogger.addAppender(appender); + +int numServers = 3; + +// used for assertions later +boolean foundLeading = false; +boolean foundLooking = false; +boolean foundFollowing = false; + +try { + // spin up a quorum, we use a small ticktime to make the test run faster + Servers servers = LaunchServers(numServers, 500); --- End diff -- It'd be useful to utilize the class-wide ```servers``` field which is used to shutdown test servers in the ```tearDown()``` method. For example ```testHighestZxidJoinLate()``` test does it that way. ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
Github user afine closed the pull request at: https://github.com/apache/zookeeper/pull/432 ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
GitHub user afine reopened a pull request: https://github.com/apache/zookeeper/pull/432 [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment You can merge this pull request into a Git repository by running: $ git pull https://github.com/afine/zookeeper ZOOKEEPER-2953 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/432.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #432 commit 68a0382093da1d64583211746ac672ed12f5da5c Author: Abraham FineDate: 2017-12-13T00:35:50Z ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment ---
[GitHub] zookeeper pull request #432: [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBefo...
GitHub user afine opened a pull request: https://github.com/apache/zookeeper/pull/432 [WIP] ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment You can merge this pull request into a Git repository by running: $ git pull https://github.com/afine/zookeeper ZOOKEEPER-2953 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/432.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #432 commit 68a0382093da1d64583211746ac672ed12f5da5c Author: Abraham FineDate: 2017-12-13T00:35:50Z ZOOKEEPER-2953: Flaky Test: testNoLogBeforeLeaderEstablishment ---