[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166301#comment-16166301 ]

Till Rohrmann commented on FLINK-2733:
--

I think [~tonycox], you actually did not use the fixed code when observing the test failure. Will close the issue therefore.

> ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
> -
>
> Key: FLINK-2733
> URL: https://issues.apache.org/jira/browse/FLINK-2733
> Project: Flink
> Issue Type: Bug
> Components: Distributed Coordination
> Affects Versions: 0.10.0
> Reporter: Robert Metzger
> Assignee: Till Rohrmann
> Labels: test-stability
>
> I observed a test failure in this run:
> https://travis-ci.org/rmetzger/flink/jobs/81571914
> {code}
> testZooKeeperReelection(org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionTest) Time elapsed: 109.794 sec <<< FAILURE!
> java.lang.AssertionError: expected: but was:
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:743)
> at org.junit.Assert.assertEquals(Assert.java:118)
> at org.junit.Assert.assertEquals(Assert.java:144)
> at org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionTest.testZooKeeperReelection(ZooKeeperLeaderElectionTest.java:171)
> Results :
> Failed tests:
> ZooKeeperLeaderElectionTest.testZooKeeperReelection:171 expected: but was:
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335976#comment-15335976 ]

ASF GitHub Bot commented on FLINK-2733:
---

Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2103
[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335970#comment-15335970 ]

ASF GitHub Bot commented on FLINK-2733:
---

Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2103

Ran 10 Travis builds and couldn't reproduce the ZooKeeperLeaderElectionTest failure. Thus, I assume that this PR fixes/hardens the test case. Will merge it now.
[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331478#comment-15331478 ]

Till Rohrmann commented on FLINK-2733:
--

I think my analysis was not correct. I checked, and of course Curator will try to reconnect to ZooKeeper if the connection is lost.

The problem was the following (hopefully I got it right this time ;-)). The test uses a leader retrieval service to find out which of the current leader contenders is the leader. Assume contender 0 is granted the leadership and the leader retrieval service is informed about it. Shortly afterwards, contender 0's leadership is revoked, but the retrieval service has not yet been informed about it. The test now compares the leader session id reported by the retrieval service with the one held by the contender whose address was returned. Since the leadership was revoked, the leader session id returned by the contender is null, so the comparison fails. That's the problem.

I modified the test to allow false positive leaders being returned from the leader retrieval service.
[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331453#comment-15331453 ]

ASF GitHub Bot commented on FLINK-2733:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/2103

[FLINK-2733] Harden ZooKeeperLeaderElectionTest

Hardens ZooKeeperElectionTest by allowing the testing listener to return out-dated leader information. This can happen if the ZooKeeper connection was suspended and the new leader information has not been sent to the testing listener. In this case, the testing listener will be queried again to return the actual leader information.

Add debug statements to ZooKeeperLeaderElectionTest.testZooKeeperReelection

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixZooKeeperLeaderElectionTest

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2103.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2103

commit 37127ed6ad198d010c81b4c725c2dd14a8b11872
Author: Till Rohrmann
Date: 2016-06-06T15:18:59Z

[FLINK-2733] [tests] Harden ZooKeeperLeaderElectionTest

Hardens ZooKeeperElectionTest by allowing the testing listener to return out-dated leader information. This can happen if the ZooKeeper connection was suspended and the new leader information has not been sent to the testing listener. In this case, the testing listener will be queried again to return the actual leader information.

Add debug statements to ZooKeeperLeaderElectionTest.testZooKeeperReelection
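The hardening described in the pull request (tolerate out-dated leader information and query the listener again) can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the code from the pull request: the class StaleLeaderRetrySketch, the LeaderInfo type, the awaitConfirmedLeader helper and the contender names are all hypothetical, and the listener is modelled as a simple Supplier instead of Flink's or Curator's actual APIs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.UUID;
import java.util.function.Supplier;

public class StaleLeaderRetrySketch {

    /** Hypothetical snapshot of what a testing listener reports. */
    static final class LeaderInfo {
        final String address;
        final UUID sessionId;

        LeaderInfo(String address, UUID sessionId) {
            this.address = address;
            this.sessionId = sessionId;
        }
    }

    /**
     * Queries the listener until the reported session id matches the session id
     * currently held by the reported contender, or until maxAttempts is reached.
     * A mismatch is treated as out-dated information, not as a test failure.
     */
    static LeaderInfo awaitConfirmedLeader(
            Supplier<LeaderInfo> listener,
            Map<String, UUID> contenderSessions,
            int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            LeaderInfo reported = listener.get();
            if (reported == null || reported.sessionId == null) {
                continue; // nothing usable reported yet, ask again
            }
            // A contender whose leadership was revoked holds no session id (null),
            // so an out-dated report never matches and we simply query again.
            UUID current = contenderSessions.get(reported.address);
            if (reported.sessionId.equals(current)) {
                return reported;
            }
        }
        throw new IllegalStateException(
                "No confirmed leader after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        UUID oldSession = UUID.randomUUID();
        UUID newSession = UUID.randomUUID();

        Map<String, UUID> contenders = new HashMap<>();
        contenders.put("contender-0", null);       // leadership already revoked
        contenders.put("contender-1", newSession); // current leader

        // The listener first returns the out-dated leader, then the actual one.
        Queue<LeaderInfo> reports = new ArrayDeque<>(Arrays.asList(
                new LeaderInfo("contender-0", oldSession),
                new LeaderInfo("contender-1", newSession)));

        LeaderInfo leader = awaitConfirmedLeader(reports::poll, contenders, 5);
        System.out.println("Confirmed leader: " + leader.address);
    }
}
```

Bounding the number of attempts keeps the sketch from looping forever if no consistent leader ever appears; a real test would more likely wait on the listener with a timeout.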
[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316636#comment-15316636 ]

Till Rohrmann commented on FLINK-2733:
--

The problem seems to be a connection loss to the {{ZooKeeper}} testing server. Two Curator clients lose their connection to the testing server. I suspect that the testing listener is one of the affected components. As a consequence, the testing listener is no longer notified about changes in the leader election and the test fails.

I assume that this has something to do with the Travis instances and their resource consumption, since I couldn't reproduce the problem locally. I propose to decrease the number of concurrently connected instances and to increase the connection timeout in order to harden the test case.
[jira] [Commented] (FLINK-2733) ZooKeeperLeaderElectionTest.testZooKeeperReelection fails
[ https://issues.apache.org/jira/browse/FLINK-2733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316557#comment-15316557 ]

Till Rohrmann commented on FLINK-2733:
--

Another instance: https://s3.amazonaws.com/archive.travis-ci.org/jobs/134215896/log.txt