Re: Review Request 54288: Make leader elections resilient to ZK disconnections.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54288/#review158815 --- Master (c8e8953) is red with this patch. ./build-support/jenkins/build.sh at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:120) at org.gradle.launcher.daemon.server.exec.EstablishBuildEnvironment.doBuild(EstablishBuildEnvironment.java:72) at org.gradle.launcher.daemon.server.exec.BuildCommandOnly.execute(BuildCommandOnly.java:36) at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:120) at org.gradle.launcher.daemon.server.exec.HintGCAfterBuild.execute(HintGCAfterBuild.java:44) at org.gradle.launcher.daemon.server.api.DaemonCommandExecution.proceed(DaemonCommandExecution.java:120) at org.gradle.launcher.daemon.server.exec.StartBuildOrRespondWithBusy$1.run(StartBuildOrRespondWithBusy.java:50) at org.gradle.launcher.daemon.server.DaemonStateCoordinator$1.run(DaemonStateCoordinator.java:293) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54) at org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) :pmdTest :test org.apache.aurora.scheduler.discovery.CuratorSingletonServiceTest > testZKDisconnection FAILED java.lang.AssertionError at CuratorSingletonServiceTest.java:233 I1212 01:48:33.031 [ShutdownHook, SchedulerMain] Stopping scheduler services. 1068 tests completed, 1 failed, 2 skipped :test FAILED :jacocoTestReport Coverage report generated: file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/jacoco/test/html/index.html :analyzeReport Instruction coverage is 0.8877981722428748, but must be greater than 0.89 Branch coverage is 0.8013562026326286, but must be greater than 0.835 FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':test'. > There were failing tests. See the report at: > file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/tests/test/index.html * Try: Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. BUILD FAILED Total time: 3 mins 46.037 secs I will refresh this build result if you post a review containing "@ReviewBot retry" - Aurora ReviewBot On Dec. 12, 2016, 12:43 a.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54288/ > --- > > (Updated Dec. 12, 2016, 12:43 a.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and > Stephan Erb. > > > Bugs: AURORA-1669 > https://issues.apache.org/jira/browse/AURORA-1669 > > > Repository: aurora > > > Description > --- > > As documented in AURORA-1840 the Curator `LeaderLatch` recipe abdicates > leadership if the ZK connection is lost or if there is a timeout. This is not > compatible with the commons based implementation which would only abdicate > leadership if the ZK session timeout occurred. > > This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a > custom listener that only loses leadership if a connection loss occurs. > > > Diffs > - > > > commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java > 50acaeba82e163f8f2970a264cbd889c9eb3b5ed > > src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java > c378172c850aafe0a9381552b5067277b40dbfab > > src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java > a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583 > > src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java > 6ea49b0c690d288ff59d1d4798144bfa2d153d3a > > Diff: https://reviews.apache.org/r/54288/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 54624: Expose stats on ZooKeeper connection state
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54624/#review158814 --- Master (c8e8953) is green with this patch. ./build-support/jenkins/build.sh However, it appears that it might lack test coverage. I will refresh this build result if you post a review containing "@ReviewBot retry" - Aurora ReviewBot On Dec. 12, 2016, 1:32 a.m., Jing Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54624/ > --- > > (Updated Dec. 12, 2016, 1:32 a.m.) > > > Review request for Aurora, Joshua Cohen, Mehrdad Nurolahzade, Stephan Erb, > and Zameer Manji. > > > Bugs: AURORA-1838 > https://issues.apache.org/jira/browse/AURORA-1838 > > > Repository: aurora > > > Description > --- > > Expose stats on ZooKeeper connection state > > > Diffs > - > > > src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java > 999a542796858dcfe9e31601c47239189043fd52 > > Diff: https://reviews.apache.org/r/54624/diff/ > > > Testing > --- > > http://192.168.33.7:8081/vars > ``` > zk_connection_state_connected 1 > zk_connection_state_lost 0 > zk_connection_state_readonly 0 > zk_connection_state_reconnected 0 > zk_connection_state_suspended 0 > zk_connection_state_unknown 0 > ``` > > > Thanks, > > Jing Chen > >
Re: Review Request 54624: Expose stats on ZooKeeper connection state
> On Dec. 10, 2016, 8:14 p.m., Joshua Cohen wrote: > > I think the idea behind the ticket was to track these as stats, not just > > into the log. btw, I am not pretty sure if `AtomicLong`s here should be `static` or not - Jing --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54624/#review158773 --- On Dec. 12, 2016, 1:32 a.m., Jing Chen wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54624/ > --- > > (Updated Dec. 12, 2016, 1:32 a.m.) > > > Review request for Aurora, Joshua Cohen, Mehrdad Nurolahzade, Stephan Erb, > and Zameer Manji. > > > Bugs: AURORA-1838 > https://issues.apache.org/jira/browse/AURORA-1838 > > > Repository: aurora > > > Description > --- > > Expose stats on ZooKeeper connection state > > > Diffs > - > > > src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java > 999a542796858dcfe9e31601c47239189043fd52 > > Diff: https://reviews.apache.org/r/54624/diff/ > > > Testing > --- > > http://192.168.33.7:8081/vars > ``` > zk_connection_state_connected 1 > zk_connection_state_lost 0 > zk_connection_state_readonly 0 > zk_connection_state_reconnected 0 > zk_connection_state_suspended 0 > zk_connection_state_unknown 0 > ``` > > > Thanks, > > Jing Chen > >
Re: Review Request 54624: Expose stats on ZooKeeper connection state
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54624/ --- (Updated Dec. 12, 2016, 1:32 a.m.) Review request for Aurora, Joshua Cohen, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji. Bugs: AURORA-1838 https://issues.apache.org/jira/browse/AURORA-1838 Repository: aurora Description --- Expose stats on ZooKeeper connection state Diffs - src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java 999a542796858dcfe9e31601c47239189043fd52 Diff: https://reviews.apache.org/r/54624/diff/ Testing (updated) --- http://192.168.33.7:8081/vars ``` zk_connection_state_connected 1 zk_connection_state_lost 0 zk_connection_state_readonly 0 zk_connection_state_reconnected 0 zk_connection_state_suspended 0 zk_connection_state_unknown 0 ``` Thanks, Jing Chen
Re: Review Request 54624: Expose stats on ZooKeeper connection state
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54624/ --- (Updated Dec. 12, 2016, 1:28 a.m.) Review request for Aurora, Mehrdad Nurolahzade, Stephan Erb, and Zameer Manji. Changes --- expose stats on connection state changes Bugs: AURORA-1838 https://issues.apache.org/jira/browse/AURORA-1838 Repository: aurora Description --- Expose stats on ZooKeeper connection state Diffs (updated) - src/main/java/org/apache/aurora/scheduler/discovery/CuratorServiceDiscoveryModule.java 999a542796858dcfe9e31601c47239189043fd52 Diff: https://reviews.apache.org/r/54624/diff/ Testing --- State Change is logged into aurora-scheduler.log: ``` I1210 10:44:37.973 [main-EventThread, ConnectionStateManager] State change: CONNECTED ``` Thanks, Jing Chen
Re: Review Request 54288: Make leader elections resilient to ZK disconnections.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54288/#review158812 --- @ReviewBot retry - Zameer Manji On Dec. 11, 2016, 4:43 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54288/ > --- > > (Updated Dec. 11, 2016, 4:43 p.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and > Stephan Erb. > > > Bugs: AURORA-1669 > https://issues.apache.org/jira/browse/AURORA-1669 > > > Repository: aurora > > > Description > --- > > As documented in AURORA-1840 the Curator `LeaderLatch` recipe abdicates > leadership if the ZK connection is lost or if there is a timeout. This is not > compatible with the commons based implementation which would only abdicate > leadership if the ZK session timeout occurred. > > This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a > custom listener that only loses leadership if a connection loss occurs. > > > Diffs > - > > > commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java > 50acaeba82e163f8f2970a264cbd889c9eb3b5ed > > src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java > c378172c850aafe0a9381552b5067277b40dbfab > > src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java > a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583 > > src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java > 6ea49b0c690d288ff59d1d4798144bfa2d153d3a > > Diff: https://reviews.apache.org/r/54288/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 54288: Make leader elections resilient to ZK disconnections.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54288/#review158811 --- Ship it! Master (c8e8953) is green with this patch. ./build-support/jenkins/build.sh I will refresh this build result if you post a review containing "@ReviewBot retry" - Aurora ReviewBot On Dec. 12, 2016, 12:43 a.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54288/ > --- > > (Updated Dec. 12, 2016, 12:43 a.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and > Stephan Erb. > > > Bugs: AURORA-1669 > https://issues.apache.org/jira/browse/AURORA-1669 > > > Repository: aurora > > > Description > --- > > As documented in AURORA-1840 the Curator `LeaderLatch` recipe abdicates > leadership if the ZK connection is lost or if there is a timeout. This is not > compatible with the commons based implementation which would only abdicate > leadership if the ZK session timeout occurred. > > This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a > custom listener that only loses leadership if a connection loss occurs. > > > Diffs > - > > > commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java > 50acaeba82e163f8f2970a264cbd889c9eb3b5ed > > src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java > c378172c850aafe0a9381552b5067277b40dbfab > > src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java > a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583 > > src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java > 6ea49b0c690d288ff59d1d4798144bfa2d153d3a > > Diff: https://reviews.apache.org/r/54288/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >
Re: Review Request 54288: Make leader elections resilient to ZK disconnections.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54288/ --- (Updated Dec. 11, 2016, 4:43 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and Stephan Erb. Changes --- Fix more possible timing issues. Bugs: AURORA-1669 https://issues.apache.org/jira/browse/AURORA-1669 Repository: aurora Description --- As documented in AURORA-1840 the Curator `LeaderLatch` recipe abdicates leadership if the ZK connection is lost or if there is a timeout. This is not compatible with the commons based implementation which would only abdicate leadership if the ZK session timeout occurred. This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a custom listener that only loses leadership if a connection loss occurs. Diffs (updated) - commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java 50acaeba82e163f8f2970a264cbd889c9eb3b5ed src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java c378172c850aafe0a9381552b5067277b40dbfab src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583 src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java 6ea49b0c690d288ff59d1d4798144bfa2d153d3a Diff: https://reviews.apache.org/r/54288/diff/ Testing --- Thanks, Zameer Manji
Re: Review Request 54288: Make leader elections resilient to ZK disconnections.
> On Dec. 8, 2016, 3:42 p.m., Zameer Manji wrote: > > Does anyone know how to get the test reports from jenkins or have an idea > > of what's going on? > > John Sirois wrote: > Yes, these are legit failures, no Jenkins logs needed, just this is > enough: > ``` > org.apache.aurora.scheduler.discovery.CuratorSingletonServiceTest > > testAbdicateTransition FAILED > java.lang.AssertionError at CuratorSingletonServiceTest.java:125 > java.lang.AssertionError > > org.apache.aurora.scheduler.discovery.CuratorSingletonServiceTest > > testLeadAdvertise FAILED > java.lang.AssertionError at CuratorSingletonServiceTest.java:94 > java.lang.AssertionError > ``` > > I looked at the testLeadAdvertise one 1st which [blocks until an > ephemeral node is > added](https://github.com/apache/aurora/blob/master/src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java#L82-L83) > to the leadership group path as the signal that leadership has been already > synchronously obtained. The assumption is broken with LeaderSelector since > the transition is [fired on an executor > runnable](https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/leader/LeaderSelector.java#L239) > and could be delayed arbitrarily past the test assertion. > > In short, you're getting lucky having these 2 (and maybe more) pass on > your machine. Needs careful re-review of the test infra and the new behavior > of the LeaderSelector using the thread pool to do leadership transitions > async. Good catch. The tests already have an `awaitCapture` and since I am adding a timeout, we can use that to ensure the listeners are fired instead of plain asserts. - Zameer --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/54288/#review158601 --- On Dec. 8, 2016, 3:28 p.m., Zameer Manji wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/54288/ > --- > > (Updated Dec. 8, 2016, 3:28 p.m.) > > > Review request for Aurora, David McLaughlin, Joshua Cohen, John Sirois, and > Stephan Erb. > > > Bugs: AURORA-1669 > https://issues.apache.org/jira/browse/AURORA-1669 > > > Repository: aurora > > > Description > --- > > As documented in AURORA-1840 the Curator `LeaderLatch` recipe abdicates > leadership if the ZK connection is lost or if there is a timeout. This is not > compatible with the commons based implementation which would only abdicate > leadership if the ZK session timeout occurred. > > This replaces the `LeaderLatch` recipe with the `LeaderSelector` recipe with a > custom listener that only loses leadership if a connection loss occurs. > > > Diffs > - > > > commons/src/main/java/org/apache/aurora/common/zookeeper/testing/ZooKeeperTestServer.java > 50acaeba82e163f8f2970a264cbd889c9eb3b5ed > > src/main/java/org/apache/aurora/scheduler/discovery/CuratorSingletonService.java > c378172c850aafe0a9381552b5067277b40dbfab > > src/test/java/org/apache/aurora/scheduler/discovery/BaseCuratorDiscoveryTest.java > a2b4125369d1f6c0a79bc4ac0fb3d2dab8a6c583 > > src/test/java/org/apache/aurora/scheduler/discovery/CuratorSingletonServiceTest.java > 6ea49b0c690d288ff59d1d4798144bfa2d153d3a > > Diff: https://reviews.apache.org/r/54288/diff/ > > > Testing > --- > > > Thanks, > > Zameer Manji > >