-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/62397/#review185645
-----------------------------------------------------------

Master (96a5287) is red with this patch.
  ./build-support/jenkins/build.sh

:commons-args:jar
:commons:compileJava
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/commons/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.1
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/commons/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
:commons:generateThriftResources
:commons:processResources
:commons:classes
:commons:jar
:compileJava
Download https://repo1.maven.org/maven2/com/h2database/h2/1.4.196/h2-1.4.196.pom
Download https://repo1.maven.org/maven2/com/h2database/h2/1.4.196/h2-1.4.196.jar
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74: Note: Wrote forwarder org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java:220: error: cannot find symbol
        return reader.catchup(timeout, unit);
                     ^
  symbol:   method catchup(long,TimeUnit)
  location: variable reader of type Reader
1 error
:compileJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':compileJava'.
> Compilation failed; see the compiler error output for details.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.

BUILD FAILED

Total time: 2 mins 44.87 secs


I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Sept. 19, 2017, 5:32 a.m., Jordan Ly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/62397/
> -----------------------------------------------------------
> 
> (Updated Sept. 19, 2017, 5:32 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> This patch implements the option to turn on 'Replica Hot Standby', where
> Scheduler replicas will be able to keep their volatile storage up to date
> throughout normal operation. The main motivation behind this change is to
> reduce failover time. If the leader fails over, the elected replica will be
> able to serve production traffic much more quickly, as it has to rebuild
> less state. This change also enables future work such as snapshots on
> replicas and serving traffic from replicas.
> 
> There have been several discussions around this feature:
> https://lists.apache.org/thread.html/e31d7dbcb054ed570f969ae2043eadfc090383edfe0751cec59b29d3@%3Cdev.aurora.apache.org%3E
> 
> Culminating in a design doc:
> https://docs.google.com/document/d/1DOtKA4-vrtxat1MaUYMQ6Y1iXhA8ob6Mfztzt-R1Oss/edit#heading=h.gjdgxs
> 
> The related Mesos patch can be found here:
> https://reviews.apache.org/r/62288/
> 
> 
> Diffs
> -----
> 
>   config/legacy_untested_classes.txt ec3e934b2e0510b9339ac71182b78546cac0e7eb
>   src/main/java/org/apache/aurora/scheduler/log/Log.java dc77eb435e5f8fdce56727a2f679e8e1907e977c
>   src/main/java/org/apache/aurora/scheduler/log/mesos/LogInterface.java b0a7939131e1a3dceaf9635aec6746a5cd7ad394
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java 21855e184fe20dc339713978b32344b6950701ec
>   src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLogStreamModule.java 6704a328a4023a178ed8f86ae4772cb04eb2fa8e
>   src/main/java/org/apache/aurora/scheduler/storage/CallOrderEnforcingStorage.java 2a5ec9c912979811c4badeee9362c22184d9cbbf
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorage.java 387350c7667a5fb8ee674ad0d3dd17529232b25b
>   src/main/java/org/apache/aurora/scheduler/storage/log/LogStorageModule.java 835f1604c0c5d913a87d570ee01d053bbbf92ecb
>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManager.java ea147c0ba6aaa6d113144be0a8330bd2f73d2453
>   src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java baf2647c54f1f9918139584e5151873a3853e83c
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java a7c9c83eebbbea7ae8a6c807f98d3ce8bd050137
>   src/test/java/org/apache/aurora/scheduler/log/mesos/MesosLogTest.java f142f545799d64f9352b0ac6c51942eedf5e9ced
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogManagerTest.java 3f445595a81a5655c6c486791a9b55d8dc7f2f76
>   src/test/java/org/apache/aurora/scheduler/storage/log/LogStorageTest.java 0eb54fdaddfbc2af76fd83ffee18ce4c6b61cc48
> 
> 
> Diff: https://reviews.apache.org/r/62397/diff/1/
> 
> 
> Testing
> -------
> 
> Added unit tests.
> 
> The current version of Mesos does not have the `catchup` function, so the
> tests will fail CI. However, they work on my box if I manually build Mesos
> with the API change and add it to the dependencies.
> 
> I have had this patch running on a test cluster for the past couple of weeks.
> There are a few small issues to work out around catchup failure, but it is
> generally stable.
> 
> Initial (ad-hoc) observations show an improvement in
> 'scheduler_storage_start' from 70-200 seconds to 5-80 seconds, depending on
> whether the failover occurred immediately after a snapshot. I will compile
> more comprehensive statistics around the results later (e.g., time from
> scheduler disconnect to new scheduler being elected and serving traffic).
> 
> 
> Thanks,
> 
> Jordan Ly
> 
> 
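
For context on the compile failure above: the build breaks because the Mesos version resolved by CI does not yet provide `catchup(long, TimeUnit)` on the log `Reader`, which the Testing section anticipates. The idea behind the patch can be illustrated with a minimal, self-contained sketch. It is not Aurora or Mesos code: `CatchupReader`, `FakeLog`, and `StandbyReplica` are hypothetical names, and the return value and semantics given to `catchup` here (returning the last readable position) are assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class HotStandbySketch {
  /** Hypothetical stand-in for a replicated-log reader; NOT the Mesos API. */
  interface CatchupReader {
    /** Assumed semantics: returns the last readable position, or -1 if the log is empty. */
    long catchup(long timeout, TimeUnit unit);

    /** Returns the entries in positions [from, to], inclusive. */
    List<String> read(long from, long to);
  }

  /** In-memory log used only to exercise the sketch. */
  static final class FakeLog implements CatchupReader {
    private final List<String> entries = new ArrayList<>();

    void append(String entry) {
      entries.add(entry);
    }

    @Override
    public long catchup(long timeout, TimeUnit unit) {
      // The timeout is ignored here; an in-memory list is always up to date.
      return entries.size() - 1;
    }

    @Override
    public List<String> read(long from, long to) {
      return new ArrayList<>(entries.subList((int) from, (int) to + 1));
    }
  }

  /** A non-leading replica that replays new log entries into its volatile state. */
  static final class StandbyReplica {
    private final CatchupReader reader;
    private final List<String> volatileState = new ArrayList<>();
    private long nextPosition = 0;

    StandbyReplica(CatchupReader reader) {
      this.reader = reader;
    }

    /** Applies every entry written since the previous call; returns how many were applied. */
    int catchupOnce(long timeout, TimeUnit unit) {
      long end = reader.catchup(timeout, unit);
      if (end < nextPosition) {
        return 0; // Nothing new to replay.
      }
      List<String> fresh = reader.read(nextPosition, end);
      volatileState.addAll(fresh);
      nextPosition = end + 1;
      return fresh.size();
    }

    List<String> state() {
      return volatileState;
    }
  }

  public static void main(String[] args) {
    FakeLog log = new FakeLog();
    StandbyReplica replica = new StandbyReplica(log);

    log.append("saveTask-1");
    log.append("saveTask-2");
    System.out.println(replica.catchupOnce(5, TimeUnit.SECONDS));

    log.append("saveTask-3");
    System.out.println(replica.catchupOnce(5, TimeUnit.SECONDS));
    System.out.println(replica.state());
  }
}
```

In this sketch, a replica that has been calling `catchupOnce` periodically only has to replay the entries written since its last call when it is elected, which is the intuition behind the shorter 'scheduler_storage_start' times reported in the Testing section.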
