Re: Review Request 30225: Modifying update controller to support heartbeats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review70992 --- Ship it! Master (8bcb2ba) is green with this patch. ./build-support/jenkins/build.sh I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 Diff: https://reviews.apache.org/r/30225/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 30586: Fix exception when opening cron urls.
On Feb. 4, 2015, 1:47 a.m., Maxim Khutornenko wrote: src/test/python/apache/aurora/client/cli/test_cron.py, line 138 https://reviews.apache.org/r/30586/diff/2/?file=846914#file846914line138 You may want to drop unused CLUSTER patching in other tests as well. Zameer Manji wrote: The other tests require the CLUSTER patching. In their current shape of intergration tests punching through the API - they do. However, you can convert them to use fake api (similar to the test you added) and make them true unit tests. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/#review70895 --- On Feb. 4, 2015, 1:14 a.m., Zameer Manji wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 4, 2015, 1:14 a.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Re: Review Request 30225: Modifying update controller to support heartbeats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Changes --- Fixing method name escaped during rebase. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs (updated) - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 Diff: https://reviews.apache.org/r/30225/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 30585: Adding command hook for beta-update start.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30585/#review71014 --- Ship it! Ship It! - Zameer Manji On Feb. 3, 2015, 4:24 p.m., George Sirois wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30585/ --- (Updated Feb. 3, 2015, 4:24 p.m.) Review request for Aurora and Zameer Manji. Repository: aurora Description --- Adding command hook for beta-update start. Also contains an update for the hooks documentation. Diffs - docs/hooks.md 533c81df9a5934ea903e3dbfb9fca6a211ceba21 src/main/python/apache/aurora/client/hooks/hooked_api.py bc61e91af6de06ecfc37eddd846c096a5155d7eb src/test/python/apache/aurora/client/hooks/test_hooked_api.py a1f474e1a4f0bcdbd0062757314ede1b7bb37f38 Diff: https://reviews.apache.org/r/30585/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/hooks:hooked_api Thanks, George Sirois
Re: Review Request 30585: Adding command hook for beta-update start.
On Feb. 4, 2015, 1:59 a.m., David McLaughlin wrote: LGTM, thanks for the patch. Although I'm a little bit concerned about adding support for hooks in an environment where we can't support post-hooks. Thanks! I agree that it is certainly not ideal, but it does allow us to at least publish notifications when an update begins. - George --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30585/#review70898 --- On Feb. 4, 2015, 12:24 a.m., George Sirois wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30585/ --- (Updated Feb. 4, 2015, 12:24 a.m.) Review request for Aurora and Zameer Manji. Repository: aurora Description --- Adding command hook for beta-update start. Also contains an update for the hooks documentation. Diffs - docs/hooks.md 533c81df9a5934ea903e3dbfb9fca6a211ceba21 src/main/python/apache/aurora/client/hooks/hooked_api.py bc61e91af6de06ecfc37eddd846c096a5155d7eb src/test/python/apache/aurora/client/hooks/test_hooked_api.py a1f474e1a4f0bcdbd0062757314ede1b7bb37f38 Diff: https://reviews.apache.org/r/30585/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/hooks:hooked_api Thanks, George Sirois
Re: Review Request 30649: Upgrade pants to 0.0.28
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/#review71148 --- Please put the pants release notes in the commit. Or at least some sort of summary that explains what is new. - Zameer Manji On Feb. 4, 2015, 6:46 p.m., Joe Smith wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/ --- (Updated Feb. 4, 2015, 6:46 p.m.) Review request for Aurora and Brian Wickman. Bugs: AURORA-1104 https://issues.apache.org/jira/browse/AURORA-1104 Repository: aurora Description --- Upgrade pants to 0.0.28 Diffs - .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 build-support/pants_requirements.txt fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 Diff: https://reviews.apache.org/r/30649/diff/ Testing --- Reviewed the diff [for twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194) and ``` $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71063 --- On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d
Re: Review Request 30225: Modifying update controller to support heartbeats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71063 --- I suggest you skip to the big comment before paying attention to the smaller ones. If you agree with the concern, i suggest an initial focused diff that implements receiving heartbeats, and asynchronously reacting to the lack of their presence. api/src/main/thrift/org/apache/aurora/gen/api.thrift https://reviews.apache.org/r/30225/#comment116606 Maybe s/BLOCKED/AWAITING_PULSE/? That would at least self-document and avoid additional terminology. src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java https://reviews.apache.org/r/30225/#comment116607 s/query/fetch/ to be consistent with the method above. src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java https://reviews.apache.org/r/30225/#comment116619 Remove Optional here, do Optional.of() at the call site. src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java https://reviews.apache.org/r/30225/#comment116620 TODO + ticket, this map needs to be exposed via a debug endpoint src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java https://reviews.apache.org/r/30225/#comment116626 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml https://reviews.apache.org/r/30225/#comment116643 is it possible to reduce the diff a bit by moving this block to its former location, or is it needed - Bill Farner On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde
Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/#review71079 --- Can you see any opportunity to break this diff apart? As it stands i'm having a hard time giving a thoughtful review. Perhaps you can start by introducing the `Assignment` class? src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java https://reviews.apache.org/r/28617/#comment116667 What motivates you to supply these as overridden methods rather than a configuration object? The latter seems like better encapsulation. - Bill Farner On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/ --- (Updated Feb. 4, 2015, 11:38 p.m.) Review request for Aurora, Kevin Sweeney and Bill Farner. Bugs: AURORA-909 https://issues.apache.org/jira/browse/AURORA-909 Repository: aurora Description --- Modified the task offer/task matching logic to skip offer matching for tasks previously vetoed statically. Real life testing in vagrant (see pictures) shows close to 50% improvement in task scheduling performance. Testing with JMH shows over 97% better perf when testing with disabled preemptor (1 scheduling loop): ``` Master Benchmark Mode SamplesScore ErrorUnits o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1008291046.074 ± 145251.995 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1007522269.050 ± 142446.265 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 204171.046 ± 3800.124 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 215854.129 ± 8959.851 ns/op ``` Testing with preemptor enabled and no tasks eligible for preemption gives around 40% improvement (2 scheduling loops): ``` Master Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001767479.299 ± 26907.571 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1001538682.287 ± 119119.911 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001105731.141 ± 10040.721 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 939230.662 ± 11091.505 ns/op ``` Testing with preemptor enabled and running the worst case possible scenario (every slave is eligible and all tasks are victims) yields the least improvement 2-3% (3 scheduling loops). ``` Master Benchmark Mode Samples ScoreError Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 11043701.243 ± 40550.259 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10478631.055 ± 178833.158 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 116258653.000 ± 403080.017 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 10886116.889 ± 193934.324 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10182572.955 ± 35740.891 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 113656994.000 ± 424163.759 ns/op ``` Diffs - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 8c11ef8bd6609f3e4d97ca154d922898f8362446 src/jmh/java/org/apache/aurora/benchmark/Tasks.java 1a35f9ee9e8e76def0f9bf5454cf8cbdf6a89c25
Re: Review Request 30467: Update mesos lib to 0.21.1
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30467/#review71096 --- Ship it! Master (b49e1a0) is green with this patch. ./build-support/jenkins/build.sh I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 4, 2015, 11:35 p.m., Bill Farner wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30467/ --- (Updated Feb. 4, 2015, 11:35 p.m.) Review request for Aurora and Kevin Sweeney. Bugs: AURORA-1080 https://issues.apache.org/jira/browse/AURORA-1080 Repository: aurora Description --- This is needed to take advantage of the task reconciliation API and new status update fields. Diffs - 3rdparty/python/requirements.txt 0c88e4c299ffe5886bb3ab7d83b0187cb4b7888a build-support/python/make-mesos-native-egg 2cba8ede04e22aee1e6ec5c979161904a6f8fddb build.gradle 8cec4a78d8de6fd98986b9507c67a416e70735f8 docs/deploying-aurora-scheduler.md 9bf5b5a92bf65b31b2f8fda102003b113f61186a examples/vagrant/provision-dev-cluster.sh fa0de88314169bc9fe31aaf41126cedd0865ed5a examples/vagrant/upstart/aurora-scheduler.conf b31e735cd76204e226447dca7dbd3c8ee13cf56e examples/vagrant/upstart/mesos-master.conf 23d457b7adac79b75b790535b43082b55356d60f examples/vagrant/upstart/mesos-slave.conf e00c9fa096d497593e4d6b423e5c69ec0c1b964a src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 4e98929da19a2515d0c4de59dac9018531ef47c9 src/test/sh/org/apache/aurora/e2e/http/http_example_docker_updated.aurora 20f604bc775e93dd2324468b768a73a8626a3a64 src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora 98f5c47ca601bdd629114dd13c5c0e2dd60188ec Diff: https://reviews.apache.org/r/30467/diff/ Testing --- End-to-end tests pass. Thanks, Bill Farner
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. Maxim Khutornenko wrote: I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK) Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`). These fall back to the same loop, but give the updater a time after which the state should be re-evaluated. - Bill --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71063 --- On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010
Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/#review71084 --- Master (b49e1a0) is red with this patch. ./build-support/jenkins/build.sh src.test.python.apache.aurora.client.cli.config . SUCCESS src.test.python.apache.aurora.client.cli.cron . SUCCESS src.test.python.apache.aurora.client.cli.inspect . SUCCESS src.test.python.apache.aurora.client.cli.job . SUCCESS src.test.python.apache.aurora.client.cli.plugins . SUCCESS src.test.python.apache.aurora.client.cli.quota . SUCCESS src.test.python.apache.aurora.client.cli.sla . SUCCESS src.test.python.apache.aurora.client.cli.supdate . SUCCESS src.test.python.apache.aurora.client.cli.task . SUCCESS src.test.python.apache.aurora.client.cli.update . SUCCESS src.test.python.apache.aurora.client.cli.version . SUCCESS src.test.python.apache.aurora.client.config . SUCCESS src.test.python.apache.aurora.client.hooks.hooked_api . SUCCESS src.test.python.apache.aurora.client.hooks.non_hooked_api . SUCCESS src.test.python.apache.aurora.common.test_aurora_job_key . SUCCESS src.test.python.apache.aurora.common.test_cluster . SUCCESS src.test.python.apache.aurora.common.test_cluster_option . SUCCESS src.test.python.apache.aurora.common.test_clusters . SUCCESS src.test.python.apache.aurora.common.test_http_signaler . SUCCESS src.test.python.apache.aurora.common.test_pex_version . SUCCESS src.test.python.apache.aurora.common.test_shellify . SUCCESS src.test.python.apache.aurora.common.test_transport . SUCCESS src.test.python.apache.aurora.config.test_base . SUCCESS src.test.python.apache.aurora.config.test_constraint_parsing . SUCCESS src.test.python.apache.aurora.config.test_loader . SUCCESS src.test.python.apache.aurora.config.test_thrift . SUCCESS src.test.python.apache.aurora.executor.common.task_info . SUCCESS src.test.python.apache.aurora.executor.executor_base . SUCCESS src.test.python.apache.aurora.executor.executor_detector . SUCCESS src.test.python.apache.aurora.executor.executor_vars . SUCCESS src.test.python.apache.aurora.executor.status_manager . SUCCESS src.test.python.apache.aurora.executor.thermos_task_runner . FAILURE src.test.python.apache.thermos.common.test_pathspec . SUCCESS src.test.python.apache.thermos.core.test_runner_integration . SUCCESS src.test.python.apache.thermos.monitoring.test_disk . SUCCESS FAILURE [31m FAILURE[0m I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/ --- (Updated Feb. 4, 2015, 11:38 p.m.) Review request for Aurora, Kevin Sweeney and Bill Farner. Bugs:
Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.
On Feb. 5, 2015, 12:01 a.m., Bill Farner wrote: Can you see any opportunity to break this diff apart? As it stands i'm having a hard time giving a thoughtful review. Perhaps you can start by introducing the `Assignment` class? I'd really prefer keeping this diff as a whole. The Assignment class would not make sense without the entire picture in mind. On Feb. 5, 2015, 12:01 a.m., Bill Farner wrote: src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java, line 210 https://reviews.apache.org/r/28617/diff/6/?file=849086#file849086line210 What motivates you to supply these as overridden methods rather than a configuration object? The latter seems like better encapsulation. Nothing in particular. It has grown organically this way. Perhaps it's time to refactor. I would like to address it separately though to avoid growing this diff any further. Will leave a TODO. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/#review71079 --- On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/ --- (Updated Feb. 4, 2015, 11:38 p.m.) Review request for Aurora, Kevin Sweeney and Bill Farner. Bugs: AURORA-909 https://issues.apache.org/jira/browse/AURORA-909 Repository: aurora Description --- Modified the task offer/task matching logic to skip offer matching for tasks previously vetoed statically. Real life testing in vagrant (see pictures) shows close to 50% improvement in task scheduling performance. Testing with JMH shows over 97% better perf when testing with disabled preemptor (1 scheduling loop): ``` Master Benchmark Mode SamplesScore ErrorUnits o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1008291046.074 ± 145251.995 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1007522269.050 ± 142446.265 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 204171.046 ± 3800.124 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 215854.129 ± 8959.851 ns/op ``` Testing with preemptor enabled and no tasks eligible for preemption gives around 40% improvement (2 scheduling loops): ``` Master Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001767479.299 ± 26907.571 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1001538682.287 ± 119119.911 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001105731.141 ± 10040.721 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 939230.662 ± 11091.505 ns/op ``` Testing with preemptor enabled and running the worst case possible scenario (every slave is eligible and all tasks are victims) yields the least improvement 2-3% (3 scheduling loops). ``` Master Benchmark Mode Samples ScoreError Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 11043701.243 ± 40550.259 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10478631.055 ± 178833.158 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 116258653.000 ± 403080.017 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 10886116.889 ± 193934.324 ns/op
Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.
On Feb. 5, 2015, 12:01 a.m., Bill Farner wrote: Can you see any opportunity to break this diff apart? As it stands i'm having a hard time giving a thoughtful review. Perhaps you can start by introducing the `Assignment` class? Maxim Khutornenko wrote: I'd really prefer keeping this diff as a whole. The Assignment class would not make sense without the entire picture in mind. It's okay if you commit them back-to-back, but we really need to exercise a pattern for incrementally building features even if they appear as code islands in the commit history. Doing so makes it easier to review with confidence, and more likely that you get faster review turnaround. - Bill --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/#review71079 --- On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/ --- (Updated Feb. 4, 2015, 11:38 p.m.) Review request for Aurora, Kevin Sweeney and Bill Farner. Bugs: AURORA-909 https://issues.apache.org/jira/browse/AURORA-909 Repository: aurora Description --- Modified the task offer/task matching logic to skip offer matching for tasks previously vetoed statically. Real life testing in vagrant (see pictures) shows close to 50% improvement in task scheduling performance. Testing with JMH shows over 97% better perf when testing with disabled preemptor (1 scheduling loop): ``` Master Benchmark Mode SamplesScore ErrorUnits o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1008291046.074 ± 145251.995 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1007522269.050 ± 142446.265 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 204171.046 ± 3800.124 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 215854.129 ± 8959.851 ns/op ``` Testing with preemptor enabled and no tasks eligible for preemption gives around 40% improvement (2 scheduling loops): ``` Master Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001767479.299 ± 26907.571 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1001538682.287 ± 119119.911 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001105731.141 ± 10040.721 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 939230.662 ± 11091.505 ns/op ``` Testing with preemptor enabled and running the worst case possible scenario (every slave is eligible and all tasks are victims) yields the least improvement 2-3% (3 scheduling loops). ``` Master Benchmark Mode Samples ScoreError Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 11043701.243 ± 40550.259 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10478631.055 ± 178833.158 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 116258653.000 ± 403080.017 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 10886116.889 ± 193934.324 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10182572.955 ± 35740.891 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 113656994.000 ± 424163.759 ns/op ``` Diffs -
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. Maxim Khutornenko wrote: I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. Bill Farner wrote: I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK) Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`). These fall back to the same loop, but give the updater a time after which the state should be re-evaluated. Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are concerned about obuse I think having a rate
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. Maxim Khutornenko wrote: I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. Bill Farner wrote: I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK) Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`). These fall back to the same loop, but give the updater a time after which the state should be re-evaluated. Maxim Khutornenko wrote: Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. Maxim Khutornenko wrote: I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. Bill Farner wrote: I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK) Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`). These fall back to the same loop, but give the updater a time after which the state should be re-evaluated. Maxim Khutornenko wrote: Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are
Re: Review Request 30225: Modifying update controller to support heartbeats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71114 --- src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java https://reviews.apache.org/r/30225/#comment116725 I think we still want to update the last pulse time even if it's paused? - David McLaughlin On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 Diff: https://reviews.apache.org/r/30225/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 30647: Instrument the HealthChecker to export stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/ --- (Updated Feb. 5, 2015, 1:46 a.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Bugs: AURORA-1062 https://issues.apache.org/jira/browse/AURORA-1062 Repository: aurora Description --- Instrument the HealthChecker to export stats. HealthChecker plugin now should export three stats: consecutive_failures: number of consecutive failures experienced (resets on success) latency: how long health checks are taking in practice snoozed: whether or not the health checker is snoozed Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530 src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984 src/test/python/apache/aurora/executor/common/test_health_checker.py def249c2509a28f7145380f250f79202b653dc83 Diff: https://reviews.apache.org/r/30647/diff/ Testing --- ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common:: Thanks, Brian Wickman
Re: Review Request 30647: Instrument the HealthChecker to export stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/ --- (Updated Feb. 5, 2015, 1:45 a.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Changes --- Rename health checker stats to health_checker. Add reviewers. Repository: aurora Description --- Instrument the HealthChecker to export stats. HealthChecker plugin now should export three stats: consecutive_failures: number of consecutive failures experienced (resets on success) latency: how long health checks are taking in practice snoozed: whether or not the health checker is snoozed Diffs (updated) - src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530 src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984 src/test/python/apache/aurora/executor/common/test_health_checker.py def249c2509a28f7145380f250f79202b653dc83 Diff: https://reviews.apache.org/r/30647/diff/ Testing --- ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common:: Thanks, Brian Wickman
Review Request 30647: Instrument the HealthChecker to export stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/ --- Review request for Aurora. Repository: aurora Description --- Instrument the HealthChecker to export stats. HealthChecker plugin now should export three stats: consecutive_failures: number of consecutive failures experienced (resets on success) latency: how long health checks are taking in practice snoozed: whether or not the health checker is snoozed Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530 src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984 src/test/python/apache/aurora/executor/common/test_health_checker.py def249c2509a28f7145380f250f79202b653dc83 Diff: https://reviews.apache.org/r/30647/diff/ Testing --- ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common:: Thanks, Brian Wickman
Re: Review Request 30647: Instrument the HealthChecker to export stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/#review71127 --- Master (edcc252) is red with this patch. ./build-support/jenkins/build.sh src.test.python.apache.aurora.client.cli.plugins . SUCCESS src.test.python.apache.aurora.client.cli.quota . SUCCESS src.test.python.apache.aurora.client.cli.sla . SUCCESS src.test.python.apache.aurora.client.cli.supdate . SUCCESS src.test.python.apache.aurora.client.cli.task . SUCCESS src.test.python.apache.aurora.client.cli.update . SUCCESS src.test.python.apache.aurora.client.cli.version . SUCCESS src.test.python.apache.aurora.client.config . SUCCESS src.test.python.apache.aurora.client.hooks.hooked_api . SUCCESS src.test.python.apache.aurora.client.hooks.non_hooked_api . SUCCESS src.test.python.apache.aurora.common.test_aurora_job_key . SUCCESS src.test.python.apache.aurora.common.test_cluster . SUCCESS src.test.python.apache.aurora.common.test_cluster_option . SUCCESS src.test.python.apache.aurora.common.test_clusters . SUCCESS src.test.python.apache.aurora.common.test_http_signaler . SUCCESS src.test.python.apache.aurora.common.test_pex_version . SUCCESS src.test.python.apache.aurora.common.test_shellify . SUCCESS src.test.python.apache.aurora.common.test_transport . SUCCESS src.test.python.apache.aurora.config.test_base . SUCCESS src.test.python.apache.aurora.config.test_constraint_parsing . SUCCESS src.test.python.apache.aurora.config.test_loader . SUCCESS src.test.python.apache.aurora.config.test_thrift . SUCCESS src.test.python.apache.aurora.executor.common.announcer . SUCCESS src.test.python.apache.aurora.executor.common.directory_sandbox . SUCCESS src.test.python.apache.aurora.executor.common.executor_timeout . SUCCESS src.test.python.apache.aurora.executor.common.health_checker . FAILURE src.test.python.apache.aurora.executor.common.task_info . SUCCESS src.test.python.apache.aurora.executor.executor_base . SUCCESS src.test.python.apache.aurora.executor.executor_detector . SUCCESS src.test.python.apache.aurora.executor.executor_vars . SUCCESS src.test.python.apache.aurora.executor.status_manager . SUCCESS src.test.python.apache.aurora.executor.thermos_task_runner . SUCCESS src.test.python.apache.thermos.common.test_pathspec . SUCCESS src.test.python.apache.thermos.core.test_runner_integration . SUCCESS src.test.python.apache.thermos.monitoring.test_disk . SUCCESS FAILURE [31m FAILURE[0m I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 5, 2015, 1:46 a.m., Brian Wickman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/ --- (Updated Feb. 5, 2015, 1:46 a.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Bugs:
Re: Review Request 30586: Fix exception when opening cron urls.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 4, 2015, 6:08 p.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Changes --- Maxim's feedback. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs (updated) - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Re: Review Request 30586: Fix exception when opening cron urls.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/#review71128 --- Maxim, can you review the changes to the tests? - Zameer Manji On Feb. 4, 2015, 6:08 p.m., Zameer Manji wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 4, 2015, 6:08 p.m.) Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Re: Review Request 30586: Fix exception when opening cron urls.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/#review71140 --- Ship it! LGTM. Minor suggestion below. src/test/python/apache/aurora/client/cli/test_cron.py https://reviews.apache.org/r/30586/#comment116769 Why not just `api.schedule_cron.mock_calls == [call(...)]`? - Maxim Khutornenko On Feb. 5, 2015, 2:08 a.m., Zameer Manji wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 5, 2015, 2:08 a.m.) Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Review Request 30649: Upgrade pants to 0.0.28
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/ --- Review request for Aurora and Brian Wickman. Bugs: AURORA-1104 https://issues.apache.org/jira/browse/AURORA-1104 Repository: aurora Description --- Upgrade pants to 0.0.28 Diffs - .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 build-support/pants_requirements.txt fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 Diff: https://reviews.apache.org/r/30649/diff/ Testing --- Reviewed the diff [for twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194) and ``` $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith
Re: Review Request 30586: Fix exception when opening cron urls.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/#review71138 --- Ship it! Master (edcc252) is green with this patch. ./build-support/jenkins/build.sh I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 5, 2015, 2:08 a.m., Zameer Manji wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 5, 2015, 2:08 a.m.) Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Re: Review Request 30225: Modifying update controller to support heartbeats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71143 --- Ship it! Master (edcc252) is green with this patch. ./build-support/jenkins/build.sh I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 5, 2015, 2:34 a.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 5, 2015, 2:34 a.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 Diff: https://reviews.apache.org/r/30225/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 30586: Fix exception when opening cron urls.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/#review71109 --- Ship it! Ship It! - Bill Farner On Feb. 4, 2015, 1:14 a.m., Zameer Manji wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 4, 2015, 1:14 a.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. Maxim Khutornenko wrote: I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. Bill Farner wrote: I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK) Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature Transitions do indeed happen asynchronously when dealing with timeouts (relevant code is in `InstanceActionHandler`). These fall back to the same loop, but give the updater a time after which the state should be re-evaluated. Maxim Khutornenko wrote: Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. I don't think this would buy us much if external service starts following up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract harder to consume. If we are
Re: Review Request 30647: Instrument the HealthChecker to export stats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/#review71137 --- src/test/python/apache/aurora/executor/common/test_health_checker.py https://reviews.apache.org/r/30647/#comment116768 Any chance to have test coverage for the other two metrics? - Maxim Khutornenko On Feb. 5, 2015, 1:46 a.m., Brian Wickman wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30647/ --- (Updated Feb. 5, 2015, 1:46 a.m.) Review request for Aurora, Joshua Cohen and Bill Farner. Bugs: AURORA-1062 https://issues.apache.org/jira/browse/AURORA-1062 Repository: aurora Description --- Instrument the HealthChecker to export stats. HealthChecker plugin now should export three stats: consecutive_failures: number of consecutive failures experienced (resets on success) latency: how long health checks are taking in practice snoozed: whether or not the health checker is snoozed Diffs - src/main/python/apache/aurora/executor/common/health_checker.py 60676ba0fbd8a218fe4309f07de28e2c66d54530 src/main/python/apache/aurora/executor/common/status_checker.py 624921d68199df098ea51ee8a10815403bf58984 src/test/python/apache/aurora/executor/common/test_health_checker.py def249c2509a28f7145380f250f79202b653dc83 Diff: https://reviews.apache.org/r/30647/diff/ Testing --- ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common:: Thanks, Brian Wickman
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 565 https://reviews.apache.org/r/30225/diff/4/?file=848240#file848240line565 Maybe s/BLOCKED/AWAITING_PULSE/? That would at least self-document and avoid additional terminology. Renamed. On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java, line 49 https://reviews.apache.org/r/30225/diff/4/?file=848242#file848242line49 s/query/fetch/ to be consistent with the method above. I started with fetch but that resulted in an overloaded method that I did not quite liked. Changed it back. On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 106 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line106 Remove Optional here, do Optional.of() at the call site. Removed. On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 122 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line122 TODO + ticket, this map needs to be exposed via a debug endpoint Done. On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 273 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273 I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. For these reasons, i think it would be wise to decouple recording of pulses from reacting to them. It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. Maxim Khutornenko wrote: I think avoid acquisition of a write lock here is a good goal to aim for. If, for example, we are already holding the write lock for a large snapshot, we could cause it to appear as though we have not received a pulse for an extended duration. - makes sense, I can try to split the recording part from the status update via an async action. Also, given that there is likely to be an automated system on the other end, we should be somewhat defensive to a high call frequency. This suggests to me that potentially-expensive read operations should be avoided if possible. - I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK). It's also apparent that there is currently no asynchronous transition to the 'blocked' states, they're triggered either by a tardy pulse or another external action. I think this would be surprising behavior, and the scheduler should automatically transition to these states without any external input. - the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states do not happen instantenously and are triggered by instance activities (i.e. task state change events). I don't see why BLOCKED state should be any different. The update is not technically alive until the evaluation event arrives. Changing this behavior just for heartbeats is not worth the extra complexity and will be inconsistent with the rest of the feature. Bill Farner wrote: I don't see how this would be possible without breaking the RPC contract as I still have to pull JobUpdateDetails in order to respond to the pulse with proper status message (FINISHED|PAUSED|OK) Yeah, this crossed my mind. Perhaps we should poke at the API to see if we can shake those requirements out? For example, we could just return OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to differentiate between a paused and finished update? I don't feel really strongly about this, but i think we should feel free to question the requirements at this phase. the transition to BLOCKED state happens during the evaluation loop similar to other internally triggered state transitions
Re: Review Request 30225: Modifying update controller to support heartbeats.
On Feb. 5, 2015, 1:15 a.m., David McLaughlin wrote: src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java, line 293 https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line293 I think we still want to update the last pulse time even if it's paused? Makes sense. A paused update is still technically active. Changed to support PAUSED pulse. - Maxim --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/#review71114 --- On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 4, 2015, 5:24 p.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 Diff: https://reviews.apache.org/r/30225/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 30225: Modifying update controller to support heartbeats.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30225/ --- (Updated Feb. 5, 2015, 2:34 a.m.) Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner. Changes --- Bill's and David's feedback. Bugs: AURORA-1010 https://issues.apache.org/jira/browse/AURORA-1010 Repository: aurora Description --- Added pulsing support into the JobUpdateController. The qualified coordinated updates get blocked until a pulse arrives. An update then becomes active and proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a terminal state (whichever comes first). Not particularly happy with plumbing through OneWayJobUpdater but the alternative is a state machine change, which is much hairier and will require more changes in the JobUpdaterController. Going with the minimal diff here. Diffs (updated) - api/src/main/thrift/org/apache/aurora/gen/api.thrift 4f603f9e7ed004e53937a45ea8edf7241e15f5cf src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 7f2ec71d744d5fafac84291cc79feba3398d0e1e src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java d479d20259f284528b2291e2e486b36d8e47fe5e src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java 60f535988a20638fb16515d5027bfa65f1afb73c src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java d3b30d48b76d8d7c64cda006a34f7ed3296526f2 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java ec9d1e07abca1bf30262e1c0f741a34741e96f5d src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 76460f95e058181b711fb6869f5a34c1d5bdfe31 src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java 89765ac3d34a827d3748fb96a275d78e9d1b8b72 src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 41d422939209d0808235128e4242c11e8ef25969 src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 Diff: https://reviews.apache.org/r/30225/diff/ Testing --- ./gradlew -Pq build Thanks, Maxim Khutornenko
Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28617/ --- (Updated Feb. 4, 2015, 11:38 p.m.) Review request for Aurora, Kevin Sweeney and Bill Farner. Changes --- Rebased and modified benchmarks to support victim-less preemption run. Bugs: AURORA-909 https://issues.apache.org/jira/browse/AURORA-909 Repository: aurora Description (updated) --- Modified the task offer/task matching logic to skip offer matching for tasks previously vetoed statically. Real life testing in vagrant (see pictures) shows close to 50% improvement in task scheduling performance. Testing with JMH shows over 97% better perf when testing with disabled preemptor (1 scheduling loop): ``` Master Benchmark Mode SamplesScore ErrorUnits o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1008291046.074 ± 145251.995 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1007522269.050 ± 142446.265 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 204171.046 ± 3800.124 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 215854.129 ± 8959.851 ns/op ``` Testing with preemptor enabled and no tasks eligible for preemption gives around 40% improvement (2 scheduling loops): ``` Master Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001767479.299 ± 26907.571 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 1001538682.287 ± 119119.911 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 1001105731.141 ± 10040.721 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 939230.662 ± 11091.505 ns/op ``` Testing with preemptor enabled and running the worst case possible scenario (every slave is eligible and all tasks are victims) yields the least improvement 2-3% (3 scheduling loops). ``` Master Benchmark Mode Samples ScoreError Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 11043701.243 ± 40550.259 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10478631.055 ± 178833.158 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 116258653.000 ± 403080.017 ns/op This RB Benchmark Mode Samples Score Error Units o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark avgt 100 10886116.889 ± 193934.324 ns/op o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark avgt 100 10182572.955 ± 35740.891 ns/op o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark avgt 100 113656994.000 ± 424163.759 ns/op ``` Diffs (updated) - src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 8c11ef8bd6609f3e4d97ca154d922898f8362446 src/jmh/java/org/apache/aurora/benchmark/Tasks.java 1a35f9ee9e8e76def0f9bf5454cf8cbdf6a89c25 src/main/java/org/apache/aurora/scheduler/TaskVars.java f017cdd26ca40138a7e141f21613ed567314c399 src/main/java/org/apache/aurora/scheduler/async/OfferQueue.java f66383830140e5eaba436f35ebb5192eee65947a src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java ce47ff152e303fd2116bc3b9e91c0c1a8f76f258 src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java 6a43bcd1719e8aa32fd3fcb7387d0318c3c0b804 src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java f06fdaeb92e154d0982bdabed5df93e7bcba9048 src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java e1c29747c9854cf75bf63f6f085cf40ca68989af src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java
Re: Review Request 30650: Upgrade virtualenv to 12.0.7
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30650/#review71171 --- Ship it! Ship It! - Zameer Manji On Feb. 4, 2015, 8:13 p.m., Joe Smith wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30650/ --- (Updated Feb. 4, 2015, 8:13 p.m.) Review request for Aurora, Brian Wickman and Zameer Manji. Bugs: AURORA-979 https://issues.apache.org/jira/browse/AURORA-979 Repository: aurora Description --- Upgrade virtualenv to 12.0.7 virtualenv == [12.0.7 changelog](https://pypi.python.org/pypi/virtualenv/12.0.7) and selected highlights: * Upgrade pip to 6.0.8 * Upgrade setuptools to 12.0.5 * Revert several sys.path changes new in 12.0 which were breaking virtualenv. * **PROCESS** Version numbers are now simply ``X.Y`` where the leading ``1`` has been dropped. * Now using pytest framework * Correct sys.path ordering for debian, issue #461 * Correctly throws error on older Pythons, issue #619 * Allow for empty $PATH, pull #601 * Don't set prompt if $env:VIRTUAL_ENV_DISABLE_PROMPT is set for Powershell * Updated setuptools to 7.0 Diffs - build-support/virtualenv 0175f0e38e5f995ff4e8335fb86ec14869894e69 Diff: https://reviews.apache.org/r/30650/diff/ Testing --- ```sh $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith
Re: Review Request 30649: Upgrade pants to 0.0.28
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/#review71172 --- Ship it! Ship It! - Zameer Manji On Feb. 4, 2015, 8:13 p.m., Joe Smith wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/ --- (Updated Feb. 4, 2015, 8:13 p.m.) Review request for Aurora, Brian Wickman and Zameer Manji. Bugs: AURORA-1104 https://issues.apache.org/jira/browse/AURORA-1104 Repository: aurora Description --- Upgrade pants to 0.0.28 pants = [0.0.28 changelog](https://pypi.python.org/pypi/pantsbuild.pants/0.0.28) and selected highlights: * bump virtualenv version to 12.0.5 RB #1621 * Make ‘setup-py’ show up in ‘./pants goal goals’ RB #1466 * Support use of pytest’s –pdb mode RB #1570 * Upgrade pex dependency to 0.8.4 Pick up several perf wins Pick up fix that allows pex to read older pexes RB #1648 RB #1693 * Refactor setting of PYTHONPATH in pants.ini RB #1586 Diffs - .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 build-support/pants_requirements.txt fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 Diff: https://reviews.apache.org/r/30649/diff/ Testing --- Reviewed the diff [for twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194) ``` $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith
Re: Review Request 30586: Fix exception when opening cron urls.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/#review71150 --- src/test/python/apache/aurora/client/cli/test_cron.py https://reviews.apache.org/r/30586/#comment116782 The config object in the argument is very complex and difficult to create. With our current set of test fixtures. - Zameer Manji On Feb. 4, 2015, 6:08 p.m., Zameer Manji wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30586/ --- (Updated Feb. 4, 2015, 6:08 p.m.) Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner. Bugs: AURORA-1094 https://issues.apache.org/jira/browse/AURORA-1094 Repository: aurora Description --- Fix exception when opening cron urls. Diffs - src/main/python/apache/aurora/client/cli/cron.py 3416c8e1932056725880f2007b60d77112759428 src/test/python/apache/aurora/client/cli/test_cron.py f488432cd68cc68fab8fce968e8605625ea3f56a Diff: https://reviews.apache.org/r/30586/diff/ Testing --- ./pants goal test src/test/python/apache/aurora/client/cli:cron Thanks, Zameer Manji
Review Request 30650: Upgrade virtualenv to 12.0.7
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30650/ --- Review request for Aurora, Brian Wickman and Zameer Manji. Bugs: AURORA-979 https://issues.apache.org/jira/browse/AURORA-979 Repository: aurora Description --- Upgrade virtualenv to 12.0.7 virtualenv == [12.0.7 changelog](https://pypi.python.org/pypi/virtualenv/12.0.7) and selected highlights: * Upgrade pip to 6.0.8 * Upgrade setuptools to 12.0.5 * Revert several sys.path changes new in 12.0 which were breaking virtualenv. * **PROCESS** Version numbers are now simply ``X.Y`` where the leading ``1`` has been dropped. * Now using pytest framework * Correct sys.path ordering for debian, issue #461 * Correctly throws error on older Pythons, issue #619 * Allow for empty $PATH, pull #601 * Don't set prompt if $env:VIRTUAL_ENV_DISABLE_PROMPT is set for Powershell * Updated setuptools to 7.0 Diffs - build-support/virtualenv 0175f0e38e5f995ff4e8335fb86ec14869894e69 Diff: https://reviews.apache.org/r/30650/diff/ Testing --- ```sh $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith
Re: Review Request 30650: Upgrade virtualenv to 12.0.7
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30650/#review71157 --- Ship it! Master (1c78721) is green with this patch. ./build-support/jenkins/build.sh I will refresh this build result if you post a review containing @ReviewBot retry - Aurora ReviewBot On Feb. 5, 2015, 4:13 a.m., Joe Smith wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30650/ --- (Updated Feb. 5, 2015, 4:13 a.m.) Review request for Aurora, Brian Wickman and Zameer Manji. Bugs: AURORA-979 https://issues.apache.org/jira/browse/AURORA-979 Repository: aurora Description --- Upgrade virtualenv to 12.0.7 virtualenv == [12.0.7 changelog](https://pypi.python.org/pypi/virtualenv/12.0.7) and selected highlights: * Upgrade pip to 6.0.8 * Upgrade setuptools to 12.0.5 * Revert several sys.path changes new in 12.0 which were breaking virtualenv. * **PROCESS** Version numbers are now simply ``X.Y`` where the leading ``1`` has been dropped. * Now using pytest framework * Correct sys.path ordering for debian, issue #461 * Correctly throws error on older Pythons, issue #619 * Allow for empty $PATH, pull #601 * Don't set prompt if $env:VIRTUAL_ENV_DISABLE_PROMPT is set for Powershell * Updated setuptools to 7.0 Diffs - build-support/virtualenv 0175f0e38e5f995ff4e8335fb86ec14869894e69 Diff: https://reviews.apache.org/r/30650/diff/ Testing --- ```sh $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith
Re: Review Request 30649: Upgrade pants to 0.0.28
On Feb. 4, 2015, 7:39 p.m., Zameer Manji wrote: Please put the pants release notes in the commit. Or at least some sort of summary that explains what is new. Done. - Joe --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/#review71148 --- On Feb. 4, 2015, 8:07 p.m., Joe Smith wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30649/ --- (Updated Feb. 4, 2015, 8:07 p.m.) Review request for Aurora and Brian Wickman. Bugs: AURORA-1104 https://issues.apache.org/jira/browse/AURORA-1104 Repository: aurora Description --- Upgrade pants to 0.0.28 pants = [0.0.28 changelog](https://pypi.python.org/pypi/pantsbuild.pants/0.0.28) and selected highlights: * bump virtualenv version to 12.0.5 RB #1621 * Make ‘setup-py’ show up in ‘./pants goal goals’ RB #1466 * Support use of pytest’s –pdb mode RB #1570 * Upgrade pex dependency to 0.8.4 Pick up several perf wins Pick up fix that allows pex to read older pexes RB #1648 RB #1693 * Refactor setting of PYTHONPATH in pants.ini RB #1586 Diffs - .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 build-support/pants_requirements.txt fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 Diff: https://reviews.apache.org/r/30649/diff/ Testing --- Reviewed the diff [for twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194) ``` $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all ``` Thanks, Joe Smith