Review Request 29943: Uptime-driven scheduler job updates
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.

Repository: aurora

Description
---

This is the first take on implementing job uptime-driven updates. In addition to the good old batch_size, instances can now be dispatched in arbitrary sequence depending on the overall uptime (health) of the job. The uptime is specified by a pair of **waitForUptimeMs** and **waitForUptimePercentInstances** values.

An excerpt from api.thrift explaining the feature:

```
/**
 * The uptime-driven update throttles the number of instances being updated at any given moment
 * according to the job uptime calculations. The X% of instances up over Y interval invariant
 * is preserved over the entire job update lifetime. No new instances are dispatched for update
 * unless that invariant is satisfied. Instances are dispatched in their natural uptime order,
 * shortest uptime first.
 *
 * For example, when set as below the update will block until at least 90% of job instances are in
 * RUNNING state for at least 1 minute:
 *   waitForUptimeMs = 60000
 *   waitForUptimePercentInstances = 90
 *
 * When using uptime-driven update, it's expected that updateGroupSize is left unset to let job
 * uptime settings drive the update progress. However, if updateGroupSize is set it will be
 * pre-applied before SLA uptime calculations to determine the update working set. As a side
 * effect, the updateGroupSize results in a natural ordering of instances taken for each group
 * (instances within a group are still updated in a shortest uptime first order).
 *
 * For example, if set as below the number of instances being updated at any given moment will
 * never exceed 5 even though the uptime calculations may allow more than 5:
 *   updateGroupSize = 5
 *   waitForUptimeMs = 60000
 *   waitForUptimePercentInstances = 90
 *
 * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
 * graceful throttled rollback as unstable/flapping instances may never yield an acceptable uptime
 * to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND the
 * updateGroupSize=0 the updater will dispatch all affected instances at once.
 * Use rollbackOnFailure=True with caution for uptime-driven updates.
 */
```

For reviewers: I recommend starting with api.thrift and then proceeding to InstanceUptimeStrategy.java, which implements the core algorithm.

TODO:
- vagrant e2e test
- more corner case unit test coverage in JobUpdaterIT
- client warning message in case uptime specs are used with client updater
- docs

Diffs
---

api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java d36f5652357e06d6c8944d907ee011b91e84e9c6
src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
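
The throttling rule from the api.thrift excerpt in this request can be sketched roughly as follows. This is a minimal Python illustration, not the InstanceUptimeStrategy.java code under review: the function name, its arguments, the rounding of the percentage, and the treatment of in-flight instances as "not up" are all assumptions made for the example.

```python
def next_instances_to_update(uptimes_ms, pending, in_flight,
                             wait_for_uptime_ms,
                             wait_for_uptime_percent_instances,
                             update_group_size=0):
    """Pick the instances that may be dispatched for update right now.

    uptimes_ms: {instance_id: ms spent in RUNNING} for every job instance.
    pending:    ids still waiting to be updated.
    in_flight:  ids currently being updated (assumed to count as not up).
    """
    total = len(uptimes_ms)
    # Assumption: the required instance count is rounded up.
    required = (total * wait_for_uptime_percent_instances + 99) // 100
    healthy = {i for i, up in uptimes_ms.items()
               if i not in in_flight and up >= wait_for_uptime_ms}
    # The invariant must hold before anything new is dispatched.
    if len(healthy) < required:
        return []
    # Number of healthy instances that can be taken down without
    # breaking the invariant.
    budget = len(healthy) - required
    # updateGroupSize, when set, caps the working set up front.
    if update_group_size > 0:
        slots = max(update_group_size - len(in_flight), 0)
    else:
        slots = len(pending)
    picked = []
    # Shortest uptime first; instance id breaks ties deterministically.
    candidates = sorted(set(pending) - set(in_flight),
                        key=lambda i: (uptimes_ms[i], i))
    for i in candidates:
        if len(picked) >= slots:
            break
        if i in healthy:
            if budget == 0:
                break  # every remaining candidate is healthy as well
            budget -= 1
        picked.append(i)
    return picked


# Ten instances, nine of them up for two minutes, one for five seconds.
uptimes = {i: 120000 for i in range(10)}
uptimes[0] = 5000
print(next_instances_to_update(uptimes, set(range(10)), set(), 60000, 90))  # [0]
```

In the usage above only instance 0 is dispatched: it does not count towards the 90% yet, so updating it cannot break the invariant, while taking down any of the other nine would.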
Re: Review Request 29873: Fixing batched kill task filtering.
On Jan. 15, 2015, 10:07 p.m., Bill Farner wrote:
src/main/python/apache/aurora/client/cli/context.py, line 218
https://reviews.apache.org/r/29873/diff/2/?file=822909#file822909line218

Why convert an empty list to `None`?

Blindly followed the pattern above. No reason, dropped.

On Jan. 15, 2015, 10:07 p.m., Bill Farner wrote:
src/test/python/apache/aurora/client/cli/util.py, line 200
https://reviews.apache.org/r/29873/diff/2/?file=822920#file822920line200

Your call, but I'd prefer this be a single statement, `return ScheduledTask(...)`.

Sure, converted.

- Maxim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29873/#review68329
---

On Jan. 15, 2015, 7:44 p.m., Maxim Khutornenko wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29873/
---

(Updated Jan. 15, 2015, 7:44 p.m.)

Review request for Aurora and Bill Farner.

Bugs: AURORA-996
https://issues.apache.org/jira/browse/AURORA-996

Repository: aurora

Description
---

Fixed job status querying in the client context to return active tasks where needed. Also dropped active instance validation from both updater calls, as it was hiding a legitimate feature: adding new job instances with instance_spec.
Diffs
---

src/main/python/apache/aurora/client/api/__init__.py 64f08046f6aca112a9f51c3f250ba6102d297216
src/main/python/apache/aurora/client/cli/context.py 93587c616afb0b7493a509197361cd76af2e2c97
src/main/python/apache/aurora/client/cli/jobs.py 508c9be556998e47bddcec8dee43f1595497b354
src/main/python/apache/aurora/client/cli/task.py 21548984f6f5cd894733220c6508458cc1318391
src/main/python/apache/aurora/client/cli/update.py a1617325f08ca252bdba38618aef141504ba7272
src/test/python/apache/aurora/client/cli/test_command_hooks.py e8432a1f3475e7e48a092526998d15cbf7e92e10
src/test/python/apache/aurora/client/cli/test_create.py 18b58383d209a50bd09fbdfa5a600e49d1d848f7
src/test/python/apache/aurora/client/cli/test_kill.py b475d737ede8ff0a669a9a9229196a76b43b46b6
src/test/python/apache/aurora/client/cli/test_plugins.py aa45851bd26f78583e0c16d0c3699c12f8a58697
src/test/python/apache/aurora/client/cli/test_restart.py a532ead256869c620e6bd96886ce9681b3423d0c
src/test/python/apache/aurora/client/cli/test_supdate.py 9378b49ec5e4bb8189b00d9fd1d80558a731d668
src/test/python/apache/aurora/client/cli/test_update.py c12b32e3327af8c014fdad72d63ab4e68dc541c8
src/test/python/apache/aurora/client/cli/util.py 147d418ea1de679674dd93eaf648d84686b95d37

Diff: https://reviews.apache.org/r/29873/diff/

Testing
---

./pants src/test/python:all

vagrant@vagrant-ubuntu-trusty-64:~$ aurora job killall devcluster/www-data/prod/hello
INFO] Checking status of devcluster/www-data/prod/hello
INFO] No tasks to kill found for job devcluster/www-data/prod/hello
Job killall succeeded

Thanks,
Maxim Khutornenko
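
To illustrate the two points discussed in this thread (returning only active tasks, and keeping an empty result as an empty list rather than `None`), here is a small hypothetical sketch. None of these names come from the Aurora client: `ACTIVE_STATES` is an illustrative subset rather than the scheduler's authoritative list, and the "Killing ..." message is made up for the example.

```python
# Illustrative subset of states treated as active (an assumption).
ACTIVE_STATES = frozenset(["PENDING", "ASSIGNED", "STARTING", "RUNNING"])


def active_instance_ids(tasks):
    """Return the sorted instance ids of active tasks.

    tasks: iterable of (instance_id, status) pairs.
    An empty list is returned when nothing is active, so callers can
    simply test the result for truthiness.
    """
    return sorted(instance_id for instance_id, status in tasks
                  if status in ACTIVE_STATES)


def kill_message(job_key, tasks):
    """Build the message a kill command might report (hypothetical)."""
    ids = active_instance_ids(tasks)
    if not ids:
        return "No tasks to kill found for job %s" % job_key
    return "Killing instances %s of job %s" % (ids, job_key)


print(kill_message("devcluster/www-data/prod/hello", [(0, "FINISHED")]))
```

Because the helper always returns a list, the "nothing to do" branch needs no special `None` check.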
Re: Review Request 30010: [AURORA-184] Remove hardcoded 'host' and 'rack' limit constraints
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30010/
---

(Updated Jan. 18, 2015, 4:06 a.m.)

Review request for Aurora and Zameer Manji.

Repository: aurora

Description
---

[AURORA-184] Remove hardcoded 'host' and 'rack' limit constraints

This is the first step for AURORA-184: it removes the default host/rack limit constraints. The second step, still missing, would be to add something like --default-constraints as a start parameter to the scheduler.

AURORA-174 could probably be closed with this, since the rack limit constraint can be configured in the .aurora file?

I can't really estimate the effect of my changes in StorageBackfillTest/SchedulerThriftInterfaceTest, so please have a closer look at the changes I made there. Since this is also my first code submission, comments about code style and other bad habits are much appreciated.

Diffs
---

src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java 01b03508afac37b5a8f0ec5c3da1460695e1ef59
src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java dc2cb37adf32df0a6e4c7ee2ba776ba9f1f3c2f8
src/test/java/org/apache/aurora/scheduler/storage/StorageBackfillTest.java 7eafe074b686d55ad96018006cf4acfa823513c3
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java ad9126c32893080e128d086ea3bfd7ad23d27b89

Diff: https://reviews.apache.org/r/30010/diff/

Testing
---

Added test for ConfigurationManager.hasName
Added test testNoHostAndRackConstraintsAdded, which checks whether the constraints are present
Tested on vagrant devcluster to see if constraints are also gone in real life

Thanks,
Florian Pfeiffer
Review Request 30010: [AURORA-184] Remove hardcoded 'host' and 'rack' limit constraints
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30010/
---

Review request for Aurora.

Repository: aurora

Description
---

[AURORA-184] Remove hardcoded 'host' and 'rack' limit constraints

This is the first step for AURORA-184: it removes the default host/rack limit constraints. The second step, still missing, would be to add something like --default-constraints as a start parameter to the scheduler.

AURORA-174 could probably be closed with this, since the rack limit constraint can be configured in the .aurora file?

I can't really estimate the effect of my changes in StorageBackfillTest/SchedulerThriftInterfaceTest, so please have a closer look at the changes I made there. Since this is also my first code submission, comments about code style and other bad habits are much appreciated.

Diffs
---

src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java 01b03508afac37b5a8f0ec5c3da1460695e1ef59
src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java dc2cb37adf32df0a6e4c7ee2ba776ba9f1f3c2f8
src/test/java/org/apache/aurora/scheduler/storage/StorageBackfillTest.java 7eafe074b686d55ad96018006cf4acfa823513c3
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java ad9126c32893080e128d086ea3bfd7ad23d27b89

Diff: https://reviews.apache.org/r/30010/diff/

Testing
---

Added test for ConfigurationManager.hasName
Added test testNoHostAndRackConstraintsAdded, which checks whether the constraints are present
Tested on vagrant devcluster to see if constraints are also gone in real life

Thanks,
Florian Pfeiffer
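
For context on what the removed defaults did: a limit constraint such as `limit:1` on the `host` attribute caps how many tasks of a job may run on machines sharing the same attribute value. The sketch below illustrates that rule in Python under stated assumptions. The function names and data shapes are hypothetical and are not taken from ConfigurationManager or the scheduler.

```python
def parse_limit(value):
    """Extract N from a 'limit:N' constraint string."""
    prefix = "limit:"
    if not value.startswith(prefix):
        raise ValueError("not a limit constraint: %r" % value)
    return int(value[len(prefix):])


def violates_limits(constraints, placed, candidate):
    """Check whether placing one more task on `candidate` breaks a limit.

    constraints: e.g. {'host': 'limit:1', 'rack': 'limit:1'}.
    placed:      attribute dicts of machines already running the job's tasks.
    candidate:   attribute dict of the machine being considered.
    """
    for attribute, value in constraints.items():
        limit = parse_limit(value)
        # Count existing tasks on machines sharing this attribute value.
        same = sum(1 for attrs in placed
                   if attrs.get(attribute) == candidate.get(attribute))
        if same >= limit:
            return True
    return False


placed = [{"host": "h1", "rack": "r1"}]
old_defaults = {"host": "limit:1", "rack": "limit:1"}
# Same rack as the existing task, so the rack limit is violated.
print(violates_limits(old_defaults, placed, {"host": "h2", "rack": "r1"}))
# With no constraints declared, nothing restricts placement.
print(violates_limits({}, placed, {"host": "h1", "rack": "r1"}))
```

With the hardcoded defaults gone, a job that still wants this spreading behaviour would have to declare the constraints itself in its .aurora file.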