> On Feb. 24, 2015, 7:30 p.m., Kevin Sweeney wrote:
> > Is this ready for review now?

It is. However, since AURORA-1041 is still Open, I am going to discard this review and repost it once the ticket moves to Accepted.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review73877
-----------------------------------------------------------


On Jan. 20, 2015, 9:12 p.m., Maxim Khutornenko wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/29943/
> -----------------------------------------------------------
>
> (Updated Jan. 20, 2015, 9:12 p.m.)
>
>
> Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.
>
>
> Bugs: AURORA-1041
>     https://issues.apache.org/jira/browse/AURORA-1041
>
>
> Repository: aurora
>
>
> Description
> -------
>
> This is the first take on implementing job uptime driven updates. In addition
> to the good old "batch_size", instances can now be dispatched in arbitrary
> sequence depending on the overall uptime (health) of the job.
>
> The uptime is specified by a tuple of **waitForUptimeMs** and
> **waitForUptimePercentInstances** values. An excerpt from api.thrift
> explaining the feature:
> ```
> /**
>  * The uptime-driven update throttles the number of instances being updated at any given moment
>  * according to the job uptime calculations. The "X% of instances up over Y interval" invariant
>  * is preserved over the entire job update lifetime. No new instances are dispatched for update
>  * unless that invariant is satisfied. Instances are dispatched in their natural uptime order,
>  * shortest uptime first.
>  *
>  * For example, when set as below the update will block until at least 90% of job instances are in
>  * RUNNING state for at least 1 minute:
>  *   waitForUptimeMs = 60000
>  *   waitForUptimePercentInstances = 90
>  *
>  * When using uptime-driven update, it's expected that updateGroupSize is left unset to allow job
>  * uptime settings to drive the update progress. However, if updateGroupSize is set it will be
>  * pre-applied before SLA uptime calculations to determine the update working set. As a side
>  * effect, the updateGroupSize results in a natural ordering of instances taken for each group
>  * (instances within a group are still updated in a "shortest uptime first" order).
>  *
>  * For example, if set as below the number of instances being updated at any given moment will
>  * never exceed 5 even though the uptime calculations may allow more than 5:
>  *   updateGroupSize = 5
>  *   waitForUptimeMs = 60000
>  *   waitForUptimePercentInstances = 90
>  *
>  * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
>  * graceful throttled rollback as unstable/flapping instances may never yield an acceptable uptime
>  * to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND the
>  * updateGroupSize=0 the updater will dispatch all affected instances at once.
>  * Use rollbackOnFailure=True with caution for uptime-driven updates.
>  */
> ```
>
> For reviewers: recommend starting with api.thrift and then proceeding to the
> InstanceUptimeStrategy.java that implements the core algo.
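
The dispatch rule in the api.thrift excerpt above can be sketched as below. This is a rough illustration in Python, not the code in InstanceUptimeStrategy.java. The function name `next_instances` and all of its arguments are hypothetical. It also rests on three assumptions that the excerpt does not spell out: in-flight instances count as down, an instance that is already below the uptime threshold can be taken without reducing the healthy count, and updateGroupSize is applied by instance id before the uptime ordering.

```python
def next_instances(uptimes_ms, candidates, in_flight,
                   wait_for_uptime_ms, wait_for_uptime_percent,
                   update_group_size=0):
    """Return the instance ids to dispatch for update right now (hypothetical helper).

    uptimes_ms maps every instance id of the job to its time in RUNNING, in ms.
    """
    total = len(uptimes_ms)
    # Integer ceiling of total * percent / 100.
    required = (total * wait_for_uptime_percent + 99) // 100

    # Assumption: an instance that is currently being updated counts as down.
    healthy = {i for i, up in uptimes_ms.items()
               if up >= wait_for_uptime_ms and i not in in_flight}

    # The invariant must hold before anything new is dispatched.
    if len(healthy) < required:
        return []
    spare = len(healthy) - required

    # Assumption: the group size is pre-applied in instance id order and
    # caps the total number of instances updating at the same time.
    working_set = sorted(set(candidates) - set(in_flight))
    if update_group_size > 0:
        slots = max(update_group_size - len(in_flight), 0)
        working_set = working_set[:slots]

    picked = []
    # Shortest uptime first; ties are broken by instance id.
    for i in sorted(working_set, key=lambda i: (uptimes_ms[i], i)):
        cost = 1 if i in healthy else 0  # taking a healthy instance uses up spare capacity
        if cost > spare:
            break
        spare -= cost
        picked.append(i)
    return picked


uptimes = {0: 120_000, 1: 90_000, 2: 30_000, 3: 200_000, 4: 61_000,
           5: 300_000, 6: 75_000, 7: 80_000, 8: 100_000, 9: 110_000}

print(next_instances(uptimes, range(10), set(), 60_000, 90))  # [2]
print(next_instances(uptimes, range(10), set(), 60_000, 80))  # [2, 4]
print(next_instances(uptimes, range(10), set(), 60_000, 80,
                     update_group_size=1))                    # [0]
```

With waitForUptimeMs = 60000 and waitForUptimePercentInstances = 90, nine of the ten instances must stay up, so only instance 2 (already below the threshold) can be taken. The last call shows the side effect described in the excerpt: with updateGroupSize set, the working set follows instance order rather than uptime order.
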
>
> TODO:
> - vagrant e2e test
> - more corner case unit test coverage in JobUpdaterIT
> - client warning message in case uptime specs are used with client updater
> - docs
>
>
> Diffs
> -----
>
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
>   src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
>   src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
>   src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
>   src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
>   src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
>   src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
>   src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
>   src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
>   src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java d36f5652357e06d6c8944d907ee011b91e84e9c6
>   src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java ca7c0c2675477cc727ca006697665f997972dfde
>   src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java ad9126c32893080e128d086ea3bfd7ad23d27b89
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderTest.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 4c827b183a87b4d97774edbfaa960bd1c3de72a5
>   src/test/java/org/apache/aurora/scheduler/updater/TaskUtil.java 0e67f91536ff89c07da9be82049719c854aa3d62
>   src/test/java/org/apache/aurora/scheduler/updater/UpdateFactoryImplTest.java d6e855b879e7909e8ba66c03ed34c845bf978a8f
>   src/test/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategyTest.java PRE-CREATION
>   src/test/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategyTest.java PRE-CREATION
>   src/test/python/apache/aurora/client/api/test_api.py ff1aff2eac391f219bc7c2483a16e35f916a224c
>   src/test/python/apache/aurora/client/api/test_updater.py dd3f228c5062d388b4393aa4fd5b60a685bdb3a6
>   src/test/python/apache/aurora/client/api/test_updater_util.py fe3ac49491ca710761632405ac09de0cc0d038a5
>
> Diff: https://reviews.apache.org/r/29943/diff/
>
>
> Testing
> -------
>
> ./gradlew -Pq build
> ./pants src/test/python:all
> manual testing in vagrant
>
>
> Thanks,
>
> Maxim Khutornenko
>
>