Re: Review Request 29943: Uptime-driven scheduler job updates
On Feb. 24, 2015, 7:30 p.m., Kevin Sweeney wrote:
Is this ready for review now?

It is. However, since AURORA-1041 is still Open, I am going to discard this request and repost it once the ticket moves to Accepted.

- Maxim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review73877
---

On Jan. 20, 2015, 9:12 p.m., Maxim Khutornenko wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

(Updated Jan. 20, 2015, 9:12 p.m.)

Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.

Bugs: AURORA-1041
https://issues.apache.org/jira/browse/AURORA-1041

Repository: aurora

Description
---

This is the first take on implementing job uptime driven updates. In addition to the good old batch_size, instances can now be dispatched in arbitrary sequence depending on the overall uptime (health) of the job. The uptime is specified by a tuple of **waitForUptimeMs** and **waitForUptimePercentInstances** values.

An excerpt from api.thrift explaining the feature:

```
/**
 * The uptime-driven update throttles the number of instances being updated at any given moment
 * according to the job uptime calculations. The X% of instances up over Y interval invariant
 * is preserved over the entire job update lifetime. No new instances are dispatched for update
 * unless that invariant is satisfied. Instances are dispatched in their natural uptime order,
 * shortest uptime first.
 *
 * For example, when set as below the update will block until at least 90% of job instances are in
 * RUNNING state for at least 1 minute:
 *   waitForUptimeMs = 60000
 *   waitForUptimePercentInstances = 90
 *
 * When using uptime-driven update, it's expected that updateGroupSize is left unset to allow job
 * uptime settings to drive the update progress. However, if updateGroupSize is set it will be
 * pre-applied before SLA uptime calculations to determine the update working set. As a side
 * effect, the updateGroupSize results in a natural ordering of instances taken for each group
 * (instances within a group are still updated in a shortest uptime first order).
 *
 * For example, if set as below the number of instances being updated at any given moment will
 * never exceed 5 even though the uptime calculations may allow more than 5:
 *   updateGroupSize = 5
 *   waitForUptimeMs = 60000
 *   waitForUptimePercentInstances = 90
 *
 * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
 * graceful throttled rollback as unstable/flapping instances may never yield an acceptable uptime
 * to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND the
 * updateGroupSize=0 the updater will dispatch all affected instances at once.
 * Use rollbackOnFailure=True with caution for uptime-driven updates.
 */
```

For reviewers: I recommend starting with api.thrift and then proceeding to InstanceUptimeStrategy.java, which implements the core algorithm.

TODO:
- vagrant e2e test
- more corner case unit test coverage in JobUpdaterIT
- client warning message in case uptime specs are used with client updater
- docs

Diffs
---

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
  src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
  src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
  src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
  src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
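For readers following the thread, the dispatch rule described in the api.thrift excerpt can be sketched roughly as below. This is only an illustrative reading of the comment, not the code in InstanceUptimeStrategy.java: the function name, its parameters, and the choice to round the required instance count up are all assumptions made for this sketch, and rollback handling is left out.

```python
from typing import Dict, List, Optional, Set


def next_instances_to_update(
    uptimes_ms: Dict[int, int],          # hypothetical input: instance id -> time in RUNNING (ms)
    pending: Set[int],                   # instances still waiting to be updated
    updating: Set[int],                  # instances currently being updated
    wait_for_uptime_ms: int,             # mirrors waitForUptimeMs
    wait_for_uptime_percent: int,        # mirrors waitForUptimePercentInstances
    update_group_size: Optional[int] = None,  # mirrors updateGroupSize (unset/0 = no cap)
) -> List[int]:
    """Return the instances that may be dispatched now, shortest uptime first."""
    total = len(uptimes_ms)
    # Assumption: the required count is rounded up to a whole instance.
    required_up = (total * wait_for_uptime_percent + 99) // 100

    # Instances that currently count toward the "X% up over Y" invariant.
    up = {
        i for i, ms in uptimes_ms.items()
        if ms >= wait_for_uptime_ms and i not in updating
    }
    if len(up) < required_up:
        return []  # invariant not satisfied: dispatch nothing

    # The group size is pre-applied in natural (instance id) order to pick
    # the working set, leaving room only for what is not already in flight.
    candidates = sorted(pending)
    if update_group_size:
        room = max(update_group_size - len(updating), 0)
        candidates = candidates[:room]

    # Within the working set, dispatch shortest uptime first.
    candidates.sort(key=lambda i: (uptimes_ms[i], i))

    chosen: List[int] = []
    for i in candidates:
        if i in up:
            if len(up) - 1 < required_up:
                break  # taking this (or any later) instance would break the invariant
            up.discard(i)
        chosen.append(i)
    return chosen


if __name__ == "__main__":
    uptimes = {i: 120000 for i in range(10)}
    uptimes[1] = 5000    # recently restarted
    uptimes[3] = 30000   # recently restarted
    # 80% of 10 instances must have been up for 60 s: only the two
    # short-uptime instances can be taken without breaking the invariant.
    print(next_instances_to_update(uptimes, set(range(10)), set(), 60000, 80))
```

In this reading, instances that have not yet reached the uptime threshold do not count toward the invariant, so they can be dispatched first without affecting it; a healthy instance is only taken while enough other healthy instances remain.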
Re: Review Request 29943: Uptime-driven scheduler job updates
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review73877
---

Is this ready for review now?

- Kevin Sweeney
Re: Review Request 29943: Uptime-driven scheduler job updates
On Jan. 20, 2015, 7:14 p.m., Bill Farner wrote:
Before i dive in - can you please file a ticket to provide context (justification, plans) for this patch? It will also be helpful since it will add an entry to our changelog.

Created AURORA-1041.

- Maxim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review68760
---
Re: Review Request 29943: Uptime-driven scheduler job updates
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

(Updated Jan. 20, 2015, 9:12 p.m.)

Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.

Changes
---

Added ticket number.

Bugs: AURORA-1041
https://issues.apache.org/jira/browse/AURORA-1041

Repository: aurora
Re: Review Request 29943: Uptime-driven scheduler job updates
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review68760
---

Before i dive in - can you please file a ticket to provide context (justification, plans) for this patch? It will also be helpful since it will add an entry to our changelog.

- Bill Farner
Re: Review Request 29943: Uptime-driven scheduler job updates
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review68768
---

Ship it!

Master (c37de9a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot
Review Request 29943: Uptime-driven scheduler job updates
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.

Repository: aurora

Diffs
---

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
  src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
  src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
  src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
  src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
  src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
  src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
  src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java d36f5652357e06d6c8944d907ee011b91e84e9c6
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java