Re: Review Request 29943: Uptime-driven scheduler job updates

2015-02-24 Thread Maxim Khutornenko


 On Feb. 24, 2015, 7:30 p.m., Kevin Sweeney wrote:
  Is this ready for review now?

It is. However, since AURORA-1041 is still in the Open state, I am going to discard this review and 
repost it once the ticket moves to Accepted.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review73877
---


On Jan. 20, 2015, 9:12 p.m., Maxim Khutornenko wrote:
 

Re: Review Request 29943: Uptime-driven scheduler job updates

2015-02-24 Thread Kevin Sweeney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review73877
---


Is this ready for review now?

- Kevin Sweeney


On Jan. 20, 2015, 1:12 p.m., Maxim Khutornenko wrote:
 

Re: Review Request 29943: Uptime-driven scheduler job updates

2015-01-20 Thread Maxim Khutornenko


 On Jan. 20, 2015, 7:14 p.m., Bill Farner wrote:
  Before I dive in - can you please file a ticket to provide context 
  (justification, plans) for this patch?  It will also be helpful since it 
  will add an entry to our changelog.

Created AURORA-1041.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review68760
---


On Jan. 17, 2015, 8:54 p.m., Maxim Khutornenko wrote:
 

Re: Review Request 29943: Uptime-driven scheduler job updates

2015-01-20 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

(Updated Jan. 20, 2015, 9:12 p.m.)


Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Changes
---

Added ticket number.


Bugs: AURORA-1041
https://issues.apache.org/jira/browse/AURORA-1041


Repository: aurora


Description
---

This is the first take on implementing uptime-driven job updates. In addition 
to the good old batch_size, instances can now be dispatched in a variable 
sequence driven by the overall uptime (health) of the job.

The uptime is specified by a tuple of **waitForUptimeMs** and 
**waitForUptimePercentInstances** values. An excerpt from api.thrift explaining 
the feature:
```
  /**
   * The uptime-driven update throttles the number of instances being updated at any given moment
   * according to the job uptime calculations. The invariant "X% of instances have been up for at
   * least Y ms" is preserved over the entire job update lifetime. No new instances are dispatched
   * for update unless that invariant is satisfied. Instances are dispatched in their natural
   * uptime order, shortest uptime first.
   *
   * For example, when set as below, the update will block until at least 90% of job instances
   * have been in the RUNNING state for at least 1 minute:
   *   waitForUptimeMs = 60000
   *   waitForUptimePercentInstances = 90
   *
   * When using an uptime-driven update, it is expected that updateGroupSize is left unset to let
   * the job uptime settings drive the update progress. However, if updateGroupSize is set, it
   * will be pre-applied before the SLA uptime calculations to determine the update working set.
   * As a side effect, updateGroupSize results in a natural ordering of the instances taken for
   * each group (instances within a group are still updated in shortest-uptime-first order).
   *
   * For example, when set as below, the number of instances being updated at any given moment
   * will never exceed 5, even if the uptime calculations would allow more than 5:
   *   updateGroupSize = 5
   *   waitForUptimeMs = 60000
   *   waitForUptimePercentInstances = 90
   *
   * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
   * graceful, throttled rollback, as unstable/flapping instances may never yield an acceptable
   * uptime to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND
   * updateGroupSize=0, the updater will dispatch all affected instances at once.
   * Use rollbackOnFailure=True with caution for uptime-driven updates.
   */
```
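To make the invariant concrete, here is a minimal Java sketch of how an updater might pick which pending instances to dispatch. It is not the InstanceUptimeStrategy from this patch: the class and method names are hypothetical, it assumes per-instance uptimes are available as a map from instance id to milliseconds (0 for instances that are not RUNNING), and it conservatively treats every dispatched instance as no longer counting toward the X%, so the invariant is never broken.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "X% of instances up for at least Y ms" throttle.
 * Not the InstanceUptimeStrategy from this patch; names are illustrative only.
 */
final class UptimeThrottleSketch {

  private UptimeThrottleSketch() {}

  /**
   * Returns the pending instances that may be dispatched now without breaking the invariant.
   *
   * @param uptimeMs uptime of every job instance in ms (assumed 0 when not RUNNING)
   * @param pending ids of instances that still need to be updated
   * @param waitForUptimeMs Y: minimum RUNNING time for an instance to count as "up"
   * @param waitForUptimePercentInstances X: percentage of instances that must stay "up"
   */
  static List<Integer> selectDispatchable(
      Map<Integer, Long> uptimeMs,
      List<Integer> pending,
      long waitForUptimeMs,
      int waitForUptimePercentInstances) {

    int total = uptimeMs.size();
    // ceil(total * X / 100) in integer arithmetic.
    int required = (total * waitForUptimePercentInstances + 99) / 100;
    long up = uptimeMs.values().stream().filter(u -> u >= waitForUptimeMs).count();
    // Conservative: every dispatched instance is assumed to stop counting as "up".
    long slack = up - required;

    List<Integer> result = new ArrayList<>();
    if (slack <= 0) {
      return result;  // Invariant not satisfied or no headroom: dispatch nothing.
    }

    // Natural uptime order, shortest uptime first (List.sort is stable for ties).
    List<Integer> ordered = new ArrayList<>(pending);
    ordered.sort(Comparator.comparingLong(id -> uptimeMs.getOrDefault(id, 0L)));
    for (int id : ordered) {
      if (result.size() >= slack) {
        break;
      }
      result.add(id);
    }
    return result;
  }
}
```

For example, with 10 instances of which 9 have been RUNNING for 2 minutes and a 90% / 60000 ms requirement, the slack is 0 and nothing is dispatched; lowering the requirement to 80% frees exactly one slot, which goes to the pending instance with the shortest uptime.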

For reviewers: I recommend starting with api.thrift and then proceeding to 
InstanceUptimeStrategy.java, which implements the core algorithm.

TODO: 
- vagrant e2e test
- more corner case unit test coverage in JobUpdaterIT
- client warning message when uptime settings are used with the client-side updater
- docs


Diffs
---

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
  src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
  src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
  src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
  src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
  src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
  src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
  src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
  

Re: Review Request 29943: Uptime-driven scheduler job updates

2015-01-20 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review68760
---


Before I dive in - can you please file a ticket to provide context 
(justification, plans) for this patch?  It will also be helpful since it will 
add an entry to our changelog.

- Bill Farner


On Jan. 17, 2015, 8:54 p.m., Maxim Khutornenko wrote:
 

Re: Review Request 29943: Uptime-driven scheduler job updates

2015-01-20 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/#review68768
---

Ship it!


Master (c37de9a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry".

- Aurora ReviewBot


On Jan. 17, 2015, 8:54 p.m., Maxim Khutornenko wrote:
 

Review Request 29943: Uptime-driven scheduler job updates

2015-01-17 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Repository: aurora


Description
---

This is the first take on implementing uptime-driven job updates. In addition 
to the good old batch_size, instances can now be dispatched in a variable 
sequence driven by the overall uptime (health) of the job.

The uptime is specified by a tuple of **waitForUptimeMs** and 
**waitForUptimePercentInstances** values. An excerpt from api.thrift explaining 
the feature:
```
  /**
   * The uptime-driven update throttles the number of instances being updated at any given moment
   * according to the job uptime calculations. The invariant "X% of instances have been up for at
   * least Y ms" is preserved over the entire job update lifetime. No new instances are dispatched
   * for update unless that invariant is satisfied. Instances are dispatched in their natural
   * uptime order, shortest uptime first.
   *
   * For example, when set as below, the update will block until at least 90% of job instances
   * have been in the RUNNING state for at least 1 minute:
   *   waitForUptimeMs = 60000
   *   waitForUptimePercentInstances = 90
   *
   * When using an uptime-driven update, it is expected that updateGroupSize is left unset to let
   * the job uptime settings drive the update progress. However, if updateGroupSize is set, it
   * will be pre-applied before the SLA uptime calculations to determine the update working set.
   * As a side effect, updateGroupSize results in a natural ordering of the instances taken for
   * each group (instances within a group are still updated in shortest-uptime-first order).
   *
   * For example, when set as below, the number of instances being updated at any given moment
   * will never exceed 5, even if the uptime calculations would allow more than 5:
   *   updateGroupSize = 5
   *   waitForUptimeMs = 60000
   *   waitForUptimePercentInstances = 90
   *
   * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
   * graceful, throttled rollback, as unstable/flapping instances may never yield an acceptable
   * uptime to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND
   * updateGroupSize=0, the updater will dispatch all affected instances at once.
   * Use rollbackOnFailure=True with caution for uptime-driven updates.
   */
```
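The interaction between updateGroupSize, the uptime throttle and the rollback rule described above can be sketched in Java as well. This is only an illustration under stated assumptions, not the patch's code: the class and method names are hypothetical, uptimes are assumed to be a map from instance id to milliseconds (0 when not RUNNING), a "group" is taken to be the next updateGroupSize pending instances in instance id order, and updateGroupSize=0 stands for "unset".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: updateGroupSize pre-applied before the uptime throttle,
 * plus the "dispatch everything" rollback rule. Names are illustrative only.
 */
final class GroupedUptimeSketch {

  private GroupedUptimeSketch() {}

  static List<Integer> nextDispatch(
      Map<Integer, Long> uptimeMs,       // uptime per instance in ms (0 when not RUNNING)
      List<Integer> pending,             // instances still to update (or roll back)
      int updateGroupSize,               // 0 stands for "unset"
      long waitForUptimeMs,              // Y
      int waitForUptimePercentInstances, // X
      boolean rollingBack) {

    List<Integer> byId = new ArrayList<>(pending);
    byId.sort(Comparator.naturalOrder());

    // Rollback without a group size: uptime is unreliable, so dispatch all at once.
    if (rollingBack && updateGroupSize == 0) {
      return byId;
    }

    // 1. Pre-apply updateGroupSize: the working set is the next group in instance id order.
    List<Integer> workingSet = byId;
    if (updateGroupSize > 0) {
      workingSet = new ArrayList<>(byId.subList(0, Math.min(updateGroupSize, byId.size())));
    }

    // 2. Uptime slack: how many instances may go down while X% stay up for Y ms.
    int required = (uptimeMs.size() * waitForUptimePercentInstances + 99) / 100;
    long up = uptimeMs.values().stream().filter(u -> u >= waitForUptimeMs).count();
    long slack = Math.max(0L, up - required);

    // 3. Within the working set: shortest uptime first, capped by the slack.
    workingSet.sort(Comparator.comparingLong(id -> uptimeMs.getOrDefault(id, 0L)));
    int take = (int) Math.min(slack, workingSet.size());
    return new ArrayList<>(workingSet.subList(0, take));
  }
}
```

Because the result is drawn only from the pre-selected group, a single call never returns more than updateGroupSize instances, even when the uptime slack is larger; without a group size, the whole job is the working set and the slack alone decides.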

For reviewers: I recommend starting with api.thrift and then proceeding to 
InstanceUptimeStrategy.java, which implements the core algorithm.

TODO: 
- vagrant e2e test
- more corner case unit test coverage in JobUpdaterIT
- client warning message when uptime settings are used with the client-side updater
- docs


Diffs
---

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
  src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
  src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
  src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
  src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
  src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
  src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
  src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java d36f5652357e06d6c8944d907ee011b91e84e9c6
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java