Review Request 29943: Uptime-driven scheduler job updates

2015-01-17 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29943/
---

Review request for Aurora, Kevin Sweeney, Bill Farner, and Brian Wickman.


Repository: aurora


Description
---

This is a first take on implementing job-uptime-driven updates. In addition 
to the good old batch_size, instances can now be dispatched in an arbitrary 
sequence depending on the overall uptime (health) of the job. 

The uptime is specified by a tuple of **waitForUptimeMs** and 
**waitForUptimePercentInstances** values. An excerpt from api.thrift explaining 
the feature:
```
  /**
   * The uptime-driven update throttles the number of instances being updated at any given moment
   * according to the job uptime calculations. The "X% of instances up over Y interval" invariant
   * is preserved over the entire job update lifetime. No new instances are dispatched for update
   * unless that invariant is satisfied. Instances are dispatched in their natural uptime order,
   * shortest uptime first.
   *
   * For example, when set as below, the update will block until at least 90% of job instances are
   * in the RUNNING state for at least 1 minute:
   *   waitForUptimeMs = 60000
   *   waitForUptimePercentInstances = 90
   *
   * When using an uptime-driven update, it's expected that updateGroupSize is left unset to let the
   * job uptime settings drive the update progress. However, if updateGroupSize is set, it will be
   * pre-applied before the SLA uptime calculations to determine the update working set. As a side
   * effect, updateGroupSize results in a natural ordering of instances taken for each group
   * (instances within a group are still updated in shortest-uptime-first order).
   *
   * For example, if set as below, the number of instances being updated at any given moment will
   * never exceed 5, even though the uptime calculations may allow more than 5:
   *   updateGroupSize = 5
   *   waitForUptimeMs = 60000
   *   waitForUptimePercentInstances = 90
   *
   * NOTE on update rollback: with the uptime-driven update, there is no reliable way to ensure a
   * graceful throttled rollback, as unstable/flapping instances may never yield an acceptable
   * uptime to perform an uptime-coordinated rollback. As such, when rollbackOnFailure=True AND
   * updateGroupSize=0, the updater will dispatch all affected instances at once.
   * Use rollbackOnFailure=True with caution for uptime-driven updates.
   */
```
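To make the invariant concrete, here is a minimal Python sketch of one dispatch decision. It is not the algorithm in InstanceUptimeStrategy.java; the helper names (`uptime_invariant_holds`, `select_instances`) are hypothetical, and it assumes each instance's uptime is known in milliseconds and that an instance being updated counts as "not up". It applies updateGroupSize first, then takes instances shortest uptime first for as long as the X%-over-Y invariant would still hold.

```python
from typing import Dict, Iterable, List, Optional


def uptime_invariant_holds(uptimes_ms: Dict[int, int], wait_for_uptime_ms: int,
                           percent_instances: float) -> bool:
    """True if at least percent_instances% of the job's instances have been
    RUNNING for at least wait_for_uptime_ms (hypothetical helper)."""
    up = sum(1 for uptime in uptimes_ms.values() if uptime >= wait_for_uptime_ms)
    return up * 100 >= percent_instances * len(uptimes_ms)


def select_instances(uptimes_ms: Dict[int, int], remaining: Iterable[int],
                     wait_for_uptime_ms: int, percent_instances: float,
                     group_size: Optional[int] = None) -> List[int]:
    """Pick the next instances to update without breaking the uptime invariant.

    uptimes_ms covers every instance of the job; remaining holds the instance
    IDs that still need updating. A sketch only, not Aurora's API.
    """
    working_set = sorted(remaining)
    if group_size:
        # updateGroupSize is pre-applied: it bounds the working set first.
        working_set = working_set[:group_size]
    # Shortest uptime first; the instance ID breaks ties deterministically.
    candidates = sorted(working_set, key=lambda i: (uptimes_ms[i], i))
    projected = dict(uptimes_ms)
    selected: List[int] = []
    for instance in candidates:
        trial = dict(projected)
        trial[instance] = 0  # an instance being updated is assumed not up
        if not uptime_invariant_holds(trial, wait_for_uptime_ms, percent_instances):
            break  # candidates are sorted, so no later one can fit either
        projected = trial
        selected.append(instance)
    return selected
```

With 10 instances that have all been RUNNING for 2 minutes, waitForUptimeMs = 60000 and 90% yield a single instance per round; an instance that is already flapping (short uptime) is picked first because taking it down does not reduce the number of instances that are up.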

For reviewers: I recommend starting with api.thrift and then proceeding to 
InstanceUptimeStrategy.java, which implements the core algorithm.

TODO: 
- vagrant e2e test
- more corner case unit test coverage in JobUpdaterIT
- client warning message in case uptime specs are used with the client updater
- docs


Diffs
---

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 08ba1cdf88b712de22c26c04443079282db59ef9
  src/main/java/org/apache/aurora/scheduler/sla/SlaAlgorithm.java eae79d59b445ea58f46dc9e3107c03fbd83b6a95
  src/main/java/org/apache/aurora/scheduler/sla/SlaUtil.java 156b9c0a2fa0c0ec4b7220d5ec2cc40c3e59d1d6
  src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java ac92959f34a3b0962d6aa018dc82a5ac72ea1b34
  src/main/java/org/apache/aurora/scheduler/updater/InstanceUptimeProviderImpl.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java a992938d4e12b20f81608be6bbdc24c0a211c3fd
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 27a5b9026f5ac3b3bdeb32813b10435bc3dab173
  src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java b53086169aa53d27a39a01cadf8d3c4a8ecb68de
  src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 5733da3daeacd8cb726310e5d9933635e3993687
  src/main/java/org/apache/aurora/scheduler/updater/strategy/FilteringStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeProvider.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/InstanceUptimeStrategy.java PRE-CREATION
  src/main/java/org/apache/aurora/scheduler/updater/strategy/UpdateStrategy.java c2a2ee8f3ad09d48918e4e62eb8fe7a71b428160
  src/main/python/apache/aurora/client/api/updater_util.py 9d2e893a6ecff0fc48c7944575578443d41ced78
  src/main/python/apache/aurora/config/schema/base.py d7897794c736778983d506c337a1392f3cc0cc20
  src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml f9c9ceddc559b43b4a5c45c745d54ff47484edde
  src/main/resources/org/apache/aurora/scheduler/storage/db/schema.sql 987596f733b7155fbce772e6c74a8095d5da1827
  src/test/java/org/apache/aurora/scheduler/sla/SlaAlgorithmTest.java d36f5652357e06d6c8944d907ee011b91e84e9c6
  src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java

Re: Review Request 29873: Fixing batched kill task filtering.

2015-01-17 Thread Maxim Khutornenko


 On Jan. 15, 2015, 10:07 p.m., Bill Farner wrote:
  src/main/python/apache/aurora/client/cli/context.py, line 218
  https://reviews.apache.org/r/29873/diff/2/?file=822909#file822909line218
 
  why convert an empty list to `None`?

I was blindly following the pattern above. No reason; dropped.


 On Jan. 15, 2015, 10:07 p.m., Bill Farner wrote:
  src/test/python/apache/aurora/client/cli/util.py, line 200
  https://reviews.apache.org/r/29873/diff/2/?file=822920#file822920line200
 
  Your call, but I'd prefer this be a single statement, `return ScheduledTask(...)`.

Sure, converted.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/29873/#review68329
---


On Jan. 15, 2015, 7:44 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/29873/
 ---
 
 (Updated Jan. 15, 2015, 7:44 p.m.)
 
 
 Review request for Aurora and Bill Farner.
 
 
 Bugs: AURORA-996
 https://issues.apache.org/jira/browse/AURORA-996
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fixed job status querying in the client context to return active tasks where 
 needed.
 
 Also, dropped the active instance validation from both updater calls, as it was 
 hiding a legitimate feature: adding new job instances with instance_spec.
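 Both changes can be pictured with a small Python sketch. Everything in it is hypothetical: `Task` stands in for the Thrift ScheduledTask, `ACTIVE_STATES` is an assumed set of active statuses written as plain strings rather than the real enum, and `old_instance_check` only approximates the kind of validation that was dropped.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed set of active (non-terminal) statuses, as plain strings.
ACTIVE_STATES = frozenset({
    "PENDING", "THROTTLED", "ASSIGNED", "STARTING", "RUNNING",
    "KILLING", "RESTARTING", "PREEMPTING", "DRAINING",
})


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a ScheduledTask: instance ID and status only."""
    instance_id: int
    status: str


def active_tasks(tasks: Iterable[Task]) -> List[Task]:
    """Keep only tasks in an active state (drops FINISHED, FAILED, KILLED, LOST)."""
    return [t for t in tasks if t.status in ACTIVE_STATES]


def old_instance_check(requested: Iterable[int], tasks: Iterable[Task]) -> bool:
    """Approximation of the dropped validation: every requested instance must
    already be active, which wrongly rejects adding brand-new instances."""
    active_ids = {t.instance_id for t in active_tasks(tasks)}
    return set(requested) <= active_ids
```

 Under this sketch, a job whose tasks are all KILLED or FINISHED has no active tasks, which matches the "No tasks to kill found" output in the Testing section below, and requesting instance 3 of a job that only has instances 0-2 fails the old check even though adding it is legitimate.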
 
 
 Diffs
 ---
 
   src/main/python/apache/aurora/client/api/__init__.py 64f08046f6aca112a9f51c3f250ba6102d297216
   src/main/python/apache/aurora/client/cli/context.py 93587c616afb0b7493a509197361cd76af2e2c97
   src/main/python/apache/aurora/client/cli/jobs.py 508c9be556998e47bddcec8dee43f1595497b354
   src/main/python/apache/aurora/client/cli/task.py 21548984f6f5cd894733220c6508458cc1318391
   src/main/python/apache/aurora/client/cli/update.py a1617325f08ca252bdba38618aef141504ba7272
   src/test/python/apache/aurora/client/cli/test_command_hooks.py e8432a1f3475e7e48a092526998d15cbf7e92e10
   src/test/python/apache/aurora/client/cli/test_create.py 18b58383d209a50bd09fbdfa5a600e49d1d848f7
   src/test/python/apache/aurora/client/cli/test_kill.py b475d737ede8ff0a669a9a9229196a76b43b46b6
   src/test/python/apache/aurora/client/cli/test_plugins.py aa45851bd26f78583e0c16d0c3699c12f8a58697
   src/test/python/apache/aurora/client/cli/test_restart.py a532ead256869c620e6bd96886ce9681b3423d0c
   src/test/python/apache/aurora/client/cli/test_supdate.py 9378b49ec5e4bb8189b00d9fd1d80558a731d668
   src/test/python/apache/aurora/client/cli/test_update.py c12b32e3327af8c014fdad72d63ab4e68dc541c8
   src/test/python/apache/aurora/client/cli/util.py 147d418ea1de679674dd93eaf648d84686b95d37
 
 Diff: https://reviews.apache.org/r/29873/diff/
 
 
 Testing
 ---
 
 ./pants src/test/python:all
 
 vagrant@vagrant-ubuntu-trusty-64:~$ aurora job killall devcluster/www-data/prod/hello
  INFO] Checking status of devcluster/www-data/prod/hello
  INFO] 
 No tasks to kill found for job devcluster/www-data/prod/hello
 Job killall succeeded
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 30010: [AURORA-184] Remove hardcoded 'host' and 'rack' limit constraints

2015-01-17 Thread Florian Pfeiffer

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30010/
---

(Updated Jan. 18, 2015, 4:06 a.m.)


Review request for Aurora and Zameer Manji.


Repository: aurora


Description
---

[AURORA-184] Remove hardcoded 'host' and 'rack' limit constraints

This is the first step for AURORA-184; it removes the default host/rack limit 
constraints.
The second step, which is still missing, would be to add something like 
--default-constraints as a start parameter to the scheduler.

Could AURORA-174 probably be closed with this? (The rack limit constraint 
can be configured in the .aurora file.)

I can't really estimate the effect of my changes in 
StorageBackfillTest/SchedulerThriftInterfaceTest, so please have a closer look at 
the changes I made there.

Since this is also my first code submission, comments about code style/other bad 
habits are very welcome.
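A minimal Python sketch of the before/after behaviour, under stated assumptions: `Constraint` only models limit constraints, `has_name` is a hypothetical analogue of ConfigurationManager.hasName, the pre-patch defaults are assumed to be a limit of 1 for both `host` and `rack` (the real scheduler may have applied them conditionally), and the `defaults` argument only illustrates the proposed --default-constraints step, not an existing flag.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Constraint:
    """Hypothetical limit constraint: at most `limit` instances per attribute value."""
    name: str
    limit: int


def has_name(name: str) -> Callable[[Constraint], bool]:
    """Hypothetical analogue of a hasName predicate."""
    return lambda c: c.name == name


def apply_default_constraints_old(constraints: Iterable[Constraint]) -> List[Constraint]:
    """Assumed pre-patch behaviour: hardcoded host/rack limits when missing."""
    result = list(constraints)
    for name in ("host", "rack"):
        if not any(has_name(name)(c) for c in result):
            result.append(Constraint(name, 1))
    return result


def apply_default_constraints(constraints: Iterable[Constraint],
                              defaults: Iterable[Constraint] = ()) -> List[Constraint]:
    """After the patch nothing is hardcoded; operator-supplied defaults (the
    proposed second step) only fill in constraints the job does not set."""
    result = list(constraints)
    for default in defaults:
        if not any(has_name(default.name)(c) for c in result):
            result.append(default)
    return result
```

Keeping defaults as data rather than code means a job's own constraint (for example a rack limit set in the .aurora file) always takes precedence over whatever the operator configures.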


Diffs
---

  src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java 01b03508afac37b5a8f0ec5c3da1460695e1ef59
  src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java dc2cb37adf32df0a6e4c7ee2ba776ba9f1f3c2f8
  src/test/java/org/apache/aurora/scheduler/storage/StorageBackfillTest.java 7eafe074b686d55ad96018006cf4acfa823513c3
  src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java ad9126c32893080e128d086ea3bfd7ad23d27b89

Diff: https://reviews.apache.org/r/30010/diff/


Testing
---

Added a test for ConfigurationManager.hasName
Added test testNoHostAndRackConstraintsAdded, which checks whether the constraints 
are present
Tested on the vagrant devcluster to confirm the constraints are also gone in practice


Thanks,

Florian Pfeiffer


