Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review70992
---

Ship it!


Master (8bcb2ba) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 4, 2015, 5:24 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 https://issues.apache.org/jira/browse/AURORA-1010
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Added pulsing support into the JobUpdateController. The qualified coordinated 
 updates get blocked until a pulse arrives. An update then becomes active and 
 proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
 terminal state (whichever comes first).
 
 Not particularly happy with plumbing through OneWayJobUpdater but the 
 alternative is a state machine change, which is much hairier and will require 
 more changes in the JobUpdaterController. Going with the minimal diff here.
 
 
 Diffs
 -
 
   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
 b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
 d479d20259f284528b2291e2e486b36d8e47fe5e 
   
 src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
  60f535988a20638fb16515d5027bfa65f1afb73c 
   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
 d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
 76460f95e058181b711fb6869f5a34c1d5bdfe31 
   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  f9c9ceddc559b43b4a5c45c745d54ff47484edde 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  ca7c0c2675477cc727ca006697665f997972dfde 
   
 src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
  89765ac3d34a827d3748fb96a275d78e9d1b8b72 
   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
 41d422939209d0808235128e4242c11e8ef25969 
   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
 
 Diff: https://reviews.apache.org/r/30225/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 4, 2015, 1:47 a.m., Maxim Khutornenko wrote:
  src/test/python/apache/aurora/client/cli/test_cron.py, line 138
  https://reviews.apache.org/r/30586/diff/2/?file=846914#file846914line138
 
  You may want to drop unused CLUSTER patching in other tests as well.
 
 Zameer Manji wrote:
 The other tests require the CLUSTER patching.

In their current shape of intergration tests punching through the API - they 
do. However, you can convert them to use fake api (similar to the test you 
added) and make them true unit tests.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/#review70895
---


On Feb. 4, 2015, 1:14 a.m., Zameer Manji wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30586/
 ---
 
 (Updated Feb. 4, 2015, 1:14 a.m.)
 
 
 Review request for Aurora, Joshua Cohen and Bill Farner.
 
 
 Bugs: AURORA-1094
 https://issues.apache.org/jira/browse/AURORA-1094
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fix exception when opening cron urls.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/client/cli/cron.py 
 3416c8e1932056725880f2007b60d77112759428 
   src/test/python/apache/aurora/client/cli/test_cron.py 
 f488432cd68cc68fab8fce968e8605625ea3f56a 
 
 Diff: https://reviews.apache.org/r/30586/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/cli:cron
 
 
 Thanks,
 
 Zameer Manji
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
---

(Updated Feb. 4, 2015, 5:24 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
---

Fixing method name escaped during rebase.


Bugs: AURORA-1010
https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
---

Added pulsing support into the JobUpdateController. The qualified coordinated 
updates get blocked until a pulse arrives. An update then becomes active and 
proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
terminal state (whichever comes first).

Not particularly happy with plumbing through OneWayJobUpdater but the 
alternative is a state machine change, which is much hairier and will require 
more changes in the JobUpdaterController. Going with the minimal diff here.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
d479d20259f284528b2291e2e486b36d8e47fe5e 
  
src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 
ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
  
src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
 f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  
src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 
ca7c0c2675477cc727ca006697665f997972dfde 
  
src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
41d422939209d0808235128e4242c11e8ef25969 
  src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
---

./gradlew -Pq build


Thanks,

Maxim Khutornenko



Re: Review Request 30585: Adding command hook for beta-update start.

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30585/#review71014
---

Ship it!


Ship It!

- Zameer Manji


On Feb. 3, 2015, 4:24 p.m., George Sirois wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30585/
 ---
 
 (Updated Feb. 3, 2015, 4:24 p.m.)
 
 
 Review request for Aurora and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding command hook for beta-update start.
 
 Also contains an update for the hooks documentation.
 
 
 Diffs
 -
 
   docs/hooks.md 533c81df9a5934ea903e3dbfb9fca6a211ceba21 
   src/main/python/apache/aurora/client/hooks/hooked_api.py 
 bc61e91af6de06ecfc37eddd846c096a5155d7eb 
   src/test/python/apache/aurora/client/hooks/test_hooked_api.py 
 a1f474e1a4f0bcdbd0062757314ede1b7bb37f38 
 
 Diff: https://reviews.apache.org/r/30585/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/hooks:hooked_api
 
 
 Thanks,
 
 George Sirois
 




Re: Review Request 30585: Adding command hook for beta-update start.

2015-02-04 Thread George Sirois


 On Feb. 4, 2015, 1:59 a.m., David McLaughlin wrote:
  LGTM, thanks for the patch. 
  
  Although I'm a little bit concerned about adding support for hooks in an 
  environment where we can't support post-hooks.

Thanks!

I agree that it is certainly not ideal, but it does allow us to at least 
publish notifications when an update begins.


- George


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30585/#review70898
---


On Feb. 4, 2015, 12:24 a.m., George Sirois wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30585/
 ---
 
 (Updated Feb. 4, 2015, 12:24 a.m.)
 
 
 Review request for Aurora and Zameer Manji.
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Adding command hook for beta-update start.
 
 Also contains an update for the hooks documentation.
 
 
 Diffs
 -
 
   docs/hooks.md 533c81df9a5934ea903e3dbfb9fca6a211ceba21 
   src/main/python/apache/aurora/client/hooks/hooked_api.py 
 bc61e91af6de06ecfc37eddd846c096a5155d7eb 
   src/test/python/apache/aurora/client/hooks/test_hooked_api.py 
 a1f474e1a4f0bcdbd0062757314ede1b7bb37f38 
 
 Diff: https://reviews.apache.org/r/30585/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/hooks:hooked_api
 
 
 Thanks,
 
 George Sirois
 




Re: Review Request 30649: Upgrade pants to 0.0.28

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30649/#review71148
---


Please put the pants release notes in the commit. Or at least some sort of 
summary that explains what is new.

- Zameer Manji


On Feb. 4, 2015, 6:46 p.m., Joe Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30649/
 ---
 
 (Updated Feb. 4, 2015, 6:46 p.m.)
 
 
 Review request for Aurora and Brian Wickman.
 
 
 Bugs: AURORA-1104
 https://issues.apache.org/jira/browse/AURORA-1104
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Upgrade pants to 0.0.28
 
 
 Diffs
 -
 
   .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 
   build-support/pants_requirements.txt 
 fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 
 
 Diff: https://reviews.apache.org/r/30649/diff/
 
 
 Testing
 ---
 
 Reviewed the diff [for 
 twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194)
  and
 
 ```
 $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
 ```
 
 
 Thanks,
 
 Joe Smith
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.

I think avoid acquisition of a write lock here is a good goal to aim for.  If, 
for example, we are already holding the write lock for a large snapshot, we 
could cause it to appear as though we have not received a pulse for an extended 
duration. - makes sense, I can try to split the recording part from the status 
update via an async action. 

Also, given that there is likely to be an automated system on the other end, 
we should be somewhat defensive to a high call frequency.  This suggests to me 
that potentially-expensive read operations should be avoided if possible. - I 
don't see how this would be possible without breaking the RPC contract as I 
still have to pull JobUpdateDetails in order to respond to the pulse with 
proper status message (FINISHED|PAUSED|OK).


It's also apparent that there is currently no asynchronous transition to the 
'blocked' states, they're triggered either by a tardy pulse or another external 
action.  I think this would be surprising behavior, and the scheduler should 
automatically transition to these states without any external input. - the 
transition to BLOCKED state happens during the evaluation loop similar to other 
internally triggered state transitions (FAILED|ERROR|ROLL_FORWARD_COMPLETED and 
etc.). Transitions to these states do not happen instantenously and are 
triggered by instance activities (i.e. task state change events). I don't see 
why BLOCKED state should be any different. The update is not technically alive 
until the evaluation event arrives. Changing this behavior just for heartbeats 
is not worth the extra complexity and will be inconsistent with the rest of the 
feature.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
---


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 4, 2015, 5:24 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 https://issues.apache.org/jira/browse/AURORA-1010
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Added pulsing support into the JobUpdateController. The qualified coordinated 
 updates get blocked until a pulse arrives. An update then becomes active and 
 proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
 terminal state (whichever comes first).
 
 Not particularly happy with plumbing through OneWayJobUpdater but the 
 alternative is a state machine change, which is much hairier and will require 
 more changes in the JobUpdaterController. Going with the minimal diff here.
 
 
 Diffs
 -
 
   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
 b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
 d479d20259f284528b2291e2e486b36d8e47fe5e 
   
 src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
  60f535988a20638fb16515d5027bfa65f1afb73c 
   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
 d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
   
 

Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
---


I suggest you skip to the big comment before paying attention to the smaller 
ones.  If you agree with the concern, i suggest an initial focused diff that 
implements receiving heartbeats, and asynchronously reacting to the lack of 
their presence.


api/src/main/thrift/org/apache/aurora/gen/api.thrift
https://reviews.apache.org/r/30225/#comment116606

Maybe s/BLOCKED/AWAITING_PULSE/?

That would at least self-document and avoid additional terminology.



src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java
https://reviews.apache.org/r/30225/#comment116607

s/query/fetch/ to be consistent with the method above.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
https://reviews.apache.org/r/30225/#comment116619

Remove Optional here, do Optional.of() at the call site.



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
https://reviews.apache.org/r/30225/#comment116620

TODO + ticket, this map needs to be exposed via a debug endpoint



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
https://reviews.apache.org/r/30225/#comment116626

I think avoid acquisition of a write lock here is a good goal to aim for.  
If, for example, we are already holding the write lock for a large snapshot, we 
could cause it to appear as though we have not received a pulse for an extended 
duration.

Also, given that there is likely to be an automated system on the other 
end, we should be somewhat defensive to a high call frequency.  This suggests 
to me that potentially-expensive read operations should be avoided if possible.

For these reasons, i think it would be wise to decouple recording of pulses 
from reacting to them.  It's also apparent that there is currently no 
asynchronous transition to the 'blocked' states, they're triggered either by a 
tardy pulse or another external action.  I think this would be surprising 
behavior, and the scheduler should automatically transition to these states 
without any external input.



src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
https://reviews.apache.org/r/30225/#comment116643

is it possible to reduce the diff a bit by moving this block to its former 
location, or is it needed


- Bill Farner


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 4, 2015, 5:24 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 https://issues.apache.org/jira/browse/AURORA-1010
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Added pulsing support into the JobUpdateController. The qualified coordinated 
 updates get blocked until a pulse arrives. An update then becomes active and 
 proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
 terminal state (whichever comes first).
 
 Not particularly happy with plumbing through OneWayJobUpdater but the 
 alternative is a state machine change, which is much hairier and will require 
 more changes in the JobUpdaterController. Going with the minimal diff here.
 
 
 Diffs
 -
 
   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
 b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
 d479d20259f284528b2291e2e486b36d8e47fe5e 
   
 src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
  60f535988a20638fb16515d5027bfa65f1afb73c 
   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
 d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
 76460f95e058181b711fb6869f5a34c1d5bdfe31 
   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  f9c9ceddc559b43b4a5c45c745d54ff47484edde 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  ca7c0c2675477cc727ca006697665f997972dfde 
   
 

Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.

2015-02-04 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28617/#review71079
---


Can you see any opportunity to break this diff apart?  As it stands i'm having 
a hard time giving a thoughtful review.  Perhaps you can start by introducing 
the `Assignment` class?


src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java
https://reviews.apache.org/r/28617/#comment116667

What motivates you to supply these as overridden methods rather than a 
configuration object?  The latter seems like better encapsulation.


- Bill Farner


On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28617/
 ---
 
 (Updated Feb. 4, 2015, 11:38 p.m.)
 
 
 Review request for Aurora, Kevin Sweeney and Bill Farner.
 
 
 Bugs: AURORA-909
 https://issues.apache.org/jira/browse/AURORA-909
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Modified the task offer/task matching logic to skip offer matching for tasks 
 previously vetoed statically.
 
 Real life testing in vagrant (see pictures) shows close to 50% improvement in 
 task scheduling performance.
 
 Testing with JMH shows over 97% better perf when testing with disabled 
 preemptor (1 scheduling loop):
 ```
 Master
 Benchmark 
  Mode  SamplesScore  ErrorUnits
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1008291046.074 ±   145251.995  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  1007522269.050 ±   142446.265  ns/op
 
 This RB
 Benchmark 
  Mode  Samples   Score   Error Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100 204171.046 ±   3800.124  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100 215854.129 ±   8959.851  ns/op
 ```
 
 Testing with preemptor enabled and no tasks eligible for preemption gives 
 around 40% improvement (2 scheduling loops):
 ```
 Master
 Benchmark 
  Mode  Samples  Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1001767479.299 ±   26907.571  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  1001538682.287 ±  119119.911  ns/op
 
 This RB
 Benchmark 
  Mode  Samples  Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1001105731.141 ±   10040.721  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100 939230.662 ±   11091.505  ns/op
 ```
 
 Testing with preemptor enabled and running the worst case possible scenario 
 (every slave is eligible and all tasks are victims) yields the least 
 improvement 2-3% (3 scheduling loops).
 ```
 Master
 Benchmark 
  Mode  Samples  ScoreError  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100   11043701.243 ±  40550.259  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100   10478631.055 ± 178833.158  ns/op
 o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
 avgt  100  116258653.000 ± 403080.017  ns/op
 
 This RB
 Benchmark 
  Mode  Samples Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100   10886116.889 ± 193934.324  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100   10182572.955 ±  35740.891  ns/op
 o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
 avgt  100  113656994.000 ± 424163.759  ns/op
 ```
 
 
 Diffs
 -
 
   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
 8c11ef8bd6609f3e4d97ca154d922898f8362446 
   src/jmh/java/org/apache/aurora/benchmark/Tasks.java 
 1a35f9ee9e8e76def0f9bf5454cf8cbdf6a89c25 
   

Re: Review Request 30467: Update mesos lib to 0.21.1

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30467/#review71096
---

Ship it!


Master (b49e1a0) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 4, 2015, 11:35 p.m., Bill Farner wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30467/
 ---
 
 (Updated Feb. 4, 2015, 11:35 p.m.)
 
 
 Review request for Aurora and Kevin Sweeney.
 
 
 Bugs: AURORA-1080
 https://issues.apache.org/jira/browse/AURORA-1080
 
 
 Repository: aurora
 
 
 Description
 ---
 
 This is needed to take advantage of the task reconciliation API and new 
 status update fields.
 
 
 Diffs
 -
 
   3rdparty/python/requirements.txt 0c88e4c299ffe5886bb3ab7d83b0187cb4b7888a 
   build-support/python/make-mesos-native-egg 
 2cba8ede04e22aee1e6ec5c979161904a6f8fddb 
   build.gradle 8cec4a78d8de6fd98986b9507c67a416e70735f8 
   docs/deploying-aurora-scheduler.md 9bf5b5a92bf65b31b2f8fda102003b113f61186a 
   examples/vagrant/provision-dev-cluster.sh 
 fa0de88314169bc9fe31aaf41126cedd0865ed5a 
   examples/vagrant/upstart/aurora-scheduler.conf 
 b31e735cd76204e226447dca7dbd3c8ee13cf56e 
   examples/vagrant/upstart/mesos-master.conf 
 23d457b7adac79b75b790535b43082b55356d60f 
   examples/vagrant/upstart/mesos-slave.conf 
 e00c9fa096d497593e4d6b423e5c69ec0c1b964a 
   src/test/sh/org/apache/aurora/e2e/http/http_example.aurora 
 4e98929da19a2515d0c4de59dac9018531ef47c9 
   src/test/sh/org/apache/aurora/e2e/http/http_example_docker_updated.aurora 
 20f604bc775e93dd2324468b768a73a8626a3a64 
   src/test/sh/org/apache/aurora/e2e/http/http_example_updated.aurora 
 98f5c47ca601bdd629114dd13c5c0e2dd60188ec 
 
 Diff: https://reviews.apache.org/r/30467/diff/
 
 
 Testing
 ---
 
 End-to-end tests pass.
 
 
 Thanks,
 
 Bill Farner
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Bill Farner


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.
 
 Maxim Khutornenko wrote:
 I think avoid acquisition of a write lock here is a good goal to aim 
 for.  If, for example, we are already holding the write lock for a large 
 snapshot, we could cause it to appear as though we have not received a pulse 
 for an extended duration. - makes sense, I can try to split the recording 
 part from the status update via an async action. 
 
 Also, given that there is likely to be an automated system on the other 
 end, we should be somewhat defensive to a high call frequency.  This suggests 
 to me that potentially-expensive read operations should be avoided if 
 possible. - I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK).
 
 
 It's also apparent that there is currently no asynchronous transition to 
 the 'blocked' states, they're triggered either by a tardy pulse or another 
 external action.  I think this would be surprising behavior, and the 
 scheduler should automatically transition to these states without any 
 external input. - the transition to BLOCKED state happens during the 
 evaluation loop similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature.

 I don't see how this would be possible without breaking the RPC contract as I 
 still have to pull JobUpdateDetails in order to respond to the pulse with 
 proper status message (FINISHED|PAUSED|OK)

Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we can 
shake those requirements out?  For example, we could just return 
OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
differentiate between a paused and finished update?

I don't feel really strongly about this, but i think we should feel free to 
question the requirements at this phase.

 the transition to BLOCKED state happens during the evaluation loop similar to 
 other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature

Transitions do indeed happen asynchronously when dealing with timeouts 
(relevant code is in `InstanceActionHandler`).  These fall back to the same 
loop, but give the updater a time after which the state should be re-evaluated.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71063
---


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 4, 2015, 5:24 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 

Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28617/#review71084
---


Master (b49e1a0) is red with this patch.
  ./build-support/jenkins/build.sh

 src.test.python.apache.aurora.client.cli.config
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.cron  
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.inspect   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.job   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.plugins   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.quota 
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.sla   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.supdate   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.task  
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.update
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.version   
 .   SUCCESS
 src.test.python.apache.aurora.client.config
 .   SUCCESS
 src.test.python.apache.aurora.client.hooks.hooked_api  
 .   SUCCESS
 src.test.python.apache.aurora.client.hooks.non_hooked_api  
 .   SUCCESS
 src.test.python.apache.aurora.common.test_aurora_job_key   
 .   SUCCESS
 src.test.python.apache.aurora.common.test_cluster  
 .   SUCCESS
 src.test.python.apache.aurora.common.test_cluster_option   
 .   SUCCESS
 src.test.python.apache.aurora.common.test_clusters 
 .   SUCCESS
 src.test.python.apache.aurora.common.test_http_signaler
 .   SUCCESS
 src.test.python.apache.aurora.common.test_pex_version  
 .   SUCCESS
 src.test.python.apache.aurora.common.test_shellify 
 .   SUCCESS
 src.test.python.apache.aurora.common.test_transport
 .   SUCCESS
 src.test.python.apache.aurora.config.test_base 
 .   SUCCESS
 
src.test.python.apache.aurora.config.test_constraint_parsing
.   SUCCESS
 src.test.python.apache.aurora.config.test_loader   
 .   SUCCESS
 src.test.python.apache.aurora.config.test_thrift   
 .   SUCCESS
 src.test.python.apache.aurora.executor.common.task_info
 .   SUCCESS
 src.test.python.apache.aurora.executor.executor_base   
 .   SUCCESS
 src.test.python.apache.aurora.executor.executor_detector   
 .   SUCCESS
 src.test.python.apache.aurora.executor.executor_vars   
 .   SUCCESS
 src.test.python.apache.aurora.executor.status_manager  
 .   SUCCESS
 src.test.python.apache.aurora.executor.thermos_task_runner 
 .   FAILURE
 src.test.python.apache.thermos.common.test_pathspec
 .   SUCCESS
 
src.test.python.apache.thermos.core.test_runner_integration 
.   SUCCESS
 src.test.python.apache.thermos.monitoring.test_disk
 .   SUCCESS
 
FAILURE


   FAILURE


I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28617/
 ---
 
 (Updated Feb. 4, 2015, 11:38 p.m.)
 
 
 Review request for Aurora, Kevin Sweeney and Bill Farner.
 
 
 Bugs: 

Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 5, 2015, 12:01 a.m., Bill Farner wrote:
  Can you see any opportunity to break this diff apart?  As it stands i'm 
  having a hard time giving a thoughtful review.  Perhaps you can start by 
  introducing the `Assignment` class?

I'd really prefer keeping this diff as a whole. The Assignment class would not 
make sense without the entire picture in mind.


 On Feb. 5, 2015, 12:01 a.m., Bill Farner wrote:
  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java, line 210
  https://reviews.apache.org/r/28617/diff/6/?file=849086#file849086line210
 
  What motivates you to supply these as overridden methods rather than a 
  configuration object?  The latter seems like better encapsulation.

Nothing in particular. It has grown organically this way. Perhaps it's time to 
refactor. I would like to address it separately though to avoid growing this 
diff any further. Will leave a TODO.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28617/#review71079
---


On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28617/
 ---
 
 (Updated Feb. 4, 2015, 11:38 p.m.)
 
 
 Review request for Aurora, Kevin Sweeney and Bill Farner.
 
 
 Bugs: AURORA-909
 https://issues.apache.org/jira/browse/AURORA-909
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Modified the task offer/task matching logic to skip offer matching for tasks 
 previously vetoed statically.
 
 Real life testing in vagrant (see pictures) shows close to 50% improvement in 
 task scheduling performance.
 
 Testing with JMH shows over 97% better perf when testing with disabled 
 preemptor (1 scheduling loop):
 ```
 Master
 Benchmark 
  Mode  SamplesScore  ErrorUnits
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1008291046.074 ±   145251.995  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  1007522269.050 ±   142446.265  ns/op
 
 This RB
 Benchmark 
  Mode  Samples   Score   Error Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100 204171.046 ±   3800.124  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100 215854.129 ±   8959.851  ns/op
 ```
 
 Testing with preemptor enabled and no tasks eligible for preemption gives 
 around 40% improvement (2 scheduling loops):
 ```
 Master
 Benchmark 
  Mode  Samples  Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1001767479.299 ±   26907.571  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  1001538682.287 ±  119119.911  ns/op
 
 This RB
 Benchmark 
  Mode  Samples  Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1001105731.141 ±   10040.721  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100 939230.662 ±   11091.505  ns/op
 ```
 
 Testing with preemptor enabled and running the worst case possible scenario 
 (every slave is eligible and all tasks are victims) yields the least 
 improvement 2-3% (3 scheduling loops).
 ```
 Master
 Benchmark 
  Mode  Samples  ScoreError  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100   11043701.243 ±  40550.259  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100   10478631.055 ± 178833.158  ns/op
 o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
 avgt  100  116258653.000 ± 403080.017  ns/op
 
 This RB
 Benchmark 
  Mode  Samples Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100   10886116.889 ± 193934.324  ns/op
 

Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.

2015-02-04 Thread Bill Farner


 On Feb. 5, 2015, 12:01 a.m., Bill Farner wrote:
  Can you see any opportunity to break this diff apart?  As it stands i'm 
  having a hard time giving a thoughtful review.  Perhaps you can start by 
  introducing the `Assignment` class?
 
 Maxim Khutornenko wrote:
 I'd really prefer keeping this diff as a whole. The Assignment class 
 would not make sense without the entire picture in mind.

It's okay if you commit them back-to-back, but we really need to exercise a 
pattern for incrementally building features even if they appear as code islands 
in the commit history.  Doing so makes it easier to review with confidence, and 
more likely that you get faster review turnaround.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28617/#review71079
---


On Feb. 4, 2015, 11:38 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/28617/
 ---
 
 (Updated Feb. 4, 2015, 11:38 p.m.)
 
 
 Review request for Aurora, Kevin Sweeney and Bill Farner.
 
 
 Bugs: AURORA-909
 https://issues.apache.org/jira/browse/AURORA-909
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Modified the task offer/task matching logic to skip offer matching for tasks 
 previously vetoed statically.
 
 Real life testing in vagrant (see pictures) shows close to 50% improvement in 
 task scheduling performance.
 
 Testing with JMH shows over 97% better perf when testing with disabled 
 preemptor (1 scheduling loop):
 ```
 Master
 Benchmark 
  Mode  SamplesScore  ErrorUnits
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1008291046.074 ±   145251.995  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  1007522269.050 ±   142446.265  ns/op
 
 This RB
 Benchmark 
  Mode  Samples   Score   Error Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100 204171.046 ±   3800.124  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100 215854.129 ±   8959.851  ns/op
 ```
 
 Testing with preemptor enabled and no tasks eligible for preemption gives 
 around 40% improvement (2 scheduling loops):
 ```
 Master
 Benchmark 
  Mode  Samples  Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1001767479.299 ±   26907.571  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  1001538682.287 ±  119119.911  ns/op
 
 This RB
 Benchmark 
  Mode  Samples  Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  1001105731.141 ±   10040.721  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100 939230.662 ±   11091.505  ns/op
 ```
 
 Testing with preemptor enabled and running the worst case possible scenario 
 (every slave is eligible and all tasks are victims) yields the least 
 improvement 2-3% (3 scheduling loops).
 ```
 Master
 Benchmark 
  Mode  Samples  ScoreError  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100   11043701.243 ±  40550.259  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100   10478631.055 ± 178833.158  ns/op
 o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
 avgt  100  116258653.000 ± 403080.017  ns/op
 
 This RB
 Benchmark 
  Mode  Samples Score Error  Units
 o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
avgt  100   10886116.889 ± 193934.324  ns/op
 o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
  avgt  100   10182572.955 ±  35740.891  ns/op
 o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
 avgt  100  113656994.000 ± 424163.759  ns/op
 ```
 
 
 Diffs
 -
 
 

Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.
 
 Maxim Khutornenko wrote:
 I think avoid acquisition of a write lock here is a good goal to aim 
 for.  If, for example, we are already holding the write lock for a large 
 snapshot, we could cause it to appear as though we have not received a pulse 
 for an extended duration. - makes sense, I can try to split the recording 
 part from the status update via an async action. 
 
 Also, given that there is likely to be an automated system on the other 
 end, we should be somewhat defensive to a high call frequency.  This suggests 
 to me that potentially-expensive read operations should be avoided if 
 possible. - I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK).
 
 
 It's also apparent that there is currently no asynchronous transition to 
 the 'blocked' states, they're triggered either by a tardy pulse or another 
 external action.  I think this would be surprising behavior, and the 
 scheduler should automatically transition to these states without any 
 external input. - the transition to BLOCKED state happens during the 
 evaluation loop similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature.
 
 Bill Farner wrote:
  I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK)
 
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update?
 
 I don't feel really strongly about this, but i think we should feel free 
 to question the requirements at this phase.
 
  the transition to BLOCKED state happens during the evaluation loop 
 similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature
 
 Transitions do indeed happen asynchronously when dealing with timeouts 
 (relevant code is in `InstanceActionHandler`).  These fall back to the same 
 loop, but give the updater a time after which the state should be 
 re-evaluated.

Yeah, this crossed my mind.  Perhaps we should poke at the API to see if we 
can shake those requirements out?  For example, we could just return 
OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
differentiate between a paused and finished update? I don't feel really 
strongly about this, but i think we should feel free to question the 
requirements at this phase.

I don't think this would buy us much if external service starts following up 
each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the contract 
harder to consume. If we are concerned about obuse I think having a rate 

Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Bill Farner


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.
 
 Maxim Khutornenko wrote:
 I think avoid acquisition of a write lock here is a good goal to aim 
 for.  If, for example, we are already holding the write lock for a large 
 snapshot, we could cause it to appear as though we have not received a pulse 
 for an extended duration. - makes sense, I can try to split the recording 
 part from the status update via an async action. 
 
 Also, given that there is likely to be an automated system on the other 
 end, we should be somewhat defensive to a high call frequency.  This suggests 
 to me that potentially-expensive read operations should be avoided if 
 possible. - I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK).
 
 
 It's also apparent that there is currently no asynchronous transition to 
 the 'blocked' states, they're triggered either by a tardy pulse or another 
 external action.  I think this would be surprising behavior, and the 
 scheduler should automatically transition to these states without any 
 external input. - the transition to BLOCKED state happens during the 
 evaluation loop similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature.
 
 Bill Farner wrote:
  I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK)
 
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update?
 
 I don't feel really strongly about this, but i think we should feel free 
 to question the requirements at this phase.
 
  the transition to BLOCKED state happens during the evaluation loop 
 similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature
 
 Transitions do indeed happen asynchronously when dealing with timeouts 
 (relevant code is in `InstanceActionHandler`).  These fall back to the same 
 loop, but give the updater a time after which the state should be 
 re-evaluated.
 
 Maxim Khutornenko wrote:
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update? I don't feel really 
 strongly about this, but i think we should feel free to question the 
 requirements at this phase.
 
 I don't think this would buy us much if external service starts following 
 up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the 
 contract harder to consume. If we are 

Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.
 
 Maxim Khutornenko wrote:
 I think avoid acquisition of a write lock here is a good goal to aim 
 for.  If, for example, we are already holding the write lock for a large 
 snapshot, we could cause it to appear as though we have not received a pulse 
 for an extended duration. - makes sense, I can try to split the recording 
 part from the status update via an async action. 
 
 Also, given that there is likely to be an automated system on the other 
 end, we should be somewhat defensive to a high call frequency.  This suggests 
 to me that potentially-expensive read operations should be avoided if 
 possible. - I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK).
 
 
 It's also apparent that there is currently no asynchronous transition to 
 the 'blocked' states, they're triggered either by a tardy pulse or another 
 external action.  I think this would be surprising behavior, and the 
 scheduler should automatically transition to these states without any 
 external input. - the transition to BLOCKED state happens during the 
 evaluation loop similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature.
 
 Bill Farner wrote:
  I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK)
 
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update?
 
 I don't feel really strongly about this, but i think we should feel free 
 to question the requirements at this phase.
 
  the transition to BLOCKED state happens during the evaluation loop 
 similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature
 
 Transitions do indeed happen asynchronously when dealing with timeouts 
 (relevant code is in `InstanceActionHandler`).  These fall back to the same 
 loop, but give the updater a time after which the state should be 
 re-evaluated.
 
 Maxim Khutornenko wrote:
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update? I don't feel really 
 strongly about this, but i think we should feel free to question the 
 requirements at this phase.
 
 I don't think this would buy us much if external service starts following 
 up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the 
 contract harder to consume. If we are 

Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71114
---



src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
https://reviews.apache.org/r/30225/#comment116725

I think we still want to update the last pulse time even if it's paused?


- David McLaughlin


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 4, 2015, 5:24 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 https://issues.apache.org/jira/browse/AURORA-1010
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Added pulsing support into the JobUpdateController. The qualified coordinated 
 updates get blocked until a pulse arrives. An update then becomes active and 
 proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
 terminal state (whichever comes first).
 
 Not particularly happy with plumbing through OneWayJobUpdater but the 
 alternative is a state machine change, which is much hairier and will require 
 more changes in the JobUpdaterController. Going with the minimal diff here.
 
 
 Diffs
 -
 
   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
 b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
 d479d20259f284528b2291e2e486b36d8e47fe5e 
   
 src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
  60f535988a20638fb16515d5027bfa65f1afb73c 
   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
 d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
 76460f95e058181b711fb6869f5a34c1d5bdfe31 
   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  f9c9ceddc559b43b4a5c45c745d54ff47484edde 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  ca7c0c2675477cc727ca006697665f997972dfde 
   
 src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
  89765ac3d34a827d3748fb96a275d78e9d1b8b72 
   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
 41d422939209d0808235128e4242c11e8ef25969 
   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
 
 Diff: https://reviews.apache.org/r/30225/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 30647: Instrument the HealthChecker to export stats.

2015-02-04 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/
---

(Updated Feb. 5, 2015, 1:46 a.m.)


Review request for Aurora, Joshua Cohen and Bill Farner.


Bugs: AURORA-1062
https://issues.apache.org/jira/browse/AURORA-1062


Repository: aurora


Description
---

Instrument the HealthChecker to export stats.

HealthChecker plugin now should export three stats:
  consecutive_failures: number of consecutive failures experienced (resets on 
success)
  latency: how long health checks are taking in practice
  snoozed: whether or not the health checker is snoozed


Diffs
-

  src/main/python/apache/aurora/executor/common/health_checker.py 
60676ba0fbd8a218fe4309f07de28e2c66d54530 
  src/main/python/apache/aurora/executor/common/status_checker.py 
624921d68199df098ea51ee8a10815403bf58984 
  src/test/python/apache/aurora/executor/common/test_health_checker.py 
def249c2509a28f7145380f250f79202b653dc83 

Diff: https://reviews.apache.org/r/30647/diff/


Testing
---

./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common::


Thanks,

Brian Wickman



Re: Review Request 30647: Instrument the HealthChecker to export stats.

2015-02-04 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/
---

(Updated Feb. 5, 2015, 1:45 a.m.)


Review request for Aurora, Joshua Cohen and Bill Farner.


Changes
---

Rename health checker stats to health_checker.  Add reviewers.


Repository: aurora


Description
---

Instrument the HealthChecker to export stats.

HealthChecker plugin now should export three stats:
  consecutive_failures: number of consecutive failures experienced (resets on 
success)
  latency: how long health checks are taking in practice
  snoozed: whether or not the health checker is snoozed


Diffs (updated)
-

  src/main/python/apache/aurora/executor/common/health_checker.py 
60676ba0fbd8a218fe4309f07de28e2c66d54530 
  src/main/python/apache/aurora/executor/common/status_checker.py 
624921d68199df098ea51ee8a10815403bf58984 
  src/test/python/apache/aurora/executor/common/test_health_checker.py 
def249c2509a28f7145380f250f79202b653dc83 

Diff: https://reviews.apache.org/r/30647/diff/


Testing
---

./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common::


Thanks,

Brian Wickman



Review Request 30647: Instrument the HealthChecker to export stats.

2015-02-04 Thread Brian Wickman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/
---

Review request for Aurora.


Repository: aurora


Description
---

Instrument the HealthChecker to export stats.

HealthChecker plugin now should export three stats:
  consecutive_failures: number of consecutive failures experienced (resets on 
success)
  latency: how long health checks are taking in practice
  snoozed: whether or not the health checker is snoozed


Diffs
-

  src/main/python/apache/aurora/executor/common/health_checker.py 
60676ba0fbd8a218fe4309f07de28e2c66d54530 
  src/main/python/apache/aurora/executor/common/status_checker.py 
624921d68199df098ea51ee8a10815403bf58984 
  src/test/python/apache/aurora/executor/common/test_health_checker.py 
def249c2509a28f7145380f250f79202b653dc83 

Diff: https://reviews.apache.org/r/30647/diff/


Testing
---

./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common::


Thanks,

Brian Wickman



Re: Review Request 30647: Instrument the HealthChecker to export stats.

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/#review71127
---


Master (edcc252) is red with this patch.
  ./build-support/jenkins/build.sh

 src.test.python.apache.aurora.client.cli.plugins   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.quota 
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.sla   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.supdate   
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.task  
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.update
 .   SUCCESS
 src.test.python.apache.aurora.client.cli.version   
 .   SUCCESS
 src.test.python.apache.aurora.client.config
 .   SUCCESS
 src.test.python.apache.aurora.client.hooks.hooked_api  
 .   SUCCESS
 src.test.python.apache.aurora.client.hooks.non_hooked_api  
 .   SUCCESS
 src.test.python.apache.aurora.common.test_aurora_job_key   
 .   SUCCESS
 src.test.python.apache.aurora.common.test_cluster  
 .   SUCCESS
 src.test.python.apache.aurora.common.test_cluster_option   
 .   SUCCESS
 src.test.python.apache.aurora.common.test_clusters 
 .   SUCCESS
 src.test.python.apache.aurora.common.test_http_signaler
 .   SUCCESS
 src.test.python.apache.aurora.common.test_pex_version  
 .   SUCCESS
 src.test.python.apache.aurora.common.test_shellify 
 .   SUCCESS
 src.test.python.apache.aurora.common.test_transport
 .   SUCCESS
 src.test.python.apache.aurora.config.test_base 
 .   SUCCESS
 
src.test.python.apache.aurora.config.test_constraint_parsing
.   SUCCESS
 src.test.python.apache.aurora.config.test_loader   
 .   SUCCESS
 src.test.python.apache.aurora.config.test_thrift   
 .   SUCCESS
 src.test.python.apache.aurora.executor.common.announcer
 .   SUCCESS
 
src.test.python.apache.aurora.executor.common.directory_sandbox 
.   SUCCESS
 
src.test.python.apache.aurora.executor.common.executor_timeout  
.   SUCCESS
 
src.test.python.apache.aurora.executor.common.health_checker
.   FAILURE
 src.test.python.apache.aurora.executor.common.task_info
 .   SUCCESS
 src.test.python.apache.aurora.executor.executor_base   
 .   SUCCESS
 src.test.python.apache.aurora.executor.executor_detector   
 .   SUCCESS
 src.test.python.apache.aurora.executor.executor_vars   
 .   SUCCESS
 src.test.python.apache.aurora.executor.status_manager  
 .   SUCCESS
 src.test.python.apache.aurora.executor.thermos_task_runner 
 .   SUCCESS
 src.test.python.apache.thermos.common.test_pathspec
 .   SUCCESS
 
src.test.python.apache.thermos.core.test_runner_integration 
.   SUCCESS
 src.test.python.apache.thermos.monitoring.test_disk
 .   SUCCESS
 
FAILURE


   FAILURE


I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 5, 2015, 1:46 a.m., Brian Wickman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30647/
 ---
 
 (Updated Feb. 5, 2015, 1:46 a.m.)
 
 
 Review request for Aurora, Joshua Cohen and Bill Farner.
 
 
 Bugs: 

Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/
---

(Updated Feb. 4, 2015, 6:08 p.m.)


Review request for Aurora, Joshua Cohen and Bill Farner.


Changes
---

Maxim's feedback.


Bugs: AURORA-1094
https://issues.apache.org/jira/browse/AURORA-1094


Repository: aurora


Description
---

Fix exception when opening cron urls.


Diffs (updated)
-

  src/main/python/apache/aurora/client/cli/cron.py 
3416c8e1932056725880f2007b60d77112759428 
  src/test/python/apache/aurora/client/cli/test_cron.py 
f488432cd68cc68fab8fce968e8605625ea3f56a 

Diff: https://reviews.apache.org/r/30586/diff/


Testing
---

./pants goal test src/test/python/apache/aurora/client/cli:cron


Thanks,

Zameer Manji



Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/#review71128
---


Maxim, can you review the changes to the tests?

- Zameer Manji


On Feb. 4, 2015, 6:08 p.m., Zameer Manji wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30586/
 ---
 
 (Updated Feb. 4, 2015, 6:08 p.m.)
 
 
 Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner.
 
 
 Bugs: AURORA-1094
 https://issues.apache.org/jira/browse/AURORA-1094
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fix exception when opening cron urls.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/client/cli/cron.py 
 3416c8e1932056725880f2007b60d77112759428 
   src/test/python/apache/aurora/client/cli/test_cron.py 
 f488432cd68cc68fab8fce968e8605625ea3f56a 
 
 Diff: https://reviews.apache.org/r/30586/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/cli:cron
 
 
 Thanks,
 
 Zameer Manji
 




Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/#review71140
---

Ship it!


LGTM. Minor suggestion below.


src/test/python/apache/aurora/client/cli/test_cron.py
https://reviews.apache.org/r/30586/#comment116769

Why not just `api.schedule_cron.mock_calls == [call(...)]`?


- Maxim Khutornenko


On Feb. 5, 2015, 2:08 a.m., Zameer Manji wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30586/
 ---
 
 (Updated Feb. 5, 2015, 2:08 a.m.)
 
 
 Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner.
 
 
 Bugs: AURORA-1094
 https://issues.apache.org/jira/browse/AURORA-1094
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fix exception when opening cron urls.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/client/cli/cron.py 
 3416c8e1932056725880f2007b60d77112759428 
   src/test/python/apache/aurora/client/cli/test_cron.py 
 f488432cd68cc68fab8fce968e8605625ea3f56a 
 
 Diff: https://reviews.apache.org/r/30586/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/cli:cron
 
 
 Thanks,
 
 Zameer Manji
 




Review Request 30649: Upgrade pants to 0.0.28

2015-02-04 Thread Joe Smith

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30649/
---

Review request for Aurora and Brian Wickman.


Bugs: AURORA-1104
https://issues.apache.org/jira/browse/AURORA-1104


Repository: aurora


Description
---

Upgrade pants to 0.0.28


Diffs
-

  .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 
  build-support/pants_requirements.txt fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 

Diff: https://reviews.apache.org/r/30649/diff/


Testing
---

Reviewed the diff [for 
twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194)
 and

```
$ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
```


Thanks,

Joe Smith



Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/#review71138
---

Ship it!


Master (edcc252) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 5, 2015, 2:08 a.m., Zameer Manji wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30586/
 ---
 
 (Updated Feb. 5, 2015, 2:08 a.m.)
 
 
 Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner.
 
 
 Bugs: AURORA-1094
 https://issues.apache.org/jira/browse/AURORA-1094
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fix exception when opening cron urls.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/client/cli/cron.py 
 3416c8e1932056725880f2007b60d77112759428 
   src/test/python/apache/aurora/client/cli/test_cron.py 
 f488432cd68cc68fab8fce968e8605625ea3f56a 
 
 Diff: https://reviews.apache.org/r/30586/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/cli:cron
 
 
 Thanks,
 
 Zameer Manji
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71143
---

Ship it!


Master (edcc252) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 5, 2015, 2:34 a.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 5, 2015, 2:34 a.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 https://issues.apache.org/jira/browse/AURORA-1010
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Added pulsing support into the JobUpdateController. The qualified coordinated 
 updates get blocked until a pulse arrives. An update then becomes active and 
 proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
 terminal state (whichever comes first).
 
 Not particularly happy with plumbing through OneWayJobUpdater but the 
 alternative is a state machine change, which is much hairier and will require 
 more changes in the JobUpdaterController. Going with the minimal diff here.
 
 
 Diffs
 -
 
   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
 b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
 d479d20259f284528b2291e2e486b36d8e47fe5e 
   
 src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
  60f535988a20638fb16515d5027bfa65f1afb73c 
   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
 d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
 76460f95e058181b711fb6869f5a34c1d5bdfe31 
   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  f9c9ceddc559b43b4a5c45c745d54ff47484edde 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  ca7c0c2675477cc727ca006697665f997972dfde 
   
 src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 
 92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
   
 src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
  89765ac3d34a827d3748fb96a275d78e9d1b8b72 
   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
 41d422939209d0808235128e4242c11e8ef25969 
   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
 
 Diff: https://reviews.apache.org/r/30225/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/#review71109
---

Ship it!


Ship It!

- Bill Farner


On Feb. 4, 2015, 1:14 a.m., Zameer Manji wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30586/
 ---
 
 (Updated Feb. 4, 2015, 1:14 a.m.)
 
 
 Review request for Aurora, Joshua Cohen and Bill Farner.
 
 
 Bugs: AURORA-1094
 https://issues.apache.org/jira/browse/AURORA-1094
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fix exception when opening cron urls.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/client/cli/cron.py 
 3416c8e1932056725880f2007b60d77112759428 
   src/test/python/apache/aurora/client/cli/test_cron.py 
 f488432cd68cc68fab8fce968e8605625ea3f56a 
 
 Diff: https://reviews.apache.org/r/30586/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/cli:cron
 
 
 Thanks,
 
 Zameer Manji
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread David McLaughlin


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.
 
 Maxim Khutornenko wrote:
 I think avoid acquisition of a write lock here is a good goal to aim 
 for.  If, for example, we are already holding the write lock for a large 
 snapshot, we could cause it to appear as though we have not received a pulse 
 for an extended duration. - makes sense, I can try to split the recording 
 part from the status update via an async action. 
 
 Also, given that there is likely to be an automated system on the other 
 end, we should be somewhat defensive to a high call frequency.  This suggests 
 to me that potentially-expensive read operations should be avoided if 
 possible. - I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK).
 
 
 It's also apparent that there is currently no asynchronous transition to 
 the 'blocked' states, they're triggered either by a tardy pulse or another 
 external action.  I think this would be surprising behavior, and the 
 scheduler should automatically transition to these states without any 
 external input. - the transition to BLOCKED state happens during the 
 evaluation loop similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature.
 
 Bill Farner wrote:
  I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK)
 
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update?
 
 I don't feel really strongly about this, but i think we should feel free 
 to question the requirements at this phase.
 
  the transition to BLOCKED state happens during the evaluation loop 
 similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature
 
 Transitions do indeed happen asynchronously when dealing with timeouts 
 (relevant code is in `InstanceActionHandler`).  These fall back to the same 
 loop, but give the updater a time after which the state should be 
 re-evaluated.
 
 Maxim Khutornenko wrote:
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update? I don't feel really 
 strongly about this, but i think we should feel free to question the 
 requirements at this phase.
 
 I don't think this would buy us much if external service starts following 
 up each pulse that returns UPDATE_NOT_ACTIVE. It would just makes the 
 contract harder to consume. If we are 

Re: Review Request 30647: Instrument the HealthChecker to export stats.

2015-02-04 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30647/#review71137
---



src/test/python/apache/aurora/executor/common/test_health_checker.py
https://reviews.apache.org/r/30647/#comment116768

Any chance to have test coverage for the other two metrics?


- Maxim Khutornenko


On Feb. 5, 2015, 1:46 a.m., Brian Wickman wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30647/
 ---
 
 (Updated Feb. 5, 2015, 1:46 a.m.)
 
 
 Review request for Aurora, Joshua Cohen and Bill Farner.
 
 
 Bugs: AURORA-1062
 https://issues.apache.org/jira/browse/AURORA-1062
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Instrument the HealthChecker to export stats.
 
 HealthChecker plugin now should export three stats:
   consecutive_failures: number of consecutive failures experienced (resets on 
 success)
   latency: how long health checks are taking in practice
   snoozed: whether or not the health checker is snoozed
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/executor/common/health_checker.py 
 60676ba0fbd8a218fe4309f07de28e2c66d54530 
   src/main/python/apache/aurora/executor/common/status_checker.py 
 624921d68199df098ea51ee8a10815403bf58984 
   src/test/python/apache/aurora/executor/common/test_health_checker.py 
 def249c2509a28f7145380f250f79202b653dc83 
 
 Diff: https://reviews.apache.org/r/30647/diff/
 
 
 Testing
 ---
 
 ./pants test.pytest --no-fast src/test/python/apache/aurora/executor/common::
 
 
 Thanks,
 
 Brian Wickman
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  api/src/main/thrift/org/apache/aurora/gen/api.thrift, line 565
  https://reviews.apache.org/r/30225/diff/4/?file=848240#file848240line565
 
  Maybe s/BLOCKED/AWAITING_PULSE/?
  
  That would at least self-document and avoid additional terminology.

Renamed.


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java, line 
  49
  https://reviews.apache.org/r/30225/diff/4/?file=848242#file848242line49
 
  s/query/fetch/ to be consistent with the method above.

I started with fetch but that resulted in an overloaded method that I did not 
quite liked. Changed it back.


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 106
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line106
 
  Remove Optional here, do Optional.of() at the call site.

Removed.


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 122
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line122
 
  TODO + ticket, this map needs to be exposed via a debug endpoint

Done.


 On Feb. 4, 2015, 11:34 p.m., Bill Farner wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 273
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line273
 
  I think avoid acquisition of a write lock here is a good goal to aim 
  for.  If, for example, we are already holding the write lock for a large 
  snapshot, we could cause it to appear as though we have not received a 
  pulse for an extended duration.
  
  Also, given that there is likely to be an automated system on the other 
  end, we should be somewhat defensive to a high call frequency.  This 
  suggests to me that potentially-expensive read operations should be avoided 
  if possible.
  
  For these reasons, i think it would be wise to decouple recording of 
  pulses from reacting to them.  It's also apparent that there is currently 
  no asynchronous transition to the 'blocked' states, they're triggered 
  either by a tardy pulse or another external action.  I think this would be 
  surprising behavior, and the scheduler should automatically transition to 
  these states without any external input.
 
 Maxim Khutornenko wrote:
 I think avoid acquisition of a write lock here is a good goal to aim 
 for.  If, for example, we are already holding the write lock for a large 
 snapshot, we could cause it to appear as though we have not received a pulse 
 for an extended duration. - makes sense, I can try to split the recording 
 part from the status update via an async action. 
 
 Also, given that there is likely to be an automated system on the other 
 end, we should be somewhat defensive to a high call frequency.  This suggests 
 to me that potentially-expensive read operations should be avoided if 
 possible. - I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK).
 
 
 It's also apparent that there is currently no asynchronous transition to 
 the 'blocked' states, they're triggered either by a tardy pulse or another 
 external action.  I think this would be surprising behavior, and the 
 scheduler should automatically transition to these states without any 
 external input. - the transition to BLOCKED state happens during the 
 evaluation loop similar to other internally triggered state transitions 
 (FAILED|ERROR|ROLL_FORWARD_COMPLETED and etc.). Transitions to these states 
 do not happen instantenously and are triggered by instance activities (i.e. 
 task state change events). I don't see why BLOCKED state should be any 
 different. The update is not technically alive until the evaluation event 
 arrives. Changing this behavior just for heartbeats is not worth the extra 
 complexity and will be inconsistent with the rest of the feature.
 
 Bill Farner wrote:
  I don't see how this would be possible without breaking the RPC 
 contract as I still have to pull JobUpdateDetails in order to respond to the 
 pulse with proper status message (FINISHED|PAUSED|OK)
 
 Yeah, this crossed my mind.  Perhaps we should poke at the API to see if 
 we can shake those requirements out?  For example, we could just return 
 OK/UPDATE_NOT_ACTIVE, and the caller should follow up with a query to 
 differentiate between a paused and finished update?
 
 I don't feel really strongly about this, but i think we should feel free 
 to question the requirements at this phase.
 
  the transition to BLOCKED state happens during the evaluation loop 
 similar to other internally triggered state transitions 
 

Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko


 On Feb. 5, 2015, 1:15 a.m., David McLaughlin wrote:
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java,
   line 293
  https://reviews.apache.org/r/30225/diff/4/?file=848246#file848246line293
 
  I think we still want to update the last pulse time even if it's paused?

Makes sense. A paused update is still technically active. Changed to support 
PAUSED pulse.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/#review71114
---


On Feb. 4, 2015, 5:24 p.m., Maxim Khutornenko wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30225/
 ---
 
 (Updated Feb. 4, 2015, 5:24 p.m.)
 
 
 Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.
 
 
 Bugs: AURORA-1010
 https://issues.apache.org/jira/browse/AURORA-1010
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Added pulsing support into the JobUpdateController. The qualified coordinated 
 updates get blocked until a pulse arrives. An update then becomes active and 
 proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
 terminal state (whichever comes first).
 
 Not particularly happy with plumbing through OneWayJobUpdater but the 
 alternative is a state machine change, which is much hairier and will require 
 more changes in the JobUpdaterController. Going with the minimal diff here.
 
 
 Diffs
 -
 
   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
 4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
   src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
 7f2ec71d744d5fafac84291cc79feba3398d0e1e 
   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
 b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
   src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
 d479d20259f284528b2291e2e486b36d8e47fe5e 
   
 src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
  60f535988a20638fb16515d5027bfa65f1afb73c 
   src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
 d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
  ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
   
 src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
 76460f95e058181b711fb6869f5a34c1d5bdfe31 
   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
 27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
   
 src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
  f9c9ceddc559b43b4a5c45c745d54ff47484edde 
   
 src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java
  ca7c0c2675477cc727ca006697665f997972dfde 
   
 src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
  89765ac3d34a827d3748fb96a275d78e9d1b8b72 
   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
 41d422939209d0808235128e4242c11e8ef25969 
   src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
 7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 
 
 Diff: https://reviews.apache.org/r/30225/diff/
 
 
 Testing
 ---
 
 ./gradlew -Pq build
 
 
 Thanks,
 
 Maxim Khutornenko
 




Re: Review Request 30225: Modifying update controller to support heartbeats.

2015-02-04 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30225/
---

(Updated Feb. 5, 2015, 2:34 a.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Bill Farner.


Changes
---

Bill's and David's feedback.


Bugs: AURORA-1010
https://issues.apache.org/jira/browse/AURORA-1010


Repository: aurora


Description
---

Added pulsing support into the JobUpdateController. The qualified coordinated 
updates get blocked until a pulse arrives. An update then becomes active and 
proceeds until `blockIfNoPulsesAfterMs` expires or the update reaches a 
terminal state (whichever comes first).

Not particularly happy with plumbing through OneWayJobUpdater but the 
alternative is a state machine change, which is much hairier and will require 
more changes in the JobUpdaterController. Going with the minimal diff here.


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
4f603f9e7ed004e53937a45ea8edf7241e15f5cf 
  src/main/java/org/apache/aurora/scheduler/storage/ForwardingStore.java 
7f2ec71d744d5fafac84291cc79feba3398d0e1e 
  src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
b7d8d524e15f101416889c00efc3ecf2c8d9c1a1 
  src/main/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStore.java 
d479d20259f284528b2291e2e486b36d8e47fe5e 
  
src/main/java/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.java
 60f535988a20638fb16515d5027bfa65f1afb73c 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateController.java 
d3b30d48b76d8d7c64cda006a34f7ed3296526f2 
  
src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java 
ec9d1e07abca1bf30262e1c0f741a34741e96f5d 
  src/main/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachine.java 
76460f95e058181b711fb6869f5a34c1d5bdfe31 
  src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
27a5b9026f5ac3b3bdeb32813b10435bc3dab173 
  
src/main/resources/org/apache/aurora/scheduler/storage/db/JobUpdateDetailsMapper.xml
 f9c9ceddc559b43b4a5c45c745d54ff47484edde 
  
src/test/java/org/apache/aurora/scheduler/storage/db/DBJobUpdateStoreTest.java 
ca7c0c2675477cc727ca006697665f997972dfde 
  src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 
92cfa2b30c1c4daa3ae225fc8609fbeaecdaff7a 
  
src/test/java/org/apache/aurora/scheduler/updater/JobUpdateStateMachineTest.java
 89765ac3d34a827d3748fb96a275d78e9d1b8b72 
  src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
41d422939209d0808235128e4242c11e8ef25969 
  src/test/java/org/apache/aurora/scheduler/updater/OneWayJobUpdaterTest.java 
7d0a7438b4a517e5e0d44f4e99aceb1a6d19f987 

Diff: https://reviews.apache.org/r/30225/diff/


Testing
---

./gradlew -Pq build


Thanks,

Maxim Khutornenko



Re: Review Request 28617: Implemented offer filtering for tasks with static vetoes.

2015-02-04 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28617/
---

(Updated Feb. 4, 2015, 11:38 p.m.)


Review request for Aurora, Kevin Sweeney and Bill Farner.


Changes
---

Rebased and modified benchmarks to support victim-less preemption run.


Bugs: AURORA-909
https://issues.apache.org/jira/browse/AURORA-909


Repository: aurora


Description (updated)
---

Modified the task offer/task matching logic to skip offer matching for tasks 
previously vetoed statically.

Real life testing in vagrant (see pictures) shows close to 50% improvement in 
task scheduling performance.

Testing with JMH shows over 97% better perf when testing with disabled 
preemptor (1 scheduling loop):
```
Master
Benchmark   
   Mode  SamplesScore  ErrorUnits
o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
   avgt  1008291046.074 ±   145251.995  ns/op
o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
 avgt  1007522269.050 ±   142446.265  ns/op

This RB
Benchmark   
   Mode  Samples   Score   Error Units
o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
   avgt  100 204171.046 ±   3800.124  ns/op
o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
 avgt  100 215854.129 ±   8959.851  ns/op
```

Testing with preemptor enabled and no tasks eligible for preemption gives 
around 40% improvement (2 scheduling loops):
```
Master
Benchmark   
   Mode  Samples  Score Error  Units
o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
   avgt  1001767479.299 ±   26907.571  ns/op
o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
 avgt  1001538682.287 ±  119119.911  ns/op

This RB
Benchmark   
   Mode  Samples  Score Error  Units
o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
   avgt  1001105731.141 ±   10040.721  ns/op
o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
 avgt  100 939230.662 ±   11091.505  ns/op
```

Testing with preemptor enabled and running the worst case possible scenario 
(every slave is eligible and all tasks are victims) yields the least 
improvement 2-3% (3 scheduling loops).
```
Master
Benchmark   
   Mode  Samples  ScoreError  Units
o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
   avgt  100   11043701.243 ±  40550.259  ns/op
o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
 avgt  100   10478631.055 ± 178833.158  ns/op
o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
avgt  100  116258653.000 ± 403080.017  ns/op

This RB
Benchmark   
   Mode  Samples Score Error  Units
o.a.a.b.SchedulingBenchmarks.ConstraintMismatchsSchedulingBenchmark.runBenchmark
   avgt  100   10886116.889 ± 193934.324  ns/op
o.a.a.b.SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark
 avgt  100   10182572.955 ±  35740.891  ns/op
o.a.a.b.SchedulingBenchmarks.PreemptorFallbackForLargeClusterBenchmark.runBenchmark
avgt  100  113656994.000 ± 424163.759  ns/op
```


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
8c11ef8bd6609f3e4d97ca154d922898f8362446 
  src/jmh/java/org/apache/aurora/benchmark/Tasks.java 
1a35f9ee9e8e76def0f9bf5454cf8cbdf6a89c25 
  src/main/java/org/apache/aurora/scheduler/TaskVars.java 
f017cdd26ca40138a7e141f21613ed567314c399 
  src/main/java/org/apache/aurora/scheduler/async/OfferQueue.java 
f66383830140e5eaba436f35ebb5192eee65947a 
  src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java 
ce47ff152e303fd2116bc3b9e91c0c1a8f76f258 
  src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilter.java 
6a43bcd1719e8aa32fd3fcb7387d0318c3c0b804 
  src/main/java/org/apache/aurora/scheduler/filter/SchedulingFilterImpl.java 
f06fdaeb92e154d0982bdabed5df93e7bcba9048 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
e1c29747c9854cf75bf63f6f085cf40ca68989af 
  src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 

Re: Review Request 30650: Upgrade virtualenv to 12.0.7

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30650/#review71171
---

Ship it!


Ship It!

- Zameer Manji


On Feb. 4, 2015, 8:13 p.m., Joe Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30650/
 ---
 
 (Updated Feb. 4, 2015, 8:13 p.m.)
 
 
 Review request for Aurora, Brian Wickman and Zameer Manji.
 
 
 Bugs: AURORA-979
 https://issues.apache.org/jira/browse/AURORA-979
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Upgrade virtualenv to 12.0.7
 
 virtualenv
 ==
 
 [12.0.7 changelog](https://pypi.python.org/pypi/virtualenv/12.0.7) and 
 selected highlights:
 
 * Upgrade pip to 6.0.8
 
 * Upgrade setuptools to 12.0.5
 
 * Revert several sys.path changes new in 12.0 which were breaking virtualenv.
 
 * **PROCESS** Version numbers are now simply ``X.Y`` where the leading ``1``
 has been dropped.
 * Now using pytest framework
 * Correct sys.path ordering for debian, issue #461
 * Correctly throws error on older Pythons, issue #619
 * Allow for empty $PATH, pull #601
 * Don't set prompt if $env:VIRTUAL_ENV_DISABLE_PROMPT is set for Powershell
 * Updated setuptools to 7.0
 
 
 Diffs
 -
 
   build-support/virtualenv 0175f0e38e5f995ff4e8335fb86ec14869894e69 
 
 Diff: https://reviews.apache.org/r/30650/diff/
 
 
 Testing
 ---
 
 ```sh
 $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
 ```
 
 
 Thanks,
 
 Joe Smith
 




Re: Review Request 30649: Upgrade pants to 0.0.28

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30649/#review71172
---

Ship it!


Ship It!

- Zameer Manji


On Feb. 4, 2015, 8:13 p.m., Joe Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30649/
 ---
 
 (Updated Feb. 4, 2015, 8:13 p.m.)
 
 
 Review request for Aurora, Brian Wickman and Zameer Manji.
 
 
 Bugs: AURORA-1104
 https://issues.apache.org/jira/browse/AURORA-1104
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Upgrade pants to 0.0.28
 
 pants
 =
 
 [0.0.28 changelog](https://pypi.python.org/pypi/pantsbuild.pants/0.0.28) and 
 selected highlights:
 
 * bump virtualenv version to 12.0.5 RB #1621
 * Make ‘setup-py’ show up in ‘./pants goal goals’ RB #1466
 * Support use of pytest’s –pdb mode RB #1570
 * Upgrade pex dependency to 0.8.4
 
 Pick up several perf wins
 Pick up fix that allows pex to read older pexes
 RB #1648 RB #1693
 * Refactor setting of PYTHONPATH in pants.ini RB #1586
 
 
 Diffs
 -
 
   .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 
   build-support/pants_requirements.txt 
 fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 
 
 Diff: https://reviews.apache.org/r/30649/diff/
 
 
 Testing
 ---
 
 Reviewed the diff [for 
 twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194)
 
 ```
 $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
 ```
 
 
 Thanks,
 
 Joe Smith
 




Re: Review Request 30586: Fix exception when opening cron urls.

2015-02-04 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30586/#review71150
---



src/test/python/apache/aurora/client/cli/test_cron.py
https://reviews.apache.org/r/30586/#comment116782

The config object in the argument is very complex and difficult to create. 
With our current set of test fixtures.


- Zameer Manji


On Feb. 4, 2015, 6:08 p.m., Zameer Manji wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30586/
 ---
 
 (Updated Feb. 4, 2015, 6:08 p.m.)
 
 
 Review request for Aurora, Joshua Cohen, Maxim Khutornenko, and Bill Farner.
 
 
 Bugs: AURORA-1094
 https://issues.apache.org/jira/browse/AURORA-1094
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Fix exception when opening cron urls.
 
 
 Diffs
 -
 
   src/main/python/apache/aurora/client/cli/cron.py 
 3416c8e1932056725880f2007b60d77112759428 
   src/test/python/apache/aurora/client/cli/test_cron.py 
 f488432cd68cc68fab8fce968e8605625ea3f56a 
 
 Diff: https://reviews.apache.org/r/30586/diff/
 
 
 Testing
 ---
 
 ./pants goal test src/test/python/apache/aurora/client/cli:cron
 
 
 Thanks,
 
 Zameer Manji
 




Review Request 30650: Upgrade virtualenv to 12.0.7

2015-02-04 Thread Joe Smith

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30650/
---

Review request for Aurora, Brian Wickman and Zameer Manji.


Bugs: AURORA-979
https://issues.apache.org/jira/browse/AURORA-979


Repository: aurora


Description
---

Upgrade virtualenv to 12.0.7

virtualenv
==

[12.0.7 changelog](https://pypi.python.org/pypi/virtualenv/12.0.7) and selected 
highlights:

* Upgrade pip to 6.0.8

* Upgrade setuptools to 12.0.5

* Revert several sys.path changes new in 12.0 which were breaking virtualenv.

* **PROCESS** Version numbers are now simply ``X.Y`` where the leading ``1``
has been dropped.
* Now using pytest framework
* Correct sys.path ordering for debian, issue #461
* Correctly throws error on older Pythons, issue #619
* Allow for empty $PATH, pull #601
* Don't set prompt if $env:VIRTUAL_ENV_DISABLE_PROMPT is set for Powershell
* Updated setuptools to 7.0


Diffs
-

  build-support/virtualenv 0175f0e38e5f995ff4e8335fb86ec14869894e69 

Diff: https://reviews.apache.org/r/30650/diff/


Testing
---

```sh
$ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
```


Thanks,

Joe Smith



Re: Review Request 30650: Upgrade virtualenv to 12.0.7

2015-02-04 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30650/#review71157
---

Ship it!


Master (1c78721) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing @ReviewBot 
retry

- Aurora ReviewBot


On Feb. 5, 2015, 4:13 a.m., Joe Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30650/
 ---
 
 (Updated Feb. 5, 2015, 4:13 a.m.)
 
 
 Review request for Aurora, Brian Wickman and Zameer Manji.
 
 
 Bugs: AURORA-979
 https://issues.apache.org/jira/browse/AURORA-979
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Upgrade virtualenv to 12.0.7
 
 virtualenv
 ==
 
 [12.0.7 changelog](https://pypi.python.org/pypi/virtualenv/12.0.7) and 
 selected highlights:
 
 * Upgrade pip to 6.0.8
 
 * Upgrade setuptools to 12.0.5
 
 * Revert several sys.path changes new in 12.0 which were breaking virtualenv.
 
 * **PROCESS** Version numbers are now simply ``X.Y`` where the leading ``1``
 has been dropped.
 * Now using pytest framework
 * Correct sys.path ordering for debian, issue #461
 * Correctly throws error on older Pythons, issue #619
 * Allow for empty $PATH, pull #601
 * Don't set prompt if $env:VIRTUAL_ENV_DISABLE_PROMPT is set for Powershell
 * Updated setuptools to 7.0
 
 
 Diffs
 -
 
   build-support/virtualenv 0175f0e38e5f995ff4e8335fb86ec14869894e69 
 
 Diff: https://reviews.apache.org/r/30650/diff/
 
 
 Testing
 ---
 
 ```sh
 $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
 ```
 
 
 Thanks,
 
 Joe Smith
 




Re: Review Request 30649: Upgrade pants to 0.0.28

2015-02-04 Thread Joe Smith


 On Feb. 4, 2015, 7:39 p.m., Zameer Manji wrote:
  Please put the pants release notes in the commit. Or at least some sort of 
  summary that explains what is new.

Done.


- Joe


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30649/#review71148
---


On Feb. 4, 2015, 8:07 p.m., Joe Smith wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30649/
 ---
 
 (Updated Feb. 4, 2015, 8:07 p.m.)
 
 
 Review request for Aurora and Brian Wickman.
 
 
 Bugs: AURORA-1104
 https://issues.apache.org/jira/browse/AURORA-1104
 
 
 Repository: aurora
 
 
 Description
 ---
 
 Upgrade pants to 0.0.28
 
 pants
 =
 
 [0.0.28 changelog](https://pypi.python.org/pypi/pantsbuild.pants/0.0.28) and 
 selected highlights:
 
 * bump virtualenv version to 12.0.5 RB #1621
 * Make ‘setup-py’ show up in ‘./pants goal goals’ RB #1466
 * Support use of pytest’s –pdb mode RB #1570
 * Upgrade pex dependency to 0.8.4
 
 Pick up several perf wins
 Pick up fix that allows pex to read older pexes
 RB #1648 RB #1693
 * Refactor setting of PYTHONPATH in pants.ini RB #1586
 
 
 Diffs
 -
 
   .pantsversion 24ff85581f81976c5f70fe1a8c3c0f62b5275c91 
   build-support/pants_requirements.txt 
 fb9a2d2dd5f17b3523eea1e4a7a77d69feddff39 
 
 Diff: https://reviews.apache.org/r/30649/diff/
 
 
 Testing
 ---
 
 Reviewed the diff [for 
 twitter/commons](https://github.com/twitter/commons/commit/574d79891d74a3a142e1db6b085b4c7e13c67659#diff-570e3786705ee6965f4f2f72c715382bR194)
 
 ```
 $ ./pants test.pytest --no-fast ./src/test/python/apache/aurora:all
 ```
 
 
 Thanks,
 
 Joe Smith