Re: Review Request 68110: Update URLS for git repositories

2018-07-30 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68110/#review206602
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On July 30, 2018, 10:36 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68110/
> ---
> 
> (Updated July 30, 2018, 10:36 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, and Santhosh 
> Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Update URLS for git repositories.
> 
> Updating the references from the old git URL to the new gitbox one. We 
> probably also need to update the documentation for submitting reviews to 
> allow GitHub.
> 
> Testing if RBs still work :)
> 
> 
> Diffs
> -
> 
>   CONTRIBUTING.md daf95bc94befcf425da2fd32fffcdaec93f3706f 
>   docs/development/committers-guide.md 
> 2650f19d057cd17b62b80833dc4a53f7f5398edf 
>   ui/package.json 567dd78a359ec4a9f167689648decb108a6247cf 
> 
> 
> Diff: https://reviews.apache.org/r/68110/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 68071: Prune updates that have no surviving job keys in the TaskStore

2018-07-26 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68071/#review206521
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On July 26, 2018, 3:04 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68071/
> ---
> 
> (Updated July 26, 2018, 3:04 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We are running into a situation where we have a lot of short-lived ad-hoc 
> services launched and their updates are sticking around for 30 days, even 
> though the tasks are garbage collected much sooner. This change picks up 
> those updates and prunes them as soon as the tasks are gone.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/pruning/JobUpdateHistoryPruner.java 
> 05ada3ccba8facc63d86736199b741bfcaca9697 
>   
> src/test/java/org/apache/aurora/scheduler/pruning/JobUpdateHistoryPrunerTest.java
>  a1bf04ab8206fc0ca301d4b1b1cbe854df209bbe 
> 
> 
> Diff: https://reviews.apache.org/r/68071/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>
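
For illustration, the pruning rule described in the request above could be sketched roughly as follows. This is a Python sketch, not the Java change under review; `JobKey`, `updates_to_prune`, and the dictionary shape are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobKey:
    # Hypothetical stand-in for a job key (role/environment/name).
    role: str
    env: str
    name: str


def updates_to_prune(update_job_keys, task_job_keys):
    """Return the IDs of updates whose job key has no remaining task.

    update_job_keys: dict mapping update ID -> JobKey (assumed shape).
    task_job_keys: iterable of JobKeys that still have tasks in the task store.
    """
    surviving = set(task_job_keys)
    # An update is prunable as soon as no task for its job survives,
    # regardless of how old the update itself is.
    return sorted(uid for uid, key in update_job_keys.items()
                  if key not in surviving)
```

The age-based retention (30 days in the description) would still apply to updates whose jobs have surviving tasks; the sketch only covers the new early-pruning case.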



Re: Review Request 68047: Add size metric for memory stores, add MemSchedulerStoreTest

2018-07-25 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68047/#review206481
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On July 25, 2018, 3:37 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68047/
> ---
> 
> (Updated July 25, 2018, 3:37 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Renan DelValle, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Currently, we only track the size metrics for:
> - # of tasks via `task_store_index_(host|job)`
> - # of crons via `mem_storage_cron_size`
> 
> I am hoping to add:
> - # of attributes via `mem_storage_attributes_size`
> - # of maintenance requests via `mem_storage_maintenance_size`
> - # of job updates via `mem_storage_update_size`
> - # of quotas via `mem_storage_quota_size`
> 
> This will help us track the growth of stores over time. Additionally, I added 
> a `MemSchedulerStoreTest` since one did not exist previously and nothing was 
> extending the abstract version of the test.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemAttributeStore.java 
> 67684cfd9c17c6a86999a66dbd4dd9c2ef9a9938 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemCronJobStore.java 
> a1e1f1ef7ab3bc1b0082c31c860f144f95e78fae 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  c8d96f2cfd2fdcf8d80fd089b032dbd14a1e72b9 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java 
> 9e86b9e276ea90a249284a824705b5bbf19dcbce 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemQuotaStore.java 
> afb29fce68c432a2155d3e503d35e10c58b262be 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractAttributeStoreTest.java
>  687fd963a5e782c7892b2cbbbcaf283653aed30f 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractCronJobStoreTest.java
>  889cb01a0f1845ebddc44d6d737228f32665a628 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
>  e95955c29c28d2c4474debcde2ad3fa2a9047578 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractJobUpdateStoreTest.java
>  6af66aa79788c07b006f163b4546c25b8ff36012 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractQuotaStoreTest.java 
> e1d7da51503a1ea1c1748e14e439b81109b45047 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractSchedulerStoreTest.java
>  cb8051927b66676a1e7afbbd9e1a3d10a037f429 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractTaskStoreTest.java 
> c53e5847a673e398413b80fbd1a9bde9c3774cab 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemAttributeStoreTest.java
>  64b19a9f2ca7c60c983b4dac1704c3056513fc2b 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemCronJobStoreTest.java
>  15e0e309f6b92ffb8b268b5e0e94e81054ee7a2c 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
>  ce1a9d6a0749df465129fc8fc22b9483551ca02a 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStoreTest.java
>  cbf1bc4fe4a9ca814cd948cc99b9107c0be59615 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemQuotaStoreTest.java 
> e8324eecafd91789b6ddee24300e59399641a05e 
>   
> src/test/java/org/apache/aurora/scheduler/storage/mem/MemSchedulerStoreTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/68047/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> 
> I will deploy to vagrant and ensure the new metrics are being recorded.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>
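
As a rough illustration of the size metrics proposed above, a store can export a gauge that reports its current entry count whenever stats are sampled. The sketch below is Python rather than the Java under review; the `MemQuotaStore` toy class, the `stats` dictionary, and `sample` are assumptions made for the example, and only the metric name `mem_storage_quota_size` comes from the request.

```python
class MemQuotaStore:
    """Toy in-memory store; the stores in the review are Java classes."""

    def __init__(self, stats):
        self._quotas = {}
        # Register a gauge: a callable evaluated each time stats are sampled,
        # so the metric always reflects the current size of the store.
        stats["mem_storage_quota_size"] = lambda: len(self._quotas)

    def save_quota(self, role, quota):
        self._quotas[role] = quota

    def remove_quota(self, role):
        self._quotas.pop(role, None)


def sample(stats):
    """Evaluate every registered gauge and return a name -> value snapshot."""
    return {name: gauge() for name, gauge in stats.items()}
```

Sampling the gauge lazily avoids having to update a counter on every save and remove, at the cost of computing the size on each read.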



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Santhosh Kumar Shanmugham


> On July 18, 2018, 2:21 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py
> > Lines 159 (patched)
> > <https://reviews.apache.org/r/67967/diff/1/?file=2061542#file2061542line159>
> >
> > Should we use TASK_LOST here instead? Most users interpret TASK_FAILED 
> > as their responsibility whereas TASK_LOST is more of a mishap of 
> > Aurora/Mesos/Thermos. I would think an unknown exception in the runner is 
> > part of the latter category.
> 
> Santhosh Kumar Shanmugham wrote:
> Hmm. Then we could argue that a failure to create the sandbox, fork the 
> process, etc. should also be treated as TASK_LOST? At Twitter this is really 
> not going to help us, since we have platform wrappers that cause TASK_FAILED 
> and it is already hard to differentiate user configuration failures from 
> platform dependency failures.
> 
> I wanted to keep this consistent with the rest. If TASK_LOST makes more 
> sense for you I can update it.
> 
> Stephan Erb wrote:
> You make a good point. Let's keep it at FAILED. If really necessary, I 
> could always come back later with a more complete proposal.

I think differentiating user failures from platform failures needs a whole 
lot of cleanup in the executor codebase. We have been putting this work off 
for some time, but we are starting to realize that we need this data to make 
better decisions.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
-------


On July 18, 2018, 1:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 1:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle exceptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown exception that is thrown when starting
> a runner leads to task failure and the task gets marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Santhosh Kumar Shanmugham


> On July 18, 2018, 2:21 p.m., Stephan Erb wrote:
> > src/main/python/apache/aurora/executor/aurora_executor.py
> > Lines 159 (patched)
> > <https://reviews.apache.org/r/67967/diff/1/?file=2061542#file2061542line159>
> >
> > Should we use TASK_LOST here instead? Most users interpret TASK_FAILED 
> > as their responsibility whereas TASK_LOST is more of a mishap of 
> > Aurora/Mesos/Thermos. I would think an unknown exception in the runner is 
> > part of the latter category.

Hmm. Then we could argue that a failure to create the sandbox, fork the 
process, etc. should also be treated as TASK_LOST? At Twitter this is really 
not going to help us, since we have platform wrappers that cause TASK_FAILED 
and it is already hard to differentiate user configuration failures from 
platform dependency failures.

I wanted to keep this consistent with the rest. If TASK_LOST makes more sense 
for you I can update it.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/#review206209
-------


On July 18, 2018, 1:27 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67967/
> ---
> 
> (Updated July 18, 2018, 1:27 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> If the ThermoTaskRunner encounters an Exception when trying to
> fork the process, it bubbles this up to the Executor which does
> not handle exceptions other than TaskError. This leads to the
> executor leaving the task in STARTING state and we end up with
> tasks that get stranded in this state.
> 
> Fix it so that any unknown exception that is thrown when starting
> a runner leads to task failure and the task gets marked as FAILED.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 8a9958fffc2312686dccc7daf6d216631d4c956e 
>   src/test/python/apache/aurora/executor/test_thermos_executor.py 
> f6ae1be5d56bfd845bd09db67efa92091136 
> 
> 
> Diff: https://reviews.apache.org/r/67967/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./pants test src/test/python/apache::
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Review Request 67967: Unhandled exception should not strand runner in STARTING state.

2018-07-18 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67967/
---

Review request for Aurora, David McLaughlin, Jordan Ly, Reza Motamedi, and 
Stephan Erb.


Repository: aurora


Description
---

If the ThermoTaskRunner encounters an Exception when trying to
fork the process, it bubbles this up to the Executor which does
not handle exceptions other than TaskError. This leads to the
executor leaving the task in STARTING state and we end up with
tasks that get stranded in this state.

Fix it so that any unknown exception that is thrown when starting
a runner leads to task failure and the task gets marked as FAILED.


Diffs
-

  src/main/python/apache/aurora/executor/aurora_executor.py 
8a9958fffc2312686dccc7daf6d216631d4c956e 
  src/test/python/apache/aurora/executor/test_thermos_executor.py 
f6ae1be5d56bfd845bd09db67efa92091136 


Diff: https://reviews.apache.org/r/67967/diff/1/


Testing
---

./gradlew test
./pants test src/test/python/apache::


Thanks,

Santhosh Kumar Shanmugham
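
The behaviour described in this request can be sketched roughly as below. This is a simplified illustration, not the code in `aurora_executor.py`: `start_runner`, the `send_update` callback, the status strings, and the message text are all assumptions for the example; only `TaskError` and the STARTING/FAILED states are named in the description.

```python
class TaskError(Exception):
    """Stand-in for the executor's known, expected failure type."""


STARTING, RUNNING, FAILED = "STARTING", "RUNNING", "FAILED"


def start_runner(runner_start, send_update):
    """Start a runner and always report a terminal state on failure.

    runner_start: zero-argument callable that starts the runner.
    send_update: callable(status, message=None) that reports task status.
    """
    send_update(STARTING)
    try:
        runner_start()
    except TaskError as e:
        # Known failure mode: already handled before the fix.
        send_update(FAILED, "Task initialization failed: %s" % e)
        return False
    except Exception as e:
        # Without this branch an unexpected exception propagates and the
        # task is left stranded in STARTING.
        send_update(FAILED, "Unknown exception starting runner: %s" % e)
        return False
    send_update(RUNNING)
    return True
```

Catching the broad `Exception` here is deliberate: the goal is that no failure while starting the runner can leave the task without a terminal status.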



Re: Review Request 67696: Enable SLA-aware updates

2018-07-17 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review206137
---


Fix it, then Ship it!




LGTM. Minor comments.


src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java
Lines 212-216 (patched)
<https://reviews.apache.org/r/67696/#comment289042>

It would be better to hard-fail here, since this is unexpected behavior. 
Allowing it to be silently ignored might hide issues.



src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
Lines 1080 (patched)
<https://reviews.apache.org/r/67696/#comment289043>

s/policy/sla-policy/


- Santhosh Kumar Shanmugham


On July 16, 2018, 4 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 16, 2018, 4 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that will be respected by the scheduler during maintenance. This 
> patch integrates into the same system to allow users to specify if they want 
> their updates to also respect SLA. Please see 
> https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing
>  for a more detailed description.
> 
> This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` 
> and `message` to `JobInstanceUpdateEvent`. These should be forward and 
> backward compatible.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
>   docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 
>   docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 
>   docs/reference/scheduler-configuration.md 
> 805e516689be019101f7c220c89fd9c391bb93b3 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> 2e13aacf576e648d9fffe989e4fc05c8954e72d8 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  6a28bc274acdd6d3ac239166771ef2d45648d60f 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> 9fa68b2dd55b4e4f5436356c1b94af1393967679 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> a002d955c3bc7b7c39da5e130e8c10c536bdcebd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec577cccb86914ebd679ca235103f79dd7e7b79d 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a 
>   src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 74ee1745e1fd7c308e2bdfa46aeb18a7ecfe14d2 
>   src/main/java/org/apache/aurora/scheduler/updater/Updates.java 
> f949fd54f524780672167e12fcadf268da08e679 
>   src/main/python/apache/aurora/client/api/updater_util.py 
> 4e3986220aaa4c9b138394b962120b176185af12 
>   src/main/python/apache/aurora/config/schema/base.py 
> 7baded79acdf863670afc183d740dcad602490c2 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> abee0951ca998894b29ee32c5362ef30da6421c7 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> dcf58896f1e866c0369ba1b78060236e98d9d46b 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractJobUpdateStoreTest.java
>  3a06a451da4ef3acccb33b5495b9fae141557148 
>   
> src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 
> 0aea369d8a8f75291de9691b6d61f3d48895507c 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
>  aa1cb2b287642e87d787e160e04a17ad0e4690d9 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> 43f857d893a54e19e71b36f2f06fef3a3ef6e874 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 5667a1b59681a6

Re: Review Request 67696: Enable SLA-aware updates

2018-07-12 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review206024
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On July 12, 2018, 2:10 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 12, 2018, 2:10 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that will be respected by the scheduler during maintenance. This 
> patch integrates into the same system to allow users to specify if they want 
> their updates to also respect SLA. Please see 
> https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing
>  for a more detailed description.
> 
> This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` 
> and `message` to `JobInstanceUpdateEvent`. These should be forward and 
> backward compatible.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
>   docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 
>   docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 
>   docs/reference/scheduler-configuration.md 
> 805e516689be019101f7c220c89fd9c391bb93b3 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> 2e13aacf576e648d9fffe989e4fc05c8954e72d8 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> 9fa68b2dd55b4e4f5436356c1b94af1393967679 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> a002d955c3bc7b7c39da5e130e8c10c536bdcebd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec577cccb86914ebd679ca235103f79dd7e7b79d 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a 
>   src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 74ee1745e1fd7c308e2bdfa46aeb18a7ecfe14d2 
>   src/main/java/org/apache/aurora/scheduler/updater/Updates.java 
> f949fd54f524780672167e12fcadf268da08e679 
>   src/main/python/apache/aurora/client/api/updater_util.py 
> 4e3986220aaa4c9b138394b962120b176185af12 
>   src/main/python/apache/aurora/config/schema/base.py 
> 7baded79acdf863670afc183d740dcad602490c2 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> dcf58896f1e866c0369ba1b78060236e98d9d46b 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractJobUpdateStoreTest.java
>  3a06a451da4ef3acccb33b5495b9fae141557148 
>   
> src/test/java/org/apache/aurora/scheduler/testing/FakeScheduledExecutor.java 
> 0aea369d8a8f75291de9691b6d61f3d48895507c 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> 43f857d893a54e19e71b36f2f06fef3a3ef6e874 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 5667a1b59681a6de87149d7161d760bff5da3818 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 2c27ec7136ff22a3570a4ec278c73f7ee310f628 
>   
> src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java 
> PRE-CREATION 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 2baba2aa55865ec298a4c9e5af3952b56cb9a910 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobInstanceUpdateEvent
>  48902d39b9d2cbeae1a52180669aba8349e4dd65 
>   
> src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
>  08dfa5b9a67989083a0d405ce8100698a4d096ae 
>   ui/src/main/js/components/UpdateInstanceEvents.js 
> 8351f2c4c256e27625d94b70842be0e91065a551 
>   ui/src/main/js/components/UpdateSettings.js 
> d756f5916fd8c39dcbb5578ee5eda198f807f458 
> 
> 
> Diff: https://reviews.apache.org/r/67696/diff/8/
> 
> 
> Testing
> ---
> 
> Added unit tests, `./gradlew test`.
> 
> Tested at scale with over 10,000 SLA-aware instance update events occurring 
> concurrently. Scheduler stability did not seem to be affected.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 67696: Enable SLA-aware updates

2018-07-12 Thread Santhosh Kumar Shanmugham


> On July 9, 2018, 2:04 p.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java
> > Lines 140 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054728#file2054728line140>
> >
> > Repeating my comment from internal review - 
> > 
> > Technically this can be any kind of work. Should we call this class 
> > `SlaAwareInstanceUpdater`? Since this has more to do with the actual update 
> > logic than the kill itself. And perhaps the method can become 
> > `startSlaAwareUpdate`?
> > 
> > The job of this class is around checking safety and scheduling the 
> > supplied work to be evaluated. Naming the class based on its usage rather 
> > than its function seems weird to me.
> 
> Jordan Ly wrote:
> This would be ideal, but I think there are some small parts of the logic 
> for which the new name would be misleading.
> 
> For example:
> 
> ```
> storeProvider
>     .getTaskStore()
>     .fetchTask(taskId)
>     .filter(task -> isKillable(task.getStatus()))
>     .ifPresent(task -> {
>       incrementJobStatCounter(killAttemptsByJob, SLA_KILL_ATTEMPT, instance.getJobKey());
>       slaManager.checkSlaThenAct(
>           task,
>           instructions.getDesiredState().getTask().getSlaPolicy(),
>           slaStoreProvider -> performKill(
>               slaStoreProvider,
>               instance,
>               key,
>               status,
>               killCommand),
>           ImmutableMap.of(),
>           // If the task is not assigned, force the update since it does not affect the
>           // SLA. For example, if a task is THROTTLED or PENDING, we probably don't care
>           // if the update replaces it with a new instance.
>           !SLAVE_ASSIGNED_STATES.contains(task.getStatus()));
>     });
> ```
> 
> Within the SLA aware code, we have kill-specific checks in order to 
> determine if we should continue with the action (or if it should be a NOOP), 
> or if we should force the kill regardless (the task would not affect SLA). 
> Changing the name would require a refactoring for a more complex interface 
> but I am not sure if we would ever use it for anything else.
> 
> I am leaning towards keeping the current name and functionality right now 
> until another sla-aware update use case comes in and we have a better idea on 
> how we can refactor the current abstraction (from kill to generic, which I 
> don't think should be too difficult but I'm sure I'll regret those words...)

These checks can be moved into the `killCommand`. I think the abstraction is 
not perfect as it is today.
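
To make the kill-specific checks under discussion concrete, here is a rough Python sketch of the decision logic in the quoted snippet: skip tasks that are not killable, force the kill when the task is not assigned, and otherwise act only if the SLA check passes. The state sets, the function name `sla_aware_kill`, and the string results are hypothetical; the real code is Java and retries through the SLA manager rather than returning a value.

```python
# Hypothetical state sets, chosen only for this illustration.
ASSIGNED_STATES = {"ASSIGNED", "STARTING", "RUNNING"}
KILLABLE_STATES = ASSIGNED_STATES | {"PENDING", "THROTTLED"}


def sla_aware_kill(task_status, sla_ok, kill):
    """Return "noop", "killed", or "retry" for a single kill attempt.

    sla_ok: zero-argument callable, True if killing keeps the job within SLA.
    kill: zero-argument callable that performs the kill.
    """
    if task_status not in KILLABLE_STATES:
        # Task is already gone or terminal: nothing to do.
        return "noop"
    # An unassigned task (e.g. PENDING or THROTTLED) does not count towards
    # the SLA, so replacing it cannot violate the policy.
    force = task_status not in ASSIGNED_STATES
    if force or sla_ok():
        kill()
        return "killed"
    return "retry"
```

Moving the first two checks into the supplied command, as suggested above, would leave only the generic "check SLA, then act or retry" step in the controller.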


> On July 9, 2018, 2:04 p.m., Santhosh Kumar Shanmugham wrote:
> > src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java
> > Lines 180 (patched)
> > <https://reviews.apache.org/r/67696/diff/5/?file=2054739#file2054739line180>
> >
> > Drop `_` here and everywhere. `SLA_CHECKING` sounds like a valid event 
> > that is defined in code.
> 
> Jordan Ly wrote:
> Can you elaborate on this? Did you mean drop `_MESSAGE`? I added 
> `_MESSAGE` because it is not a true "event" in the sense that it piggybacks 
> on `INSTANCE_UPDATING` or `INSTANCE_ROLLING_BACK` with a message showing 
> progress.

You can use `SLA_CHECKING_MESSAGE` or simply `sla checking event is added`. I 
started searching for an event type of `SLA_CHECKING` after reading this 
comment, which is a little confusing.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review205866
---


On July 9, 2018, 7:22 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 9, 2018, 7:22 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that

Re: Review Request 67696: Enable SLA-aware updates

2018-07-09 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67696/#review205866
---



end-to-end test for this feature?


docs/reference/scheduler-configuration.md
Lines 162 (patched)
<https://reviews.apache.org/r/67696/#comment288763>

s/The number of/The maximum number of/

This is only used for SLA-aware updates, should we name it so that they are 
grouped together with the other settings?



docs/reference/scheduler-configuration.md
Lines 227-230 (patched)
<https://reviews.apache.org/r/67696/#comment288766>

`sla_aware_kill_retry_min_delay` will be more readable. Same for the max 
delay.



src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java
Lines 56 (patched)
<https://reviews.apache.org/r/67696/#comment288767>

s/SLA passes/SLA passes or update is cancelled/



src/main/java/org/apache/aurora/scheduler/updater/SlaKillController.java
Lines 140 (patched)
<https://reviews.apache.org/r/67696/#comment288765>

Repeating my comment from internal review - 

Technically this can be any kind of work. Should we call this class 
`SlaAwareInstanceUpdater`? Since this has more to do with the actual update 
logic than the kill itself. And perhaps the method can become 
`startSlaAwareUpdate`?

The job of this class is around checking safety and scheduling the supplied 
work to be evaluated. Naming the class based on its usage rather than its 
function seems weird to me.



src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java
Lines 78 (patched)
<https://reviews.apache.org/r/67696/#comment288772>

test case when sla kill is a noop (task is already killed for a different 
reason) - is this already covered?



src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java
Lines 180 (patched)
<https://reviews.apache.org/r/67696/#comment288769>

Drop `_` here and everywhere. `SLA_CHECKING` sounds like a valid event that 
is defined in code.



src/test/java/org/apache/aurora/scheduler/updater/SlaKillControllerTest.java
Lines 194 (patched)
<https://reviews.apache.org/r/67696/#comment288768>

Drop `_` here and everywhere. `SLA_PASSED` sounds like a real event.


- Santhosh Kumar Shanmugham


On July 9, 2018, 11:20 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67696/
> ---
> 
> (Updated July 9, 2018, 11:20 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Renan DelValle, 
> Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This patch enables SLA-aware updates.
> 
> Following https://reviews.apache.org/r/66716/, tasks may now specify custom 
> SLA policies that will be respected by the scheduler during maintenance. This 
> patch integrates into the same system to allow users to specify if they want 
> their updates to also respect SLA. Please see 
> https://docs.google.com/document/d/1lCoDyoX26qrGrptrgO7vJHqYR_L2CBRGFIywsAd8uQo/edit?usp=sharing
>  for a more detailed description.
> 
> This patch adds two optional Thrift fields, `slaAware` to `JobUpdateSettings` 
> and `message` to `JobInstanceUpdateEvent`. These should be forward and 
> backward compatible.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
>   docs/features/job-updates.md b52eb35a1de9da40f8a00ef0b905df30069029d3 
>   docs/reference/configuration.md acab4c58d9ab3c04d156fed3636e77aed6d1faf4 
>   docs/reference/scheduler-configuration.md 
> 805e516689be019101f7c220c89fd9c391bb93b3 
>   src/main/java/org/apache/aurora/scheduler/base/Tasks.java 
> 2e13aacf576e648d9fffe989e4fc05c8954e72d8 
>   src/main/java/org/apache/aurora/scheduler/pruning/TaskHistoryPruner.java 
> 9aa51c3637df74cca088bd65c5539e1ebb8e5f0d 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> 9fa68b2dd55b4e4f5436356c1b94af1393967679 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> a002d955c3bc7b7c39da5e130e8c10c536bdcebd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  ec577cccb86914ebd679ca235103f79dd7e7b79d 
>   src/main/java/org/apache/aurora/scheduler/updater/OneWayJobUpdater.java 
> f2d33fb9ab6bd2c3ff199ab03dc75b1d6d618f3a 
>   src/main/java/org/apache/aurora/scheduler/updater/SlaKillC

Re: Review Request 67757: TaskQuery struct needs to be optional

2018-07-09 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67757/#review205863
---


Ship it!




Since Renan has tested it, I am okay with this change.

- Santhosh Kumar Shanmugham


On June 29, 2018, 4:07 p.m., Ezequiel Torres wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67757/
> ---
> 
> (Updated June 29, 2018, 4:07 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/AURORA-1991
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In languages like Go, types are not optional by default.
> The current api.thrift does not allow creating queries with just a
> few fields in Go, since all the fields are required.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
> 
> 
> Diff: https://reviews.apache.org/r/67757/diff/1/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> End2End Output - branch master
>   
> https://reviews.apache.org/media/uploaded/files/2018/06/29/22d45287-bbce-4603-b21b-e3ddd46cb178__branch_master_end2end_test_output.txt
> 
> 
> Thanks,
> 
> Ezequiel Torres
> 
>



Re: Review Request 67757: TaskQuery struct needs to be optional

2018-07-09 Thread Santhosh Kumar Shanmugham


> On June 28, 2018, 11:23 a.m., Santhosh Kumar Shanmugham wrote:
> > Can you include the end-to-end test results? This takes about an hour to 
> > complete and can be invoked like so from the project root,
> > 
> > ```
> > {~/oss/aurora} (master)$ 
> > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> > ```
> 
> Ezequiel Torres wrote:
> I've tried to execute the e2e tests but they seem to fail even on the 
> master branch. I've uploaded the output of the test to the review. I'd really 
> appreciate it if someone could give some insight into the cause of the errors 
> I'm seeing in the output.
> 
> Kind Regards
> Ezequiel
> 
> Santhosh Kumar Shanmugham wrote:
> Sorry. I am not able to find the output of the test in the review. Is 
> this added as a file to the diff or in the testing section?
> 
> Renan DelValle wrote:
> Unable to see the log file Ezequiel posted as well, but I ran the e2e tests 
> locally with his patch and all tests passed.
> 
> Ezequiel Torres wrote:
> Sorry guys, I forgot to hit the `publish` button after the file was 
> uploaded. I think you should be able to download it now. 
> Thanks Renan for running the e2e tests. I really appreciate it.
> 
> Even though Renán said the tests passed, I'd really appreciate it if you 
> could shed some light on the output of the e2e run that I've uploaded, so 
> that I can run the tests on my own. I think it would be really helpful for 
> future contributions to the project.
> 
> Kind Regards
> Ezequiel

Sorry for the delay. It looks like the end-to-end test was just flaky (I have 
seen it a few times). You can `vagrant destroy` and recreate the VM to make 
sure you have a clean start. You should be able to retry it.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67757/#review205521
---


On June 29, 2018, 4:07 p.m., Ezequiel Torres wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67757/
> ---
> 
> (Updated June 29, 2018, 4:07 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/AURORA-1991
> 
> https://issues.apache.org/jira/browse/AURORA-1991
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In languages like Go, types are not optional by default.
> The current api.thrift does not allow creating queries with just a
> few fields in Go, since all the fields are required.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
> 
> 
> Diff: https://reviews.apache.org/r/67757/diff/1/
> 
> 
> Testing
> ---
> 
> 
> File Attachments
> 
> 
> End2End Output - branch master
>   
> https://reviews.apache.org/media/uploaded/files/2018/06/29/22d45287-bbce-4603-b21b-e3ddd46cb178__branch_master_end2end_test_output.txt
> 
> 
> Thanks,
> 
> Ezequiel Torres
> 
>



Re: Review Request 67757: TaskQuery struct needs to be optional

2018-06-29 Thread Santhosh Kumar Shanmugham


> On June 28, 2018, 11:23 a.m., Santhosh Kumar Shanmugham wrote:
> > Can you include the end-to-end test results? This takes about an hour to 
> > complete and can be invoked like so from the project root,
> > 
> > ```
> > {~/oss/aurora} (master)$ 
> > ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> > ```
> 
> Ezequiel Torres wrote:
> I've tried to execute the e2e tests but they seem to fail even on the 
> master branch. I've uploaded the output of the test to the review. I'd really 
> appreciate it if someone could give some insight into the cause of the errors 
> I'm seeing in the output.
> 
> Kind Regards
> Ezequiel

Sorry. I am not able to find the output of the test in the review. Is this 
added as a file to the diff or in the testing section?


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67757/#review205521
---


On June 27, 2018, 12:04 p.m., Ezequiel Torres wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67757/
> ---
> 
> (Updated June 27, 2018, 12:04 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/AURORA-1991
> 
> https://issues.apache.org/jira/browse/AURORA-1991
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In languages like Go, types are not optional by default.
> The current api.thrift does not allow creating queries with just a
> few fields in Go, since all the fields are required.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
> 
> 
> Diff: https://reviews.apache.org/r/67757/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Ezequiel Torres
> 
>



Re: Review Request 67705: Updated restore instructions to reflect using offline rehydration tool

2018-06-29 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67705/#review205590
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On June 28, 2018, 1:47 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67705/
> ---
> 
> (Updated June 28, 2018, 1:47 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Rewrote the instructions for recovering from backup based upon using Bill's 
> tool to recover with all instances offline.
> 
> Please verify that the instructions make sense (and more importantly work) as 
> this is one of those documents that will be super critical in times of 
> catastrophic failures.
> 
> 
> Diffs
> -
> 
>   docs/operations/backup-restore.md 15e6dd22187fc4cf61624184a5c8400be83d6b6a 
> 
> 
> Diff: https://reviews.apache.org/r/67705/diff/1/
> 
> 
> Testing
> ---
> 
> Tested these instructions on a local vagrant cluster as well as in an HA 
> cluster with a quorum of 2.
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 67757: TaskQuery struct needs to be optional

2018-06-28 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67757/#review205521
---



Can you include the end-to-end test results? This takes about an hour to 
complete and can be invoked like so from the project root,

```
{~/oss/aurora} (master)$ ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
```

- Santhosh Kumar Shanmugham


On June 27, 2018, 12:04 p.m., Ezequiel Torres wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67757/
> ---
> 
> (Updated June 27, 2018, 12:04 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/AURORA-1991
> 
> https://issues.apache.org/jira/browse/AURORA-1991
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In languages like Go, types are not optional by default.
> The current api.thrift does not allow creating queries with just a
> few fields in Go, since all the fields are required.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
> 
> 
> Diff: https://reviews.apache.org/r/67757/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Ezequiel Torres
> 
>



Re: Review Request 67757: TaskQuery struct needs to be optional

2018-06-28 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67757/#review205520
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On June 27, 2018, 12:04 p.m., Ezequiel Torres wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67757/
> ---
> 
> (Updated June 27, 2018, 12:04 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Bugs: https://issues.apache.org/jira/browse/AURORA-1991
> 
> https://issues.apache.org/jira/browse/AURORA-1991
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> In languages like Go, types are not optional by default.
> The current api.thrift does not allow creating queries with just a
> few fields in Go, since all the fields are required.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> 7265b11103aa12743c42355163ae64e98e965d7f 
> 
> 
> Diff: https://reviews.apache.org/r/67757/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Ezequiel Torres
> 
>



Review Request 67734: Display negation of constraint in TaskConfigSummary.

2018-06-25 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67734/
---

Review request for Aurora and David McLaughlin.


Repository: aurora


Description
---

Display negation of constraint in TaskConfigSummary.


Diffs
-

  ui/src/main/js/utils/Task.js e3b4a4905839498b3e8c6692b85560f0af254f2d 


Diff: https://reviews.apache.org/r/67734/diff/1/


Testing
---

./gradlew test


File Attachments


constraints
  
https://reviews.apache.org/media/uploaded/files/2018/06/26/792ca9e1-85bc-4223-8153-159bf85f84aa__Screen_Shot_2018-06-25_at_5.18.01_PM.png


Thanks,

Santhosh Kumar Shanmugham



Review Request 67706: Fix style of TaskConfigSummary.

2018-06-22 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67706/
---

Review request for Aurora and David McLaughlin.


Repository: aurora


Description
---

Fix style of TaskConfigSummary.


Diffs
-

  ui/src/main/js/components/TaskConfigSummary.js 
f03d44d3b1d71e4bf06b6bf1156d6bf8d251bb70 


Diff: https://reviews.apache.org/r/67706/diff/1/


Testing
---

Tested on vagrant.


File Attachments


Before
  
https://reviews.apache.org/media/uploaded/files/2018/06/22/7f7b45b4-e5d2-49b6-8248-d91cf2d2c3b7__Screen_Shot_2018-06-22_at_1.14.31_PM.png
After
  
https://reviews.apache.org/media/uploaded/files/2018/06/22/144e4760-3b87-4f38-9883-f5453bd420a3__Screen_Shot_2018-06-22_at_1.16.05_PM.png


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67657: Introduce a `countdown-ms` param in Coordinator request.

2018-06-20 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67657/#review205133
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On June 20, 2018, 3:53 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67657/
> ---
> 
> (Updated June 20, 2018, 3:53 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Franck Cuny, and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> With the introduction of `timeoutSecs` for HostMaintenanceRequest
> and the `CoordinatorSlaPolicy`, it will be beneficial to expose the
> time remaining until forced maintenance to the Coordinator. Send
> the time remaining until force task maintenance as an extra query
> param to the Coordinator.
> 
> 
> Diffs
> -
> 
>   docs/features/sla-requirements.md 555b174d2324b0b1b596a3da72b0a5a67fcca153 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
>  626a68263d6118f138cd6012fd49e033b09b75f0 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
> 9c5caf4af8e8c8bad408100af2bc4fe045603340 
>   
> src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
>  c9390df25f7eacbab14a508b1926a05aac8112d6 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
> 759a1bca4814b2cf70a20eac26aaabfcef682332 
> 
> 
> Diff: https://reviews.apache.org/r/67657/diff/3/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> ./build-support/jenkins/build.sh
> 
> **Tested on Vagrant**
> 
> ***Logs from Coordinator***
> Request received for {'task': ['devcluster/vagrant/test/coordinator/0']}
> {
>   "forceMaintenanceCountdownMs": "604755646", 
>   "task": "devcluster/vagrant/test/coordinator/0", 
>   "taskConfig": {
>     "assignedTask": {
>       "assignedPorts": {},
>       "instanceId": 0,
>       "slaveHost": "192.168.33.7",
>       "slaveId": "f0336813-864b-4c8f-914c-80f8cef3b61d-S0",
>       "task": {
>         ...
> }
> Responded: True
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67657: Introduce a `countdown-ms` param in Coordinator request.

2018-06-20 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67657/
---

(Updated June 20, 2018, 3:53 p.m.)


Review request for Aurora, David McLaughlin, Franck Cuny, and Jordan Ly.


Changes
---

style fix


Repository: aurora


Description
---

With the introduction of `timeoutSecs` for HostMaintenanceRequest
and the `CoordinatorSlaPolicy`, it will be beneficial to expose the
time remaining until forced maintenance to the Coordinator. Send
the time remaining until force task maintenance as an extra query
param to the Coordinator.


Diffs (updated)
-

  docs/features/sla-requirements.md 555b174d2324b0b1b596a3da72b0a5a67fcca153 
  
src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
 626a68263d6118f138cd6012fd49e033b09b75f0 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
9c5caf4af8e8c8bad408100af2bc4fe045603340 
  
src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
 c9390df25f7eacbab14a508b1926a05aac8112d6 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
759a1bca4814b2cf70a20eac26aaabfcef682332 


Diff: https://reviews.apache.org/r/67657/diff/3/

Changes: https://reviews.apache.org/r/67657/diff/2-3/


Testing (updated)
---

./gradlew test
./build-support/jenkins/build.sh

**Tested on Vagrant**

***Logs from Coordinator***
Request received for {'task': ['devcluster/vagrant/test/coordinator/0']}
{
  "forceMaintenanceCountdownMs": "604755646", 
  "task": "devcluster/vagrant/test/coordinator/0", 
  "taskConfig": {
    "assignedTask": {
      "assignedPorts": {},
      "instanceId": 0,
      "slaveHost": "192.168.33.7",
      "slaveId": "f0336813-864b-4c8f-914c-80f8cef3b61d-S0",
      "task": {
        ...
}
Responded: True


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67657: Introduce a `countdown-ms` param in Coordinator request.

2018-06-20 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67657/
---

(Updated June 20, 2018, 3:14 p.m.)


Review request for Aurora, David McLaughlin, Franck Cuny, and Jordan Ly.


Changes
---

Moved the params to the request body.


Repository: aurora


Description
---

With the introduction of `timeoutSecs` for HostMaintenanceRequest
and the `CoordinatorSlaPolicy`, it will be beneficial to expose the
time remaining until forced maintenance to the Coordinator. Send
the time remaining until force task maintenance as an extra query
param to the Coordinator.


Diffs (updated)
-

  docs/features/sla-requirements.md 555b174d2324b0b1b596a3da72b0a5a67fcca153 
  
src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
 626a68263d6118f138cd6012fd49e033b09b75f0 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
9c5caf4af8e8c8bad408100af2bc4fe045603340 
  
src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
 c9390df25f7eacbab14a508b1926a05aac8112d6 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
759a1bca4814b2cf70a20eac26aaabfcef682332 


Diff: https://reviews.apache.org/r/67657/diff/2/

Changes: https://reviews.apache.org/r/67657/diff/1-2/


Testing (updated)
---

./gradlew test

**Tested on Vagrant**

***Logs from Coordinator***
Request received for {'task': ['devcluster/vagrant/test/coordinator/0']}
{
  "forceMaintenanceCountdownMs": "604755646", 
  "task": "devcluster/vagrant/test/coordinator/0", 
  "taskConfig": {
    "assignedTask": {
      "assignedPorts": {},
      "instanceId": 0,
      "slaveHost": "192.168.33.7",
      "slaveId": "f0336813-864b-4c8f-914c-80f8cef3b61d-S0",
      "task": {
        ...
}
Responded: True


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67657: Introduce a `countdown-ms` param in Coordinator request.

2018-06-20 Thread Santhosh Kumar Shanmugham


> On June 20, 2018, 10:24 a.m., Jordan Ly wrote:
> > The code LGTM.
> > 
> > I think it is a bit odd to have metadata like countdown-ms inside the query 
> > param of a POST request. If we add more metadata, we could soon have very 
> > long query strings.
> > 
> > Overall, I don't have a strong preference either way. I would probably err 
> > towards adding a metadata field to the json body, but no blocking concerns.

I had the same concern. I decided to go this route since I am not certain 
the number of params will grow beyond ~5. What customers would like to know is 
the `task` that is going to undergo maintenance, `countdown-ms` (any forced 
maintenance countdown), `destructive` (whether the maintenance will be 
destructive to state, for hybrid use cases) and `message` (some human-readable 
message).

We can do the elaborate Metadata in the JSON body approach if a need arises.
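
As a rough sketch of that alternative (not part of this patch), the four 
values could travel in a JSON body along these lines. The `metadata` key and 
the function name are hypothetical; only `task`, `countdown-ms`, `destructive` 
and `message` come from the discussion above.

```python
import json


def build_request_body(task_key, countdown_ms, destructive=False, message=""):
    """Serialize the maintenance details into a JSON request body.

    The nested "metadata" object is an assumed layout, used here only to
    show how extra fields could be added without growing the query string.
    """
    body = {
        "task": task_key,
        "metadata": {
            "countdown-ms": countdown_ms,
            "destructive": destructive,
            "message": message,
        },
    }
    return json.dumps(body, sort_keys=True)


print(build_request_body("devcluster/vagrant/test/coordinator/1", 94784))
```

New fields would then only touch the `metadata` object, at the cost of the 
Coordinator having to parse a body instead of reading query params.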


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67657/#review205102
---


On June 19, 2018, 3:30 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67657/
> ---
> 
> (Updated June 19, 2018, 3:30 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Franck Cuny, and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> With the introduction of `timeoutSecs` for HostMaintenanceRequest
> and the `CoordinatorSlaPolicy`, it will be beneficial to expose the
> time remaining until forced maintenance to the Coordinator. Send
> the time remaining until force task maintenance as an extra query
> param to the Coordinator.
> 
> 
> Diffs
> -
> 
>   docs/features/sla-requirements.md 555b174d2324b0b1b596a3da72b0a5a67fcca153 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
>  626a68263d6118f138cd6012fd49e033b09b75f0 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
> 9c5caf4af8e8c8bad408100af2bc4fe045603340 
>   
> src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
>  c9390df25f7eacbab14a508b1926a05aac8112d6 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
> 759a1bca4814b2cf70a20eac26aaabfcef682332 
> 
> 
> Diff: https://reviews.apache.org/r/67657/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> **Tested on Vagrant**
> 
> ***Logs from Coordinator***
> Request received for {'countdown-ms': ['94784'], 'task': 
> ['devcluster/vagrant/test/coordinator/1']}
> Responded: False
> Request received for {'countdown-ms': ['34777'], 'task': 
> ['devcluster/vagrant/test/coordinator/1']}
> Responded: False
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Review Request 67657: Introduce a `countdown-ms` param in Coordinator request.

2018-06-19 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67657/
---

Review request for Aurora, David McLaughlin, Franck Cuny, and Jordan Ly.


Repository: aurora


Description
---

With the introduction of `timeoutSecs` for HostMaintenanceRequest
and the `CoordinatorSlaPolicy`, it will be beneficial to expose the
time remaining until forced maintenance to the Coordinator. Send
the time remaining until force task maintenance as an extra query
param to the Coordinator.
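
For illustration only, here is a minimal Python sketch of how a Coordinator
could read the extra query param. The param names `task` and `countdown-ms`
are the ones that show up in the Coordinator logs in the Testing section; the
URL and the function names are hypothetical and are not taken from the patch.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Coordinator endpoint, used only for this sketch.
COORDINATOR_URL = "http://localhost:8080/coordinator"


def build_request_url(task_key, countdown_ms):
    """Append the task key and the remaining countdown as query params."""
    query = urlencode({"task": task_key, "countdown-ms": countdown_ms})
    return COORDINATOR_URL + "?" + query


def read_countdown_ms(url):
    """Return the countdown in milliseconds, or None if the param is absent."""
    params = parse_qs(urlsplit(url).query)
    values = params.get("countdown-ms")
    return int(values[0]) if values else None


url = build_request_url("devcluster/vagrant/test/coordinator/1", 94784)
print(read_countdown_ms(url))
```

A Coordinator that ignores the param keeps working unchanged, since the
lookup falls back to None when `countdown-ms` is missing.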


Diffs
-

  docs/features/sla-requirements.md 555b174d2324b0b1b596a3da72b0a5a67fcca153 
  
src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
 626a68263d6118f138cd6012fd49e033b09b75f0 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
9c5caf4af8e8c8bad408100af2bc4fe045603340 
  
src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
 c9390df25f7eacbab14a508b1926a05aac8112d6 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
759a1bca4814b2cf70a20eac26aaabfcef682332 


Diff: https://reviews.apache.org/r/67657/diff/1/


Testing
---

./gradlew test

**Tested on Vagrant**

***Logs from Coordinator***
Request received for {'countdown-ms': ['94784'], 'task': 
['devcluster/vagrant/test/coordinator/1']}
Responded: False
Request received for {'countdown-ms': ['34777'], 'task': 
['devcluster/vagrant/test/coordinator/1']}
Responded: False


Thanks,

Santhosh Kumar Shanmugham



Review Request 67639: Export count-down to forceful Maintenance as a metric.

2018-06-18 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67639/
---

Review request for Aurora, Franck Cuny and Jordan Ly.


Repository: aurora


Description
---

Since the scheduler enforces a maximum timeout on each
maintenance request and we now allow CoordinatorSlaPolicy
to block maintenance, we need to know which tasks are
running into the forced maintenance timeout. Export the maintenance
countdown time as a metric broken down by task key.
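
A minimal Python sketch of the idea, for readers who do not want to open the
diff. The metric name pattern follows the `/vars` output in the Testing
section; the function names, the millisecond timestamps and the clamping at
zero are assumptions made for this sketch, not a description of the Java
change.

```python
METRIC_PREFIX = "maintenance_countdown_ms_"


def countdown_metric_name(task_key):
    """Build the per-task metric name, e.g. for 'vagrant/test/coordinator/0'."""
    return METRIC_PREFIX + task_key


def countdown_ms(deadline_ms, now_ms):
    """Time left until forced maintenance, never going below zero."""
    return max(0, deadline_ms - now_ms)


def export_countdowns(deadlines_ms, now_ms):
    """Map every task key to its metric name and remaining countdown."""
    return {
        countdown_metric_name(key): countdown_ms(deadline, now_ms)
        for key, deadline in deadlines_ms.items()
    }


print(export_countdowns({"vagrant/test/coordinator/0": 5000}, now_ms=1200))
```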


Diffs
-

  src/main/java/org/apache/aurora/scheduler/base/InstanceKeys.java 
b12ac83168401c15fb1d30179ea8e4816f09cd3d 
  
src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
 7fc5990dfb04c5528a44142c3efdd6d60d08188d 


Diff: https://reviews.apache.org/r/67639/diff/1/


Testing
---

./gradlew test

**Tested in Vagrant**
sshanmugham::tw-mbp-sshanmugham {~}$ curl http://192.168.33.7:8081/vars | grep 
maintenance_countdown
 100.0%
maintenance_countdown_ms_vagrant/test/coordinator/0 264523
maintenance_countdown_ms_vagrant/test/coordinator/1 24476
sshanmugham::tw-mbp-sshanmugham {~}$ curl http://192.168.33.7:8081/vars | grep 
maintenance_countdown
 100.0%
maintenance_countdown_ms_vagrant/test/coordinator/0 264523
maintenance_countdown_ms_vagrant/test/coordinator/1 24476
sshanmugham::tw-mbp-sshanmugham {~}$ curl http://192.168.33.7:8081/vars | grep 
maintenance_countdown
 100.0%
maintenance_countdown_ms_vagrant/test/coordinator/0 264523
maintenance_countdown_ms_vagrant/test/coordinator/1 0


Thanks,

Santhosh Kumar Shanmugham



Review Request 67638: Export number of tasks lost per dedicated role.

2018-06-18 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67638/
---

Review request for Aurora, Franck Cuny and Jordan Ly.


Repository: aurora


Description
---

Export number of tasks lost per dedicated role.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/TaskVars.java 
ee20ed3ad7c17bd4ca11239a467113fc8a9e8f00 
  src/test/java/org/apache/aurora/scheduler/TaskVarsTest.java 
6321ec068bd16737f96b39d5fdd8db25f3dea15c 


Diff: https://reviews.apache.org/r/67638/diff/1/


Testing
---

./gradlew test

**Tested on Vagrant**
tasks_lost_dedicatedweb.multi 0
tasks_lost_dedicated_vagrant 2


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67627: Add observer flag to disable resource metric collection

2018-06-18 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67627/#review204936
---



Mostly LGTM.

Will the UI show 0s or empty spaces?

Can you expand on why PID namespaces break metrics?


docs/reference/observer-configuration.md
Lines 27 (patched)
<https://reviews.apache.org/r/67627/#comment287754>

also disk metrics



src/main/python/apache/aurora/tools/thermos_observer.py
Lines 68 (patched)
<https://reviews.apache.org/r/67627/#comment287753>

also disk metrics


- Santhosh Kumar Shanmugham


On June 18, 2018, 1:57 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67627/
> ---
> 
> (Updated June 18, 2018, 1:57 a.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Reza Motamedi, and Santhosh Kumar 
> Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add observer command line option `--disable_task_resource_collection` to
> disable the collection of CPU, memory, and disk metrics for observed tasks.
> This is useful in setups where metrics cannot be gathered reliably (e.g. when
> using PID namespaces) or when it is expensive due to hundreds of active tasks
> per host.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md edc081f502370190597ad028f3275cdfd572f5ca 
>   docs/reference/observer-configuration.md 
> c791b3480e5bf35e6eb0fbea908ff3242eab315d 
>   src/main/python/apache/aurora/config/BUILD 
> 12e7fe973f456d0847ce63d3b293131a7f4c3bdd 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> fd9465d2e2b3135f3fdf8230777117adaa89337c 
>   src/main/python/apache/thermos/monitoring/resource.py 
> 72ed4e5a82dfd8a09e0a8262f6da4992ac98542a 
>   src/main/python/apache/thermos/observer/task_observer.py 
> 94cd6c541bb7f8a4c153cc51caa63d2c0a49 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> 44450647a180f86903ebd37f2a9f4327496597e9 
> 
> 
> Diff: https://reviews.apache.org/r/67627/diff/1/
> 
> 
> Testing
> ---
> 
> We are running our Mesos agents with PID namespaces enabled (i.e.
> `--isolation='namespaces/ipc,namespaces/pid,...'`). Sometimes the hosts are
> also tightly packed with many small tasks (e.g. `~130` active tasks and
> `~1000` finished tasks). Even with very relaxed scrape settings of
> `--task_process_collection_interval_secs=3000` and
> `--task_disk_collection_interval_secs=3000` it can take between `150ms-2500ms`
> to render the observer landing page `/main`. This patch reduces this to about
> `100ms-150ms`. There is no immediate downside as metrics reporting is broken
> anyway due to the PID namespacing.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 67613: Close AsyncHttpClient on scheduler shutdown.

2018-06-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67613/
---

(Updated June 15, 2018, 1:59 p.m.)


Review request for Aurora, David McLaughlin and Jordan Ly.


Changes
---

- style fix


Bugs: AURORA-1990
https://issues.apache.org/jira/browse/AURORA-1990


Repository: aurora


Description
---

Convert SlaManager into an AbstractIdleService and explicitly
close the AsyncHttpClient on scheduler shutdown. Otherwise
we run the risk of having a stuck scheduler JVM that is unable
to shut down due to any of the remaining non-daemon http client
threads.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
5ad12511e3ec7dda227d133b7e0a2063c352c016 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
98bec4857f1b3c247c24059150de3e4aac080a02 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
07082a99701ea1e428164e76267b908ae20508ad 


Diff: https://reviews.apache.org/r/67613/diff/3/

Changes: https://reviews.apache.org/r/67613/diff/2-3/


Testing
---

./gradlew test

**Tested in vagrant:**
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.456 
[BlockingDriverJoin, StateMachine] SchedulerLifecycle state machine transition 
DEAD -> DEAD
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.457 
[BlockingDriverJoin, SchedulerLifecycle] Shutdown already invoked, ignoring 
extra call.
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.458 
[TearDownShutdownRegistry STOPPING, StateMachine] storage state machine 
transition READY -> STOPPED
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.459 
[TearDownShutdownRegistry STOPPING, Lifecycle] Shutting down application
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.459 
[TearDownShutdownRegistry STOPPING, ShutdownRegistry$ShutdownRegistryImpl] 
Action controller has already completed, subsequent calls ignored.
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.461 [main, 
SchedulerMain] Stopping scheduler services.
**Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.470 
[SlaManager$$EnhancerByGuice$$40d3047 STOPPING, SlaManager] Shutting down 
SlaManager async http client.**
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.475 
[CronLifecycle STOPPING, CronLifecycle] Shutting down Quartz cron scheduler.
...
Jun 15 20:48:56 aurora aurora-scheduler[8719]: I0615 20:48:56.167 [main, 
SchedulerMain] Application run() exited.


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67613: Close AsyncHttpClient on scheduler shutdown.

2018-06-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67613/
---

(Updated June 15, 2018, 1:55 p.m.)


Review request for Aurora, David McLaughlin and Jordan Ly.


Changes
---

Vagrant testing


Bugs: AURORA-1990
https://issues.apache.org/jira/browse/AURORA-1990


Repository: aurora


Description
---

Convert SlaManager into an AbstractIdleService and explicitly
close the AsyncHttpClient on scheduler shutdown. Otherwise
we run the risk of having a stuck scheduler JVM that is unable
to shut down due to any of the remaining non-daemon http client
threads.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
5ad12511e3ec7dda227d133b7e0a2063c352c016 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
98bec4857f1b3c247c24059150de3e4aac080a02 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
07082a99701ea1e428164e76267b908ae20508ad 


Diff: https://reviews.apache.org/r/67613/diff/2/


Testing (updated)
---

./gradlew test

**Tested in vagrant:**
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.456 
[BlockingDriverJoin, StateMachine] SchedulerLifecycle state machine transition 
DEAD -> DEAD
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.457 
[BlockingDriverJoin, SchedulerLifecycle] Shutdown already invoked, ignoring 
extra call.
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.458 
[TearDownShutdownRegistry STOPPING, StateMachine] storage state machine 
transition READY -> STOPPED
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.459 
[TearDownShutdownRegistry STOPPING, Lifecycle] Shutting down application
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.459 
[TearDownShutdownRegistry STOPPING, ShutdownRegistry$ShutdownRegistryImpl] 
Action controller has already completed, subsequent calls ignored.
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.461 [main, 
SchedulerMain] Stopping scheduler services.
**Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.470 
[SlaManager$$EnhancerByGuice$$40d3047 STOPPING, SlaManager] Shutting down 
SlaManager async http client.**
Jun 15 20:48:53 aurora aurora-scheduler[8719]: I0615 20:48:53.475 
[CronLifecycle STOPPING, CronLifecycle] Shutting down Quartz cron scheduler.
...
Jun 15 20:48:56 aurora aurora-scheduler[8719]: I0615 20:48:56.167 [main, 
SchedulerMain] Application run() exited.


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67613: Close AsyncHttpClient on scheduler shutdown.

2018-06-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67613/
---

(Updated June 15, 2018, 12:58 p.m.)


Review request for Aurora, David McLaughlin and Jordan Ly.


Changes
---

- add to scheduler active services


Bugs: AURORA-1990
https://issues.apache.org/jira/browse/AURORA-1990


Repository: aurora


Description
---

Convert SlaManager into an AbstractIdleService and explicitly
close the AsyncHttpClient on scheduler shutdown. Otherwise
we run the risk of having a stuck scheduler JVM that is unable
to shut down due to any of the remaining non-daemon HTTP client
threads.
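
The shutdown problem described above can be illustrated with a minimal sketch. It does not use Aurora, Guava, or AsyncHttpClient: the class `ClientOwningService` and its methods are hypothetical, and a plain fixed thread pool stands in for the HTTP client's non-daemon threads. The `startUp()`/`shutDown()` names only mirror the shape of an idle-service lifecycle.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a service that owns a client with non-daemon threads. */
public class ClientOwningService {
  // Executors.newFixedThreadPool creates non-daemon threads by default, so a pool
  // that is never shut down keeps the JVM alive after main() returns.
  private final ExecutorService clientThreads = Executors.newFixedThreadPool(2);

  /** Start-up hook: submit a placeholder task so a worker thread is created. */
  public void startUp() {
    clientThreads.submit(() -> { /* placeholder for an HTTP call */ });
  }

  /** Shutdown hook: release the threads explicitly; returns true if they all exited. */
  public boolean shutDown() {
    clientThreads.shutdown();
    try {
      return clientThreads.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  /** True once every worker thread has exited. */
  public boolean isClosed() {
    return clientThreads.isTerminated();
  }

  public static void main(String[] args) {
    ClientOwningService service = new ClientOwningService();
    service.startUp();
    // Without this call the worker thread would keep the JVM from exiting.
    boolean terminated = service.shutDown();
    System.out.println("client threads terminated: " + terminated);
  }
}
```

Commenting out the `shutDown()` call in `main` should leave the process hanging after `main` returns, which is the failure mode this change is meant to avoid.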


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
5ad12511e3ec7dda227d133b7e0a2063c352c016 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
98bec4857f1b3c247c24059150de3e4aac080a02 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
07082a99701ea1e428164e76267b908ae20508ad 


Diff: https://reviews.apache.org/r/67613/diff/2/

Changes: https://reviews.apache.org/r/67613/diff/1-2/


Testing
---

./gradlew test


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67613: Close AsyncHttpClient on scheduler shutdown.

2018-06-15 Thread Santhosh Kumar Shanmugham


> On June 15, 2018, 11:03 a.m., Jordan Ly wrote:
> > Oops missed one thing:
> > 
> > You need to add a scheduler active binding in the module: 
> > ```
> > SchedulerServicesModule.addSchedulerActiveServiceBinding(binder())
> >   .to([SOMETHING].class);
> > ```
> 
> Jordan Ly wrote:
> And for a quick test, can you bring up the scheduler, do an SLA drain 
> with a coordinator, and shut it down and ensure the debug message works 
> correctly?

Testing now. Will post the results shortly.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67613/#review204858
---


On June 15, 2018, 11:02 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67613/
> ---
> 
> (Updated June 15, 2018, 11:02 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Jordan Ly.
> 
> 
> Bugs: AURORA-1990
> https://issues.apache.org/jira/browse/AURORA-1990
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Convert SlaManager into an AbstractIdleService and explicitly
> close the AsyncHttpClient on scheduler shutdown. Otherwise
> we run the risk of having a stuck scheduler JVM that is unable
> to shut down due to any of the remaining non-daemon HTTP client
> threads.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 5ad12511e3ec7dda227d133b7e0a2063c352c016 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
> 98bec4857f1b3c247c24059150de3e4aac080a02 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 07082a99701ea1e428164e76267b908ae20508ad 
> 
> 
> Diff: https://reviews.apache.org/r/67613/diff/1/
> 
> 
> Testing
> ---
> 
> ./gradlew test
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Re: Review Request 67613: Close AsyncHttpClient on scheduler shutdown.

2018-06-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67613/
---

(Updated June 15, 2018, 11:02 a.m.)


Review request for Aurora, David McLaughlin and Jordan Ly.


Bugs: AURORA-1990
https://issues.apache.org/jira/browse/AURORA-1990


Repository: aurora


Description
---

Convert SlaManager into an AbstractIdleService and explicitly
close the AsyncHttpClient on scheduler shutdown. Otherwise
we run the risk of having a stuck scheduler JVM that is unable
to shut down due to any of the remaining non-daemon HTTP client
threads.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
5ad12511e3ec7dda227d133b7e0a2063c352c016 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
98bec4857f1b3c247c24059150de3e4aac080a02 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
07082a99701ea1e428164e76267b908ae20508ad 


Diff: https://reviews.apache.org/r/67613/diff/1/


Testing
---

./gradlew test


Thanks,

Santhosh Kumar Shanmugham



Review Request 67613: Close AsyncHttpClient on scheduler shutdown.

2018-06-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67613/
---

Review request for Aurora, David McLaughlin and Jordan Ly.


Repository: aurora


Description
---

Convert SlaManager into an AbstractIdleService and explicitly
close the AsyncHttpClient on scheduler shutdown. Otherwise
we run the risk of having a stuck scheduler JVM that is unable
to shut down due to any of the remaining non-daemon HTTP client
threads.


Diffs
-

  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
5ad12511e3ec7dda227d133b7e0a2063c352c016 
  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java 
98bec4857f1b3c247c24059150de3e4aac080a02 
  src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
07082a99701ea1e428164e76267b908ae20508ad 


Diff: https://reviews.apache.org/r/67613/diff/1/


Testing
---

./gradlew test


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67326: Update Pants to 1.6.0 and Virtualenv to 16.2.0

2018-06-13 Thread Santhosh Kumar Shanmugham


> On June 13, 2018, 12:19 p.m., Santhosh Kumar Shanmugham wrote:
> > Ship It!

Actually, one minor comment: fix the commit message, "Virtualenv version is 
mis-spelled".


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67326/#review204723
---


On May 25, 2018, 7:58 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67326/
> ---
> 
> (Updated May 25, 2018, 7:58 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Beyond a regular version bump, this fixes the build on older versions of 
> MacOS.
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/build.sh a5975398929d01268841fa4c02aa360b309f6114 
>   build-support/python/checkstyle-check 
> 7e65dd97687d3bd5b586cec163f973d08decab6e 
>   build-support/thrift/thriftw 26b4f9c8087214222100a46d83b47dbce01a7471 
>   build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
>   pants.ini 8c71b144619f437175727dd2027f702ee749df11 
>   rbt 7531fcb2ed21d125bbd2adf5611db82a1a727545 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
> 
> 
> Diff: https://reviews.apache.org/r/67326/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 67326: Update Pants to 1.6.0 and Virtualenv to 16.2.0

2018-06-13 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67326/#review204723
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On May 25, 2018, 7:58 a.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67326/
> ---
> 
> (Updated May 25, 2018, 7:58 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Beyond a regular version bump, this fixes the build on older versions of 
> MacOS.
> 
> 
> Diffs
> -
> 
>   build-support/jenkins/build.sh a5975398929d01268841fa4c02aa360b309f6114 
>   build-support/python/checkstyle-check 
> 7e65dd97687d3bd5b586cec163f973d08decab6e 
>   build-support/thrift/thriftw 26b4f9c8087214222100a46d83b47dbce01a7471 
>   build-support/virtualenv d6484f58fbffd33ef61d6052c869c55153ec7313 
>   pants 312dd2035a5ad2e65a1fb3f52d1c36693c2624f0 
>   pants.ini 8c71b144619f437175727dd2027f702ee749df11 
>   rbt 7531fcb2ed21d125bbd2adf5611db82a1a727545 
>   src/main/python/apache/aurora/executor/BUILD 
> 486230db34a22ea5dd0f68da911c0afb1afbcac0 
> 
> 
> Diff: https://reviews.apache.org/r/67326/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 67479: Remove maintenance request after a host is drained.

2018-06-06 Thread Santhosh Kumar Shanmugham


> On June 6, 2018, 1:50 p.m., Renan DelValle wrote:
> > src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
> > Lines 259 (patched)
> > <https://reviews.apache.org/r/67479/diff/1/?file=2036108#file2036108line259>
> >
> > Quick question, the existing behavior is to keep hosts in the DRAINED 
> > status until the scheduler receives an end maintenance call. Will this 
> > modify the current behavior?

The host will continue to remain in `DRAINED` mode, blocking any new tasks from 
getting scheduled on it. This is true even when the host is removed and 
re-registers with a new slave id.

We are only removing the maintenance request here, since the work of draining 
the tasks is already done and nothing more remains to be done. We need to do 
this because otherwise the HostMaintenanceStore can keep growing unless end 
maintenance is called for each host. This may not be ideal for cases where 
hosts are being returned and are not expected to re-enter the cluster.
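
The behavior described above can be sketched roughly as follows. This is an in-memory illustration only, not Aurora's MaintenanceController or HostMaintenanceStore: the class `MaintenanceSketch`, its maps, and all method names are hypothetical. It assumes hosts are keyed by hostname, so a host that re-registers under a new slave id still maps to the same entry.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the drain flow; not Aurora's actual classes. */
public class MaintenanceSketch {
  enum Mode { NONE, DRAINING, DRAINED }

  // Keyed by hostname (an assumption for this sketch).
  private final Map<String, Mode> hostModes = new HashMap<>();
  private final Map<String, String> pendingRequests = new HashMap<>();

  /** Record a maintenance request and start draining the host. */
  public void requestDrain(String host, String requestDetails) {
    pendingRequests.put(host, requestDetails);
    hostModes.put(host, Mode.DRAINING);
  }

  /** Once draining finishes, keep the DRAINED mode but drop the request. */
  public void onDrainComplete(String host) {
    hostModes.put(host, Mode.DRAINED);
    pendingRequests.remove(host); // the request store no longer grows per drained host
  }

  /** Ending maintenance clears the mode so the host can take tasks again. */
  public void endMaintenance(String host) {
    hostModes.remove(host);
    pendingRequests.remove(host);
  }

  /** Only hosts with no maintenance mode accept new tasks. */
  public boolean canSchedule(String host) {
    return hostModes.getOrDefault(host, Mode.NONE) == Mode.NONE;
  }

  public int pendingRequestCount() {
    return pendingRequests.size();
  }

  public static void main(String[] args) {
    MaintenanceSketch sketch = new MaintenanceSketch();
    sketch.requestDrain("host-1", "drain for repair");
    sketch.onDrainComplete("host-1");
    System.out.println("schedulable: " + sketch.canSchedule("host-1"));
    System.out.println("pending requests: " + sketch.pendingRequestCount());
  }
}
```

In this model the drained host stays unschedulable until `endMaintenance` is called, while the request entry is gone as soon as the drain completes.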


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67479/#review204427
---


On June 6, 2018, 12:44 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67479/
> ---
> 
> (Updated June 6, 2018, 12:44 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Delete the `HostMaintenanceRequest` once the host has been
> `DRAINED`.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
>  dd2462d98a04c9ab6fdd79ccdb25cd309278267e 
>   
> src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
>  28c62a17db33b16d084b59cf40ca299f322d05e7 
> 
> 
> Diff: https://reviews.apache.org/r/67479/diff/1/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>



Review Request 67479: Remove maintenance request after a host is drained.

2018-06-06 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67479/
---

Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
Stephan Erb.


Repository: aurora


Description
---

Delete the `HostMaintenanceRequest` once the host has been
`DRAINED`.


Diffs
-

  
src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceController.java
 dd2462d98a04c9ab6fdd79ccdb25cd309278267e 
  
src/test/java/org/apache/aurora/scheduler/maintenance/MaintenanceControllerImplTest.java
 28c62a17db33b16d084b59cf40ca299f322d05e7 


Diff: https://reviews.apache.org/r/67479/diff/1/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-06-05 Thread Santhosh Kumar Shanmugham
 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/16/

Changes: https://reviews.apache.org/r/66716/diff/15-16/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


File Attachments


Load test results
  
https://reviews.apache.org/media/uploaded/files/2018/06/05/96d42678-8e61-48c8-977c-fdd925a23185__Screen_Shot_2018-06-04_at_8.14.28_PM.png


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-06-04 Thread Santhosh Kumar Shanmugham
 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/15/

Changes: https://reviews.apache.org/r/66716/diff/14-15/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


File Attachments


Load test results
  
https://reviews.apache.org/media/uploaded/files/2018/06/05/96d42678-8e61-48c8-977c-fdd925a23185__Screen_Shot_2018-06-04_at_8.14.28_PM.png


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-06-04 Thread Santhosh Kumar Shanmugham


> On June 4, 2018, 12:24 p.m., Renan DelValle wrote:
> > src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
> > Lines 396 (patched)
> > <https://reviews.apache.org/r/66716/diff/14/?file=2030797#file2030797line396>
> >
> > Should we make 20 a constant for this test?

Done.


> On June 4, 2018, 12:24 p.m., Renan DelValle wrote:
> > src/test/python/apache/aurora/admin/test_maintenance.py
> > Lines 361 (patched)
> > <https://reviews.apache.org/r/66716/diff/14/?file=2030807#file2030807line361>
> >
> > Maybe this percentage (95) should also be considered a candidate to be 
> > made a constant.

Done.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review204272
---


On June 4, 2018, 8:35 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated June 4, 2018, 8:35 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1978
> https://issues.apache.org/jira/browse/AURORA-1978
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/client/cli/context.py 
> 06b194114a7f44a61943e0932973e71b53f239b4 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 536d04a21d32c4e586dc943a6f9b0ad0143354a3 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 8c1f5ce6d7eb94ec4e03

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-06-04 Thread Santhosh Kumar Shanmugham


> On June 1, 2018, 1:44 p.m., David McLaughlin wrote:
> > Looks great! 
> > 
> > Has this been load-tested? Do we know how many concurrent machines with 
> > coordinator tasks can be put into maintenance before it starts to affect 
> > offer processing, etc.?

Load testing results show that there is no significant impact on offer 
processing or scheduling. A mixed load of bad-coordinator endpoints and 
long-delay coordinators was used.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review204202
---


On May 25, 2018, 6:13 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 25, 2018, 6:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1978
> https://issues.apache.org/jira/browse/AURORA-1978
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/client/cli/context.py 
> 06b194114a7f44a61943e0932973e71b53f239b4 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 536d04a21d32c4e586dc943a6f9b0ad0143354a3 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 8c1f5ce6d7eb94ec4e0302bfd41318bd0797a1a5 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  749ffeac6cb851f3

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Santhosh Kumar Shanmugham
/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/14/

Changes: https://reviews.apache.org/r/66716/diff/13-14/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203916
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On May 25, 2018, 3:15 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 25, 2018, 3:15 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1978
> https://issues.apache.org/jira/browse/AURORA-1978
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/client/cli/context.py 
> 06b194114a7f44a61943e0932973e71b53f239b4 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 536d04a21d32c4e586dc943a6f9b0ad0143354a3 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 8c1f5ce6d7eb94ec4e0302bfd41318bd0797a1a5 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  749ffeac6cb851f32bba7606390203d7a046a0e6 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 0fabb3370713e57d417adbd2af9e24a4d515c60a 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/QuartzTestUtil.java 
> b7dcf3af366c9def63165dc9bef998ab5e95ed49 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
> c6163bbabc7e7748f167b679893a93f58e4ef1ac 
>   src/test/java/org/apache/au

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Santhosh Kumar Shanmugham
 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/13/

Changes: https://reviews.apache.org/r/66716/diff/12-13/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Santhosh Kumar Shanmugham
ost_drain` in the next 
release and replace the logic for `host_drain`?

Putting `host_drain` under deprecation, nevertheless. We can discuss this 
further when you are back and happy to address any concerns.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203807
---


On May 25, 2018, 2:14 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 25, 2018, 2:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1978
> https://issues.apache.org/jira/browse/AURORA-1978
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/client/cli/context.py 
> 06b194114a7f44a61943e0932973e71b53f239b4 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 536d04a21d32c4e586dc943a6f9b0ad0143354a3 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 8c1f5ce6d7eb94ec4e0302bfd41318bd0797a1a5 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  749ffeac6cb851f32bba7606390203d7a046a0e6 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/CronIT.java 
> 0fabb3370713e57d417adbd2af9e24a4d515c60a 
>   src/test/java/org/apache/aurora/scheduler/cron/quartz/QuartzTestUtil.java 
> b7dcf3af366c9def63165dc9bef998ab5e95ed49 
>   
> src/test/

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Santhosh Kumar Shanmugham


> On May 25, 2018, 11:40 a.m., Jordan Ly wrote:
> > docs/operations/configuration.md
> > Lines 315-316 (original), 315-316 (patched)
> > <https://reviews.apache.org/r/66716/diff/11/?file=2028986#file2028986line315>
> >
> > nit: why double dashes?

Scheduler params use single `-` while the Executor params use `--`. Fixing this 
here since these are Scheduler params.


> On May 25, 2018, 11:40 a.m., Jordan Ly wrote:
> > docs/operations/configuration.md
> > Lines 323-327 (original), 323-327 (patched)
> > <https://reviews.apache.org/r/66716/diff/11/?file=2028986#file2028986line323>
> >
> > nit: why double dashes?

Same here.


> On May 25, 2018, 11:40 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 384 (patched)
> > <https://reviews.apache.org/r/66716/diff/11/?file=2028995#file2028995line384>
> >
> > Would it be useful to add a log message here for forcing through SLA 
> > requirements?

Done.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203896
---


On May 25, 2018, 2:14 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 25, 2018, 2:14 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1978
> https://issues.apache.org/jira/browse/AURORA-1978
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/operations/configuration.md 85a6fab54e03d52e42ba7d4ff47ab93f5b8293ee 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/client/cli/context.py 
> 06b194114a7f44a619

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-25 Thread Santhosh Kumar Shanmugham
/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/12/

Changes: https://reviews.apache.org/r/66716/diff/11-12/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66537: Adding enhancements to Docker functionality and client support for FetcherURIs

2018-05-24 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66537/#review203817
---



Can you include an end-to-end test?

https://github.com/apache/aurora/blob/master/src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


api/src/main/thrift/org/apache/aurora/gen/api.thrift
Lines 191 (patched)
<https://reviews.apache.org/r/66537/#comment286149>

Should we just make this a string so that future network types supported by 
Docker do not need an Aurora update? Since Mesos's `DockerInfo` already type 
checks via enums, this is probably a leaky abstraction and is not needed?



api/src/main/thrift/org/apache/aurora/gen/api.thrift
Lines 238 (patched)
<https://reviews.apache.org/r/66537/#comment286130>

nit - s/Hub/registry/



src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java
Lines 221-227 (patched)
<https://reviews.apache.org/r/66537/#comment286147>

I am not an expert on how Docker entry points work. The Mesos documentation on 
invoking a Docker entry point seems to differ here? Or am I misunderstanding 
this?


http://mesos.apache.org/documentation/latest/docker-containerizer/#commandinfo-to-run-docker-images



src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java
Lines 293 (patched)
<https://reviews.apache.org/r/66537/#comment286148>

Should this be resolved when interpolating the pystachio schema in the 
client?

What about the other templated namespaces? 
http://aurora.apache.org/documentation/latest/reference/configuration/#template-namespaces


- Santhosh Kumar Shanmugham


On May 2, 2018, 6:06 p.m., Steve Salevan wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66537/
> ---
> 
> (Updated May 2, 2018, 6:06 p.m.)
> 
> 
> Review request for Aurora, Renan DelValle and Stephan Erb.
> 
> 
> Bugs: AURORA-1982
> https://issues.apache.org/jira/browse/AURORA-1982
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Hey there!
> 
> Over here at Spine we've added a few enhancements to Aurora's Docker 
> functionality to support several of our use cases. We'd like to submit these 
> back up to the upstream to support the great work y'all are doing in this 
> space.
> 
> Here's what's included in this RB:
> 
> * Support for the force_pull flag to help ensure container freshness
> * Overrides for a Docker --entrypoint which can be specified on the job
> * Support for alternative Docker networks (defaults to HOST)
> * Support for user Docker networks
> 
> We currently use Aurora to schedule Docker containers without the use of 
> Thermos, so we've added support for server-side templating of common Thermos 
> variables into Docker executor's parameters for this purpose.
> 
> This change modifies Aurora's api.thrift with several new optional fields, 
> and all added code handles their absence gracefully, so no backfills have 
> been added. We've threaded these schema changes through to the Python Aurora 
> client alongside support for the Mesos Fetcher URIs already supported 
> server-side.
> 
> Let me know what you think and thanks!
> 
> 
> Diffs
> -
> 
>   .gitignore 9ce74ebbbc57b77d912eaa573a8fb18ed4aa3c15 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosTaskFactory.java 
> bcb2bbf882f43d813dd26c746d806e78bae6bcf3 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  749ffeac6cb851f32bba7606390203d7a046a0e6 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosTaskFactoryImplTest.java 
> 686087ef858b8a5a8e956d82a7bd692f7be28b12 
>   src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
> 40851c419e4d62e6545959eebc0ce144fdecc697 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> e4f43d0573c7862adc9bc679f4cea40cc76eac38 
>   src/test/python/apache/aurora/config/test_thrift.py 
> 8e1d0e177959af12b97bdd1cd47845b7

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-24 Thread Santhosh Kumar Shanmugham
/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/11/

Changes: https://reviews.apache.org/r/66716/diff/10-11/


Testing
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-24 Thread Santhosh Kumar Shanmugham
/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/10/


Testing (updated)
---

./build-support/jenkins/build.sh
./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-24 Thread Santhosh Kumar Shanmugham
/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/sh/org/apache/aurora/e2e/http_example.py 
ba7d11429b5f3945a1fdf1808105b11e6ef78420 
  src/test/sh/org/apache/aurora/e2e/partition_aware.aurora 
7ea9fadefcb4846cfe4922e11febec74c75f15db 
  src/test/sh/org/apache/aurora/e2e/sla_coordinator.py PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh 
888efe4e990913d81335f1f3e2c9b6473de7bee8 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/10/

Changes: https://reviews.apache.org/r/66716/diff/9-10/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-24 Thread Santhosh Kumar Shanmugham


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 236 (patched)
> > <https://reviews.apache.org/r/66716/diff/9/?file=2027359#file2027359line236>
> >
> > I think listing the unaffected tasks might be a bit gratuitous. I 
> > would have that as a debug statement if needed.
> > 
> > When doing SLA-aware updates, a big job will produce huge `unaffected 
> > tasks` lists repeatedly.

Dropping the unaffected tasks list.


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 273 (patched)
> > <https://reviews.apache.org/r/66716/diff/9/?file=2027359#file2027359line273>
> >
> > I believe that if the coordinator is locked for more than 10 seconds, 
> > then this method will return false. However, the code will continue, 
> > since the return value is ignored, even though the lock was never acquired.
> > 
> > I think waiting for 10 seconds is also a long time to block the thread. 
> > Is it feasible to try to get the lock and, if it is not immediately 
> > available, to return? The caller can invoke the method again 10 seconds 
> > later for essentially the same effect if they desire.

Good catch. Dropping the timeout.
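
For illustration only, the non-blocking pattern suggested above could look 
roughly like the Python sketch below. The actual change lives in the Java 
`SlaManager`; the names `coordinator_lock` and `check_sla` are hypothetical 
and are not taken from the patch.

```python
import threading

# Hypothetical stand-in for the per-coordinator lock discussed above.
coordinator_lock = threading.Lock()


def check_sla(action):
    """Run `action` only if the lock is free right now.

    Returns True if the action ran and False if the lock was busy, so the
    caller can decide whether to retry later. No thread blocks waiting.
    """
    # acquire(blocking=False) returns immediately with True or False.
    if not coordinator_lock.acquire(blocking=False):
        return False
    try:
        action()
        return True
    finally:
        coordinator_lock.release()
```

The point of the sketch is that the result of the acquire call is checked 
before any work is done, and the retry policy is left to the caller.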


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 314-318 (patched)
> > <https://reviews.apache.org/r/66716/diff/9/?file=2027359#file2027359line314>
> >
> > Is there a way to set a timeout or increment a metric when a 
> > coordinator takes too long to respond?
> > 
> > Is there any downside to allowing long requests?

The `HttpClient` already sets `requestTimeout`, `connectTimeout`, 
`handshakeTimeout`, `readTimeout` etc. Is there something that is not covered 
by this?


> On May 23, 2018, 4:13 p.m., Jordan Ly wrote:
> > src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java
> > Lines 1018 (patched)
> > <https://reviews.apache.org/r/66716/diff/9/?file=2027376#file2027376line1018>
> >
> > `Thread.sleep(15000)` here breaks this test per my comment in 
> > `SlaManager`

Addressed above comment.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203701
---


On May 22, 2018, 5:21 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 22, 2018, 5:21 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-22 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203624
---



Note to reviewers: Since the rebase removes changes that were part of this 
patch, the diffs look weird. Please see the diff against master.

- Santhosh Kumar Shanmugham


On May 22, 2018, 5:21 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 22, 2018, 5:21 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   RELEASE-NOTES.md 5e1f9940a7974e212140b7e5304695afa7f96e78 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ff48000d613ceef3e03586b94944d13275fb127c 
>   docs/README.md 166bf1ce240474f0a181e023439cfbfbe7363822 
>   docs/features/sla-requirements.md PRE-CREATION 
>   docs/reference/configuration.md d4b869b938105ba301fc88d41019af2f1707f6f4 
>   docs/reference/scheduler-configuration.md 
> a659cfac974059b04ef5593286011decbb7f9110 
>   examples/vagrant/systemd/aurora-scheduler.service 
> 57e4bba858672f8da94eaa0499f8e5f3347ab982 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/client/cli/context.py 
> 06b194114a7f44a61943e0932973e71b53f239b4 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 536d04a21d32c4e586dc943a6f9b0ad0143354a3 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 8c1f5ce6d7eb94ec4e0302bfd41318bd0797a1a5 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
>  749ffeac6cb851f32bba7606390203d7a046a0e6 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
> c6163bbabc7e7748f167b679893a93f58e4ef1ac 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
> PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> d37e7a07e9258bc8c0758bf50aece5b79025126b 
>   
> src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
>  770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
>

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-22 Thread Santhosh Kumar Shanmugham
/aurora/e2e/sla_policy.aurora PRE-CREATION 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/9/

Changes: https://reviews.apache.org/r/66716/diff/8-9/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67219: Fix flaky Webhook test by ensuring proper error condition

2018-05-21 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67219/#review203508
---


Ship it!




Ship It!


src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java
Line 244 (original), 241 (patched)
<https://reviews.apache.org/r/67219/#comment285801>

Update comment.


- Santhosh Kumar Shanmugham


On May 21, 2018, 11 a.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67219/
> ---
> 
> (Updated May 21, 2018, 11 a.m.)
> 
> 
> Review request for Aurora, Renan DelValle, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Attempt #3 at fixing the flaky Webhook test once and for all.
> 
> Previously, I was testing the error condition by hitting a bad url with a 
> port of -1. I believe this was erroneous (I am assuming the -1 overflowed 
> into a valid port). Additionally, there was a timing dependency in the test 
> which could make it flaky as well.
> 
> I ensured that the test hit a bad host url and removed the timing for a more 
> deterministic test.
> 
> 
> Diffs
> -
> 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> 3e10c57e00ba12725310bd50bd55743bec95a77b 
> 
> 
> Diff: https://reviews.apache.org/r/67219/diff/4/
> 
> 
> Testing
> ---
> 
> `./gradlew test` passes.
> 
> Repeated AuroraBot tests.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-17 Thread Santhosh Kumar Shanmugham


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > I have done a first quick pass. I will have a second closer look once the 
> > storage patch has landed.

Thanks for the review. Much appreciated.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
> > Lines 74 (patched)
> > <https://reviews.apache.org/r/66716/diff/7/?file=2023157#file2023157line74>
> >
> > I think we should make this value configurable for operators.

Done.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
> > Lines 248-259 (patched)
> > <https://reviews.apache.org/r/66716/diff/7/?file=2023157#file2023157line248>
> >
> > Both this validation and the other `instance`-based validation below 
> > have a conceptual problem: Users can freely adjust the number of available 
> > instances via `kill` and `addInstance`. This implies that the policy can 
> > become invalid long after we have checked it here.
> > 
> > I think the most robust way would be if we just print warnings in the 
> > client, and change the scheduler so that it can ignore invalid policies, 
> > similar to how it ignores them after a timeout.

Good catch. I will add the warning and the validation check.


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
> > Lines 276 (patched)
> > <https://reviews.apache.org/r/66716/diff/7/?file=2023157#file2023157line276>
> >
> > `"CountSlaPolicy: count=5 must be less than 3"` can be hard to 
> > understand as it lacks context. 
> > 
> > If you add that the second number refers to the number of instances, it 
> > will be easier for users to reason about the error.

Done.
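
As a rough sketch of the kind of message being asked for (not the actual Java 
code in `ConfigurationManager`), a check that names both numbers might read 
as follows. The function name `validate_count_policy` and the exact wording of 
the message are assumptions made for this example.

```python
def validate_count_policy(count, instance_count):
    """Hypothetical check: the SLA count must be below the instance count.

    The error message states what the second number means, so that a user
    does not have to guess what "must be less than 3" refers to.
    """
    if count >= instance_count:
        raise ValueError(
            "CountSlaPolicy: count=%d must be less than the number of "
            "instances in the job (%d)" % (count, instance_count))
```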


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 280 (patched)
> > <https://reviews.apache.org/r/66716/diff/7/?file=2023160#file2023160line280>
> >
> > Would you be open to the idea of passing the host and (health?) port of 
> > the instance here as well? I have a use case in mind that would be 
> > simplified by this quite a bit, as I can query a running instance for 
> > additional information without having to query the service discovery ZK 
> > first.
> > 
> > In addition, have you considered just passing each piece of information 
> > individually (job, environment, role,...) rather than as a joined string? 
> > Many use cases will probably require string splitting within the 
> > coordinator otherwise.

Adding the entire `ScheduledTask` into the body in JSON format, similar to 
`WebHooks` (but unlike `WebHooks`, we will use `TSerializer` to cleanly serialize 
to JSON).


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 303 (patched)
> > <https://reviews.apache.org/r/66716/diff/7/?file=2023160#file2023160line303>
> >
> > What use case do you have in mind for the statuskey?
> > 
> > I am mostly wondering if a protocol purely based on HTTP status codes 
> > would be sufficient: e.g. `200 OK` means ready to drain and `428 
> > PRECONDITION REQUIRED` asks us to come back again later.

I foresee this becoming a full-fledged protocol with its own library in the 
future, depending on the expectations from more stateful services. Some of the 
possible use cases are "come back after x mins from now" and "add instance before 
killing".
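
To make the two options in this thread concrete, here is a minimal Python 
sketch of how a caller might combine them: the HTTP status code gates the 
decision, and a key in the JSON body carries the actual answer. The function 
name `may_drain`, the default key name `drain`, and the rule that any non-200 
response is a NACK are all assumptions for this example, not the behaviour of 
the patch.

```python
import json


def may_drain(status_code, body="", status_key="drain"):
    """Decide whether a task may be drained from a coordinator response.

    Assumption for this sketch: any non-200 response means "come back
    later"; on 200 the JSON body must hold `true` under `status_key`.
    """
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # An unparseable body is treated as a NACK.
        return False
    return isinstance(payload, dict) and payload.get(status_key) is True
```

Keeping the answer in the body leaves room for extra fields later (for 
example a retry hint), which a status code alone cannot carry.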


> On May 16, 2018, 2:35 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java
> > Lines 384 (patched)
> > <https://reviews.apache.org/r/66716/diff/7/?file=2023160#file2023160line384>
> >
> > isProduction is deprecated. You will need to check the appropriate tier 
> > config here.

Done.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203276
---


On May 17, 2018, 7:49 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 17, 2018, 7:49 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-17 Thread Santhosh Kumar Shanmugham
/aurora/scheduler/configuration/ConfigurationManagerTest.java
 749ffeac6cb851f32bba7606390203d7a046a0e6 
  src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
c6163bbabc7e7748f167b679893a93f58e4ef1ac 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/admin/test_maintenance.py 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_add.py 
b22b9f72fbddb553bfc33b1bd8e10636a8d887a6 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/client/cli/test_kill.py 
0e859dc8618a044b2a4a6f73f45cab4a7ffcce4e 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/8/

Changes: https://reviews.apache.org/r/66716/diff/7-8/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-16 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/
---

(Updated May 16, 2018, 7:07 p.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
Stephan Erb.


Bugs: AURORA-1977
https://issues.apache.org/jira/browse/AURORA-1977


Repository: aurora


Description
---

Add `SlaPolicy` and `HostMaintenanceRequest` structs
to the thrift definition and introduce a new `HostMaintenanceStore`
for tracking maintenance requests. These changes will be used in
https://reviews.apache.org/r/66716 for implementing custom SLA
and scheduler driven maintenance.

This RB splits the storage-related changes from 
https://reviews.apache.org/r/66716
for a better rollback story.

Tested rollback on the vagrant.
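
As a rough picture of what such a store does, here is a minimal in-memory 
sketch in Python. The method names mirror the `saveHostMaintenanceRequest` and 
`removeHostMaintenanceRequest` operations listed in the diffs; the struct 
fields and the read methods are assumptions for illustration, not the thrift 
definition.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HostMaintenanceRequest:
    host: str
    timeout_secs: int          # hypothetical field
    created_timestamp_ms: int  # hypothetical field


class MemHostMaintenanceStore:
    """Keeps at most one maintenance request per host, keyed by host name."""

    def __init__(self) -> None:
        self._requests: Dict[str, HostMaintenanceRequest] = {}

    def save_host_maintenance_request(self, request: HostMaintenanceRequest) -> None:
        # Saving again for the same host overwrites the earlier request.
        self._requests[request.host] = request

    def remove_host_maintenance_request(self, host: str) -> None:
        # Removing an unknown host is a no-op.
        self._requests.pop(host, None)

    def get_host_maintenance_request(self, host: str) -> Optional[HostMaintenanceRequest]:
        return self._requests.get(host)

    def get_host_maintenance_requests(self) -> List[HostMaintenanceRequest]:
        return sorted(self._requests.values(), key=lambda r: r.host)
```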


Diffs (updated)
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
b79e2045ccda05d5058565f81988dfe33feea8f1 
  src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
da5534f886e032ca5a182f3704aa335ff680b258 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
 f1fdc275d3958a36bbe79110d70dfeba640a948a 
  src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
10864f122eff5027c88d835baae6de483d960218 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java 
8d70cae35289a9e36142bab288cf0c9398ebd2d4 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
b30de881eafa3226fdc32383b4e9bfd33ca912a5 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
4b52be02001e704f4b1a5f447226ac8c2386e3fd 
  
src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
9f324b010db7e351e98b257d8fc8fecfeac81268 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
edcea09b4d206cfddb642074237b031ad71cff13 
  src/main/python/apache/aurora/config/schema/base.py 
a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
  src/main/python/apache/aurora/config/thrift.py 
6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
  src/main/python/apache/aurora/executor/executor_vars.py 
561f9452aedda4cc695c84a2a850bdd7e1d65dec 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/67141/diff/3/

Changes: https://reviews.apache.org/r/67141/diff/2-3/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-16 Thread Santhosh Kumar Shanmugham


> On May 16, 2018, 1:22 p.m., Stephan Erb wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 263 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023574#file2023574line263>
> >
> > Just by looking at this it is not clear what `statusKey` means. Maybe 
> > you can extend the doc string?

Done.


> On May 16, 2018, 1:22 p.m., Stephan Erb wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 901 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023574#file2023574line901>
> >
> > In most other places in the API timestamps are called out as such. 
> > Maybe for consistency we should rename `creationTimeMs` to 
> > `createdTimestampMs` (e.g. as in `JobUpdateState`)? 
> > 
> > (I don't feel strongly about this though)

Done.


> On May 16, 2018, 1:22 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java
> > Lines 57-65 (original), 58-66 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023579#file2023579line58>
> >
> > This is missing the `HostMaintenance` store

Done.


> On May 16, 2018, 1:22 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
> > Lines 257-258 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023580#file2023580line257>
> >
> > If we use a feature toggle here operators can enable the backwards 
> > incompatible change after they have vetted the release (i.e. once they are 
> > sure they don't need to do a rollback for unrelated issues).
> > 
> > We can then simply enable the feature toggle by default after next 
> > release.
> > 
> > @Jordan Ly, would this address your backwards incompatibility concerns?
> 
> Jordan Ly wrote:
> I realize that they can just remove all `HostMaintenanceRequest`s and 
> perform a snapshot if they want to roll back to the previous version (kinda 
> like what they did for 0.14 GPU resources 
> https://github.com/apache/aurora/blob/master/RELEASE-NOTES.md#0140). I don't 
> have any concerns going forward with what is present.
> 
> Stephan Erb wrote:
> Ah good catch!

I will add the RELEASE-NOTES entry in the next patch and include similar 
instructions for rollback.
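
The rollback procedure mentioned above (remove all requests, then snapshot) 
can be pictured with a toy model in Python. This is not Aurora's storage code: 
the `ToyLog` class, the set of operations the "old version" knows, and the 
compaction behaviour are simplifying assumptions.

```python
# Operations an older scheduler is assumed to understand in this toy model.
OLD_OPS = {"saveTasks", "saveCronJob", "saveJobUpdate", "snapshot"}


class ToyLog:
    def __init__(self):
        self.entries = []   # list of (op_name, payload) tuples
        self.requests = {}  # host -> request payload

    def save_host_maintenance_request(self, host):
        self.requests[host] = {"host": host}
        self.entries.append(("saveHostMaintenanceRequest", host))

    def remove_host_maintenance_request(self, host):
        self.requests.pop(host, None)
        self.entries.append(("removeHostMaintenanceRequest", host))

    def snapshot(self):
        # Compaction: the log is replaced by one entry holding the current state.
        self.entries = [("snapshot", {"hostMaintenanceRequests": sorted(self.requests)})]


def old_version_can_replay(log):
    """An old scheduler fails on unknown ops or on snapshots carrying new data."""
    for op, payload in log.entries:
        if op not in OLD_OPS:
            return False
        if op == "snapshot" and payload.get("hostMaintenanceRequests"):
            return False
    return True


def prepare_rollback(log):
    """Remove every request, then snapshot so no new-style entries remain."""
    for host in list(log.requests):
        log.remove_host_maintenance_request(host)
    log.snapshot()
```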


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/#review203269
---


On May 15, 2018, 2:15 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67141/
> ---
> 
> (Updated May 15, 2018, 2:15 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1977
> https://issues.apache.org/jira/browse/AURORA-1977
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add `SlaPolicy` and `HostMaintenanceRequest` structs
> to the thrift definition and introduce a new `HostMaintenanceStore`
> for tracking maintenance requests. These changes will be used in
> https://reviews.apache.org/r/66716 for implementing custom SLA
> and scheduler driven maintenance.
> 
> This RB splits the storage-related changes from 
> https://reviews.apache.org/r/66716
> for a better rollback story.
> 
> Tested rollback on the vagrant.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/sched

Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-16 Thread Santhosh Kumar Shanmugham


> On May 16, 2018, 10:02 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
> > Lines 257-258 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023580#file2023580line257>
> >
> > For additional backwards compatibility, we can also not write to the 
> > snapshot yet.
> > 
> > This will allow us to roll back to before the storage changes are even 
> > released.

I will add the RELEASE-NOTES entry in the next patch and include similar 
instructions for rollback as we did for GPU resources.


> On May 16, 2018, 10:02 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
> > Lines 266-267 (patched)
> > <https://reviews.apache.org/r/67141/diff/1/?file=2023580#file2023580line266>
> >
> > Same as above.

Same as above.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/#review203243
---


On May 15, 2018, 2:15 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67141/
> ---
> 
> (Updated May 15, 2018, 2:15 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
> Stephan Erb.
> 
> 
> Bugs: AURORA-1977
> https://issues.apache.org/jira/browse/AURORA-1977
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Add `SlaPolicy` and `HostMaintenanceRequest` structs
> to the thrift definition and introduce a new `HostMaintenanceStore`
> for tracking maintenance requests. These changes will be used in
> https://reviews.apache.org/r/66716 for implementing custom SLA
> and scheduler driven maintenance.
> 
> This RB splits the storage-related changes from 
> https://reviews.apache.org/r/66716
> for a better rollback story.
> 
> Tested rollback on the vagrant.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   
> src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
>  PRE-CREATION 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
>  31f9545d83a950064df646ef6ba8a95234cf89ec 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3dd9ce4039b223cb6156462d089f7062a1cde772 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
>  27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
>

Review Request 67142: Add SlaManager to encapsulate SLA operations.

2018-05-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67142/
---

Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
Stephan Erb.


Bugs: AURORA-1978
https://issues.apache.org/jira/browse/AURORA-1978


Repository: aurora


Description
---

Introduce an `SlaManager` that provides an interface
for performing SLA-safe actions. This will be used
by the `MaintenanceController` to perform SLA-safe
host maintenance and the `JobUpdateController` to
perform SLA-safe job updates.
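
To give a feel for what "SLA-safe" could mean, here is a small Python sketch. 
The count-based and percentage-based rules and the function names are 
assumptions for illustration; the checks in `SlaManager.java` may be defined 
differently.

```python
def safe_to_kill_count(running, required):
    """Killing one instance is safe if at least `required` instances stay running."""
    return running - 1 >= required


def safe_to_kill_percentage(running, total, required_pct):
    """Killing one instance is safe if the running share stays at or above required_pct."""
    if total <= 0:
        return False
    return (running - 1) * 100.0 / total >= required_pct
```

For example, with 10 of 10 instances running and a 90% requirement, one 
instance can be taken down; with 9 of 10 running, the same action would drop 
the job to 80% and is refused.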


Diffs
-

  src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
PRE-CREATION 


Diff: https://reviews.apache.org/r/67142/diff/1/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/
---

(Updated May 15, 2018, 2:15 p.m.)


Review request for Aurora, David McLaughlin, Jordan Ly, Renan DelValle, and 
Stephan Erb.


Changes
---

Adding reviewers.


Bugs: AURORA-1977
https://issues.apache.org/jira/browse/AURORA-1977


Repository: aurora


Description (updated)
---

Add `SlaPolicy` and `HostMaintenanceRequest` structs
to the thrift definition and introduce a new `HostMaintenanceStore`
for tracking maintenance requests. These changes will be used in
https://reviews.apache.org/r/66716 for implementing custom SLA
and scheduler driven maintenance.

This RB splits the storage-related changes from 
https://reviews.apache.org/r/66716
for a better rollback story.

Tested rollback on the vagrant.


Diffs
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
b79e2045ccda05d5058565f81988dfe33feea8f1 
  src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
da5534f886e032ca5a182f3704aa335ff680b258 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
 f1fdc275d3958a36bbe79110d70dfeba640a948a 
  src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
10864f122eff5027c88d835baae6de483d960218 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java 
8d70cae35289a9e36142bab288cf0c9398ebd2d4 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
b30de881eafa3226fdc32383b4e9bfd33ca912a5 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
4b52be02001e704f4b1a5f447226ac8c2386e3fd 
  
src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
9f324b010db7e351e98b257d8fc8fecfeac81268 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
edcea09b4d206cfddb642074237b031ad71cff13 
  src/main/python/apache/aurora/config/schema/base.py 
a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
  src/main/python/apache/aurora/config/thrift.py 
6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
  src/main/python/apache/aurora/executor/executor_vars.py 
561f9452aedda4cc695c84a2a850bdd7e1d65dec 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/67141/diff/1/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Review Request 67141: Introduce structs to enable specifying custom SLA.

2018-05-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67141/
---

Review request for Aurora.


Repository: aurora


Description
---

Add `SlaPolicy` and `HostMaintenanceRequest` structs
to the thrift definition and introduce a new `HostMaintenanceStore`
for tracking maintenance requests. These changes will be used in
https://reviews.apache.org/r/66716 for implementing custom SLA
and scheduler driven maintenance.

This RB splits the storage-related changes from 
https://reviews.apache.org/r/66716
for a better rollback story.


Diffs
-

  api/src/main/thrift/org/apache/aurora/gen/api.thrift 
ef754e32172e7490a47a13e7b526f243ffa3efeb 
  api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
b79e2045ccda05d5058565f81988dfe33feea8f1 
  src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
da5534f886e032ca5a182f3704aa335ff680b258 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
 f1fdc275d3958a36bbe79110d70dfeba640a948a 
  src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
10864f122eff5027c88d835baae6de483d960218 
  
src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java 
8d70cae35289a9e36142bab288cf0c9398ebd2d4 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
b30de881eafa3226fdc32383b4e9bfd33ca912a5 
  src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
4b52be02001e704f4b1a5f447226ac8c2386e3fd 
  
src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
 PRE-CREATION 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
9f324b010db7e351e98b257d8fc8fecfeac81268 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
edcea09b4d206cfddb642074237b031ad71cff13 
  src/main/python/apache/aurora/config/schema/base.py 
a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
  src/main/python/apache/aurora/config/thrift.py 
6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
  src/main/python/apache/aurora/executor/executor_vars.py 
561f9452aedda4cc695c84a2a850bdd7e1d65dec 
  src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/67141/diff/1/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-15 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203150
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On May 15, 2018, 10:16 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 15, 2018, 10:16 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
>  9733ffe74b107f336858657550156ddb1f1dd215 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda

Re: Review Request 66716: Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-15 Thread Santhosh Kumar Shanmugham
/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/admin/test_maintenance.py 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/7/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-15 Thread Santhosh Kumar Shanmugham
/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/admin/test_maintenance.py 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/7/

Changes: https://reviews.apache.org/r/66716/diff/6-7/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66192: [WIP] Variable group size updates

2018-05-15 Thread Santhosh Kumar Shanmugham


> On May 14, 2018, 10:58 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
> > Lines 826 (patched)
> > <https://reviews.apache.org/r/66192/diff/2/?file=2018260#file2018260line826>
> >
> > Sums up to be exactly `mutableRequest.instanceCount`?
> 
> Renan DelValle wrote:
> This is something I wondered myself. The current design follows the 
> philosophy that the sum of all update groups does not have to be equal to the 
> instance count.
> 
> If the sum of the group sizes is greater than the instance count, the 
> update will carry forward until we run out of instances to update. 
> 
> For example, if we have update groups 1,2,3 and our instance count is 5, 
> we will get the following steps in practice: 1, 2, 2
> 
> If the sum of the group sizes is less than the instance count, the update 
> will carry forward, repeating the value of the last group size until the 
> update completes.
> 
> For example, if we have update groups 1,2 and our instance count is 5, we 
> will get the following steps: 1, 2, 2
> 
> I will write thorough documentation on this so that users know what to 
> expect when this update strategy is used.
> 
> 
> One benefit of implementing the variable group size update this way is 
> that it provides a path going forward to have a single batch strategy in the 
> code base.
> 
> Since we repeat the last group size we have, having a list of group sizes 
> of length 1 is equivalent to a batch update. (This was done based on a 
> comment by Stephan during the first review round that resonated with me.)

That makes sense. I missed the earlier conversation.
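
For reference, the stepping behaviour described in the quoted reply can be 
sketched as below. This is only an illustration of the two rules stated above 
(cap a step at the remaining instances; repeat the last group size once the 
list runs out), not the code in VariableBatchStrategy.java. The function name 
`variable_batch_steps` and its parameters are hypothetical.

```python
from typing import List, Sequence


def variable_batch_steps(group_sizes: Sequence[int], instance_count: int) -> List[int]:
    """Return the number of instances updated in each step.

    Hypothetical helper: it mirrors the behaviour described in the thread,
    not the actual scheduler implementation.
    """
    if not group_sizes or any(size <= 0 for size in group_sizes):
        raise ValueError("group_sizes must be a non-empty list of positive ints")

    steps: List[int] = []
    remaining = instance_count
    index = 0
    while remaining > 0:
        # Once the list is exhausted, keep reusing the last group size.
        size = group_sizes[min(index, len(group_sizes) - 1)]
        # Never update more instances than are left.
        step = min(size, remaining)
        steps.append(step)
        remaining -= step
        index += 1
    return steps


print(variable_batch_steps([1, 2, 3], 5))  # groups sum to more than 5
print(variable_batch_steps([1, 2], 5))     # groups sum to less than 5
print(variable_batch_steps([2], 5))        # single size behaves like a batch update
```

Both examples from the thread (groups 1,2,3 and groups 1,2 with 5 instances) 
come out as 1, 2, 2 under these rules, and a single-element list reduces to a 
plain fixed-size batch update.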


> On May 14, 2018, 10:58 a.m., Santhosh Kumar Shanmugham wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
> > Lines 65 (patched)
> > <https://reviews.apache.org/r/66192/diff/2/?file=2018263#file2018263line65>
> >
> > Can you include an example for the rolling forward and backward cases?
> 
> Renan DelValle wrote:
> Can you expand on what you mean by including an example? Should I put it 
> in as a comment on the code?

Yes. Please include that in the javadoc.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review203048
---


On May 14, 2018, 7:19 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated May 14, 2018, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direction and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66192: [WIP] Variable group size updates

2018-05-15 Thread Santhosh Kumar Shanmugham
checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:109:50:
> >  '-' is not followed by whitespace. [WhitespaceAround]
> > [ant:checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:117:7:
> >  'if' is not followed by whitespace. [WhitespaceAround]
> > [ant:checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:127:
> >  Line is longer than 100 characters (found 101). [LineLength]
> >  FAILED
> > 
> > FAILURE: Build failed with an exception.
> > 
> > * What went wrong:
> > Execution failed for task ':checkstyleMain'.
> > > Checkstyle rule violations were found. See the report at: 
> > > file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html
> > 
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or 
> > --debug option to get more log output.
> > 
> > * Get more help at https://help.gradle.org
> > 
> > BUILD FAILED in 5m 13s
> > 32 actionable tasks: 26 executed, 6 up-to-date
> > 
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"
> 
> Santhosh Kumar Shanmugham wrote:
> Can you clean up the style issues?
> 
> Also, will this be configurable via the pystachio config?
> 
> Renan DelValle wrote:
> Definitely, just wanted to get the latest version out to get feedback on 
> the Thrift Schema changes. This will indeed be configurable via pystachio, 
> just wanted to settle on the Thrift Schema before making those changes.

Ack.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review203093
---


On May 14, 2018, 7:19 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated May 14, 2018, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direction and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66192: [WIP] Variable group size updates

2018-05-15 Thread Santhosh Kumar Shanmugham
checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:109:50:
> >  '-' is not followed by whitespace. [WhitespaceAround]
> > [ant:checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:117:7:
> >  'if' is not followed by whitespace. [WhitespaceAround]
> > [ant:checkstyle] [ERROR] 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java:127:
> >  Line is longer than 100 characters (found 101). [LineLength]
> >  FAILED
> > 
> > FAILURE: Build failed with an exception.
> > 
> > * What went wrong:
> > Execution failed for task ':checkstyleMain'.
> > > Checkstyle rule violations were found. See the report at: 
> > > file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html
> > 
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or 
> > --debug option to get more log output.
> > 
> > * Get more help at https://help.gradle.org
> > 
> > BUILD FAILED in 5m 13s
> > 32 actionable tasks: 26 executed, 6 up-to-date
> > 
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"

Can you clean up the style issues?

Also, will this be configurable via the pystachio config?


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review203093
---


On May 14, 2018, 7:19 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated May 14, 2018, 7:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direction and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66192: [WIP] Variable group size updates

2018-05-14 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review203048
---



Approach looks good to me.


src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
Lines 826 (patched)
<https://reviews.apache.org/r/66192/#comment285106>

Sums up to be exactly `mutableRequest.instanceCount`?



src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
Lines 46 (patched)
<https://reviews.apache.org/r/66192/#comment285107>

nit - s/Creates an/Creates a/



src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
Lines 65 (patched)
<https://reviews.apache.org/r/66192/#comment285109>

Can you include an example for the rolling forward and backward cases?



src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
Lines 67 (patched)
<https://reviews.apache.org/r/66192/#comment285108>

nit - s/where/we are/


- Santhosh Kumar Shanmugham


On May 8, 2018, 4:26 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated May 8, 2018, 4:26 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, Santhosh Kumar 
> Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direction and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-14 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review203047
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On May 14, 2018, 9:52 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 14, 2018, 9:52 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/configuration/ConfigurationManager.java
>  4073229b74d0e0e7fd31552bd96894ceb8a0971a 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   
> src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotDeduplicator.java
>  9733ffe74b107f336858657550156ddb1f1dd215 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   src/main/java/org/apache/aurora/scheduler/thrift/ReadOnlySchedulerImpl.java 
> e88cad6cf12312512e6840329db7ca7134ceaae6 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/admin/admin_util.py 
> 8240e8093160623b4c30dd212a88b8e122fd9856 
>   src/main/python/apache/aurora/admin/host_maintenance.py 
> 83fc2b6ece40d3436cc7de7a034f95224235fcfd 
>   src/main/python/apache/aurora/admin/maintenance.py 
> 942a237f47a6e0416bbaf244278685477e0f407d 
>   src/main/python/apache/aurora/client/api/__init__.py 
> f6fd1dd6d7c2bdd5bca3037f501b36badab78c75 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-14 Thread Santhosh Kumar Shanmugham
l.java 
778148a7c033cba9004954cabc33a2b1d003dccf 
  src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
e66ec116112df164106598d9ff0bc9e8f465e44f 
  
src/test/java/org/apache/aurora/scheduler/configuration/ConfigurationManagerTest.java
 749ffeac6cb851f32bba7606390203d7a046a0e6 
  src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
c6163bbabc7e7748f167b679893a93f58e4ef1ac 
  src/test/java/org/apache/aurora/scheduler/sla/SlaManagerTest.java 
PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
d37e7a07e9258bc8c0758bf50aece5b79025126b 
  
src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/admin/test_maintenance.py 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/6/

Changes: https://reviews.apache.org/r/66716/diff/5-6/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-14 Thread Santhosh Kumar Shanmugham
/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
 770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
  
src/test/java/org/apache/aurora/scheduler/storage/AbstractHostMaintenanceStoreTest.java
 PRE-CREATION 
  src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DataCompatibilityTest.java
 31f9545d83a950064df646ef6ba8a95234cf89ec 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
 3dd9ce4039b223cb6156462d089f7062a1cde772 
  
src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
 27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
  src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
be07361a27afefa21cc2ba76ce82531a418d9814 
  
src/test/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStoreTest.java
 PRE-CREATION 
  
src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java 
d59118be13342da9003b0bcb97e12e477d9edf8f 
  
src/test/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/admin/test_maintenance.py 
ca0239b157f9f9053821af0328b9448703386cd4 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 
  src/test/sh/org/apache/aurora/e2e/sla_policy.aurora PRE-CREATION 
  ui/src/main/js/components/TaskConfigSummary.js 
64880f4bd5c5358287ef481df455f6355fedd7d6 


Diff: https://reviews.apache.org/r/66716/diff/6/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-09 Thread Santhosh Kumar Shanmugham
/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66716/diff/5/

Changes: https://reviews.apache.org/r/66716/diff/4-5/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham
ts today. So the clients will retry the 
drain -> which will create the maintenance request object -> which will unblock 
the drains. 

Considering the `dedicated` owners use case (mentioned above), I dropped the 
scheduler-level config. We can bring it back if you feel strongly about it.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202652
---


On May 8, 2018, 8:49 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:49 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.ja

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202657
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On May 8, 2018, 8:49 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:49 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/app/SchedulerIT.java 
> 63c338e5bbdf60de0fba8d68c6613904abb93fa8 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandlerTest.java 
> c6163bbabc7e7748f167b679893a93f58e4ef1ac 
>   src/test/java/org/apache/aurora/scheduler/sla/SlaModuleTest.java 
> d37e7a07e9258bc8c0758bf50aece5b79025126b 
>   
> src/test/java/org/apache/aurora/sched

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66716/diff/4/

Changes: https://reviews.apache.org/r/66716/diff/3-4/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 3, 2018, 10:43 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Lines 144-150 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015592#file2015592line145>
> >
> > If bound in a private module, you don't need to have a `@Named` 
> > annotation (I don't think it is used anywhere else).
> > 
> > Additionally, maybe use `@Qualifier` instead (e.g. 
> > https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L60-L62).
> >  Not entirely sure what the difference is myself but this is used more 
> > throughout the project.

I would prefer namespacing the argument even if it is inside a private 
module. I will use `@Qualifier`, which looks cleaner.
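
To illustrate the idea of namespacing an injected argument with a dedicated annotation type rather than a string name, here is a minimal sketch. It is not the code from the patch: `PollingInterval`, `Controller`, and `firstParamIsQualified` are hypothetical names, and the `javax.inject.Qualifier` meta-annotation that a Guice project would put on the declaration is left out so that the sketch compiles with the JDK alone.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Parameter;

final class QualifierSketch {
  // Hypothetical binding annotation. In a Guice project this declaration would
  // also carry javax.inject.Qualifier (omitted here, see the note above).
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.PARAMETER, ElementType.FIELD, ElementType.METHOD})
  @interface PollingInterval {}

  // Hypothetical consumer: the annotation type, not a string, identifies the value.
  static final class Controller {
    final long intervalMs;

    Controller(@PollingInterval long intervalMs) {
      this.intervalMs = intervalMs;
    }
  }

  /** Returns true if the constructor's first parameter carries the annotation. */
  static boolean firstParamIsQualified() {
    try {
      Parameter p = Controller.class.getDeclaredConstructor(long.class).getParameters()[0];
      return p.isAnnotationPresent(PollingInterval.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Because the annotation is a type, a misspelled name fails at compile time, which a string passed to `@Named` would not.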


> On May 3, 2018, 10:43 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java
> > Line 157 (original), 203 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015592#file2015592line217>
> >
> > This name confused me a bit. For me, it implies there will be a 
> > long-running method continually doing something but this is a one-time 
> > thing.
> > 
> > Maybe rename to `checkForDrainingTasks` or something?

Done


> On May 3, 2018, 10:43 a.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 153-155 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015593#file2015593line153>
> >
> > Do we need to immediately return after this?
> > 
> > If I am going to force the work to be done, do I need to even call this 
> > method?

I want to invoke the required work in one place so that it is easier to follow.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202363
---


On May 8, 2018, 8:28 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:28 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
>

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 7, 2018, 2:12 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 124-130 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015593#file2015593line124>
> >
> > I believe that you will want to look at the latest task event only 
> > if it is RUNNING.
> > 
> > This will take into account tasks that are KILLING/PREEMPTING which 
> > would break SLA.

Changed it so that we only look for RUNNING tasks when computing running time 
SLA. This will be more conservative than how we are doing SLA calculations in 
the admin client.


> On May 7, 2018, 2:12 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 138 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015593#file2015593line138>
> >
> > Can you elaborate more on what this line does?

This should change to exclude the task that is currently under consideration to 
simulate the SLA without that task. Updated.
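
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the actual `SlaManager` code: the class, enum, and method names are hypothetical, and the percentage-of-instances policy is an assumption made only to have something concrete to check.

```java
import java.util.List;

final class SlaSketch {
  // Hypothetical subset of task states; the real scheduler has more.
  enum Status { PENDING, RUNNING, KILLING, PREEMPTING }

  // Hypothetical minimal task record: an id plus the status of its latest event.
  static final class Task {
    final String id;
    final Status latest;

    Task(String id, Status latest) {
      this.id = id;
      this.latest = latest;
    }
  }

  /**
   * Returns true if the job would still have at least requiredPercent of its
   * instances RUNNING after the candidate task is drained.
   */
  static boolean meetsSlaWithout(List<Task> tasks, String candidateId, double requiredPercent) {
    if (tasks.isEmpty()) {
      return false;
    }
    long running = tasks.stream()
        .filter(t -> !t.id.equals(candidateId))   // simulate the job without the candidate
        .filter(t -> t.latest == Status.RUNNING)  // KILLING/PREEMPTING tasks do not count
        .count();
    return running * 100.0 / tasks.size() >= requiredPercent;
  }
}
```

Counting only RUNNING tasks makes the check conservative: a task that is already KILLING or PREEMPTING is treated as unavailable.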


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202594
---


On May 8, 2018, 8:28 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:28 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham


> On May 3, 2018, 3:26 p.m., Reza Motamedi wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 248 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015590#file2015590line248>
> >
> > Should the comment be updated to match the struct in resolution?
> > 
> > Ms is milliseconds, right?

Done.


> On May 3, 2018, 3:26 p.m., Reza Motamedi wrote:
> > api/src/main/thrift/org/apache/aurora/gen/api.thrift
> > Lines 1241 (patched)
> > <https://reviews.apache.org/r/66716/diff/2/?file=2015590#file2015590line1241>
> >
> > s/of the host/of the hosts

Done.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202386
---


On May 8, 2018, 8:28 a.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 8, 2018, 8:28 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/app/AppModule.java 
> ffc07443fae9e5216a5333ae305f75aa9b452a0c 
>   src/main/java/org/apache/aurora/scheduler/config/CliOptions.java 
> a2fb0393ba47e876c4c8c63e3ed27ebe42cb6ca3 
>   
> src/main/java/org/apache/aurora/scheduler/maintenance/MaintenanceModule.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/mesos/MesosCallbackHandler.java 
> 3b4df55a05873e79aae206b117cbc753fa3abb94 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaManager.java PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/sla/SlaModule.java 
> 25ed474289f369e74c24e999ad97ed6810c9fd5e 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/jav

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-08 Thread Santhosh Kumar Shanmugham
/apache/aurora/scheduler/thrift/SchedulerThriftInterfaceTest.java
 2cf66d8154ad3795989ee9026e45af1be509f244 
  src/test/java/org/apache/aurora/scheduler/thrift/ThriftIT.java 
40851c419e4d62e6545959eebc0ce144fdecc697 
  src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
d412090c292691305f01bccd1596fb0f6bb003ad 
  src/test/python/apache/aurora/api_util.py 
3fc9b478cc9aada0503e8ed8698a37b4ed926cdd 
  src/test/python/apache/aurora/client/api/test_scheduler_client.py 
f2a2eae1539f7f6dff6855e4122cc41c6cbb0f7b 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
e4f43d0573c7862adc9bc679f4cea40cc76eac38 
  src/test/python/apache/aurora/config/test_thrift.py 
8e1d0e177959af12b97bdd1cd47845b72bc12fe1 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/removeHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveCronJob
 88e1c36a1aa2d192b95963f7aa36e243a447e4af 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66716/diff/3/

Changes: https://reviews.apache.org/r/66716/diff/2-3/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66922: Changing Vagrant requirements to latest version for launching our local dev box.

2018-05-04 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66922/#review202453
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On May 2, 2018, 3:08 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66922/
> ---
> 
> (Updated May 2, 2018, 3:08 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Changing Vagrant requirements to latest version for launching our local dev 
> box.
> 
> Needed since we depend on the Vagrant Cloud instead of Atlas now.
> 
> 
> Diffs
> -
> 
>   Vagrantfile 76dfcaca08324a458d7ec45a85deb7a38d94b1f7 
> 
> 
> Diff: https://reviews.apache.org/r/66922/diff/1/
> 
> 
> Testing
> ---
> 
> Ran vagrant box with latest Vagrant version (2.0.4).
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-01 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review202246
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On May 1, 2018, 2:19 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 1, 2018, 2:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/SlaManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa68bf6f0e5bbbffecc5bd8c0431 
>   src/main/python/apache/aurora/executor/executor_vars.py 
> 561f9452aedda4cc695c84a2a850bdd7e1d65dec 
>   src/test/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> 778148a7c033cba9004954cabc33a2b1d003dccf 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> e66ec116112df164106598d9ff0bc9e8f465e44f 
>   
> src/test/java/org/apache/aurora/scheduler/state/MaintenanceControllerImplTest.java
>  770846e84e9980ea3dbf9e1c46b0d45c5488c5b3 
>   src/test/java/org/apache/aurora/scheduler/storage/backup/RecoveryTest.java 
> ba03ff94bb5fee2b09a6660a9ad759cece7449f1 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/DurableStorageTest.java
>  3dd9ce4039b223cb6156462d089f7062a1cde772 
>   
> src/test/java/org/apache/aurora/scheduler/storage/durability/WriteRecorderTest.java
>  27c8c829cd1e417dd5e60a8e9415331ca4a7c918 
>   
> src/test/java/org/apache/aurora/scheduler/storage/log/SnapshotterImplIT.java 
> be07361a27afefa21cc2ba76ce82531a418d9814 
>   
> src/test/java/org/apache/aurora/scheduler/storage/testing/StorageTestUtil.java
>  d59118be13342da9003b0bcb97e12e477d9edf8f 
>   
> src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java 
> d412090c292691305f01bccd1596fb0f6bb003ad 
>   src/test/python/apache/aurora/api_ut

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-01 Thread Santhosh Kumar Shanmugham
at this makes more sense in `MaintenanceController`

Agree. Done.


> On April 20, 2018, 3:31 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 100 (patched)
> > <https://reviews.apache.org/r/66716/diff/1/?file=2006760#file2006760line100>
> >
> > mostly for my curiosity but why 100?

This is meant to be the maximum number of coordinators that can exist in a 
cluster. It should be parameterized.


> On April 20, 2018, 3:31 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/state/SlaManager.java
> > Lines 148-174 (patched)
> > <https://reviews.apache.org/r/66716/diff/1/?file=2006760#file2006760line148>
> >
> > Mentioned above, but I think that the concept of `drain` should remain 
> > in `MaintenanceController` and not leak into `SlaManager`.
> > 
> > Instead of `drain`, this could be something like `checkSlaAndExecute` 
> > which would be a consumer that checks the SLA then executes an action 
> > provided by the caller. I am thinking about this in terms of my proposed 
> > SLA-aware updates -- I would be able to utilize this interface as well.

We discussed the interface at length offline to keep it flexible for SLA-aware 
updates, and decided to go with `checkSlaThenAct`, which takes a 
`Storage.MutateWork` argument.
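
A minimal sketch of the check-then-act shape described above, written in Python for brevity rather than the scheduler's Java. Everything in it is hypothetical: `check_sla_then_act`, the `sla_ok` predicate and the `work` callable only stand in for `checkSlaThenAct` and `Storage.MutateWork`, and the real signatures in the patch may differ.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def check_sla_then_act(
    sla_ok: Callable[[], bool],
    work: Callable[[], T],
) -> Optional[T]:
    """Run `work` only if the SLA check passes; return None otherwise.

    Hypothetical stand-in for `checkSlaThenAct`: the caller supplies the
    action, so draining and SLA-aware updates could share one entry point.
    """
    if not sla_ok():
        return None
    return work()


# Illustrative use: the action runs when the SLA holds and is skipped otherwise.
drained = []
result = check_sla_then_act(lambda: True, lambda: drained.append("task-0") or "drained")
skipped = check_sla_then_act(lambda: False, lambda: "drained")
```

Because the action is passed in by the caller, the SLA logic does not need to know whether it is guarding a drain or an update step.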


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66716/#review201642
---


On May 1, 2018, 2:19 p.m., Santhosh Kumar Shanmugham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66716/
> ---
> 
> (Updated May 1, 2018, 2:19 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Jordan Ly, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> `Tasks` can specify custom SLA requirements as part of
> their `TaskConfig`. One of the new features is the ability
> to specify an external coordinator that can ACK/NACK
> maintenance requests for tasks. This will be hugely
> beneficial for onboarding services that cannot satisfactorily
> specify SLA in terms of running instances.
> 
> Maintenance requests are driven from the Scheduler to
> improve management of nodes in the cluster.
> 
> 
> Note to reviewers:
> - Test coverage is minimal at this point. Expect more coverage soon in the 
> next diff.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/state/MaintenanceController.java 
> f58c66aaebe8d31913d67a05add0f3d6054e88d1 
>   src/main/java/org/apache/aurora/scheduler/state/SlaManager.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/state/StateModule.java 
> 0e0f90b670bbbcd6cb3aa302ce4a9abfe70ea979 
>   src/main/java/org/apache/aurora/scheduler/storage/HostMaintenanceStore.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/Storage.java 
> da5534f886e032ca5a182f3704aa335ff680b258 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/DurableStorage.java
>  f1fdc275d3958a36bbe79110d70dfeba640a948a 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotService.java 
> b30de881eafa3226fdc32383b4e9bfd33ca912a5 
>   src/main/java/org/apache/aurora/scheduler/storage/log/SnapshotterImpl.java 
> 4b52be02001e704f4b1a5f447226ac8c2386e3fd 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemHostMaintenanceStore.java
>  PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorage.java 
> 9f324b010db7e351e98b257d8fc8fecfeac81268 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemStorageModule.java 
> edcea09b4d206cfddb642074237b031ad71cff13 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   src/main/python/apache/aurora/config/schema/base.py 
> a629bcd1261e5959da0a8458a55545d4e2c2a7a5 
>   src/main/python/apache/aurora/config/thrift.py 
> 6d2dde6e964daa6

Re: Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-05-01 Thread Santhosh Kumar Shanmugham
/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66716/diff/2/

Changes: https://reviews.apache.org/r/66716/diff/1-2/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66806: Breakdown resource stats by role

2018-04-26 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66806/#review202021
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On April 26, 2018, 10:51 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66806/
> ---
> 
> (Updated April 26, 2018, 10:51 a.m.)
> 
> 
> Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Currently Aurora exports total quota and resource reservation over time. This 
> can be very useful to see changes in trends of production and free tier 
> capacity. One challenge (particularly in a self-serve capacity environment) 
> is identifying and tracking where large deltas came from. This change exports 
> both quota and resource usage per role to help with this.
> 
> It is possible to make this more efficient by refactoring the current 
> abstractions to do both totals and per-role metrics in a single pass. But 
> given this only runs once per hour, I went for the cleaner/simpler approach.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/stats/ResourceCounter.java 
> a3e9bc75d5ce48cc3d64aa6b650df708f2f7c916 
>   src/main/java/org/apache/aurora/scheduler/stats/TaskStatCalculator.java 
> ac5cf2462cb0f383493296b541348bff40e99025 
>   src/test/java/org/apache/aurora/scheduler/stats/ResourceCounterTest.java 
> a30d74e73774a9fd4224839d08ebfe7fb0df095a 
> 
> 
> Diff: https://reviews.apache.org/r/66806/diff/1/
> 
> 
> Testing
> ---
> 
> /vars in Vagrant: 
> 
> quota_per_role_www-data_cpu_cores 10
> quota_per_role_www-data_disk_mb 102400
> quota_per_role_www-data_ram_mb 102400
> resources_allocated_quota_cpu_cores 0
> resources_allocated_quota_disk_mb 0
> resources_allocated_quota_ram_mb 0
> resources_dedicated_consumed_cpu_cores 0
> resources_dedicated_consumed_disk_mb 0
> resources_dedicated_consumed_ram_mb 0
> resources_free_pool_consumed_cpu_cores 1
> resources_free_pool_consumed_disk_mb 8
> resources_free_pool_consumed_ram_mb 1
> resources_per_role_free_pool_consumed_www-data_cpu_cores 1
> resources_per_role_free_pool_consumed_www-data_disk_mb 8
> resources_per_role_free_pool_consumed_www-data_ram_mb 1
> resources_per_role_total_consumed_www-data_cpu_cores 1
> resources_per_role_total_consumed_www-data_disk_mb 8
> resources_per_role_total_consumed_www-data_ram_mb 1
> resources_quota_consumed_cpu_cores 0
> resources_quota_consumed_disk_mb 0
> resources_quota_consumed_ram_mb 0
> resources_total_consumed_cpu_cores 1
> resources_total_consumed_disk_mb 8
> resources_total_consumed_ram_mb 1
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>
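
The per-role export described in the quoted request can be pictured as a simple aggregation keyed by role. The sketch below is an assumption-laden illustration in Python, not the `ResourceCounter` code: the function name and the input shape are invented, and only the metric name pattern is taken from the `/vars` output above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Resource suffixes as they appear in the /vars output quoted above.
RESOURCES = ("cpu_cores", "disk_mb", "ram_mb")


def per_role_metrics(
    tasks: Iterable[Tuple[str, Dict[str, float]]],
) -> Dict[str, float]:
    """Sum consumed resources per role and name them like the exported stats.

    `tasks` is a hypothetical list of (role, resources) pairs.
    """
    totals: Dict[str, float] = defaultdict(float)
    for role, resources in tasks:
        for name in RESOURCES:
            key = f"resources_per_role_total_consumed_{role}_{name}"
            totals[key] += resources.get(name, 0)
    return dict(totals)


metrics = per_role_metrics([
    ("www-data", {"cpu_cores": 1, "disk_mb": 8, "ram_mb": 1}),
])
```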



Re: Review Request 66697: Add --pid-file flag to `aurora task ssh` to write the PID of the underlying SSH command to a specified file.

2018-04-19 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66697/#review201559
---



@ReviewBot retry

- Santhosh Kumar Shanmugham


On April 19, 2018, 12:30 p.m., Sameer Brenn wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66697/
> ---
> 
> (Updated April 19, 2018, 12:30 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> My team has some scripts to start devel shards which create tunnels:
> 
> ```
> aurora task ssh -L 8002:http --ssh-options "-f -N" 
> "$DC/$USER/devel/proxyapp/0"
> aurora task ssh -L 9002:health --ssh-options "-f -N" 
> "$DC/$USER/devel/proxyapp/0"
> ```
> 
> We use fixed local port numbers because that way we can run dependent 
> services locally that look for locally-running copies of the
> same service on a fixed port, but then those requests get tunnelled through 
> to the devel shard.
> 
> When the devel shard is restarted, however, the tunnel is still running so 
> the subsequent call to create a new tunnel fails because
> it can't bind to the fixed port.
> 
> If we save the SSH process PID to a file, we can then kill the existing tunnel to 
> the old instance before starting up the new tunnel to the
> new instance.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/cli/task.py 
> 652a545072f161dbf854b3d6d273809b09d142e8 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> a543d4a101c58149f8af265257d061ff5032049c 
> 
> 
> Diff: https://reviews.apache.org/r/66697/diff/4/
> 
> 
> Testing
> ---
> 
> ```
> $ ./pants test src/test/python/apache/aurora/client::
> ```
> 
> And when applying the same patch to our local repo at Twitter:
> 
> ```
> $ ./pants run 
> twitter/src/main/python/twitter/aurora/client/cli_internal:aurora_internal -- 
> task ssh -L 8005:http --ssh-options "-n -N" --pid-file /tmp/p 
> "smf1/sbrenn/devel/proxyapp/0" &
> $ ps -p `cat /tmp/p`
>   PID TTY           TIME CMD
> 34729 ttys000    0:00.05 ssh -t -n -N -L 
> 8005:smf1-aki-27-sr1.prod.twitter.com:31794 
> sbr...@smf1-aki-27-sr1.prod.twitter.com cd 
> /var/lib/mesos/slaves/*/frameworks/*/exec
> ```
> 
> 
> Thanks,
> 
> Sameer Brenn
> 
>
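
The workflow in the quoted description (kill the old tunnel, then start a new one) could be scripted roughly as follows. This is a sketch under assumptions: `--pid-file` is the flag proposed in this review, while the helper name `kill_previous_tunnel` and the choice of SIGTERM are illustrative and not part of the patch.

```python
import os
import signal


def kill_previous_tunnel(pid_file: str) -> bool:
    """Terminate the process recorded in `pid_file`, if any.

    Returns True if a signal was sent, False if there was nothing to kill.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or it does not contain a PID

    try:
        os.kill(pid, signal.SIGTERM)
        sent = True
    except ProcessLookupError:
        sent = False  # stale PID file: the old tunnel is already gone

    os.remove(pid_file)  # the recorded PID is no longer useful either way
    return sent
```

Calling `kill_previous_tunnel("/tmp/p")` before re-running `aurora task ssh` with `--pid-file /tmp/p` would free the fixed local port for the new tunnel.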



Re: Review Request 66697: Add --pid-file flag to `aurora task ssh` to write the PID of the underlying SSH command to a specified file.

2018-04-19 Thread Santhosh Kumar Shanmugham


> On April 19, 2018, 1:52 p.m., Santhosh Kumar Shanmugham wrote:
> > @ReviewBot retry

Actually, looks like some style issues.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66697/#review201559
---


On April 19, 2018, 12:30 p.m., Sameer Brenn wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66697/
> ---
> 
> (Updated April 19, 2018, 12:30 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> My team has some scripts to start devel shards which create tunnels:
> 
> ```
> aurora task ssh -L 8002:http --ssh-options "-f -N" 
> "$DC/$USER/devel/proxyapp/0"
> aurora task ssh -L 9002:health --ssh-options "-f -N" 
> "$DC/$USER/devel/proxyapp/0"
> ```
> 
> We use fixed local port numbers because that way we can run dependent 
> services locally that look for locally-running copies of the
> same service on a fixed port, but then those requests get tunnelled through 
> to the devel shard.
> 
> When the devel shard is restarted, however, the tunnel is still running so 
> the subsequent call to create a new tunnel fails because
> it can't bind to the fixed port.
> 
> If we save the SSH process PID to a file, we can then kill the existing tunnel to 
> the old instance before starting up the new tunnel to the
> new instance.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/cli/task.py 
> 652a545072f161dbf854b3d6d273809b09d142e8 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> a543d4a101c58149f8af265257d061ff5032049c 
> 
> 
> Diff: https://reviews.apache.org/r/66697/diff/4/
> 
> 
> Testing
> ---
> 
> ```
> $ ./pants test src/test/python/apache/aurora/client::
> ```
> 
> And when applying the same patch to our local repo at Twitter:
> 
> ```
> $ ./pants run 
> twitter/src/main/python/twitter/aurora/client/cli_internal:aurora_internal -- 
> task ssh -L 8005:http --ssh-options "-n -N" --pid-file /tmp/p 
> "smf1/sbrenn/devel/proxyapp/0" &
> $ ps -p `cat /tmp/p`
>   PID TTY           TIME CMD
> 34729 ttys000    0:00.05 ssh -t -n -N -L 
> 8005:smf1-aki-27-sr1.prod.twitter.com:31794 
> sbr...@smf1-aki-27-sr1.prod.twitter.com cd 
> /var/lib/mesos/slaves/*/frameworks/*/exec
> ```
> 
> 
> Thanks,
> 
> Sameer Brenn
> 
>



Re: Review Request 66697: Add --pid-file flag to `aurora task ssh` to write the PID of the underlying SSH command to a specified file.

2018-04-19 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66697/#review201483
---


Ship it!





src/test/python/apache/aurora/client/cli/test_task.py
Lines 175 (patched)
<https://reviews.apache.org/r/66697/#comment282732>

Pull `12312` into a constant.


- Santhosh Kumar Shanmugham


On April 18, 2018, 2:54 p.m., Sameer Brenn wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66697/
> ---
> 
> (Updated April 18, 2018, 2:54 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> My team has some scripts to start devel shards which create tunnels:
> 
> ```
> aurora task ssh -L 8002:http --ssh-options "-f -N" 
> "$DC/$USER/devel/proxyapp/0"
> aurora task ssh -L 9002:health --ssh-options "-f -N" 
> "$DC/$USER/devel/proxyapp/0"
> ```
> 
> We use fixed local port numbers because that way we can run dependent 
> services locally that look for locally-running copies of the
> same service on a fixed port, but then those requests get tunnelled through 
> to the devel shard.
> 
> When the devel shard is restarted, however, the tunnel is still running so 
> the subsequent call to create a new tunnel fails because
> it can't bind to the fixed port.
> 
> If we save the SSH process PID to a file, we can then kill the existing tunnel to 
> the old instance before starting up the new tunnel to the
> new instance.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/client/cli/task.py 
> 652a545072f161dbf854b3d6d273809b09d142e8 
>   src/test/python/apache/aurora/client/cli/test_task.py 
> a543d4a101c58149f8af265257d061ff5032049c 
> 
> 
> Diff: https://reviews.apache.org/r/66697/diff/3/
> 
> 
> Testing
> ---
> 
> ```
> $ ./pants test src/test/python/apache/aurora/client::
> ```
> 
> And when applying the same patch to our local repo at Twitter:
> 
> ```
> $ ./pants run 
> twitter/src/main/python/twitter/aurora/client/cli_internal:aurora_internal -- 
> task ssh -L 8005:http --ssh-options "-n -N" --pid-file /tmp/p 
> "smf1/sbrenn/devel/proxyapp/0" &
> $ ps -p `cat /tmp/p`
>   PID TTY           TIME CMD
> 34729 ttys000    0:00.05 ssh -t -n -N -L 
> 8005:smf1-aki-27-sr1.prod.twitter.com:31794 
> sbr...@smf1-aki-27-sr1.prod.twitter.com cd 
> /var/lib/mesos/slaves/*/frameworks/*/exec
> ```
> 
> 
> Thanks,
> 
> Sameer Brenn
> 
>



Review Request 66716: [WIP] Enable `Tasks` to specify their own custom maintenance SLA.

2018-04-19 Thread Santhosh Kumar Shanmugham
/durability/goldens/current/saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveJobUpdate
 32fdcdacde58345cdd6c4b449b82c0c90c2b2aae 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/current/saveTasks
 4323031ec6bd128576c2a43ebc11f04a9f046e2f 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/16-saveHostMaintenanceRequest
 PRE-CREATION 
  
src/test/resources/org/apache/aurora/scheduler/storage/durability/goldens/read-compatible/17-removeHostMaintenanceRequest
 PRE-CREATION 


Diff: https://reviews.apache.org/r/66716/diff/1/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Santhosh Kumar Shanmugham



Re: Review Request 66186: Upgrade to psutil with optimized Process.children()

2018-04-17 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66186/#review201337
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On April 11, 2018, 3:13 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66186/
> ---
> 
> (Updated April 11, 2018, 3:13 p.m.)
> 
> 
> Review request for Aurora, Reza Motamedi and Santhosh Kumar Shanmugham.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> The changelog claims: `Process.children() is 2x faster on UNIX and 2.4x 
> faster on Linux.`
> 
> This is needed for all stats retrieved via `ProcessTreeCollector`. An update 
> therefore
> seems worthwhile. 
> 
> https://github.com/giampaolo/psutil/blob/master/HISTORY.rst
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   src/test/python/apache/thermos/monitoring/test_process_collector_psutil.py 
> 93ff878be578fa7a63d25b65e7d915790dc9ccc6 
> 
> 
> Diff: https://reviews.apache.org/r/66186/diff/1/
> 
> 
> Testing
> ---
> 
> Successfully verified in vagrant that CPU and memory are reported as expected.
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>



Re: Review Request 66623: Fix the json endpoints in thermos

2018-04-17 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66623/#review201336
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On April 16, 2018, 2:01 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66623/
> ---
> 
> (Updated April 16, 2018, 2:01 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> # Fixing the json endpoints in thermos
> 
> `TaskObserverJSONBindings` is a mixin that includes a few routes that serve 
> info about tasks and processes in pure JSON format. The functions are 
> overridden in the main bottle server, so the routes are not accessible. This 
> patch fixes it by renaming those methods.
> 
> Check here:
> https://github.com/apache/aurora/blob/master/src/main/python/apache/thermos/observer/http/http_observer.py#L72
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/thermos/observer/http/json.py 
> 4ba53245173c253a3f9044f6971c58b7e856171e 
>   src/test/python/apache/thermos/observer/http/BUILD 
> 708f09bef0755baebb64759eb4e920a1e213765b 
> 
> 
> Diff: https://reviews.apache.org/r/66623/diff/2/
> 
> 
> Testing
> ---
> 
> There was no unit test affected. 
> 
> After the fix, the routes serve the expected content.
> ```
> $ curl http://192.168.33.7:1338/j/task_ids
> {"type": "all", "tasks": [{"status": "sleeping", "ram": 3727360, 
> "state_timestamp": 1523728477, "threads": 2, "user": 0.24, "disk": 10117120, 
> "launch_timestamp": 1523728477, "vms": 22990848, "rss": 3727360, "name": 
> "hello", "task_id": 
> "www-data-prod-hello-0-00e58d09-a67f-4a46-94a0-15bcad26a098", "system": 0.34, 
> "ports": {}, "state": "ACTIVE", "role": "www-data", "cpu": 0.0, "nice": 0}], 
> "num": 20, "task_count": 1, "offset": 0}%
> 
> $ curl 
> http://192.168.33.7:1338/j/task/www-data-prod-hello-0-00e58d09-a67f-4a46-94a0-15bcad26a098
> {"www-data-prod-hello-0-00e58d09-a67f-4a46-94a0-15bcad26a098": {"task": 
> {"processes": [{"daemon": false, "name": "hello", "max_failures": 1, 
> "ephemeral": false, "min_duration": 5, "cmdline": "\nwhile true; do\n 
>  echo hello world\n  sleep 10\ndone\n  ", "final": false}], "name": 
> "hello", "finalization_wait": 30, "max_failures": 1, "max_concurrency": 0, 
> "resources": {"gpu": 0, "disk": 134217728, "ram": 134217728, "cpu": 1.0}, 
> "constraints": [{"order": ["hello"]}]}, "name": "hello", "task_id": 
> "www-data-prod-hello-0-00e58d09-a67f-4a46-94a0-15bcad26a098", "processes": 
> {"failed": [], "running": ["hello"], "killed": [], "success": [], "waiting": 
> []}, "state_timestamp": 1523728477, "state": "ACTIVE", 
> "resource_consumption": {"status": "sleeping", "disk": 10113024, "ram": 
> 3719168, "system": 0.33, "vms": 22990848, "threads": 2, "user": 0.24, "rss": 
> 3719168, "cpu": 0.0, "nice": 0}, "user": "www-data", "launch_timestamp": 
> 1523728477, "ports": {}}}%
> 
> $ curl 
> http://192.168.33.7:1338/j/task\?task_id\=www-data-prod-hello-0-00e58d09-a67f-4a46-94a0-15bcad26a098
> {"www-data-prod-hello-0-00e58d09-a67f-4a46-94a0-15bcad26a098": {"task": 
> {"processes": [{"daemon": false, "name": "hello", "max_failures": 1, 
> "ephemeral": false, "min_duration": 5, "cmdline": "\nwhile true; do\n 
>  echo hello world\n  sleep 10\ndone\n  ", "final": false}], "name": 
> "hello", "finalization_wait": 30, "max_failures": 1, "max_concurrency": 0, 
> "re
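
The failure mode in the quoted description, where a server class defines methods with the same names as a mixin's handlers, can be reproduced in a few lines of plain Python. The class and method names below are invented for illustration and are not the thermos code.

```python
class JSONBindings:
    """Mixin that serves a JSON view (hypothetical stand-in)."""

    def task(self):
        return {"format": "json"}


class Server(JSONBindings):
    """Defines a method with the same name, so the mixin's version is hidden."""

    def task(self):
        return "<html>"


class FixedJSONBindings:
    """Same mixin with the handler renamed so nothing shadows it."""

    def json_task(self):
        return {"format": "json"}


class FixedServer(FixedJSONBindings):
    """Keeps its own `task` while the renamed JSON handler stays reachable."""

    def task(self):
        return "<html>"
```

On `Server`, attribute lookup finds the subclass's `task` first, so the JSON handler is never reached; after the rename both handlers are available side by side.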

Re: Review Request 66573: Add initial interval before searching for preemption slots

2018-04-11 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66573/#review200950
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On April 11, 2018, 6:05 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66573/
> ---
> 
> (Updated April 11, 2018, 6:05 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Between failovers, tasks that normally would not require preemption could be 
> in a PENDING state for an extended period of time and become eligible for 
> preemption. Thus, when the scheduler starts, offers may not have been 
> processed yet, and the tasks can preempt other tasks needlessly.
> 
> Added an initial delay to preemption slot searching on scheduler startup so 
> PENDING tasks have a chance to be scheduled before preempting.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java 
> 7618efc2c0cb46e96119accd2c7962ea8ee7a05e 
>   src/test/java/org/apache/aurora/scheduler/config/CommandLineTest.java 
> 0a16d3c95d3f262686936330ac7d7dc332d759d5 
> 
> 
> Diff: https://reviews.apache.org/r/66573/diff/1/
> 
> 
> Testing
> ---
> 
> `./gradlew test`
> 
> Will deploy on a cluster to ensure preemption does not start for the initial 
> interval.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66536: Add more preemption metrics (jobs preempted, preemptors) and logging statements

2018-04-11 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66536/#review200915
---


Ship it!





src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java
Lines 194 (patched)
<https://reviews.apache.org/r/66536/#comment281815>

Include candidates as well.


- Santhosh Kumar Shanmugham


On April 10, 2018, 3:47 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66536/
> ---
> 
> (Updated April 10, 2018, 3:47 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Added additional metrics:
> ```
> 1. preemptor_tasks_preempted_[JOB_NAME] - The number of times [JOB_NAME] has 
> been preempted for another task.
> 2. preemptor_tasks_preemptor_[JOB_NAME] - The number of times [JOB_NAME] has 
> preempted another task.
> 3. preemptor_slot_search_[success|failed]_for_[JOB_NAME] - The number of 
> times [JOB_NAME] has or hasn't found a slot for preemption.
> 4. preemptor_slot_validation_[success|failed]_for_[JOB_NAME] - The number of 
> times [JOB_NAME] succeeded to or failed to validate a slot before preemption.
> ```
> 
> Additionally, added some `LOG.info` statements for better visibility into 
> preemption/preemption slot finding.
> 
> Did a little bit of code refactoring as well.
> 
> 
> Diffs
> -
> 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java 
> ef06471d007b1d36300eea30cdea059c1ba231b0 
>   
> src/main/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilter.java
>  569cfe6b04e6b7bf0dca7625b00698e9d8e47daf 
>   src/main/java/org/apache/aurora/scheduler/preemptor/Preemptor.java 
> 293d106eee383dd5352a629780b897d58c9dd439 
>   src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java 
> 87305774db0ce6fb7ebed060ab4dc99be6c2df4c 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImpl.java 
> edab03dfd7fdbb24891565ba755212f03d6ea3b8 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java
>  ba775f4688dc57504e2def0dc4b5dcd00da448e1 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  b3ffb0d4fc9b9b52bb49225765bd14fb8105169a 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 0ef29d598784ce529bcaac7017dc0f2cc5055938 
> 
> 
> Diff: https://reviews.apache.org/r/66536/diff/1/
> 
> 
> Testing
> ---
> 
> Added unit tests, `./gradlew test` passes.
> Manually ensured new metrics are exported.
> Tested at scale.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 66269: End to end tests misc. fixes

2018-03-27 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66269/#review200069
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On March 25, 2018, 7:59 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66269/
> ---
> 
> (Updated March 25, 2018, 7:59 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Santhosh Kumar Shanmugham, and Stephan 
> Erb.
> 
> 
> Bugs: AURORA-1974
> https://issues.apache.org/jira/browse/AURORA-1974
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Excluding kerberos unit file from being copied on provision as it's later 
> copied and deleted by the end to end test.
> 
> Bypass leader redirect changed from upstart to systemd. This test wasn't 
> being run because the kerberos test was failing.
> 
> Fixing kerberos end to end test. Previous version had its signing key 
> revoked, resulting in the test failing.
> 
> Changing docker image to slim-stretch in docker aurora tests to address 
> AURORA-1974.
> 
> Added daemon-reload to aurorabuild whenever the daemons are restarted.
> 
> 
> Diffs
> -
> 
>   examples/jobs/hello_docker_engine.aurora 
> 99d99a26844f2f2f473626b16cfbf91aa70031ff 
>   examples/jobs/hello_docker_image.aurora 
> 049a147749876f795636827ea5e5485fa72a0930 
>   examples/vagrant/aurorabuild.sh c39388f46ea4718117889a5c67aec9afcc7f5d2e 
>   examples/vagrant/provision-dev-cluster.sh 
> fe3281f6b1f6adee021e534b230221efb86a5d3c 
>   examples/vagrant/systemd/aurora-scheduler-kerberos.service 
> 10e4f2c355c10b8204518af0a49c9d90be4f6ef8 
>   src/test/sh/org/apache/aurora/e2e/test_bypass_leader_redirect_end_to_end.sh 
> 5c0f12b56a30eef35c1903d5f4a96591d3c74471 
>   src/test/sh/org/apache/aurora/e2e/test_kerberos_end_to_end.sh 
> 646c213ea105e32f2d37df29832aa1009481b6d1 
> 
> 
> Diff: https://reviews.apache.org/r/66269/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66103: Introduce mesos disk collector

2018-03-22 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199838
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On March 22, 2018, 6:11 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> ---
> 
> (Updated March 22, 2018, 6:11 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegate the disk monitoring task to the agent and do it only once. With this 
> approach, when disk collection has improved in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch introduces `MesosDiskCollector`, which queries the agent's API 
> endpoint to look up disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252 
>   docs/reference/observer-configuration.md 
> 8a443c94f7f37f9454989781f722101a97c99f15 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d3ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> e577e552d4ee1807096a15401851bb9fd95fa426 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/9/
> 
> 
> Testing
> ---
> 
> - I added unit tests.
> - Tested in vagrant and it works as intended.
> - I also built and deployed in our test environment. In order to measure 
> improved performance I created jobs with nested folders and noticed 
> reduction in CPU utilization of the Observer process, by at least 60%. (1.5 
> CPU cores to 0.4 CPU cores)
> 
> Here is one specific test setup: On two hosts I created two tasks. Each 
> task creates identical nested directory structures and files in them. The 
> overall size is 30GB. test_host_1 runs the current version of observer and 
> test_host_2 runs Observer with this patch and also has mesos_disk_collection 
> enabled. The results are as follows:
> 
> ```
> rezam[7]TEST_HOST_1 ~ $ while true; do echo `date`; curl localhost:1338/vars 
> -s | grep cpu; sleep 10; done
> Thu Mar 22 04:36:17 UTC 2018
> observer.observer_cpu 108.9
> Thu Mar 22 04:36:27 UTC 2018
> observer.observer_cpu 123.2
> Thu Mar 22 04:36:38 UTC 2018
> observer.observer_cpu 1

Re: Review Request 66103: Introduce mesos disk collector

2018-03-22 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66103/#review199814
---


Fix it, then Ship it!




Implementation LGTM. Some more comments about the tests. Fix it and ship it.


src/test/python/apache/thermos/monitoring/test_disk.py
Lines 97 (patched)
<https://reviews.apache.org/r/66103/#comment280287>

Everywhere using HTTPretty.

Assert `httpretty.last_request()`

https://github.com/gabrielfalcao/HTTPretty



src/test/python/apache/thermos/monitoring/test_disk.py
Lines 211 (patched)
<https://reviews.apache.org/r/66103/#comment280288>

s/uage/usage/



src/test/python/apache/thermos/monitoring/test_disk.py
Lines 284 (patched)
<https://reviews.apache.org/r/66103/#comment280289>

We can use `dynamic responses through callbacks` and mock a connection 
error with HTTPretty without patching `requests`.


https://github.com/gabrielfalcao/HTTPretty#dynamic-responses-through-callbacks



src/test/python/apache/thermos/monitoring/test_disk.py
Lines 305 (patched)
<https://reviews.apache.org/r/66103/#comment280285>

Assert `requests.get` was called indeed.
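
One way to make that assertion, sketched with the standard library's `unittest.mock` instead of HTTPretty. The `read_disk_used` helper, the URL, and the response shape are hypothetical stand-ins, not the collector's actual code; the HTTP getter is injected so the sketch runs without `requests` installed.

```python
import json
from unittest import mock


def read_disk_used(http_get, url):
    """Hypothetical helper: fetch container stats, return disk_used_bytes or None."""
    try:
        response = http_get(url, timeout=1)
    except ConnectionError:
        # Treat an unreachable agent as "no sample available".
        return None
    payload = json.loads(response.text)
    return payload[0]["statistics"]["disk_used_bytes"]


URL = "http://localhost:5051/containers"  # assumed endpoint, for illustration only

# Stand-in for `requests.get`; the JSON body below is an assumed shape.
fake_get = mock.Mock()
fake_get.return_value.text = json.dumps([{"statistics": {"disk_used_bytes": 1024}}])

used = read_disk_used(fake_get, URL)
# Check the parsed value and that the HTTP call really happened with these arguments.
assert used == 1024
fake_get.assert_called_once_with(URL, timeout=1)

# A connection failure is simulated through side_effect.
failing_get = mock.Mock(side_effect=ConnectionError("agent unreachable"))
unreachable = read_disk_used(failing_get, URL)
```

Asserting on the call itself guards against a test that passes only because the code under test never reached the HTTP layer.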


- Santhosh Kumar Shanmugham


On March 22, 2018, 2:52 p.m., Reza Motamedi wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66103/
> ---
> 
> (Updated March 22, 2018, 2:52 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Daniel Knightly, Franck Cuny, 
> Jordan Ly, Santhosh Kumar Shanmugham, and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> When disk isolation is enabled in a Mesos agent it calculates the disk usage 
> for each container. 
> Thermos Observer also monitors disk usage using `twitter.common.dirutil`, 
> essentially repeating the work already done by the agent. In practice, we see 
> that disk monitoring is one of the most expensive resource monitoring tasks. 
> For instance, when there are deeply nested directories, the CPU utilization 
> of the observer process can easily reach 1.5 CPUs. It would be ideal if we 
> delegate the disk monitoring task to the agent and do it only once. With this 
> approach, when disk collection has improved in the agent (for instance by 
> implementing XFS isolation), we can simply benefit from it without any code 
> change. Some more information about the problem is provided in AURORA-1918.
> 
> This patch that introduces `MesosDiskCollector` which queries the agent's API 
> endpoint to lookup disk_used_bytes. Note that there is also resource 
> monitoring in thermos executor. Currently, I left the disk collector there to 
> use the `du` implementation. That can be changed in a later patch.
> 
> I modified some vagrant config files including `aurora-executor.service` and 
> `etc_mesos-slave/isolation` for testing. They can be left as is. I included 
> them in this patch to show how this would work e2e.
> 
> 
> Diffs
> -
> 
>   3rdparty/python/requirements.txt 4ac242cfa2c1c19cb7447816ab86e748839d3d11 
>   RELEASE-NOTES.md 51ab6c724694244bf616b29e9beace4a4a3f5252 
>   docs/reference/observer-configuration.md 
> 8a443c94f7f37f9454989781f722101a97c99f15 
>   examples/jobs/hello_world.aurora 5401bfebe753b5e53abd08baeac501144ced9b5a 
>   examples/vagrant/mesos_config/etc_mesos-slave/isolation 
> 1a7028ffc70116b104ef3ad22b7388f637707a0f 
>   examples/vagrant/systemd/thermos.service 
> 01925bcd2ae44f100df511f3c3951c3f5a1a72aa 
>   src/main/python/apache/aurora/tools/thermos_observer.py 
> dd9f0c46ceac9e939b1b763073314161de0ea614 
>   src/main/python/apache/thermos/monitoring/BUILD 
> 65ba7088f65e7baa5d30744736ba456b46a55e86 
>   src/main/python/apache/thermos/monitoring/disk.py 
> 986d33a5000f8d5db15cb639c81f8b1d756ffa05 
>   src/main/python/apache/thermos/monitoring/resource.py 
> adcdc751c03460dc801a18278faa96d6bd64722b 
>   src/main/python/apache/thermos/observer/task_observer.py 
> a6870d48bddf2a2ccede7bb68195f2baae1d0e47 
>   
> src/test/python/apache/aurora/executor/common/test_resource_manager_integration.py
>  fe74bd1d3ecd89fca1b5b2251202cbbc0f24 
>   src/test/python/apache/thermos/monitoring/BUILD 
> 8f2b39336dce6c7b580e6ba0009f60afdcb89179 
>   src/test/python/apache/thermos/monitoring/test_disk.py 
> 362393bfd1facf3198e2d438d0596b16700b72b8 
>   src/test/python/apache/thermos/monitoring/test_resource.py 
> e577e552d4ee1807096a15401851bb9fd95fa426 
> 
> 
> Diff: https://reviews.apache.org/r/66103/diff/7/
> 
> 
> Testing
> ---
> 
> - I 

Re: Review Request 66192: [WIP] Variable group size updates

2018-03-21 Thread Santhosh Kumar Shanmugham


> On March 20, 2018, 9:29 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
> > Lines 107-111 (patched)
> > <https://reviews.apache.org/r/66192/diff/1/?file=1984357#file1984357line107>
> >
> > Is this idea of a step state needed? Strategies writing to the database 
> > seems like a design smell. 
> > 
> > If I have 10 instances:
> > 
> > [0,1,2,3,4,5,6,7,8,9] 
> > 
> > And I pass a VariableBatchStrategy of: [1, 4, 5]
> > 
> > Then using the BatchStrategy it is *always* in the following order:
> > 
> > [[0], [1,2,3,4], [5,6,7,8,9]]
> > 
> > And I don't *think* I need to write the step to the database to figure 
> > out which step I'm in? I can use the number idle and active to figure out 
> > exactly how many have been processed.
> 
> Jordan Ly wrote:
> +1, removing the need for persistent state would greatly reduce the 
> surface area of this patch.

+1
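
A rough sketch of the stateless approach suggested above, in Python for brevity. The function names are hypothetical, and reusing the last group size once the list runs out is an assumption, not something the patch specifies.

```python
def split_into_batches(instance_ids, group_sizes):
    """Split sorted instance ids into consecutive groups of the given sizes."""
    if not group_sizes or any(size <= 0 for size in group_sizes):
        raise ValueError("group_sizes must contain positive integers")
    ordered = sorted(instance_ids)
    result = []
    start = 0
    step = 0
    while start < len(ordered):
        # Assumption: once the sizes run out, the last size is reused.
        size = group_sizes[min(step, len(group_sizes) - 1)]
        result.append(ordered[start:start + size])
        start += size
        step += 1
    return result


def current_step(processed_count, group_sizes):
    """Index of the batch in progress, derived only from how many instances are done."""
    done = 0
    for step, size in enumerate(group_sizes):
        done += size
        if processed_count < done:
            return step
    return len(group_sizes)  # every listed batch has completed


plan = split_into_batches(range(10), [1, 4, 5])
step_after_five = current_step(5, [1, 4, 5])
```

Because the grouping is deterministic, the step can be recomputed from the count of processed instances whenever it is needed, so nothing has to be persisted.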


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review199635
---


On March 20, 2018, 7:10 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated March 20, 2018, 7:10 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direction and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
> b25f3831cecc58c90375c90b16142421f8f09e38 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java 
> 9e86b9e276ea90a249284a824705b5bbf19dcbce 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  f8be8058f3a80a18b999d2666e2adb33e1e55fef 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66192: [WIP] Variable group size updates

2018-03-21 Thread Santhosh Kumar Shanmugham


> On March 21, 2018, 2:33 p.m., Jordan Ly wrote:
> > I am mostly concerned about the UX. Users will be able to specify both 
> > batch size and variable batch size and must know that variable batch sizing 
> > takes precedence over other strategies.
> > 
> > Is it worth it to make a larger investment into the Thrift interface and 
> > avoid ambiguity? Or refactor the current batching strategy to use the new 
> > variable codepath (a single batch size specified to the variable strategy 
> > should behave the same as the current implementation).

+1 I think it should be an either-or. There should be logic in the API to 
clearly message this case.
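
A minimal sketch of what such an either-or check could look like, written in Python for brevity. The function, parameter, and strategy names are hypothetical and do not correspond to the actual Thrift fields.

```python
def choose_update_strategy(batch_size=None, variable_batch_sizes=None):
    """Return (strategy_name, sizes); raise ValueError when the request is ambiguous."""
    if batch_size is not None and variable_batch_sizes:
        # Reject the ambiguous request instead of silently preferring one setting.
        raise ValueError(
            "Specify either batch_size or variable_batch_sizes, not both.")
    if variable_batch_sizes:
        return ("variable_batch", list(variable_batch_sizes))
    if batch_size is not None:
        return ("batch", [batch_size])
    # Neither was given: fall back to whatever the default strategy is.
    return ("default", [])


try:
    choose_update_strategy(batch_size=3, variable_batch_sizes=[1, 4, 5])
    rejected = False
except ValueError as error:
    rejected = True
    message = str(error)
```

Failing fast with an explicit message means users never have to learn a precedence rule between the two settings.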


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66192/#review199707
---


On March 20, 2018, 7:10 p.m., Renan DelValle wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66192/
> ---
> 
> (Updated March 20, 2018, 7:10 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, and 
> Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Adding support for variable group sizes when executing an update.
> 
> Design doc for this change is here: 
> https://docs.google.com/document/d/1xGk4ueH8YlmJCk6hQJh85u4to4M1VQD0l630IOchvgY/edit#heading=h.lg3hty82f5cz
> 
> I opted for the path of least resistance with regards to the Thrift changes 
> as I didn't see any benefit in making the larger changes required to make the 
> interfaces a bit more flexible.
> 
> Requesting feedback on these changes and the approach from the community 
> before I proceed.
> 
> Tests will be added after the community approves of the direction and 
> approach.
> 
> Note to reviewers: Changes made in ActiveLimitedStrategy.java were made to 
> move towards getting rid of FluentIterable. I figured since I was touching 
> that code, it wouldn't hurt to test the Java 8 equivalent of it. I can get 
> rid of the change here and make it in a separate patch if desired.
> 
> 
> Diffs
> -
> 
>   api/src/main/thrift/org/apache/aurora/gen/api.thrift 
> ef754e32172e7490a47a13e7b526f243ffa3efeb 
>   api/src/main/thrift/org/apache/aurora/gen/storage.thrift 
> b79e2045ccda05d5058565f81988dfe33feea8f1 
>   src/main/java/org/apache/aurora/scheduler/storage/JobUpdateStore.java 
> b25f3831cecc58c90375c90b16142421f8f09e38 
>   src/main/java/org/apache/aurora/scheduler/storage/durability/Loader.java 
> 10864f122eff5027c88d835baae6de483d960218 
>   
> src/main/java/org/apache/aurora/scheduler/storage/durability/WriteRecorder.java
>  8d70cae35289a9e36142bab288cf0c9398ebd2d4 
>   
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemJobUpdateStore.java 
> 9e86b9e276ea90a249284a824705b5bbf19dcbce 
>   
> src/main/java/org/apache/aurora/scheduler/thrift/SchedulerThriftInterface.java
>  9fc0416086dd3eb2e2f4e8f659da59fcdea2b22b 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  f8be8058f3a80a18b999d2666e2adb33e1e55fef 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateFactory.java 
> 3992aa77fc305adc390a4aaeb1d3939d6241ddbd 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/ActiveLimitedStrategy.java
>  855ea9c20788b51695b7eff5ac0970f0d52a9546 
>   
> src/main/java/org/apache/aurora/scheduler/updater/strategy/VariableBatchStrategy.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66192/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Renan DelValle
> 
>



Re: Review Request 66139: Speedup regular Thermos observer checkpoint refresh

2018-03-21 Thread Santhosh Kumar Shanmugham


> On March 21, 2018, 10:47 a.m., Santhosh Kumar Shanmugham wrote:
> > Ship It!

I am a little confused about the performance testing results that are posted 
here, since the previous results indicated gains from 2 secs to 0.2 secs, while 
the current improvement is much smaller. Can you add a little bit of context 
regarding the results? Thanks.


- Santhosh Kumar


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66139/#review199681
---


On March 20, 2018, 3:41 p.m., Stephan Erb wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66139/
> ---
> 
> (Updated March 20, 2018, 3:41 p.m.)
> 
> 
> Review request for Aurora, Jordan Ly, Renan DelValle, and Reza Motamedi.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Profiling indicates that a significant part of the refresh time is spent in 
> `os.path.realpath`. 
> This was introduced in https://reviews.apache.org/r/35580/ to properly handle 
> the `latest`
> symlink in the Mesos folder layout. 
> 
> This patch takes a slightly different approach to solve this problem based on 
> `os.path.islink`.
> The latter is faster as it just needs to look at a single folder rather than 
> an entire path.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/executor/common/path_detector.py 
> ed264d74ef5a5a7aa681a56b340f9b16504a88ad 
>   src/test/python/apache/aurora/executor/common/test_path_detector.py 
> 7b5ef0cf552d22d4cfbf3357071de036551026dc 
> 
> 
> Diff: https://reviews.apache.org/r/66139/diff/2/
> 
> 
> Testing
> ---
> 
> I have tested this build on a node with 55 running tasks and 2004 finished 
> ones.
> 
> Before this patch:
> 
> D0320 22:20:44.887248 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.92s
> D0320 22:20:50.746316 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.93s
> D0320 22:20:56.590157 25771 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.89s
>  
> With this patch:
> 
> D0320 22:18:53.545236 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.48s
> D0320 22:18:59.031919 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.49s
> D0320 22:19:04.512358 16250 task_observer.py:142] TaskObserver: finished 
> checkpoint refresh in 0.48s
> 
> 
> Thanks,
> 
> Stephan Erb
> 
>
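
The difference between the two calls mentioned in the quoted description can be illustrated with a small, self-contained sketch. The directory layout and the `unique_run_dirs` helper are made up for the example and are not the path detector's actual code.

```python
import os
import shutil
import tempfile

# Build a throwaway layout: runs/<id> is a real directory, runs/latest points at it.
base = os.path.realpath(tempfile.mkdtemp())
run_dir = os.path.join(base, "runs", "abc123")
latest = os.path.join(base, "runs", "latest")
os.makedirs(run_dir)
os.symlink(run_dir, latest)


def unique_run_dirs(candidates):
    """Drop symlinked aliases such as `latest` so each run is counted once."""
    return [path for path in candidates if not os.path.islink(path)]


kept = unique_run_dirs([run_dir, latest])
# realpath identifies the same alias, but it resolves every path component to do so.
resolved = os.path.realpath(latest)

shutil.rmtree(base)
```

`os.path.islink` only asks whether the final component is a symlink, which is why it is the cheaper check when many paths are scanned on every refresh.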


