Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176716
---


Ship it!





- David McLaughlin


On June 2, 2017, 12:01 a.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 2, 2017, 12:01 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar, Stephan Erb, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and executes them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit 
> task history pruning since it has a similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/5/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionality or alter the interface; it only improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached is a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176711
---



Master (e76862a) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On June 2, 2017, 12:01 a.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 2, 2017, 12:01 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar, Stephan Erb, and 
> Zameer Manji.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and executes them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit 
> task history pruning since it has a similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/5/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionality or alter the interface; it only improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached is a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59733: Adding Configurable Wait Period for Graceful Shutdowns

2017-06-01 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59733/#review176708
---


Ship it!




Master (e76862a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On June 1, 2017, 4:48 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59733/
> ---
> 
> (Updated June 1, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1931
> https://issues.apache.org/jira/browse/AURORA-1931
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We have some services that need more than the current 10 seconds to shut 
> down gracefully (they need to close resources, finish requests, etc.).
> 
> We would like to be able to configure the amount of time we wait between each 
> stage of the graceful shutdown sequence. See this 
> [proposal](https://docs.google.com/document/d/1Sl-KWNyt1j0nIndinqfJsH3pkUY5IYXfGWyLHU2wacs/edit?usp=sharing) 
> for a more in-depth analysis.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/config/schema/base.py 
> b2692a648645a195a24491e4978fb833c6c20be8 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 81461cb49ac223f3bdfa59e8c59e150a07771dea 
>   src/main/python/apache/aurora/executor/http_lifecycle.py 
> 9280bf29da9bda1691adbf3a4c34c4f3d4900517 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 4a23c5984c2d093e2f53e93aec71418f84b65928 
>   src/test/python/apache/aurora/executor/test_http_lifecycle.py 
> a967e3410a4d2dc2e1721f505a4d76da9209d177 
>   src/test/python/apache/aurora/executor/test_thermos_task_runner.py 
> 1b92667bceabc8ea1540122477a51cb58ea2ae36 
> 
> 
> Diff: https://reviews.apache.org/r/59733/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit and integration tests.
> 
> Created and killed jobs with varying wait_escalation_secs values on the 
> Vagrant devcluster.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/
---

(Updated June 2, 2017, 12:01 a.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar, Stephan Erb, and 
Zameer Manji.


Changes
---

fixed style issue


Bugs: AURORA-1929
https://issues.apache.org/jira/browse/AURORA-1929


Repository: aurora


Description
---

Improve task history pruning by batch deleting tasks.

The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
cluster has a large number of inactive tasks.

This CR batches all removeTasks operations and executes them all at once to 
avoid the additional cost of coalescing. The fix will also benefit implicit 
task history pruning since it has a similar underlying implementation. See 
https://issues.apache.org/jira/browse/AURORA-1929 for more details.
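
To illustrate the idea of batching, here is a minimal Python sketch. It is not the StateManagerImpl code from the diff: `CountingTaskStore`, `prune_one_by_one`, and `prune_batched` are hypothetical names, and the store only counts how many delete operations it receives rather than modelling Aurora's storage layer or its coalescing cost.

```python
from typing import Dict, Iterable, List


class CountingTaskStore:
    """Hypothetical in-memory store that counts delete operations."""

    def __init__(self, tasks: Dict[str, str]) -> None:
        self.tasks = dict(tasks)
        self.delete_calls = 0  # incremented once per storage operation

    def delete_tasks(self, task_ids: Iterable[str]) -> None:
        self.delete_calls += 1
        for task_id in task_ids:
            self.tasks.pop(task_id, None)


def prune_one_by_one(store: CountingTaskStore, task_ids: List[str]) -> None:
    """Issue one storage operation per task (the slow shape)."""
    for task_id in task_ids:
        store.delete_tasks([task_id])


def prune_batched(store: CountingTaskStore, task_ids: List[str]) -> None:
    """Issue a single storage operation for the whole batch."""
    store.delete_tasks(task_ids)


if __name__ == "__main__":
    ids = [f"task-{i}" for i in range(1000)]
    slow = CountingTaskStore({task_id: "FINISHED" for task_id in ids})
    fast = CountingTaskStore({task_id: "FINISHED" for task_id in ids})
    prune_one_by_one(slow, ids)
    prune_batched(fast, ids)
    print(slow.delete_calls, fast.delete_calls)  # prints "1000 1"
```

Both variants leave the store empty; only the number of storage operations differs, which is where any fixed per-operation overhead would add up.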


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
73878210f9028901fda3b08e66c6a63c24260d35 


Diff: https://reviews.apache.org/r/59699/diff/5/

Changes: https://reviews.apache.org/r/59699/diff/4-5/


Testing
---

__unit_tests:__

./build-support/jenkins/build.sh

No unit tests were created for this patch since it does not add new 
functionality or alter the interface; it only improves the efficiency of the 
existing code.

__e2e tests:__

Attached is a screenshot of the task history pruning benchmark obtained from a 
scale test in Twitter's test cluster.

- Before applying this patch, the task history pruning takes ~30 minutes on 
130K tasks.

- After applying the patch, the pruning takes ~1 minute.


File Attachments


task_history_pruning_benchmark.png
  
https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png


Thanks,

Kai Huang



Re: Review Request 59733: Adding Configurable Wait Period for Graceful Shutdowns

2017-06-01 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59733/#review176707
---



Can you add a new end-to-end test that exercises this feature?


src/main/python/apache/aurora/executor/http_lifecycle.py
Lines 37 (patched)


+1


- Santhosh Kumar Shanmugham


On June 1, 2017, 4:48 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59733/
> ---
> 
> (Updated June 1, 2017, 4:48 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1931
> https://issues.apache.org/jira/browse/AURORA-1931
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We have some services that need more than the current 10 seconds to shut 
> down gracefully (they need to close resources, finish requests, etc.).
> 
> We would like to be able to configure the amount of time we wait between each 
> stage of the graceful shutdown sequence. See this 
> [proposal](https://docs.google.com/document/d/1Sl-KWNyt1j0nIndinqfJsH3pkUY5IYXfGWyLHU2wacs/edit?usp=sharing) 
> for a more in-depth analysis.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/config/schema/base.py 
> b2692a648645a195a24491e4978fb833c6c20be8 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 81461cb49ac223f3bdfa59e8c59e150a07771dea 
>   src/main/python/apache/aurora/executor/http_lifecycle.py 
> 9280bf29da9bda1691adbf3a4c34c4f3d4900517 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 4a23c5984c2d093e2f53e93aec71418f84b65928 
>   src/test/python/apache/aurora/executor/test_http_lifecycle.py 
> a967e3410a4d2dc2e1721f505a4d76da9209d177 
>   src/test/python/apache/aurora/executor/test_thermos_task_runner.py 
> 1b92667bceabc8ea1540122477a51cb58ea2ae36 
> 
> 
> Diff: https://reviews.apache.org/r/59733/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit and integration tests.
> 
> Created and killed jobs with varying wait_escalation_secs values on the 
> Vagrant devcluster.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 59733: Adding Configurable Wait Period for Graceful Shutdowns

2017-06-01 Thread Jordan Ly


> On June 1, 2017, 11:53 p.m., David McLaughlin wrote:
> > src/main/python/apache/aurora/executor/http_lifecycle.py
> > Lines 37 (patched)
> > 
> >
> > Will this break for tasks that were deployed before the client was 
> > updated? Should we also inject a default here?

Good point! I definitely need to add a default to avoid this case.
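
One possible shape for such a default, as a rough sketch only: the helper `resolve_wait_escalation_secs`, the constant `DEFAULT_WAIT_ESCALATION_SECS`, its value, and the use of a plain mapping for the task's lifecycle config are all assumptions for illustration and are not taken from the patch. Only the field name `wait_escalation_secs` comes from this review.

```python
from typing import Any, Mapping

# Hypothetical fallback value; the value the executor should really use is
# not stated in this thread.
DEFAULT_WAIT_ESCALATION_SECS = 5.0


def resolve_wait_escalation_secs(lifecycle_config: Mapping[str, Any]) -> float:
    """Return the configured wait, or the default for configs without the field.

    Tasks deployed before the client was updated carry no
    'wait_escalation_secs' key, so a missing (or None) value falls back to
    the default instead of raising.
    """
    raw = lifecycle_config.get("wait_escalation_secs")
    if raw is None:
        return DEFAULT_WAIT_ESCALATION_SECS
    value = float(raw)
    if value < 0:
        raise ValueError("wait_escalation_secs must be non-negative")
    return value
```

With this shape, an old config such as `{"port": "health"}` resolves to the default, while `{"wait_escalation_secs": 30}` resolves to `30.0`.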


- Jordan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59733/#review176705
---


On June 1, 2017, 11:48 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59733/
> ---
> 
> (Updated June 1, 2017, 11:48 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1931
> https://issues.apache.org/jira/browse/AURORA-1931
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We have some services that need more than the current 10 seconds to shut 
> down gracefully (they need to close resources, finish requests, etc.).
> 
> We would like to be able to configure the amount of time we wait between each 
> stage of the graceful shutdown sequence. See this 
> [proposal](https://docs.google.com/document/d/1Sl-KWNyt1j0nIndinqfJsH3pkUY5IYXfGWyLHU2wacs/edit?usp=sharing) 
> for a more in-depth analysis.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/config/schema/base.py 
> b2692a648645a195a24491e4978fb833c6c20be8 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 81461cb49ac223f3bdfa59e8c59e150a07771dea 
>   src/main/python/apache/aurora/executor/http_lifecycle.py 
> 9280bf29da9bda1691adbf3a4c34c4f3d4900517 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 4a23c5984c2d093e2f53e93aec71418f84b65928 
>   src/test/python/apache/aurora/executor/test_http_lifecycle.py 
> a967e3410a4d2dc2e1721f505a4d76da9209d177 
>   src/test/python/apache/aurora/executor/test_thermos_task_runner.py 
> 1b92667bceabc8ea1540122477a51cb58ea2ae36 
> 
> 
> Diff: https://reviews.apache.org/r/59733/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit and integration tests.
> 
> Created and killed jobs with varying wait_escalation_secs values on the 
> Vagrant devcluster.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Re: Review Request 59733: Adding Configurable Wait Period for Graceful Shutdowns

2017-06-01 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59733/#review176705
---




src/main/python/apache/aurora/executor/http_lifecycle.py
Lines 37 (patched)


Will this break for tasks that were deployed before the client was updated? 
Should we also inject a default here?


- David McLaughlin


On June 1, 2017, 11:48 p.m., Jordan Ly wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59733/
> ---
> 
> (Updated June 1, 2017, 11:48 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, 
> Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1931
> https://issues.apache.org/jira/browse/AURORA-1931
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> We have some services that need more than the current 10 seconds to shut 
> down gracefully (they need to close resources, finish requests, etc.).
> 
> We would like to be able to configure the amount of time we wait between each 
> stage of the graceful shutdown sequence. See this 
> [proposal](https://docs.google.com/document/d/1Sl-KWNyt1j0nIndinqfJsH3pkUY5IYXfGWyLHU2wacs/edit?usp=sharing) 
> for a more in-depth analysis.
> 
> 
> Diffs
> -
> 
>   src/main/python/apache/aurora/config/schema/base.py 
> b2692a648645a195a24491e4978fb833c6c20be8 
>   src/main/python/apache/aurora/executor/aurora_executor.py 
> 81461cb49ac223f3bdfa59e8c59e150a07771dea 
>   src/main/python/apache/aurora/executor/http_lifecycle.py 
> 9280bf29da9bda1691adbf3a4c34c4f3d4900517 
>   src/test/python/apache/aurora/client/cli/test_inspect.py 
> 4a23c5984c2d093e2f53e93aec71418f84b65928 
>   src/test/python/apache/aurora/executor/test_http_lifecycle.py 
> a967e3410a4d2dc2e1721f505a4d76da9209d177 
>   src/test/python/apache/aurora/executor/test_thermos_task_runner.py 
> 1b92667bceabc8ea1540122477a51cb58ea2ae36 
> 
> 
> Diff: https://reviews.apache.org/r/59733/diff/1/
> 
> 
> Testing
> ---
> 
> Ran unit and integration tests.
> 
> Created and killed jobs with varying wait_escalation_secs values on the 
> Vagrant devcluster.
> 
> 
> Thanks,
> 
> Jordan Ly
> 
>



Review Request 59733: Adding Configurable Wait Period for Graceful Shutdowns

2017-06-01 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59733/
---

Review request for Aurora, David McLaughlin, Santhosh Kumar Shanmugham, Stephan 
Erb, and Zameer Manji.


Bugs: AURORA-1931
https://issues.apache.org/jira/browse/AURORA-1931


Repository: aurora


Description
---

We have some services that need more than the current 10 seconds to shut 
down gracefully (they need to close resources, finish requests, etc.).

We would like to be able to configure the amount of time we wait between each 
stage of the graceful shutdown sequence. See this 
[proposal](https://docs.google.com/document/d/1Sl-KWNyt1j0nIndinqfJsH3pkUY5IYXfGWyLHU2wacs/edit?usp=sharing) 
for a more in-depth analysis.
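
As a rough sketch of what "waiting between each stage" could look like, assuming the shutdown sequence is an ordered list of escalating actions: the function `graceful_shutdown`, its parameters, and the stage callables are hypothetical and do not reflect the executor code touched by this diff.

```python
import time
from typing import Callable, Sequence


def graceful_shutdown(
    stages: Sequence[Callable[[], None]],
    is_alive: Callable[[], bool],
    wait_secs: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run escalating shutdown stages, waiting wait_secs after each one.

    Stops as soon as the process is no longer alive and returns the number
    of stages that were actually run. 'sleep' is injectable so the wait can
    be faked in tests.
    """
    stages_run = 0
    for stage in stages:
        if not is_alive():
            break
        stage()
        stages_run += 1
        sleep(wait_secs)  # the configurable wait between stages
    return stages_run
```

A service that needs longer to drain requests would simply be given a larger `wait_secs`; the sequence of stages itself stays the same.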


Diffs
-

  src/main/python/apache/aurora/config/schema/base.py 
b2692a648645a195a24491e4978fb833c6c20be8 
  src/main/python/apache/aurora/executor/aurora_executor.py 
81461cb49ac223f3bdfa59e8c59e150a07771dea 
  src/main/python/apache/aurora/executor/http_lifecycle.py 
9280bf29da9bda1691adbf3a4c34c4f3d4900517 
  src/test/python/apache/aurora/client/cli/test_inspect.py 
4a23c5984c2d093e2f53e93aec71418f84b65928 
  src/test/python/apache/aurora/executor/test_http_lifecycle.py 
a967e3410a4d2dc2e1721f505a4d76da9209d177 
  src/test/python/apache/aurora/executor/test_thermos_task_runner.py 
1b92667bceabc8ea1540122477a51cb58ea2ae36 


Diff: https://reviews.apache.org/r/59733/diff/1/


Testing
---

Ran unit and integration tests.

Created and killed jobs with varying wait_escalation_secs values on the Vagrant 
devcluster.


Thanks,

Jordan Ly



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/
---

(Updated June 1, 2017, 11:48 p.m.)


Review request for Aurora, David McLaughlin, Santhosh Kumar, Stephan Erb, and 
Zameer Manji.


Changes
---

Refactored the createDeleteEvent code to ensure that the tasks deleted from 
TaskStore are consistent with the tasks in PubsubEvent.


Bugs: AURORA-1929
https://issues.apache.org/jira/browse/AURORA-1929


Repository: aurora


Description
---

Improve task history pruning by batch deleting tasks.

The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
cluster has a large number of inactive tasks.

This CR batches all removeTasks operations and executes them all at once to 
avoid the additional cost of coalescing. The fix will also benefit implicit 
task history pruning since it has a similar underlying implementation. See 
https://issues.apache.org/jira/browse/AURORA-1929 for more details.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
73878210f9028901fda3b08e66c6a63c24260d35 


Diff: https://reviews.apache.org/r/59699/diff/4/

Changes: https://reviews.apache.org/r/59699/diff/3-4/


Testing
---

__unit_tests:__

./build-support/jenkins/build.sh

No unit tests were created for this patch since it does not add new 
functionality or alter the interface; it only improves the efficiency of the 
existing code.

__e2e tests:__

Attached is a screenshot of the task history pruning benchmark obtained from a 
scale test in Twitter's test cluster.

- Before applying this patch, the task history pruning takes ~30 minutes on 
130K tasks.

- After applying the patch, the pruning takes ~1 minute.


File Attachments


task_history_pruning_benchmark.png
  
https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png


Thanks,

Kai Huang



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread David McLaughlin


> On June 1, 2017, 10:46 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Line 376 (original), 377 (patched)
> > 
> >
> > I wonder if it's worth using the taskIds from the event, since they have 
> > been selected from the database and might not match the taskIds we passed 
> > in. This logic was there in the previous patch too. This would ensure 
> > consistency between taskStore.deleteTasks and the taskIds we broadcast as 
> > deleted.
> 
> Kai Huang wrote:
> Previously taskStore.deleteTasks() was not using the taskIds from the 
> event. But it doesn't hurt to ensure consistency here.

If you look at the old code for deleteTasks, before it processes each task it 
first selects them all from the taskStore (kind of what is happening in the 
event code).


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176694
---


On June 1, 2017, 10:41 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 10:41 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and executes them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit 
> task history pruning since it has a similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/3/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionality or alter the interface; it only improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached is a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang


> On June 1, 2017, 10:46 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Line 376 (original), 377 (patched)
> > 
> >
> > I wonder if it's worth using the taskIds from the event, since they have 
> > been selected from the database and might not match the taskIds we passed 
> > in. This logic was there in the previous patch too. This would ensure 
> > consistency between taskStore.deleteTasks and the taskIds we broadcast as 
> > deleted.

Previously taskStore.deleteTasks() was not using the taskIds from the event. But 
it doesn't hurt to ensure consistency here.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176694
---


On June 1, 2017, 10:41 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 10:41 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and executes them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit 
> task history pruning since it has a similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/3/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionality or alter the interface; it only improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached is a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176695
---



Master (e76862a) is red with this patch.
  ./build-support/jenkins/build.sh

:commons:generateThriftResources
:commons:processResources
:commons:classes
:commons:jar
:compileJava
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/storage/log/WriteAheadStorage.java:74: Note: Wrote forwarder org.apache.aurora.scheduler.storage.log.WriteAheadStorageForwarder
@Forward({
^
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/org/apache/aurora/common/args/apt/cmdline.arg.info.txt.2
Note: Writing file:/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/classes/main/META-INF/compiler/resource-mappings/org.apache.aurora.common.args.apt.CmdLineProcessor

:generateBuildProperties
:processResources
:classes
:jar
:startScripts
:distTar
:distZip
:assemble
:compileJmhJava
Note: /home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources UP-TO-DATE
:jmhClasses
:checkstyleJmh
:jsHint
:checkstyleMain
[ant:checkstyle] [ERROR] /home/jenkins/jenkins-slave/workspace/AuroraBot/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java:21:8: Unused import - java.util.stream.Collectors. [UnusedImports]
 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':checkstyleMain'.
> Checkstyle rule violations were found. See the report at: 
> file:///home/jenkins/jenkins-slave/workspace/AuroraBot/dist/reports/checkstyle/main.html

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 1 mins 27.917 secs


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On June 2, 2017, 1:41 a.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 2, 2017, 1:41 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and executes them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit 
> task history pruning since it has a similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/3/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionality or alter the interface; it only improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached is a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176694
---



This LGTM now! Just one minor thing.


src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Line 376 (original), 377 (patched)


I wonder if it's worth using the taskIds from the event, since they have been 
selected from the database and might not match the taskIds we passed in. This 
logic was there in the previous patch too. This would ensure consistency 
between taskStore.deleteTasks and the taskIds we broadcast as deleted.
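
A small Python sketch of the consistency being asked for, under stated assumptions: the dict-based store, the `delete_tasks` helper, and the event payload shape are all hypothetical and only illustrate "delete exactly the ids that were found, then broadcast those same ids in one event".

```python
from typing import Any, Callable, Dict, Iterable, List


def delete_tasks(
    store: Dict[str, Any],
    requested_ids: Iterable[str],
    post_event: Callable[[Dict[str, Any]], None],
) -> List[str]:
    """Delete the requested tasks that exist and broadcast the same id list.

    The ids are first selected from the store, so ids that were requested
    but are not present are neither deleted nor announced as deleted.
    """
    found = sorted(task_id for task_id in set(requested_ids) if task_id in store)
    if not found:
        return []
    for task_id in found:
        del store[task_id]
    # One event for the whole batch, carrying exactly the deleted ids.
    post_event({"type": "tasks_deleted", "task_ids": found})
    return found
```

For example, requesting `["a", "c", "a"]` against a store holding only `a` and `b` deletes `a`, leaves `b`, and posts a single event listing just `a`.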


- David McLaughlin


On June 1, 2017, 10:41 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 10:41 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `aurora_admin prune_tasks` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and executes them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit 
> task history pruning since it has a similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/3/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionality or alter the interface; it only improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached is a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang


> On June 1, 2017, 9:22 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 389-391 (original), 374-376 (patched)
> > 
> >
> > We probably don't even need the separate events. We could just have:
> > 
> > eventSink.post(createDeleteEvent(taskStore, taskIds));
> 
> Kai Huang wrote:
> So this will change the semantics that: The Delete Event is published 
> after we delete task from TaskStore?
> 
> David McLaughlin wrote:
> I don't think it changes the semantics at all? In both cases, the events 
> are published after the batch delete. All I'm suggesting is we send a single 
> event for the batch rather than one event per task in the batch.

addressed.


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176670
---


On June 1, 2017, 10:41 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 10:41 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/3/
> 
> 
> Testing
> ---
> 
> __unit_tests:__
> 
> ./build-support/jenkins/build.sh
> 
> No unit tests were created for this patch since it does not add new 
> functionalities or alter the interface, but improves the efficiency of the 
> existing code.
> 
> __e2e tests:__
> 
> Attached was a screenshot of the task history pruning benchmark obtained from 
> a scale test in Twitter's test cluster.
> 
> - Before applying this patch, the task history pruning takes ~30 minutes on 
> 130K tasks.
> 
> - After applying the patch, the pruning takes ~1 minute.
> 
> 
> File Attachments
> 
> 
> task_history_pruning_benchmark.png
>   
> https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/
---

(Updated June 1, 2017, 10:41 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar.


Changes
---

- 1. Refactored the deleteTasks code to post a single event for all task 
deletions.

- 2. Uploaded the benchmark result for task history pruning.


Bugs: AURORA-1929
https://issues.apache.org/jira/browse/AURORA-1929


Repository: aurora


Description
---

Improve task history pruning by batch deleting tasks.

The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
cluster has a large number of inactive tasks.

This CR batches all removeTasks operations and executes them all at once to 
avoid the additional cost of coalescing. The fix will also benefit implicit 
task history pruning, since it has a similar underlying implementation. See 
https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
details.


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
73878210f9028901fda3b08e66c6a63c24260d35 


Diff: https://reviews.apache.org/r/59699/diff/3/

Changes: https://reviews.apache.org/r/59699/diff/2-3/


Testing (updated)
---

__unit_tests:__

./build-support/jenkins/build.sh

No unit tests were created for this patch since it does not add new 
functionality or alter the interface; it only improves the efficiency of the 
existing code.

__e2e tests:__

Attached is a screenshot of the task history pruning benchmark obtained from a 
scale test in Twitter's test cluster.

- Before applying this patch, task history pruning takes ~30 minutes on 
130K tasks.

- After applying the patch, the pruning takes ~1 minute.


File Attachments (updated)


task_history_pruning_benchmark.png
  
https://reviews.apache.org/media/uploaded/files/2017/06/01/74eb5104-d338-4530-abd2-b82fbdc6bf84__task_history_pruning_benchmark.png


Thanks,

Kai Huang



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread Reza Motamedi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176682
---


Ship it!




Ship It!

- Reza Motamedi


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated June 1, 2017, 6:33 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, 
> \"ram\": 1048576, \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread David McLaughlin


> On June 1, 2017, 9:22 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 389-391 (original), 374-376 (patched)
> > 
> >
> > We probably don't even need the separate events. We could just have:
> > 
> > eventSink.post(createDeleteEvent(taskStore, taskIds));
> 
> Kai Huang wrote:
> So this will change the semantics that: The Delete Event is published 
> after we delete task from TaskStore?

I don't think it changes the semantics at all? In both cases, the events are 
published after the batch delete. All I'm suggesting is we send a single event 
for the batch rather than one event per task in the batch.
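
A minimal sketch of the difference, for illustration only. The class, the event type `TasksDeleted`, and the in-memory store and sink below are hypothetical stand-ins, not the actual Aurora types; the point is only that the delete happens first and a single event for the whole batch is posted afterwards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BatchDeleteSketch {
  /** Hypothetical batch event; carries every task ID removed in one call. */
  static final class TasksDeleted {
    final Set<String> taskIds;

    TasksDeleted(Set<String> taskIds) {
      this.taskIds = taskIds;
    }
  }

  // Hypothetical in-memory stand-ins for the task store and the event sink.
  final Map<String, String> taskStore = new HashMap<>();
  final List<TasksDeleted> postedEvents = new ArrayList<>();

  /** Removes every known task in the batch, then posts one event for all of them. */
  void deleteTasks(Set<String> taskIds) {
    Set<String> deleted = new TreeSet<>();
    for (String id : taskIds) {
      // Unknown IDs are skipped, so the event only lists tasks that existed.
      if (taskStore.remove(id) != null) {
        deleted.add(id);
      }
    }
    // Posted after the batch delete, exactly once, instead of once per task.
    postedEvents.add(new TasksDeleted(deleted));
  }
}
```

Subscribers still see the notification only after the tasks are gone from the store; they just receive one event whose payload lists all deleted IDs.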


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176670
---


On June 1, 2017, 9:13 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 9:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/2/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread Reza Motamedi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176681
---




src/main/java/org/apache/aurora/scheduler/events/Webhook.java
Lines 111-113 (patched)


I just think the listener misses one publish event?


- Reza Motamedi


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated June 1, 2017, 6:33 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, 
> \"ram\": 1048576, \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang


> On June 1, 2017, 9:22 p.m., David McLaughlin wrote:
> > src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
> > Lines 389-391 (original), 374-376 (patched)
> > 
> >
> > We probably don't even need the separate events. We could just have:
> > 
> > eventSink.post(createDeleteEvent(taskStore, taskIds));

So will this change the semantics, in that the delete event is published after 
we delete the tasks from the TaskStore?


- Kai


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176670
---


On June 1, 2017, 9:13 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 9:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/2/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176673
---



Master (e76862a) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On June 1, 2017, 9:13 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 9:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/2/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread David McLaughlin


> On June 1, 2017, 9:15 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/events/Webhook.java
> > Lines 111-113 (patched)
> > 
> >
> > What are the implications of a task failing to change?

Not sure I follow. You mean the webhook request failing? It would just mean the 
downstream service wouldn't receive the event.
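
To make that concrete, here is a rough sketch of the fire-and-forget behaviour. It is not the code in the patch: the `sender` function is a hypothetical stand-in for whatever async HTTP client is used, and the failure counter is an assumption added for illustration. The calling thread never waits for the response, and a failed request only means the event is dropped.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class AsyncWebhookSketch {
  // Hypothetical counter so that dropped events are at least observable.
  final AtomicInteger failures = new AtomicInteger();

  // Hypothetical async transport: takes a JSON payload, completes with the HTTP status.
  final Function<String, CompletableFuture<Integer>> sender;

  AsyncWebhookSketch(Function<String, CompletableFuture<Integer>> sender) {
    this.sender = sender;
  }

  /** Starts the request and returns immediately; the callback runs when the request finishes. */
  CompletableFuture<Void> onTaskStateChange(String jsonPayload) {
    return sender.apply(jsonPayload).<Void>handle((status, error) -> {
      // A transport error or an error status is only counted; there is no retry,
      // so the downstream service simply misses this event.
      if (error != null || status >= 400) {
        failures.incrementAndGet();
      }
      return null;
    });
  }
}
```

Because the failure is handled inside the callback, nothing propagates back to the event publisher, so a slow or unreachable webhook endpoint cannot stall it.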


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review17
---


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated June 1, 2017, 6:33 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, 
> \"ram\": 1048576, \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/#review176670
---




src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java
Lines 389-391 (original), 374-376 (patched)


We probably don't even need the separate events. We could just have:

eventSink.post(createDeleteEvent(taskStore, taskIds));


- David McLaughlin


On June 1, 2017, 9:13 p.m., Kai Huang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59699/
> ---
> 
> (Updated June 1, 2017, 9:13 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Santhosh Kumar.
> 
> 
> Bugs: AURORA-1929
> https://issues.apache.org/jira/browse/AURORA-1929
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Improve task history pruning by batch deleting tasks.
> 
> The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
> cluster has a large number of inactive tasks.
> 
> This CR batches all removeTasks operations and execute them all at once to 
> avoid additional cost of coalescing. The fix will also benefit implicit task 
> history pruning since it has similar underlying implementation. See 
> https://issues.apache.org/jira/browse/AURORA-1929 for more information and 
> details.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
> 73878210f9028901fda3b08e66c6a63c24260d35 
> 
> 
> Diff: https://reviews.apache.org/r/59699/diff/2/
> 
> 
> Testing
> ---
> 
> ./build-support/jenkins/build.sh
> 
> 
> Thanks,
> 
> Kai Huang
> 
>



Re: Review Request 59698: Allow custom OfferManager ordering to be injected via Guice modules

2017-06-01 Thread David McLaughlin


> On June 1, 2017, 9:15 p.m., Reza Motamedi wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Lines 121-123 (patched)
> > 
> >
> > So, these installed modules are just needed by the private-module, 
> > right?

Normally you're right and you would limit the exposure of the binding, but here 
it's outside of the PrivateModule for the OfferSettings.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59698/#review176660
---


On June 1, 2017, 12:14 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59698/
> ---
> 
> (Updated June 1, 2017, 12:14 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Given how powerful a global ordering of offers is, it's worth allowing custom 
> orderings to be injected with a CLI-provided Guice module. This is consistent 
> with the other custom scheduling logic work.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 96acaf9ab7479c818a1f89e0faa33aad971bb5d4 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> ee80176e81a997a1361c036c9fbda6d120109a27 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e999ac5faee98333e69e964073000a2a9481d081 
> 
> 
> Diff: https://reviews.apache.org/r/59698/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59698: Allow custom OfferManager ordering to be injected via Guice modules

2017-06-01 Thread David McLaughlin


> On June 1, 2017, 8:52 p.m., Jordan Ly wrote:
> > src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
> > Lines 121-123 (patched)
> > 
> >
> > Maybe move to configure

It can't be in the PrivateModule as it needs to be exposed for the 
OfferSettings provider.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59698/#review176658
---


On June 1, 2017, 12:14 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59698/
> ---
> 
> (Updated June 1, 2017, 12:14 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Given how powerful a global ordering of offers is, it's worth allowing custom 
> orderings to be injected with a CLI-provided Guice module. This is consistent 
> with the other custom scheduling logic work.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 96acaf9ab7479c818a1f89e0faa33aad971bb5d4 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> ee80176e81a997a1361c036c9fbda6d120109a27 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e999ac5faee98333e69e964073000a2a9481d081 
> 
> 
> Diff: https://reviews.apache.org/r/59698/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review17
---



LGTM, one small question for my own understanding


src/main/java/org/apache/aurora/scheduler/events/Webhook.java
Lines 111-113 (patched)


What are the implications of a task failing to change?


- Jordan Ly


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated June 1, 2017, 6:33 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, 
> \"ram\": 1048576, \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59698: Allow custom OfferManager ordering to be injected via Guice modules

2017-06-01 Thread Reza Motamedi

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59698/#review176660
---


Fix it, then Ship it!




Getting used to Guice here... but overall looks good to me :).


src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
Lines 121-123 (patched)


So, these installed modules are just needed by the private-module, right?


- Reza Motamedi


On June 1, 2017, 12:14 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59698/
> ---
> 
> (Updated June 1, 2017, 12:14 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Given how powerful a global ordering of offers is, it's worth allowing custom 
> orderings to be injected with a CLI-provided Guice module. This is consistent 
> with the other custom scheduling logic work.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 96acaf9ab7479c818a1f89e0faa33aad971bb5d4 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> ee80176e81a997a1361c036c9fbda6d120109a27 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e999ac5faee98333e69e964073000a2a9481d081 
> 
> 
> Diff: https://reviews.apache.org/r/59698/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59699: Improve task history pruning by batch deleting tasks

2017-06-01 Thread Kai Huang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59699/
---

(Updated June 1, 2017, 9:13 p.m.)


Review request for Aurora, David McLaughlin and Santhosh Kumar.


Changes
---

- 1. Do a batch delete of all tasks to be pruned.

- 2. Rename the deleteTasks method to createDeleteEvent.


Bugs: AURORA-1929
https://issues.apache.org/jira/browse/AURORA-1929


Repository: aurora


Description
---

Improve task history pruning by batch deleting tasks.

The `'aurora_admin prune_tasks'` endpoint seems to be very slow when the 
cluster has a large number of inactive tasks.

This CR batches all removeTasks operations and executes them all at once to 
avoid the additional cost of coalescing. The fix will also benefit implicit task 
history pruning, since it has a similar underlying implementation. See 
https://issues.apache.org/jira/browse/AURORA-1929 for more details.
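
As a rough illustration of why batching helps (not the actual patch), the
sketch below compares one delete call per task with a single batched call.
TaskStore, its writeOps counter, and the helper names are hypothetical
stand-ins rather than Aurora's storage API, and the fixed cost per call is an
assumed simplification of the coalescing overhead described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: TaskStore and its writeOps counter are hypothetical
// stand-ins, not Aurora's storage API.
final class BatchPruneSketch {

  /** Hypothetical store in which every deleteTasks call costs one write. */
  static final class TaskStore {
    final Set<String> taskIds = new HashSet<>();
    int writeOps = 0;

    void deleteTasks(Set<String> ids) {
      writeOps++;              // one storage operation per call, whatever its size
      taskIds.removeAll(ids);
    }
  }

  /** Per-task pruning: one delete call, and so one write, per task. */
  static void pruneOneByOne(TaskStore store, List<String> ids) {
    for (String id : ids) {
      store.deleteTasks(Collections.singleton(id));
    }
  }

  /** Batched pruning: collect every ID first, then issue a single delete. */
  static void pruneBatched(TaskStore store, List<String> ids) {
    store.deleteTasks(new HashSet<>(ids));
  }

  /** Builds a store holding task-0 .. task-(n-1). */
  static TaskStore storeWith(int n) {
    TaskStore store = new TaskStore();
    for (int i = 0; i < n; i++) {
      store.taskIds.add("task-" + i);
    }
    return store;
  }

  public static void main(String[] args) {
    List<String> ids = new ArrayList<>(storeWith(1000).taskIds);

    TaskStore perTask = storeWith(1000);
    pruneOneByOne(perTask, ids);

    TaskStore batched = storeWith(1000);
    pruneBatched(batched, ids);

    System.out.println("one-by-one writes: " + perTask.writeOps);
    System.out.println("batched writes:    " + batched.writeOps);
  }
}
```

Both variants leave the store empty; only the number of storage operations
differs, which is the effect the batching in this CR is meant to exploit.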


Diffs (updated)
-

  src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java 
73878210f9028901fda3b08e66c6a63c24260d35 


Diff: https://reviews.apache.org/r/59699/diff/2/

Changes: https://reviews.apache.org/r/59699/diff/1-2/


Testing
---

./build-support/jenkins/build.sh


Thanks,

Kai Huang



Re: Review Request 59698: Allow custom OfferManager ordering to be injected via Guice modules

2017-06-01 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59698/#review176658
---


Fix it, then Ship it!




Small refactor but otherwise LGTM


src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java
Lines 121-123 (patched)


Maybe move to configure


- Jordan Ly


On June 1, 2017, 12:14 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59698/
> ---
> 
> (Updated June 1, 2017, 12:14 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham and Stephan Erb.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Given how powerful a global ordering of offers is, it's worth allowing custom 
> orderings to be injected with a CLI-provided Guice module. This is consistent 
> with the other custom scheduling logic work.
> 
> 
> Diffs
> -
> 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferManager.java 
> 96acaf9ab7479c818a1f89e0faa33aad971bb5d4 
>   src/main/java/org/apache/aurora/scheduler/offers/OfferSettings.java 
> ee80176e81a997a1361c036c9fbda6d120109a27 
>   src/main/java/org/apache/aurora/scheduler/offers/OffersModule.java 
> e999ac5faee98333e69e964073000a2a9481d081 
> 
> 
> Diff: https://reviews.apache.org/r/59698/diff/1/
> 
> 
> Testing
> ---
> 
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread David McLaughlin


> On June 1, 2017, 5:35 p.m., Zameer Manji wrote:
> > build.gradle
> > Lines 121 (patched)
> > 
> >
> > I don't see the code use netty, so why do we need this?

The async-http-client uses netty, and gradle was complaining about a dependency 
conflict within this package.


- David


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176631
---


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated June 1, 2017, 6:33 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, \"ram\": 1048576, 
> \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176631
---



otherwise this LGTM.


build.gradle
Lines 121 (patched)


I don't see the code use netty, so why do we need this?


- Zameer Manji


On May 31, 2017, 11:33 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated May 31, 2017, 11:33 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, \"ram\": 1048576, 
> \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread Santhosh Kumar Shanmugham

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176627
---


Ship it!




Ship It!

- Santhosh Kumar Shanmugham


On May 31, 2017, 11:33 p.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated May 31, 2017, 11:33 p.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, \"ram\": 1048576, 
> \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Re: Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/#review176584
---


Ship it!




Master (e76862a) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On June 1, 2017, 6:33 a.m., David McLaughlin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59703/
> ---
> 
> (Updated June 1, 2017, 6:33 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Bugs: AURORA-1773
> https://issues.apache.org/jira/browse/AURORA-1773
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Current code uses a synchronous HTTP client, which can block the EventBus. 
> Switch to an async HTTP client.
> 
> 
> Diffs
> -
> 
>   build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
>   src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
> 3868779986285ac302d028f8713f683192951b83 
>   src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
> 1f10af71830386652d21961b733bd0927c5436a1 
>   src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
> e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 
> 
> 
> Diff: https://reviews.apache.org/r/59703/diff/1/
> 
> 
> Testing
> ---
> 
> Enabled webhooks in Vagrant and verified message received:
> 
> POST / HTTP/1.1
> Timestamp: 1496298562793
> Content-Length: 2570
> Host: 192.168.33.7:5100
> Accept: */*
> User-Agent: AHC/2.0
> 
> {"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
>  \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
> {\"expected_response_code\": 0, \"endpoint\": \"/health\", 
> \"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
> \"initial_interval_secs\": 15.0, 
> \"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
> 10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
> \"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
> \"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
> \"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
> \"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
> echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
> \"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
> \"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
> \"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
> \"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
> \"resources\": {\"gpu\": 0, \"disk\": 8388608, \"ram\": 1048576, 
> \"cpu\": 1.0}, \"constraints\": [{\"order\": 
> [\"fetch_package\", \"hello_world\"]}]}, \"production\": false, \"role\": 
> \"www-data\", \"tier\": \"preemptible\", \"lifecycle\": {\"http\": 
> {\"graceful_shutdown_endpoint\": \"/quitquitquit\", \"port\": \"health\", 
> \"shutdown_endpoint\": \"/abortabortabort\"}}, \"priority\": 
> 0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>



Review Request 59703: Use async HTTP for Web Hooks.

2017-06-01 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59703/
---

Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
Manji.


Bugs: AURORA-1773
https://issues.apache.org/jira/browse/AURORA-1773


Repository: aurora


Description
---

Current code uses a synchronous HTTP client, which can block the EventBus. 
Switch to an async HTTP client.
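
To illustrate the difference for readers of this thread, here is a minimal
sketch of a non-blocking webhook POST. It uses the JDK's own
java.net.http.HttpClient (Java 11+) as a stand-in and is not the
async-http-client code in this patch; the class name, the 5-second timeout,
and the Content-Type header are assumptions made for the example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

// Illustrative only: a JDK-based stand-in, not the Webhook class under review.
final class AsyncWebhookSketch {

  /** Builds a JSON POST request for the given webhook URL (timeout is assumed). */
  static HttpRequest buildRequest(String url, String jsonBody) {
    return HttpRequest.newBuilder(URI.create(url))
        .timeout(Duration.ofSeconds(5))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }

  /**
   * Sends the request without blocking the caller. The returned future
   * completes with the HTTP status code once the response arrives, so an
   * event handler calling this can return immediately.
   */
  static CompletableFuture<Integer> postAsync(HttpClient client, HttpRequest request) {
    return client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
        .thenApply(HttpResponse::statusCode);
  }

  public static void main(String[] args) {
    // Only builds and prints the request; actually sending it needs a
    // listening endpoint, for example via postAsync(HttpClient.newHttpClient(), request).
    HttpRequest request = buildRequest("http://192.168.33.7:5100/", "{\"oldState\":{}}");
    System.out.println(request.method() + " " + request.uri());
  }
}
```

A synchronous client would hold the calling thread until the remote endpoint
answers or times out; with a future-returning call, failures can instead be
logged in a completion callback.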


Diffs
-

  build.gradle 4802d5e552b978338b037326eae85e193a7eb2d1 
  src/main/java/org/apache/aurora/scheduler/events/Webhook.java 
3868779986285ac302d028f8713f683192951b83 
  src/main/java/org/apache/aurora/scheduler/events/WebhookModule.java 
1f10af71830386652d21961b733bd0927c5436a1 
  src/test/java/org/apache/aurora/scheduler/events/WebhookTest.java 
e8335d9b78dbf30bf3ae08b6bdd02018cea76f6b 


Diff: https://reviews.apache.org/r/59703/diff/1/


Testing
---

Enabled webhooks in Vagrant and verified message received:

POST / HTTP/1.1
Timestamp: 1496298562793
Content-Length: 2570
Host: 192.168.33.7:5100
Accept: */*
User-Agent: AHC/2.0

{"task":{"cachedHashCode":0,"assignedTask":{"cachedHashCode":0,"taskId":"www-data-devel-hello_world-0-11fb2654-efd7-4411-84d7-ec8e0ed485d9","task":{"cachedHashCode":371132471,"job":{"cachedHashCode":1193662568,"role":"www-data","environment":"devel","name":"hello_world"},"owner":{"cachedHashCode":226895216,"user":"vagrant"},"isService":true,"numCpus":0.0,"ramMb":0,"diskMb":0,"priority":0,"maxTaskFailures":1,"production":false,"tier":"preemptible","resources":[{"cachedHashCode":-367894814,"setField":"NUM_CPUS","value":1.0},{"cachedHashCode":445972424,"setField":"RAM_MB","value":1},{"cachedHashCode":-2018031677,"setField":"DISK_MB","value":8}],"constraints":[],"requestedPorts":[],"mesosFetcherUris":[],"taskLinks":{},"executorConfig":{"cachedHashCode":-1711220736,"name":"AuroraExecutor","data":"{\"environment\":
 \"devel\", \"health_check_config\": {\"health_checker\": {\"http\": 
{\"expected_response_code\": 0, \"endpoint\": \"/health\", 
\"expected_response\": \"ok\"}}, \"min_consecutive_successes\": 1, 
\"initial_interval_secs\": 15.0, 
\"max_consecutive_failures\": 0, \"timeout_secs\": 1.0, \"interval_secs\": 
10.0}, \"name\": \"hello_world\", \"service\": true, \"max_task_failures\": 1, 
\"cron_collision_policy\": \"KILL_EXISTING\", \"enable_hooks\": false, 
\"cluster\": \"devcluster\", \"task\": {\"processes\": [{\"daemon\": false, 
\"name\": \"fetch_package\", \"ephemeral\": false, \"max_failures\": 1, 
\"min_duration\": 5, \"cmdline\": \"cp /vagrant/hello_world.py . \u0026\u0026 
echo a146647a4293aef6ae45c6d5699c1f96 \u0026\u0026 chmod +x hello_world.py\", 
\"final\": false}, {\"daemon\": false, \"name\": \"hello_world\", 
\"ephemeral\": false, \"max_failures\": 1, \"min_duration\": 5, \"cmdline\": 
\"python -u hello_world.py\", \"final\": false}], \"name\": \"fetch_package\", 
\"finalization_wait\": 30, \"max_failures\": 1, \"max_concurrency\": 0, 
\"resources\": {\"gpu\": 0, \"disk\": 8388608, \"ram\": 1048576, 
\"cpu\": 1.0}, \"constraints\": [{\"order\": [\"fetch_package\", 
\"hello_world\"]}]}, \"production\": false, \"role\": \"www-data\", \"tier\": 
\"preemptible\", \"lifecycle\": {\"http\": {\"graceful_shutdown_endpoint\": 
\"/quitquitquit\", \"port\": \"health\", \"shutdown_endpoint\": 
\"/abortabortabort\"}}, \"priority\": 
0}"},"metadata":[],"container":{"cachedHashCode":-270656463,"setField":"MESOS","value":{"cachedHashCode":962,"volumes":[]}}},"assignedPorts":{},"instanceId":0},"status":"PENDING","failureCount":0,"taskEvents":[{"cachedHashCode":0,"timestamp":1496298562764,"status":"PENDING","scheduler":"aurora"}]},"oldState":{}}


Thanks,

David McLaughlin