Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-27 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review150590
---


Ship it!




Master (60e5e4e) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 27, 2016, 6:06 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 27, 2016, 6:06 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>  

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-27 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 27, 2016, 6:06 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Joshua's comments.


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
6f1cbfbc4510a037cffc95fee54f62f463d2b534 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
664bc6cf964ede2473a4463e58bcdbcb65bc7413 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
d390c07522d22e43d79ce4370985f3643ef021ca 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
207d38d1ddfd373892602218a98c1daaf4a1325f 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
88729626de5fa87b45472792c59cc0ff1ade3e93 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
a4e87d2216401f344dca64d69b945de7bcf8159a 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-27 Thread Maxim Khutornenko


> On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java, line 55
> > 
> >
> > javadoc might be useful here to make it clear that the collection is a 
> > list of task ids to remove? (or maybe just rename param to `taskIds`?)

Renamed the param.


> On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 127
> > 
> >
> > I know we have the `@Positive` annotation on the arg, but for the sake 
> > of safety (it's possible to initialize these settings from any source) it's 
> > probably worth ensuring `maxTasksPerScheduler` is > 0 here?
> > 
> > If we do that, it might also make sense to move the `checkArgument` on 
> > `firstScheduleDelay` from the constructor below into this constructor?

Good point. Done and done.


> On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 143
> > 
> >
> > If we're going to store settings as an instance variable, we should 
> > reference the other settings from it as well (and kill off those 
> > variables), alternately just store `maxTasksPerScheduler` rather than 
> > settings itself)?

Agree, there are a few refactoring waves colliding here and it's definitely 
time to simplify/remove duplication. Done.


> On Sept. 23, 2016, 7:52 p.m., Joshua Cohen wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 172
> > 
> >
> > If we're just assuming `taskId` is present, why wrap it in an optional? 
> > (Presumably on the initial pass through the loop we know `taskIds` was not 
> > empty, and on each subsequent pass we check `remainingTasks.hasNext` before 
> > re-assigning `taskId`). Am I missing something?

Good catch. There was a legitimate use of the `Optional` in the 
original diff but it's no longer needed after we changed to return `Set`.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review150225
---


On Sept. 20, 2016, 10:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 10:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-23 Thread Joshua Cohen

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review150225
---



lgtm overall. Only potential blocker is the last comment (which may or may not 
actually be a problem).


src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java (line 54)


javadoc might be useful here to make it clear that the collection is a list 
of task ids to remove? (or maybe just rename param to `taskIds`?)



src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (line 126)


I know we have the `@Positive` annotation on the arg, but for the sake of 
safety (it's possible to initialize these settings from any source) it's 
probably worth ensuring `maxTasksPerScheduler` is > 0 here?

If we do that, it might also make sense to move the `checkArgument` on 
`firstScheduleDelay` from the constructor below into this constructor?



src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (line 142)


If we're going to store settings as an instance variable, we should 
reference the other settings from it as well (and kill off those variables), 
alternately just store `maxTasksPerScheduler` rather than settings itself)?



src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java (line 171)


If we're just assuming `taskId` is present, why wrap it in an optional? 
(Presumably on the initial pass through the loop we know `taskIds` was not 
empty, and on each subsequent pass we check `remainingTasks.hasNext` before 
re-assigning `taskId`). Am I missing something?


- Joshua Cohen


On Sept. 20, 2016, 10:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 10:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-22 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review150106
---


Ship it!




Ship It!

- Zameer Manji


On Sept. 20, 2016, 3:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 3:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  a4e87d2216401f344dca64d69b945de7bcf8159a 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-22 Thread Maxim Khutornenko


> On Sept. 22, 2016, 10:34 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java, 
> > line 114
> > 
> >
> > Is there a way of doing this without making `AttributeAggregate` 
> > explicitly mutable?
> > 
> > I don't like how we have to update this from the offer out of band 
> > instead of just refreshing from storage.
> > 
> > Why can't we just have a `refresh` method which re-runs the original 
> > query after we consume the offer?
> > 
> > I'm weary of bugs where we forget to call this method after scheduling 
> > with an offer.

The whole purpose of having the `AttributeAggregate` is to avoid reconstructing 
the job state every time a view into attributes is required. The current 
approach saves an expensive (especially in the `DBTaskStore case` and a large 
job combo) fetch across task and attribute stores. The underlying data 
structure of the `AttributeAggregate` is just a Multiset of host attributes and 
their values. Throwing all that data away and reconstructing it again just to 
have a new host representation added is very wasteful, especially given that we 
already have all new attributes handy.

As for not forgetting to call it, I don't see how reconstructing would be any 
better in that regard as we would still have to remember to reconstruct it out 
of band. The `TaskAssignerImplTest.testAssignPartialNoVetoes` covers 
`AttributeAggregate` update case by the way.


> On Sept. 22, 2016, 10:34 p.m., Zameer Manji wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 179
> > 
> >
> > We don't check if this is null later, so why not just make it an empty 
> > `ImmutableSet` to start?

I don't see any benefits in that. This is the classic assign-or-throw pattern. 
The `result.get()` returns a Set and we don't pass nulls for collections.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review150085
---


On Sept. 20, 2016, 10:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 10:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-22 Thread Zameer Manji

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review150085
---




src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java (line 
114)


Is there a way of doing this without making `AttributeAggregate` explicitly 
mutable?

I don't like how we have to update this from the offer out of band instead 
of just refreshing from storage.

Why can't we just have a `refresh` method which re-runs the original query 
after we consume the offer?

I'm weary of bugs where we forget to call this method after scheduling with 
an offer.



src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java (line 47)


Nice use of the new `Stream` API. Very clean and easy to understand.



src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (line 172)


We don't check if this is null later, so why not just make it an empty 
`ImmutableSet` to start?


- Zameer Manji


On Sept. 20, 2016, 3:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 3:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-22 Thread Zameer Manji


> On Sept. 20, 2016, 11:49 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java, 
> > lines 93-96
> > 
> >
> > Regarding your notes in the RB description: I don't see a problem if we 
> > set this to a slightly higher value such as `10` or `15`. 
> > 
> > It seems like we will maintain the basic taskgroup round robin 
> > scheduling fairness even with slightly larger batch sizes, so I am ok with 
> > bumping the value.
> 
> Maxim Khutornenko wrote:
> The higher this count is the less fair do we treat lower instance count 
> jobs in case the capacity of the cluster is constrained. It's hard to come up 
> with an ideal default here but I feel 10-15 would be a bit too extreme for 
> the general case.

+1

I think 5 is suitable as well.


- Zameer


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149681
---


On Sept. 20, 2016, 3:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 3:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149747
---


Ship it!




Ship It!

- Stephan Erb


On Sept. 21, 2016, 12:02 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 21, 2016, 12:02 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  a4e87d2216401f344dca64d69b945de7bcf8159a 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149744
---


Ship it!




Master (8432894) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 20, 2016, 10:02 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 20, 2016, 10:02 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Maxim Khutornenko


> On Sept. 20, 2016, 6:49 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java, 
> > lines 93-96
> > 
> >
> > Regarding your notes in the RB description: I don't see a problem if we 
> > set this to a slightly higher value such as `10` or `15`. 
> > 
> > It seems like we will maintain the basic taskgroup round robin 
> > scheduling fairness even with slightly larger batch sizes, so I am ok with 
> > bumping the value.

The higher this count is the less fair do we treat lower instance count jobs in 
case the capacity of the cluster is constrained. It's hard to come up with an 
ideal default here but I feel 10-15 would be a bit too extreme for the general 
case.


> On Sept. 20, 2016, 6:49 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, lines 
> > 174-175
> > 
> >
> > If I understand things correctly, I believe this line could have a 
> > performance bug in it:
> > 
> > `batchWorker.execute` acquires the global storage lock before calling 
> > the `taskScheduler`. For the latter, we use the following definition:
> > 
> > ```
> > this.taskScheduler = (store, taskIds) -> {
> >   settings.rateLimiter.acquire();
> >   return taskScheduler.schedule(store, taskIds);
> > };
> > ```
> > 
> > In combination, we will be throttled by the `rateLimiter` while holding 
> > the storage lock.
> > 
> > Instead, we should try to acquire the log within the `Runnable` in 
> > `startGroup` before calling `batchWorker.execute` so that the global lock 
> > is not hold longer than absolutely necessary.

This is a valid concern. It's certainly more possible than before to hit our 
default 40 tasks per second limit now. Fixed.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149681
---


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 20, 2016, 10:02 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Stephan's comments.


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
6f1cbfbc4510a037cffc95fee54f62f463d2b534 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
664bc6cf964ede2473a4463e58bcdbcb65bc7413 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
d390c07522d22e43d79ce4370985f3643ef021ca 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
207d38d1ddfd373892602218a98c1daaf4a1325f 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
88729626de5fa87b45472792c59cc0ff1ade3e93 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
a4e87d2216401f344dca64d69b945de7bcf8159a 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Maxim Khutornenko


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 197
> > 
> >
> > Side show: Isn't that `if` unnecessary here and we can adjust the 
> > penality in any case? We will remove the group if `hasMore()` returns 
> > false, so any penality should be fine.
> 
> Maxim Khutornenko wrote:
> Not sure I follow. This is the place that applies penalty accrued inside 
> the `startGroup()` or removes the group if it's empty.
> 
> Stephan Erb wrote:
> Let me print some more code to make clearer what I meant.
> 
> This is the code where we compute penaltyMs depending on `group.hasMore`
> ```
> scheduledTaskPenalties.accumulate(group.getPenaltyMs());
> group.remove(scheduled);
> if (group.hasMore()) {
>   penaltyMs = firstScheduleDelay;
> }
>   }
> }
> 
> group.setPenaltyMs(penaltyMs);
> evaluateGroupLater(this, group);
> ```
> 
> 
> Later on we then drop empty groups in `evaluateGroupLater`:
> ```
>   private synchronized void evaluateGroupLater(Runnable evaluate, 
> TaskGroup group) {
> // Avoid check-then-act by holding the intrinsic lock.  If not done 
> atomically, we could
> // remove a group while a task is being added to it.
> if (group.hasMore()) {
>   executor.execute(evaluate, Amount.of(group.getPenaltyMs(), 
> Time.MILLISECONDS));
> } else {
>   groups.remove(group.getKey());
> }
>   }
> ```
> 
> What I tried to say is that we could unconditionally write `penaltyMs = 
> firstScheduleDelay;` without the `if (group.hasMore())` in the first snippet. 
> If `group.hasMore()` returns false we will remove the group anyway, so it 
> does not matter if we set a new penality or not.
> 
> This is completely unreleated to the change in this RB so feel free to 
> ignore it. It was more about checking my own understanding of the code.

I see what you meant now. The slight problem with the simplification you 
propose is that it _may_ result in a penalty where it would not happen today: 
`startGroup()` calculates the penalty even though there are no more groups left 
but the call to `evaluateGroupLater()` is delayed for some reason AND a new 
task is added into the group thus recharging it. The penalty would be applied 
the moment `evaluateGroupLater()` is finally reached. It's certainly an edge 
case but I'd prefer not changing the current behavior in the this patch.


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
---


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Stephan Erb


> On Sept. 16, 2016, 11:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 197
> > 
> >
> > Side show: Isn't that `if` unnecessary here and we can adjust the 
> > penality in any case? We will remove the group if `hasMore()` returns 
> > false, so any penality should be fine.
> 
> Maxim Khutornenko wrote:
> Not sure I follow. This is the place that applies penalty accrued inside 
> the `startGroup()` or removes the group if it's empty.

Let me print some more code to make clearer what I meant.

This is the code where we compute penaltyMs depending on `group.hasMore`
```
scheduledTaskPenalties.accumulate(group.getPenaltyMs());
group.remove(scheduled);
if (group.hasMore()) {
  penaltyMs = firstScheduleDelay;
}
  }
}

group.setPenaltyMs(penaltyMs);
evaluateGroupLater(this, group);
```


Later on we then drop empty groups in `evaluateGroupLater`:
```
  private synchronized void evaluateGroupLater(Runnable evaluate, TaskGroup 
group) {
// Avoid check-then-act by holding the intrinsic lock.  If not done 
atomically, we could
// remove a group while a task is being added to it.
if (group.hasMore()) {
  executor.execute(evaluate, Amount.of(group.getPenaltyMs(), 
Time.MILLISECONDS));
} else {
  groups.remove(group.getKey());
}
  }
```

What I tried to say is that we could unconditionally write `penaltyMs = 
firstScheduleDelay;` without the `if (group.hasMore())` in the first snippet. 
If `group.hasMore()` returns false we will remove the group anyway, so it does 
not matter if we set a new penality or not.

This is completely unreleated to the change in this RB so feel free to ignore 
it. It was more about checking my own understanding of the code.


> On Sept. 16, 2016, 11:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 208-213
> > 
> >
> > I would have expected that we only request preemption if we failed to 
> > schedule all tasks in the current group.  If I remember correctly, 
> > preemption slot search only happens on a per-group basis anyway.
> > 
> > You have probably thought about this, so I would like to understand 
> > your reasoning.
> 
> Maxim Khutornenko wrote:
> We still want to attempt a preemption if _some_ but not _all_ tasks are 
> scheduled within a given round, right? Otherwise, preemption becomes an 
> all-or-nothing feature and we have to wait for another scheduling cycle to 
> request a reservation.

Ok I see.


> On Sept. 16, 2016, 11:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 174
> > 
> >
> > Shouldn't that only happen if `launchTask` has succeeded?
> 
> Maxim Khutornenko wrote:
> I've debated that as well and decided it's more logical to finish 
> accessing offer details before it's being launched with and removed from the 
> `OfferManager`.
> 
> BTW, I just realized I was missing a `break` statement to bail out from 
> the scheduling round if a `LaunchException` is thrown. While theoretically we 
> _could_ continue matching even after the `LaunchException`, it's hard to 
> reason about the state of the storage and the last assigned item and as such 
> it's safer to terminate than continue. Fixed and added a test case. This 
> should now be resolved. Thanks for asking :)

Thanks for the explanation. Makes sense to me.


- Stephan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
---


On Sept. 16, 2016, 11:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 11:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-20 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149681
---




src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
(lines 93 - 96)


Regarding your notes in the RB description: I don't see a problem if we set 
this to a slightly higher value such as `10` or `15`. 

It seems like we will maintain the basic taskgroup round robin scheduling 
fairness even with slightly larger batch sizes, so I am ok with bumping the 
value.



src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (lines 173 
- 174)


If I understand things correctly, I believe this line could have a 
performance bug in it:

`batchWorker.execute` acquires the global storage lock before calling the 
`taskScheduler`. For the latter, we use the following definition:

```
this.taskScheduler = (store, taskIds) -> {
  settings.rateLimiter.acquire();
  return taskScheduler.schedule(store, taskIds);
};
```

In combination, we will be throttled by the `rateLimiter` while holding the 
storage lock.

Instead, we should try to acquire the log within the `Runnable` in 
`startGroup` before calling `batchWorker.execute` so that the global lock is 
not hold longer than absolutely necessary.


- Stephan Erb


On Sept. 16, 2016, 11:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 11:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>  

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149294
---


Ship it!




Master (a87ad41) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>  

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149290
---



@ReviewBot retry

- Maxim Khutornenko


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> 87b9e1928ab2d44668df1123f32ffdc4197c0c70 
>   src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
> 664bc6cf964ede2473a4463e58bcdbcb65bc7413 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
> 5d319557057e27fd5fc6d3e553e9ca9139399c50 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
> d390c07522d22e43d79ce4370985f3643ef021ca 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 207d38d1ddfd373892602218a98c1daaf4a1325f 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> 7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
>   
> src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
>  ece476b918e6f2c128039e561eea23a94d8ed396 
>   
> src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
> 209f9298a1d55207b9b41159f2ab366f92c1eb70 
>   
> src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java
>  0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
>   src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
> c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
>  ee5c6528af89cc62a35fdb314358c489556d8131 
>   src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
> 98048fabc00f233925b6cca015c2525980556e2b 
>   
> src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
> 2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
>   src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
> 88729626de5fa87b45472792c59cc0ff1ade3e93 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  a4e87d2216401f344dca64d69b945de7bcf8159a 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149288
---



Master (496397a) is red with this patch.
  ./build-support/jenkins/build.sh


:processTestResources
:testClasses
:checkstyleTest
:findbugsJmhThe message received from the daemon indicates that the daemon has 
disappeared.
Build request sent: Build{id=04b1c215-9a35-4563-8eac-3fc8439bfb48.1, 
currentDir=/home/jenkins/jenkins-slave/workspace/AuroraBot}
Attempting to read last messages from the daemon log...
Daemon pid: 23279
  log file: /home/jenkins/.gradle/daemon/2.14/daemon-23279.out.log
- Last  20 lines from daemon log file - daemon-23279.out.log -
:distZip
:assemble
:compileJmhJavaNote: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/src/jmh/java/org/apache/aurora/benchmark/fakes/FakeSchedulerDriver.java
 uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processJmhResources UP-TO-DATE
:jmhClasses
:checkstyleJmh
:jsHint
:checkstyleMain
:compileTestJava/home/jenkins/jenkins-slave/workspace/AuroraBot/src/test/java/org/apache/aurora/scheduler/thrift/aop/MockDecoratedThrift.java:38:
 Note: Wrote forwarder 
org.apache.aurora.scheduler.thrift.aop.MockDecoratedThriftForwarder
@Forward(AnnotatedAuroraAdmin.class)
^
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

:processTestResources
:testClasses
:checkstyleTest
:findbugsJmhDaemon vm is shutting down... The daemon has exited normally or was 
terminated in response to a user interrupt.
- End of the daemon log -


FAILURE: Build failed with an exception.

* What went wrong:
Gradle build daemon disappeared unexpectedly (it may have been killed or may 
have crashed)

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 9:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 9:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 6f1cbfbc4510a037cffc95fee54f62f463d2b534 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 16, 2016, 9:53 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Repository: aurora


Description (updated)
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```


Diffs
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
6f1cbfbc4510a037cffc95fee54f62f463d2b534 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
664bc6cf964ede2473a4463e58bcdbcb65bc7413 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
d390c07522d22e43d79ce4370985f3643ef021ca 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
207d38d1ddfd373892602218a98c1daaf4a1325f 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
88729626de5fa87b45472792c59cc0ff1ade3e93 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
a4e87d2216401f344dca64d69b945de7bcf8159a 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 16, 2016, 9:53 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Rebased over master.


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```

NOTE: this will not apply cleanly as it branched off of 
https://reviews.apache.org/r/51765, which itself depends on 
https://reviews.apache.org/r/51759/.


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
6f1cbfbc4510a037cffc95fee54f62f463d2b534 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
664bc6cf964ede2473a4463e58bcdbcb65bc7413 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
d390c07522d22e43d79ce4370985f3643ef021ca 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
207d38d1ddfd373892602218a98c1daaf4a1325f 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c1c3eca4a6e6c88dab6b1c69fae3e2f290b58039 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
88729626de5fa87b45472792c59cc0ff1ade3e93 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
a4e87d2216401f344dca64d69b945de7bcf8159a 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko



Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149260
---



Master (783baae) is red with this patch.
  ./build-support/jenkins/build.sh

 # Create file stdout for capturing output. We 
can't use StringIO mock
 # because TestProcess is running fork.
 with open(os.path.join(td, 'sys_stdout'), 
'w+') as stdout:
   with open(os.path.join(td, 'sys_stderr'), 
'w+') as stderr:
 with mutable_sys():
   sys.stdout, sys.stderr = stdout, 
stderr
 
   p = TestProcess('process', 'echo hello 
world; echo >&2 hello stderr', 0,
   taskpath, sandbox, 
logger_destination=LoggerDestination.BOTH)
   p.start()
   rc = 
wait_for_rc(taskpath.getpath('process_checkpoint'))
 
   assert rc == 0
   # Check log files were created in std 
path with correct content
 > assert_log_content(taskpath, 'stdout', 
'hello world\n')
 
 src/test/python/apache/thermos/core/test_process.py:487: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 
 taskpath = 
 log_name = 'stdout'
 expected_content = 'hello world\n'
 
 def assert_log_content(taskpath, log_name, 
expected_content):
   log = 
taskpath.with_filename(log_name).getpath('process_logdir')
   assert os.path.exists(log)
   with open(log, 'r') as fp:
 >   assert fp.read() == expected_content
 E   assert '' == 'hello world\n'
 E + hello world
 
 src/test/python/apache/thermos/core/test_process.py:313: 
AssertionError
  generated xml file: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
 
  1 failed, 710 passed, 6 skipped, 1 warnings in 
381.34 seconds 
 
FAILURE


20:19:36 07:12   [complete]
   FAILURE


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 7:53 p.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 7:53 p.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

(Updated Sept. 16, 2016, 7:53 p.m.)


Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Changes
---

Stephan's comments. The `TaskAssigner` now returns only assigned task IDs. The 
`TaskScheduler` returns both assigned AND those that should be removed from the 
`TaskGroup` (e.g.: not found or not PENDING).


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```

NOTE: this will not apply cleanly as it branched off of 
https://reviews.apache.org/r/51765, which itself depends on 
https://reviews.apache.org/r/51759/.


Diffs (updated)
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
9d0d40b82653fb923bed16d06546288a1576c21d 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
11e8033438ad0808e446e41bb26b3fa4c04136c7 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
c044ebe6f72183a67462bbd8e5be983eb592c3e9 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
d266f6a25ae2360db2977c43768a19b1f1efe8ff 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c2ceb4e7685a9301f8014a9183e02fbad65bca26 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
72562e6bd9a9860c834e6a9faa094c28600a8fed 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Maxim Khutornenko


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java, line 
> > 197
> > 
> >
> > Side show: Isn't that `if` unnecessary here and we can adjust the 
> > penality in any case? We will remove the group if `hasMore()` returns 
> > false, so any penality should be fine.

Not sure I follow. This is the place that applies penalty accrued inside the 
`startGroup()` or removes the group if it's empty.


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java, 
> > lines 208-213
> > 
> >
> > I would have expected that we only request preemption if we failed to 
> > schedule all tasks in the current group.  If I remember correctly, 
> > preemption slot search only happens on a per-group basis anyway.
> > 
> > You have probably thought about this, so I would like to understand 
> > your reasoning.

We still want to attempt a preemption if _some_ but not _all_ tasks are 
scheduled within a given round, right? Otherwise, preemption becomes an 
all-or-nothing feature and we have to wait for another scheduling cycle to 
request a reservation.


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, line 174
> > 
> >
> > Shouldn't that only happen if `launchTask` has succeeded?

I've debated that as well and decided it's more logical to finish accessing 
offer details before it's being launched with and removed from the 
`OfferManager`.

BTW, I just realized I was missing a `break` statement to bail out from the 
scheduling round if a `LaunchException` is thrown. While theoretically we 
_could_ continue matching even after the `LaunchException`, it's hard to reason 
about the state of the storage and the last assigned item and as such it's 
safer to terminate than continue. Fixed and added a test case. This should now 
be resolved. Thanks for asking :)


> On Sept. 16, 2016, 9:08 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java, lines 
> > 214-220
> > 
> >
> > There is additional complexity here and in other places because 
> > `schedule`, `scheduleTask` and `maybeAssign` return a `Map > Boolean>` encoding if scheduling was successful for all given tasks.
> > 
> > I'd propose to change the signature so that all methods mentioned above 
> > return `Set` containing only the successfully launched tasks. 
> > 
> > If a caller is really interested in tasks failed to schedule, he can 
> > compute the set difference between passed and returned task ids.

I was fighting with myself over this initially and decided to keep the original 
contract. Now that the aproach fully shaped up, I can see how it can be 
simplified by not returning the exhaustive map of things. Refactored to return 
only what's needed. Thanks for bringing this up!


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
---


On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 12:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-16 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149179
---




src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java (line 196)


Side show: Isn't that `if` unnecessary here and we can adjust the penality 
in any case? We will remove the group if `hasMore()` returns false, so any 
penality should be fine.



src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java (lines 
132 - 133)






src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java (lines 
197 - 202)


I would have expected that we only request preemption if we failed to 
schedule all tasks in the current group.  If I remember correctly, preemption 
slot search only happens on a per-group basis anyway.

You have probably thought about this, so I would like to understand your 
reasoning.



src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java (line 173)


Shouldn't that only happen if `launchTask` has succeeded?



src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java (lines 211 - 
217)


There is additional complexity here and in other places because `schedule`, 
`scheduleTask` and `maybeAssign` return a `Map` encoding if 
scheduling was successful for all given tasks.

I'd propose to change the signature so that all methods mentioned above 
return `Set` containing only the successfully launched tasks. 

If a caller is really interested in tasks failed to schedule, he can 
compute the set difference between passed and returned task ids.


- Stephan Erb


On Sept. 16, 2016, 2:51 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 2:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only slightly more demanding (~8%). Both results though are MUCH 
> higher than the real cluster perf, which just confirms we are not bound by 
> CPU time here:
> 
> Master:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  17126.183 ± 488.425  ops/s
> ```
> 
> This patch:
> ```
> Benchmark
> Mode  Cnt  Score Error  Units
> SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
> thrpt   10  15838.051 ± 187.890  ops/s
> ```
> 
> NOTE: this will not apply cleanly as it branched off of 
> https://reviews.apache.org/r/51765, which itself depends on 
> https://reviews.apache.org/r/51759/.
> 
> 
> Diffs
> -
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 9d0d40b82653fb923bed16d06546288a1576c21d 
>   

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-15 Thread Maxim Khutornenko


> On Sept. 16, 2016, 1:20 a.m., Aurora ReviewBot wrote:
> > Master (783baae) is red with this patch.
> >   ./build-support/jenkins/build.sh
> > 
> >  # Create file stdout for capturing output. 
> > We can't use StringIO mock
> >  # because TestProcess is running fork.
> >  with open(os.path.join(td, 'sys_stdout'), 
> > 'w+') as stdout:
> >    with open(os.path.join(td, 
> > 'sys_stderr'), 'w+') as stderr:
> >  with mutable_sys():
> >    sys.stdout, sys.stderr = stdout, 
> > stderr
> >  
> >    p = TestProcess('process', 'echo 
> > hello world; echo >&2 hello stderr', 0,
> >    taskpath, sandbox, 
> > logger_destination=LoggerDestination.BOTH)
> >    p.start()
> >    rc = 
> > wait_for_rc(taskpath.getpath('process_checkpoint'))
> >  
> >    assert rc == 0
> >    # Check log files were created in 
> > std path with correct content
> >  > assert_log_content(taskpath, 
> > 'stdout', 'hello world\n')
> >  
> >  
> > src/test/python/apache/thermos/core/test_process.py:487: 
> >  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> >  
> >  taskpath =  > at 0x7fdd3cd73b10>
> >  log_name = 'stdout'
> >  expected_content = 'hello world\n'
> >  
> >  def assert_log_content(taskpath, log_name, 
> > expected_content):
> >    log = 
> > taskpath.with_filename(log_name).getpath('process_logdir')
> >    assert os.path.exists(log)
> >    with open(log, 'r') as fp:
> >  >   assert fp.read() == expected_content
> >  E   assert '' == 'hello world\n'
> >  E + hello world
> >  
> >  
> > src/test/python/apache/thermos/core/test_process.py:313: AssertionError
> >   generated xml file: 
> > /home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
> >  
> >   1 failed, 710 passed, 6 skipped, 1 warnings 
> > in 226.09 seconds 
> >  
> > FAILURE
> > 
> > 
> > 01:19:57 04:18   [complete]
> >FAILURE
> > 
> > 
> > I will refresh this build result if you post a review containing 
> > "@ReviewBot retry"

@ReviewBot retry


- Maxim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149162
---


On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 12:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> 

Re: Review Request 51929: Scheduling multiple tasks per round.

2016-09-15 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/#review149162
---



Master (783baae) is red with this patch.
  ./build-support/jenkins/build.sh

 # Create file stdout for capturing output. We 
can't use StringIO mock
 # because TestProcess is running fork.
 with open(os.path.join(td, 'sys_stdout'), 
'w+') as stdout:
   with open(os.path.join(td, 'sys_stderr'), 
'w+') as stderr:
 with mutable_sys():
   sys.stdout, sys.stderr = stdout, 
stderr
 
   p = TestProcess('process', 'echo hello 
world; echo >&2 hello stderr', 0,
   taskpath, sandbox, 
logger_destination=LoggerDestination.BOTH)
   p.start()
   rc = 
wait_for_rc(taskpath.getpath('process_checkpoint'))
 
   assert rc == 0
   # Check log files were created in std 
path with correct content
 > assert_log_content(taskpath, 'stdout', 
'hello world\n')
 
 src/test/python/apache/thermos/core/test_process.py:487: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
 
 taskpath = 
 log_name = 'stdout'
 expected_content = 'hello world\n'
 
 def assert_log_content(taskpath, log_name, 
expected_content):
   log = 
taskpath.with_filename(log_name).getpath('process_logdir')
   assert os.path.exists(log)
   with open(log, 'r') as fp:
 >   assert fp.read() == expected_content
 E   assert '' == 'hello world\n'
 E + hello world
 
 src/test/python/apache/thermos/core/test_process.py:313: 
AssertionError
  generated xml file: 
/home/jenkins/jenkins-slave/workspace/AuroraBot/dist/test-results/415337499eb72578eab327a6487c1f5c9452b3d6.xml
 
  1 failed, 710 passed, 6 skipped, 1 warnings in 
226.09 seconds 
 
FAILURE


01:19:57 04:18   [complete]
   FAILURE


I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Sept. 16, 2016, 12:51 a.m., Maxim Khutornenko wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51929/
> ---
> 
> (Updated Sept. 16, 2016, 12:51 a.m.)
> 
> 
> Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> This is phase 2 of scheduling perf improvement effort started in 
> https://reviews.apache.org/r/51759/.
> 
> We can now take multiple (configurable) number of task IDs from a given 
> `TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
> and assign more than one task if possible. This approach delivers 
> substantially better MTTA and still ensures fairness across multiple 
> `TaskGroups`. We have observed almost linear improvement in MTTA (4x+ with 5 
> tasks per round), which suggest the `max_tasks_per_schedule_attempt` can be 
> set even higher if the majority of cluster jobs have large number of 
> instances and/or update batch sizes.
> 
> As far as a single round perf goes, we can consider the following 2 
> worst-case scenarios:
> - master: single task scheduling fails after trying all offers in the queue
> - this patch: N tasks launched with the very last N offers in the queue + `(N 
> x single_task_launch_latency)`
> 
> Assuming that matching N tasks against M offers takes exactly the same time 
> as 1 task against M offers (as they all share the same `TaskGroup`), the only 
> measurable difference comes from the additional `N x 
> single_task_launch_latency` overhead. Based on real cluster observations, the 
> `single_task_launch_latency` is less than 1% of a single task scheduling 
> attempt, which is << than the savings from avoided additional scheduling 
> rounds. 
> 
> As far as jmh results go, the new approach (batching + multiple tasks per 
> round) is only 

Review Request 51929: Scheduling multiple tasks per round.

2016-09-15 Thread Maxim Khutornenko

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51929/
---

Review request for Aurora, Joshua Cohen, Stephan Erb, and Zameer Manji.


Repository: aurora


Description
---

This is phase 2 of scheduling perf improvement effort started in 
https://reviews.apache.org/r/51759/.

We can now take multiple (configurable) number of task IDs from a given 
`TaskGroup` per scheduling. The idea is to go deeper through the offer queue 
and assign more than one task if possible. This approach delivers substantially 
better MTTA and still ensures fairness across multiple `TaskGroups`. We have 
observed almost linear improvement in MTTA (4x+ with 5 tasks per round), which 
suggest the `max_tasks_per_schedule_attempt` can be set even higher if the 
majority of cluster jobs have large number of instances and/or update batch 
sizes.

As far as a single round perf goes, we can consider the following 2 worst-case 
scenarios:
- master: single task scheduling fails after trying all offers in the queue
- this patch: N tasks launched with the very last N offers in the queue + `(N x 
single_task_launch_latency)`

Assuming that matching N tasks against M offers takes exactly the same time as 
1 task against M offers (as they all share the same `TaskGroup`), the only 
measurable difference comes from the additional `N x 
single_task_launch_latency` overhead. Based on real cluster observations, the 
`single_task_launch_latency` is less than 1% of a single task scheduling 
attempt, which is << than the savings from avoided additional scheduling 
rounds. 

As far as jmh results go, the new approach (batching + multiple tasks per 
round) is only slightly more demanding (~8%). Both results though are MUCH 
higher than the real cluster perf, which just confirms we are not bound by CPU 
time here:

Master:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  17126.183 ± 488.425  ops/s
```

This patch:
```
Benchmark
Mode  Cnt  Score Error  Units
SchedulingBenchmarks.InsufficientResourcesSchedulingBenchmark.runBenchmark  
thrpt   10  15838.051 ± 187.890  ops/s
```

NOTE: this will not apply cleanly as it branched off of 
https://reviews.apache.org/r/51765, which itself depends on 
https://reviews.apache.org/r/51759/.


Diffs
-

  src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
9d0d40b82653fb923bed16d06546288a1576c21d 
  src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
87b9e1928ab2d44668df1123f32ffdc4197c0c70 
  src/main/java/org/apache/aurora/scheduler/scheduling/SchedulingModule.java 
11e8033438ad0808e446e41bb26b3fa4c04136c7 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroup.java 
5d319557057e27fd5fc6d3e553e9ca9139399c50 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskGroups.java 
c044ebe6f72183a67462bbd8e5be983eb592c3e9 
  src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
d266f6a25ae2360db2977c43768a19b1f1efe8ff 
  src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
7f7b4358ef05c0f0d0e14daac1a5c25488467dc9 
  
src/test/java/org/apache/aurora/scheduler/events/NotifyingSchedulingFilterTest.java
 ece476b918e6f2c128039e561eea23a94d8ed396 
  src/test/java/org/apache/aurora/scheduler/filter/AttributeAggregateTest.java 
209f9298a1d55207b9b41159f2ab366f92c1eb70 
  
src/test/java/org/apache/aurora/scheduler/filter/SchedulingFilterImplTest.java 
0cf23df9f373c0d9b27e55a12adefd5f5fd81ba5 
  src/test/java/org/apache/aurora/scheduler/http/AbstractJettyTest.java 
c2ceb4e7685a9301f8014a9183e02fbad65bca26 
  
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptionVictimFilterTest.java
 ee5c6528af89cc62a35fdb314358c489556d8131 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorImplTest.java 
98048fabc00f233925b6cca015c2525980556e2b 
  src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java 
2c3e5f32c774be07a5fa28c8bcf3b9a5d88059a1 
  src/test/java/org/apache/aurora/scheduler/scheduling/TaskGroupsTest.java 
95cf25eda0a5bfc0cc4c46d1439ebe9d5359ce79 
  
src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java 
72562e6bd9a9860c834e6a9faa094c28600a8fed 
  src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
b4d27f69ad5d4cce03da9f04424dc35d30e8af29 

Diff: https://reviews.apache.org/r/51929/diff/


Testing
---

All types of testing including deploying to test and production clusters.


Thanks,

Maxim Khutornenko