-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55357/
-----------------------------------------------------------
Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and
Zameer Manji.
Bugs: AURORA-1867
https://issues.apache.org/jira/browse/AURORA-1867
Repository: aurora
Description
-------
To preserve fairness, `PendingTaskProcessor` interleaves tasks from different
groups. However, this fairness comes at the price of increased reservation
time. Even when reservations are being made for the same task group, the
processor restarts its iteration over the slaves for each task instance. As a
result, it reevaluates every slave already rejected in a previous search before
it finds a new viable candidate.
This patch improves `PendingTaskProcessor` performance by reducing slave
search/evaluation time, at the cost of reduced fairness. `PendingTaskProcessor`
now makes reservations for up to a configurable maximum of _N_ candidates per
task group in each iteration over the list of slaves.
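The difference in slave evaluations can be illustrated with a minimal sketch.
This is not the code in the patch: the class `ReservationSketch`, its methods,
the integer-indexed slave list, and the `viable` predicate (a stand-in for the
real preemption victim filtering) are all hypothetical and only model the
behavior described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical model of the two search strategies; not the actual PendingTaskProcessor.
class ReservationSketch {
  /** Slaves reserved by a search, plus the number of slave evaluations it needed. */
  static final class Result {
    final List<Integer> reserved = new ArrayList<>();
    int evaluations = 0;
  }

  /** Described old behavior: every task instance restarts the scan at the first slave. */
  static Result restartPerInstance(int numSlaves, int instances, IntPredicate viable) {
    Result r = new Result();
    boolean[] taken = new boolean[numSlaves];
    for (int i = 0; i < instances; i++) {
      for (int s = 0; s < numSlaves; s++) {
        if (taken[s]) {
          continue; // already reserved for an earlier instance
        }
        r.evaluations++; // previously rejected slaves are evaluated again
        if (viable.test(s)) {
          taken[s] = true;
          r.reserved.add(s);
          break;
        }
      }
    }
    return r;
  }

  /** Described new behavior: one pass reserves up to maxPerGroup slaves for the group. */
  static Result singlePass(int numSlaves, int instances, int maxPerGroup, IntPredicate viable) {
    Result r = new Result();
    int limit = Math.min(instances, maxPerGroup);
    for (int s = 0; s < numSlaves && r.reserved.size() < limit; s++) {
      r.evaluations++; // each slave is evaluated at most once per pass
      if (viable.test(s)) {
        r.reserved.add(s);
      }
    }
    return r;
  }

  public static void main(String[] args) {
    // Assumed toy cluster: 10 slaves, only the last three have preemptible victims.
    IntPredicate viable = s -> s >= 7;
    System.out.println(restartPerInstance(10, 3, viable).evaluations);
    System.out.println(singlePass(10, 3, 3, viable).evaluations);
  }
}
```

In this toy cluster, three instances of one task group cost 24 slave
evaluations when each search restarts and 10 when they share a single pass.
The gap grows with the number of non-viable slaves ahead of the viable ones.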
Diffs
-----
src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java
fa37236e68657b539b182519b9d46d96d5b0953a
src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java
f59f3fd8959b1ba3726b55a2943fb9228a049ac5
src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorMetrics.java
67822cafbe89f4798b4ea6da3856663cc4872798
src/main/java/org/apache/aurora/scheduler/preemptor/PreemptorModule.java
23d1c120657d5cb9d294a80c63e8a04512d361ca
src/test/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessorTest.java
d11ae5883f2a00dca4c4b36f0ab58ea95c7ecb2e
src/test/java/org/apache/aurora/scheduler/preemptor/PreemptorModuleTest.java
67b6d69e3ddd1028dfe9ff451b171cd888674920
Diff: https://reviews.apache.org/r/55357/diff/
Testing
-------
As is, the cluster setup in our existing preemption benchmark does not reflect
the improvements resulting from this patch. Currently, all existing victims can
be preempted, so all `PendingTaskProcessor` has to do is look at the next
slave.
```
BEFORE
Benchmark                                                       (numPendingTasks)   Mode  Cnt   Score    Error  Units
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                  1  thrpt   10  75.386  ± 2.984  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                 10  thrpt   10  74.584  ± 2.598  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                100  thrpt   10  79.731  ± 2.182  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark               1000  thrpt   10  66.386  ± 1.833  ops/s

AFTER
Benchmark                                                       (numPendingTasks)   Mode  Cnt   Score    Error  Units
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                  1  thrpt   10  78.266  ± 3.290  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                 10  thrpt   10  76.743  ± 2.073  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark                100  thrpt   10  75.343  ± 1.943  ops/s
SchedulingBenchmarks.PreemptorSlotSearchBenchmark.runBenchmark               1000  thrpt   10  68.284  ± 2.413  ops/s
```
I need to further improve the cluster setup for this benchmark so that it
reflects the improvements in the patch. A more representative cluster setup
would be one in which only a subset of potential victims pass the
`PreemptionVictimFilter.filterPreemptionVictims()` test.
Thanks,
Mehrdad Nurolahzade