Re: Review Request 32352: Making preemptor asynchronous. Part 3(final) - background service.

Maxim Khutornenko Tue, 24 Mar 2015 14:26:20 -0700


> On March 24, 2015, 6:32 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessor.java,
> >  line 100
> > <https://reviews.apache.org/r/32352/diff/1/?file=902121#file902121line100>
> >
> >     This regresses on some important characteristics of `TaskGroups`:
> >     
> >     - starvation avoidance: a very large job (even a low priority one) can 
> > starve a single restarting instance
> >     - backoffs: we will perform redundant and possibly hopeless work to try 
> > to schedule the same things without reducing frequency
> >     
> >     It might be easier to retrofit `TaskGroups` to allow injection of the 
> > underlying work than to reimplement those behaviors, but i'm interested in 
> > what you think.


>starvation avoidance: a very large job (even a low priority one) can starve a 
>single restarting instance

I think this can be a fair concern when the number of pending tasks exceeds the 
number of theoretical slots in the cluster. I don't think the `TaskGroups` fits 
here given its penalty system, rescheduling calculator and executor driven 
evaluation. We can go with a (hopefully) simpler `TaskGroup` "circular buffer" 
interator batch algorithm instead, for example:

Create a list of `TaskGroup` instances ordered by the number of pending tasks 
in each group (largest first).
Create (or maintain) a circular *buffer* from the above list.
For each slave:
  While no slot found and we have not looped:
    Evaluate *current* `TaskGroup`
    Move *current* to next
  
After all slaves are evaluated we will have a mapping of `TaskGroup`s to slots 
that we can create victims from.

Consider TaskGroup to pending task mapping:
TG1 | TG2 | TG3 | TG4
----|-----|-----|----
1.1 | 2.1 | 3.1 | 4.1
----|-----|-----|----
1.2 | 2.2 | 3.2 | 
----|-----|-----|----
1.3 | 2.3 |     |
----|-----|-----|----
1.4 |     |     |
----|-----|-----|----
1.5 |

and slaves A, B, C, D, E, F, G, H

A possible slot allocation after all slaves are processed could be:

A:1.1, B:NONE, C:3.1, D:4.1, E:1.2, F:2.1, G:3.2, H:1.3

This will give a fair chance to smaller job while making sure larger jobs are 
not penalized unnecessarily. 

>backoffs: we will perform redundant and possibly hopeless work to try to 
>schedule the same things without reducing frequency

I am not concerned about backoffs here. While we are going to run the above 
algorithm every minute, we will only evaluate tasks that don't already have 
slots reserved for. With slot reservation timeout currently set to 5 minutes, I 
don't see much redundancy that we would have to throttle.

If you agree with the above proposal I will add it into AURORA-1219 to address 
shortly after.


> On March 24, 2015, 6:32 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessor.java,
> >  line 73
> > <https://reviews.apache.org/r/32352/diff/1/?file=902121#file902121line73>
> >
> >     I'm beginning to question this pattern.  It seems reasonable to insist 
> > that the global injector _never_ has a binding for a general type like 
> > this.  With that in mind, you should be comfortable using a private module 
> > and an unqualified binding, saving this kind of clutter.  What do you think?

I am not keen on this idea given that the clutter will shift from attributes to 
private modules but I willing to give it a try to possibly it within the module 
itself.


> On March 24, 2015, 6:32 p.m., Bill Farner wrote:
> > src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java, lines 
> > 227-229
> > <https://reviews.apache.org/r/32352/diff/1/?file=902120#file902120line227>
> >
> >     Isn't this scenario transparent to the scheduler, though?  If i 
> > understand you correctly, an available reservation means the slot has been 
> > freed and is now a reserved Offer, which the assigner function handles.
> >     
> >     However, another nice TODO would be to eventually remove the fallback 
> > to preemption here.  Seems reasonable that this body of code only speaks in 
> > Offers, and the newly-established territory is responsible for generating 
> > offers from tasks when it sees fit.

I am not quite sure what you mean here. If by "newly-established territory" you 
mean preemptor then I don't see it a good fit for acting on its own quite yet. 
There is always a chance a PENDING task is killed after a slot is found and 
preemptor will kill tasks which resources will never get reclaimed.

The idea I tried to capture by this TODO is elimination of Round 3 below:
- Round 1: Pending task fails to schedule, no slot found in preemptor
- Some time later a slot is found by the preemptor
- Round 2: Pending task fails to schedule, calls into preemptor and preempts 
tasks on a slave. **Slave reservation is created but can't be used right away 
as currently coded.**
- Round 3: Pending task fails to schedule, reservation is found and task is 
finally scheduled.


- Maxim


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/32352/#review77609
-----------------------------------------------------------


On March 21, 2015, 2:19 a.m., Maxim Khutornenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/32352/
> -----------------------------------------------------------
> 
> (Updated March 21, 2015, 2:19 a.m.)
> 
> 
> Review request for Aurora, Bill Farner and Zameer Manji.
> 
> 
> Bugs: AURORA-1158
>     https://issues.apache.org/jira/browse/AURORA-1158
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Summary of changes:
> - The PreemptorImpl now only hosts slot validation and task preemption logic 
> and requires a write transaction.
> - PendingTaskProcessor is called every minute to pull all available PENDING 
> tasks and search for preemption slots.
> - TaskScheduler has no connection to slot search anymore. It now completely 
> relies on periodic PreemptionService to find available slots.
> - preemption_delay is reduced from 10 to 3 minutes.
> 
> Benchmark refactoring will be addressed separately.
> 
> 
> Diffs
> -----
> 
>   src/jmh/java/org/apache/aurora/benchmark/SchedulingBenchmarks.java 
> 5309e8140fff411da30ee87c1b3b1a55d6fdaeeb 
>   src/main/java/org/apache/aurora/scheduler/async/TaskScheduler.java 
> 1b9d741dba7b9c2663ff119cd44adc8403c0d257 
>   
> src/main/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessor.java
>  PRE-CREATION 
>   
> src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptionSlotCache.java
>  4ca36e5fe2cfd326f4d4f37f70dbcd0060109e73 
>   
> src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptionSlotFinder.java
>  84bcdc57d798aca229d4f184cae065ec4dcf8fc5 
>   src/main/java/org/apache/aurora/scheduler/async/preemptor/Preemptor.java 
> 84791a272f245c729706af95d70c7f1631bfe99c 
>   
> src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImpl.java 
> 18a2e6032ba86ff7efab4d42a4d83798a1d06b06 
>   
> src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorMetrics.java
>  782e751f5b05391ebeee4d947570cc174dddece2 
>   
> src/main/java/org/apache/aurora/scheduler/async/preemptor/PreemptorModule.java
>  7034a07eae1f94d7a0bbccdf8146cf3ed0a5424e 
>   src/main/java/org/apache/aurora/scheduler/filter/AttributeAggregate.java 
> da7b662c3ca4040221805cba81e5ac7b64fb3df4 
>   src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerImplTest.java 
> 29fe156da19f3c08af00a951bb3baccf2b97a6cb 
>   src/test/java/org/apache/aurora/scheduler/async/TaskSchedulerTest.java 
> f5c2128e90eb5c849d68431225661d1cfa7da0cc 
>   
> src/test/java/org/apache/aurora/scheduler/async/preemptor/PendingTaskProcessorTest.java
>  PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/async/preemptor/PreemptorImplTest.java
>  d17c4fb513afdb7d8ef6d7c2b0aef86c1f47c082 
>   
> src/test/java/org/apache/aurora/scheduler/async/preemptor/PreemptorModuleTest.java
>  0e2e958a053e5cee280b947f7349c076e2addb45 
> 
> Diff: https://reviews.apache.org/r/32352/diff/
> 
> 
> Testing
> -------
> 
> ./gradlew -Pq build
> Manual testing in vagrant.
> 
> 
> Thanks,
> 
> Maxim Khutornenko
> 
>

Re: Review Request 32352: Making preemptor asynchronous. Part 3(final) - background service.

Reply via email to