Re: Review Request 58259: Add update affinity to Scheduler

Stephan Erb Tue, 18 Apr 2017 10:35:34 -0700


> On April 17, 2017, 10:27 p.m., Stephan Erb wrote:
> > The code change looks decent to me. 
> > 
> > However, I am unsure about two things:
> > 
> > * For us it is common to have jobs with #instance in the ballpark of 
> > #agents. The proposed code change could easily block a significant number 
> > of agents for scheduling, even if there would be enough capacity for other 
> > job instances. So while improving the MTTA for job udpates, this could 
> > easily lead to increased MTTA for regularly launched jobs (cron, adhoc, 
> > etc).
> > * The alternative dynamic reservation proposal has the advantage that it 
> > works when multiple frameworks are used. Would it be plausible to just 
> > reserve any used resources in a generic fashion, so that we ensure 
> > reservations always come back to Aurora and cannot be intercepted by 
> > another framework?
> > 
> > Please run `./gradlew jmh -Pbenchmarks='SchedulingBenchmarks.*'` to help 
> > ensure the scheduling changes don't come with an unexpected performance 
> > regression.
> 
> David McLaughlin wrote:
>     I think for (1) you described a problem that wouldn't be an issue in 
> clusters with decent amounts of capacity available. It's only really an issue 
> in low capacity clusters. And this change is specifically targetting the use 
> case you mentioned (big, hard-to-schedule task of an important production job 
> being killed for an update and some low priority task like a cron taking its 
> place and then the prod job not being able to be scheduled.. triggering 
> preemption and churn across the cluster - rinse, repeat for thousands of 
> instances of a task). 
>     
>     We run Aurora as a single framework, so can't really speak to (2). I 
> think though you'd just want Dynamic Reservations for this? Is that what 
> you're suggesting? Now we're back to the other approach which also has a 
> bunch of open questions.
>     
>     To be clear - this approach has one major difference I care about: it 
> does not expose this to users via a new tier. In practice it means we don't 
> need to ask people to opt in to what is essentially caching, and we also 
> don't need to expose the reserved tier for users (Twitter also has the use 
> case where we want to expose user-managed dynamic reservations via some 
> reserved tier).


(1) Yeah good point. We will probably have to see how this behaves in pratice 
on smaller clusters. I have also realized that the batch size, rather than 
#instances is the limiting factor.

(2) What I was aiming at is probably orthogonal to the implementation itself: 
In a multi-framework world neither the preemptor nor this affinity patch will 
work nicely. Aurora will release resources and expect those come back. They 
probably never will. 

The question is though: Rather than going to the trouble of conditionally 
reserving resources using a tier setting, would it be feasible to 
unconditionally reserve all resources Aurora uses? That way we could guarantee 
they always bounce back. If we don't need them any longer, we could unreserve 
them. As stated, this could be independent of the patch here, as it would also 
apply to preemption.


> On April 17, 2017, 10:27 p.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java
> > Lines 52-55 (patched)
> > <https://reviews.apache.org/r/58259/diff/2/?file=1689854#file1689854line52>
> >
> >     I am trying to understand if this is a good default for this 
> > best-effort feature.
> >     
> >     What is your cluster-wide MTTA? It should give us a decent hint for a 
> > suitable default.
> 
> David McLaughlin wrote:
>     Our MTTA can range from a couple milliseconds to several minutes. Depends 
> how many tasks are pending and how full the cluster is.

If I understand this correctly, this patch will help the "good case" but could 
fall down quickly during overload: If the cluster is getting overloaded with 
pending tasks, the 1 min timeout might not be sufficient to place a job in its 
reserved spot. This will then lead to preemptions that further aggregate the 
overload situation. 

We will need a counter to track those expired reservations.


- Stephan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58259/#review172122
-----------------------------------------------------------


On April 12, 2017, 9:51 a.m., David McLaughlin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58259/
> -----------------------------------------------------------
> 
> (Updated April 12, 2017, 9:51 a.m.)
> 
> 
> Review request for Aurora, Santhosh Kumar Shanmugham, Stephan Erb, and Zameer 
> Manji.
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> In the Dynamic Reservations review (and on the mailing list), I mentioned 
> that we could implement update affinity with less complexity using the same 
> technique as preemption. Here is how that would work. 
> 
> This just adds a simple wrapper around the preemptor's BiCache structure and 
> then optimistically tries to keep an agent free for a task during the update 
> process. 
> 
> 
> Note: I don't bother even checking the resources before reserving the agent. 
> I figure there is a chance the agent has enough room, and if not we'll catch 
> it when we attempt to veto the offer. We need to always check the offer like 
> this anyway in case constraints change. In the worst case it adds some delay 
> in the rare cases you increase resources. 
> 
> We also don't persist the reservations, so if the Scheduler fails over during 
> an update, the worst case is that any instances between the KILLED and 
> ASSIGNED in-flight batch need to fall back to the current first-fit 
> scheduling algorithm.
> 
> 
> Diffs
> -----
> 
>   src/main/java/org/apache/aurora/scheduler/base/TaskTestUtil.java 
> f0b148cd158d61cd89cc51dca9f3fa4c6feb1b49 
>   src/main/java/org/apache/aurora/scheduler/scheduling/TaskScheduler.java 
> 203f62bacc47470545d095e4d25f7e0f25990ed9 
>   src/main/java/org/apache/aurora/scheduler/state/TaskAssigner.java 
> a177b301203143539b052524d14043ec8a85a46d 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceAction.java 
> b4cd01b3e03029157d5ca5d1d8e79f01296b57c2 
>   
> src/main/java/org/apache/aurora/scheduler/updater/InstanceActionHandler.java 
> f25dc0c6d9c05833b9938b023669c9c36a489f68 
>   src/main/java/org/apache/aurora/scheduler/updater/InstanceUpdater.java 
> c129896d8cd54abd2634e2a339c27921042b0162 
>   
> src/main/java/org/apache/aurora/scheduler/updater/JobUpdateControllerImpl.java
>  e14112479807b4477b82554caf84fe733f62cf58 
>   src/main/java/org/apache/aurora/scheduler/updater/StateEvaluator.java 
> c95943d242dc2f539778bdc9e071f342005e8de3 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdateAgentReserver.java 
> PRE-CREATION 
>   src/main/java/org/apache/aurora/scheduler/updater/UpdaterModule.java 
> 13cbdadad606d9acaadc541320b22b0ae538cc5e 
>   
> src/test/java/org/apache/aurora/scheduler/scheduling/TaskSchedulerImplTest.java
>  fa1a81785802b82542030e1aae786fe9570d9827 
>   src/test/java/org/apache/aurora/scheduler/state/TaskAssignerImplTest.java 
> cf2d25ec2e407df7159e0021ddb44adf937e1777 
>   src/test/java/org/apache/aurora/scheduler/updater/AddTaskTest.java 
> b2c4c66850dd8f35e06a631809530faa3b776252 
>   src/test/java/org/apache/aurora/scheduler/updater/InstanceUpdaterTest.java 
> c78c7fbd7d600586136863c99ce3d7387895efee 
>   src/test/java/org/apache/aurora/scheduler/updater/JobUpdaterIT.java 
> 30b44f88a5b8477e917da21d92361aea1a39ceeb 
>   src/test/java/org/apache/aurora/scheduler/updater/KillTaskTest.java 
> 833fd62c870f96b96343ee5e0eed0d439536381f 
>   
> src/test/java/org/apache/aurora/scheduler/updater/NullAgentReserverTest.java 
> PRE-CREATION 
>   
> src/test/java/org/apache/aurora/scheduler/updater/UpdateAgentReserverImplTest.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/58259/diff/2/
> 
> 
> Testing
> -------
> 
> ./gradlew build
> ./src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> David McLaughlin
> 
>

Re: Review Request 58259: Add update affinity to Scheduler

Reply via email to