Jason Lowe commented on YARN-4108:

For the case where we are preempting containers but the other job/queue cannot 
take them because of a user limit: that's clearly a bug in the preemption logic 
and not related to the problem of matching up preemption events to pending asks 
so we satisfy the preemption trigger.  We should never preempt containers 
if a user limit would foil our ability to reassign those resources.  If we are, 
then the preemption logic is miscalculating the amount of pending demand that 
triggered the preemption in the first place.
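One way to make that miscalculation concrete: the demand that drives preemption should be capped by the asking user's remaining user-limit headroom. A minimal sketch of that cap (all names hypothetical, not actual CapacityScheduler code):

```java
// Hypothetical helper: cap a user's pending ask by their user-limit
// headroom before counting it toward the demand that triggers preemption.
// If the headroom is zero, preempting on this user's behalf would free
// resources they could never be assigned anyway.
public class DemandCalc {
  static long effectiveDemand(long pendingMb, long userLimitMb, long userUsedMb) {
    // Room left under the user limit (never negative).
    long headroom = Math.max(0, userLimitMb - userUsedMb);
    // Never count demand that the user limit would block from being reassigned.
    return Math.min(pendingMb, headroom);
  }
}
```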

For the case where the pending ask that is triggering the preemption has very 
strict and narrow locality requirements: yeah, that's a tough one.  If the 
locality requirement can be relaxed then it's not too difficult -- by the time 
we preempt we'll have given up on looking for locality.  However, if the 
locality requirement cannot be relaxed then preemption could easily thrash 
wildly if the resources that can be preempted do not satisfy the pending ask.  
We would need to be very conscious of the request we're trying to satisfy -- 
preemption may not be able to satisfy the request at all in some cases.

I was thinking along the reservation lines as well.  When we are trying to 
satisfy a request on a busy cluster we already make a reservation on a node.  
When we decide to preempt we can move the request's reservation to the node 
where we decided to preempt containers.  The problem is that we are now 
changing the algorithm for deciding what gets shot -- it used to be least 
amount of work lost, but now with locality introduced into the equation there 
needs to be a weighting of container duration and locality in the mix.
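To illustrate that weighting, here is a toy cost function combining work lost (container run time) with a locality penalty; the weights and the penalty scale are exactly the open tuning questions, and none of these names exist in the scheduler:

```java
// Hypothetical sketch: a combined preemption cost.  With locality in the
// mix, "least work lost" alone no longer decides what gets shot; a
// container that frees resources the pending ask cannot use should look
// expensive even if it has done little work.
public class PreemptionCost {
  // Relative weights -- tuning these is the hard, unresolved part.
  static final double WORK_LOST_WEIGHT = 1.0;
  static final double LOCALITY_WEIGHT = 0.5;

  // localityPenalty: 0 if preempting this container frees resources on a
  // node matching the ask's locality, larger the worse the match.
  static double cost(long containerRunTimeMs, double localityPenalty) {
    return WORK_LOST_WEIGHT * containerRunTimeMs
        + LOCALITY_WEIGHT * localityPenalty;
  }
}
```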

This would be a lot more straightforward if the scheduler wasn't trying to 
peephole optimize by only looking at one node at a time when it schedules.  If 
the scheduler could look across nodes and figure out which node "wins" in terms 
of sufficiently preemptable resources with the lowest cost of preemption then 
it could then send the preemption requests/kills to the containers on that node 
and move the reservation to that node.  Looking at only one node at a time 
means we may have to do "scheduling opportunity" hacks to let it see enough 
nodes to make a good decision.
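The cross-node view could look something like this sketch (hypothetical types, not scheduler code): filter to nodes whose preemptable resources can actually satisfy the ask, then take the survivor with the lowest preemption cost:

```java
import java.util.List;

// Hypothetical sketch of the "which node wins" decision: instead of
// peephole-optimizing one node at a time, scan all candidates, drop any
// node that cannot satisfy the pending ask even after preemption, and
// pick the cheapest remaining node to preempt on (and move the
// reservation to).
public class NodePicker {
  static class Node {
    final String name;
    final int preemptableMb;      // resources we could free by preempting here
    final double preemptionCost;  // e.g. total work lost on this node
    Node(String name, int preemptableMb, double preemptionCost) {
      this.name = name;
      this.preemptableMb = preemptableMb;
      this.preemptionCost = preemptionCost;
    }
  }

  // Returns the "winning" node, or null if no node can satisfy the ask
  // (i.e. preemption cannot help this request at all).
  static Node pick(List<Node> nodes, int askMb) {
    Node best = null;
    for (Node n : nodes) {
      if (n.preemptableMb < askMb) continue;  // cannot satisfy the ask
      if (best == null || n.preemptionCost < best.preemptionCost) {
        best = n;
      }
    }
    return best;
  }
}
```

Returning null is the case called out above: when no node's preemptable resources fit the ask, preempting anywhere would only thrash.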

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
> This is sibling JIRA for YARN-2154. We should make sure container preemption 
> is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).
