[ 
https://issues.apache.org/jira/browse/YARN-4108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728290#comment-14728290
 ] 

Wangda Tan commented on YARN-4108:
----------------------------------

Choices to handle this issue:
*Approach #1: Do dry-runs before deciding to preempt a container:*
Suggested by Meng Ding at: 
https://issues.apache.org/jira/browse/YARN-3769?focusedCommentId=14709856&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14709856.
Problems with this approach:
a. Implementing dry-run is very expensive: we would need to add a dry-run flag 
everywhere in the scheduler.
b. Even with dry-run implemented, it doesn't help in the following cases (or 
they are very hard to express in dry-run semantics):
- Preempting 5 small containers on one node to allocate 1 big container. If we 
do a dry-run when deciding whether to preempt each small container, no single 
preemption frees enough room, so we still cannot allocate the big container.
- After preempting 100 containers from queue-a, queue-b may still be unable to 
allocate a container because of user-limit.
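
To make the first case concrete, here is a minimal sketch. The class and 
method names are hypothetical and the memory sizes (1024 MB containers, a 
5120 MB request) are assumed for illustration; this is not CapacityScheduler 
code. It only shows why a dry-run per container says "no" every time even 
though preempting all five together would work.

```java
import java.util.List;

// Hypothetical illustration only; not part of the YARN code base.
final class DryRunSketch {
    // Per-container dry-run: would preempting this one container alone
    // free enough memory on the node for the request?
    static boolean singlePreemptionSatisfies(long freeMb, long containerMb, long requestMb) {
        return freeMb + containerMb >= requestMb;
    }

    // Joint check: would preempting all listed containers together
    // free enough memory on the node for the request?
    static boolean jointPreemptionSatisfies(long freeMb, List<Long> containersMb, long requestMb) {
        long reclaimedMb = 0;
        for (long c : containersMb) {
            reclaimedMb += c;
        }
        return freeMb + reclaimedMb >= requestMb;
    }
}
```

With five 1024 MB containers on a full node and a 5120 MB request, the 
per-container check fails for each container, while the joint check passes.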

*Approach #2: Lazy preemption: mark containers as to-be-preempted in the 
preemption policy using existing logic, and decide whether to preempt them 
during allocation.*
I originally posted this at: 
https://issues.apache.org/jira/browse/YARN-3769?focusedCommentId=14573638&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14573638.
Problem with this approach:
- It doesn't help when queue-b wants to allocate containers on host\[1-3\], 
but the preemption policy chooses to preempt queue-a's containers on 
host\[4-6\].
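
A small sketch of that mismatch, with hypothetical names (not YARN code): the 
hosts where preemption can help the request are the intersection of the hosts 
the request accepts and the hosts where containers were marked.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not part of the YARN code base.
final class LocalityMismatchSketch {
    // Hosts on which preempting the marked containers would actually
    // free resources the request is allowed to use.
    static Set<String> usefulHosts(Set<String> requestedHosts, Set<String> markedHosts) {
        Set<String> useful = new HashSet<>(requestedHosts);
        useful.retainAll(markedHosts);
        return useful;
    }
}
```

For a request on host1 to host3 and marked containers on host4 to host6, the 
result is empty, so every preemption would be wasted.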

*One possible solution in my mind:*
I haven't thought it through, but it seems doable, so I'm throwing it out here 
for discussion: *lazy binding of container preemption*:
- The preemption policy won't decide which containers to preempt; instead, it 
will record how much resource can be preempted for each queue/app/user.
- In the allocation logic, we will deduct preemptable resources for each 
queue/app/user, similar to how continuous-reservation-lookup works. If the 
scheduler finds that a container can be allocated after preempting several 
containers on the node, it will put those containers on a wait-list and update 
the preemptable resources for the queue/user/app. If a container finishes, or 
the other queue/user/app doesn't need the resource anymore, the container will 
be removed from the wait-list.
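
A rough sketch of the lazy-binding idea, under stated assumptions: budgets are 
tracked per queue only and in memory (MB) only, all class and method names are 
hypothetical, and giving the budget back when a wait-listed container is no 
longer needed is my reading of the proposal rather than something it states. 
This is not CapacityScheduler code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not part of the YARN code base.
final class LazyPreemptionSketch {
    static final class Container {
        final String id;
        final String queue;
        final long memMb;
        Container(String id, String queue, long memMb) {
            this.id = id;
            this.queue = queue;
            this.memMb = memMb;
        }
    }

    // Recorded by the preemption policy: how much each queue may lose.
    private final Map<String, Long> preemptableMb = new HashMap<>();
    // Containers bound to a pending allocation, waiting to be preempted.
    private final Map<String, Container> waitList = new HashMap<>();

    void setPreemptable(String queue, long memMb) { preemptableMb.put(queue, memMb); }
    long preemptable(String queue) { return preemptableMb.getOrDefault(queue, 0L); }
    boolean isWaiting(String containerId) { return waitList.containsKey(containerId); }

    // Called from allocation for one node. Returns the containers bound for
    // preemption (possibly none if free space suffices), or empty if the
    // request cannot fit on this node within the recorded budgets.
    Optional<List<Container>> tryAllocate(long nodeFreeMb, long requestMb, List<Container> running) {
        List<Container> picked = new ArrayList<>();
        Map<String, Long> budget = new HashMap<>(preemptableMb); // tentative copy
        long availableMb = nodeFreeMb;
        for (Container c : running) {
            if (availableMb >= requestMb) break;
            if (waitList.containsKey(c.id)) continue;  // already bound elsewhere
            long left = budget.getOrDefault(c.queue, 0L);
            if (left < c.memMb) continue;              // queue has no budget left
            budget.put(c.queue, left - c.memMb);
            picked.add(c);
            availableMb += c.memMb;
        }
        if (availableMb < requestMb) return Optional.empty(); // bind nothing
        preemptableMb.clear();
        preemptableMb.putAll(budget);                  // commit deducted budgets
        for (Container c : picked) waitList.put(c.id, c);
        return Optional.of(picked);
    }

    // Remove a container from the wait-list. restoreBudget is true when the
    // requester no longer needs the resource, false when the container finished.
    void release(String containerId, boolean restoreBudget) {
        Container c = waitList.remove(containerId);
        if (c != null && restoreBudget) {
            preemptableMb.merge(c.queue, c.memMb, Long::sum);
        }
    }
}
```

The point of the sketch is that the choice of containers happens only when an 
allocation on a specific node is known to succeed, so a container is never 
preempted on a node where the request cannot be placed.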

I will think it through and upload a design doc.

> CapacityScheduler: Improve preemption to preempt only those containers that 
> would satisfy the incoming request
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4108
>                 URL: https://issues.apache.org/jira/browse/YARN-4108
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>
> This is a sibling JIRA of YARN-2154. We should make sure container 
> preemption is more effective.
> *Requirements:*
> 1) Can handle case of user-limit preemption
> 2) Can handle case of resource placement requirements, such as: hard-locality 
> (I only want to use rack-1) / node-constraints (YARN-3409) / black-list (I 
> don't want to use rack1 and host\[1-3\])
> 3) Can handle preemption within a queue: cross user preemption (YARN-2113), 
> cross application preemption (such as priority-based (YARN-1963) / 
> fairness-based (YARN-3319)).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
