[ https://issues.apache.org/jira/browse/YARN-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16405166#comment-16405166 ]

Eric Payne commented on YARN-8020:
----------------------------------

[~leftnoteasy], sorry for the delay.
{quote}explain why preemption doesn't happen for the case you mentioned
{quote}
As it turns out, the corner case I'm running into is not related to DRF. It has 
the same behavior with the default resource calculator.

The use case is this:
 - QueueA is preemptable and is running App1 which is consuming the entire 
cluster
 - App2 is submitted to QueueB with container requests where each container is 
larger than the user limit for QueueB.

In this case, preemption will not occur.

DETAILS:
 - Cluster size: 20G
 - Cluster Min container size: 1G
 - QueueA capacity: 10G
 - QueueB capacity: 10G
 - QueueB MULP: 10%

ACTIONS:
 - App1 running in QueueA consumes 20G
 - App2 is submitted to QueueB with an AM size of 1G and map container sizes of 4G.
 - App2's Max Resource is 1G when it is requesting the AM container (10% of 10G 
== 1G). The preemption monitor sees that the pending request is 1G and that 
App2's headroom is 1G, so it preempts 1G from App1 in QueueA.
 - The Capacity Scheduler assigns 1G to App2 in QueueB. App2 begins running the 
AM container.
 - App2 requests several map containers at 4G each. App2's Max Resource is 
computed to be 2G ((active user's used resources / # active users) + min 
container size == 1G/1 + 1G == 2G). This leaves 1G of headroom for App2.
 - The preemption monitor sees that the requested container size for App2 is 4G, 
which is larger than the 1G headroom, so the preemption monitor does not 
preempt (see the sketch after this list).
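
To make the arithmetic concrete, here is a small sketch of the two stages above. It only mirrors the simplified user-limit formula I used in this walkthrough (not the scheduler's full computeUserLimit() logic), and the GB values are the ones from the ACTIONS list:
{code}
/**
 * Sketch of the user-limit/headroom arithmetic from the walkthrough above.
 * This is NOT the capacity scheduler's computeUserLimit(); values are in GB.
 */
public class UserLimitHeadroomSketch {

  // User limit while only the AM container is pending: MULP percent of the
  // queue's capacity (10% of 10G == 1G).
  static long userLimitForAm(long queueCapacityGb, double mulp) {
    return (long) (queueCapacityGb * mulp);
  }

  // User limit once the user is active, as described above:
  // (active users' used resources / # active users) + min container size.
  static long userLimitForActiveUser(long activeUsersUsedGb, int activeUsers,
      long minContainerGb) {
    return activeUsersUsedGb / activeUsers + minContainerGb;
  }

  public static void main(String[] args) {
    // Stage 1: AM container (1G) pending, nothing used yet in QueueB.
    long amHeadroom = userLimitForAm(10, 0.10) - 0;              // 1G
    System.out.println("AM request 1G <= headroom " + amHeadroom
        + "G -> preemption monitor preempts 1G from QueueA");

    // Stage 2: AM (1G) running, 4G map containers pending.
    long mapHeadroom = userLimitForActiveUser(1, 1, 1) - 1;      // 2G - 1G = 1G
    System.out.println("Map request 4G > headroom " + mapHeadroom
        + "G -> preemption monitor does not preempt");
  }
}
{code}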

Technically, this behavior is slightly out of sync with the way the capacity 
scheduler assigns containers. As long as the headroom for an app is 0 or more, 
the capacity scheduler will assign one more container, no matter how big the 
container is, so the preemption monitor should go ahead and preempt in this 
case. I'm not sure I want it to, though, because it's better to be conservative 
than to preempt when it should not.
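
To show the mismatch side by side, here is a schematic of the two checks as I described them; it is not the actual CapacityScheduler or preemption-monitor code, just the two conditions:
{code}
/**
 * Schematic of the mismatch described above (not real scheduler code):
 * the scheduler hands out one more container whenever headroom is still
 * non-negative, while the preemption monitor only preempts when the whole
 * pending container fits inside the headroom.
 */
public class HeadroomCheckSketch {

  // Scheduler-side rule as described: any headroom >= 0 gets one more
  // container, regardless of that container's size.
  static boolean schedulerWouldAssign(long headroomGb) {
    return headroomGb >= 0;
  }

  // Preemption-monitor rule as described: the requested container must fit
  // entirely within the headroom before anything is preempted.
  static boolean monitorWouldPreempt(long containerGb, long headroomGb) {
    return containerGb <= headroomGb;
  }

  public static void main(String[] args) {
    long headroom = 1;   // App2's headroom after the AM starts (GB)
    long container = 4;  // pending map container size (GB)

    System.out.println("scheduler would assign: "
        + schedulerWouldAssign(headroom));            // true
    System.out.println("preemption monitor would preempt: "
        + monitorWouldPreempt(container, headroom));  // false
  }
}
{code}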

[~kyungwan nam], on what version of YARN are you seeing this problem? I am not 
seeing any DRF-related issues in 2.8 or 3.x.

> when DRF is used, preemption does not trigger due to incorrect idealAssigned
> ----------------------------------------------------------------------------
>
>                 Key: YARN-8020
>                 URL: https://issues.apache.org/jira/browse/YARN-8020
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: kyungwan nam
>            Priority: Major
>
> I’ve run into a case where Inter Queue Preemption does not work.
> It happens when DRF is used and an application is submitted with a large
> number of vcores.
> IMHO, idealAssigned can be set incorrectly by the following code.
> {code}
> // This function "accepts" all the resources it can (pending) and return
> // the unused ones
> Resource offer(Resource avail, ResourceCalculator rc,
>     Resource clusterResource, boolean considersReservedResource) {
>   Resource absMaxCapIdealAssignedDelta = Resources.componentwiseMax(
>       Resources.subtract(getMax(), idealAssigned),
>       Resource.newInstance(0, 0));
>   // accepted = min{avail,
>   //               max - assigned,
>   //               current + pending - assigned,
>   //               # Make sure a queue will not get more than max of its
>   //               # used/guaranteed, this is to make sure preemption won't
>   //               # happen if all active queues are beyond their guaranteed
>   //               # This is for leaf queue only.
>   //               max(guaranteed, used) - assigned}
>   // remain = avail - accepted
>   Resource accepted = Resources.min(rc, clusterResource,
>       absMaxCapIdealAssignedDelta,
>       Resources.min(rc, clusterResource, avail, Resources
>           /*
>            * When we're using FifoPreemptionSelector (considerReservedResource
>            * = false).
>            *
>            * We should deduct reserved resource from pending to avoid
>            * excessive preemption:
>            *
>            * For example, if an under-utilized queue has used = reserved = 20.
>            * Preemption policy will try to preempt 20 containers (which is not
>            * satisfied) from different hosts.
>            *
>            * In FifoPreemptionSelector, there's no guarantee that preempted
>            * resource can be used by pending request, so policy will preempt
>            * resources repeatedly.
>            */
>           .subtract(Resources.add(getUsed(),
>               (considersReservedResource ? pending : pendingDeductReserved)),
>               idealAssigned)));
> {code}
> let’s say,
> * cluster resource : <Memory:200GB, VCores:20>
> * idealAssigned(assigned): <Memory:100GB, VCores:10>
> * avail: <Memory:181GB, VCores:1>
> * current: <Memory:19GB, VCores:19>
> * pending: <Memory:0, VCores:0>
> current + pending - assigned: <Memory:-181GB, VCores:9>
> min ( avail, (current + pending - assigned) ) : <Memory:-181GB, VCores:9>
> accepted: <Memory:-181GB, VCores:9>
> as a result, idealAssigned will be <Memory:-81GB, VCores:19>, which does not 
> trigger preemption.
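
For reference, a standalone sketch (mine, not part of the report above) of the Resources.min step the example walks through. With DominantResourceCalculator, min() compares the two operands by dominant share and returns one of them whole rather than taking a componentwise minimum, so a Resource whose memory has already gone negative can be accepted as-is. The values are illustrative and only roughly follow the numbers above (memory in MB):
{code}
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.util.resource.DominantResourceCalculator;
import org.apache.hadoop.yarn.util.resource.ResourceCalculator;
import org.apache.hadoop.yarn.util.resource.Resources;

public class DrfMinSketch {
  public static void main(String[] args) {
    ResourceCalculator drf = new DominantResourceCalculator();

    Resource cluster = Resource.newInstance(200 * 1024, 20);  // <200GB, 20>
    Resource avail   = Resource.newInstance(181 * 1024, 1);   // <181GB, 1>
    // current + pending - idealAssigned, with memory already negative
    Resource delta   = Resource.newInstance(-181 * 1024, 9);

    // Dominant share of avail is 181/200 (memory); dominant share of delta is
    // 9/20 (vcores), so DRF treats delta as the "smaller" resource and min()
    // returns it unchanged, negative memory included.
    Resource accepted = Resources.min(drf, cluster, avail, delta);
    System.out.println("accepted = " + accepted);  // memory stays negative

    // offer() then folds this into idealAssigned, dragging its memory down
    // instead of clamping at the pending amount, which is the incorrect
    // idealAssigned described above.
  }
}
{code}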


