[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478104#comment-16478104 ]

Jason Lowe commented on YARN-8292:
----------------------------------

bq. After preemption, there is at least one major resource that is 0 (which indicates that the queue is still satisfied after preemption).

I'm still confused by this point. How is that not always going to be true when the cluster has a rarely-used resource dimension? For example, say GPU is one of the dimensions, and all the apps that want GPUs run in only one of many queues on the cluster. The other queues will all have zero GPU usage, so any cross-queue preemption between those other queues will have zero in the GPU dimension of both toObtainFromPartition and toObtainAfterPreemption. In other words, it effectively disables the less-than-Resources.none check when comparing preemptions between these non-GPU-using queues, because GPU will always be zero and isAnyMajorResourceZero will always be true. Or am I missing something?
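To make the concern concrete, here is a minimal Java sketch. It assumes resources are plain `long[]` vectors in the order (memory, vcores, GPU); `isAnyMajorResourceZero` and `subtract` are hypothetical component-wise stand-ins named after the check under discussion, not Hadoop's actual `Resources` API. Between two queues that never use GPUs, the GPU slot of toObtainAfterPreemption stays 0, so the check returns true regardless of the other dimensions.

```java
import java.util.Arrays;

public class GpuDimensionDemo {
    // Hypothetical stand-in: true if any dimension of the vector is exactly zero.
    public static boolean isAnyMajorResourceZero(long[] resource) {
        for (long value : resource) {
            if (value == 0) {
                return true;
            }
        }
        return false;
    }

    // Component-wise toObtain - container, i.e. the to-obtain vector after preemption.
    public static long[] subtract(long[] toObtain, long[] container) {
        long[] result = new long[toObtain.length];
        for (int i = 0; i < toObtain.length; i++) {
            result[i] = toObtain[i] - container[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Asks between queues that never use GPUs: the GPU slot (index 2) is always 0.
        long[][] asks = { {3, -1, 0}, {8, 4, 0}, {-5, -2, 0} };
        long[] container = {4, 1, 0};
        for (long[] ask : asks) {
            long[] after = subtract(ask, container);
            // Prints [-1, -2, 0] -> true, [4, 3, 0] -> true, [-9, -3, 0] -> true
            System.out.println(Arrays.toString(after) + " -> " + isAnyMajorResourceZero(after));
        }
    }
}
```

Whatever the memory and vcores dimensions look like, the zero GPU slot alone makes the result true.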

For the case of not wanting to kill a (4, 1, 1) container when the ask is only (3, -1, -1), the comparison against Resources.none should cover that. What is an example scenario where the additional check for any zero resource dimension is needed to do the right thing? In the scenario I described above, I can see how it could (incorrectly?) override the comparison against Resources.none and preempt a (4, 1, 0) container when the ask is only (3, -1, 0).
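The two cases above can be compared with a small Java sketch. It rests on two stated assumptions: the Resources.none comparison is approximated component-wise as "no dimension of toObtainAfterPreemption drops below zero" (Hadoop's real check goes through a `ResourceCalculator` and is not reproduced here), and the zero-dimension check is modelled as overriding that comparison with a logical OR, which is the effect described in the comment. All names are hypothetical.

```java
public class PreemptionDecisionSketch {
    // Hypothetical component-wise stand-in for the comparison against Resources.none:
    // passes only if preempting the container does not overshoot in any dimension.
    public static boolean passesNoneCheck(long[] after) {
        for (long value : after) {
            if (value < 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical stand-in: true if any dimension is exactly zero.
    public static boolean isAnyMajorResourceZero(long[] resource) {
        for (long value : resource) {
            if (value == 0) {
                return true;
            }
        }
        return false;
    }

    // Assumed decision rule: the zero-dimension check overrides the none comparison.
    public static boolean doPreempt(long[] toObtain, long[] container) {
        long[] after = new long[toObtain.length];
        for (int i = 0; i < toObtain.length; i++) {
            after[i] = toObtain[i] - container[i];
        }
        return passesNoneCheck(after) || isAnyMajorResourceZero(after);
    }

    public static void main(String[] args) {
        // Vectors are (memory, vcores, GPU).
        // after = (-1, -2, -2): rejected, as the none comparison intends.
        System.out.println(doPreempt(new long[] {3, -1, -1}, new long[] {4, 1, 1})); // false
        // after = (-1, -2, 0): accepted only because the GPU slot is zero.
        System.out.println(doPreempt(new long[] {3, -1, 0}, new long[] {4, 1, 0}));  // true
    }
}
```

Under these assumptions the two containers differ only in the GPU dimension, yet the decision flips, which is the override the comment questions.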


> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8292
>                 URL: https://issues.apache.org/jira/browse/YARN-8292
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem: 
>   
> {code}
>     //   guaranteed,  max,    used,   pending
>     "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
>         "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
>         "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
>         "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> Queues a and b each have 3 containers running, and each container is 2:2:1.
> Queue c uses no resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.
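The arithmetic behind this example can be checked with a short Java sketch. As a simplification, it treats a queue's excess as just `used - guaranteed` per dimension; the actual preemption policy computes ideal assignments in a more involved way, so this only illustrates the shape of the vectors. The helper names are hypothetical. For queue a (and likewise b) the excess is (-4, 0, 1): above guarantee in the third resource, below it in the first, so the vector contains a negative component, the situation named in the issue title.

```java
import java.util.Arrays;

public class IssueExampleArithmetic {
    // Values taken from the test string above, in the order type0:type1:type2.
    public static final long[] GUARANTEED_PER_QUEUE = {10, 6, 2};
    public static final long[] CONTAINER = {2, 2, 1};
    public static final int CONTAINERS_PER_QUEUE = 3;

    // Multiply every dimension of a resource vector by n.
    public static long[] scale(long[] resource, int n) {
        long[] result = new long[resource.length];
        for (int i = 0; i < resource.length; i++) {
            result[i] = resource[i] * n;
        }
        return result;
    }

    // Component-wise a - b.
    public static long[] subtract(long[] a, long[] b) {
        long[] result = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = a[i] - b[i];
        }
        return result;
    }

    public static void main(String[] args) {
        long[] usedPerQueue = scale(CONTAINER, CONTAINERS_PER_QUEUE);
        long[] rootUsed = scale(usedPerQueue, 2); // queues a and b; c uses nothing
        long[] excess = subtract(usedPerQueue, GUARANTEED_PER_QUEUE);
        System.out.println("used by a (or b): " + Arrays.toString(usedPerQueue)); // [6, 6, 3]
        System.out.println("used by root:     " + Arrays.toString(rootUsed));     // [12, 12, 6]
        System.out.println("excess of a:      " + Arrays.toString(excess));       // [-4, 0, 1]
    }
}
```

The root usage it reproduces, 12:12:6, matches the `used` column for root in the test string.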



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
