[
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478104#comment-16478104
]
Jason Lowe commented on YARN-8292:
----------------------------------
bq. After preemption, there're at least one 0 major resources (which indicates
that the queue is still satisfied after preemption).
I'm still confused by this point. How is that not always going to be true when
the cluster has a rarely used resource dimension? For example, say GPU is one
of the dimensions, and all the apps that want GPUs run in only one of the many
queues on the cluster. The other queues will all have zero GPU usage, and any
cross-queue preemption between those other queues will have zero in the GPU
dimension of both toObtainFromPartition and toObtainAfterPreemption. In other
words, it effectively disables the less-than-Resources.none check when
comparing preemptions between these non-GPU-using queues: GPU will always be
zero, so isAnyMajorResourceZero will always be true. Or am I missing something?
For the case of not wanting to kill a container that is (4, 1, 1) when the ask
is only (3, -1, -1), the comparison against Resources.none should cover that.
What is an example scenario where the additional check for whether any
resource dimension is zero is needed to do the right thing? From the scenario I
described above, I can see how it could (incorrectly?) override the comparison
against Resources.none and preempt a (4, 1, 0) container when the ask is only
(3, -1, 0).
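To make the concern concrete, here is a minimal toy model of the two checks. It is not the code from the patch. The class PreemptionCheckSketch and its helpers are hypothetical, resources are plain long arrays, the Resources.none comparison is assumed to mean "no component goes negative after preemption", and the zero check is assumed to be applied to the post-preemption vector.

```java
import java.util.Arrays;

// Toy model only: names and semantics are assumptions, not the YARN-8292 patch.
public class PreemptionCheckSketch {
    // Component-wise a - b, standing in for a resource subtraction.
    static long[] subtract(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i];
        }
        return out;
    }

    // Assumed reading of the Resources.none guard: any negative component
    // means the container is bigger than what is still needed.
    static boolean anyNegative(long[] r) {
        return Arrays.stream(r).anyMatch(v -> v < 0);
    }

    // Assumed reading of isAnyMajorResourceZero.
    static boolean anyZero(long[] r) {
        return Arrays.stream(r).anyMatch(v -> v == 0);
    }

    // Decide whether the container would be preempted for the given ask.
    static boolean wouldPreempt(long[] toObtain, long[] container,
                                boolean withZeroCheck) {
        long[] after = subtract(toObtain, container);
        if (!anyNegative(after)) {
            return true; // the guard alone allows it
        }
        // The extra check can override the guard when a dimension is zero.
        return withZeroCheck && anyZero(after);
    }

    public static void main(String[] args) {
        long[] ask = {3, -1, 0};
        long[] container = {4, 1, 0};
        System.out.println(wouldPreempt(ask, container, false)); // false
        System.out.println(wouldPreempt(ask, container, true));  // true
        System.out.println(wouldPreempt(new long[] {3, -1, -1},
                new long[] {4, 1, 1}, true));                    // false
    }
}
```

Under these assumptions the unused third dimension stays zero after subtraction, so the extra check lets the (4, 1, 0) container through, while the (4, 1, 1) case is still blocked by the guard.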
> Fix the dominant resource preemption cannot happen when some of the resource
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem:
>
> {code}
> // guaranteed, max, used, pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> Queues a and b each have 3 containers running, and each container is 2:2:1.
> Queue c uses no resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.