[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483181#comment-16483181 ]
Wangda Tan commented on YARN-8292:
----------------------------------
Thanks [~eepayne],
I just checked both points.
For the intra-queue preemption behavior:
bq. For example, if gpu is the extended resource, but no apps are currently
using gpu in the queue, no intra-queue preemption will take place.
I think you're correct. The change I propose is:
{code}
if (conservativeDRF) {
  // When we want to do less aggressive preemption, we don't want to
  // preempt from any resource type if after preemption it becomes 0 or
  // negative.
  // For example:
  // - to-obtain = <30, 20, 0>, container <20, 20, 0> => allowed
  // - to-obtain = <30, 20, 0>, container <10, 10, 1> => disallowed
  // - to-obtain = <20, 30, 1>, container <20, 30, 1> => allowed
  // - to-obtain = <10, 20, 1>, container <11, 11, 0> => disallowed
  doPreempt = Resources.lessThan(rc, clusterResource,
      Resources.componentwiseMin(toObtainAfterPreemption, Resources.none()),
      Resources.componentwiseMin(toObtainByPartition, Resources.none()));
}
{code}
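The four example cases in the code comment are all consistent with one simple rule: a container may be preempted only if subtracting it from to-obtain leaves no component negative (reaching exactly zero is fine). The sketch below checks that rule against those cases. It uses plain {{long[]}} vectors in place of YARN {{Resource}} objects, and the class and method names are hypothetical. It is not the {{Resources.lessThan}}-based check above, whose exact behavior depends on the resource calculator in use.

```java
// Hypothetical sketch: plain long[] vectors stand in for YARN Resource objects.
final class ConservativePreemptionSketch {

    // Component-wise to-obtain minus container, i.e. the to-obtain vector
    // that would remain after preempting the container.
    static long[] subtract(long[] toObtain, long[] container) {
        long[] after = new long[toObtain.length];
        for (int i = 0; i < toObtain.length; i++) {
            after[i] = toObtain[i] - container[i];
        }
        return after;
    }

    // Rule inferred from the example cases (an assumption, not the patch's
    // code): allowed only if no component of the remaining vector is negative.
    static boolean isAllowed(long[] toObtain, long[] container) {
        for (long v : subtract(toObtain, container)) {
            if (v < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[][][] cases = {
            {{30, 20, 0}, {20, 20, 0}}, // allowed: remaining <10, 0, 0>
            {{30, 20, 0}, {10, 10, 1}}, // disallowed: remaining <20, 10, -1>
            {{20, 30, 1}, {20, 30, 1}}, // allowed: remaining <0, 0, 0>
            {{10, 20, 1}, {11, 11, 0}}, // disallowed: remaining <-1, 9, 1>
        };
        for (long[][] c : cases) {
            System.out.println(isAllowed(c[0], c[1]));
        }
    }
}
```

Running {{main}} prints true, false, true, false, matching the allowed/disallowed labels in the comment.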
However, this change causes more than 20 intra-queue preemption test cases to
fail. Since the logic in the ver.005 patch is not a regression, can we address
this in a separate JIRA if we cannot come up with a simple solution?
For:
bq. I don't think this is necessary. ..
Actually, this is required after the change.
TL;DR:
In the 005 patch we deduct from {{unassigned}} during the calculation, whereas
the previous logic deducted it only after each iteration:
{code}
Resources.addTo(wQassigned, wQdone);
}
Resources.subtractFrom(unassigned, wQassigned);
{code}
In our logic, we need the up-to-date {{unassigned}} to cap {{wQavail}}, so we
deduct it as part of the calculation:
{code}
// Make sure it is not beyond unassigned
wQavail = Resources.componentwiseMin(wQavail, unassigned);
{code}
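A minimal sketch of why {{unassigned}} has to be up to date once it is used to cap {{wQavail}}. If the deduction happens inside the loop, each queue's share is capped by what is actually left, so the total handed out never exceeds the starting {{unassigned}}. The {{long[]}} vectors, the per-queue {{demands}} input, and the method names are hypothetical simplifications, not the patch's code.

```java
import java.util.Arrays;

// Hypothetical sketch: long[] vectors stand in for YARN Resource objects, and
// "demands" stands in for whatever each queue would otherwise receive.
final class UnassignedCapSketch {

    static long[] componentwiseMin(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
        return out;
    }

    static void subtractFrom(long[] target, long[] delta) {
        for (int i = 0; i < target.length; i++) {
            target[i] -= delta[i];
        }
    }

    // Caps each queue's share by the current remaining vector and deducts it
    // immediately, so later queues only see what is still unassigned.
    static long[][] assign(long[] unassigned, long[][] demands) {
        long[] remaining = unassigned.clone();
        long[][] assigned = new long[demands.length][];
        for (int q = 0; q < demands.length; q++) {
            long[] wQavail = componentwiseMin(demands[q], remaining);
            assigned[q] = wQavail;
            subtractFrom(remaining, wQavail);
        }
        return assigned;
    }

    public static void main(String[] args) {
        long[][] demands = {{2, 2, 1}, {2, 2, 1}, {2, 2, 1}};
        for (long[] a : assign(new long[] {4, 4, 1}, demands)) {
            System.out.println(Arrays.toString(a));
        }
    }
}
```

Starting from <4, 4, 1>, the queues receive [2, 2, 1], [2, 2, 0] and [0, 0, 0]. Had every queue instead been capped against the original <4, 4, 1>, with the subtraction deferred until after the loop, all three would have received <2, 2, 1>, handing out three units of the third resource when only one is available.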
> Fix the dominant resource preemption cannot happen when some of the resource
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch,
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch
>
>
> This is an example of the problem:
>
> {code}
> // guaranteed, max, used, pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
> {code}
> There are 3 resource types. The total cluster resource is 30:18:6.
> Queues a and b each have 3 running containers, each container being 2:2:1.
> Queue c uses no resources and has 1:1:1 of pending resource.
> Under the existing logic, preemption cannot happen.
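The arithmetic behind this example, as a short sketch. The cluster has no free capacity left in the third resource type. Queues a and b are below their guarantee in the first type, exactly at it in the second, and above it only in the third. The class and method names, and treating the three resource types purely by index, are illustrative assumptions.

```java
import java.util.Arrays;

// Illustrative sketch of the example's numbers; index i is the i-th resource type.
final class Yarn8292ExampleSketch {

    static long[] minus(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i];
        }
        return out;
    }

    // Cluster capacity minus the sum of every queue's usage.
    static long[] free(long[] cluster, long[]... used) {
        long[] total = new long[cluster.length];
        for (long[] u : used) {
            for (int i = 0; i < total.length; i++) {
                total[i] += u[i];
            }
        }
        return minus(cluster, total);
    }

    public static void main(String[] args) {
        long[] cluster = {30, 18, 6};
        long[] guaranteed = {10, 6, 2}; // same guarantee for a, b and c
        long[] usedA = {6, 6, 3};       // 3 containers of 2:2:1
        long[] usedB = {6, 6, 3};
        long[] usedC = {0, 0, 0};

        // Free capacity: [18, 6, 0], so nothing of the third type is left.
        System.out.println(Arrays.toString(free(cluster, usedA, usedB, usedC)));
        // a (and likewise b) relative to its guarantee: [-4, 0, 1].
        System.out.println(Arrays.toString(minus(usedA, guaranteed)));
    }
}
```

Queue c's 1:1:1 pending request therefore cannot be placed without preemption, and the over-guarantee vector of a and b, [-4, 0, 1], mixes negative, zero and positive components.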
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)