[
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478244#comment-16478244
]
Wangda Tan commented on YARN-8292:
----------------------------------
[~eepayne],
It is actually on in {{setup()}} :)
[~jlowe],
I understand your suggestion now, but simply dropping the check as you mentioned:
bq. I think the check for a zero resource can be dropped and it simplifies to
the toObtainAfterPreemption component-wise max'd with zero is less than the
amount to obtain from the partition (after being max'd with zero).
is not enough.
The reason is that we want to make sure no over-preemption happens. For example,
if res-to-obtain = (3, 0, 0) and a container has size = (4, 1, 0) (the 3rd type
is 0 for both), we don't want the preemption to happen, because it would leave
the queue under-utilized and could preempt more containers than required. We
need to make sure that, for every resource type:
{code}
Σ over selected containers (selected-container.resource)
  <=
Σ over queues (queue.to-be-obtain)
{code}
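As a minimal sketch of the condition above, assuming resources are modeled as plain {{long}} arrays with one entry per resource type: the class and method names here are hypothetical and are not YARN APIs. It only illustrates the per-type comparison using the (3, 0, 0) versus (4, 1, 0) example.

```java
public class OverPreemptionCheck {
    // Hypothetical helper: returns true only if, for every resource type,
    // the summed size of the selected containers does not exceed the
    // amount that still has to be obtained.
    static boolean withinToObtain(long[][] selectedContainers, long[] toObtain) {
        long[] sum = new long[toObtain.length];
        for (long[] container : selectedContainers) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += container[i];
            }
        }
        for (int i = 0; i < sum.length; i++) {
            if (sum[i] > toObtain[i]) {
                return false; // this type would be over-preempted
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] toObtain = {3, 0, 0};
        // The (4, 1, 0) container exceeds toObtain in the first two types.
        System.out.println(withinToObtain(new long[][] {{4, 1, 0}}, toObtain));
        // A (3, 0, 0) container fits exactly.
        System.out.println(withinToObtain(new long[][] {{3, 0, 0}}, toObtain));
    }
}
```

With these inputs the first call reports that the container must not be selected, and the second reports that it may be.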
In my previous patch, as you mentioned, if some resource types are always 0,
they will invalidate the check. So I added a check:
{code}
// If a toObtain resource type == 0, set it to -1 to avoid 0 resource
// type affect following doPreemption check: isAnyMajorResourceZero
for (ResourceInformation ri : toObtainByPartition.getResources()) {
  if (ri.getValue() == 0) {
    ri.setValue(-1);
  }
}
{code}
It goes right before this check:
{code}
if (Resources.greaterThan(rc, clusterResource, toObtainByPartition,
Resources.none())
{code}
With that, it looks like the problem can be solved. Please let me know if you
think differently.
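The effect of replacing zero-valued types with -1 can be illustrated with a small standalone sketch. It assumes resources are plain {{long}} arrays. {{isAnyZero}} is a hypothetical stand-in for a "does any type equal zero" test and {{markZeroTypes}} is a hypothetical helper; neither is the actual YARN implementation.

```java
import java.util.Arrays;

public class ZeroTypeMarker {
    // Hypothetical stand-in for a check that fires when any type is zero.
    static boolean isAnyZero(long[] resource) {
        for (long value : resource) {
            if (value == 0) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy in which every zero-valued type is replaced by -1,
    // so types that were never requested no longer look like "just reached zero".
    static long[] markZeroTypes(long[] toObtain) {
        long[] marked = toObtain.clone();
        for (int i = 0; i < marked.length; i++) {
            if (marked[i] == 0) {
                marked[i] = -1;
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        long[] toObtain = {3, 0, 0};
        long[] marked = markZeroTypes(toObtain);
        System.out.println(Arrays.toString(marked)); // [3, -1, -1]
        System.out.println(isAnyZero(toObtain));     // true
        System.out.println(isAnyZero(marked));       // false
    }
}
```

In this sketch, a type that was 0 from the start no longer trips the zero test after marking, while a type with a positive amount is left unchanged.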
> Fix the dominant resource preemption cannot happen when some of the resource
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8292.001.patch
>
>
> This is an example of the problem:
>
> {code}
> // guaranteed, max, used, pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending resource.
> Under the existing logic, preemption cannot happen.