[
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478266#comment-16478266
]
Wangda Tan commented on YARN-8292:
----------------------------------
[~jlowe],
I think you're correct :). I take back what I said. My previous assumption was:
{code}
Σ_{c ∈ selected containers} c.resource <= Σ_{q ∈ queues} q.to-be-obtain   (component-wise, for all resource types)
{code}
This can break the case in which one starving queue needs to preempt containers
from two over-utilized queues.
For example:
{code}
queue-A: guaranteed <30, 50>, used <40, 60>
queue-B: guaranteed <30, 50>, used <40, 60>
{code}
Assume a queue C wants 20:20 resources.
In this case, the resource to obtain from each of queue-A and queue-B is 10:10.
If every running container has the same size, 20:30, then under my existing
approach nothing can be preempted: a single 20:30 container already exceeds the
combined 20:20 to-obtain in the second resource type. This is also why some unit
tests failed.
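A minimal Java sketch of why the earlier aggregate check rejects this case. It uses plain `long[]` vectors as a stand-in for YARN's `Resource`, and the class `AggregateCheckSketch` and its methods are hypothetical, not Hadoop APIs. It also assumes each queue's to-obtain is simply used minus guaranteed, which is enough for this example but is not how the preemption policy actually derives it.

```java
import java.util.Arrays;

/** Hypothetical sketch; long[] vectors stand in for YARN Resource objects. */
class AggregateCheckSketch {

    /** Component-wise a - b. */
    static long[] subtract(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i];
        }
        return out;
    }

    /** Component-wise sum of one or more vectors of equal length. */
    static long[] sum(long[]... vectors) {
        long[] out = new long[vectors[0].length];
        for (long[] v : vectors) {
            for (int i = 0; i < v.length; i++) {
                out[i] += v[i];
            }
        }
        return out;
    }

    /** True if a <= b for every resource type. */
    static boolean fitsIn(long[] a, long[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] guaranteed = {30, 50};
        long[] used = {40, 60};
        // Simplifying assumption: to-obtain = used - guaranteed = <10, 10> per queue.
        long[] toObtainA = subtract(used, guaranteed);
        long[] toObtainB = subtract(used, guaranteed);
        long[] container = {20, 30};

        // Earlier assumption: selected containers must fit in the summed to-obtain.
        long[] totalToObtain = sum(toObtainA, toObtainB); // <20, 20>
        boolean canPreemptOne = fitsIn(container, totalToObtain);
        System.out.println(Arrays.toString(totalToObtain) + " " + canPreemptOne);
    }
}
```

Running `main` prints `[20, 20] false`: the container's second type (30) exceeds the summed to-obtain (20), so no container qualifies.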
So I switched to your approach:
bq. I think the check for a zero resource can be dropped and it simplifies to
the toObtainAfterPreemption component-wise max'd with zero is less than the
amount to obtain from the partition (after being max'd with zero).
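A minimal Java sketch of the quoted check, again with `long[]` standing in for `Resource`; `MaxWithZeroCheckSketch` and its methods are hypothetical names. One assumption is made about "less than": the sketch reads it as "no resource type grows and at least one shrinks" after both sides are clamped at zero. Other readings (for example a dominant-share comparison) are possible, and this is not the patch's code.

```java
/**
 * Hypothetical sketch of the "max'd with zero" check; long[] vectors stand in
 * for YARN Resource objects and none of these names are Hadoop APIs.
 */
class MaxWithZeroCheckSketch {

    /** Component-wise max(v, 0). */
    static long[] maxWithZero(long[] v) {
        long[] out = new long[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = Math.max(v[i], 0L);
        }
        return out;
    }

    /** Assumed reading of "less than": no type grows and at least one shrinks. */
    static boolean strictlyImproves(long[] after, long[] before) {
        boolean anyDecrease = false;
        for (int i = 0; i < after.length; i++) {
            if (after[i] > before[i]) {
                return false;
            }
            if (after[i] < before[i]) {
                anyDecrease = true;
            }
        }
        return anyDecrease;
    }

    /** Decide whether preempting a container helps the partition's to-obtain. */
    static boolean shouldPreempt(long[] toObtainByPartition, long[] container) {
        long[] toObtainAfterPreemption = new long[container.length];
        for (int i = 0; i < container.length; i++) {
            toObtainAfterPreemption[i] = toObtainByPartition[i] - container[i];
        }
        return strictlyImproves(maxWithZero(toObtainAfterPreemption),
                maxWithZero(toObtainByPartition));
    }

    public static void main(String[] args) {
        // queue-A above: to-obtain <10, 10>, container <20, 30>.
        // After preemption: <-10, -20>, clamped to <0, 0>, which is below <10, 10>.
        System.out.println(shouldPreempt(new long[] {10, 10}, new long[] {20, 30}));
    }
}
```

With the queue-A numbers this prints `true`, so the 20:30 container that the aggregate check rejected is now preemptable. A queue whose to-obtain is already zero everywhere is never preempted, because clamping leaves nothing to decrease.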
I also kept the zero-resource-type check I mentioned in an earlier comment:
{code}
// If a toObtain resource type == 0, set it to -1 so that a 0 resource
// type does not affect the following doPreemption check: isAnyMajorResourceZero
for (ResourceInformation ri : toObtainByPartition.getResources()) {
  if (ri.getValue() == 0) {
    ri.setValue(-1);
  }
}
{code}
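A minimal Java sketch of what the loop above does to a to-obtain vector, using `long[]` in place of `ResourceInformation[]`. `isAnyZero` is a hypothetical stand-in for the `isAnyMajorResourceZero` check named in the comment; it does not reproduce that method's real semantics (for example, which resource types count as "major").

```java
import java.util.Arrays;

/** Hypothetical stand-in for the loop above; long[] replaces ResourceInformation[]. */
class ZeroToMinusOneSketch {

    /** Mirrors the loop above: rewrite every 0 component to -1 in place. */
    static void replaceZeroWithMinusOne(long[] toObtain) {
        for (int i = 0; i < toObtain.length; i++) {
            if (toObtain[i] == 0) {
                toObtain[i] = -1;
            }
        }
    }

    /** Hypothetical "is any component exactly zero" test. */
    static boolean isAnyZero(long[] v) {
        for (long x : v) {
            if (x == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] toObtain = {10, 0, 5};
        System.out.println(isAnyZero(toObtain)); // true
        replaceZeroWithMinusOne(toObtain);
        System.out.println(Arrays.toString(toObtain) + " " + isAnyZero(toObtain)); // [10, -1, 5] false
        // Clamping at zero maps -1 back to 0, so a max-with-zero comparison is unaffected.
        System.out.println(Math.max(toObtain[1], 0L)); // 0
    }
}
```

The design point the sketch illustrates: -1 no longer trips an "exactly zero" test, yet it clamps back to 0, so it does not change the outcome of the max-with-zero comparison.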
With this, everything works. Please check the attached patch (ver.3) to confirm
it works for you.
> Fix the dominant resource preemption cannot happen when some of the resource
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch,
> YARN-8292.003.patch
>
>
> This is an example of the problem:
>
> {code}
> // guaranteed, max, used, pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
> {code}
> There are 3 resource types. The total cluster resource is 30:18:6.
> Queues a and b each have 3 running containers, each of size 2:2:1.
> Queue c uses no resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.
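A worked version of the issue's example for queue a (b is identical), as a minimal Java sketch with `long[]` standing in for `Resource`. `IssueExampleSketch` and its methods are hypothetical. It assumes to-obtain = used - guaranteed; the real policy derives to-obtain from its ideal-assignment computation (which would also account for c's 1:1:1 pending), so actual numbers may differ. `fitsIn` is a naive component-wise check used for contrast and is not necessarily the existing Hadoop logic.

```java
import java.util.Arrays;

/**
 * Hypothetical worked example for queue a from the issue description.
 * Assumes to-obtain = used - guaranteed; long[] stands in for YARN Resource.
 */
class IssueExampleSketch {

    /** Component-wise a - b. */
    static long[] subtract(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i];
        }
        return out;
    }

    /** Naive check: a <= b for every resource type. */
    static boolean fitsIn(long[] a, long[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
        }
        return true;
    }

    /** Clamp both sides at zero; preempt if no type grows and at least one shrinks. */
    static boolean shouldPreempt(long[] toObtain, long[] container) {
        long[] after = subtract(toObtain, container);
        boolean anyDecrease = false;
        for (int i = 0; i < toObtain.length; i++) {
            long before = Math.max(toObtain[i], 0L);
            long clampedAfter = Math.max(after[i], 0L);
            if (clampedAfter > before) {
                return false;
            }
            if (clampedAfter < before) {
                anyDecrease = true;
            }
        }
        return anyDecrease;
    }

    public static void main(String[] args) {
        long[] guaranteed = {10, 6, 2};
        long[] used = {6, 6, 3};
        long[] container = {2, 2, 1};
        long[] toObtain = subtract(used, guaranteed); // [-4, 0, 1]: first type is negative
        System.out.println(Arrays.toString(toObtain));
        System.out.println(fitsIn(container, toObtain));        // false
        System.out.println(shouldPreempt(toObtain, container)); // true
    }
}
```

The negative first component (-4) makes any component-wise fit impossible, while clamping at zero reduces the question to the third type, where preempting one 2:2:1 container lowers the to-obtain from 1 to 0.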
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)