[
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000125#comment-17000125
]
Eric Payne commented on YARN-8292:
----------------------------------
Thanks a lot [~jhung] for looking at this. I apologize in advance for the
lengthy response.
{quote}Should we just commit YARN-10033 to branch-2.10 to address the issue you
fixed between YARN-8292.branch-2.010.patch and YARN-8292.branch-2.10.011.patch?
Then we can commit YARN-8292.branch-2.10.010.patch to branch-2.10.
{quote}
No, I don't think so.
TL;DR: Since there is no cached effective max resource in 2.10, YARN-10033
can't be backported to 2.10.
The root cause of the test failure in YARN-10033 was as follows:
- In 3.x, {{TempQueuePerPartition#getMax}} uses the cached {{effMaxRes}}
value. In 2.10, {{getMax}} is calculated.
- When changes for YARN-8292 were added, they changed the number of
preemptions in some cases because the policy now takes into account resource
components that are negative as well as those that are non-negative.
- When {{TestProportionalCapacityPreemptionPolicy}} mocks effective max
resource (effMaxRes), it always sets Vcores to 0.
- Since YARN-8292 changed behavior,
{{TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource}}
should have also changed the number of expected preemptions. However, since
{{TestProportionalCapacityPreemptionPolicy}} did not mock {{effMaxRes}}
correctly, the fact that this unit test should have changed was missed when
YARN-8292 was put into 3.x. This is what YARN-10033 addressed.
- The changes made in YARN-8292.branch-2.10.011.patch to
{{TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource}}
are the same changes that should have been made originally when YARN-8292 was
committed to 3.x.
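The difference between a cached and a calculated max, and the effect of a mock that always sets vcores to 0, can be illustrated with a small self-contained sketch. Everything in it is hypothetical: {{cachedMax}}, {{calculatedMax}}, the {memory, vcores} arrays, and the capacity formula are stand-ins chosen for illustration, not the actual {{TempQueuePerPartition}} code in either branch. It only shows why a mocked value with a vcores component of 0 can yield a different limit than a value calculated from the partition total.

```java
import java.util.Arrays;

public class EffMaxSketch {
    // Cached style (assumption): return the stored effective max unchanged.
    public static long[] cachedMax(long[] effMaxRes) {
        return effMaxRes.clone();
    }

    // Calculated style (assumed formula, for illustration only):
    // scale each component of the partition total by a max-capacity fraction.
    public static long[] calculatedMax(long[] partitionTotal, double absMaxCapacity) {
        long[] result = new long[partitionTotal.length];
        for (int i = 0; i < partitionTotal.length; i++) {
            result[i] = (long) Math.floor(partitionTotal[i] * absMaxCapacity);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] partitionTotal = {100, 10}; // hypothetical {memory, vcores}
        long[] mockedEffMax = {100, 0};    // a mock that always sets vcores to 0

        System.out.println("cached (mocked): "
            + Arrays.toString(cachedMax(mockedEffMax)));
        System.out.println("calculated:      "
            + Arrays.toString(calculatedMax(partitionTotal, 1.0)));
    }
}
```

In this sketch the two results differ only in the vcores component, which is the kind of discrepancy that can hide a change in the expected number of preemptions.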
> Fix the dominant resource preemption cannot happen when some of the resource
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
> Key: YARN-8292
> URL: https://issues.apache.org/jira/browse/YARN-8292
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Reporter: Sumana Sathish
> Assignee: Wangda Tan
> Priority: Critical
> Fix For: 3.2.0, 3.1.1
>
> Attachments: YARN-8292.001.patch, YARN-8292.002.patch,
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch,
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch,
> YARN-8292.009.patch, YARN-8292.branch-2.009.patch,
> YARN-8292.branch-2.010.patch, YARN-8292.branch-2.10.011.patch
>
>
> This is an example of the problem:
>
> {code}
> // guaranteed, max, used, pending
> "root(=[30:18:6 30:18:6 12:12:6 1:1:1]);" + //root
> "-a(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // a
> "-b(=[10:6:2 10:6:2 6:6:3 0:0:0]);" + // b
> "-c(=[10:6:2 10:6:2 0:0:0 1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> For both a and b, there are 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending resource.
> Under the existing logic, preemption cannot happen.
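
The arithmetic behind the example above can be checked with a short sketch. The class and method names are hypothetical, and plain long[] arrays stand in for YARN resource vectors. The last step is only an illustration of how a resource vector can end up with negative components (subtracting one 2:2:1 container from the 1:1:1 pending request); it is an assumption for illustration and is not taken from the preemption policy code.

```java
import java.util.Arrays;

public class PreemptionExampleSketch {
    // Component-wise a + b.
    public static long[] add(long[] a, long[] b) {
        long[] r = new long[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] + b[i];
        return r;
    }

    // Component-wise a - b.
    public static long[] subtract(long[] a, long[] b) {
        long[] r = new long[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] - b[i];
        return r;
    }

    // Each component of a multiplied by n.
    public static long[] times(long[] a, long n) {
        long[] r = new long[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] * n;
        return r;
    }

    // True only if every component of need is <= the matching component of avail.
    public static boolean fitsIn(long[] need, long[] avail) {
        for (int i = 0; i < need.length; i++) {
            if (need[i] > avail[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long[] total = {30, 18, 6};       // cluster total
        long[] container = {2, 2, 1};     // size of each running container
        long[] used = add(times(container, 3), times(container, 3)); // queues a and b
        long[] free = subtract(total, used);
        long[] pendingC = {1, 1, 1};      // queue c's pending request

        System.out.println("used = " + Arrays.toString(used));   // [12, 12, 6]
        System.out.println("free = " + Arrays.toString(free));   // [18, 6, 0]
        System.out.println("pending fits in free: " + fitsIn(pendingC, free)); // false

        // Illustration only: pending minus one whole container goes negative.
        System.out.println("pending - container = "
            + Arrays.toString(subtract(pendingC, container)));   // [-1, -1, 0]
    }
}
```

The third resource type is fully used (6 of 6), so queue c's pending 1:1:1 cannot be satisfied from free capacity alone.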