[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17000125#comment-17000125
 ] 

Eric Payne commented on YARN-8292:
----------------------------------

Thanks a lot [~jhung] for looking at this. I apologize in advance for the 
lengthy response.
{quote}Should we just commit YARN-10033 to branch-2.10 to address the issue you 
fixed between YARN-8292.branch-2.010.patch and YARN-8292.branch-2.10.011.patch? 
Then we can commit YARN-8292.branch-2.10.010.patch to branch-2.10.
{quote}
No, I don't think so. 
 TL;DR: Since there is no cached effective max resource in 2.10, YARN-10033 
can't be backported to 2.10.
 The root cause of the test failure in YARN-10033 was as follows:
 - In 3.x, {{TempQueuePerPartition#getMax}} uses the cached {{effMaxRes}} 
value. In 2.10, {{getMax}} is calculated.
 - When changes for YARN-8292 were added, they changed the number of 
preemptions in some cases because the calculation now takes into account both 
negative and non-negative resource components.
 - When {{TestProportionalCapacityPreemptionPolicy}} mocks effective max 
resource (effMaxRes), it always sets Vcores to 0.
 - Since YARN-8292 changed behavior, 
{{TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource}} 
should have also changed the number of expected preemptions. However, since 
{{TestProportionalCapacityPreemptionPolicy}} did not mock {{effMaxRes}} 
correctly, the fact that this unit test should have changed was missed when 
YARN-8292 was put into 3.x. This is what YARN-10033 addressed.
 - The changes made in YARN-8292.branch-2.10.011.patch to 
{{TestProportionalCapacityPreemptionPolicy#testPreemptionWithVCoreResource}} 
are the same changes that should have been made originally when YARN-8292 was 
committed to 3.x.
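
To illustrate what "negative and non-negative" resource components means here, a minimal sketch follows. It uses the figures for queue a from the example in this issue's description (guaranteed 10:6:2, used 6:6:3). The class and method names are hypothetical and are not taken from the Hadoop code base. The sketch only does the arithmetic and makes no claim about how the preemption policy itself treats the result.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not Hadoop code.
final class MixedSignDemo {

    // Component-wise a - b for two resource vectors of equal length.
    static long[] subtract(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] - b[i];
        }
        return out;
    }

    // True if at least one component is strictly greater than zero.
    static boolean anyPositive(long[] v) {
        for (long x : v) {
            if (x > 0) {
                return true;
            }
        }
        return false;
    }

    // True if at least one component is strictly less than zero.
    static boolean anyNegative(long[] v) {
        for (long x : v) {
            if (x < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] guaranteed = {10, 6, 2}; // queue a, from the issue description
        long[] used = {6, 6, 3};        // three containers of 2:2:1 each

        long[] overGuarantee = subtract(used, guaranteed);
        System.out.println(Arrays.toString(overGuarantee)); // [-4, 0, 1]
        System.out.println(anyPositive(overGuarantee));     // true
        System.out.println(anyNegative(overGuarantee));     // true
    }
}
```

Queue a is under its guarantee in the first resource type, exactly at it in the second, and over it in the third, so the difference has both negative and positive components at once.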

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8292
>                 URL: https://issues.apache.org/jira/browse/YARN-8292
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>            Priority: Critical
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch, 
> YARN-8292.006.patch, YARN-8292.007.patch, YARN-8292.008.patch, 
> YARN-8292.009.patch, YARN-8292.branch-2.009.patch, 
> YARN-8292.branch-2.010.patch, YARN-8292.branch-2.10.011.patch
>
>
> This is an example of the problem: 
>   
> {code}
>     //   guaranteed,  max,    used,   pending
>     "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
>         "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
>         "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
>         "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> Both a and b have 3 containers running, and each container is 2:2:1.
> Queue c uses 0 resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.


