[ https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483181#comment-16483181 ]

Wangda Tan commented on YARN-8292:
----------------------------------

Thanks [~eepayne], 

I just checked both points.

For the intra-queue preemption behavior:
bq. For example, if gpu is the extended resource, but no apps are currently 
using gpu in the queue, no intra-queue preemption will take place.
I think you're correct. The change I propose is:
{code} 
      if (conservativeDRF) {
        // When we want to do less aggressive preemption, we don't want to
        // preempt from any resource type if after preemption it becomes
        // negative.
        // For example:
        // - to-obtain = <30, 20, 0>, container <20, 20, 0> => allowed
        // - to-obtain = <30, 20, 0>, container <10, 10, 1> => disallowed
        // - to-obtain = <20, 30, 1>, container <20, 30, 1> => allowed
        // - to-obtain = <10, 20, 1>, container <11, 11, 0> => disallowed
        doPreempt = Resources.lessThan(rc, clusterResource,
            Resources
                .componentwiseMin(toObtainAfterPreemption, Resources.none()),
            Resources.componentwiseMin(toObtainByPartition, Resources.none()));
{code}
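To make the four examples in the code comment concrete, here is a minimal, hypothetical sketch of the rule they describe: preempting a container is disallowed if any component of to-obtain would turn (more) negative. It uses plain {{long[]}} vectors and a per-component comparison instead of YARN's {{Resource}}, {{Resources}} and {{ResourceCalculator}}; the class and method names are made up for illustration, and this is not the patch's implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the conservative-DRF check discussed
 * above. Resources are plain long[] vectors instead of YARN Resource objects,
 * and the comparison is per component rather than via a ResourceCalculator.
 */
public class ConservativePreemptionSketch {

  /**
   * Returns true if preempting the container is allowed: no component of
   * to-obtain may become (more) negative after preemption.
   */
  static boolean allowPreempt(long[] toObtain, long[] container) {
    for (int i = 0; i < toObtain.length; i++) {
      long after = toObtain[i] - container[i];
      // Compare only the negative parts (min with 0), similar in spirit to
      // componentwiseMin(..., Resources.none()) in the proposed change.
      if (Math.min(after, 0) < Math.min(toObtain[i], 0)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long[][][] cases = {
        {{30, 20, 0}, {20, 20, 0}}, // third component stays at 0
        {{30, 20, 0}, {10, 10, 1}}, // third component would go to -1
        {{20, 30, 1}, {20, 30, 1}}, // everything reaches exactly 0
        {{10, 20, 1}, {11, 11, 0}}, // first component would go to -1
    };
    for (long[][] c : cases) {
      System.out.println("to-obtain=" + Arrays.toString(c[0])
          + " container=" + Arrays.toString(c[1])
          + " => " + (allowPreempt(c[0], c[1]) ? "allowed" : "disallowed"));
    }
  }
}
```

Running {{main}} should print allowed, disallowed, allowed, disallowed, matching the four examples in the comment.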
However, this causes more than 20 intra-queue preemption test cases to fail. Since the logic in the 005 patch is not a regression, can we address this in a separate JIRA if we cannot come up with a simple solution?

Regarding:
bq. I don't think this is necessary. ..
Actually, this is required after the change.
TL;DR:
The 005 patch deducts from {{unassigned}} during the calculation, but the previous logic did not.
The previous logic deducted it after each iteration:
{code}
        Resources.addTo(wQassigned, wQdone);
      }
      Resources.subtractFrom(unassigned, wQassigned);
{code}

In the new logic, we need the up-to-date {{unassigned}} value to cap {{wQavail}}, so we deduct it as part of the calculation:
{code}
        // Make sure it is not beyond unassigned
        wQavail = Resources.componentwiseMin(wQavail, unassigned);
{code}
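A toy model of the difference described above, under stated assumptions: resources are plain {{long[]}} vectors, each queue's offer is its share of the round-start {{unassigned}} rounded up per component, and every queue accepts its whole offer. None of this is the actual YARN preemption code; {{oneRound}} and the other helpers are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

/**
 * Hypothetical model contrasting two ways of deducting from "unassigned"
 * during one round of offering resources to queues. Vectors are plain long[];
 * variable names mirror the snippets above, but the algorithm is simplified.
 */
public class UnassignedDeductionSketch {

  // Per-component ceil(v * share): offers are rounded up in this model.
  static long[] shareRoundedUp(long[] v, double share) {
    long[] r = new long[v.length];
    for (int i = 0; i < v.length; i++) {
      r[i] = (long) Math.ceil(v[i] * share);
    }
    return r;
  }

  static long[] componentwiseMin(long[] a, long[] b) {
    long[] r = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      r[i] = Math.min(a[i], b[i]);
    }
    return r;
  }

  /** Runs one round of offers and returns what is left in unassigned. */
  static long[] oneRound(long[] unassignedAtStart, double[] shares,
      boolean deductDuringCalculation) {
    long[] unassigned = unassignedAtStart.clone();
    long[] wQassigned = new long[unassigned.length];
    for (double share : shares) {
      long[] wQavail = shareRoundedUp(unassignedAtStart, share);
      if (deductDuringCalculation) {
        // Make sure it is not beyond the up-to-date unassigned, then deduct.
        wQavail = componentwiseMin(wQavail, unassigned);
        for (int i = 0; i < unassigned.length; i++) {
          unassigned[i] -= wQavail[i];
        }
      } else {
        // Previous style: only accumulate; deduct after the iteration.
        for (int i = 0; i < wQassigned.length; i++) {
          wQassigned[i] += wQavail[i];
        }
      }
    }
    if (!deductDuringCalculation) {
      for (int i = 0; i < unassigned.length; i++) {
        unassigned[i] -= wQassigned[i];
      }
    }
    return unassigned;
  }

  public static void main(String[] args) {
    long[] unassigned = {3, 3, 1};
    double[] shares = {0.5, 0.5};
    System.out.println("deduct after iteration: "
        + Arrays.toString(oneRound(unassigned, shares, false)));
    System.out.println("deduct during calculation: "
        + Arrays.toString(oneRound(unassigned, shares, true)));
  }
}
```

In this toy model, with {{unassigned = [3, 3, 1]}} and two queues at 0.5 each, deducting after the iteration leaves [-1, -1, -1] because the rounded-up offers overshoot, while capping {{wQavail}} with the up-to-date {{unassigned}} leaves [0, 0, 0].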

> Fix the dominant resource preemption cannot happen when some of the resource vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8292
>                 URL: https://issues.apache.org/jira/browse/YARN-8292
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch, YARN-8292.004.patch, YARN-8292.005.patch
>
>
> This is an example of the problem: 
>   
> {code}
>     //   guaranteed,  max,    used,   pending
>     "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
>         "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
>         "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
>         "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> Queues a and b each have 3 running containers of 2:2:1.
> Queue c uses no resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.
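The arithmetic behind this example can be checked with a small, hypothetical sketch (plain {{long[]}} vectors, made-up names, no YARN classes). It only restates the numbers above: the third resource type is fully used, so c's request cannot fit without preemption, and used minus guaranteed for a and b has mixed signs, which appears to be the "some of the resource vector becomes negative" situation in the title.

```java
import java.util.Arrays;

/**
 * Arithmetic for the YARN-8292 example, with resources as plain long[]
 * vectors in the same order as the test string (three resource types).
 */
public class Yarn8292ExampleSketch {

  static long[] subtract(long[] a, long[] b) {
    long[] r = new long[a.length];
    for (int i = 0; i < a.length; i++) {
      r[i] = a[i] - b[i];
    }
    return r;
  }

  // True if every component of the request fits into the free vector.
  static boolean fits(long[] request, long[] free) {
    for (int i = 0; i < request.length; i++) {
      if (request[i] > free[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    long[] cluster = {30, 18, 6};
    long[] rootUsed = {12, 12, 6};
    long[] guaranteedPerQueue = {10, 6, 2};
    long[] usedA = {6, 6, 3}; // 3 containers of 2:2:1
    long[] usedB = {6, 6, 3}; // 3 containers of 2:2:1
    long[] pendingC = {1, 1, 1};

    long[] free = subtract(cluster, rootUsed);
    System.out.println("free = " + Arrays.toString(free));
    System.out.println("c's pending fits without preemption: "
        + fits(pendingC, free));

    // Mixed signs: over guarantee in the third type, under it in the first.
    System.out.println("a used - guaranteed = "
        + Arrays.toString(subtract(usedA, guaranteedPerQueue)));
    System.out.println("b used - guaranteed = "
        + Arrays.toString(subtract(usedB, guaranteedPerQueue)));
  }
}
```

This should print free = [18, 6, 0], false for c's request, and [-4, 0, 1] for both a and b.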



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
