[ 
https://issues.apache.org/jira/browse/YARN-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478266#comment-16478266
 ] 

Wangda Tan commented on YARN-8292:
----------------------------------

[~jlowe],
I think you're correct :). I take that back. My previous assumption:

{code}
   Σ(selected-container.resource) <= (for all resource types) Σ(queue.to-be-obtain)
selected-container                                          queue
{code}

can break when one starving queue needs to preempt containers from two 
over-utilized queues. 
For example:
{code}
queue-A, 
guaranteed: <30,50> , used: <40, 60>.

queue-B,
guaranteed: <30,50>, used: <40, 60>
{code} 

Assume a queue C wants 20:20 resources.
In this case, the resource to obtain from each of queue-A and queue-B is 10:10.

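If every container running in the system has the same size, 20:30, then under 
my existing approach nothing can be preempted: a single 20:30 container already 
exceeds the 20:20 total to obtain in the second resource type. This is also why 
some unit tests failed.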
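
To make the arithmetic concrete, here is a minimal Java sketch of the assumption above applied to this example. The class and method names (OldAssumptionDemo, fitsWithinToObtain) are hypothetical, and plain long arrays stand in for YARN's Resource type; this is an illustration, not the code from the patch.

```java
public class OldAssumptionDemo {
    // Hypothetical helper: returns true only if, for every resource type,
    // the summed size of the selected containers does not exceed the
    // summed to-be-obtained amount across queues.
    static boolean fitsWithinToObtain(long[][] selectedContainers,
                                      long[][] toObtainPerQueue) {
        int types = toObtainPerQueue[0].length;
        for (int t = 0; t < types; t++) {
            long selected = 0;
            for (long[] c : selectedContainers) {
                selected += c[t];
            }
            long toObtain = 0;
            for (long[] q : toObtainPerQueue) {
                toObtain += q[t];
            }
            if (selected > toObtain) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[][] toObtain = {{10, 10}, {10, 10}}; // queue-A, queue-B
        long[][] oneContainer = {{20, 30}};       // every container is 20:30
        // 20 <= 20 holds, but 30 > 20, so even one container is rejected.
        System.out.println(fitsWithinToObtain(oneContainer, toObtain)); // prints false
    }
}
```

Since even a single container fails the check, no set of containers can satisfy it, which matches the "nothing can be preempted" outcome.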

I just used your approach: 
bq. I think the check for a zero resource can be dropped and it simplifies to 
the toObtainAfterPreemption component-wise max'd with zero is less than the 
amount to obtain from the partition (after being max'd with zero).

With the zero-resource-type check from my comment above:
{code}
    // If a toObtain resource type is 0, set it to -1 so that the zero value
    // does not affect the following doPreemption check: isAnyMajorResourceZero
    for (ResourceInformation ri : toObtainByPartition.getResources()) {
      if (ri.getValue() == 0) {
        ri.setValue(-1);
      }
    }
{code} 
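
The quoted rule (compare toObtainAfterPreemption with the amount to obtain, each max'd with zero component-wise) could be sketched roughly as follows. This is an assumption-laden illustration: the names ClampedCheckDemo, clampAtZero, strictlyLess and canPreempt are hypothetical, long arrays stand in for Resource, and "less than" is interpreted here as "no greater in any type and strictly smaller in at least one", which may differ from the comparison the actual patch uses.

```java
public class ClampedCheckDemo {
    // Component-wise max with zero.
    static long[] clampAtZero(long[] r) {
        long[] out = new long[r.length];
        for (int i = 0; i < r.length; i++) {
            out[i] = Math.max(r[i], 0);
        }
        return out;
    }

    // Assumed meaning of "less than": a <= b in every type, a < b in at least one.
    static boolean strictlyLess(long[] a, long[] b) {
        boolean anySmaller = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) {
                return false;
            }
            if (a[i] < b[i]) {
                anySmaller = true;
            }
        }
        return anySmaller;
    }

    // Hypothetical check: preempting the container is useful if it reduces
    // the clamped amount still to obtain from the partition.
    static boolean canPreempt(long[] toObtain, long[] container) {
        long[] after = new long[toObtain.length];
        for (int i = 0; i < toObtain.length; i++) {
            after[i] = toObtain[i] - container[i];
        }
        return strictlyLess(clampAtZero(after), clampAtZero(toObtain));
    }

    public static void main(String[] args) {
        // queue-A example: 10:10 to obtain, container of 20:30.
        System.out.println(canPreempt(new long[]{10, 10}, new long[]{20, 30})); // prints true
        // Nothing left to obtain: preemption would not help.
        System.out.println(canPreempt(new long[]{0, 0}, new long[]{20, 30}));   // prints false
    }
}
```

Under this reading, the 20:30 container in the queue-A/queue-B example becomes preemptable even though it overshoots the 10:10 to obtain, while a queue with nothing left to obtain is left alone.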

Now everything works. Please check the attached patch (ver.3) to see if it 
works.

> Fix the dominant resource preemption cannot happen when some of the resource 
> vector becomes negative
> ----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8292
>                 URL: https://issues.apache.org/jira/browse/YARN-8292
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Sumana Sathish
>            Assignee: Wangda Tan
>            Priority: Critical
>         Attachments: YARN-8292.001.patch, YARN-8292.002.patch, 
> YARN-8292.003.patch
>
>
> This is an example of the problem: 
>   
> {code}
>     //   guaranteed,  max,    used,   pending
>     "root(=[30:18:6  30:18:6 12:12:6 1:1:1]);" + //root
>         "-a(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // a
>         "-b(=[10:6:2 10:6:2  6:6:3   0:0:0]);" + // b
>         "-c(=[10:6:2 10:6:2  0:0:0   1:1:1])"; // c
> {code}
> There are 3 resource types. The total resource of the cluster is 30:18:6.
> Queues a and b each have 3 containers running, and each container is 2:2:1.
> Queue c uses no resources and has 1:1:1 pending.
> Under the existing logic, preemption cannot happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
