[ 
https://issues.apache.org/jira/browse/YARN-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326629#comment-15326629
 ] 

Karthik Kambatla commented on YARN-5077:
----------------------------------------

Interesting approach in the latest patch. 

A few comments:
# Can we extend it to address YARN-4866 as well, so we have a uniform approach? 
# Instead of checking the weight, we might want to check whether the fairshare 
memory/cpu is 0. That way, we also address cases where the weight is so small 
that the fairshare is essentially 0.
# FSQueue#getMaxShare does not appear to check the parent queues. Shouldn't 
we be checking them? FWIW, I am not a fan of our current approach of querying 
AllocationConfiguration. Would it be better to use FSQueue to store 
queue-specific information instead? I am comfortable with tackling that in 
another JIRA, either before or immediately after this one. 
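
To make points 2 and 3 concrete, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: Share, Queue, hasZeroFairShare and effectiveMaxShare are hypothetical names invented for illustration, and treating "either dimension is 0" as the trigger is my assumption about what the check should be, not what the patch does.

```java
// Hypothetical stand-ins for illustration only; NOT the real FSQueue/Resource API.
public class FairShareSketch {

    /** Minimal resource pair, loosely modeled on <memory, vCores>. */
    static final class Share {
        final long memoryMb;
        final int vCores;

        Share(long memoryMb, int vCores) {
            this.memoryMb = memoryMb;
            this.vCores = vCores;
        }

        @Override
        public String toString() {
            return "<memory:" + memoryMb + ", vCores:" + vCores + ">";
        }
    }

    /** Hypothetical queue node that stores its own shares and a parent link. */
    static final class Queue {
        final String name;
        final Queue parent;   // null for the root queue
        final Share fairShare;
        final Share maxShare;

        Queue(String name, Queue parent, Share fairShare, Share maxShare) {
            this.name = name;
            this.parent = parent;
            this.fairShare = fairShare;
            this.maxShare = maxShare;
        }
    }

    /**
     * Point 2: look at the computed fair share rather than the weight.
     * Assumption: a zero in either dimension counts as "no fair share".
     */
    static boolean hasZeroFairShare(Queue q) {
        return q.fairShare.memoryMb == 0 || q.fairShare.vCores == 0;
    }

    /**
     * Point 3: component-wise minimum of the max share over the queue
     * and all of its ancestors, so a parent's limit is never exceeded.
     */
    static Share effectiveMaxShare(Queue q) {
        long mem = Long.MAX_VALUE;
        int cores = Integer.MAX_VALUE;
        for (Queue cur = q; cur != null; cur = cur.parent) {
            mem = Math.min(mem, cur.maxShare.memoryMb);
            cores = Math.min(cores, cur.maxShare.vCores);
        }
        return new Share(mem, cores);
    }

    public static void main(String[] args) {
        Queue root = new Queue("root", null, new Share(16384, 8), new Share(16384, 8));
        // The child's own max share is larger than the parent's in memory.
        Queue dev = new Queue("root.dev", root, new Share(0, 0), new Share(32768, 4));

        System.out.println(dev.name + " zero fair share: " + hasZeroFairShare(dev));
        System.out.println(dev.name + " effective max share: " + effectiveMaxShare(dev));
    }
}
```

Storing the shares on the queue object itself, as the sketch does, is the alternative to querying AllocationConfiguration that I am suggesting in point 3.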


> Fix FSLeafQueue#getFairShare() for queues with weight 0.0
> ---------------------------------------------------------
>
>                 Key: YARN-5077
>                 URL: https://issues.apache.org/jira/browse/YARN-5077
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Yufei Gu
>            Assignee: Yufei Gu
>         Attachments: YARN-5077.001.patch, YARN-5077.002.patch, 
> YARN-5077.003.patch, YARN-5077.004.patch, YARN-5077.005.patch, 
> YARN-5077.006.patch, YARN-5077.007.patch
>
>
> 1) When a queue's weight is set to 0.0, FSLeafQueue#getFairShare() returns 
> <memory:0, vCores:0> 
> 2) When a queue's weight is nonzero, FSLeafQueue#getFairShare() returns 
> <memory:16384, vCores:8>
> In case 1), that means no container ever gets allocated for an AM because 
> from the viewpoint of the RM, there is never any headroom to allocate a 
> container on that queue.
> For example, we have a pool with the following weights: 
> - root.dev 0.0 
> - root.product 1.0
> root.dev is a best-effort pool and should only get resources if 
> root.product is not running. In our tests, with no jobs running under 
> root.product, jobs started in the root.dev queue stay stuck in the ACCEPTED 
> state and never start.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
