[ https://issues.apache.org/jira/browse/YARN-10839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Ahuja updated YARN-10839:
-----------------------------------
    Affects Version/s: 2.7.5
                       3.3.1

> queueMaxAppsDefault when set blindly caps the root queue's maxRunningApps 
> setting to this value ignoring any individually overridden maxRunningApps 
> setting for child queues in FairScheduler
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10839
>                 URL: https://issues.apache.org/jira/browse/YARN-10839
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.7.5, 3.3.1
>            Reporter: Siddharth Ahuja
>            Assignee: Siddharth Ahuja
>            Priority: Major
>
> [queueMaxAppsDefault|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format]
> sets the default running-app limit for queues (including the root queue),
> which individual child queues can override through the maxRunningApps
> setting.
> Consider a simple FairScheduler XML as follows:
> {code}
> <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
> <allocations>
>     <queue name="root">
>         <weight>1.0</weight>
>         <schedulingPolicy>drf</schedulingPolicy>
>         <aclSubmitApps>*</aclSubmitApps>
>         <aclAdministerApps>*</aclAdministerApps>
>         <queue name="default">
>             <weight>1.0</weight>
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>         <queue name="A">
>             <minResources>1024000 mb, 1000 vcores</minResources>
>             <maxRunningApps>15</maxRunningApps>
>             <weight>2.0</weight>
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>         <queue name="B">
>             <minResources>512000 mb, 500 vcores</minResources>
>             <maxRunningApps>10</maxRunningApps>
>             <weight>1.0</weight>
>             <schedulingPolicy>drf</schedulingPolicy>
>         </queue>
>     </queue>
>     <queueMaxAppsDefault>3</queueMaxAppsDefault>
>     <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
>     <queuePlacementPolicy>
>         <rule name="specified" create="true"/>
>         <rule name="user" create="true"/>
>     </queuePlacementPolicy>
> </allocations>
> {code}
> Here:
> * {{queueMaxAppsDefault}} is set to 3, so any queue without its own
> {{maxRunningApps}} setting is limited to 3 running apps.
> * The root queue has no {{maxRunningApps}} limit set.
> * {{maxRunningApps}} for the child queues is 15 for root.A and 10 for root.B.
> Given the above, jobs submitted to root.A or root.B are (incorrectly) capped
> at 3 running apps rather than their configured 15 and 10, because the root
> queue (their parent) is itself capped at 3 by the queueMaxAppsDefault setting.
> As a result, users see their apps stuck in the ACCEPTED state.
> Either the above FairScheduler XML should have been rejected by the
> ResourceManager, or the root queue should have been capped to the largest
> maxRunningApps value defined among its leaf queues.
> Possible solution: if the root queue has no maxRunningApps set and
> queueMaxAppsDefault is lower than the maxRunningApps of one or more leaf
> queues, the root queue should implicitly be capped to the largest such leaf
> value instead of queueMaxAppsDefault.
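> For illustration only, a minimal sketch of what the proposed implicit cap
> would amount to if written out explicitly, assuming the allocation file
> above: setting {{maxRunningApps}} on the root queue to the largest leaf value
> (15, from root.A) should stop queueMaxAppsDefault from applying to root. A
> root limit is shared by every queue beneath it, so root.A and root.B together
> could still run at most 15 apps, and root.default, which has no setting of
> its own, would remain at 3. The XML comments are illustrative additions.
> {code}
> <queue name="root">
>     <!-- Explicit limit: replaces queueMaxAppsDefault (3) for root only -->
>     <maxRunningApps>15</maxRunningApps>
>     <weight>1.0</weight>
>     <schedulingPolicy>drf</schedulingPolicy>
>     <aclSubmitApps>*</aclSubmitApps>
>     <aclAdministerApps>*</aclAdministerApps>
>     <!-- Child queues "default", "A" and "B" as in the allocation file above -->
> </queue>
> {code}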



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
