Siddharth Ahuja created YARN-10839:
--------------------------------------

             Summary: queueMaxAppsDefault when set blindly caps the root 
queue's maxRunningApps setting to this value ignoring any individually 
overridden maxRunningApps setting for child queues in FairScheduler
                 Key: YARN-10839
                 URL: https://issues.apache.org/jira/browse/YARN-10839
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Siddharth Ahuja


[queueMaxAppsDefault|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format]
 sets the default running app limit for queues (including the root queue) which 
can be overridden by individual child queues through the maxRunningApps setting.

Consider a simple FairScheduler XML as follows:

{code}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
    <queue name="root">
        <weight>1.0</weight>
        <schedulingPolicy>drf</schedulingPolicy>
        <aclSubmitApps>*</aclSubmitApps>
        <aclAdministerApps>*</aclAdministerApps>
        <queue name="default">
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="A">
            <minResources>1024000 mb, 1000 vcores</minResources>
            <maxRunningApps>15</maxRunningApps>
            <weight>2.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="B">
            <minResources>512000 mb, 500 vcores</minResources>
            <maxRunningApps>10</maxRunningApps>
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
    </queue>
    <queueMaxAppsDefault>3</queueMaxAppsDefault>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <queuePlacementPolicy>
        <rule name="specified" create="true"/>
        <rule name="user" create="true"/>
    </queuePlacementPolicy>
</allocations>
{code}

Here:
* {{queueMaxAppsDefault}} sets the default {{maxRunningApps}} limit to 3.
* The root queue has no explicit {{maxRunningApps}} limit, so it falls back to that default of 3.
* The child queues override the default: {{maxRunningApps}} is 15 for root.A and 10 for root.B.

Given the above, if users submit jobs to root.B, they are (incorrectly) 
capped at 3 running apps, not 10, because the root queue (the parent) is itself 
capped at 3 by the queueMaxAppsDefault setting.

As a result, users see their apps stuck in the ACCEPTED state.
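
The effect can be reproduced with a small model. This is only a simplified sketch of the behaviour described in this issue, assuming an app may run only when every queue on its path (root included) is below its limit. It is not the FairScheduler code, and all function names are hypothetical.

```python
# Simplified model of hierarchical maxRunningApps checks.
# NOT the actual FairScheduler implementation; all names are hypothetical.

QUEUE_MAX_APPS_DEFAULT = 3

# Explicit maxRunningApps per queue, as in the allocation file above.
# root has no explicit entry, so it falls back to the default.
explicit_limits = {"root.A": 15, "root.B": 10}


def limit_for(queue):
    """Return the explicit limit, or fall back to queueMaxAppsDefault."""
    return explicit_limits.get(queue, QUEUE_MAX_APPS_DEFAULT)


def path_to(queue):
    """Expand 'root.B' into ['root', 'root.B']."""
    parts = queue.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def submit(queue, running):
    """Run the app if every queue on the path has headroom, else leave it waiting."""
    path = path_to(queue)
    if all(running.get(q, 0) < limit_for(q) for q in path):
        for q in path:
            running[q] = running.get(q, 0) + 1
        return "RUNNING"
    return "ACCEPTED"


running = {}
states = [submit("root.B", running) for _ in range(5)]
print(states)
# ['RUNNING', 'RUNNING', 'RUNNING', 'ACCEPTED', 'ACCEPTED']
```

In this model the fourth and fifth apps wait even though root.B allows 10, because root has already reached the default of 3.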

Either the ResourceManager should have rejected the above FairScheduler XML, 
or the root queue should have been capped at the largest maxRunningApps value 
defined for a leaf queue.

Possible solution: if the root queue has no maxRunningApps set and 
queueMaxAppsDefault is lower than the maxRunningApps of an individual leaf 
queue, the root queue should implicitly be capped at the latter instead of at 
queueMaxAppsDefault.
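
A minimal sketch of that rule, assuming the implicit root cap is the largest maxRunningApps found on any leaf queue (or queueMaxAppsDefault if that is larger). The helper name {{implicit_root_cap}} is hypothetical and the XML is a trimmed copy of the allocation file above; this illustrates the proposal and is not a patch against FairScheduler.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the allocation file from this issue.
ALLOCATIONS = """
<allocations>
    <queue name="root">
        <queue name="default"/>
        <queue name="A"><maxRunningApps>15</maxRunningApps></queue>
        <queue name="B"><maxRunningApps>10</maxRunningApps></queue>
    </queue>
    <queueMaxAppsDefault>3</queueMaxAppsDefault>
</allocations>
"""


def implicit_root_cap(xml_text):
    """Hypothetical helper: the cap the root queue would get under the proposal."""
    doc = ET.fromstring(xml_text)
    root_queue = doc.find("queue[@name='root']")
    default_cap = int(doc.findtext("queueMaxAppsDefault"))

    # An explicit root setting wins; the proposal only covers the unset case.
    explicit_root = root_queue.findtext("maxRunningApps")
    if explicit_root is not None:
        return int(explicit_root)

    # Collect maxRunningApps from leaf queues (queues with no nested <queue>).
    leaf_caps = [
        int(q.findtext("maxRunningApps"))
        for q in root_queue.iter("queue")
        if q.find("queue") is None and q.find("maxRunningApps") is not None
    ]
    return max([default_cap] + leaf_caps)


print(implicit_root_cap(ALLOCATIONS))  # 15
```

With this rule the root queue in the example would be capped at 15 instead of 3, so root.B could reach its own limit of 10.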



