Siddharth Ahuja created YARN-10839:
--------------------------------------
Summary: queueMaxAppsDefault, when set, blindly caps the root
queue's maxRunningApps setting to this value, ignoring any individually
overridden maxRunningApps setting for child queues in FairScheduler
Key: YARN-10839
URL: https://issues.apache.org/jira/browse/YARN-10839
Project: Hadoop YARN
Issue Type: Bug
Reporter: Siddharth Ahuja
[queueMaxAppsDefault|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format]
sets the default running app limit for queues (including the root queue) which
can be overridden by individual child queues through the maxRunningApps setting.
Consider a simple FairScheduler XML as follows:
{code}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
    <queue name="root">
        <weight>1.0</weight>
        <schedulingPolicy>drf</schedulingPolicy>
        <aclSubmitApps>*</aclSubmitApps>
        <aclAdministerApps>*</aclAdministerApps>
        <queue name="default">
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="A">
            <minResources>1024000 mb, 1000 vcores</minResources>
            <maxRunningApps>15</maxRunningApps>
            <weight>2.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="B">
            <minResources>512000 mb, 500 vcores</minResources>
            <maxRunningApps>10</maxRunningApps>
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
    </queue>
    <queueMaxAppsDefault>3</queueMaxAppsDefault>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <queuePlacementPolicy>
        <rule name="specified" create="true"/>
        <rule name="user" create="true"/>
    </queuePlacementPolicy>
</allocations>
{code}
Here:
* {{queueMaxAppsDefault}} is set to 3, so any queue without its own {{maxRunningApps}} is limited to 3 running apps.
* The root queue does not have any {{maxRunningApps}} limit set.
* {{maxRunningApps}} for the child queues is 15 for root.A and 10 for root.B.
Given the above, if users want to submit jobs to root.B, they are (incorrectly)
capped to 3 running apps, not 10, because the root queue (the parent) is itself
capped to 3 by the queueMaxAppsDefault setting.
As a result, users see their apps stuck in the ACCEPTED state.
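To make the reported behaviour concrete, here is a minimal Python sketch. It is an illustrative model only, not the FairScheduler code. It assumes that an app may start running only when every queue on the path from the target queue up to root is below its effective limit, and that a queue without an explicit maxRunningApps falls back to queueMaxAppsDefault. The function names ({{effective_limit}}, {{can_run}}, {{submit}}) are hypothetical.

```python
# Illustrative model only; NOT the actual FairScheduler implementation.
QUEUE_MAX_APPS_DEFAULT = 3  # <queueMaxAppsDefault> from the allocation file

# Explicit <maxRunningApps> values from the allocation file above.
EXPLICIT_MAX_APPS = {"root.A": 15, "root.B": 10}


def effective_limit(queue):
    """Explicit maxRunningApps if set, otherwise the default."""
    return EXPLICIT_MAX_APPS.get(queue, QUEUE_MAX_APPS_DEFAULT)


def ancestors_and_self(queue):
    """For 'root.B' yield 'root.B', then 'root'."""
    parts = queue.split(".")
    for i in range(len(parts), 0, -1):
        yield ".".join(parts[:i])


def can_run(queue, running):
    """Assumed rule: every queue on the path to root must be under its limit."""
    return all(running.get(q, 0) < effective_limit(q)
               for q in ancestors_and_self(queue))


def submit(queue, running):
    """Start the app if allowed; count it against the queue and its ancestors."""
    if not can_run(queue, running):
        return False  # in this model the app would remain in ACCEPTED
    for q in ancestors_and_self(queue):
        running[q] = running.get(q, 0) + 1
    return True


running = {}
started = sum(submit("root.B", running) for _ in range(10))
print(started)  # prints 3: root's inherited limit of 3 is hit before root.B's 10
```

In this model the fourth submission to root.B is refused because root, which has no explicit limit, inherits the default of 3.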
Either the above FairScheduler XML should have been rejected by the
ResourceManager, or the root queue should have been capped to the maximum
maxRunningApps setting defined for a leaf queue.
Possible solution: if the root queue has no maxRunningApps set and
queueMaxAppsDefault is lower than the maxRunningApps of an individual leaf
queue, then the root queue should implicitly be capped to the latter instead
of queueMaxAppsDefault.
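The proposed rule could be sketched as follows in Python. This is a hedged illustration of the idea in this report, not a patch against FairScheduler. The helper name {{implied_root_limit}} is hypothetical, and for simplicity every non-root entry in the mapping is treated as a leaf queue.

```python
def implied_root_limit(default_limit, explicit_limits, root="root"):
    """Return the limit the root queue would get under the proposed rule.

    default_limit   -- value of queueMaxAppsDefault
    explicit_limits -- mapping of queue name to its explicit maxRunningApps
    """
    # An explicit maxRunningApps on root always wins.
    if root in explicit_limits:
        return explicit_limits[root]
    # Simplification: treat every other configured queue as a leaf.
    leaf_limits = [limit for queue, limit in explicit_limits.items()
                   if queue != root]
    # Root is raised to the highest leaf limit, never below the default.
    return max(default_limit, max(leaf_limits, default=default_limit))


print(implied_root_limit(3, {"root.A": 15, "root.B": 10}))  # prints 15
```

With the allocation file from this report, root would be capped at 15 rather than 3, so the limits of root.A and root.B would take effect.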