Siddharth Ahuja created YARN-10839:
--------------------------------------

             Summary: queueMaxAppsDefault, when set, blindly caps the root queue's maxRunningApps setting to this value, ignoring any individually overridden maxRunningApps setting for child queues in FairScheduler
                 Key: YARN-10839
                 URL: https://issues.apache.org/jira/browse/YARN-10839
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Siddharth Ahuja

[queueMaxAppsDefault|https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Allocation_file_format] sets the default running-app limit for every queue, including the root queue. Individual child queues can override it through the maxRunningApps setting.

Consider a simple FairScheduler XML as follows:
{code}
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<allocations>
    <queue name="root">
        <weight>1.0</weight>
        <schedulingPolicy>drf</schedulingPolicy>
        <aclSubmitApps>*</aclSubmitApps>
        <aclAdministerApps>*</aclAdministerApps>
        <queue name="default">
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="A">
            <minResources>1024000 mb, 1000 vcores</minResources>
            <maxRunningApps>15</maxRunningApps>
            <weight>2.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="B">
            <minResources>512000 mb, 500 vcores</minResources>
            <maxRunningApps>10</maxRunningApps>
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
    </queue>
    <queueMaxAppsDefault>3</queueMaxAppsDefault>
    <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
    <queuePlacementPolicy>
        <rule name="specified" create="true"/>
        <rule name="user" create="true"/>
    </queuePlacementPolicy>
</allocations>
{code}
Here:
* {{queueMaxAppsDefault}} is set to 3, so any queue without its own {{maxRunningApps}} is limited to 3 running apps.
* The root queue does not have any {{maxRunningApps}} limit set.
* {{maxRunningApps}} is 15 for root.A and 10 for root.B.

From the above, if users submit jobs to root.B, they are (incorrectly) capped to 3 running apps, not 10, because the root queue (the parent) is itself capped to 3 by the queueMaxAppsDefault setting. Users consequently see their apps stuck in the ACCEPTED state.

Either the above FairScheduler XML should have been rejected by the ResourceManager, or the root queue should have been capped to the maximum maxRunningApps setting defined for a leaf queue.

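The capping described above can be illustrated with a small model. This is only a sketch, not Hadoop code: it assumes that an app becomes runnable only when its leaf queue and every ancestor queue are below their own running-app limit, and that a queue without an explicit maxRunningApps falls back to queueMaxAppsDefault. All function names are hypothetical.

```python
# Simplified model (not Hadoop code): a leaf queue may start another app only
# if the leaf and every ancestor are below their own running-app limit.
QUEUE_MAX_APPS_DEFAULT = 3                    # <queueMaxAppsDefault>
EXPLICIT_MAX = {"root.A": 15, "root.B": 10}   # per-queue <maxRunningApps>


def max_running_apps(queue):
    """Explicit limit if configured, otherwise the global default."""
    return EXPLICIT_MAX.get(queue, QUEUE_MAX_APPS_DEFAULT)


def ancestry(queue):
    """Yield the queue and each ancestor, e.g. root.B -> root.B, root."""
    parts = queue.split(".")
    for end in range(len(parts), 0, -1):
        yield ".".join(parts[:end])


def can_run(queue, running):
    """True if no queue on the path to root has reached its limit."""
    return all(running.get(q, 0) < max_running_apps(q) for q in ancestry(queue))


def submit(queue, count):
    """Submit `count` apps to one queue; return how many become runnable."""
    running = {}
    started = 0
    for _ in range(count):
        if not can_run(queue, running):
            break  # the remaining apps would wait in ACCEPTED
        for q in ancestry(queue):
            running[q] = running.get(q, 0) + 1
        started += 1
    return started


print(submit("root.B", 10))  # 3: root inherits the default of 3
```

In this model root.B never reaches its own limit of 10, because the root queue hits the inherited limit of 3 first.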
Possible solution -> If the root queue has no maxRunningApps set and queueMaxAppsDefault is lower than the maxRunningApps of an individual leaf queue, then the root queue should implicitly be capped to the latter instead of queueMaxAppsDefault.
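The possible solution could be sketched roughly as follows. This is a hypothetical rule written for illustration, not existing FairScheduler behaviour. The function name and parameters are assumptions, and where several leaf queues set a limit the sketch takes the largest one, following the "maximum maxRunningApps setting defined for a leaf queue" wording in the description.

```python
def effective_root_limit(root_explicit, default, leaf_limits):
    """Hypothetical rule from the proposal, not existing FairScheduler code.

    root_explicit: maxRunningApps set on root, or None if unset
    default:       the queueMaxAppsDefault value
    leaf_limits:   mapping of leaf queue name -> explicit maxRunningApps
    """
    if root_explicit is not None:
        return root_explicit  # an explicit root setting always wins
    if not leaf_limits:
        return default        # nothing overrides the default
    # Leaves without an explicit limit still use the default, so the
    # implicit root cap is never lower than the default itself.
    return max(default, max(leaf_limits.values()))


# Values taken from the allocation file in this report.
print(effective_root_limit(None, 3, {"root.A": 15, "root.B": 10}))  # 15
```

With the values from the report, the root queue would be implicitly capped at 15 instead of 3, so root.B could reach its own limit of 10.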