[
https://issues.apache.org/jira/browse/YARN-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17053309#comment-17053309
]
Prabhu Joseph commented on YARN-9879:
-------------------------------------
Thanks [~shuzirra] for the patch. I have tested the scenarios below with the
patch, and it works fine except for two issues.
1. Job Submission with leaf queuename and full queue path.
2. Queue Placement
3. Auto Creation of Leaf Queue.
4. RM UI
5. RMWebService Scheduler response.
6. RMAdminService RefreshQueues
7. Scheduler Configuration Mutation API - add / remove / update queue.
8. Recovery
9. RM JMX Metrics - YARN-9772
*Issue 1: RM fails to start when a dynamic parent queue "batch"
(auto-create-child-queue.enabled=true) and another leaf queue "batch" both exist.*
CS Config:
root.batch -> (auto-create-child-queue.enabled=true)
root.default
root.A.batch
yarn.scheduler.capacity.queue-mappings = u:%user:batch.%user
{code:java}
2020-03-06 00:54:59,239 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:109)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:876)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:1288)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:339)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1576)
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to initialize queues
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:757)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initScheduler(CapacityScheduler.java:342)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.serviceInit(CapacityScheduler.java:418)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    ... 7 more
Caused by: java.io.IOException: mapping contains invalid or non-leaf queue [%user] and invalid parent queue [batch]
    at org.apache.hadoop.yarn.server.resourcemanager.placement.QueuePlacementRuleUtils.validateQueueMappingUnderParentQueue(QueuePlacementRuleUtils.java:50)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.validateAndGetAutoCreatedQueueMapping(UserGroupMappingPlacementRule.java:363)
    at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.initialize(UserGroupMappingPlacementRule.java:298)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getUserGroupMappingPlacementRule(CapacityScheduler.java:674)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.updatePlacementRules(CapacityScheduler.java:709)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.initializeQueues(CapacityScheduler.java:750)
{code}
*Complete CS config to reproduce the above issue:*
{code:xml}
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
<property>
<name>yarn.scheduler.capacity.root.batch.leaf-queue-template.capacity</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings</name>
<value>u:%user:batch.%user</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.auto-create-child-queue.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,batch,A</value>
</property>
<property>
<name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
<value>false</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.capacity</name>
<value>100</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.batch.capacity</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.A.capacity</name>
<value>20</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.A.queues</name>
<value>batch</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.A.batch.capacity</name>
<value>100</value>
</property>
</configuration>
{code}
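To make the ambiguity behind Issue 1 concrete, here is a minimal, self-contained sketch. It is not code from the patch: the class {{MappingParentLookup}} and all of its methods are hypothetical, and the rule of preferring queues with auto-create enabled is only an assumption about one possible way to disambiguate. It shows that the parent name "batch" from the mapping matches two queues by short name, and that only one of them has auto-create-child-queue.enabled=true.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the YARN-9879 patch. */
public class MappingParentLookup {

  // Full queue path -> whether auto-create-child-queue.enabled is true.
  private final Map<String, Boolean> queues = new LinkedHashMap<>();

  public void addQueue(String fullPath, boolean autoCreateEnabled) {
    queues.put(fullPath, autoCreateEnabled);
  }

  /** Returns the part of a queue path after the last dot. */
  public static String shortName(String fullPath) {
    return fullPath.substring(fullPath.lastIndexOf('.') + 1);
  }

  /** Extracts "parent" from a mapping such as "u:%user:parent.leaf". */
  public static String parentOfMapping(String mapping) {
    String target = mapping.substring(mapping.lastIndexOf(':') + 1);
    int dot = target.lastIndexOf('.');
    if (dot < 0) {
      throw new IllegalArgumentException("mapping has no parent: " + mapping);
    }
    return target.substring(0, dot);
  }

  /** All queues whose short name equals the given name. */
  public List<String> candidates(String name) {
    List<String> result = new ArrayList<>();
    for (String path : queues.keySet()) {
      if (shortName(path).equals(name)) {
        result.add(path);
      }
    }
    return result;
  }

  /** Candidates that have auto-create enabled (assumed disambiguation rule). */
  public List<String> autoCreateCandidates(String name) {
    List<String> result = new ArrayList<>();
    for (String path : candidates(name)) {
      if (queues.get(path)) {
        result.add(path);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    MappingParentLookup lookup = new MappingParentLookup();
    lookup.addQueue("root.batch", true);
    lookup.addQueue("root.default", false);
    lookup.addQueue("root.A", false);
    lookup.addQueue("root.A.batch", false);

    String parent = parentOfMapping("u:%user:batch.%user");
    System.out.println(lookup.candidates(parent));
    System.out.println(lookup.autoCreateCandidates(parent));
  }
}
```

With the queue layout from the config above, the short name "batch" yields two candidates, while filtering on the auto-create flag leaves only root.batch.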
*Issue 2:*
*RM starts fine with the queue config below, but submitting a job with queue
name "A" fails. Job submission works fine when the full queue path root.B.A is
specified. There is only one leaf queue named "A", so the placement should be
able to find it, right?*
root.A.B
root.B.A
{code:java}
yarn jar /HADOOP/hadoop-3.3.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.3.0-SNAPSHOT-tests.jar sleep -Dmapreduce.job.queuename=A -m 1 -mt 1
Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1583486216805_0002 to YARN : Application application_1583486216805_0002 submitted by user hive to unknown queue: A
    at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.submitApplication(YarnClientImpl.java:336)
    at org.apache.hadoop.mapred.ResourceMgrDelegate.submitApplication(ResourceMgrDelegate.java:304)
    at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:331)
    ... 25 more
{code}
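The behaviour I would expect for Issue 2 can be sketched roughly as follows. This is a hypothetical, standalone illustration and not the patch's implementation: {{LeafQueueResolver}} and its methods are made-up names, and treating any queue that has no children as a leaf is an assumption. A short name resolves only when exactly one leaf queue carries it, and parent queues with the same short name are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration only; not part of the YARN-9879 patch. */
public class LeafQueueResolver {

  // Short name -> full paths of all leaf queues that carry that short name.
  private final Map<String, List<String>> leafPathsByShortName = new HashMap<>();

  public LeafQueueResolver(Collection<String> allQueuePaths) {
    for (String path : allQueuePaths) {
      if (isLeaf(path, allQueuePaths)) {
        String shortName = path.substring(path.lastIndexOf('.') + 1);
        leafPathsByShortName
            .computeIfAbsent(shortName, k -> new ArrayList<>())
            .add(path);
      }
    }
  }

  // Assumption: a queue is a leaf if no other queue path lies beneath it.
  private static boolean isLeaf(String path, Collection<String> all) {
    String prefix = path + ".";
    for (String other : all) {
      if (other.startsWith(prefix)) {
        return false;
      }
    }
    return true;
  }

  /** Resolves a short name or a full path to the full path of a leaf queue. */
  public Optional<String> resolve(String queueName) {
    String shortName = queueName.substring(queueName.lastIndexOf('.') + 1);
    List<String> matches = leafPathsByShortName.get(shortName);
    if (matches == null) {
      return Optional.empty();
    }
    if (queueName.contains(".")) {
      // Full path given: it must match a known leaf exactly.
      return matches.contains(queueName) ? Optional.of(queueName) : Optional.empty();
    }
    // Short name given: resolve only if it is unambiguous among leaf queues.
    return matches.size() == 1 ? Optional.of(matches.get(0)) : Optional.empty();
  }

  public static void main(String[] args) {
    LeafQueueResolver resolver = new LeafQueueResolver(
        Arrays.asList("root", "root.A", "root.A.B", "root.B", "root.B.A"));
    System.out.println(resolver.resolve("A"));
    System.out.println(resolver.resolve("root.B.A"));
  }
}
```

Under these assumptions both "A" and "root.B.A" resolve to root.B.A, because root.A is a parent queue and is never registered as a leaf.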
> Allow multiple leaf queues with the same name in CS
> ---------------------------------------------------
>
> Key: YARN-9879
> URL: https://issues.apache.org/jira/browse/YARN-9879
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Gergely Pollak
> Assignee: Gergely Pollak
> Priority: Major
> Labels: fs2cs
> Attachments: CSQueue.getQueueUsage.txt, DesignDoc_v1.pdf,
> YARN-9879.POC001.patch, YARN-9879.POC002.patch, YARN-9879.POC003.patch,
> YARN-9879.POC004.patch, YARN-9879.POC005.patch, YARN-9879.POC006.patch,
> YARN-9879.POC007.patch, YARN-9879.POC008.patch, YARN-9879.POC009.patch,
> YARN-9879.POC010.patch, YARN-9879.POC011.patch
>
>
> Currently the leaf queue's name must be unique regardless of its position in
> the queue hierarchy.
> Design doc and first proposal is being made, I'll attach it as soon as it's
> done.