Prabhu Joseph created YARN-10287:
------------------------------------
Summary: Update scheduler-conf corrupts the CS configuration when
removing queue which is referred in queue mapping
Key: YARN-10287
URL: https://issues.apache.org/jira/browse/YARN-10287
Project: Hadoop YARN
Issue Type: Sub-task
Components: capacity scheduler
Affects Versions: 3.3.0
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph
Updating the configuration through scheduler-conf corrupts the Capacity
Scheduler (CS) configuration when the update removes a queue that is referenced
in a queue mapping. The deletion fails with the error message below, but the
queue is removed anyway and job submission fails, yet the queue is not removed
from the ZKConfigurationStore. On a subsequent modification through
scheduler-conf, the queue reappears from the ZKConfigurationStore.
{code}
2020-05-22 12:38:38,252 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
java.io.IOException: Failed to re-init queues : mapping contains invalid or non-leaf queue Prod
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:478)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2389)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2377)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
    at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.updateSchedulerConfiguration(RMWebServices.java:2377)
{code}
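The error is raised because a queue mapping still points at the queue being removed. As a rough illustration only, a client could check for such references before sending the removal. `mappings_referencing` is a hypothetical helper, not part of YARN, and it assumes comma-separated mapping entries of the form `<u|g>:<name>:<queue>`, where the queue is given either as a leaf name or as a full path.

```python
def mappings_referencing(queue_path, queue_mappings):
    """Return the queue-mapping entries whose target is the given queue.

    Hypothetical client-side pre-check; not an API provided by YARN.
    """
    leaf = queue_path.rsplit(".", 1)[-1]
    hits = []
    for entry in queue_mappings.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Assumed entry format: <u|g>:<name>:<queue>
        parts = entry.split(":")
        if len(parts) != 3:
            continue
        target = parts[2].strip()
        # A mapping may name the queue by leaf name or by full path.
        if target == queue_path or target == leaf:
            hits.append(entry)
    return hits


conf = {
    "yarn.scheduler.capacity.root.queues": "default,dummy",
    "yarn.scheduler.capacity.queue-mappings": "g:hadoop:dummy",
}
blocking = mappings_referencing(
    "root.dummy", conf["yarn.scheduler.capacity.queue-mappings"]
)
print(blocking)  # ['g:hadoop:dummy']
```

A non-empty result means the removal request would be rejected at re-initialization, so the mapping would need to be changed first.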
*Repro:*
{code}
1. Setup Queue Mapping
yarn.scheduler.capacity.root.queues=default,dummy
yarn.scheduler.capacity.queue-mappings=g:hadoop:dummy

2. Stop the root.dummy queue
<update-queue>
  <queue-name>root.dummy</queue-name>
  <params>
    <entry>
      <key>state</key>
      <value>STOPPED</value>
    </entry>
  </params>
</update-queue>

3. Delete the root.dummy queue
curl --negotiate -u : -X PUT -d @abc.xml -H "Content-type: application/xml" 'http://<RM_IP>:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'
<sched-conf>
  <update-queue>
    <queue-name>root.default</queue-name>
    <params>
      <entry>
        <key>capacity</key>
        <value>100</value>
      </entry>
    </params>
  </update-queue>
  <remove-queue>root.dummy</remove-queue>
</sched-conf>
{code}
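For scripting the repro, the request body from step 3 can be generated instead of written by hand. This is a minimal sketch using only the Python standard library; `build_sched_conf` is a hypothetical helper name, and the element names are taken from the XML in the steps above.

```python
import xml.etree.ElementTree as ET


def build_sched_conf(updates, removals):
    """Build a <sched-conf> body (hypothetical helper, not a YARN API).

    updates:  dict mapping queue path -> dict of parameter key/value pairs
    removals: list of queue paths to remove
    """
    root = ET.Element("sched-conf")
    for queue_name, params in updates.items():
        update = ET.SubElement(root, "update-queue")
        ET.SubElement(update, "queue-name").text = queue_name
        params_el = ET.SubElement(update, "params")
        for key, value in params.items():
            entry = ET.SubElement(params_el, "entry")
            ET.SubElement(entry, "key").text = key
            ET.SubElement(entry, "value").text = str(value)
    for queue_name in removals:
        ET.SubElement(root, "remove-queue").text = queue_name
    return ET.tostring(root, encoding="unicode")


# Same content as the body sent in step 3.
payload = build_sched_conf({"root.default": {"capacity": 100}}, ["root.dummy"])
print(payload)
```

The resulting string can be saved to a file such as abc.xml and sent with the curl command from step 3.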