Prabhu Joseph created YARN-10287:
------------------------------------

             Summary: Update scheduler-conf corrupts the CS configuration when 
removing queue which is referred in queue mapping
                 Key: YARN-10287
                 URL: https://issues.apache.org/jira/browse/YARN-10287
             Project: Hadoop YARN
          Issue Type: Sub-task
          Components: capacity scheduler
    Affects Versions: 3.3.0
            Reporter: Prabhu Joseph
            Assignee: Prabhu Joseph


Updating scheduler-conf corrupts the CS configuration when removing a queue 
that is referenced in a queue mapping. The deletion fails with the error 
message below, yet the queue is removed and job submission fails, while the 
queue is not removed from the ZKConfigurationStore. On a subsequent 
modification through scheduler-conf, the queue reappears from the 
ZKConfigurationStore.

{code}
2020-05-22 12:38:38,252 ERROR org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices: Exception thrown when modifying configuration.
java.io.IOException: Failed to re-init queues : mapping contains invalid or non-leaf queue Prod
        at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:478)
        at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:430)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2389)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices$13.run(RMWebServices.java:2377)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices.updateSchedulerConfiguration(RMWebServices.java:2377)
{code}

*Repro:*

{code}
1. Setup Queue Mapping

yarn.scheduler.capacity.root.queues=default,dummy
yarn.scheduler.capacity.queue-mappings=g:hadoop:dummy

2. Stop the root.dummy queue

<update-queue>
  <queue-name>root.dummy</queue-name>
  <params>
    <entry>
      <key>state</key>
      <value>STOPPED</value>
    </entry>
  </params>
</update-queue>
         
         
3. Delete the root.dummy queue

curl --negotiate -u : -X PUT -d @abc.xml -H "Content-type: application/xml" \
  'http://<RM_IP>:8088/ws/v1/cluster/scheduler-conf?user.name=yarn'

<sched-conf>
  <update-queue>
    <queue-name>root.default</queue-name>
    <params>
      <entry>
        <key>capacity</key>
        <value>100</value>
      </entry>
    </params>
  </update-queue>

  <remove-queue>root.dummy</remove-queue>
</sched-conf>
{code}
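
The conflict in the repro can be spotted before the request is sent. The following is a minimal client-side sketch in Python, not the CapacityScheduler's own validation and not the fix for this issue. It assumes queue mappings use the comma-separated [u|g]:[name]:[queue_name] form shown in step 1, and that a mapping may name a queue either by leaf name or by full path. The function names mapped_queues and removals_still_mapped are hypothetical.

```python
import xml.etree.ElementTree as ET


def mapped_queues(queue_mappings):
    """Return the set of queue names referenced by a queue-mappings value.

    Assumes each comma-separated entry has the form [u|g]:[name]:[queue_name].
    """
    queues = set()
    for entry in queue_mappings.split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"malformed queue mapping: {entry!r}")
        queues.add(parts[2])
    return queues


def removals_still_mapped(sched_conf_xml, queue_mappings):
    """List <remove-queue> paths whose queue is still a mapping target."""
    targets = mapped_queues(queue_mappings)
    root = ET.fromstring(sched_conf_xml)
    conflicts = []
    for node in root.iter("remove-queue"):
        path = (node.text or "").strip()
        leaf = path.rsplit(".", 1)[-1]
        # Assumption: a mapping may name the queue by leaf name or full path.
        if path in targets or leaf in targets:
            conflicts.append(path)
    return conflicts


# Same payload as abc.xml in step 3 of the repro.
SCHED_CONF = """
<sched-conf>
  <update-queue>
    <queue-name>root.default</queue-name>
    <params>
      <entry><key>capacity</key><value>100</value></entry>
    </params>
  </update-queue>
  <remove-queue>root.dummy</remove-queue>
</sched-conf>
"""

print(removals_still_mapped(SCHED_CONF, "g:hadoop:dummy"))  # ['root.dummy']
```

A non-empty result means the update would remove a queue that yarn.scheduler.capacity.queue-mappings still points to, which is the condition that triggers the failure above.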




