Wang, Xinglong created YARN-9980:
------------------------------------

             Summary: App hangs in accepted when moved from DEFAULT_PARTITION 
queue to an exclusive partition queue
                 Key: YARN-9980
                 URL: https://issues.apache.org/jira/browse/YARN-9980
             Project: Hadoop YARN
          Issue Type: Improvement
            Reporter: Wang, Xinglong
            Assignee: Wang, Xinglong
         Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png

App hangs in the ACCEPTED state when moved from a DEFAULT_PARTITION queue to an
exclusive partition queue.

queue_root
  queue_a   ----- default_partition
  queue_b   ----- exclusive partition x, default partition is x

When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM 
assigns default_partition as the app's AM_LABEL_EXPRESSION; the app then gets 
am1 and runs. If the app is later moved to queue_b and am1 is preempted, 
killed, or fails, the RM schedules another attempt, am2, if the AM retry limit 
allows. This time, however, the resource request for am2 still carries 
AM_LABEL_EXPRESSION = default_partition. Because queue_b doesn't have any 
resources in default_partition, the app stays in the ACCEPTED state forever in 
the RM UI.

My understanding is that the app was submitted with no AM_LABEL_EXPRESSION, 
and the code base already allows such an app to run in its current queue's 
default partition. The move-queue scenario should therefore also let the app 
run successfully. That means am2 should get resources from queue_b's default 
partition x instead of pending forever.
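
The difference between the two behaviors can be sketched as follows. This is 
only an illustrative model, not the RM code: the class AmLabelSketch, the 
Queue type, and resolveAmLabel are hypothetical names, and the string 
"default_partition" is used as a placeholder label to match the description 
above.

```java
import java.util.Optional;

// Illustrative model only: none of these types are real YARN classes.
class AmLabelSketch {

    // Hypothetical stand-in for a scheduler queue and its default partition.
    static final class Queue {
        final String name;
        final String defaultPartition;

        Queue(String name, String defaultPartition) {
            this.name = name;
            this.defaultPartition = defaultPartition;
        }
    }

    // If the user set an AM label expression, keep it; otherwise fall back
    // to the default partition of the queue passed in.
    static String resolveAmLabel(Optional<String> userLabel, Queue queue) {
        return userLabel.orElse(queue.defaultPartition);
    }

    public static void main(String[] args) {
        Queue queueA = new Queue("queue_a", "default_partition");
        Queue queueB = new Queue("queue_b", "x");
        Optional<String> unset = Optional.empty();

        // Reported behavior: the label is resolved once, against the
        // submission queue, and reused for am2 after the move to queue_b.
        String resolvedAtSubmission = resolveAmLabel(unset, queueA);
        String am2Reported = resolvedAtSubmission;

        // Suggested behavior: resolve again for each new AM attempt,
        // against the queue the app is in at that time.
        String am2Suggested = resolveAmLabel(unset, queueB);

        System.out.println("am2 label (reported):  " + am2Reported);
        System.out.println("am2 label (suggested): " + am2Suggested);
    }
}
```

In this model an explicitly set label is never overridden; only apps that left 
AM_LABEL_EXPRESSION unset follow the queue they currently belong to.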

In our production environment, we have a landing queue in default_partition 
and a routing mechanism that moves apps from this queue to other queues, 
including queues with exclusive partitions.

 !Screen Shot 2019-11-14 at 5.11.39 PM.png! 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

