[ 
https://issues.apache.org/jira/browse/YARN-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang, Xinglong updated YARN-9980:
---------------------------------
    Attachment: YARN-9980.001.patch

> App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive 
> partition queue
> ---------------------------------------------------------------------------------------------
>
>                 Key: YARN-9980
>                 URL: https://issues.apache.org/jira/browse/YARN-9980
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Wang, Xinglong
>            Assignee: Wang, Xinglong
>            Priority: Minor
>         Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png, 
> YARN-9980.001.patch
>
>
> App hangs in accepted state when moved from a DEFAULT_PARTITION queue to an 
> exclusive partition queue.
> queue_root
> queue_a   ----- default_partition
> queue_b   ----- exclusive partition x, default partition is x
> When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM 
> assigns default_partition as the app's AM_LABEL_EXPRESSION, and the app gets 
> am1 and runs. If the app is later moved to queue_b and am1 is 
> preempted/killed/failed, the RM schedules another attempt, am2, if the AM 
> retry limit allows. This time, however, the resource request for am2 still 
> carries AM_LABEL_EXPRESSION = default_partition. Because queue_b has no 
> resources in default_partition, the app stays in the accepted state forever 
> in the RM UI.
> My understanding is that the app was submitted with no AM_LABEL_EXPRESSION, 
> and the code base already allows such an app to run in its current queue's 
> default partition.
> In the move-queue scenario, we should also let the app run successfully. 
> That means am2 should get resources from queue_b's default partition x 
> instead of pending forever.
> In production, we have a landing queue in default_partition and a routing 
> mechanism that moves apps from this queue to other queues, including queues 
> with an exclusive partition.
>  !Screen Shot 2019-11-14 at 5.11.39 PM.png! 
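
The behavior requested above can be sketched as a small toy model. This is only an illustration under stated assumptions, not the attached patch and not YARN's actual classes: Queue, App, moveTo and resolveAmLabel are hypothetical names, and the string "default_partition" simply mirrors the wording of the report. The idea is that an app submitted without an AM label resolves its label from its current queue at each AM attempt, rather than reusing the value fixed at submission.

```java
import java.util.Optional;

/** Toy model of AM label resolution; these are not YARN's real classes. */
public class AmLabelResolution {

    /** Hypothetical queue: a name plus the partition it uses by default. */
    static final class Queue {
        final String name;
        final String defaultPartition;

        Queue(String name, String defaultPartition) {
            this.name = name;
            this.defaultPartition = defaultPartition;
        }
    }

    /** Hypothetical app: remembers whether the submitter set an AM label. */
    static final class App {
        final Optional<String> requestedAmLabel;
        Queue queue;

        App(Optional<String> requestedAmLabel, Queue queue) {
            this.requestedAmLabel = requestedAmLabel;
            this.queue = queue;
        }

        void moveTo(Queue target) {
            this.queue = target;
        }
    }

    /**
     * Resolve the AM label at each attempt: an explicit label from the
     * submitter wins, otherwise fall back to the current queue's default.
     */
    static String resolveAmLabel(App app) {
        return app.requestedAmLabel.orElse(app.queue.defaultPartition);
    }

    public static void main(String[] args) {
        Queue queueA = new Queue("queue_a", "default_partition");
        Queue queueB = new Queue("queue_b", "x");

        // Submitted with AM_LABEL_EXPRESSION unset.
        App app = new App(Optional.empty(), queueA);
        System.out.println("am1 label: " + resolveAmLabel(app));

        // After the move, a retried AM attempt resolves against queue_b.
        app.moveTo(queueB);
        System.out.println("am2 label: " + resolveAmLabel(app));
    }
}
```

In this model, an app that did set an explicit AM label keeps it across moves; only the unset case follows the queue. Whether the real fix should behave this way for explicitly labeled apps is not stated in the report.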



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

