[
https://issues.apache.org/jira/browse/YARN-9980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wang, Xinglong updated YARN-9980:
---------------------------------
Attachment: YARN-9980.001.patch
> App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive
> partition queue
> ---------------------------------------------------------------------------------------------
>
> Key: YARN-9980
> URL: https://issues.apache.org/jira/browse/YARN-9980
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Wang, Xinglong
> Assignee: Wang, Xinglong
> Priority: Minor
> Attachments: Screen Shot 2019-11-14 at 5.11.39 PM.png,
> YARN-9980.001.patch
>
>
> App hangs in accepted when moved from DEFAULT_PARTITION queue to an exclusive
> partition queue.
> queue_root
>   queue_a ----- default_partition
>   queue_b ----- exclusive partition x, default partition is x
> When an app is submitted to queue_a with AM_LABEL_EXPRESSION unset, the RM
> assigns default_partition as the app's AM_LABEL_EXPRESSION; the app then gets
> am1 and runs. If the app is later moved to queue_b and am1 is preempted,
> killed, or fails, the RM schedules another attempt, am2, if the AM retry
> limit allows. This time, however, the resource request for am2 still carries
> AM_LABEL_EXPRESSION = default_partition. Because queue_b has no resources in
> default_partition, the app stays in the accepted state forever in the RM UI.
> My understanding is that, because the app was submitted with no
> AM_LABEL_EXPRESSION, the code base already allows such an app to run in the
> default partition of its current queue.
> The move-queue scenario should behave the same way and let the app run
> successfully. That means am2 should get resources from queue_b's default
> partition x instead of pending forever.
> In our production cluster, we have a landing queue on default_partition and
> a routing mechanism that moves apps from this queue to other queues,
> including queues with an exclusive partition.
> !Screen Shot 2019-11-14 at 5.11.39 PM.png!
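
The scenario in the description can be modelled with a small, self-contained Python sketch. It is not YARN code and not the attached patch: the `Queue` and `App` classes and all function names are hypothetical, and the sketch assumes, as the description states, that an unset AM_LABEL_EXPRESSION is resolved to the queue's default partition once at submission time. It contrasts a move that keeps the resolved label with a move that re-resolves it against the target queue.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical name for the default partition, used only in this sketch.
DEFAULT_PARTITION = "default_partition"


@dataclass(frozen=True)
class Queue:
    name: str
    partitions: frozenset      # partitions in which the queue has resources
    default_partition: str     # used when a request carries no label


@dataclass
class App:
    queue: Queue
    requested_am_label: Optional[str]  # None: AM_LABEL_EXPRESSION unset at submission
    am_label: str = ""                 # label attached to AM resource requests


def submit(queue, requested_am_label=None):
    """Resolve the AM label once, at submission (assumption from the description)."""
    if requested_am_label is not None:
        label = requested_am_label
    else:
        label = queue.default_partition
    return App(queue, requested_am_label, label)


def move_keeping_label(app, target):
    """Reported behaviour: the move leaves the resolved AM label untouched."""
    app.queue = target


def move_re_resolving_label(app, target):
    """Requested behaviour: re-resolve the label if the user never set one."""
    app.queue = target
    if app.requested_am_label is None:
        app.am_label = target.default_partition


def can_schedule_am(app):
    """A new AM attempt can be placed only if the queue has resources in its partition."""
    return app.am_label in app.queue.partitions


queue_a = Queue("queue_a", frozenset({DEFAULT_PARTITION}), DEFAULT_PARTITION)
queue_b = Queue("queue_b", frozenset({"x"}), "x")

# am2 still asks for default_partition, which queue_b does not have.
stuck = submit(queue_a)
move_keeping_label(stuck, queue_b)

# am2 asks for queue_b's default partition x instead.
fixed = submit(queue_a)
move_re_resolving_label(fixed, queue_b)
```

In this model `can_schedule_am(stuck)` is false, which corresponds to the app staying in accepted, while `can_schedule_am(fixed)` is true. An app submitted with an explicit label keeps that label after the move in both variants, since only the unset case is re-resolved.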
--
This message was sent by Atlassian Jira
(v8.3.4#803005)