[
https://issues.apache.org/jira/browse/YARN-8232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459656#comment-16459656
]
Hu Ziqian edited comment on YARN-8232 at 5/1/18 12:29 PM:
----------------------------------------------------------
Hi [~leftnoteasy], I've added a new patch with a unit test.
was (Author: ziqian hu):
hi [~leftnoteasy], I've add a new patch with an unit test.
> RMContainer lost queue name when RM HA happens
> ----------------------------------------------
>
> Key: YARN-8232
> URL: https://issues.apache.org/jira/browse/YARN-8232
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Affects Versions: 2.8.3
> Reporter: Hu Ziqian
> Assignee: Hu Ziqian
> Priority: Major
> Attachments: YARN-8232-branch-2.8.3.001.patch, YARN-8232.001.patch,
> YARN-8232.002.patch
>
>
> RMContainer has a member variable, queuename, that stores which queue the
> container belongs to. When an RM HA failover happens and RMContainers are
> recovered by the scheduler based on NM reports, the queue name isn't
> recovered and is always null.
> This causes several problems; here is one case in preemption. When more than
> one preemption selector is in use (for example, when intra-queue preemption
> is enabled), preemption uses the container's queue name to deduct
> preemptable resources.
> The details are in
> {code:java}
> CapacitySchedulerPreemptionUtils.deductPreemptableResourcesBasedSelectedCandidates(){code}
> If the container's queue name is null, this function throws a
> YarnRuntimeException when it tries to get the container's
> TempQueuePerPartition, and the preemption fails.
> Our patch solves this problem by setting the container's queue name when
> containers are recovered. The patch is based on branch-2.8.3.
>
>
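The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use the real YARN classes: SketchContainer, SketchQueueUsage, deductPreemptable, recoverWithoutQueue and recoverWithQueue are hypothetical stand-ins, and a plain IllegalStateException stands in for the YarnRuntimeException mentioned in the description. It only shows why a null queue name breaks a per-queue lookup and how setting the name during recovery avoids that.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration only; none of these types are the real YARN classes.
class QueueNameRecoverySketch {

    /** Hypothetical stand-in for RMContainer, reduced to the field that matters here. */
    static final class SketchContainer {
        final String containerId;
        String queueName; // stays null if recovery never sets it

        SketchContainer(String containerId) {
            this.containerId = containerId;
        }
    }

    /** Hypothetical per-queue bookkeeping, loosely modelled on TempQueuePerPartition. */
    static final class SketchQueueUsage {
        long preemptableMb;

        SketchQueueUsage(long preemptableMb) {
            this.preemptableMb = preemptableMb;
        }
    }

    /** Deducts a selected container's memory from its queue's preemptable budget. */
    static void deductPreemptable(Map<String, SketchQueueUsage> usageByQueue,
                                  SketchContainer container, long containerMb) {
        // A null queue name finds no entry, so the lookup below returns null.
        SketchQueueUsage usage = usageByQueue.get(container.queueName);
        if (usage == null) {
            // Assumption: stands in for the YarnRuntimeException from the report.
            throw new IllegalStateException("No queue usage for container "
                + container.containerId + " (queueName=" + container.queueName + ")");
        }
        usage.preemptableMb -= containerMb;
    }

    /** Recovery as described in the report: the queue name is never restored. */
    static SketchContainer recoverWithoutQueue(String containerId) {
        return new SketchContainer(containerId);
    }

    /** Recovery with the idea of the patch: the queue name is set on recovery. */
    static SketchContainer recoverWithQueue(String containerId, String queueName) {
        SketchContainer container = new SketchContainer(containerId);
        container.queueName = queueName;
        return container;
    }

    public static void main(String[] args) {
        Map<String, SketchQueueUsage> usageByQueue = new HashMap<>();
        usageByQueue.put("default", new SketchQueueUsage(4096));

        try {
            deductPreemptable(usageByQueue, recoverWithoutQueue("container_1"), 1024);
        } catch (IllegalStateException e) {
            System.out.println("Without queue name: " + e.getMessage());
        }

        deductPreemptable(usageByQueue, recoverWithQueue("container_2", "default"), 1024);
        System.out.println("With queue name, remaining preemptable MB: "
            + usageByQueue.get("default").preemptableMb);
    }
}
```

In this sketch the only difference between the failing and the working path is whether the recovered container carries its queue name, which mirrors the change the description attributes to the patch.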