[
https://issues.apache.org/jira/browse/YARN-2354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Li Lu updated YARN-2354:
------------------------
Attachment: YARN-2354-072514.patch
The problem was with numRequestedContainers. In the previous version it was
initially set to numTotalContainers - previousAMRunningContainers.size(). Then,
on container completion, the number of containers that need to be relaunched is
calculated as numTotalContainers - numRequestedContainers, which with that
initial value normally equals previousAMRunningContainers.size() rather than 0,
so that many extra containers get requested. If the containers are not reused
(no -keep_containers_across_application_attempts), there are no
previousAMRunningContainers, so this problem only occurs when
-keep_containers_across_application_attempts is set.
I'm also fixing the testDSRestartWithPreviousRunningContainers UT associated
with this issue.
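To make the arithmetic concrete, here is a minimal standalone sketch of the
counter logic described above. It is not the patch itself: the class name, the
askCountOnCompletion helper and the sample numbers (4 total containers, 3 kept
from the previous attempt) are assumptions for illustration, and it uses no
YARN classes.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; not the actual ApplicationMaster code.
class RequestedContainersSketch {

    // Hypothetical helper mirroring the completion-time calculation:
    // ask for whatever is still missing to reach numTotalContainers.
    static int askCountOnCompletion(int numTotalContainers,
                                    AtomicInteger numRequestedContainers) {
        int askCount = numTotalContainers - numRequestedContainers.get();
        numRequestedContainers.addAndGet(askCount);
        return askCount;
    }

    public static void main(String[] args) {
        int numTotalContainers = 4;   // assumed -num_containers value
        int previousAMRunning = 3;    // assumed containers kept across attempts

        // Previous behaviour: counter starts at total minus previous containers.
        AtomicInteger before =
            new AtomicInteger(numTotalContainers - previousAMRunning);
        // Proposed behaviour: counter starts at the total.
        AtomicInteger after = new AtomicInteger(numTotalContainers);

        System.out.println("extra asks before fix: "
            + askCountOnCompletion(numTotalContainers, before));
        System.out.println("extra asks after fix:  "
            + askCountOnCompletion(numTotalContainers, after));
    }
}
```

With these sample numbers the first call asks for 3 additional containers even
though nothing failed, while the second asks for none.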
> DistributedShell may allocate more containers than client specified after it
> restarts
> -------------------------------------------------------------------------------------
>
> Key: YARN-2354
> URL: https://issues.apache.org/jira/browse/YARN-2354
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Jian He
> Assignee: Li Lu
> Attachments: YARN-2354-072514.patch
>
>
> To reproduce, run distributed shell with the -num_containers option.
> In ApplicationMaster.java, the following code has an issue:
> {code}
> int numTotalContainersToRequest =
>     numTotalContainers - previousAMRunningContainers.size();
> for (int i = 0; i < numTotalContainersToRequest; ++i) {
>   ContainerRequest containerAsk = setupContainerAskForRM();
>   amRMClient.addContainerRequest(containerAsk);
> }
> numRequestedContainers.set(numTotalContainersToRequest);
> {code}
> numRequestedContainers doesn't account for the previous AM's requested
> containers, so numRequestedContainers should be set to numTotalContainers.
--
This message was sent by Atlassian JIRA
(v6.2#6252)