Li Lu updated YARN-2354:

    Attachment: YARN-2354-072514.patch

The problem was with numRequestedContainers. In the previous version it was 
initially set to numTotalContainers - previousAMRunningContainers.size(). Then, 
on container completion, the number of containers that need to be relaunched is 
calculated as numTotalContainers - numRequestedContainers, which normally 
equals previousAMRunningContainers.size(). If containers are not reused 
(no -keep_containers_across_application_attempts), there are no 
previousAMRunningContainers, so the problem occurs only when 
-keep_containers_across_application_attempts is set. 
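
As a rough illustration of the arithmetic described above (not the actual 
ApplicationMaster code), the sketch below compares the two initial values of 
numRequestedContainers. The class RequestAccounting and the helper 
extraAsksOnCompletion are hypothetical names made up for this example; only the 
formula numTotalContainers - numRequestedContainers comes from the description 
above, and the container counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RequestAccounting {

    // Hypothetical helper: models the relaunch calculation performed on
    // container completion and returns how many new containers would be asked for.
    static int extraAsksOnCompletion(int numTotalContainers, int initialRequested) {
        AtomicInteger numRequestedContainers = new AtomicInteger(initialRequested);
        int askCount = numTotalContainers - numRequestedContainers.get();
        // Record the new asks so a second completion would not ask again.
        numRequestedContainers.addAndGet(askCount);
        return askCount;
    }

    public static void main(String[] args) {
        int numTotalContainers = 4;      // assumed value of -num_containers
        int previousAMRunning = 1;       // assumed containers kept from the previous attempt

        // Previous behaviour: counter starts at total minus kept containers.
        int oldAsks = extraAsksOnCompletion(
                numTotalContainers, numTotalContainers - previousAMRunning);

        // Behaviour suggested in the issue: counter starts at the total.
        int newAsks = extraAsksOnCompletion(numTotalContainers, numTotalContainers);

        System.out.println("extra asks with old initial value: " + oldAsks);
        System.out.println("extra asks with new initial value: " + newAsks);
    }
}
```

With these assumed numbers the old initial value yields one surplus request on 
completion, while starting the counter at numTotalContainers yields none.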

I'm also fixing the testDSRestartWithPreviousRunningContainers UT associated 
with this issue. 

> DistributedShell may allocate more containers than client specified after it 
> restarts
> -------------------------------------------------------------------------------------
>                 Key: YARN-2354
>                 URL: https://issues.apache.org/jira/browse/YARN-2354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jian He
>            Assignee: Li Lu
>         Attachments: YARN-2354-072514.patch
> To reproduce, run distributed shell with the -num_containers option.
> In ApplicationMaster.java, the following code has an issue:
> {code}
>   int numTotalContainersToRequest =
>       numTotalContainers - previousAMRunningContainers.size();
>   for (int i = 0; i < numTotalContainersToRequest; ++i) {
>     ContainerRequest containerAsk = setupContainerAskForRM();
>     amRMClient.addContainerRequest(containerAsk);
>   }
>   numRequestedContainers.set(numTotalContainersToRequest);
> {code}
> numRequestedContainers doesn't account for the previous AM's requested 
> containers, so numRequestedContainers should be set to numTotalContainers.

This message was sent by Atlassian JIRA
