Chengbing Liu updated YARN-2997:
    Attachment: YARN-2997.2.patch

Updated patch.

It handles the following issues:
* If a container is completed, and the corresponding application is still 
running, the NM will send duplicated reports about the container, which is 
* Currently, if a heartbeat between the RM and the NM is lost while the NM is 
sending a completed container status whose application is in the finished state, 
the status is not sent again. In the updated patch, the NM stores all the 
completed container statuses and resends them after a lost heartbeat.
* Some test cases are fixed based on the above considerations.
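
To illustrate the resend behaviour described above, here is a minimal sketch. It is not the code in YARN-2997.2.patch: the class CompletedContainerTracker, its method names, and the use of plain strings for container IDs and diagnostics are all hypothetical, chosen only to show the idea of keeping completed statuses until a heartbeat is known to have succeeded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and types are assumptions,
// not the NodeManager classes touched by the patch.
final class CompletedContainerTracker {
    // Completed containers the RM has not yet acknowledged, in completion order.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Container IDs included in the heartbeat that is currently in flight.
    private final List<String> inFlight = new ArrayList<>();

    // Record a container that has just completed on this node.
    synchronized void containerCompleted(String containerId, String diagnostics) {
        pending.put(containerId, diagnostics);
    }

    // Build the report for the next heartbeat: every unacknowledged container.
    synchronized List<String> statusesForHeartbeat() {
        inFlight.clear();
        inFlight.addAll(pending.keySet());
        return new ArrayList<>(inFlight);
    }

    // The RM answered the heartbeat, so the reported statuses can be dropped
    // and will not be sent again.
    synchronized void heartbeatSucceeded() {
        for (String id : inFlight) {
            pending.remove(id);
        }
        inFlight.clear();
    }

    // The heartbeat was lost: keep every pending status so it is resent.
    synchronized void heartbeatFailed() {
        inFlight.clear();
    }
}
```

With this shape, a status is reported on every heartbeat until one succeeds and is never reported afterwards, which covers both the lost-heartbeat case and the duplicated reports.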

Please help review the patch, thanks!

> NM keeps sending finished containers to RM until app is finished
> ----------------------------------------------------------------
>                 Key: YARN-2997
>                 URL: https://issues.apache.org/jira/browse/YARN-2997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>         Attachments: YARN-2997.2.patch, YARN-2997.patch
> We have seen many occurrences of the following message in the RM log:
> {quote}
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
> Null container completed...
> {quote}
> It is caused by NM sending completed containers repeatedly until the app is 
> finished. On the RM side, the container is already released, hence 
> {{getRMContainer}} returns null.

This message was sent by Atlassian JIRA
