[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chengbing Liu updated YARN-2997:
--------------------------------
    Attachment: YARN-2997.2.patch

Updated patch. It handles the following issues:
* If a container has completed while its application is still running, the NM sends duplicate reports about the container, which is unnecessary.
* Currently, if a heartbeat between the RM and the NM is lost while the NM is sending a completed container status whose application is in the finished state, the status is not sent again. In the updated patch, the NM stores all completed container statuses and resends them after a lost heartbeat.
* Some test cases are fixed based on the above considerations.

Please help review the patch, thanks!

> NM keeps sending finished containers to RM until app is finished
> ----------------------------------------------------------------
>
>                 Key: YARN-2997
>                 URL: https://issues.apache.org/jira/browse/YARN-2997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>         Attachments: YARN-2997.2.patch, YARN-2997.patch
>
>
> We have seen a lot of the following in the RM log:
> {quote}
> INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> {quote}
> It is caused by the NM sending completed containers repeatedly until the app is
> finished. On the RM side, the container is already released, hence
> {{getRMContainer}} returns null.
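
For reviewers, the store-and-resend behavior described in the update above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in YARN-2997.2.patch: the class {{CompletedContainerTracker}} and all of its methods are hypothetical names, and container statuses are reduced to plain strings. The idea is that a completed status is reported once, kept until the RM acknowledges the heartbeat that carried it, and resent only if that heartbeat was lost.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these are not the NodeManager's actual classes. */
class CompletedContainerTracker {
    // Completed statuses not yet included in any heartbeat, keyed by container ID.
    private final Map<String, String> pending = new LinkedHashMap<>();
    // Statuses carried by the most recent heartbeat, kept until the RM acknowledges it.
    private final Map<String, String> inFlight = new LinkedHashMap<>();

    /** Record a newly completed container. */
    synchronized void containerCompleted(String containerId, String status) {
        pending.put(containerId, status);
    }

    /**
     * Container IDs to report in the next heartbeat: anything still
     * unacknowledged from a lost heartbeat, plus newly completed containers.
     */
    synchronized List<String> containersForHeartbeat() {
        inFlight.putAll(pending);
        pending.clear();
        return new ArrayList<>(inFlight.keySet());
    }

    /** The RM answered the heartbeat, so the reported statuses can be dropped. */
    synchronized void heartbeatAcknowledged() {
        inFlight.clear();
    }
}
```

In this sketch a lost heartbeat needs no explicit handling: if {{heartbeatAcknowledged}} is never called, the unacknowledged statuses simply ride along on the next heartbeat, while acknowledged ones are not sent again.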