[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
[ https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chengbing Liu updated YARN-2997:
    Attachment: YARN-2997.5.patch

Update: use HashMap instead of LinkedHashMap.

> NM keeps sending finished containers to RM until app is finished
>
>                 Key: YARN-2997
>                 URL: https://issues.apache.org/jira/browse/YARN-2997
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Chengbing Liu
>            Assignee: Chengbing Liu
>         Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch,
> YARN-2997.5.patch, YARN-2997.patch
>
> We have seen in RM log a lot of
> {quote}
> INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Null container completed...
> {quote}
> It is caused by NM sending completed containers repeatedly until the app is
> finished. On the RM side, the container is already released, hence
> {{getRMContainer}} returns null.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu updated YARN-2997:
    Attachment: YARN-2997.4.patch

Updated patch. The testing-only method is removed. {{pendingCompletedContainers.clear()}} is added at the end of {{removeOrTrackCompletedContainersFromContext}}, and also in the RESYNC section, to clear the cache so that these outdated container statuses are not reported to the restarted RM.
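The two places where the YARN-2997.4.patch update clears the cache can be illustrated roughly as follows. This is only a minimal sketch, not the patch itself: the class {{ResyncClearSketch}}, the {{Action}} enum, the {{String}} ids and statuses, and the methods {{containerCompleted}}, {{statusesToReport}} and {{handleResponse}} are assumptions made for illustration. Only the names {{pendingCompletedContainers}} and {{removeOrTrackCompletedContainersFromContext}} come from the update above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for the NM side; not the real Hadoop class. */
final class ResyncClearSketch {
    /** Hypothetical subset of what an RM heartbeat response can ask for. */
    enum Action { NORMAL, RESYNC }

    // container id -> status, cached until the RM has seen it
    private final Map<String, String> pendingCompletedContainers = new HashMap<>();

    void containerCompleted(String containerId, String status) {
        pendingCompletedContainers.put(containerId, status);
    }

    List<String> statusesToReport() {
        return new ArrayList<>(pendingCompletedContainers.values());
    }

    /** Stand-in for the tail of removeOrTrackCompletedContainersFromContext. */
    private void removeOrTrackCompletedContainersFromContext() {
        // ... per-container bookkeeping would happen here (omitted) ...
        pendingCompletedContainers.clear();
    }

    void handleResponse(Action action) {
        if (action == Action.RESYNC) {
            // A restarted RM should not receive outdated statuses.
            pendingCompletedContainers.clear();
            return;
        }
        removeOrTrackCompletedContainersFromContext();
    }

    public static void main(String[] args) {
        ResyncClearSketch nm = new ResyncClearSketch();
        nm.containerCompleted("container_1", "COMPLETE");
        nm.handleResponse(Action.RESYNC);
        System.out.println(nm.statusesToReport()); // prints []
    }
}
```

In both branches the cache ends up empty; the sketch only separates them to mirror the two call sites named in the update.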
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Jian He updated YARN-2997:
    Assignee: Chengbing Liu
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu updated YARN-2997:
    Attachment: YARN-2997.3.patch

Updated patch:
* Fix potential pendingContainersToRemove leak.
* Remove unnecessary {{pendingCompletedContainers.clear();}} and add clearPendingCompletedContainers() for testing purposes only.
* Add comments for modified tests.
* Switch the argument order of {{assertEquals}}. The expected value should come first to prevent confusion.
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu updated YARN-2997:
    Attachment: YARN-2997.2.patch

Updated patch. It handles the following issues:
* If a container is completed and the corresponding application is still running, the NM sends duplicate reports about the container, which is unnecessary.
* Currently, if a heartbeat between the RM and the NM is lost while the NM is sending the status of a completed container whose application is in the finished state, the NM does not send that status again. In the updated patch, the NM stores all the completed container statuses and resends them after a lost heartbeat.
* Some test cases are fixed based on the above considerations.

Please help review the patch, thanks!
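The store-and-resend behaviour described in the YARN-2997.2.patch update can be sketched as below. This is a simplified illustration under stated assumptions, not the patch code: the class {{PendingCompletedSketch}} and its methods {{statusesForHeartbeat}} and {{onHeartbeatAcknowledged}} are hypothetical, and container ids and statuses are plain strings rather than Hadoop types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for the NM status updater; not the real Hadoop class. */
final class PendingCompletedSketch {
    // Completed statuses that the RM has not yet acknowledged.
    private final Map<String, String> pendingCompletedContainers = new HashMap<>();

    /** Statuses for the next heartbeat: newly completed plus anything still pending. */
    List<String> statusesForHeartbeat(Map<String, String> newlyCompleted) {
        pendingCompletedContainers.putAll(newlyCompleted);
        return new ArrayList<>(pendingCompletedContainers.values());
    }

    /** Called only when the RM answered; a lost heartbeat never reaches this method. */
    void onHeartbeatAcknowledged() {
        pendingCompletedContainers.clear();
    }

    public static void main(String[] args) {
        PendingCompletedSketch nm = new PendingCompletedSketch();
        Map<String, String> done = new HashMap<>();
        done.put("container_1", "COMPLETE");

        // First heartbeat is lost, so no acknowledgement arrives.
        System.out.println(nm.statusesForHeartbeat(done)); // prints [COMPLETE]
        // The next heartbeat resends the same status.
        System.out.println(nm.statusesForHeartbeat(Collections.<String, String>emptyMap())); // prints [COMPLETE]

        nm.onHeartbeatAcknowledged();
        // After the acknowledgement nothing is reported again.
        System.out.println(nm.statusesForHeartbeat(Collections.<String, String>emptyMap())); // prints []
    }
}
```

Keeping the statuses until an acknowledgement arrives covers both points of the update in this model: a status is not repeated once the RM has seen it, and it is not dropped when a heartbeat is lost.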
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu updated YARN-2997:
    Attachment: YARN-2997.patch

Report to RM only once by not calling {{containerStatuses.add(containerStatus);}} from the second time on. Tested on a real cluster and it works well.
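The "report only once" idea of the first patch can be illustrated with a small sketch. Everything here apart from the {{containerStatuses.add(...)}} call is assumed for illustration: the class {{ReportOnceSketch}}, the {{alreadyReported}} set, the {{buildHeartbeat}} method, and the use of plain strings for container ids are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative stand-in for the NM status updater; not the real Hadoop class. */
final class ReportOnceSketch {
    // Ids of completed containers that were already put into a heartbeat.
    private final Set<String> alreadyReported = new HashSet<>();

    /** Builds the status list for one heartbeat, skipping containers sent before. */
    List<String> buildHeartbeat(List<String> completedContainerIds) {
        List<String> containerStatuses = new ArrayList<>();
        for (String id : completedContainerIds) {
            // Set.add returns false when the id is already present,
            // so containerStatuses.add is skipped from the second time on.
            if (alreadyReported.add(id)) {
                containerStatuses.add(id);
            }
        }
        return containerStatuses;
    }

    public static void main(String[] args) {
        ReportOnceSketch nm = new ReportOnceSketch();
        // prints [container_1, container_2]
        System.out.println(nm.buildHeartbeat(Arrays.asList("container_1", "container_2")));
        // prints [container_3]
        System.out.println(nm.buildHeartbeat(Arrays.asList("container_1", "container_2", "container_3")));
    }
}
```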
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu updated YARN-2997:
    Description:
We have seen in RM log a lot of
{quote}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
{quote}
It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {{getRMContainer}} returns null.

  was:
We have seen in RM log a lot of
{quote}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
{quote}
It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {getRMContainer} returns null.
[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished
Chengbing Liu updated YARN-2997:
    Description:
We have seen in RM log a lot of
{quote}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
{quote}
It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {getRMContainer} returns null.

  was:
We have seen in RM log a lot of
{quote}
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
{quote}
It is caused by NM sending completed containers repeatedly until the app is finished. On the RM side, the container is already released, hence {quote}getRMContainer{quote} returns null.