[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-07 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Attachment: YARN-2997.5.patch

Update: use HashMap instead of LinkedHashMap.

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.5.patch, YARN-2997.patch


 We have seen in RM log a lot of
 {quote}
 INFO 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
 Null container completed...
 {quote}
 It is caused by NM sending completed containers repeatedly until the app is 
 finished. On the RM side, the container is already released, hence 
 {{getRMContainer}} returns null.
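The RM-side symptom can be illustrated with a simplified model. This is only a sketch, not the actual FairScheduler code: the class `RmSideSketch`, the `liveContainers` map, and the method bodies are assumptions made for illustration. Only the log text and the fact that the lookup returns null after the container is released come from the report above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the RM side; not the real FairScheduler.
class RmSideSketch {
    // Containers the RM still tracks, keyed by container ID.
    private final Map<String, String> liveContainers = new HashMap<>();
    // Captured log lines, so the effect of duplicate reports is visible.
    final List<String> log = new ArrayList<>();

    void allocate(String containerId) {
        liveContainers.put(containerId, "RUNNING");
    }

    // Stand-in for the lookup that the report calls getRMContainer.
    String getRMContainer(String containerId) {
        return liveContainers.get(containerId);
    }

    // Called once per completed-container status found in an NM heartbeat.
    void completedContainer(String containerId) {
        if (getRMContainer(containerId) == null) {
            // Already released: every duplicate report ends up here.
            log.add("Null container completed...");
            return;
        }
        liveContainers.remove(containerId);
        log.add("Released " + containerId);
    }

    public static void main(String[] args) {
        RmSideSketch rm = new RmSideSketch();
        rm.allocate("container_1");
        // The NM reports the same finished container on three heartbeats.
        for (int i = 0; i < 3; i++) {
            rm.completedContainer("container_1");
        }
        rm.log.forEach(System.out::println);
    }
}
```

In this model only the first report releases the container; each later report of the same ID produces one more "Null container completed..." line.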



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-06 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Attachment: YARN-2997.4.patch

Updated patch.

The testing-only method is removed. {{pendingCompletedContainers.clear()}} is 
added at the end of {{removeOrTrackCompletedContainersFromContext}} and also 
in the RESYNC section, clearing the cache so that outdated container statuses 
are not reported to the restarted RM.
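A minimal sketch of the two places where the cache is cleared, under stated assumptions: `ResyncSketch`, the local `Action` enum, `trackCompleted`, `pendingCount`, and `onHeartbeatResponse` are hypothetical names, and the real bookkeeping inside {{removeOrTrackCompletedContainersFromContext}} is omitted. Only the two {{pendingCompletedContainers.clear()}} calls reflect the comment above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model; only the two clear() calls mirror the patch comment.
class ResyncSketch {
    // Local stand-in for the action carried in the RM's heartbeat response.
    enum Action { NORMAL, RESYNC }

    // Completed-container statuses waiting to be confirmed by the RM.
    private final Map<String, String> pendingCompletedContainers = new HashMap<>();

    void trackCompleted(String containerId, String status) {
        pendingCompletedContainers.put(containerId, status);
    }

    int pendingCount() {
        return pendingCompletedContainers.size();
    }

    // Handles one heartbeat response. A lost heartbeat never reaches this
    // method, so the cache survives and can be resent.
    void onHeartbeatResponse(Action action) {
        if (action == Action.RESYNC) {
            // The RM restarted: the cached statuses are outdated, so drop
            // them instead of reporting them to the new RM.
            pendingCompletedContainers.clear();
            return;
        }
        removeOrTrackCompletedContainersFromContext();
    }

    private void removeOrTrackCompletedContainersFromContext() {
        // Context bookkeeping is omitted in this sketch. The statuses were
        // delivered, so the cache is cleared at the end.
        pendingCompletedContainers.clear();
    }

    public static void main(String[] args) {
        ResyncSketch nm = new ResyncSketch();
        nm.trackCompleted("container_1", "COMPLETE");
        nm.onHeartbeatResponse(Action.RESYNC);
        System.out.println("pending after RESYNC: " + nm.pendingCount());
    }
}
```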

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.4.patch, 
 YARN-2997.patch







[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-05 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated YARN-2997:
Assignee: Chengbing Liu

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
Assignee: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch







[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2015-01-03 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Attachment: YARN-2997.3.patch

Updated patch:
* Fix a potential {{pendingContainersToRemove}} leak.
* Remove the unnecessary {{pendingCompletedContainers.clear();}} and add 
{{clearPendingCompletedContainers()}} for testing purposes only.
* Add comments for modified tests.
* Switch the argument order of {{assertEquals}}. The expected value should come 
first to prevent confusion.

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.3.patch, YARN-2997.patch







[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2014-12-31 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Attachment: YARN-2997.2.patch

Updated patch.

It handles the following issues:
* If a container is completed and the corresponding application is still 
running, the NM sends duplicate reports about the container, which is 
unnecessary.
* Currently, if a heartbeat between the RM and the NM is lost while the NM is 
sending a completed container status whose application is in the finished 
state, the status is not sent again. In the updated patch, the NM stores all 
completed container statuses and resends them after a lost heartbeat.
* Some test cases are fixed based on the above considerations.

Please help review the patch, thanks!
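The store-and-resend behaviour described in the list above can be sketched as follows. This is a model under assumptions, not the patch itself: `PendingStatusSketch`, `completedInContext`, `alreadyQueued`, `buildHeartbeat`, and `heartbeatSucceeded` are hypothetical names, and a plain string stands in for a container status.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the NM-side bookkeeping; not the actual patch.
class PendingStatusSketch {
    // Completed containers still held in the NM context until the app finishes.
    private final Map<String, String> completedInContext = new HashMap<>();
    // IDs that have already been queued for the RM once.
    private final Set<String> alreadyQueued = new HashSet<>();
    // Statuses that were sent but not yet confirmed by a successful heartbeat.
    private final Map<String, String> pendingCompletedContainers = new HashMap<>();

    void containerCompleted(String containerId, String status) {
        completedInContext.put(containerId, status);
    }

    // Returns the container IDs whose statuses go into the next heartbeat.
    List<String> buildHeartbeat() {
        for (Map.Entry<String, String> e : completedInContext.entrySet()) {
            // Queue each completed container only once.
            if (alreadyQueued.add(e.getKey())) {
                pendingCompletedContainers.put(e.getKey(), e.getValue());
            }
        }
        List<String> ids = new ArrayList<>(pendingCompletedContainers.keySet());
        Collections.sort(ids); // HashMap has no order; sort for stable output
        return ids;
    }

    // Called only when the RM answered the heartbeat.
    void heartbeatSucceeded() {
        pendingCompletedContainers.clear();
    }

    public static void main(String[] args) {
        PendingStatusSketch nm = new PendingStatusSketch();
        nm.containerCompleted("container_1", "COMPLETE");
        System.out.println(nm.buildHeartbeat()); // first attempt
        // Heartbeat lost: heartbeatSucceeded() is never called.
        System.out.println(nm.buildHeartbeat()); // status is sent again
        nm.heartbeatSucceeded();
        System.out.println(nm.buildHeartbeat()); // nothing left to report
    }
}
```

The design choice in this sketch is that a status leaves the pending cache only after a heartbeat succeeds, so a lost heartbeat causes a resend while a delivered status is never repeated.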

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.2.patch, YARN-2997.patch







[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2014-12-29 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Description: 
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{getRMContainer} returns null.

  was:
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{quote}getRMContainer{quote} returns null.


 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu






[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2014-12-29 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Description: 
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{{getRMContainer}} returns null.

  was:
We have seen in RM log a lot of
{quote}
INFO 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: 
Null container completed...
{quote}

It is caused by NM sending completed containers repeatedly until the app is 
finished. On the RM side, the container is already released, hence 
{getRMContainer} returns null.


 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu






[jira] [Updated] (YARN-2997) NM keeps sending finished containers to RM until app is finished

2014-12-29 Thread Chengbing Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chengbing Liu updated YARN-2997:

Attachment: YARN-2997.patch

Report each completed container to the RM only once, by not calling 
{{containerStatuses.add(containerStatus);}} for it after the first report.

Tested on a real cluster and it works well.
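A minimal sketch of the report-once idea, assuming a set of already-reported IDs guards the {{containerStatuses.add(containerStatus);}} call. The class `ReportOnceSketch`, the `reportedCompleted` set, and the method signature are hypothetical, and a plain string stands in for a container status.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the NM heartbeat; names other than
// containerStatuses and containerStatus are illustrative only.
class ReportOnceSketch {
    // Completed containers that were already reported to the RM.
    private final Set<String> reportedCompleted = new HashSet<>();

    // Builds the status list for one heartbeat from the completed
    // containers that are still held in the NM context.
    List<String> getContainerStatuses(List<String> completedInContext) {
        List<String> containerStatuses = new ArrayList<>();
        for (String containerStatus : completedInContext) {
            // Set.add returns false when the entry was already present,
            // so the status is added only on the first heartbeat.
            if (reportedCompleted.add(containerStatus)) {
                containerStatuses.add(containerStatus);
            }
        }
        return containerStatuses;
    }

    public static void main(String[] args) {
        ReportOnceSketch nm = new ReportOnceSketch();
        List<String> context = Arrays.asList("container_1");
        System.out.println(nm.getContainerStatuses(context)); // reported once
        System.out.println(nm.getContainerStatuses(context)); // skipped afterwards
    }
}
```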

 NM keeps sending finished containers to RM until app is finished
 

 Key: YARN-2997
 URL: https://issues.apache.org/jira/browse/YARN-2997
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Affects Versions: 2.6.0
Reporter: Chengbing Liu
 Attachments: YARN-2997.patch




