[ https://issues.apache.org/jira/browse/YARN-2612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jun Gong resolved YARN-2612.
----------------------------
    Resolution: Duplicate

> Some completed containers are not reported to NM
> ------------------------------------------------
>
>                 Key: YARN-2612
>                 URL: https://issues.apache.org/jira/browse/YARN-2612
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.6.0
>            Reporter: Jun Gong
>             Fix For: 2.6.0
>
>
> We are testing RM work-preserving restart and found the following logs when
> we ran a simple MapReduce task "PI". Some completed containers that had
> already been pulled by the AM were never acknowledged back to the NM, so the
> NM kept reporting those completed containers even after the AM had finished.
> {code}
> 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> 2014-09-26 17:00:42,228 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> 2014-09-26 17:00:43,230 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> 2014-09-26 17:00:44,233 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Null container completed...
> {code}
> Since YARN-1372, the NM keeps reporting completed containers to the RM until
> it gets an ACK from the RM. If the AM does not call allocate, the AM never
> acks the RM, so the RM never acks the NM. We ([~chenchun]) have observed
> these two cases when running the MapReduce task 'pi':
> 1) The RM sends completed containers to the AM. After receiving them, the AM
> considers its work done and needs no more resources, so it does not call
> allocate again.
> 2) When the AM itself finishes, it cannot ack the RM for its own container:
> while the AM could still call allocate, that container had not finished yet.
> We think that when RMAppAttempt calls BaseFinalTransition, the AppAttempt has
> finished, so the RM could then send this AppAttempt's completed containers to
> the NM.
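
The ack chain described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyResourceManager, ToyNodeManager, and all of their methods are hypothetical names invented for this example, not YARN classes or APIs, and the bookkeeping is deliberately simplified. It shows case 1 (the AM pulls a completed container and never calls allocate again, so the NM keeps reporting it) and the proposed behaviour (release everything to the NM once the attempt finishes).

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the RM's per-attempt bookkeeping (not a YARN class). */
class ToyResourceManager {
    private final Set<String> justFinished = new LinkedHashSet<>(); // reported by NM, not yet pulled by AM
    private final Set<String> sentToAm = new LinkedHashSet<>();     // pulled by AM, not yet acked by AM
    private final Set<String> ackForNm = new LinkedHashSet<>();     // safe for the NM to stop reporting

    /** NM heartbeat: record newly completed containers, return the ones the NM may drop. */
    List<String> nodeHeartbeat(Collection<String> reported) {
        for (String c : reported) {
            if (!sentToAm.contains(c) && !ackForNm.contains(c)) {
                justFinished.add(c);
            }
        }
        List<String> acks = new ArrayList<>(ackForNm);
        ackForNm.clear();
        return acks;
    }

    /** AM allocate: this call acks the previous batch, then hands out the new one. */
    List<String> allocate() {
        ackForNm.addAll(sentToAm);
        sentToAm.clear();
        List<String> pulled = new ArrayList<>(justFinished);
        sentToAm.addAll(pulled);
        justFinished.clear();
        return pulled;
    }

    /** Proposed behaviour: when the attempt finishes, release everything to the NM. */
    void attemptFinished() {
        ackForNm.addAll(sentToAm);
        ackForNm.addAll(justFinished);
        sentToAm.clear();
        justFinished.clear();
    }
}

/** Hypothetical stand-in for the NM side: re-reports until the RM acks. */
class ToyNodeManager {
    private final Set<String> pending = new LinkedHashSet<>();

    void containerCompleted(String id) { pending.add(id); }

    void heartbeat(ToyResourceManager rm) {
        pending.removeAll(rm.nodeHeartbeat(pending));
    }

    int pendingCount() { return pending.size(); }
}

class AckSketch {
    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        ToyNodeManager nm = new ToyNodeManager();
        nm.containerCompleted("c1");
        nm.heartbeat(rm);   // RM learns about c1
        rm.allocate();      // AM pulls c1, then never calls allocate again (case 1)
        nm.heartbeat(rm);   // no ack yet, so the NM still holds c1
        System.out.println("pending before attempt finishes: " + nm.pendingCount());
        rm.attemptFinished();
        nm.heartbeat(rm);   // the RM now acks c1 and the NM drops it
        System.out.println("pending after attempt finishes: " + nm.pendingCount());
    }
}
```

In this model a second allocate call would also release c1, which is why an AM that stops calling allocate leaves the NM reporting indefinitely unless the RM acts when the attempt finishes.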