[jira] [Commented] (YARN-8695) ERROR: Container complete event for unknown container id

Jason Lowe (JIRA) Thu, 30 Aug 2018 11:10:10 -0700


    [ 
https://issues.apache.org/jira/browse/YARN-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597738#comment-16597738
 ]


Jason Lowe commented on YARN-8695:
----------------------------------

Here's the relevant portion of the log:
{noformat}
2018-08-30 09:30:33,606 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
containers 1
2018-08-30 09:30:33,606 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
container Container: [ContainerId: container_e02_1535621162012_0001_01_000007, 
NodeId: mgmt-hadoop-dn-0.node.dc1.pnda.local:45454, NodeHttpAddress: 
mgmt-hadoop-dn-0.node.dc1.pnda.local:8042, Resource: <memory:512, vCores:1>, 
Priority: 20, Token: Token { kind: ContainerToken, service: 10.0.1.158:45454 }, 
] for a map as either  container memory less than required <memory:384, 
vCores:1> or no pending map tasks - maps.isEmpty=true
2018-08-30 09:30:33,606 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0 
CompletedMaps:0 CompletedReds:0 ContAlloc:6 ContRel:1 HostLocal:0 RackLocal:0
2018-08-30 09:30:34,625 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for 
application_1535621162012_0001: ask=0 release= 1 newContainers=0 
finishedContainers=1 resourcelimit=<memory:1024, vCores:2> knownNMs=1
2018-08-30 09:30:34,626 INFO [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed 
container container_e02_1535621162012_0001_01_000007
2018-08-30 09:30:34,626 ERROR [RMCommunicator Allocator] 
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete 
event for unknown container id container_e02_1535621162012_0001_01_000007
{noformat}

So it is similar to what I expected.  What happened here is that the AM received 
an extra container from the RM and, having no pending map task to run in it, 
immediately released it.  When the RM later reported that container as completed 
to confirm the release, the AM had already forgotten about the container and 
logged that error.  It is completely benign.
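
To make that sequence concrete, here is a minimal toy model of the AM-side bookkeeping.  It is only a sketch: ToyAllocator and its methods are hypothetical names invented for illustration, not the actual RMContainerAllocator code, and the log strings are simplified.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: hypothetical names, not the Hadoop classes.
class ToyAllocator {
    private int pendingMaps;                                   // map tasks still waiting for a container
    private final Set<String> assigned = new HashSet<>();      // containers the AM is tracking
    private final List<String> toRelease = new ArrayList<>();  // releases queued for the next heartbeat
    final List<String> log = new ArrayList<>();

    ToyAllocator(int pendingMaps) {
        this.pendingMaps = pendingMaps;
    }

    // The RM handed us a container.
    void onAllocated(String containerId) {
        if (pendingMaps > 0) {
            pendingMaps--;
            assigned.add(containerId);
            log.add("INFO Assigned " + containerId);
        } else {
            // Nothing to run in it: queue a release and do NOT track it.
            toRelease.add(containerId);
            log.add("INFO Cannot assign " + containerId + ", no pending map tasks");
        }
    }

    // Releases that would be sent to the RM on the next heartbeat.
    List<String> drainReleases() {
        List<String> out = new ArrayList<>(toRelease);
        toRelease.clear();
        return out;
    }

    // The RM reports a container as completed.
    void onCompleted(String containerId) {
        log.add("INFO Received completed container " + containerId);
        if (!assigned.remove(containerId)) {
            // The AM never tracked this container, so it looks unknown.
            log.add("ERROR Container complete event for unknown container id " + containerId);
        }
    }

    public static void main(String[] args) {
        ToyAllocator am = new ToyAllocator(1);  // one pending map task
        am.onAllocated("container_A");          // used for the map
        am.onAllocated("container_B");          // extra container, released right away
        am.drainReleases();                     // release goes out with the heartbeat
        am.onCompleted("container_B");          // RM confirms, AM no longer knows it
        am.log.forEach(System.out::println);
    }
}
```

The last line printed by this toy run is the ERROR entry for container_B, which mirrors the shape of the log excerpt above: the container was released before it was ever recorded as assigned, so the later completion event has nothing to match against.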

As to why the AM received an extra container, this is an inherent race in the 
AM-RM allocation protocol.  Just as the AM was updating an older request for 
containers, the RM granted some containers against that older request, resulting 
in more containers being returned than desired.  See YARN-110 and YARN-1902 for 
some discussions on that.
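
As a rough illustration of that race, the following toy model treats the AM's ask as an absolute count that overwrites whatever the RM had outstanding.  ToyResourceManager is a hypothetical class and the exact ordering of events is an assumption made for the example; the real scheduler is considerably more involved.  The numbers are chosen to mirror the counters in the log above (5 maps assigned, 6 containers allocated).

```java
// Toy model only: hypothetical class, not a YARN scheduler.
class ToyResourceManager {
    private int outstanding;  // containers still owed for the latest ask
    private int granted;      // total containers granted so far

    // Assumption: an ask is an absolute count and replaces the previous one.
    void updateAsk(int count) {
        outstanding = count;
    }

    // One scheduling pass: grant up to `free` containers against the current ask.
    int schedule(int free) {
        int n = Math.min(free, outstanding);
        outstanding -= n;
        granted += n;
        return n;
    }

    int totalGranted() {
        return granted;
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.updateAsk(5);  // AM wants 5 containers
        rm.schedule(1);   // RM grants 1; the AM has not heard about it yet
        rm.updateAsk(5);  // AM re-sends its ask, still believing it needs 5
        rm.schedule(5);   // RM grants 5 more against the refreshed ask
        System.out.println("wanted 5, granted " + rm.totalGranted());  // prints "wanted 5, granted 6"
    }
}
```

The one surplus container is the one the AM ends up releasing immediately.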

> ERROR: Container complete event for unknown container id
> --------------------------------------------------------
>
>                 Key: YARN-8695
>                 URL: https://issues.apache.org/jira/browse/YARN-8695
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: RM
>            Reporter: sivasankar
>            Priority: Minor
>         Attachments: Log.txt
>
>
> Have deployed a cluster with *3 data nodes*. YARN/MapReduce2/HDFS version is 
> *2.7.3* on HDP. While running teragen and Gobblin the following Yarn errors 
> get reported in the logs. Errors get reported only when the number of map tasks 
> defined for the job is greater than the number of data nodes in the cluster.
> For *Teragen* -Dmapreduce.job.maps=4
> For *Gobblin* mr.job.max.mappers=4
> There are no errors if the map tasks (splits) are <= number of data nodes.
> 2018-08-16 06:54:05,681 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: *Container 
> complete event for unknown container id 
> container_1534394833079_0012_01_000006*
> 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container 
> complete event for unknown container id 
> container_1534394833079_0001_01_000055
> 2018-08-16 05:00:50,138 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_1534394833079_0001_01_000054
> 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container 
> complete event for unknown container id 
> container_1534394833079_0001_01_000054
> 2018-08-16 05:00:50,138 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_1534394833079_0001_01_000053
> 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container 
> complete event for unknown container id 
> container_1534394833079_0001_01_000053
> *Note*: There is no functionality issue.



