[
https://issues.apache.org/jira/browse/YARN-8695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16597738#comment-16597738
]
Jason Lowe commented on YARN-8695:
----------------------------------
Here's the relevant portion of the log:
{noformat}
2018-08-30 09:30:33,606 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated
containers 1
2018-08-30 09:30:33,606 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign
container Container: [ContainerId: container_e02_1535621162012_0001_01_000007,
NodeId: mgmt-hadoop-dn-0.node.dc1.pnda.local:45454, NodeHttpAddress:
mgmt-hadoop-dn-0.node.dc1.pnda.local:8042, Resource: <memory:512, vCores:1>,
Priority: 20, Token: Token { kind: ContainerToken, service: 10.0.1.158:45454 },
] for a map as either container memory less than required <memory:384,
vCores:1> or no pending map tasks - maps.isEmpty=true
2018-08-30 09:30:33,606 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling:
PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:5 AssignedReds:0
CompletedMaps:0 CompletedReds:0 ContAlloc:6 ContRel:1 HostLocal:0 RackLocal:0
2018-08-30 09:30:34,625 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for
application_1535621162012_0001: ask=0 release= 1 newContainers=0
finishedContainers=1 resourcelimit=<memory:1024, vCores:2> knownNMs=1
2018-08-30 09:30:34,626 INFO [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed
container container_e02_1535621162012_0001_01_000007
2018-08-30 09:30:34,626 ERROR [RMCommunicator Allocator]
org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete
event for unknown container id container_e02_1535621162012_0001_01_000007
{noformat}
So it is similar to what I expected. What happened here is the AM received an
extra container from the RM, so it immediately discarded it. When the RM later
responded with a container released notification to confirm it, the AM had
already forgotten about the container and logged that warning. It is
completely benign.
As to why the AM received an extra container, this is an inherent race in the
AM-RM allocation protocol. Just as the AM was updating an older request for
containers the RM granted some of those older containers, resulting in more
containers being returned than desired. See YARN-110 and YARN-1902 for some
discussions on that.
> ERROR: Container complete event for unknown container id
> --------------------------------------------------------
>
> Key: YARN-8695
> URL: https://issues.apache.org/jira/browse/YARN-8695
> Project: Hadoop YARN
> Issue Type: Bug
> Components: RM
> Reporter: sivasankar
> Priority: Minor
> Attachments: Log.txt
>
>
> Have deployed a cluster with *3 data nodes*. YARN/MapReduce2/HDFS version is
> *2.7.3* on HDP. While running teragen and Gobblin the following Yarn errors
> get reported in the logs. Errors get reported only when the map tasks defined
> for the job less than or equals to the number of data nodes in the cluster.
> For *Teragen* -Dmapreduce.job.maps=4
> For *Gobblin* mr.job.max.mappers=4
> There are no errors if the map tasks(splits) are <= number of data nodes.
> 2018-08-16 06:54:05,681 ERROR [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: *Container
> complete event for unknown container id
> container_1534394833079_0012_01_000006*
> 2018-08-16 05:00:50,138 ERROR [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container
> complete event for unknown container id
> container_1534394833079_0001_01_000055 2018-08-16 05:00:50,138 INFO
> [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received
> completed container container_1534394833079_0001_01_000054 2018-08-16
> 05:00:50,138 ERROR [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container
> complete event for unknown container id
> container_1534394833079_0001_01_000054 2018-08-16 05:00:50,138 INFO
> [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received
> completed container container_1534394833079_0001_01_000053 2018-08-16
> 05:00:50,138 ERROR [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container
> complete event for unknown container id container_1534394833079_0001_01_000053
> *Note*: There is no functionality issue.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]