[ https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346297#comment-15346297 ]
Jun Gong commented on YARN-5290:
--------------------------------
Thanks [~jlowe] for reporting the issue!
We came across this issue some time ago. I tried the idea from YARN-4148: the
RM does not release an app's resources until its containers actually finish
and the NM releases those resources.
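A minimal Python sketch of the YARN-4148 idea follows. It assumes the RM keeps the resources of a killed container marked as in use until the NM reports that the container has exited. The class and method names (`DeferredReleaseNode`, `allocate`, `kill`, `on_nm_heartbeat`) are hypothetical and are not YARN's scheduler API. Resources are modeled as memory in MB only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class DeferredReleaseNode:
    """Hypothetical RM-side bookkeeping for one node (memory in MB only)."""

    total_mb: int
    # container id -> MB for containers the RM believes are running.
    running: Dict[str, int] = field(default_factory=dict)
    # container id -> MB for containers the RM has killed but the NM has not
    # yet reported as finished; their resources still count as in use.
    pending_release: Dict[str, int] = field(default_factory=dict)

    def available_mb(self) -> int:
        used = sum(self.running.values()) + sum(self.pending_release.values())
        return self.total_mb - used

    def allocate(self, container_id: str, mb: int) -> bool:
        """Grant the container only if it fits alongside unreleased containers."""
        if mb > self.available_mb():
            return False
        self.running[container_id] = mb
        return True

    def kill(self, container_id: str) -> None:
        """Mark the container as killed without freeing its resources yet."""
        self.pending_release[container_id] = self.running.pop(container_id)

    def on_nm_heartbeat(self, finished_ids: Iterable[str]) -> None:
        """Free resources only once the NM confirms the containers have exited."""
        for container_id in finished_ids:
            self.pending_release.pop(container_id, None)
```

Take an 8192 MB node that runs an 8192 MB container "old". After `kill("old")`, a 4096 MB request is refused. It succeeds only after `on_nm_heartbeat(["old"])` arrives. The trade-off is that freed capacity reaches other apps one NM heartbeat later.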
Another idea (copied from YARN-4148): the NM records its total resource and its
available resource. When launching a container, the NM checks the available
resource and waits until there is enough for the container. The drawback is a
time gap from the AM's perspective: the AM thinks it has launched the
container, but the container might still be waiting for its resource.
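The NM-side alternative can be sketched the same way. This is a minimal model, assuming launch requests queue in FIFO order until enough memory is free. `NodeAdmissionQueue`, `request_launch`, and `container_finished` are hypothetical names, not NodeManager APIs.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class NodeAdmissionQueue:
    """Hypothetical NM-side admission control (memory in MB only)."""

    def __init__(self, total_mb: int) -> None:
        self.total_mb = total_mb
        self.available_mb = total_mb
        self.running: Dict[str, int] = {}
        # Launch requests the AM has sent but the NM has not started yet.
        self.waiting: Deque[Tuple[str, int]] = deque()

    def request_launch(self, container_id: str, mb: int) -> bool:
        """Queue a launch; return True only if the container actually started."""
        self.waiting.append((container_id, mb))
        self._start_waiting()
        return container_id in self.running

    def container_finished(self, container_id: str) -> None:
        """Return the finished container's resources and retry queued launches."""
        self.available_mb += self.running.pop(container_id)
        self._start_waiting()

    def _start_waiting(self) -> None:
        # Start queued containers in FIFO order while the head of the queue fits.
        while self.waiting and self.waiting[0][1] <= self.available_mb:
            container_id, mb = self.waiting.popleft()
            self.available_mb -= mb
            self.running[container_id] = mb
```

Suppose `request_launch` returns False. The AM has sent the launch, but the container sits in `waiting`: that is the time gap described above. Strict FIFO order means one large request blocks smaller ones behind it. A request larger than `total_mb` would wait forever, so a real implementation would need to reject such requests up front.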
> ResourceManager can place more containers on a node than the node size allows
> -----------------------------------------------------------------------------
>
> Key: YARN-5290
> URL: https://issues.apache.org/jira/browse/YARN-5290
> Project: Hadoop YARN
> Issue Type: Bug
> Components: resourcemanager
> Reporter: Jason Lowe
>
> When the ResourceManager or an ApplicationMaster kills a container, the RM
> scheduler instantly thinks the container is dead and frees those resources
> within the scheduler bookkeeping. However that container can still be
> running on the node until the node heartbeats back into the RM and is told to
> kill the container. If the RM allocates the space associated with the
> released container and gives it to an AM quickly enough, the AM can launch a
> new container while the old container is still running on the NM. That leads
> to a scenario where we're technically running more resources on the node than
> the node advertised to the RM.
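The race in the issue description can be reproduced with a small model. In this sketch the scheduler's bookkeeping and what the NM is really running are kept apart. A kill frees the scheduler's copy immediately but reaches the NM only on its next heartbeat. `RaceSimulation` and its methods are hypothetical. Allocation and AM launch are collapsed into one step for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RaceSimulation:
    """Hypothetical model of the kill/allocate race on one node (memory in MB)."""

    node_total_mb: int
    # What the RM scheduler believes is allocated on the node.
    scheduler_alloc: Dict[str, int] = field(default_factory=dict)
    # What the NM is really running.
    nm_running: Dict[str, int] = field(default_factory=dict)
    # Kill orders the RM has issued but not yet delivered to the NM.
    undelivered_kills: List[str] = field(default_factory=list)

    def allocate_and_launch(self, container_id: str, mb: int) -> bool:
        """RM grants the container if its own bookkeeping says it fits; the AM launches it."""
        if sum(self.scheduler_alloc.values()) + mb > self.node_total_mb:
            return False
        self.scheduler_alloc[container_id] = mb
        self.nm_running[container_id] = mb
        return True

    def kill(self, container_id: str) -> None:
        """Scheduler frees the resources at once; the NM learns on its next heartbeat."""
        del self.scheduler_alloc[container_id]
        self.undelivered_kills.append(container_id)

    def nm_heartbeat(self) -> None:
        """Deliver pending kills; only now does the old container stop on the node."""
        for container_id in self.undelivered_kills:
            self.nm_running.pop(container_id, None)
        self.undelivered_kills.clear()

    def node_usage_mb(self) -> int:
        return sum(self.nm_running.values())
```

Start with an 8192 MB node running an 8192 MB container "old". If "old" is killed and a new 8192 MB container is granted before `nm_heartbeat()`, `node_usage_mb()` reports 16384 MB. That is the over-commit the issue describes. It drops back to 8192 MB once the heartbeat delivers the kill.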
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)