[ 
https://issues.apache.org/jira/browse/YARN-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15345677#comment-15345677
 ] 

Joep Rottinghuis commented on YARN-5290:
----------------------------------------

This seems to be another in a series of bugs rooted in a mismatch of state 
between the NMs and the RM.
Rather than playing whack-a-mole with each one, is it possible to make a more 
structural / architectural fix?



> ResourceManager can place more containers on a node than the node size allows
> -----------------------------------------------------------------------------
>
>                 Key: YARN-5290
>                 URL: https://issues.apache.org/jira/browse/YARN-5290
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jason Lowe
>
> When the ResourceManager or an ApplicationMaster kills a container, the RM 
> scheduler immediately treats the container as dead and frees its resources 
> in the scheduler bookkeeping.  However, that container can still be 
> running on the node until the node heartbeats back into the RM and is told to 
> kill the container.  If the RM allocates the space associated with the 
> released container and gives it to an AM quickly enough, the AM can launch a 
> new container while the old container is still running on the NM.  That leads 
> to a scenario where the node is running more resources than it advertised to 
> the RM.
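
To make the race in the description concrete, here is a minimal toy model of it. This is only a sketch and not YARN code: `ToyNode`, `ToyScheduler`, the method names, and the 8192 MB node size are all hypothetical, chosen purely for illustration. It assumes the scheduler frees resources at kill time while the node only stops the container on its next heartbeat.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """What is actually running on the node (hypothetical stand-in for an NM)."""
    capacity_mb: int
    running: dict = field(default_factory=dict)  # container id -> MB in use

    def launch(self, cid, mb):
        self.running[cid] = mb

    def used_mb(self):
        return sum(self.running.values())


@dataclass
class ToyScheduler:
    """Scheduler-side bookkeeping (hypothetical stand-in for the RM)."""
    node: ToyNode
    allocated: dict = field(default_factory=dict)      # container id -> MB
    pending_kills: list = field(default_factory=list)  # kills not yet sent

    def available_mb(self):
        return self.node.capacity_mb - sum(self.allocated.values())

    def allocate(self, cid, mb):
        if mb > self.available_mb():
            return False
        self.allocated[cid] = mb
        return True

    def kill(self, cid):
        # Resources are freed in the bookkeeping immediately...
        self.allocated.pop(cid)
        # ...but the node only learns about the kill on its next heartbeat.
        self.pending_kills.append(cid)

    def heartbeat(self):
        # The node checks in and is told which containers to stop.
        for cid in self.pending_kills:
            self.node.running.pop(cid, None)
        self.pending_kills.clear()


def simulate():
    node = ToyNode(capacity_mb=8192)
    rm = ToyScheduler(node)

    assert rm.allocate("c1", 8192)
    node.launch("c1", 8192)

    rm.kill("c1")                   # bookkeeping says the node is empty
    assert rm.allocate("c2", 8192)  # so the same space is handed out again
    node.launch("c2", 8192)         # AM launches before the next heartbeat

    used_before_heartbeat = node.used_mb()
    rm.heartbeat()                  # only now is c1 actually stopped
    used_after_heartbeat = node.used_mb()
    return used_before_heartbeat, used_after_heartbeat
```

In this model `simulate()` shows the node briefly holding two 8192 MB containers against an 8192 MB capacity, and dropping back to one after the heartbeat. One structural option, in the spirit of the question above, would be to keep a killed container's resources reserved in the bookkeeping until the node confirms it has stopped; whether that matches the eventual fix is not stated here.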


