Rohith Sharma K S created YARN-5279:
---------------------------------------
Summary: Potential Container leak in NM in preemption flow
Key: YARN-5279
URL: https://issues.apache.org/jira/browse/YARN-5279
Project: Hadoop YARN
Issue Type: Bug
Components: nodemanager, resourcemanager
Reporter: Rohith Sharma K S
Assignee: Rohith Sharma K S
In the YARN-4862 discussion
([comment|https://issues.apache.org/jira/browse/YARN-4862?focusedCommentId=15341538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15341538]),
it was observed that there could be a container leak in the NodeManager whenever
a container is preempted by the RM.
Basically, if the NM receives the same containerId in both {{containersToCleanUp}}
and {{containersToBeRemovedFromNM}} in the same heartbeat, the container is
never removed from the NMContext. Instead, the NM kills the container as part of
the cleanup and sends its status back to the RM. But the RM blindly rejects that
status, because the RMContainer has already been removed and is null.
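To make the sequence concrete, here is a minimal, self-contained Java model of the race. It is only a sketch under stated assumptions: the class, field and method names ({{PreemptionLeakSketch}}, {{nmHeartbeat}}, {{rmReceiveStatus}}) are hypothetical and are not Hadoop APIs, and it assumes the NM only drops a container from its context once that container has finished, so a removal request that arrives before the kill is silently ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the heartbeat race described in YARN-5279.
// None of these names are real Hadoop classes or methods.
public class PreemptionLeakSketch {
    enum State { RUNNING, DONE }

    // NM side: containerId -> state (stands in for NMContext).
    final Map<String, State> nmContext = new HashMap<>();
    // RM side: containers the RM still tracks (stands in for RMContainer lookups).
    final Set<String> rmContainers = new HashSet<>();

    // NM processes one heartbeat response and returns the ids it reports as completed.
    List<String> nmHeartbeat(List<String> toCleanUp, List<String> toRemoveFromNM) {
        // Assumption: only finished containers are forgotten, so a removal
        // request for a still-running container is dropped without a trace.
        for (String id : toRemoveFromNM) {
            if (nmContext.get(id) == State.DONE) {
                nmContext.remove(id);
            }
        }
        // Kill the containers the RM asked to clean up and report them as completed.
        List<String> completed = new ArrayList<>();
        for (String id : toCleanUp) {
            nmContext.put(id, State.DONE);
            completed.add(id);
        }
        return completed;
    }

    // RM handles completed-container statuses. A status whose RMContainer is
    // already gone (null) is ignored, and nothing is sent back to the NM.
    void rmReceiveStatus(List<String> completed) {
        for (String id : completed) {
            if (!rmContainers.contains(id)) {
                continue; // RMContainer is null: status rejected, no removal re-sent
            }
            rmContainers.remove(id);
        }
    }

    public static void main(String[] args) {
        PreemptionLeakSketch s = new PreemptionLeakSketch();
        s.nmContext.put("c1", State.RUNNING);
        // Preemption: the RM has already dropped its RMContainer and sends the
        // same id in both lists of a single heartbeat response.
        List<String> both = List.of("c1");
        s.rmReceiveStatus(s.nmHeartbeat(both, both));
        // Later heartbeats carry nothing for c1, so it stays in the NM context.
        s.nmHeartbeat(List.of(), List.of());
        System.out.println("leaked in NM context: " + s.nmContext.keySet());
    }
}
```

Running {{main}} prints {{leaked in NM context: [c1]}}: once the RM has rejected the status, no later heartbeat mentions c1 again, so the NM keeps it indefinitely.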
I think that whenever the RMContainer is null, the RMNode should be informed so
that it sends the container in {{containersToBeRemovedFromNM}} and the NM removes
it from its context.
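A rough sketch of the proposed fix, using a simplified and hypothetical model of the heartbeat exchange (none of these names are Hadoop APIs). When a completed-container status arrives for a null RMContainer, the RM queues the id on an assumed per-node set ({{pendingRemoveFromNM}}) so that the next heartbeat response carries it in {{containersToBeRemovedFromNM}}. It assumes, as above, that the NM only forgets finished containers; by the time the follow-up heartbeat arrives the container has been killed, so the removal succeeds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed fix for YARN-5279. Illustrative names only.
public class PreemptionFixSketch {
    enum State { RUNNING, DONE }

    final Map<String, State> nmContext = new HashMap<>();    // stands in for NMContext
    final Set<String> rmContainers = new HashSet<>();         // stands in for RMContainers
    final Set<String> pendingRemoveFromNM = new HashSet<>();  // assumed per-RMNode queue

    // NM processes one heartbeat response and returns the ids it reports as completed.
    List<String> nmHeartbeat(List<String> toCleanUp, List<String> toRemoveFromNM) {
        for (String id : toRemoveFromNM) {
            // Assumption: only finished containers are removed from the context.
            if (nmContext.get(id) == State.DONE) {
                nmContext.remove(id);
            }
        }
        List<String> completed = new ArrayList<>();
        for (String id : toCleanUp) {
            nmContext.put(id, State.DONE); // container killed
            completed.add(id);
        }
        return completed;
    }

    // RM handles completed-container statuses.
    void rmReceiveStatus(List<String> completed) {
        for (String id : completed) {
            if (!rmContainers.remove(id)) {
                // RMContainer is null: instead of ignoring the status, ask the
                // RMNode to tell the NM to remove this container from its context.
                pendingRemoveFromNM.add(id);
            }
        }
    }

    // RMNode builds the next heartbeat's containersToBeRemovedFromNM list.
    List<String> drainRemoveFromNM() {
        List<String> out = new ArrayList<>(pendingRemoveFromNM);
        pendingRemoveFromNM.clear();
        return out;
    }

    public static void main(String[] args) {
        PreemptionFixSketch s = new PreemptionFixSketch();
        s.nmContext.put("c1", State.RUNNING);
        List<String> both = List.of("c1");
        s.rmReceiveStatus(s.nmHeartbeat(both, both));    // same-heartbeat race
        s.nmHeartbeat(List.of(), s.drainRemoveFromNM()); // follow-up heartbeat
        System.out.println("NM context after fix: " + s.nmContext.keySet());
    }
}
```

Running {{main}} prints {{NM context after fix: []}}. The design choice here is to piggyback on the existing removal list rather than re-sending the kill, since the NM has already stopped the container and only its bookkeeping is stale.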
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)