[
https://issues.apache.org/jira/browse/YARN-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohith Sharma K S updated YARN-5279:
------------------------------------
Attachment: 0001-YARN-5279.patch
Updated the patch to inform RMNodeImpl that untracked containers should be
removed from the corresponding NodeManager. In this patch, I reused the event
type {{RMNodeEventType.FINISHED_CONTAINERS_PULLED_BY_AM}} from the scheduler.
> Potential Container leak in NM in preemption flow
> -------------------------------------------------
>
> Key: YARN-5279
> URL: https://issues.apache.org/jira/browse/YARN-5279
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager, resourcemanager
> Reporter: Rohith Sharma K S
> Assignee: Rohith Sharma K S
> Attachments: 0001-YARN-5279.patch
>
>
> In the YARN-4862 discussion
> [comment|https://issues.apache.org/jira/browse/YARN-4862?focusedCommentId=15341538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15341538],
> it was observed that there could be a container leak in the NodeManager
> whenever a container is preempted by the RM.
> Basically, if the NM receives the same containerId in both
> {{containersToCleanUp}} and {{containersToBeRemovedFromNM}} in the same
> heartbeat, the container is never removed from the NMContext. Instead, the NM
> kills the container listed in {{containersToCleanup}} and sends its status
> back to the RM. But the RM blindly rejects the status because the RMContainer
> has already been removed and is null.
> I think that whenever the RMContainer is null, RMNode should be informed to
> send {{containersToBeRemovedFromNM}} so that the NM removes the container
> from its context.
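
The sequence described above can be sketched as a small toy model. This is not
the Hadoop code: ToyNodeManager, ToyResourceManager, and the
notifyNodeOnUnknown flag are hypothetical names invented for illustration. The
sketch also assumes that the NM honours a removal request only for a container
that has already completed, which is why a removal arriving together with the
kill request has no effect.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the leak; all class and field names are hypothetical.
public class ContainerLeakSketch {
    enum State { RUNNING, COMPLETED }

    static final class ToyNodeManager {
        final Map<String, State> context = new HashMap<>();

        void launch(String id) { context.put(id, State.RUNNING); }

        // Processes one heartbeat response and returns the statuses sent back to the RM.
        List<String> heartbeat(List<String> containersToCleanup,
                               List<String> containersToBeRemovedFromNM) {
            // Assumption: removal only applies to containers that already completed.
            for (String id : containersToBeRemovedFromNM) {
                if (context.get(id) == State.COMPLETED) {
                    context.remove(id);
                }
            }
            List<String> reported = new ArrayList<>();
            for (String id : containersToCleanup) {
                if (context.get(id) == State.RUNNING) {
                    context.put(id, State.COMPLETED); // kill the container
                    reported.add(id);                 // report its status to the RM
                }
            }
            return reported;
        }
    }

    static final class ToyResourceManager {
        final Set<String> trackedContainers = new HashSet<>();
        final boolean notifyNodeOnUnknown; // models the proposed behaviour

        ToyResourceManager(boolean notifyNodeOnUnknown) {
            this.notifyNodeOnUnknown = notifyNodeOnUnknown;
        }

        // Returns the ids to send as containersToBeRemovedFromNM in the next heartbeat.
        List<String> onCompletedStatuses(List<String> completed) {
            List<String> removeFromNM = new ArrayList<>();
            for (String id : completed) {
                boolean tracked = trackedContainers.remove(id);
                // Untracked corresponds to "RMContainer is null" in the description.
                if (!tracked && notifyNodeOnUnknown) {
                    removeFromNM.add(id);
                }
            }
            return removeFromNM;
        }
    }

    // Returns true if the container is still in the NM context after two heartbeats.
    static boolean leaks(boolean notifyNodeOnUnknown) {
        ToyNodeManager nm = new ToyNodeManager();
        ToyResourceManager rm = new ToyResourceManager(notifyNodeOnUnknown);
        nm.launch("container_1");
        // Preemption: the RM has already dropped the container, so it is untracked,
        // and the same id arrives in both lists of a single heartbeat.
        List<String> ids = Collections.singletonList("container_1");
        List<String> reported = nm.heartbeat(ids, ids);
        List<String> followUp = rm.onCompletedStatuses(reported);
        nm.heartbeat(Collections.<String>emptyList(), followUp);
        return nm.context.containsKey("container_1");
    }

    public static void main(String[] args) {
        System.out.println("leak without notification: " + leaks(false));
        System.out.println("leak with notification: " + leaks(true));
    }
}
```

In this model the container stays in the NM context when the RM silently drops
the status, and is removed once the RM answers an untracked status with a
follow-up removal request.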