[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230248#comment-15230248 ]
Jason Lowe commented on YARN-4924:
----------------------------------
Yeah, now that the NM registers with the list of apps it thinks are active and
the RM tells it to finish any apps that shouldn't be active, we should be
covered. We'll still need to leave in some recovery code for finished apps so
we can clean up any lingering finished-app events from the state store, but we
can remove the code that stores those events.
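
A minimal sketch of the reconciliation described above, assuming app IDs are
plain strings. The class and method names (AppReconcileSketch, appsToFinish)
are hypothetical and are not the actual NM or RM code; the sketch only shows
the set difference the RM would need to compute at registration time.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AppReconcileSketch {

    /**
     * Hypothetical RM-side helper: given the apps the NM reports as active
     * when it registers and the apps the RM still considers running, return
     * the apps the NM should be told to finish.
     */
    static Set<String> appsToFinish(List<String> nmReportedApps,
                                    Set<String> rmRunningApps) {
        // Start from everything the NM thinks is active, in reported order.
        Set<String> toFinish = new LinkedHashSet<>(nmReportedApps);
        // Anything the RM still knows as running must be left alone.
        toFinish.removeAll(rmRunningApps);
        return toFinish;
    }

    public static void main(String[] args) {
        // The NM recovered three apps from its state store after a restart.
        List<String> reported = List.of("app_1", "app_2", "app_3");
        // The RM only considers app_2 to be running.
        Set<String> running = Set.of("app_2");
        System.out.println(appsToFinish(reported, running));
    }
}
```

Because the finish decision is recomputed from the registration list every
time, the NM would not need a persisted finished-app event to learn that an
app is done, which is why the storing side could be dropped.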
> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0, 2.7.2
> Reporter: Nathan Roberts
>
> It's probably a small window, but we observed a case where the NM crashed and
> then a container was not properly cleaned up during recovery.
> I will add details in the first comment.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)