[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232430#comment-15232430 ]
Jason Lowe commented on YARN-4924:
----------------------------------
Thanks for the patch!
I don't think removeDeprecatedKeys is an appropriate API for the state store.
The name is too generic, and it's invoked at one very specific point during
recovery that may not be the right place to clean up other keys that become
deprecated later. IMHO this doesn't need to be in the API at all -- the state
store implementation can clean up these keys as part of loading the application
state. If we want to get sophisticated, we can increment the minor state store
version (i.e., still compatible) and only clean up these keys when we're loading
the old version, but I'm not sure we need to go through all that for this. The
key delete pass should be very fast during recovery if the cleanup has already
been done.
Other parts of the patch look good.
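
To make the suggestion above concrete, here is a rough sketch of what "clean up
as part of loading" could look like. It is not the patch and not the real NM
state store code: the class, the key prefixes, and the in-memory TreeMap that
stands in for the on-disk store are all hypothetical, chosen only to show the
shape of the idea. The point is that callers only ever call the load method, so
no removeDeprecatedKeys-style method has to appear in the public API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a key-value state store. */
class StateStoreSketch {
    // Hypothetical key layout; the real keys touched by the patch may differ.
    static final String APP_PREFIX = "apps/";
    static final String DEPRECATED_PREFIX = "deprecated/";

    // Sorted map, so all keys sharing a prefix form one contiguous range.
    private final TreeMap<String, String> db = new TreeMap<>();

    void put(String key, String value) {
        db.put(key, value);
    }

    boolean containsKey(String key) {
        return db.containsKey(key);
    }

    /** Loads application state and, as a side effect, drops deprecated keys. */
    List<String> loadApplicationsState() {
        List<String> apps = new ArrayList<>();
        for (Map.Entry<String, String> e : range(APP_PREFIX).entrySet()) {
            apps.add(e.getValue());
        }
        // Cleanup is an implementation detail of loading, not an API call.
        cleanupDeprecatedKeys();
        return apps;
    }

    /** Deletes every key under the deprecated prefix; returns how many were removed. */
    int cleanupDeprecatedKeys() {
        Map<String, String> stale = range(DEPRECATED_PREFIX);
        int removed = stale.size();
        stale.clear(); // clearing the sub-map view removes the entries from db
        return removed;
    }

    // View of all entries whose key starts with the given prefix.
    private Map<String, String> range(String prefix) {
        return db.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        StateStoreSketch store = new StateStoreSketch();
        store.put(APP_PREFIX + "app_1", "app_1");
        store.put(DEPRECATED_PREFIX + "app_0", "leftover");
        System.out.println("loaded apps: " + store.loadApplicationsState());
        System.out.println("removed on second pass: " + store.cleanupDeprecatedKeys());
    }
}
```

Once the first load has run, a later pass scans an empty range, which is why
the delete pass should be cheap on subsequent recoveries. Gating the cleanup on
the stored minor version would be one extra check before the
cleanupDeprecatedKeys() call, if we decide it's worth it.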
> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
> Key: YARN-4924
> URL: https://issues.apache.org/jira/browse/YARN-4924
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.0.0, 2.7.2
> Reporter: Nathan Roberts
> Assignee: sandflee
> Attachments: YARN-4924.01.patch
>
>
> It's probably a small window but we observed a case where the NM crashed and
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.