[ https://issues.apache.org/jira/browse/YARN-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15232430#comment-15232430 ]

Jason Lowe commented on YARN-4924:
----------------------------------

Thanks for the patch!

I don't think removeDeprecatedKeys is an appropriate API in the state store.  
It's too generic of a name, and it's being invoked in a very specific place 
during recovery that may not be appropriate for other keys that may become 
deprecated.  IMHO this doesn't need to be in the API at all -- the state store 
implementation can clean up these keys as part of loading the application state.
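A minimal sketch of that suggestion, in Java since the NodeManager is written in it. It assumes a sorted key-value store (a TreeMap stands in for the NM's LevelDB-backed store), and the class name SketchStateStore and the key prefixes APPS_PREFIX and DEPRECATED_PREFIX are hypothetical. It is not the code from YARN-4924.01.patch. The point is that the cleanup happens inside loadApplicationsState(), so callers never see a removeDeprecatedKeys() method.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch only: a TreeMap stands in for the NM's sorted key-value state
// store, and both key prefixes below are hypothetical.
class SketchStateStore {
  static final String APPS_PREFIX = "ContainerManager/applications/";
  static final String DEPRECATED_PREFIX = "ContainerManager/deprecatedData/";

  private final NavigableMap<String, String> db = new TreeMap<>();

  void put(String key, String value) {
    db.put(key, value);
  }

  boolean containsKey(String key) {
    return db.containsKey(key);
  }

  // Loads application state and, in the same pass, deletes the deprecated
  // keys. No separate cleanup method is exposed to callers.
  List<String> loadApplicationsState() {
    List<String> apps = new ArrayList<>();
    for (Map.Entry<String, String> e : db.tailMap(APPS_PREFIX, true).entrySet()) {
      if (!e.getKey().startsWith(APPS_PREFIX)) {
        break; // keys are sorted, so the prefix range has ended
      }
      apps.add(e.getValue());
    }
    cleanupKeysWithPrefix(DEPRECATED_PREFIX);
    return apps;
  }

  // Deletes every key under prefix. Once an earlier recovery has removed
  // them, the range is empty and this costs a single seek.
  private void cleanupKeysWithPrefix(String prefix) {
    Iterator<String> it = db.tailMap(prefix, true).keySet().iterator();
    while (it.hasNext()) {
      if (!it.next().startsWith(prefix)) {
        break; // past the contiguous range of matching keys
      }
      it.remove();
    }
  }
}
```

Because keys sharing a prefix are contiguous in sorted order, the delete pass stops at the first non-matching key, which is why it should be cheap on every recovery after the first.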

If we want to get sophisticated we can increment the minor state store version 
(i.e.: still compatible) and only clean up these keys when we're loading the 
old version, but I'm not sure we need to go through all that for this.  The key 
delete pass should be very fast during recovery if it's already been done.
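For comparison, a sketch of the more sophisticated version-gated option, which the comment suggests is probably not needed. It assumes a major.minor version where only a major change breaks compatibility; StoreVersion, RecoveryVersionCheck, and the 1.0 to 1.1 bump are hypothetical names and numbers, not YARN's actual classes or versions. After a successful cleanup the store would write CURRENT back, so later recoveries skip the pass.

```java
// Sketch only: class names and version numbers are hypothetical.
final class StoreVersion {
  final int major;
  final int minor;

  StoreVersion(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  // Only a major-version change breaks compatibility.
  boolean isCompatibleTo(StoreVersion other) {
    return major == other.major;
  }

  boolean isOlderThan(StoreVersion other) {
    return major < other.major || (major == other.major && minor < other.minor);
  }

  @Override
  public String toString() {
    return major + "." + minor;
  }
}

final class RecoveryVersionCheck {
  // Hypothetical: the fix bumps the minor version from 1.0 to 1.1.
  static final StoreVersion CURRENT = new StoreVersion(1, 1);

  private RecoveryVersionCheck() {}

  // Returns true when the deprecated-key cleanup should run, i.e. when the
  // loaded store was written by an older but still compatible version.
  static boolean needsDeprecatedKeyCleanup(StoreVersion loaded) {
    if (!loaded.isCompatibleTo(CURRENT)) {
      throw new IllegalStateException(
          "Incompatible state store version " + loaded + ", expected " + CURRENT);
    }
    return loaded.isOlderThan(CURRENT);
  }
}
```

The extra bookkeeping only saves the empty-range seek from the unconditional approach, which is the trade-off the comment weighs.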

Other parts of the patch look good.


> NM recovery race can lead to container not cleaned up
> -----------------------------------------------------
>
>                 Key: YARN-4924
>                 URL: https://issues.apache.org/jira/browse/YARN-4924
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: sandflee
>         Attachments: YARN-4924.01.patch
>
>
> It's probably a small window but we observed a case where the NM crashed and 
> then a container was not properly cleaned up during recovery.
> I will add details in first comment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
