[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15490475#comment-15490475
 ] 

Jason Lowe commented on YARN-5547:
----------------------------------

To be clear, skipping containers during recovery is _never_ the right thing 
to do, so that's not really a valid option.

As I understand it, the current proposal is this:
* Add a new table to the state store that will contain a list of container keys 
that need compatibility processing when performing rolling downgrades
* Each key in that table will have a descriptor associated with it that will 
indicate how the recovery of the corresponding container needs to be handled.  
Options include:
** Killing the corresponding container
** Removing the key and recovering the container normally
* Any unrecognized container key that is not described in the table will cause 
the corresponding container to be killed during recovery.
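
To make the decision rule concrete, here is a rough sketch of how recovery could 
classify the keys of one container. This is not the patch: the class, enum, and 
method names are hypothetical, the key suffixes are only illustrative, and the 
compatibility table is modeled as an in-memory map rather than a real state 
store table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed recovery rule; not NodeManager code. */
public class RecoveryCompatSketch {

    /** Descriptor stored alongside a key in the proposed compatibility table. */
    public enum CompatAction { KILL_CONTAINER, REMOVE_KEY_AND_RECOVER }

    /** Outcome of examining all keys that belong to one container. */
    public static final class Result {
        public final boolean kill;
        public final List<String> keysToRemove;

        Result(boolean kill, List<String> keysToRemove) {
            this.kill = kill;
            this.keysToRemove = keysToRemove;
        }
    }

    // Key suffixes this (older) version understands. Illustrative values only.
    static final Set<String> KNOWN_SUFFIXES =
        new HashSet<>(Arrays.asList("request", "diagnostics", "launched"));

    /**
     * Decide how to recover a container from its keys.
     * compatTable maps a full container key to its descriptor (assumed layout).
     */
    public static Result recover(Collection<String> containerKeys,
                                 Map<String, CompatAction> compatTable) {
        boolean kill = false;
        List<String> keysToRemove = new ArrayList<>();
        for (String key : containerKeys) {
            String suffix = key.substring(key.lastIndexOf('/') + 1);
            if (KNOWN_SUFFIXES.contains(suffix)) {
                continue; // recognized key, normal recovery path
            }
            CompatAction action = compatTable.get(key);
            if (action == CompatAction.REMOVE_KEY_AND_RECOVER) {
                // Safe to drop the key and recover the container normally.
                keysToRemove.add(key);
            } else {
                // Either listed as KILL_CONTAINER or not described in the
                // table at all: the container is killed, never skipped.
                kill = true;
            }
        }
        return new Result(kill, keysToRemove);
    }
}
```

With an empty table this reduces to the unrecognized=kill behavior, so the 
table lookup can be layered on later without changing the default.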

We don't have to implement the entire thing in this JIRA.  We could do the 
unrecognized=kill implementation first then add the table of keys feature in a 
subsequent JIRA.

> NMLeveldbStateStore should be more tolerant of unknown keys
> -----------------------------------------------------------
>
>                 Key: YARN-5547
>                 URL: https://issues.apache.org/jira/browse/YARN-5547
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Ajith S
>         Attachments: YARN-5547.01.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

