[ 
https://issues.apache.org/jira/browse/YARN-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801788#comment-15801788
 ] 

Jason Lowe commented on YARN-5547:
----------------------------------

bq. for deleting the unknown keys, would it be ok to remove unknown keys in 
NMLeveldbStateStoreService.loadContainerState(ContainerId, LeveldbIterator, 
String)?

That should be OK as long as we record the container as killed before we remove 
the unknown keys.  Once we add the ability to ignore unknown keys without 
killing the container, deleting them on startup can become problematic.  For 
example:
# NM is on version V and is using key K, which is new in version V and is not 
deemed critical to the recovery of a running container.
# NM is downgraded to version V-1.
# On startup, NM with version V-1 deletes the unknown key K for the container 
but keeps the container running because K was deemed safe to ignore in the (yet 
to be added) state store key descriptor table.
# With the container still running, NM is upgraded to version V again.
# Now the container has lost key K, yet it was started on NM version V and 
continues to run on NM version V.
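
The sequence above can be reproduced with a rough sketch like the following.  
It is only an illustration: a TreeMap stands in for leveldb, and the key 
prefix, the suffixes "request" and "K", the per-version key sets and the 
deleteUnknownKeys helper are all hypothetical names, not the actual 
NMLeveldbStateStoreService code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class DowngradeScenario {
    // Hypothetical key layout: "<container prefix>/" followed by a suffix.
    static final String PREFIX = "containers/container_1/";
    static final String KEY_REQUEST = "request";
    static final String KEY_K = "K"; // stands in for the key that is new in version V

    // Suffixes each (hypothetical) NM version understands.
    static final Set<String> KNOWN_V_MINUS_1 = Collections.singleton(KEY_REQUEST);
    static final Set<String> KNOWN_V =
        new HashSet<>(Arrays.asList(KEY_REQUEST, KEY_K));

    /** Startup pass that deletes every key under the prefix that this version does not know. */
    static void deleteUnknownKeys(Map<String, String> store, String prefix,
                                  Set<String> known) {
        store.keySet().removeIf(key ->
            key.startsWith(prefix)
                && !known.contains(key.substring(prefix.length())));
    }

    /** Runs the V -> V-1 -> V sequence and returns the resulting store contents. */
    static Map<String, String> run() {
        Map<String, String> store = new TreeMap<>();
        // Step 1: NM on version V starts the container and writes key K.
        store.put(PREFIX + KEY_REQUEST, "launch-context");
        store.put(PREFIX + KEY_K, "non-critical-value");
        // Steps 2-3: NM restarts on V-1, deletes K, keeps the container running.
        deleteUnknownKeys(store, PREFIX, KNOWN_V_MINUS_1);
        // Step 4: NM restarts on V; K is known again but is already gone.
        deleteUnknownKeys(store, PREFIX, KNOWN_V);
        return store;
    }

    public static void main(String[] args) {
        Map<String, String> store = run();
        // Step 5: the container is still tracked, but K has been lost.
        System.out.println("request present: " + store.containsKey(PREFIX + KEY_REQUEST));
        System.out.println("K present: " + store.containsKey(PREFIX + KEY_K));
    }
}
```

After the round trip the container's "request" key is still there but K is 
not, even though the container never stopped running on a version that 
expects K.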

If we instead skip, rather than delete, the unknown keys that are deemed "safe 
to ignore" then we can leak those keys in the state store, per the concern 
above, if the container completes on version V-1.  One way to fix that case is 
to have the NM always try to delete every key listed in the (yet to be added) 
safe-to-ignore key descriptor table when the container completes.  That should 
be fine unless the table grows particularly large.  But we don't have to 
implement that now, only when we add the ability to ignore unknown keys without 
killing a container.  For the purposes of this JIRA, we will always be killing 
containers that have unknown keys, so it's simpler.


> NMLeveldbStateStore should be more tolerant of unknown keys
> -----------------------------------------------------------
>
>                 Key: YARN-5547
>                 URL: https://issues.apache.org/jira/browse/YARN-5547
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>    Affects Versions: 2.6.0
>            Reporter: Jason Lowe
>            Assignee: Ajith S
>         Attachments: YARN-5547.01.patch, YARN-5547.02.patch, 
> YARN-5547.03.patch
>
>
> Whenever new keys are added to the NM state store it will break rolling 
> downgrades because the code will throw if it encounters an unrecognized key.  
> If instead it skipped unrecognized keys it could be simpler to continue 
> supporting rolling downgrades.  We need to define the semantics of 
> unrecognized keys when containers and apps are cleaned up, e.g.: we may want 
> to delete all keys underneath an app or container directory when it is being 
> removed from the state store to prevent leaking unrecognized keys.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
