[ 
https://issues.apache.org/jira/browse/YARN-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076296#comment-14076296
 ] 

Junping Du commented on YARN-1354:
----------------------------------

Thanks [~jlowe] for updating the patch! A few quick comments so far:
{code}
+        try {
+          this.context.getNMStateStore().finishApplication(appID);
+        } catch (IOException e) {
+          LOG.error("Unable to update application state in store", e);
+        }
{code}
It looks like we only log an error when persisting state fails, as we did for 
other components before. In that case, what happens if storeApplication(), 
finishApplication(), or removeApplication() fails and the application-related 
information becomes inconsistent after a restart?
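
To make the concern concrete, here is a minimal sketch of how a log-and-continue failure could leave the persisted view out of step with the in-memory view. Everything in it is a hypothetical stand-in (FakeStateStore, AppState, finishAndLog); it is not the real NMStateStore API, only an illustration of the scenario in question.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class StateStoreInconsistencySketch {
  enum AppState { RUNNING, FINISHED }

  /** Hypothetical in-memory stand-in for the NM state store. */
  static class FakeStateStore {
    final Map<String, AppState> persisted = new HashMap<>();
    boolean failWrites = false; // flip to simulate a failing store

    void storeApplication(String appId) throws IOException {
      write(appId, AppState.RUNNING);
    }

    void finishApplication(String appId) throws IOException {
      write(appId, AppState.FINISHED);
    }

    private void write(String appId, AppState state) throws IOException {
      if (failWrites) {
        throw new IOException("simulated store failure");
      }
      persisted.put(appId, state);
    }
  }

  /**
   * Mirrors the log-and-continue pattern from the patch: the in-memory
   * state is updated, and a store failure is only logged.
   * Returns false if the persisted state could not be updated.
   */
  static boolean finishAndLog(FakeStateStore store,
      Map<String, AppState> live, String appId) {
    live.put(appId, AppState.FINISHED);
    try {
      store.finishApplication(appId);
      return true;
    } catch (IOException e) {
      System.err.println("Unable to update application state in store: "
          + e.getMessage());
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    FakeStateStore store = new FakeStateStore();
    Map<String, AppState> live = new HashMap<>();

    store.storeApplication("app_1");
    live.put("app_1", AppState.RUNNING);

    store.failWrites = true;            // the store starts failing
    finishAndLog(store, live, "app_1"); // failure is logged, not propagated

    // After a restart only the persisted view survives.
    System.out.println("in memory: " + live.get("app_1")
        + ", recovered: " + store.persisted.get("app_1"));
  }
}
```

In this sketch the application would be recovered as RUNNING even though it had already finished, which is the kind of inconsistency I am asking about.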

In ContainerManagerImpl.java
{code}
+  private void recoverApplication(ContainerManagerApplicationProto p)
+      throws IOException {
+    ApplicationId appId = new ApplicationIdPBImpl(p.getId());
+    Credentials creds = new Credentials();
+    creds.readTokenStorageStream(
+        new DataInputStream(p.getCredentials().newInput()));
      ...
{code}
Do we need a special warning if deserializing the credentials fails here, e.g. 
one that mentions a possible version mismatch? This could happen if the 
Credentials object, which is a Writable, changes in the future.
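
A rough sketch of the kind of warning I have in mind follows. The serialized format here (an int version followed by a UTF string) and the names EXPECTED_VERSION, serialize, and recoverCredentials are all made up for illustration; this is not the actual Credentials wire format. The only point is that the failure is rethrown with a message that names the application and hints at a version mismatch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CredentialRecoverySketch {
  // Hypothetical format version, not taken from Hadoop.
  static final int EXPECTED_VERSION = 1;

  /** Writes a made-up payload: version header followed by one secret. */
  static byte[] serialize(int version, String secret) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(version);
      out.writeUTF(secret);
    }
    return bytes.toByteArray();
  }

  /**
   * Reads the payload back. Any failure, including a truncated stream or
   * an unexpected version, is wrapped with a more descriptive message.
   */
  static String recoverCredentials(String appId, byte[] stored)
      throws IOException {
    try (DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(stored))) {
      int version = in.readInt();
      if (version != EXPECTED_VERSION) {
        throw new IOException("unexpected credentials version " + version);
      }
      return in.readUTF();
    } catch (IOException e) {
      throw new IOException("Unable to recover credentials for " + appId
          + " (possible version mismatch in stored state)", e);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] good = serialize(EXPECTED_VERSION, "s3cret");
    System.out.println(recoverCredentials("app_1", good));

    byte[] stale = serialize(EXPECTED_VERSION + 1, "s3cret");
    try {
      recoverCredentials("app_1", stale);
    } catch (IOException e) {
      System.out.println(e.getMessage());
    }
  }
}
```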

More comments will come later.

> Recover applications upon nodemanager restart
> ---------------------------------------------
>
>                 Key: YARN-1354
>                 URL: https://issues.apache.org/jira/browse/YARN-1354
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1354-v1.patch, 
> YARN-1354-v2-and-YARN-1987-and-YARN-1362.patch, YARN-1354-v3.patch, 
> YARN-1354-v4.patch, YARN-1354-v5.patch
>
>
> The set of active applications in the nodemanager context needs to be 
> recovered for work-preserving nodemanager restart.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
