[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

Junping Du (JIRA) Tue, 20 May 2014 08:45:15 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003489#comment-14003489
 ]


Junping Du commented on YARN-1338:
----------------------------------

[~jlowe], thanks again for your patch here! A few comments so far:
One question in general: beside null store and a leveled store, I saw a memory 
store implemented there but no usage so far. Does it helps in some scenario or 
only for test purpose? 

In NodeManager#serviceInit()
{code}
    if (recoveryEnabled) {
        ...
+      nmStore = new NMLeveldbStateStoreService();
+    } else {
+      nmStore = new NMNullStateStoreService();
     }
+    nmStore.init(conf);
+    nmStore.start();
{code}
Can we abstract code since if block into a method, something like: 
initializeNMStore(conf)? which can make NodeManager#serviceInit() simpler. 
 
In yarn_server_nodemanager_recovery.proto,
{code}
+message LocalizedResourceProto {
+  optional LocalResourceProto resource = 1;
+  optional string localPath = 2;
+  optional int64 size = 3;
+}
{code}
Does size here represent for size of local resource? If so, may be duplicated 
with the size within LocalResourceProto?

In ResourceLocalizationService.java
{code}
+  //Recover localized resources after an NM restart
+  public void recoverLocalizedResources(RecoveredLocalizationState state)
+      throws URISyntaxException {
+      ...
+      for (Map.Entry<ApplicationId, LocalResourceTrackerState> appEntry :
+           userResources.getAppTrackerStates().entrySet()) {
+        ApplicationId appId = appEntry.getKey();
+        ...
+        recoverTrackerResources(tracker, appEntry.getValue());
+      }
+    }
+  }
{code}
May be we should check appResourceState(appEntry.getValue)’s localizedResources 
and inProgressResources is not empty before recover it as we check for 
userResourceState?

In NMMemoryStateStoreService#loadLocalizationState()
{code}
          ...
+        if (tk.appId == null) {
+          rur.privateTrackerState = loadTrackerState(ts);
+        } else {
+          rur.appTrackerStates.put(tk.appId, loadTrackerState(ts));
+        }
          ...
{code}
May be even in case tk.appId !=null, we should load private resource state as 
well?

Given the patch is big enough, I haven’t finished my review although walk 
though it a few times. More comments may come later.

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1338.patch, YARN-1338v2.patch, 
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch
>
>
> Today when node manager restarts we clean up all the distributed cache files 
> from disk. This is definitely not ideal from 2 aspects.
> * For work preserving restart we definitely want them as running containers 
> are using them
> * For even non work preserving restart this will be useful in the sense that 
> we don't have to download them again if needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

Reply via email to