[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

Junping Du (JIRA) Wed, 21 May 2014 18:12:06 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005493#comment-14005493
 ]


Junping Du commented on YARN-1338:
----------------------------------

Thanks for addressing my comments, [~jlowe]! Some additional comments:
I think we currently use initStorage(conf) both to create the DB items that 
store NMState when the NM starts for the first time and to locate those items 
when the NM restarts. Do we have any code to destroy the NMState DB items when 
the NM is decommissioned (i.e., no short-term restart is expected)? If not, 
when the NM is recommissioned, at which point it should be treated as a fresh 
node, it will still have stale NMState info if NM_RECOVERY_DIR and DB_NAME have 
not changed. Am I missing anything here?
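
To make the concern concrete, here is a minimal sketch using only the plain 
JDK. It is not the actual NM state store code: the class name, the directory 
layout, and destroyStorage() are hypothetical, and initStorage() here only 
mimics the create-or-reuse behavior described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class RecoveryStateSketch {
    // Hypothetical stand-in for initStorage(conf): creates the store on the
    // first start and reuses whatever is already on disk on a restart.
    // Returns true when earlier state was found (and would be recovered).
    static boolean initStorage(Path stateDir) throws IOException {
        boolean existed = Files.isDirectory(stateDir);
        Files.createDirectories(stateDir);
        return existed;
    }

    // Hypothetical decommission hook: delete the store so that a later
    // recommission starts as a fresh node.
    static void destroyStorage(Path stateDir) throws IOException {
        if (!Files.exists(stateDir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(stateDir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("nm-recovery");
        Path stateDir = root.resolve("nm-state");
        System.out.println("first start, stale state: " + initStorage(stateDir));
        System.out.println("restart, stale state: " + initStorage(stateDir));
        destroyStorage(stateDir); // decommission
        System.out.println("recommission, stale state: " + initStorage(stateDir));
        destroyStorage(root);
    }
}
```

Without the destroyStorage() call, the third initStorage() would find the old 
directory again, which is the stale-state case I am asking about.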

In LocalResourcesTrackerImpl#recoverResource()
{code}
+    incrementFileCountForLocalCacheDirectory(localDir.getParent());
{code}
Given that localDir is already the parent of localPath, maybe we should just 
increment the count for localDir rather than its parent? I also didn't see a 
unit test that checks the file count for the resource directory after 
recovery. Maybe we should add one?
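
As an illustration of which directory the call above ends up counting, here is 
a small sketch using only java.nio.file. The path layout and the counter map 
are hypothetical and are not taken from LocalResourcesTrackerImpl; the sketch 
only shows the parent relationships, not which choice is right.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

class FileCountSketch {
    // Hypothetical per-directory counter, standing in for whatever the
    // tracker keeps for each local cache directory.
    static final Map<Path, Integer> counts = new HashMap<>();

    static void increment(Path dir) {
        counts.merge(dir, 1, Integer::sum);
    }

    public static void main(String[] args) {
        // Hypothetical layout: <cache root>/<sub dir>/<resource dir>/<file>
        Path localPath = Paths.get("cache", "0", "10", "resource.jar");
        Path localDir = localPath.getParent();

        // Mirrors the patch: the count goes to the parent of localDir,
        // one level above the directory that directly holds the file.
        increment(localDir.getParent());

        System.out.println("localDir = " + localDir);
        System.out.println("counted  = " + counts.keySet());
    }
}
```

A unit test along these lines could recover a resource and then assert on the 
count of the expected directory, whichever one that turns out to be.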

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1338.patch, YARN-1338v2.patch, 
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch
>
>
> Today when node manager restarts we clean up all the distributed cache files 
> from disk. This is definitely not ideal from 2 aspects.
> * For work preserving restart we definitely want them as running containers 
> are using them
> * For even non work preserving restart this will be useful in the sense that 
> we don't have to download them again if needed by future tasks.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
