Jason Lowe updated YARN-1338:

    Attachment: YARN-1338v5.patch

Thanks for the review, Junping!  Attaching a patch to address your comments 
with specific responses below.

bq. Besides the null store and a leveldb store, I saw a memory store implemented 
there but no usage so far. Does it help in some scenario, or is it only for test?

It's only for use in unit tests, which is why it's located under src/test/.  It 
stores state in the memory of the JVM itself, so it's not very useful for 
real-world recovery scenarios.  The state is lost when the NM crashes/exits.

bq. Can we abstract the code in the if block into a method, something like 
initializeNMStore(conf)? That would make NodeManager#serviceInit() simpler.
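For what it's worth, the suggested extraction could look roughly like the following. This is a standalone sketch with stand-in stub types (Conf, NullStore, LeveldbStore); the real code would work with NMStateStoreService and the NodeManager configuration:

```java
// Standalone sketch of the suggested initializeNMStore(conf) helper.
// Stub types stand in for the real YARN configuration and store classes.
class Conf {
    private final boolean recoveryEnabled;
    Conf(boolean recoveryEnabled) { this.recoveryEnabled = recoveryEnabled; }
    boolean getRecoveryEnabled() { return recoveryEnabled; }
}

interface StateStore { String name(); }

class NullStore implements StateStore {
    public String name() { return "null"; }
}

class LeveldbStore implements StateStore {
    public String name() { return "leveldb"; }
}

class NMStoreInit {
    // Extracting the if/else into one method keeps serviceInit() simpler:
    // pick the leveldb-backed store when recovery is enabled, else a no-op store.
    static StateStore initializeNMStore(Conf conf) {
        if (conf.getRecoveryEnabled()) {
            return new LeveldbStore();
        }
        return new NullStore();
    }
}
```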


bq. Does size here represent the size of the local resource? If so, maybe it's 
duplicated with the size within LocalResourceProto?

As I understand it they are slightly different.  The size in the 
LocalResourceProto is the size of the resource that will be downloaded, while 
the size in LocalizedResource (and also persisted in LocalizedResourceProto) is 
the size of the resource on the local disk.  These can be different if the 
resource is uncompressed/unarchived after downloading (e.g.: a .tar.gz archive 
that is unpacked on the local disk).
bq. Maybe we should check that appResourceState(appEntry.getValue)'s 
localizedResources and inProgressResources are not empty before recovering it, 
as we check for userResourceState?

Done.  I also added a LocalResourceTrackerState#isEmpty method to make the code 
a bit cleaner.
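The helper amounts to something like this (a sketch only; the real LocalResourceTrackerState holds recovered protobuf records rather than strings):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the isEmpty helper: a tracker's recovered state is empty
// when it has neither completed nor in-progress localizations.
class TrackerStateSketch {
    final List<String> localizedResources = new ArrayList<>();
    final List<String> inProgressResources = new ArrayList<>();

    boolean isEmpty() {
        return localizedResources.isEmpty() && inProgressResources.isEmpty();
    }
}
```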

bq. Maybe even in the case tk.appId != null, we should load private resource 
state as well?

No, if tk.appId is not null then this is state for an app-specific resource 
tracker and not for a private resource tracker.  See the javadoc for 
NMStateStoreService#startResourceLocalization or 
NMStateStoreService#finishResourceLocalization for some hints, and I also added 
some comments to the NMMemoryStateStoreService to clarify how the user and 
appId are used to discern public vs. private vs. app-specific trackers.
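The convention can be summarized like this (a sketch of the classification logic, mirroring the comments added to NMMemoryStateStoreService; classify is a hypothetical helper name):

```java
// Sketch of how user and appId discern which tracker owns a resource:
// no user -> public tracker; user without appId -> that user's private
// tracker; user plus appId -> the app-specific tracker.
class TrackerKind {
    static String classify(String user, String appId) {
        if (user == null) {
            return "public";
        }
        if (appId == null) {
            return "private";
        }
        return "app-specific";
    }
}
```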

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>         Attachments: YARN-1338.patch, YARN-1338v2.patch, 
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch, YARN-1338v5.patch
> Today when the nodemanager restarts we clean up all the distributed cache files 
> from disk. This is definitely not ideal for two reasons:
> * For a work-preserving restart we definitely want them, as running containers 
> are using them.
> * Even for a non-work-preserving restart this is useful in the sense that 
> we don't have to download them again if needed by future tasks.

This message was sent by Atlassian JIRA