[
https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003489#comment-14003489
]
Advertising
Junping Du commented on YARN-1338:
----------------------------------
[~jlowe], thanks again for your patch here! A few comments so far:
One question in general: beside null store and a leveled store, I saw a memory
store implemented there but no usage so far. Does it helps in some scenario or
only for test purpose?
In NodeManager#serviceInit()
{code}
if (recoveryEnabled) {
...
+ nmStore = new NMLeveldbStateStoreService();
+ } else {
+ nmStore = new NMNullStateStoreService();
}
+ nmStore.init(conf);
+ nmStore.start();
{code}
Can we abstract code since if block into a method, something like:
initializeNMStore(conf)? which can make NodeManager#serviceInit() simpler.
In yarn_server_nodemanager_recovery.proto,
{code}
+message LocalizedResourceProto {
+ optional LocalResourceProto resource = 1;
+ optional string localPath = 2;
+ optional int64 size = 3;
+}
{code}
Does size here represent for size of local resource? If so, may be duplicated
with the size within LocalResourceProto?
In ResourceLocalizationService.java
{code}
+ //Recover localized resources after an NM restart
+ public void recoverLocalizedResources(RecoveredLocalizationState state)
+ throws URISyntaxException {
+ ...
+ for (Map.Entry<ApplicationId, LocalResourceTrackerState> appEntry :
+ userResources.getAppTrackerStates().entrySet()) {
+ ApplicationId appId = appEntry.getKey();
+ ...
+ recoverTrackerResources(tracker, appEntry.getValue());
+ }
+ }
+ }
{code}
May be we should check appResourceState(appEntry.getValue)’s localizedResources
and inProgressResources is not empty before recover it as we check for
userResourceState?
In NMMemoryStateStoreService#loadLocalizationState()
{code}
...
+ if (tk.appId == null) {
+ rur.privateTrackerState = loadTrackerState(ts);
+ } else {
+ rur.appTrackerStates.put(tk.appId, loadTrackerState(ts));
+ }
...
{code}
May be even in case tk.appId !=null, we should load private resource state as
well?
Given the patch is big enough, I haven’t finished my review although walk
though it a few times. More comments may come later.
> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
> Key: YARN-1338
> URL: https://issues.apache.org/jira/browse/YARN-1338
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-1338.patch, YARN-1338v2.patch,
> YARN-1338v3-and-YARN-1987.patch, YARN-1338v4.patch
>
>
> Today when node manager restarts we clean up all the distributed cache files
> from disk. This is definitely not ideal from 2 aspects.
> * For work preserving restart we definitely want them as running containers
> are using them
> * For even non work preserving restart this will be useful in the sense that
> we don't have to download them again if needed by future tasks.
--
This message was sent by Atlassian JIRA
(v6.2#6252)