[ https://issues.apache.org/jira/browse/YARN-1757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated YARN-1757:
-----------------------------
Attachment: YARN-1757-v2.patch
Thanks for the review, Karthik!
bq. Nit: YarnConfiguration: We might want to add an NM_RECOVERY_PREFIX for all
recovery-related configs?
Done.
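For readers following along, the shared prefix could look roughly like the sketch below. The key strings are illustrative assumptions modeled on the usual "yarn.nodemanager." naming, and the class name is hypothetical; the names in the attached patch may differ.

```java
// Hypothetical stand-in for the relevant constants in YarnConfiguration.
final class RecoveryConfigKeys {
    // Assumed to mirror the existing nodemanager prefix.
    static final String NM_PREFIX = "yarn.nodemanager.";

    // Single prefix shared by every recovery-related key.
    static final String NM_RECOVERY_PREFIX = NM_PREFIX + "recovery.";

    // Illustrative keys built on the shared prefix (names are assumptions).
    static final String NM_RECOVERY_ENABLED = NM_RECOVERY_PREFIX + "enabled";
    static final String NM_RECOVERY_DIR = NM_RECOVERY_PREFIX + "dir";

    private RecoveryConfigKeys() {
        // constants only, no instances
    }
}
```

Grouping the keys this way means a later rename of the recovery namespace touches one constant rather than every key.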
bq. The default recovery-dir should probably be something more specific to
nm-recovery - /tmp/yarn-nm-recovery?
Yes, I did this in yarn-default.xml but forgot to update the default in
YarnConfiguration. I ended up removing the default and just relying on
yarn-default.xml, much like the fs uri for the RM's file system state store.
bq. Nit: Should we add an NMUtils class for static helper methods like
isRecoveryEnabled()?
It's just a boolean config, very straightforward to access -- not sure an
NMUtils method adds a lot of value here.
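To illustrate why a helper seems unnecessary, here is a minimal sketch of the inline lookup. It uses java.util.Properties as a stand-in for the Hadoop Configuration object so that it runs on its own; the key string, the false default, and the class name are assumptions for illustration only.

```java
import java.util.Properties;

// Hypothetical component that reads the recovery flag once during init.
final class InlineRecoveryCheck {
    // Assumed key name; the real constant would live in YarnConfiguration.
    static final String RECOVERY_ENABLED_KEY = "yarn.nodemanager.recovery.enabled";

    private final boolean recoveryEnabled;

    InlineRecoveryCheck(Properties conf) {
        // The whole check is one expression, so a utility method saves little.
        // A missing key falls back to "false" (an assumed default).
        this.recoveryEnabled =
            Boolean.parseBoolean(conf.getProperty(RECOVERY_ENABLED_KEY, "false"));
    }

    boolean isRecoveryEnabled() {
        return recoveryEnabled;
    }
}
```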
bq. Nit: Rename variables to stateStore instead of stateStorage - that would
better match the conventions used in the RM, and is shorter
Done.
bq. Nit: AuxServices#createStorageDir: Maybe add a comment to say control flow
through FileNotFound is cheaper than explicitly checking if the file exists?
bq. NameNode should also use something similar, instead of directly creating
the directory? Maybe another candidate to move to NMUtils?
Actually I ended up just using the straightforward mkdirs approach. It already
checks for existence and updates the permissions of the directory. Since this
is the local filesystem we're talking about, there shouldn't be any significant
performance implications here and no need for a separate method to create
directories.
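As a rough sketch of the idea (not the patch itself), the following uses plain java.nio rather than the Hadoop filesystem API so that it is self-contained. The class name, method name, and the owner-only "rwx------" permission are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Hypothetical helper showing the "just mkdirs" approach on a local filesystem.
final class RecoveryDirSketch {
    static Path ensureServiceDir(Path recoveryRoot, String serviceName) {
        Path dir = recoveryRoot.resolve(serviceName);
        try {
            // Creates missing parents and is a no-op if the directory already
            // exists, so no separate existence check is needed.
            Files.createDirectories(dir);
            // Reset permissions on every call (assumed owner-only access).
            // Skipped on filesystems without POSIX permission support.
            if (dir.getFileSystem().supportedFileAttributeViews().contains("posix")) {
                Files.setPosixFilePermissions(
                    dir, PosixFilePermissions.fromString("rwx------"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Could not prepare " + dir, e);
        }
        return dir;
    }
}
```

Calling ensureServiceDir twice with the same arguments returns the same path and leaves the directory in place, which is the property the mkdirs approach relies on.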
bq. TestAuxServices: We should check if we have two directories created too?
It's already checking in each service init whether a directory specific to that
service is being created, but I went ahead and added a check for two aux
service recovery directories.
> Auxiliary service support for nodemanager recovery
> --------------------------------------------------
>
> Key: YARN-1757
> URL: https://issues.apache.org/jira/browse/YARN-1757
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: 2.3.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Attachments: YARN-1757-v2.patch, YARN-1757.patch, YARN-1757.patch
>
>
> There needs to be a mechanism for communicating to auxiliary services whether
> nodemanager recovery is enabled and where they should store their state.