[
https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066725#comment-16066725
]
Jason Lowe commented on YARN-6708:
----------------------------------
bq. If Node manager service and users are in different group should be able to
check the availability of existing cache folders during download and recovery.
Right, I was confusing this with HDFS group behavior where we'd setup the
parent directory to ensure the group of child directories is always one the NM
participates in. As you say, we need it to be more permissive (at least 0711)
so the NM will always be able to stat the localized resource. 0755 should be
fine, the parent directory has already locked the filecache down to 710 anyway
so only the user and the NM's group can get in.
bq. Already existing FSDownload code handles this case.
What I meant by that comment is it's weird that we _sometimes_ create the
destDirPath leaf directory before calling FSDownload. If the parent is the
local cache root we do not and leave that for FSDownload to do, but if the
parent isn't the root then we do create it. We should either always create it
or never create it. For example, I think the code should be more like the
following. Also note the simplified logic where the whole path object is
pushed onto the stack which saves having to manipulate the path when we pop
things off the stack:
{code}
Path parent = destDirPath.getParent();
Path cacheRoot = LocalCacheDirectoryManager.getCacheDirectoryRoot(parent);
Stack<String> dirs = new Stack<String>();
while (parent != null && !parent.equals(cacheRoot)) {
dirs.push(parent);
parent = parent.getParent();
}
// Create sub directories with same permission
while (!dirs.isEmpty()) {
createDir(lfs, dirs.pop(), USER_CACHE_FOLDER_PERMS);
}
{code}
That way we always leave it to FSDownload to create the leaf directory. If
we'd rather always create it before calling FSDownload then we simply push
destDirPath on the stack before the loop.
bq. FSDownload should have been in nodemanager since its tightly coupled to the
directoy permission wrt to localization
With respect to permissions on that directory, yes, although those directory
permissions are a typical default and not necessarily specific to localization.
It's currently used by code outside of the nodemanager and therefore a bit
tricky to move without breaking things that may be using it.
bq. FSDownload handles the final cache directory permissions. Even if 0/0/85 is
created before download, in FSDownload for 85 the same could get reset rt??
FSDownload will reset an existing cache directory permissions to 0755 iff the
umask is too restrictive to allow 0755 by default. Either way the directory is
going to be 0755 after FSDownload is done with it, and that's all we need.
bq. The directory permission is 755 and in jenkins the umask is 022 to validate
directory rights for code change used reflection. Container localizer USERCACHE
permission could be package private but the above point of FSDownload will set
the rights to 0755 or we should be checking only 0/0??
We should only be checking 0/0. We already know the leaf directory will be
0755 because FSDownload does that for us. The unit test can go ahead and
verify 0/0/85 is the right permissions, but it will be the case even before the
patch. Still probably worth it in case somehow they're not correct. What's
critical is to test the permissions of 0/ and 0/0/.
As far as the umask is concerned, you should be able to override it by passing
a FileContext to the ContainerLocalizer where we've explicitly called setUMask
on it to the desired umask you need for the test. We should not modify
constants in classes to execute this test, especially via reflection. That's
very confusing and could break other tests in the same test suite that run
afterwards since the classes have been scribbled upon.
Please also address the checkstyle comments in the new patch. I'm guessing
many of them will become moot as part of the unit test rewrite.
> Nodemanager container crash after ext3 folder limit
> ---------------------------------------------------
>
> Key: YARN-6708
> URL: https://issues.apache.org/jira/browse/YARN-6708
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-6708.001.patch, YARN-6708.002.patch,
> YARN-6708.003.patch, YARN-6708.004.patch
>
>
> Configure umask as *027* for nodemanager service user
> and {{yarn.nodemanager.local-cache.max-files-per-directory}} as {{40}}. After
> 4 *private* dir localization next directory will be *0/14*
> Local Directory cache manager
> {code}
> vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l
> total 28
> drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./
> drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../
> drwxr-x--- 3 mapred users 4096 Jun 10 14:36 0/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:15 10/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:22 11/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:27 12/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:31 13/
> {code}
> *drwxr-x---* 3 mapred users 4096 Jun 10 14:36 0/ is only *750*
> Nodemanager user will not be able check for localization path exists or not.
> {{LocalResourcesTrackerImpl}}
> {code}
> case REQUEST:
> if (rsrc != null && (!isResourcePresent(rsrc))) {
> LOG.info("Resource " + rsrc.getLocalPath()
> + " is missing, localizing it again");
> removeResource(req);
> rsrc = null;
> }
> if (null == rsrc) {
> rsrc = new LocalizedResource(req, dispatcher);
> localrsrc.put(req, rsrc);
> }
> break;
> {code}
> *isResourcePresent* will always return false and same resource will be
> localized to {{0}} to next unique number
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]