[
https://issues.apache.org/jira/browse/YARN-6708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063878#comment-16063878
]
Jason Lowe commented on YARN-6708:
----------------------------------
Thanks for the report and the patch!
I'm not a fan of moving LocalCacheDirectoryManager to yarn-common. It is
_very_ specific to the peculiarities of how container localization works, and
therefore isn't really a reusable component as something in yarn-common would
imply.
Instead, I think it's more appropriate for the directory to be created _before_
FSDownload tries to download into it. Notice that when FSDownload creates the
directory it does not expect to create parents: it explicitly calls the mkdir
form that fails when parent directories do not exist. That made me wonder how
this actually works in practice, and I found that the parents are auto-created
by this code chunk in ContainerLocalizer:
{code}
Callable<Path> download(Path path, LocalResource rsrc,
    UserGroupInformation ugi) throws IOException {
  diskValidator.checkStatus(new File(path.toUri().getRawPath()));
  return new FSDownloadWrapper(lfs, ugi, conf, path, rsrc);
}
{code}
The checkStatus call invokes checkDir, which in turn calls
mkdirsWithExistsCheck. That creates the parent directories with default
permissions. I'd rather see ContainerLocalizer set up the parent directories
with proper permissions before calling FSDownload. ContainerLocalizer is
already in the right package to leverage LocalCacheDirectoryManager, so it
seems like a better place to make this change.
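To illustrate the idea of setting up parent directories with explicit
permissions, here is a minimal sketch using only java.nio.file rather than the
Hadoop FileContext API. The class and method names (ParentDirSetup,
createDirsWithPerms) are hypothetical, and the rwxr-xr-x mode is an assumption
taken from the 755 directories in the listing in the description. It is not
the proposed patch, just the shape of the approach.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper, not part of Hadoop; requires a POSIX file system.
public class ParentDirSetup {

  /**
   * Creates every missing directory between base (exclusive) and target
   * (inclusive), then sets the mode of each directory it created.
   */
  static void createDirsWithPerms(Path base, Path target, String perms)
      throws IOException {
    Set<PosixFilePermission> mode = PosixFilePermissions.fromString(perms);
    Path current = base;
    for (Path part : base.relativize(target)) {
      current = current.resolve(part);
      if (!Files.isDirectory(current)) {
        Files.createDirectory(current);
        // Setting the mode after creation is a chmod-style call, so the
        // process umask (e.g. 027) does not strip any of the bits.
        Files.setPosixFilePermissions(current, mode);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    Path base = Files.createTempDirectory("filecache");
    Path target = base.resolve("0").resolve("14");
    createDirsWithPerms(base, target, "rwxr-xr-x");
    System.out.println(PosixFilePermissions.toString(
        Files.getPosixFilePermissions(base.resolve("0"))));
  }
}
```

The point of the sketch is that the mode is applied explicitly after each
directory is created, instead of relying on whatever the umask leaves behind
when a disk check happens to create the parents as a side effect.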
> Nodemanager container crash after ext3 folder limit
> ---------------------------------------------------
>
> Key: YARN-6708
> URL: https://issues.apache.org/jira/browse/YARN-6708
> Project: Hadoop YARN
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Critical
> Attachments: YARN-6708.001.patch, YARN-6708.002.patch,
> YARN-6708.003.patch
>
>
> Configure the umask as *027* for the nodemanager service user
> and set {{yarn.nodemanager.local-cache.max-files-per-directory}} to {{40}}.
> After 4 *private* directory localizations, the next directory will be *0/14*.
> Local directory cache manager listing:
> {code}
> vm2:/opt/hadoop/release/data/nmlocal/usercache/mapred/filecache # l
> total 28
> drwx--x--- 7 mapred hadoop 4096 Jun 10 14:35 ./
> drwxr-s--- 4 mapred hadoop 4096 Jun 10 12:07 ../
> drwxr-x--- 3 mapred users 4096 Jun 10 14:36 0/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:15 10/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:22 11/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:27 12/
> drwxr-xr-x 3 mapred users 4096 Jun 10 12:31 13/
> {code}
> *drwxr-x---* 3 mapred users 4096 Jun 10 14:36 0/ is only *750*
> The nodemanager user will not be able to check whether the localization path
> exists.
> {{LocalResourcesTrackerImpl}}
> {code}
> case REQUEST:
>   if (rsrc != null && (!isResourcePresent(rsrc))) {
>     LOG.info("Resource " + rsrc.getLocalPath()
>         + " is missing, localizing it again");
>     removeResource(req);
>     rsrc = null;
>   }
>   if (null == rsrc) {
>     rsrc = new LocalizedResource(req, dispatcher);
>     localrsrc.put(req, rsrc);
>   }
>   break;
> {code}
> *isResourcePresent* will always return false, and the same resource will be
> localized again under {{0}} with the next unique number.