Shane Kumpf created YARN-7879:
---------------------------------
Summary: NM user is unable to access the application filecache due
to permissions
Key: YARN-7879
URL: https://issues.apache.org/jira/browse/YARN-7879
Project: Hadoop YARN
Issue Type: Bug
Reporter: Shane Kumpf
I noticed the following log entries where localization was being retried on
several MR AM files.
{code}
2018-02-02 02:53:02,905 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
Resource
/hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/11/job.jar
is missing, localizing it again
2018-02-02 02:53:42,908 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
Resource
/hadoop-yarn/usercache/hadoopuser/appcache/application_1517539453610_0001/filecache/13/job.xml
is missing, localizing it again
{code}
The cluster is configured to use LCE and
{{yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user}} is set
to a user ({{hadoopuser}}) that is in the {{hadoop}} group. The user has a
umask of {{0002}}. The cluster is configured with
{{fs.permissions.umask-mode=022}}, coming from {{core-default}}. Setting the
local-user to {{nobody}}, who is not a login user or in the {{hadoop}} group,
produces the same results.
{code}
[hadoopuser@y7001 ~]$ umask
0002
[hadoopuser@y7001 ~]$ id
uid=1003(hadoopuser) gid=1004(hadoopuser) groups=1004(hadoopuser),1001(hadoop)
{code}
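As a rough illustration of how the two umask values above would affect newly created directories, here is the usual POSIX arithmetic (effective mode = requested mode AND NOT umask). The requested mode of 0777 is an assumption for illustration only, and the class and method names are hypothetical; this does not reflect how the NM or the container executor actually sets permissions.

```java
final class UmaskSketch {
    // POSIX rule: bits set in the umask are cleared from the requested mode.
    static int effectiveMode(int requestedMode, int umask) {
        return requestedMode & ~umask & 0777;
    }

    public static void main(String[] args) {
        int requested = 0777; // assumed request, not taken from the NM code

        // umask 0002 (the local-user's shell umask) -> 775
        System.out.println(Integer.toOctalString(effectiveMode(requested, 0002)));
        // umask 022 (fs.permissions.umask-mode) -> 755
        System.out.println(Integer.toOctalString(effectiveMode(requested, 0022)));
    }
}
```

Under that assumption, neither umask alone would yield a mode of 700 from a 0777 request.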
The cause of the log entry was tracked down to a simple {{!file.exists()}} call
in {{LocalResourcesTrackerImpl#isResourcePresent}}.
{code}
public boolean isResourcePresent(LocalizedResource rsrc) {
  boolean ret = true;
  if (rsrc.getState() == ResourceState.LOCALIZED) {
    File file = new File(rsrc.getLocalPath().toUri().getRawPath().
        toString());
    if (!file.exists()) {
      ret = false;
    } else if (dirsHandler != null) {
      ret = checkLocalResource(rsrc);
    }
  }
  return ret;
}
{code}
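One reason this check is ambiguous: {{File#exists}} returns false both when the file is truly gone and when the caller cannot search a parent directory. A minimal sketch of how the two cases could be told apart with {{java.nio.file.Files}} follows. The class, enum, and method names are hypothetical and this is not the NodeManager's code; it only relies on the documented behavior that {{Files.exists}} and {{Files.notExists}} both return false when existence cannot be determined.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ResourcePresenceSketch {
    enum Presence { PRESENT, MISSING, UNKNOWN }

    // Distinguishes "definitely missing" from "cannot tell" (for example,
    // when a parent directory is not searchable by the calling user).
    static Presence classify(Path p) {
        if (Files.exists(p)) {
            return Presence.PRESENT;
        }
        if (Files.notExists(p)) {
            return Presence.MISSING;
        }
        return Presence.UNKNOWN; // neither check could confirm anything
    }

    public static void main(String[] args) {
        // Pass a path as the first argument; defaults to the temp directory.
        Path p = Paths.get(args.length > 0
            ? args[0] : System.getProperty("java.io.tmpdir"));
        System.out.println(p + " -> " + classify(p));
    }
}
```

Run as the NM user against one of the filecache paths from the log, a result of UNKNOWN rather than MISSING would point at permissions instead of a deleted resource.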
The Resources Tracker runs as the NM user, in this case {{yarn}}. The files
being retried are in the filecache. The directories in the filecache are all
owned by the local-user and the local-user's primary group, with 700
permissions, which makes them inaccessible to the {{yarn}} user.
{code}
[root@y7001 ~]# ls -la
/hadoop-yarn/usercache/hadoopuser/appcache/application_1517540536531_0001/filecache
total 0
drwx--x---. 6 hadoopuser hadoop 46 Feb 2 03:06 .
drwxr-s---. 4 hadoopuser hadoop 73 Feb 2 03:07 ..
drwx------. 2 hadoopuser hadoopuser 61 Feb 2 03:05 10
drwx------. 3 hadoopuser hadoopuser 21 Feb 2 03:05 11
drwx------. 2 hadoopuser hadoopuser 45 Feb 2 03:06 12
drwx------. 2 hadoopuser hadoopuser 41 Feb 2 03:06 13
{code}
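The listing above can be checked against the basic POSIX class-selection rule (owner bits, else group bits, else other bits). The sketch below is a simplification that ignores root, ACLs, and SELinux. It assumes the {{yarn}} user belongs to the groups {{yarn}} and {{hadoop}}, which is not stated in this report, and the class and method names are hypothetical.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class TraverseCheckSketch {
    // Returns true if 'user' has the execute (search) bit on a directory
    // with the given 9-character mode string, owner, and group.
    static boolean canTraverse(String mode, String owner, String group,
                               String user, Set<String> userGroups) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
        if (user.equals(owner)) {
            return perms.contains(PosixFilePermission.OWNER_EXECUTE);
        }
        if (userGroups.contains(group)) {
            return perms.contains(PosixFilePermission.GROUP_EXECUTE);
        }
        return perms.contains(PosixFilePermission.OTHERS_EXECUTE);
    }

    public static void main(String[] args) {
        // Assumed group membership for the NM user; not taken from the report.
        Set<String> yarnGroups = new HashSet<>(Arrays.asList("yarn", "hadoop"));

        // filecache itself: drwx--x--- hadoopuser hadoop -> true
        System.out.println(canTraverse("rwx--x---", "hadoopuser", "hadoop",
            "yarn", yarnGroups));
        // filecache/11: drwx------ hadoopuser hadoopuser -> false
        System.out.println(canTraverse("rwx------", "hadoopuser", "hadoopuser",
            "yarn", yarnGroups));
    }
}
```

Under those assumptions, {{yarn}} can enter {{filecache}} but cannot search the numbered subdirectories, so a stat of {{filecache/11/job.jar}} would fail even though the file is there.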
I saw YARN-5287, but that appears to be related to a restrictive umask and the
usercache itself. I was unable to locate any other known issues that seemed
relevant. Is the above already known, or is it a configuration issue?