john lilley created YARN-3514:
---------------------------------
Summary: Active directory usernames like domain\login cause YARN failures
Key: YARN-3514
URL: https://issues.apache.org/jira/browse/YARN-3514
Project: Hadoop YARN
Issue Type: Bug
Components: yarn
Affects Versions: 2.2.0
Environment: CentOS6
Reporter: john lilley
Priority: Minor
We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is
Kerberos-enabled and uses an external AD domain controller for the KDC. We are
able to authenticate, browse HDFS, etc. However, YARN fails during
localization because it seems to get confused by the presence of a \ character
in the local user name.
Our AD authentication on the nodes goes through sssd and is configured to map
AD users onto the form domain\username. For example, our test user has a
Kerberos principal of [email protected] and that maps onto a CentOS user
"domain\hadoopuser". We have no problem validating that user with PAM, logging
in as that user, su-ing to that user, etc.
However, when we attempt to run a YARN application master, the localization
step fails when setting up the local cache directory for the AM. The error
that comes out of the RM logs:
2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication:
ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED,
diagnostics='Application application_1429295486450_0001 failed 1 times due to
AM Container for appattempt_1429295486450_0001_000001 exited with exitCode:
-1000 due to: Application application_1429295486450_0001 initialization failed
(exitCode=255) with output: main : command provided 0
main : user is DOMAIN\hadoopuser
main : requested yarn user is domain\hadoopuser
org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create directory: /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
    at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
.Failing this attempt.. Failing the application.'
However, when we look on the node launching the AM, we see this:
[root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
[root@rpb-cdh-kerb-2 usercache]# ls -l
drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
There appears to be different treatment of the \ character in different places.
Something creates the directory as "domain\hadoopuser" but something else
later attempts to use it as "domain%5Chadoopuser". I’m not sure where or why
URL escaping converts the \ to %5C, or why it is not applied consistently.
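To illustrate how one user name can end up with both spellings, here is a minimal Java sketch. It assumes the path passes through java.net.URI at some point, which this report does not confirm for the NodeManager. The class and helper names are hypothetical. It only shows that the decoded and raw forms of the same URI path differ in exactly the way seen above.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; not part of Hadoop or YARN.
class BackslashEscapeDemo {
    // Assumed usercache root, taken from the paths shown in this report.
    private static final String USERCACHE = "/data/yarn/nm/usercache/";

    // The multi-argument URI constructor percent-encodes characters that
    // are not legal in a URI path, and the backslash is one of them.
    private static URI toUri(String user) {
        try {
            return new URI(null, null, USERCACHE + user, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decoded path: keeps the literal backslash.
    static String decodedPath(String user) {
        return toUri(user).getPath();
    }

    // Raw path: backslash appears as %5C.
    static String rawPath(String user) {
        return toUri(user).getRawPath();
    }

    public static void main(String[] args) {
        System.out.println(decodedPath("domain\\hadoopuser"));
        System.out.println(rawPath("domain\\hadoopuser"));
    }
}
```

If one code path used the decoded form to create the directory and another used the raw form to check it, the mismatch described above would follow. That is a guess at the mechanism, not a diagnosis.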
I should also mention, for the sake of completeness, our auth_to_local rule is
set up to map [email protected] to domain\user:
RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g
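For reference, the substitution part of this rule can be sketched in plain Java. This assumes the sed-style replacement is applied with java.util.regex semantics, where \\ in the replacement text produces a single literal backslash. The class name, method name, and the sample principal hadoopuser@DOMAIN.COM are hypothetical stand-ins, not taken from Hadoop or from our configuration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; mimics only the filter and substitution
// parts of the rule above, not Hadoop's own rule parser.
class AuthToLocalDemo {
    // Filter from the rule: (^.*@DOMAIN\.COM$)
    private static final Pattern FILTER = Pattern.compile("^.*@DOMAIN\\.COM$");
    // Substitution pattern from the rule: s/^(.*)@DOMAIN\.COM$/.../
    private static final Pattern FROM = Pattern.compile("^(.*)@DOMAIN\\.COM$");
    // Replacement text domain\\$1 as written in the rule; in a regex
    // replacement, the two backslashes yield one literal backslash.
    private static final String TO = "domain\\\\$1";

    // Returns the local name, or null when the rule does not apply.
    static String mapToLocal(String principal) {
        if (!FILTER.matcher(principal).matches()) {
            return null;
        }
        return FROM.matcher(principal).replaceAll(TO);
    }

    public static void main(String[] args) {
        // Assumed sample input in the [1:$1@$0] form, i.e. user@REALM.
        System.out.println(mapToLocal("hadoopuser@DOMAIN.COM"));
    }
}
```

Under these assumptions the mapped name contains one backslash, which matches the "domain\hadoopuser" form that sssd reports.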