[ 
https://issues.apache.org/jira/browse/YARN-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503666#comment-14503666
 ] 

Chris Nauroth commented on YARN-3514:
-------------------------------------

[~john.lil...@redpoint.net], thank you for the detailed bug report.

I believe the root cause is likely to be in container localization's URI 
parsing to construct the local download path.  The relevant code is in 
{{ContainerLocalizer#download}}:

{code}
  Callable<Path> download(Path path, LocalResource rsrc,
      UserGroupInformation ugi) throws IOException {
    DiskChecker.checkDir(new File(path.toUri().getRawPath()));
    return new FSDownload(lfs, ugi, conf, path, rsrc);
  }
{code}

We're taking a {{Path}} and converting it to URI form, but I don't think 
{{getRawPath}} is the correct call for us to access the path portion of the 
URI.  A possible fix would be to switch to {{getPath}}, which would actually 
decode back to the original form.

{code}
scala> new org.apache.hadoop.fs.Path("domain\\hadoopuser").toUri().getRawPath()
res4: java.lang.String = domain%5Chadoopuser

scala> new org.apache.hadoop.fs.Path("domain\\hadoopuser").toUri().getPath()
res5: java.lang.String = domain\hadoopuser
{code}
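
The same difference can be reproduced with nothing but the JDK. The sketch below assumes that {{Path}} builds its URI through one of the multi-argument {{java.net.URI}} constructors, which percent-encode characters that are illegal in a URI path. The class and method names ({{RawPathVsPath}}, {{rawPath}}, {{decodedPath}}) are made up for illustration and are not part of Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RawPathVsPath {

    // Build a URI from a plain path string. The multi-argument
    // constructors quote illegal characters such as '\' as %5C.
    private static URI toUri(String path) {
        try {
            return new URI(null, null, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Path component exactly as stored in the URI (still percent-encoded).
    static String rawPath(String path) {
        return toUri(path).getRawPath();
    }

    // Path component with percent-escapes decoded back to the original form.
    static String decodedPath(String path) {
        return toUri(path).getPath();
    }

    public static void main(String[] args) {
        String dir = "/data/yarn/nm/usercache/domain\\hadoopuser";
        System.out.println(rawPath(dir));
        System.out.println(decodedPath(dir));
    }
}
```

If that assumption holds, only the decoded form names the directory that actually exists on the node, so it is the form that should be handed to {{java.io.File}}.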


> Active directory usernames like domain\login cause YARN failures
> ----------------------------------------------------------------
>
>                 Key: YARN-3514
>                 URL: https://issues.apache.org/jira/browse/YARN-3514
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.2.0
>         Environment: CentOS6
>            Reporter: john lilley
>            Priority: Minor
>
> We have a 2.2.0 (Cloudera 5.3) cluster running on CentOS6 that is 
> Kerberos-enabled and uses an external AD domain controller for the KDC.  We 
> are able to authenticate, browse HDFS, etc.  However, YARN fails during 
> localization because it seems to get confused by the presence of a \ 
> character in the local user name.
> Our AD authentication on the nodes goes through sssd and is configured to 
> map AD users onto the form domain\username.  For example, our test user has a 
> Kerberos principal of hadoopu...@domain.com and that maps onto a CentOS user 
> "domain\hadoopuser".  We have no problem validating that user with PAM, 
> logging in as that user, su-ing to that user, etc.
> However, when we attempt to run a YARN application master, the localization 
> step fails when setting up the local cache directory for the AM.  The error 
> that comes out of the RM logs:
> 2015-04-17 12:47:09 INFO net.redpoint.yarnapp.Client[0]: monitorApplication: 
> ApplicationReport: appId=1, state=FAILED, progress=0.0, finalStatus=FAILED, 
> diagnostics='Application application_1429295486450_0001 failed 1 times due to 
> AM Container for appattempt_1429295486450_0001_000001 exited with  exitCode: 
> -1000 due to: Application application_1429295486450_0001 initialization 
> failed (exitCode=255) with output: main : command provided 0
> main : user is DOMAIN\hadoopuser
> main : requested yarn user is domain\hadoopuser
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Cannot create 
> directory: 
> /data/yarn/nm/usercache/domain%5Chadoopuser/appcache/application_1429295486450_0001/filecache/10
>                 at 
> org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:105)
>                 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.download(ContainerLocalizer.java:199)
>                 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:241)
>                 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
>                 at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:347)
> .Failing this attempt.. Failing the application.'
> However, when we look on the node launching the AM, we see this:
> [root@rpb-cdh-kerb-2 ~]# cd /data/yarn/nm/usercache
> [root@rpb-cdh-kerb-2 usercache]# ls -l
> drwxr-s--- 4 DOMAIN\hadoopuser yarn 4096 Apr 17 12:10 domain\hadoopuser
> There appears to be different treatment of the \ character in different 
> places.  Something creates the directory as "domain\hadoopuser" but something 
> else later attempts to use it as "domain%5Chadoopuser".  I’m not sure where 
> or why the URL escaping converts the \ to %5C or why this is not consistent.
> I should also mention, for the sake of completeness, our auth_to_local rule 
> is set up to map u...@domain.com to domain\user:
> RULE:[1:$1@$0](^.*@DOMAIN\.COM$)s/^(.*)@DOMAIN\.COM$/domain\\$1/g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
