[ 
https://issues.apache.org/jira/browse/YARN-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karam Singh updated YARN-2426:
------------------------------

    Description: 
Encountered this issue while using YARN's new RM WS (web service) for 
application submission, on a single-node cluster, while submitting a 
Distributed Shell application.
For this we need to pass a custom script and the AppMaster jar along with a 
webhdfs token.

The application was failing because the ResourceManager failed to renew the 
token for the user (appOwner). So the RM rejected the application with the 
following exception trace in the RM log:
{code}
2014-08-19 03:12:54,733 WARN  security.DelegationTokenRenewer 
(DelegationTokenRenewer.java:handleDTRenewerAppSubmitEvent(661)) - Unable to 
add the application to the delegation token renewer.
java.io.IOException: Failed to renew token: Kind: WEBHDFS delegation, Service: 
<NNHOST>:<FSPORT>, Ident: (WEBHDFS delegation token 2222 for hrt_qa)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:394)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$5(DelegationTokenRenewer.java:357)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.handleDTRenewerAppSubmitEvent(DelegationTokenRenewer.java:657)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:638)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Unexpected HTTP response: code=-1 != 200, 
op=RENEWDELEGATIONTOKEN, message=null
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:331)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:90)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:598)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:448)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:477)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:473)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.renewDelegationToken(WebHdfsFileSystem.java:1318)
        at 
org.apache.hadoop.hdfs.web.TokenAspect$TokenManager.renew(TokenAspect.java:73)
        at org.apache.hadoop.security.token.Token.renew(Token.java:377)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:477)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:473)
        at 
org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:392)
        ... 6 more
Caused by: java.io.IOException: The error stream is null.
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:304)
        at 
org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:329)
        ... 24 more
2014-08-19 03:12:54,735 DEBUG event.AsyncDispatcher 
(AsyncDispatcher.java:dispatch(164)) - Dispatching the event 
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppRejectedEvent.EventType:
 APP_REJECTED
{code}

From the exception trace it is clear that the RM is trying to contact the 
NameNode on the FS port instead of the HTTP port and failing to renew the token.
It looks like this is because the WebHDFS delegation token carries the 
NameNode's IP and FS port as its service instead of the HTTP port, causing the 
RM to contact WebHDFS on the FS port and fail to renew the token.
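
To illustrate the suspected port mismatch, here is a minimal sketch (not the 
actual Hadoop code) of how a renew URL could be derived from the token's 
service field. The host name nn.example.com, the ports 8020 (FS) and 50070 
(HTTP), the class RenewUrlSketch and its methods are all assumptions made up 
for illustration; only the op=RENEWDELEGATIONTOKEN request comes from the 
trace above.

```java
import java.net.URI;

class RenewUrlSketch {
    // Builds a WebHDFS REST URL for renewing a delegation token.
    // "service" is the "host:port" string stored in the token (hypothetical helper).
    static URI renewUrl(String service, String encodedToken) {
        return URI.create("http://" + service
                + "/webhdfs/v1/?op=RENEWDELEGATIONTOKEN&token=" + encodedToken);
    }

    public static void main(String[] args) {
        String token = "TOKEN"; // placeholder, not a real token

        // Service set to the FS (RPC) port: the HTTP request goes to a port
        // that does not speak HTTP, so no valid response comes back.
        System.out.println(renewUrl("nn.example.com:8020", token));

        // Service set to the NameNode HTTP port: the request reaches WebHDFS.
        System.out.println(renewUrl("nn.example.com:50070", token));
    }
}
```

If the service in the token is <NNHOST>:<FSPORT>, any URL built this way will 
point at the FS port, which would match the code=-1 response seen in the log.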



  was:
Encountered this issue while using YARN's new RM WS (web service) for 
application submission, on a single-node cluster, while submitting a 
Distributed Shell application.
For this we need to pass a custom script and the AppMaster jar along with a 
webhdfs token to the NodeManager for localization.

The Distributed Shell application was failing because the NodeManager failed 
to localize the AppMaster jar.
The following is the NM log while localizing the AppMaster jar:
{code}
2014-08-18 01:53:52,434 INFO  authorize.ServiceAuthorizationManager 
(ServiceAuthorizationManager.java:authorize(114)) - Authorization successful 
for testing (auth:TOKEN) for protocol=interface 
org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-08-18 01:53:52,757 INFO  localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { 
webhdfs://<NAMENODEHOST>:<NAMENODEHTTPPORT>/user/<JARPATH>, 1408352019488, 
FILE, null }, Authentication required
2014-08-18 01:53:52,758 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
webhdfs://<NAMENODEHOST>:<NAMENODEHTTPPORT>/user/<JARPATH>(-><NM_LOCAL_DIR>/usercache/<APP_USER>/appcache/application_1408351986532_0001/filecache/10/DshellAppMaster.jar)
 transitioned from DOWNLOADING to FAILED
2014-08-18 01:53:52,758 INFO  container.Container 
(ContainerImpl.java:handle(999)) - Container 
container_1408351986532_0001_01_000001 transitioned from LOCALIZING to 
LOCALIZATION_FAILED
{code}  

This is similar to what we get when we try to access webhdfs in a secure 
(Kerberos) cluster without doing kinit.
Whereas running
curl -i -k -s 'http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH>?op=listStatus&delegation=<same webhdfs token used in app submission structure>'
works properly.
I also tried using 
http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/hadoopqa/<JAR_PATH> in 
the app submission object instead of the webhdfs:// URI format.
Then the NodeManager fails to localize because there is no FileSystem for the 
http scheme:
{code}
14-08-18 02:03:31,343 INFO  authorize.ServiceAuthorizationManager 
(ServiceAuthorizationManager.java:authorize(114)) - Authorization successful 
for testing (auth:TOKEN) for protocol=interface org.apache.
hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
2014-08-18 02:03:31,583 INFO  localizer.ResourceLocalizationService 
(ResourceLocalizationService.java:update(1011)) - DEBUG: FAILED { 
http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH> 
1408352576841, FILE, null }, No FileSystem for scheme: http
2014-08-18 02:03:31,583 INFO  localizer.LocalizedResource 
(LocalizedResource.java:handle(203)) - Resource 
http://<NAMENODEHOST>:<NAMENODEHTTPPORT>/webhdfs/v1/user/<JAR_PATH>(-><NM_LOCAL_DIR>/usercache/<APP_USER>/appcache/application_1408352544163_0002/filecache/11/DshellAppMaster.jar)
 transitioned from DOWNLOADING to FAILED
{code}
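
For reference, the two URL forms differ only in scheme and in the /webhdfs/v1 
REST prefix. The following is a rough sketch of mapping the http form back to 
the webhdfs:// form that the localizer accepts; the class WebHdfsUriSketch, 
its method and the example host, port and path are hypothetical and not part 
of Hadoop.

```java
import java.net.URI;

class WebHdfsUriSketch {
    // REST prefix used by WebHDFS http URLs.
    static final String REST_PREFIX = "/webhdfs/v1";

    // Converts http://host:port/webhdfs/v1/<path> to webhdfs://host:port/<path>.
    static URI toWebHdfsUri(URI restUrl) {
        String path = restUrl.getRawPath();
        if (path == null || !path.startsWith(REST_PREFIX)) {
            throw new IllegalArgumentException("Not a WebHDFS REST URL: " + restUrl);
        }
        String fsPath = path.substring(REST_PREFIX.length());
        if (fsPath.isEmpty()) {
            fsPath = "/"; // the REST prefix alone refers to the filesystem root
        }
        return URI.create("webhdfs://" + restUrl.getRawAuthority() + fsPath);
    }

    public static void main(String[] args) {
        // Example host, port and path are made up for illustration.
        URI rest = URI.create(
                "http://nn.example.com:50070/webhdfs/v1/user/hrt_qa/DshellAppMaster.jar");
        System.out.println(toWebHdfsUri(rest));
    }
}
```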

Now do kinit without providing the -c option for the KRB5 cache path, so the 
ticket goes to the default KRB5 cache in /tmp.
Again submit the same application object to the YARN WS, with webhdfs:// URI 
format paths and the webhdfs token.
This time the NM is able to download the jar and the custom shell script, and 
the application runs fine.
It looks like the following is happening:
webhdfs is trying to look for a krb ticket on the NM while localizing.
1. In the first case there was no krb ticket in the default cache, so the 
application failed while localizing the AppMaster jar.
2. In the second case kinit had already been done and a krb ticket was present 
in /tmp (default KRB5 cache), so the AppMaster got localized successfully.
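
The difference between the two cases can be sketched as a credential cache 
lookup. This is only an illustration of the usual convention (KRB5CCNAME if 
set, otherwise a per-user file under /tmp); the class CredCacheSketch, the 
/tmp/krb5cc_<uid> file name and the uid 1000 are assumptions, not taken from 
the NM code or from this cluster.

```java
import java.util.Map;

class CredCacheSketch {
    // Resolves the ticket cache path the way a Kerberos client commonly does:
    // use KRB5CCNAME when set, else fall back to the default cache in /tmp.
    static String resolveCachePath(Map<String, String> env, long uid) {
        String name = env.get("KRB5CCNAME");
        if (name == null || name.isEmpty()) {
            return "/tmp/krb5cc_" + uid; // assumed default location
        }
        // KRB5CCNAME may carry a "FILE:" type prefix; strip it to get the path.
        return name.startsWith("FILE:") ? name.substring("FILE:".length()) : name;
    }

    public static void main(String[] args) {
        // uid 1000 is a made-up example value.
        System.out.println(resolveCachePath(System.getenv(), 1000L));
    }
}
```

A process that does not inherit KRB5CCNAME would only ever look at the default 
path, which is consistent with localization succeeding only after a plain kinit.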

    Environment: 
Hadoop Kerberos (secure) cluster with LinuxContainerExecutor enabled
With SPNEGO on for YARN's new RM web services for application submission
So the webhdfs token was passed in the xml/json structure during application 
submission

  was:
Hadoop Kerberos (secure) cluster with LinuxContainerExecutor enabled
With SPNEGO on for YARN's new RM web services for application submission
While using kinit we are using -c (to specify the cache path).
Then while executing we set export KRB5CCNAME=<path provided with -c option>

There is no Kerberos ticket in the default KRB5 cache path, which is /tmp


        Summary: ResourceManager is not able to renew WebHDFS token when 
application is submitted by YARN WebService  (was: NodeManager is not able to 
use WebHDFS token properly to talk to WebHDFS while localizing)

> ResourceManager is not able to renew WebHDFS token when application is 
> submitted by YARN WebService
> --------------------------------------------------------------------------------------------
>
>                 Key: YARN-2426
>                 URL: https://issues.apache.org/jira/browse/YARN-2426
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager, resourcemanager, webapp
>    Affects Versions: 2.6.0
>         Environment: Hadoop Kerberos (secure) cluster with 
> LinuxContainerExecutor enabled
> With SPNEGO on for YARN's new RM web services for application submission
> So the webhdfs token was passed in the xml/json structure during application 
> submission
>            Reporter: Karam Singh
>            Assignee: Varun Vasudev



--
This message was sent by Atlassian JIRA
(v6.2#6252)
