[ https://issues.apache.org/jira/browse/YARN-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287561#comment-15287561 ]

Jian He commented on YARN-5098:
-------------------------------

bq. the app needs to be submitted without an HDFS token so the RM will acquire 
and manage it directly on the app's behalf
Btw, this is not necessary. The RM will try to obtain a new token on the app's
behalf if the token is about to expire, regardless of whether the app provided
the token in the first place.
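
To illustrate the point, here is a minimal sketch of that decision as a pure
function: the RM replaces a token once it is within some window of expiry, and
whether the app originally supplied the token is never consulted. This is not
the actual RM code; the `TokenRefreshPolicy` class, the `TrackedToken` type and
the one-minute window are hypothetical, chosen only for illustration.

```java
// Hypothetical model of the "obtain a new token before expiry" decision.
// Not actual ResourceManager code. Times are epoch milliseconds.
final class TokenRefreshPolicy {

    // Hypothetical token record: appSupplied records who provided the token,
    // but the decision below deliberately ignores it.
    static final class TrackedToken {
        final String kind;
        final long expiryMs;        // absolute expiry time
        final boolean appSupplied;  // true if the app submitted the token itself

        TrackedToken(String kind, long expiryMs, boolean appSupplied) {
            this.kind = kind;
            this.expiryMs = expiryMs;
            this.appSupplied = appSupplied;
        }
    }

    // True when the token expires within renewWindowMs of nowMs.
    static boolean shouldObtainNewToken(TrackedToken token, long nowMs, long renewWindowMs) {
        return token.expiryMs - nowMs <= renewWindowMs;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        long window = 60_000L; // hypothetical one-minute threshold
        TrackedToken fromApp = new TrackedToken("HDFS_DELEGATION_TOKEN", now + 30_000L, true);
        TrackedToken fromRm = new TrackedToken("HDFS_DELEGATION_TOKEN", now + 30_000L, false);
        // Both print true: the origin of the token does not matter.
        System.out.println(shouldObtainNewToken(fromApp, now, window));
        System.out.println(shouldObtainNewToken(fromRm, now, window));
    }
}
```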

I debugged this. With YARN-2704, in the normal case the RM should obtain a new
token and distribute it to the NMs before the token expires. The problem here is
that the RM was shut down for a long time, during which the token expired. After
the RM restarted, it tried to recover the app and renew the token. Naturally the
renewal failed because the token had already expired, so log aggregation failed
when the app completed.
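
To make the timing concrete, here is a small sketch of why the renewal during
recovery fails. It assumes the RM is the only party renewing the token and uses
a 24-hour renewal interval (the HDFS default for
`dfs.namenode.delegation.token.renew-interval`). The class and method names are
hypothetical, and the real NameNode expiry check may differ at the boundary.

```java
import java.time.Duration;

// Hypothetical model of the outage timeline; not actual RM or NameNode code.
final class RmOutageTimeline {

    // Assumption: a 24-hour renewal interval, the HDFS default.
    static final Duration RENEW_INTERVAL = Duration.ofHours(24);

    // A token renewed at time T stays valid until T + RENEW_INTERVAL unless it
    // is renewed again. While the RM is down nobody renews it (assumption), so
    // the token is expired at restart once the total elapsed time passes the interval.
    static boolean expiredAtRestart(Duration sinceLastRenewalAtShutdown,
                                    Duration rmDowntime) {
        Duration elapsed = sinceLastRenewalAtShutdown.plus(rmDowntime);
        return elapsed.compareTo(RENEW_INTERVAL) > 0;
    }

    public static void main(String[] args) {
        // Shut down 20h after the last renewal, down for 2h: 22h elapsed,
        // so renewal during recovery still succeeds.
        System.out.println(expiredAtRestart(Duration.ofHours(20), Duration.ofHours(2))); // false
        // Same shutdown point, down for 6h: 26h elapsed, the token has expired
        // and renewal during recovery fails, as in this issue.
        System.out.println(expiredAtRestart(Duration.ofHours(20), Duration.ofHours(6))); // true
    }
}
```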

One solution I have in mind is to let the RM request a new token and distribute
it to the NMs if token renewal fails during app recovery. Right now the failure
is simply ignored and recovery continues.
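
A minimal sketch of the proposed fallback, for discussion only. `TokenService`
and `NodeNotifier` are hypothetical stand-ins for the NameNode and for the
RM-to-NM token distribution path; they are not YARN or HDFS APIs. Renewal is
tried first, and only if it throws does the RM obtain a fresh token and push it
out, instead of ignoring the failure as recovery does today.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed recovery behaviour; TokenService and
// NodeNotifier are illustrative stand-ins, not YARN or HDFS APIs.
final class RecoveryTokenFallback {

    interface TokenService {
        // Extends the token's lifetime and returns the new expiry time;
        // throws if the token has already expired.
        long renew(String token) throws Exception;

        // Issues a brand-new delegation token for the given user.
        String obtainNew(String user);
    }

    interface NodeNotifier {
        // Pushes the token to the NodeManagers that ran the app's containers.
        void distribute(String appId, String token);
    }

    // Returns the token the app should use after recovery. Instead of only
    // ignoring a renewal failure, fall back to a new token.
    static String recoverToken(String appId, String user, String token,
                               TokenService hdfs, NodeNotifier nodes) {
        try {
            hdfs.renew(token);
            return token; // still valid: keep using the original token
        } catch (Exception renewalFailure) {
            String fresh = hdfs.obtainNew(user);
            nodes.distribute(appId, fresh);
            return fresh;
        }
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        // Simulates the NameNode after the token expired during the RM outage.
        TokenService expiredHdfs = new TokenService() {
            public long renew(String token) throws Exception {
                throw new Exception("token (" + token + ") can't be found in cache");
            }
            public String obtainNew(String user) {
                return "new-token-for-" + user;
            }
        };
        NodeNotifier nodes = (appId, tok) -> delivered.add(appId + ":" + tok);

        String t = recoverToken("application_1463170334122_0002", "hrt_qa",
                "token 171", expiredHdfs, nodes);
        System.out.println(t);         // new-token-for-hrt_qa
        System.out.println(delivered); // [application_1463170334122_0002:new-token-for-hrt_qa]
    }
}
```

The sketch lets any failure from `obtainNew` propagate; an actual patch would
also need to decide how recovery should proceed if a new token cannot be
obtained either.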

> Yarn Application log Aggregation fails due to NM can not get correct HDFS 
> delegation token
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-5098
>                 URL: https://issues.apache.org/jira/browse/YARN-5098
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Yesha Vora
>
> Environment: HA cluster
> YARN application logs for a long-running application could not be gathered
> because the NodeManager failed to talk to HDFS with the error below.
> {code}
> 2016-05-16 18:18:28,533 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(555)) - Application just finished : application_1463170334122_0002
> 2016-05-16 18:18:28,545 WARN  ipc.Client (Client.java:run(705)) - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 171 for hrt_qa) can't be found in cache
>         at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
>         at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:583)
>         at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:398)
>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:752)
>         at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:748)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:747)
>         at org.apache.hadoop.ipc.Client$Connection.access$3100(Client.java:398)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1597)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1439)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1386)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:240)
>         at com.sun.proxy.$Proxy83.getServerDefaults(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:282)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy84.getServerDefaults(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1018)
>         at org.apache.hadoop.fs.Hdfs.getServerDefaults(Hdfs.java:156)
>         at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:550)
>         at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:687)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
