[
https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592737#comment-14592737
]
Zhijie Shen commented on YARN-3779:
-----------------------------------
Thanks for helping the issue, Vinod! It sounds the right cause of this issue. I
checked refreshJobRetentionSettings, which should have the same problem because
of accessing HDFS too.
I'm thinking it is more clear to fix the problem inside HSAdminServer. We still
need to cache the correct loginUGI. Then, inside HSAdminServer, once we
verified user's permission on a certain command, we use loginUGI to complete
the following process instead of the remote user. Thoughts?
> Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings
> in secure cluster
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-3779
> URL: https://issues.apache.org/jira/browse/YARN-3779
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Environment: mrV2, secure mode
> Reporter: Zhang Wei
> Assignee: Varun Saxena
> Priority: Critical
> Attachments: YARN-3779.01.patch, YARN-3779.02.patch,
> log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log
>
>
> {{GSSException}} is thrown everytime log aggregation deletion is attempted
> after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure
> cluster.
> The problem can be reproduced by following steps:
> 1. startup historyserver in secure cluster.
> 2. Log deletion happens as per expectation.
> 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh
> the configuration value.
> 4. All the subsequent attempts of log deletion fail with {{GSSException}}
> Following exception can be found in historyserver's log if log deletion is
> enabled.
> {noformat}
> 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this
> deletion attempt is being aborted | AggregatedLogDeletionService.java:127
> java.io.IOException: Failed on local exception: java.io.IOException:
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]; Host Details : local host is: "vm-31/9.91.12.31";
> destination host is: "vm-33":25000;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1414)
> at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy9.getListing(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
> at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy10.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
> at
> org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS
> initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)]
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
> at
> org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
> at
> org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1462)
> at org.apache.hadoop.ipc.Client.call(Client.java:1381)
> ... 21 more
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]
> at
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
> at
> org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:411)
> at
> org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:550)
> at
> org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:716)
> at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:712)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
> at
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:711)
> ... 24 more
> Caused by: GSSException: No valid credentials provided (Mechanism level:
> Failed to find any Kerberos tgt)
> at
> sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
> at
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
> at
> sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
> at
> sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
> at
> sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
> at
> sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
> at
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
> ... 33 more
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)