[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594783#comment-14594783 ] Hadoop QA commented on YARN-3779:

(x) *{color:red}-1 overall{color}*

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 56s | Pre-patch trunk compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| {color:green}+1{color} | javac | 7m 46s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle | 0m 28s | There were no new checkstyle issues. |
| {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 55s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests | 5m 53s | Tests passed in hadoop-mapreduce-client-hs. |
| | | | 43m 25s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12740836/YARN-3779.03.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 055cd5a |
| hadoop-mapreduce-client-hs test log | https://builds.apache.org/job/PreCommit-YARN-Build/8301/artifact/patchprocess/testrun_hadoop-mapreduce-client-hs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/8301/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf908.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/8301/console |

This message was automatically generated.

> Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings
> in secure cluster
> --
>
> Key: YARN-3779
> URL: https://issues.apache.org/jira/browse/YARN-3779
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Environment: mrV2, secure mode
> Reporter: Zhang Wei
> Assignee: Varun Saxena
> Priority: Critical
> Attachments: YARN-3779.01.patch, YARN-3779.02.patch,
> YARN-3779.03.patch, log_aggr_deletion_on_refresh_error.log,
> log_aggr_deletion_on_refresh_fix.log
>
>
> {{GSSException}} is thrown every time log aggregation deletion is attempted
> after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure
> cluster.
> The problem can be reproduced by the following steps:
> 1. Start up the historyserver in a secure cluster.
> 2. Log deletion happens as expected.
> 3. Execute the {{mapred hsadmin -refreshLogRetentionSettings}} command to
> refresh the configuration value.
> 4. All subsequent attempts at log deletion fail with {{GSSException}}.
> The following exception can be found in the historyserver's log if log
> deletion is enabled.
> {noformat}
> 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this
> deletion attempt is being aborted | AggregatedLogDeletionService.java:127
> java.io.IOException: Failed on local exception: java.io.IOException:
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to find
> any Kerberos tgt)]; Host Details : local host is: "vm-31/9.91.12.31";
> destination host is: "vm-33":25000;
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
> at org.apache.hadoop.ipc.Client.call(Client.java:1414)
> at org.apache.hadoop.ipc.Client.call(Client.java:1363)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at com.sun.proxy.$Proxy9.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
> at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy10.getListing(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
> at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
> at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
> at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
> at java.util.TimerThread.mainLoop(Timer.java:555)
> at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS
> initiate failed [Caused by GSSException: No valid credentials provided
> (Mechanism level: Failed to find any Kerberos tgt)]
> at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
> at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
> at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
> at org.apache.hadoop.ipc.Client.getConnection(Client.jav
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594551#comment-14594551 ] Varun Saxena commented on YARN-3779: Added and submitted a patch that fixes both cases. This JIRA should move to MAPREDUCE, but I am not moving it because I am not sure Jenkins would then be able to post results for the submitted patch.
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594423#comment-14594423 ] Varun Saxena commented on YARN-3779: [~vinodkv], that's correct. Do you want me to raise another JIRA for that, or handle it as part of this one?
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593855#comment-14593855 ] Vinod Kumar Vavilapalli commented on YARN-3779: [~varun_saxena], I agree with Zhijie here. We may just be getting lucky for now in the refreshJobRetentionSettings case, depending on how we spawn threads. To future-proof ourselves, I think the right behaviour is to simply depend on loginUser in both cases.
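Vinod's suggestion, always running the refresh path as the login user, can be modeled in plain Java. This is an illustrative sketch, not Hadoop's code: the `InheritableThreadLocal` stands in for the JAAS Subject that Hadoop's `UserGroupInformation` carries across thread creation, and the principal names are made up.

```java
import java.util.function.Supplier;

public class LoginUserSketch {
    // Stand-in for the service's Kerberos login identity (hypothetical name).
    static final String LOGIN_USER = "jhs/host@EXAMPLE.COM";

    // Models Hadoop's "current user": a child thread inherits the identity of
    // whoever created it, which is how the remote admin's identity can leak
    // into a recreated deletion-timer thread.
    static final InheritableThreadLocal<String> CURRENT_USER =
        new InheritableThreadLocal<String>() {
            @Override protected String initialValue() { return LOGIN_USER; }
        };

    // Rough analogue of UserGroupInformation.getLoginUser().doAs(...): run the
    // action with the service identity, then restore the previous one.
    static <T> T doAsLoginUser(Supplier<T> action) {
        String previous = CURRENT_USER.get();
        CURRENT_USER.set(LOGIN_USER);
        try { return action.get(); } finally { CURRENT_USER.set(previous); }
    }

    // Identity observed by a freshly spawned worker thread (like the new
    // Timer thread created on refresh).
    static String observedByNewThread() {
        final String[] seen = new String[1];
        Thread t = new Thread(() -> seen[0] = CURRENT_USER.get());
        t.start();
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return seen[0];
    }

    // Buggy refresh: spawns the worker while still impersonating the caller.
    static String buggyRefresh(String remoteCaller) {
        CURRENT_USER.set(remoteCaller);
        try { return observedByNewThread(); }
        finally { CURRENT_USER.set(LOGIN_USER); }
    }

    // Fixed refresh: spawn the worker inside doAsLoginUser.
    static String fixedRefresh(String remoteCaller) {
        CURRENT_USER.set(remoteCaller);
        try { return doAsLoginUser(LoginUserSketch::observedByNewThread); }
        finally { CURRENT_USER.set(LOGIN_USER); }
    }
}
```

With the buggy refresh, the new worker thread observes the remote caller's identity; with the fixed refresh, it observes the login identity regardless of who triggered it.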
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593738#comment-14593738 ] Varun Saxena commented on YARN-3779: Will update the patch as per the suggestions tomorrow morning.
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593386#comment-14593386 ] Varun Saxena commented on YARN-3779: [~vinodkv], [~zjshen], I had checked {{refreshJobRetentionSettings}} too when this issue came up, and the issue did not happen there. The issue occurs for refreshLogRetentionSettings because a new thread is spawned (after the existing {{Timer}} is cancelled), and that thread creates a new DFS client to connect to the namenode. For refreshing job retention settings we use a {{ScheduledThreadPoolExecutor}} instead, so no new thread is spawned on refresh; we simply cancel the {{ScheduledFuture}}, and in that case the issue does not happen.
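The Timer-versus-executor distinction Varun describes can be observed directly in the JDK: cancelling a {{java.util.Timer}} and creating a new one produces a brand-new thread (created in whatever context the refresh call runs in), whereas a {{ScheduledThreadPoolExecutor}} keeps its worker thread across reschedules, so cancelling a {{ScheduledFuture}} and scheduling a new task never spawns a thread in the caller's context. A small standalone demonstration (class and method names are illustrative):

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RefreshThreadingDemo {

    // Run a task on a fresh Timer and report the thread that executed it.
    // A Timer-based refresh must recreate the Timer, and with it a new thread.
    static String runOnNewTimer() throws InterruptedException {
        final String[] name = new String[1];
        final CountDownLatch done = new CountDownLatch(1);
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            @Override public void run() {
                name[0] = Thread.currentThread().getName();
                done.countDown();
            }
        }, 0);
        done.await();
        timer.cancel();  // once cancelled, this Timer can never be reused
        return name[0];
    }

    // Run a task on the pool and report the executing thread; the pool and
    // its worker thread survive cancellation of individual ScheduledFutures.
    static String runOnPool(ScheduledThreadPoolExecutor pool) throws Exception {
        ScheduledFuture<String> f = pool.schedule(
            () -> Thread.currentThread().getName(), 0, TimeUnit.MILLISECONDS);
        return f.get();
    }

    // Two consecutive tasks on a one-thread pool: both run on the same
    // pre-existing worker thread.
    static String[] twoPoolRuns() throws Exception {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        try {
            return new String[] { runOnPool(pool), runOnPool(pool) };
        } finally {
            pool.shutdown();
        }
    }
}
```

Two calls to `runOnNewTimer()` execute on differently named Timer threads, while `twoPoolRuns()` reports the same pool thread twice.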
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592737#comment-14592737 ] Zhijie Shen commented on YARN-3779: Thanks for helping with the issue, Vinod! That sounds like the right cause. I checked refreshJobRetentionSettings, which should have the same problem because it accesses HDFS too. I think it is cleaner to fix the problem inside HSAdminServer. We still need to cache the correct loginUGI; then, inside HSAdminServer, once we have verified the user's permission for a given command, we use loginUGI instead of the remote user to complete the rest of the processing. Thoughts?
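A minimal sketch of the pattern Zhijie proposes: authorize the command as the remote caller, then execute it under the cached login identity. All names here ({{ADMIN_ACL}}, {{doAs}}, {{refreshLogRetention}}) are invented for illustration and are not Hadoop's actual HSAdminServer API:

```java
import java.util.Set;
import java.util.function.Function;

public class HsAdminSketch {
    // Hypothetical stand-ins for the admin ACL and the cached login identity.
    static final Set<String> ADMIN_ACL = Set.of("admin@EXAMPLE.COM");
    static final String LOGIN_UGI = "jhs/host@EXAMPLE.COM";

    // Placeholder for loginUGI.doAs(...): runs the action as the given user.
    static <T> T doAs(String user, Function<String, T> action) {
        return action.apply(user);
    }

    // Check the remote caller's permission first, then do the real work with
    // the service's own Kerberos-capable identity, so anything it spawns
    // (timers, DFS clients) can still authenticate to the namenode.
    static String refreshLogRetention(String remoteUser) {
        if (!ADMIN_ACL.contains(remoteUser)) {
            throw new SecurityException("User " + remoteUser
                + " is not authorized to refresh log retention settings");
        }
        return doAs(LOGIN_UGI, user -> "log deletion rescheduled as " + user);
    }
}
```

An authorized caller gets the refresh performed as the service identity; an unauthorized one is rejected before any privileged work happens.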
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14587165#comment-14587165 ]

Zhijie Shen commented on YARN-3779:

[~varun_saxena], do you know why the ugi is still the same, but Kerberos authentication fails?
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581927#comment-14581927 ]

Varun Saxena commented on YARN-3779:

By "updated" I mean attached.
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581926#comment-14581926 ]

Varun Saxena commented on YARN-3779:

[~xgong], also updated complete logs, one demonstrating the problem and the other demonstrating the fix (after the patch above has been applied). Moreover, this issue could also be fixed by using a {{ScheduledThreadPoolExecutor}} with one thread (which is generally recommended over {{Timer}} anyway), but as that fix wasn't directly related to the issue, I didn't submit it as the solution.
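The {{ScheduledThreadPoolExecutor}} alternative mentioned in the comment above can be sketched roughly as follows. This is a minimal JDK-only illustration, not the actual AggregatedLogDeletionService code; the class and method names are hypothetical. A single-threaded scheduled executor keeps one long-lived worker thread, and a refresh merely cancels the pending task and reschedules it instead of tearing down and recreating a {{Timer}}.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: run a recurring deletion task on a single worker
// thread that survives configuration refreshes.
public class LogDeletionScheduler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pending;

    // Start running the task at the given interval.
    public synchronized void start(Runnable deletionTask, long intervalMs) {
        pending = scheduler.scheduleAtFixedRate(
                deletionTask, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    // Refresh: cancel the pending schedule and re-submit with the new
    // interval. The worker thread itself is reused, so whatever context
    // it was created under is unchanged.
    public synchronized void refresh(Runnable deletionTask, long newIntervalMs) {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.scheduleAtFixedRate(
                deletionTask, 0, newIntervalMs, TimeUnit.MILLISECONDS);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}
```

By contrast, each {{java.util.Timer}} instance spawns its own thread, so stopping one timer and starting another during a refresh changes which thread (and thread-associated context) executes the deletion task.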
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581712#comment-14581712 ]

Varun Saxena commented on YARN-3779:

[~xgong], after applying the patch, the debug log on refreshing the log retention settings is as follows. I will update both success and error logs too, a little while later.

{noformat}
2015-06-11 14:49:56,973 DEBUG org.apache.hadoop.ipc.Server: Socket Reader #1 for port 10033: responding to null from 10.19.92.82:30295 Call#-33 Retry#-1 Wrote 22 bytes.
2015-06-11 14:49:56,981 DEBUG org.apache.hadoop.ipc.Server: got #-3
2015-06-11 14:49:57,014 DEBUG org.apache.hadoop.ipc.Server: Successfully authorized userInfo { effectiveUser: "hdfs/hua...@hadoop.com" } protocol: "org.apache.hadoop.mapreduce.v2.api.HSAdminRefreshProtocol"
2015-06-11 14:49:57,014 DEBUG org.apache.hadoop.ipc.Server: got #0
2015-06-11 14:49:57,015 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 0 on 10033: org.apache.hadoop.mapreduce.v2.api.HSAdminRefreshProtocol.refreshLogRetentionSettings from 10.19.92.82:30295 Call#0 Retry#0 for RpcKind RPC_PROTOCOL_BUFFER
2015-06-11 14:49:57,016 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:hdfs/hua...@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
2015-06-11 14:49:57,027 INFO org.apache.hadoop.mapreduce.v2.hs.server.HSAdminServer: HS Admin: refreshLogRetentionSettings invoked by user hdfs
2015-06-11 14:49:57,027 DEBUG org.apache.hadoop.ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@2dfaea86
2015-06-11 14:49:57,079 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:hdfs/hua...@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:136)
2015-06-11 14:49:57,079 DEBUG org.apache.hadoop.yarn.ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
2015-06-11 14:49:57,079 DEBUG org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
2015-06-11 14:49:57,080 DEBUG org.apache.hadoop.ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@2dfaea86
2015-06-11 14:49:57,081 INFO org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: aggregated log deletion started.
2015-06-11 14:49:57,081 INFO org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger: USER=hdfs IP=10.19.92.82 OPERATION=refreshLogRetentionSettings TARGET=HSAdminServer RESULT=SUCCESS
2015-06-11 14:49:57,081 DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:hdfs/hua...@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:83)
2015-06-11 14:49:57,081 DEBUG org.apache.hadoop.ipc.Server: Served: refreshLogRetentionSettings queueTime= 11 procesingTime= 55
2015-06-11 14:49:57,082 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 0 on 10033: responding to org.apache.hadoop.mapreduce.v2.api.HSAdminRefreshProtocol.refreshLogRetentionSettings from 10.19.92.82:30295 Call#0 Retry#0
2015-06-11 14:49:57,083 DEBUG org.apache.hadoop.ipc.Server: IPC Server handler 0 on 10033: responding to org.apache.hadoop.mapreduce.v2.api.HSAdminRefreshProtocol.refreshLogRetentionSettings from 10.19.92.82:30295 Call#0 Retry#0 Wrote 32 bytes.
2015-06-11 14:49:57,083 DEBUG org.apache.hadoop.ipc.Client: IPC Client (889891977) connection to /10.19.92.82:65110 from hdfs/hua...@hadoop.com sending #5
2015-06-11 14:49:57,084 DEBUG org.apache.hadoop.ipc.Client: IPC Client (889891977) connection to /10.19.92.82:65110 from hdfs/hua...@hadoop.com got value #5
2015-06-11 14:49:57,084 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: getListing took 1ms
2015-06-11 14:49:57,085 INFO org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: aggregated log deletion finished.
{noformat}
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581701#comment-14581701 ]

Varun Saxena commented on YARN-3779:

Sure. Will share DEBUG logs for that too.
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581430#comment-14581430 ]

Xuan Gong commented on YARN-3779:

[~varun_saxena] Thanks for the logs. Could you apply the patch and print the ugi?
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579194#comment-14579194 ]

Varun Saxena commented on YARN-3779:

Sorry, the correct sequence of error logs is as follows. After the first GSSException, the client, i.e. the history server, keeps on retrying before giving up.

{noformat}
2015-06-05 22:49:24,541 INFO Timer-3 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: aggregated log deletion started.
2015-06-05 22:49:24,541 INFO IPC Server handler 0 on 10033 org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger: USER=hdfs IP=10.19.92.82 OPERATION=refreshLogRetentionSettings TARGET=HSAdminServer RESULT=SUCCESS
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.use.legacy.blockreader.local = false
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.read.shortcircuit = false
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.domain.socket.data.traffic = false
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.domain.socket.path =
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://hacluster
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.use.legacy.blockreader.local = false
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.read.shortcircuit = false
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.domain.socket.data.traffic = false
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.domain.socket.path =
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.io.retry.RetryUtils: multipleLinearRandomRetry = null
2015-06-05 22:49:24,553 DEBUG Timer-3 org.apache.hadoop.ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@28194a50
2015-06-05 22:49:24,554 DEBUG Timer-3 org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil: DataTransferProtocol using SaslPropertiesResolver, configured QOP dfs.data.transfer.protection = authentication, configured class dfs.data.transfer.saslproperties.resolver.class = class org.apache.hadoop.security.SaslPropertiesResolver
2015-06-05 22:49:24,554 DEBUG Timer-3 org.apache.hadoop.ipc.Client: The ping interval is 6 ms.
2015-06-05 22:49:24,554 DEBUG Timer-3 org.apache.hadoop.ipc.Client: Connecting to /10.19.92.88:65110
2015-06-05 22:49:24,555 DEBUG Timer-3 org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:hdfs/hua...@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:749)
2015-06-05 22:49:24,557 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB info:@org.apache.hadoop.security.KerberosInfo(clientPrincipal=, serverPrincipal=dfs.namenode.kerberos.principal)
2015-06-05 22:49:24,557 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: getting serverKey: dfs.namenode.kerberos.principal conf value: hdfs/hua...@hadoop.com principal: hdfs/hua...@hadoop.com
2015-06-05 22:49:24,557 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: RPC Server's Kerberos principal name for protocol=org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB is hdfs/hua...@hadoop.com
2015-06-05 22:49:24,557 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: Creating SASL GSSAPI(KERBEROS) client to authenticate to service at huawei
2015-06-05 22:49:24,558 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: Use KERBEROS authentication for protocol ClientNamenodeProtocolPB
2015-06-05 22:49:24,559 DEBUG Timer-3 org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:hdfs/hua...@hadoop.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
2015-06-05 22:49:24,560 DEBUG Timer-3 org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:hdfs/hua...@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:668)
2015-06-05 22:49:24,561 WARN Timer-3 org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.Sasl
{noformat}
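As an aside on the {{Timer-3}} thread name appearing throughout the logs above: every {{java.util.Timer}} instance owns its own dedicated, uniquely named thread, so if a refresh cancels the old timer and creates a new one, subsequent deletion attempts run on a brand-new thread. The following JDK-only sketch (hypothetical class name, no Hadoop code) simply demonstrates that tasks submitted to two different Timer instances execute on different threads:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.SynchronousQueue;

public class TimerThreadDemo {
    // Run a one-shot task on the given timer and report which thread
    // executed it.
    public static String taskThreadName(Timer timer) throws InterruptedException {
        SynchronousQueue<String> result = new SynchronousQueue<>();
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                try {
                    result.put(Thread.currentThread().getName());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, 0);
        return result.take();
    }

    public static void main(String[] args) throws InterruptedException {
        Timer first = new Timer();
        String before = taskThreadName(first);
        first.cancel(); // old schedule torn down, its thread dies

        Timer second = new Timer(); // fresh timer => fresh thread
        String after = taskThreadName(second);
        second.cancel();

        // Names differ (e.g. "Timer-0" vs "Timer-1"): state associated
        // with the old thread does not carry over to the new one.
        System.out.println(before + " -> " + after);
    }
}
```

This is only an illustration of Timer's thread-per-instance behavior, not a claim about where exactly the credentials are lost in the history server.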
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579154#comment-14579154 ] Varun Saxena commented on YARN-3779:
[~zjshen], the GSSException was thrown while calling {{evaluateChallenge}} in SaslRpcClient.java. I printed the DEBUG logs when I tested this (at the history server side). It seems the correct UGI is picked up, but the error still occurs. Below are the logs from when the error occurs after the refresh of log retention settings.
{noformat}
2015-06-05 22:49:24,541 INFO IPC Server handler 0 on 10033 org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger: USER=hdfs IP=10.19.92.82 OPERATION=refreshLogRetentionSettings TARGET=HSAdminServer RESULT=SUCCESS
...
2015-06-05 22:50:04,541 INFO Timer-3 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService: aggregated log deletion started.
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.use.legacy.blockreader.local = false
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.read.shortcircuit = false
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.domain.socket.data.traffic = false
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.domain.socket.path =
2015-06-05 22:49:24,550 DEBUG Timer-3 org.apache.hadoop.hdfs.DFSClient: Sets dfs.client.block.write.replace-datanode-on-failure.replication to 0
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.HAUtil: No HA service delegation token found for logical URI hdfs://hacluster
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.use.legacy.blockreader.local = false
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.read.shortcircuit = false
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.client.domain.socket.data.traffic = false
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.hdfs.client.impl.DfsClientConf$ShortCircuitConf: dfs.domain.socket.path =
2015-06-05 22:49:24,552 DEBUG Timer-3 org.apache.hadoop.io.retry.RetryUtils: multipleLinearRandomRetry = null
2015-06-05 22:49:24,553 DEBUG Timer-3 org.apache.hadoop.ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@28194a50
2015-06-05 22:49:24,554 DEBUG Timer-3 org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil: DataTransferProtocol using SaslPropertiesResolver, configured QOP dfs.data.transfer.protection = authentication, configured class dfs.data.transfer.saslproperties.resolver.class = class org.apache.hadoop.security.SaslPropertiesResolver
2015-06-05 22:49:24,554 DEBUG Timer-3 org.apache.hadoop.ipc.Client: The ping interval is 6 ms.
2015-06-05 22:50:04,542 DEBUG Timer-3 org.apache.hadoop.ipc.Client: Connecting to host-10-19-92-88/10.19.92.88:65110
2015-06-05 22:50:04,543 DEBUG Timer-3 org.apache.hadoop.security.UserGroupInformation: PrivilegedAction as:hdfs/hua...@hadoop.com (auth:KERBEROS) from:org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:749)
2015-06-05 22:50:04,544 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: Get kerberos info proto:interface org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB info:@org.apache.hadoop.security.KerberosInfo(clientPrincipal=, serverPrincipal=dfs.namenode.kerberos.principal)
2015-06-05 22:50:04,545 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: getting serverKey: dfs.namenode.kerberos.principal conf value: hdfs/hua...@hadoop.com principal: hdfs/hua...@hadoop.com
2015-06-05 22:50:04,545 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: RPC Server's Kerberos principal name for protocol=org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolPB is hdfs/hua...@hadoop.com
2015-06-05 22:50:04,545 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: Creating SASL GSSAPI(KERBEROS) client to authenticate to service at huawei
2015-06-05 22:50:04,546 DEBUG Timer-3 org.apache.hadoop.security.SaslRpcClient: Use KERBEROS authentication for protocol ClientNamenodeProtocolPB
2015-06-05 22:50:04,547 DEBUG Timer-3 org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:hdfs/hua...@hadoop.com (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
	at com.sun.security.sasl.gsskerb.Gs
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577709#comment-14577709 ] Zhijie Shen commented on YARN-3779:
---
No, I didn't simulate the problem; I just had a quick glance at the code. The log retention refresh reschedules the deletion task, but this is done inside the RPC call, as the request user. So I'm now wondering whether this changes the UGI of the subsequent deletion tasks. Can you try to print the UGI? Then we can see what has changed.
> Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings
> in secure cluster
> --
>
> Key: YARN-3779
> URL: https://issues.apache.org/jira/browse/YARN-3779
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Environment: mrV2, secure mode
> Reporter: Zhang Wei
> Assignee: Varun Saxena
> Priority: Critical
> Attachments: YARN-3779.01.patch, YARN-3779.02.patch
>
>
> {{GSSException}} is thrown everytime log aggregation deletion is attempted
> after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure
> cluster.
> The problem can be reproduced by following steps:
> 1. startup historyserver in secure cluster.
> 2. Log deletion happens as per expectation.
> 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh
> the configuration value.
> 4. All the subsequent attempts of log deletion fail with {{GSSException}}
> Following exception can be found in historyserver's log if log deletion is
> enabled.
> {noformat}
> 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this deletion attempt is being aborted | AggregatedLogDeletionService.java:127
> java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "vm-31/9.91.12.31"; destination host is: "vm-33":25000;
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1414)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>         at com.sun.proxy.$Proxy9.getListing(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
>         at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>         at com.sun.proxy.$Proxy10.getListing(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
>         at org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
>         at java.util.TimerThread.mainLoop(Timer.java:555)
>         at java.util.TimerThread.run(Timer.java:505)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>         at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
>         at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
>         at org.apache.hadoop.ipc.Client$Connecti
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577570#comment-14577570 ] Varun Saxena commented on YARN-3779:
[~zjshen], thanks for looking at this. It's the same user that is used both for starting the history server and for executing the refresh command. The Timer creates a new thread on refresh, and from then on the problem occurs. There is no problem if I use a ScheduledThreadPoolExecutor (with 1 thread) instead, as that doesn't spawn a new thread. So it seems the new thread doesn't take the correct UGI. Are you able to simulate the issue? I hope there is no issue with the way Kerberos has been set up in my cluster.
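The Timer-vs-ScheduledThreadPoolExecutor observation can be reproduced without Hadoop. A Java thread captures inheritable context from the thread that creates it, not from the task it later runs, so a brand-new Timer built inside an RPC handler "remembers" the handler's identity. The sketch below is a minimal stand-in, using an InheritableThreadLocal in place of the JAAS/UGI login context; all names here are illustrative, not from YARN or the patch:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;

public class TimerUgiDemo {
    // Stand-in for the login context: like the Subject backing a UGI, it is
    // snapshotted by a worker thread when the thread is CREATED, not when a
    // task is later scheduled on it.
    static final InheritableThreadLocal<String> UGI = new InheritableThreadLocal<>();

    // Runs a one-shot task on the timer and reports the "identity" the
    // timer's worker thread observes.
    static String observedBy(Timer timer) throws InterruptedException {
        final String[] seen = new String[1];
        final CountDownLatch done = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override public void run() { seen[0] = UGI.get(); done.countDown(); }
        }, 0L);
        done.await();
        return seen[0];
    }

    static String[] demo() throws InterruptedException {
        UGI.set("historyserver-login");        // identity at service startup
        Timer original = new Timer(true);      // worker thread created now, inherits it

        UGI.set("rpc-caller");                 // identity inside the refresh RPC handler
        Timer rescheduled = new Timer(true);   // refresh builds a brand-new Timer

        String[] out = { observedBy(original), observedBy(rescheduled) };
        original.cancel();
        rescheduled.cancel();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = demo();
        // The rebuilt timer runs with the refresh caller's identity,
        // mirroring the scenario described in this issue.
        System.out.println(seen[0] + " / " + seen[1]);
    }
}
```

A ScheduledThreadPoolExecutor whose worker thread already exists avoids this, because rescheduling reuses the existing thread instead of creating a new one from the RPC handler's context.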
[jira] [Commented] (YARN-3779) Aggregated Logs Deletion doesnt work after refreshing Log Retention Settings in secure cluster
[ https://issues.apache.org/jira/browse/YARN-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577527#comment-14577527 ] Zhijie Shen commented on YARN-3779:
---
So the problem is that after refreshing, the deletion task is scheduled and executed with the UGI of whoever executed the refresh command, right?
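If the rebuilt timer thread does inherit the refresh caller's identity, one remedy is to perform the rescheduling under the service's own login identity (in Hadoop terms, something like UserGroupInformation.getLoginUser().doAs(...)); whether the committed patch takes exactly this route is not shown in this thread. The sketch below models the thread-inherited login context with an InheritableThreadLocal; all names are illustrative:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;

public class RefreshUnderLoginDemo {
    // Stand-in for the thread-inherited login context (cf. the JAAS Subject a UGI wraps).
    static final InheritableThreadLocal<String> UGI = new InheritableThreadLocal<>();

    // Re-creates the deletion timer with the login identity temporarily
    // (re)installed, so the new worker thread inherits it regardless of
    // which user triggered the refresh RPC.
    static Timer rescheduleAsLoginUser(String loginUser) {
        String caller = UGI.get();        // identity of the refresh RPC handler
        UGI.set(loginUser);               // analogous to getLoginUser().doAs(...)
        Timer timer = new Timer(true);    // worker thread snapshots the login identity
        UGI.set(caller);                  // restore the caller's identity afterwards
        return timer;
    }

    // Runs a one-shot task and reports the identity seen on the worker thread.
    static String observedBy(Timer timer) throws InterruptedException {
        final String[] seen = new String[1];
        final CountDownLatch done = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override public void run() { seen[0] = UGI.get(); done.countDown(); }
        }, 0L);
        done.await();
        return seen[0];
    }

    static String demo() throws InterruptedException {
        UGI.set("rpc-caller");            // we are inside the refresh handler
        Timer timer = rescheduleAsLoginUser("historyserver-login");
        String seen = observedBy(timer);  // task still runs as the login user
        timer.cancel();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());
    }
}
```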