[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718307#comment-16718307 ] Wei-Chiu Chuang commented on HADOOP-15593: -- FYI -- we hit this issue again after patching HADOOP-15593 (on Hadoop 3.0.x). Somehow the tgt was destroyed upon renewal. We were on JDK8u141, and the issue went away after updating to JDK8u181. So as a data point, stay away from 8u141. Seems to be a JDK bug. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Fix For: 3.2.0, 3.1.1 > > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16564259#comment-16564259 ] Wangda Tan commented on HADOOP-15593: - cherry-picked to branch-3.1.1 > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Fix For: 3.2.0, 3.1.1 > > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16562361#comment-16562361 ] Wangda Tan commented on HADOOP-15593: - Updated fixed version to 3.1.2 given this don't exist in branch-3.1.1. But I think it is critical to get it backported to branch-3.1.1, I'm going to do this in a couple of hours, please let me know if you think different. cc: [~gabor.bota], [~eyang], [~xiaochen] > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Fix For: 3.2.0, 3.1.2 > > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559045#comment-16559045 ] Hudson commented on HADOOP-15593: - SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #14650 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14650/]) HADOOP-15593. Fixed NPE in UGI spawnAutoRenewalThreadForUserCreds. (eyang: rev 77721f39e26b630352a1f4087524a3fbd21ff06e) * (edit) hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * (edit) hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Fix For: 3.2.0, 3.1.1 > > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559019#comment-16559019 ] Eric Yang commented on HADOOP-15593: [~xiaochen] Thank you for the review. +1 patch 5 looks good to me, will commit shortly. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16558998#comment-16558998 ] Xiao Chen commented on HADOOP-15593: Hi [~eyang], do you have any comments on patch 5, or are you interested in committing this if not? > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556850#comment-16556850 ] Xiao Chen commented on HADOOP-15593: +1 on patch 5 from me. Thanks all. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16556408#comment-16556408 ] Eric Yang commented on HADOOP-15593: [~xiaochen] Any concern with patch 005? > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch, HADOOP-15593.005.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1651#comment-1651 ] Gabor Bota commented on HADOOP-15593: - * I've added runRenewalLoop because of the unit testing. I just pass false if I don't want the loop to run. I don't use this flag for any other purpose. * I'll add a new patch with [~xiaochen]'s solution: {code:java} if (tgt.isDestroyed()) { //log and return; } try{ tgtEndTime = tgt.getEndTime().getTime(); } catch (NullPointerException npe) { // log and return; } {code} > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555783#comment-16555783 ] genericqa commented on HADOOP-15593: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 22s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 20s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 28m 36s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 51s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 33s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 49s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 10s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 58s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}126m 8s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HADOOP-15593 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12933039/HADOOP-15593.005.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 1fc811cf39c1 3.13.0-144-generic #193-Ubuntu SMP Thu Mar 15 17:03:53 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 5be9f4a | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14942/testReport/ | | Max. process+thread count | 1464 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14942/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL:
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555154#comment-16555154 ] Xiao Chen commented on HADOOP-15593: Thanks for the clarification [~eyang]. According to the JDK issue, I think we should treat null endTime the same as tgt destroyed here. {code} - * @return the expiration time for this ticket's validity period. + * @return the expiration time for this ticket's validity period, + * or {@code null} if destroyed. */ public final java.util.Date getEndTime() { -return (Date) endTime.clone(); +return (endTime == null) ? null : (Date) endTime.clone(); } {code} > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555140#comment-16555140 ] Eric Yang commented on HADOOP-15593: [~xiaochen] If tgt.getEndTime().getTime() captured the NullPointerException, tgtEndTime = now, and nextRefresh is always smaller than now, renewal thread will not try to renew once more, and thread stops earlier than expected. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16555116#comment-16555116 ] Xiao Chen commented on HADOOP-15593: If we did what's proposed in [my previous comment|https://issues.apache.org/jira/browse/HADOOP-15593?focusedCommentId=16554585=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16554585], the case when tgt is destroyed will be handled by the {{return}} statement. In the rare race that the tgt gets destroyed after the code has gone after those lines, the rest of the logic including the nextRefresh part you pointed out does not depend on tgt anymore (it only depends on the local var {{tgtEndTime}}). We should be fine just it retry one more time and return on the tgt null check next time it enters the while loop. So we don't need to change {{now > nextRefresh}} part of code. Did I miss anything? > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554903#comment-16554903 ] Eric Yang commented on HADOOP-15593: [~xiaochen] Good catch on the logic. I think your suggestion to check isDestroy to stop the renew thread make sense. Patch 004 is incomplete. If a tgt is destroyed, it can not be renewed. This will stop the renew thread: {code} if (now > nextRefresh) { LOG.error("TGT is expired. Aborting renew thread for {}.", getUserName()); return; } {code} This part of code needs to be removed for the renewal thread to retry. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554680#comment-16554680 ] Xiao Chen commented on HADOOP-15593: It makes sense for the best-effort retries in general, so the renewal thread doesn't abort prematurely due to intermittent failures. But could you clarify a little more? If a tgt is destroyed, how can it be renewed? Looks to me KDC outage would result in relogin failure and possibly getTGT() being null without an exception, after which the current code just does a null check on tgt and return without retries. IMO we should be consistent with it and just return. I don't feel strongly that having the last try as patch 4 is a big problem, but it's not clear under which scenario this could possibly succeed. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554659#comment-16554659 ] Eric Yang commented on HADOOP-15593: [~xiaochen] Renewal thread supposed to run until max_renewable_life has been reached. If ticket end time is expired or unknown, but max_renewable_life is not expired, we would want the renewal loop to run. Maybe there is KDC outages to cause endTime = null, but retry should be attempted. Hadoop doesn't seem to have logic to check max_renewable_life, therefore, it may keep trying to prevent service outage for now. We probably want to open another ticket for enhancement to respect max_renewable_life. It is safer to retry than having cluster goes down because Kerberos tickets can not be renewed. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554585#comment-16554585 ] Xiao Chen commented on HADOOP-15593: Thanks for revving [~gabor.bota] and [~eyang] for the prompt review. Patch looks pretty good. I have 2 comments on the latest patch - In case of either tgt is destroyed (either isDestroyed()==true or getEndTime NPEs), is there any value in retrying? How about we do something like: {code} if (tgt.isDestroyed()) { //log and return; } try{ tgtEndTime = tgt.getEndTime().getTime(); } catch (NullPointerException npe) { // log and return; } {code} {{runRenewalLoop}} var won't be necessary if we do this. Thoughts? - The {{renewalFailures}} and {{renewalFailuresTotal}} metrics need to call {{value()}} in order to be logged correctly. This comes from existing code, but good to fix since we're touching it. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554453#comment-16554453 ] Eric Yang commented on HADOOP-15593: [~gabor.bota] Thank you for the patch. In patch 003, you have a tgt null check before entering the while loop. NPE will be thrown only for endTime = new Date(null). Unless tgt isDestroyed or uninitialized, otherwise, tgt always have an end time. This is the reason that proposal can work. Patch 004 proposal can work equally well. Therefore, +1 from my point of view. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch, HADOOP-15593.004.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554186#comment-16554186 ] genericqa commented on HADOOP-15593: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 17s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 39s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 14s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 49s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 34s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 13s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 20s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 40s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}123m 31s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HADOOP-15593 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932870/HADOOP-15593.004.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7703aa7e4d45 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 8461278 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14935/testReport/ | | Max. process+thread count | 1483 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14935/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL:
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554020#comment-16554020 ] Gabor Bota commented on HADOOP-15593: - Hi [~eyang], The first approach you've proposed won't work, because tgt.getEndTime() will throw an NPE. My approach will be the following: {code:java} long tgtEndTime = now; if (!tgt.isDestroyed()) { // As described in HADOOP-15593 we need to handle the case when // tgt.getEndTime() throws NPE because of JDK issue JDK-8147772 // NPE is only possible if this issue is not fixed in the JDK // currently used try{ tgtEndTime = tgt.getEndTime().getTime(); } catch (NullPointerException npe) { LOG.warn("NPE thrown while getting KerberosTicket endTime. The " + "endTime will be set to Time.now()"); } } {code} This seems to me the most straightforward. Hi [~xiaochen], You are right, there's no need to check if {{tgt != null}}. The problem with the unit test for this is that KerberosTicket#getEndTime is final, so cannot be mocked to throw NPE without using powermock. The best I can do now is to test the check for the isDestroyed flag. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553838#comment-16553838 ] Xiao Chen commented on HADOOP-15593: I'm also looking at this particular code block: {code} try { Date endTime = tgt.getEndTime(); if (tgt != null && endTime != null && !tgt.isDestroyed()) { tgtEndTime = endTime.getTime(); } } catch (NullPointerException npe) { {code} - Do we really need the tgt==null check at all? What's the scenario that tgt can be null here? (If it's needed, the check should happen before {{getEndTime}} call, but it doesn't look possible to me that tgt can be null. - Suggest to make the NPE try-catch strictly around the line we're trying to workaround: tgt.getEndTime(); Then also add a pointer to the JDK issue JDK-8147772 in the comment, to save future people the time to search on this jira. Should also explain the fact that the NPE is only possible prior to the JDK fix. - We also need a unit test for this. This can be done by using a mocked tgt > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553145#comment-16553145 ] Eric Yang commented on HADOOP-15593: Two possible approaches to fix the findbug error: {code} Date endTime = tgt.getEndTime(); if (tgt != null && endTime != null && !tgt.isDestroyed()) { tgtEndTime = endTime.getTime(); } else { tgtEndTime = now; } {code} or {code} try { Date endTime = tgt.getEndTime(); tgtEndTime = endTime.getTime(); } catch (NullPointerException npe) { LOG.warn("NPE thrown while getting KerberosTicket endTime. The " + "endTime will be set to Time.now()"); tgtEndTime = now; } {code} Both will work equally well. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552947#comment-16552947 ] genericqa commented on HADOOP-15593: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 26m 40s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 58s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 13s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 32s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 27m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 27m 41s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 16s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 47s{color} | {color:red} hadoop-common-project/hadoop-common generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 42s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}125m 2s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-common-project/hadoop-common | | | Nullcheck of UserGroupInformation$AutoRenewalForUserCredsRunnable.tgt at line 928 of value previously dereferenced in org.apache.hadoop.security.UserGroupInformation$AutoRenewalForUserCredsRunnable.run() At UserGroupInformation.java:928 of value previously dereferenced in org.apache.hadoop.security.UserGroupInformation$AutoRenewalForUserCredsRunnable.run() At UserGroupInformation.java:[line 927] | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HADOOP-15593 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932693/HADOOP-15593.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 7d61fc953217 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / bbe2f62 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | findbugs |
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16552760#comment-16552760 ] Gabor Bota commented on HADOOP-15593: - [~xiaochen], [~eyang] thanks for your comments. Based on your advice, I've added the check for getEndTime and catching the NPE to the implementation. I've also added a comment for future reference why's there a try-catch there. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch, > HADOOP-15593.003.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551442#comment-16551442 ] Eric Yang commented on HADOOP-15593: What if we catch the null pointer and reset tgtEndTime to now? When tgtEndTime is undefined for any reasons, with tgtEndTime reset to now, should have no ill effect within the scope of getNextTgtRenewalTime. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551105#comment-16551105 ] Xiao Chen commented on HADOOP-15593: Thanks [~gabor.bota] for working on this and others for comments! It makes sense to me to improve existing behavior in HADOOP-15622. >From the JDK issue [~jojochuang] pointed to, it looks like there are some JDK >improvement as well (https://bugs.openjdk.java.net/browse/JDK-8147772). >{code:java} public final java.util.Date getEndTime() { -return (Date) endTime.clone(); +return (endTime == null) ? null : (Date) endTime.clone(); } {code} So we also need to handle getEndTime() == null in the UGI code. Otherwise we'll see real NPEs from UGI this time :) Stepping back, the original exception is from {{KerberosTicket}}, whose {{endTime}} is null judging from some [code grepping|https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/classes/javax/security/auth/kerberos/KerberosTicket.java#L482]. While I think Daryn's comment would make this better, there is no atomicity guarantee between the {{tgt.isDestroyed}} check and the {{tgt.getEndTime}} call and older versions of JDK could still result in UGI to fail the same way - but that's the best we can do here. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551094#comment-16551094 ] genericqa commented on HADOOP-15593: | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 29s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 30m 43s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 36m 31s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 29s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 13m 52s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 38s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 33m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 33m 5s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 25s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 12m 19s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 52s{color} | {color:green} hadoop-common in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}146m 52s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ba1ab08 | | JIRA Issue | HADOOP-15593 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932083/HADOOP-15593.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 800f0995ed9f 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / cbf2026 | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_171 | | findbugs | v3.1.0-RC1 | | Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/14923/testReport/ | | Max. process+thread count | 1359 (vs. ulimit of 1) | | modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14923/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL:
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550908#comment-16550908 ] Eric Yang commented on HADOOP-15593: [~gabor.bota] Thank you for creating the second issue for tracking. Jenkins Precommit build is triggered for patch 02. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550768#comment-16550768 ] Gabor Bota commented on HADOOP-15593: - I've created HADOOP-15622 to handle the issue with the nextRefresh. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550762#comment-16550762 ] Gabor Bota commented on HADOOP-15593: - Thanks [~eyang] for the review! There is no difference between the v1 and v2 solution about how this is handled. Please also note the following comment in the IOException cache, where getNextTgtRenewalTime is used and RetryPolicy is defined: {noformat} // Use a dummy maxRetries to create the policy. The policy will // only be used to get next retry time with exponential back-off. // The final retry time will be later limited within the // tgt endTime in getNextTgtRenewalTime. {noformat} I think a solution for this would be to move this to the try block, instead of creating the RetryPolicy in the catch block, so all renewal time would be based on the RetryPolicy. As this issue is a blocker one, so really need to be finished asap (also only aims to target the NPE), I will create another issue for changing the retry behavior. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549998#comment-16549998 ] Wangda Tan commented on HADOOP-15593: - [~eyang], sure, I'd like to include this in 3.1.1 if it won't take too long :). > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549994#comment-16549994 ] Eric Yang commented on HADOOP-15593: [~leftnoteasy] This issue could cause YarnRegistryDNS to stop working. It is best to have this issue included in 3.1.1 release. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Blocker > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549472#comment-16549472 ] Eric Yang commented on HADOOP-15593: [~gabor.bota] I know you are trying to retain existing behavior, but I think there are bugs in existing logic. The calculation of nextRefresh is based on: {code} nextRefresh = Math.max(getRefreshTime(tgt), now + kerberosMinSecondsBeforeRelogin); {code} Most of the time nextRefresh = getRefreshTime(tgt). If it is renewing exactly on refreshTime, and there are parallel operations using expired ticket. There is a time gap that some operations might not perform until the next tgt is obtained. Ideally, we want to keep service uninterrupted, therefore getNextTgtRenewalTime supposed to calculate the time a few minutes before Kerberos tgt expired to determine the nextRefresh time. It looks like we are not using getNextTgtRenewalTime method to calculate nextRefresh instead opt-in to use ticket expiration time as base line for nextRefresh. I think patch 2 approach can create time gap then strain on KDC server when ticket can not be renewed. It would be better to calculate nextRefresh based on getNextTgtRenewalTime. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16549167#comment-16549167 ] Gabor Bota commented on HADOOP-15593: - [~eyang], if you are commenting on my v1 patch, the {{tgtEndTime = tgt.getEndTime().getTime();}} is just for logging. If you are commenting on my v002 patch, I just made a refactor, so there should be no other modifications - I haven't modified how we handle {{nextRefreshTime}}. If you've found an issue with the refactor, or I've changed how the code should behave, please point out where I made the error. If you want to change how {{nextRefreshTime}} is handled, please create a separate ticket - this ticket is for fixing the NPE and testing the fix. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548383#comment-16548383 ] Eric Yang commented on HADOOP-15593: [~gabor.bota] Thank you for the patch. It might be safer to have nextRefreshTime to be earlier than actual end time to prevent disruption of service. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548139#comment-16548139 ] genericqa commented on HADOOP-15593: | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 6s{color} | {color:red} Docker failed to build yetus/hadoop:abb62dd. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | HADOOP-15593 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12932083/HADOOP-15593.002.patch | | Console output | https://builds.apache.org/job/PreCommit-HADOOP-Build/14905/console | | Powered by | Apache Yetus 0.8.0-SNAPSHOT http://yetus.apache.org | This message was automatically generated. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15593.001.patch, HADOOP-15593.002.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547762#comment-16547762 ] Gabor Bota commented on HADOOP-15593: - Submitted my first patch for the issue. If there's a unit test needed I can provide one, but the implementation UserGroupInformation#spawnAutoRenewalThreadForUserCreds() needs to be refactored. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > Attachments: HADOOP-15593.001.patch > > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541734#comment-16541734 ] Daryn Sharp commented on HADOOP-15593: -- The solution is probably call {{KerberosTicket#isDestroyed}} before checking the end time. However, this shouldn't cause issues for the RM unless there's another bug. The renewer is only used when a login uses a ticket cache. No server process should be using a ticket cache or you will be in for a rude surprise if something uses the RM unix user to kinit as a different user, or does a kdestroy. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15593) UserGroupInformation TGT renewer throws NPE
[ https://issues.apache.org/jira/browse/HADOOP-15593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16540646#comment-16540646 ] Wei-Chiu Chuang commented on HADOOP-15593: -- This is happening to a YARN RM too. Since it affects a critical service, I'm bumping the priority to critical. > UserGroupInformation TGT renewer throws NPE > --- > > Key: HADOOP-15593 > URL: https://issues.apache.org/jira/browse/HADOOP-15593 > Project: Hadoop Common > Issue Type: Bug > Components: security >Affects Versions: 3.0.0 >Reporter: Wei-Chiu Chuang >Assignee: Gabor Bota >Priority: Critical > > Found the following NPE thrown in UGI tgt renewer. The NPE was thrown within > an exception handler so the original exception was hidden, though it's likely > caused by expired tgt. > {noformat} > 18/07/02 10:30:57 ERROR util.SparkUncaughtExceptionHandler: Uncaught > exception in thread Thread[TGT Renewer for f...@example.com,5,main] > java.lang.NullPointerException > at > javax.security.auth.kerberos.KerberosTicket.getEndTime(KerberosTicket.java:482) > at > org.apache.hadoop.security.UserGroupInformation$1.run(UserGroupInformation.java:894) > at java.lang.Thread.run(Thread.java:748){noformat} > Suspect it's related to [https://bugs.openjdk.java.net/browse/JDK-8154889]. > The relevant code was added in HADOOP-13590. File this jira to handle the > exception better. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org