[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17657070#comment-17657070 ]

Surendra Singh Lilhore commented on HADOOP-18581:
-

{quote}Are you planning to backport to other branches also..?{quote}
Yes [~brahmareddy], I will backport this.

{quote}Any insights on when this can happen.{quote}
Yes, this issue has happened in many production clusters. It mostly happens when one KDC is running a backup and is not available for login requests. The client tries to re-login, but the login fails because the client cannot fail over to the other available KDC server. The failover fails because the first server returns the wrong error code.

{quote}I checked your testcase which logout on other thread, but will be case..?{quote}
Yes, this is very common in the NameNode and JournalNode case. The QJM in the NameNode is a client of the JournalNode, so it re-logs in as a client. When this re-login fails, it also impacts the NameNode, because [UGI#unprotectedRelogin()|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1361] destroys the NameNode ticket.

> Handle Server KDC re-login when Server and Client run in same JVM.
> --
>
> Key: HADOOP-18581
> URL: https://issues.apache.org/jira/browse/HADOOP-18581
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.1
> Reporter: Surendra Singh Lilhore
> Assignee: Surendra Singh Lilhore
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Handle re-login in the Server when the client and server run in the same JVM and the client's re-login attempt fails.
> For example, the NameNode is a server, but the JournalNode client also runs in the same JVM to push edit logs. When the JN client tries to re-login and fails, it also destroys the server's service ticket, and the NameNode can no longer serve client requests. The following errors appear in the NameNode log file:
>
> {noformat}
> Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: (GSS initiate failed)
> Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: (GSS initiate failed)
> Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: (GSS initiate failed)
> {noformat}
> The same issue was discussed in HADOOP-17996.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
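Surendra describes a failed client re-login that destroys the ticket the NameNode's server side depends on. That mechanism can be illustrated with a deliberately simplified model. This is a minimal sketch, not Hadoop's actual `UserGroupInformation` internals. It assumes that the server role and the JournalNode (QJM) client role share one login identity in the same JVM, and that a re-login drops the existing tickets before it attempts a new login. `SharedLoginModel` and `LoginUser` are hypothetical names; `isLoginSuccess()` only mirrors the method name used in the patch under review.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not Hadoop code): one JVM-wide login
 * identity is shared by a server role and a client role.
 */
public class SharedLoginModel {

  /** Hypothetical stand-in for the shared login user and its tickets. */
  static final class LoginUser {
    private final List<String> tickets = new ArrayList<>();
    private boolean loginSuccess;

    void login(boolean kdcAvailable) {
      if (!kdcAvailable) {
        loginSuccess = false;
        throw new IllegalStateException("KDC unavailable");
      }
      tickets.add("TGT");
      loginSuccess = true;
    }

    /** Models a re-login that logs out (dropping tickets) before logging in. */
    void relogin(boolean kdcAvailable) {
      tickets.clear();          // logout: every user of this identity loses the ticket
      login(kdcAvailable);      // if this throws, nothing is restored
    }

    boolean hasTicket() { return !tickets.isEmpty(); }

    boolean isLoginSuccess() { return loginSuccess; }
  }

  public static void main(String[] args) {
    LoginUser shared = new LoginUser();
    shared.login(true);                 // the server starts with a valid ticket

    try {
      shared.relogin(false);            // the client's re-login hits an unavailable KDC
    } catch (IllegalStateException e) {
      System.out.println("client re-login failed: " + e.getMessage());
    }

    // The server side of the same JVM now has no ticket, so incoming
    // GSSAPI handshakes would fail.
    System.out.println("server has ticket: " + shared.hasTicket());
    System.out.println("login success: " + shared.isLoginSuccess());
  }
}
```

In this model the server never called `relogin` itself, yet it ends up without credentials. That is why the patch adds a recovery path on the server side.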
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655832#comment-17655832 ]

Brahma Reddy Battula commented on HADOOP-18581:
---

[~surendralilhore] thanks for reporting and working on this.

Are you planning to backport this to other branches as well?

{quote}When JN client try to re-login and it fails, it will destroy server service ticket also and NameNode not able to server client request. We can see the below error logs in NameNode log file. {quote}
Any insights on when this can happen?

I checked your test case, which logs out on another thread, but is that a real-world case?

Are you using the Kerberos 1.15 client or the Kerberos 1.13 client?
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655821#comment-17655821 ]

ASF GitHub Bot commented on HADOOP-18581:
-

surendralilhore merged PR #5248:
URL: https://github.com/apache/hadoop/pull/5248
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655820#comment-17655820 ]

ASF GitHub Bot commented on HADOOP-18581:
-

surendralilhore commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1374897142

Thanks @cnauroth, @liuml07, @ayushtkn and @steveloughran for the review.
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17655601#comment-17655601 ]

ASF GitHub Bot commented on HADOOP-18581:
-

hadoop-yetus commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1374252688

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 39s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 16s | | trunk passed |
| +1 :green_heart: | compile | 23m 14s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 20m 26s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 13s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 41s | | trunk passed |
| -1 :x: | javadoc | 1m 15s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/6/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 51s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 40s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 23s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 59s | | the patch passed |
| +1 :green_heart: | compile | 22m 24s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 22m 24s | | the patch passed |
| +1 :green_heart: | compile | 20m 25s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 20m 25s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 39s | | the patch passed |
| -1 :x: | javadoc | 1m 6s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/6/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 38s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 8s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 19s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 0s | | The patch does not generate ASF License warnings. |
| | | 205m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/6/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5248 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 9f1d5a9caf11 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 3f0538ef948fdca63943cefd43a224b6be61f875 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions |
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654651#comment-17654651 ]

ASF GitHub Bot commented on HADOOP-18581:
-

cnauroth commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1061900219

##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -153,6 +154,13 @@ public abstract class Server {
   private ExceptionsHandler exceptionsHandler = new ExceptionsHandler();
   private Tracer tracer;
   private AlignmentContext alignmentContext;
+
+  /**
+   * Allow server to do force Kerberos re-login once after failure irrespective
+   * of the last login time.
+   */
+  private AtomicBoolean canTryForceLogin = new AtomicBoolean(true);

Review Comment:
Please mark this as `final`.

##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -3322,6 +3346,27 @@ protected Server(String bindAddress, int port,
         metricsUpdaterInterval, metricsUpdaterInterval, TimeUnit.MILLISECONDS);
   }
+  private synchronized void doKerberosRelogin() throws IOException {
+    if(UserGroupInformation.getLoginUser().isLoginSuccess()){
+      return;
+    }
+    LOG.warn("Initiating re-login from IPC Server");
+    if (canTryForceLogin.get()) {

Review Comment:
I suggest shortening this a little to:

```
if (canTryForceLogin.compareAndSet(true, false)) {
  if (UserGroupInformation.isLoginKeytabBased()) {
    ...
```
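cnauroth's `compareAndSet(true, false)` suggestion folds the check and the clear of the one-shot flag into a single atomic step. With it, two handler threads cannot both observe `true` and both force a re-login, as they could with a separate `get()` followed by `set(false)`.

Below is a minimal sketch of that guard, outside Hadoop and with hypothetical names (`ForceReloginGuard`, `tryForceRelogin`, `reset`). The `reset()` hook that re-arms the flag is an assumption; the quoted diff does not show how the flag becomes `true` again.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch (not Hadoop code) of a one-shot "force re-login"
 * guard built on AtomicBoolean.compareAndSet.
 */
public class ForceReloginGuard {
  // final, as requested in review: the reference never changes, only its value.
  private final AtomicBoolean canTryForceLogin = new AtomicBoolean(true);

  /**
   * Runs the forced re-login only if the flag is still true, atomically
   * flipping it to false so that later callers skip the forced path.
   *
   * @return true if this call performed the forced re-login
   */
  public boolean tryForceRelogin(Runnable forcedRelogin) {
    if (canTryForceLogin.compareAndSet(true, false)) {
      forcedRelogin.run();
      return true;
    }
    return false;
  }

  /** Assumed hook: re-arms the guard, e.g. after a later successful authentication. */
  public void reset() {
    canTryForceLogin.set(true);
  }

  public static void main(String[] args) {
    ForceReloginGuard guard = new ForceReloginGuard();
    int[] calls = {0};
    Runnable relogin = () -> calls[0]++;   // stands in for the real re-login

    System.out.println(guard.tryForceRelogin(relogin)); // true: first failure forces re-login
    System.out.println(guard.tryForceRelogin(relogin)); // false: already used
    guard.reset();
    System.out.println(guard.tryForceRelogin(relogin)); // true: re-armed
    System.out.println("forced re-logins: " + calls[0]); // forced re-logins: 2
  }
}
```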
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654464#comment-17654464 ]

ASF GitHub Bot commented on HADOOP-18581:
-

hadoop-yetus commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1370905767

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 34s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 17s | | trunk passed |
| +1 :green_heart: | compile | 23m 4s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 20m 18s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 15s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 41s | | trunk passed |
| -1 :x: | javadoc | 1m 15s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/5/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 49s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 42s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 58s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 58s | | the patch passed |
| +1 :green_heart: | compile | 22m 24s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 22m 24s | | the patch passed |
| +1 :green_heart: | compile | 20m 49s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 20m 49s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 4s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 40s | | the patch passed |
| -1 :x: | javadoc | 1m 3s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/5/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 48s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 46s | | the patch passed |
| +1 :green_heart: | shadedclient | 23m 4s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 29s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 0m 56s | | The patch does not generate ASF License warnings. |
| | | 206m 0s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/5/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5248 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 17d088e9be4d 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f4a755b08bb7bcce7b6044f55e9db9be53dcd829 |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions |
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654379#comment-17654379 ]

ASF GitHub Bot commented on HADOOP-18581:
-

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1061327469

##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
           AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-          throw tce;
+          if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+            LOG.info("Initiating re-login from IPC Server");
+            if (UserGroupInformation.isLoginKeytabBased()) {
+              UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
Thanks @cnauroth. @liuml07, agreed on adding a new force API for re-login.
> > {noformat} > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed){noformat} > Same discussion happened in HADOOP-17996. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654378#comment-17654378 ]

ASF GitHub Bot commented on HADOOP-18581:
-

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1061327469

##
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
           AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-          throw tce;
+          if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+            LOG.info("Initiating re-login from IPC Server");
+            if (UserGroupInformation.isLoginKeytabBased()) {
+              UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
Thanks @cnauroth. @liuml07, I agree with a new force API for re-login.
> > {noformat} > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed){noformat} > Same discussion happened in HADOOP-17996. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654376#comment-17654376 ]

ASF GitHub Bot commented on HADOOP-18581:
-

surendralilhore commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1370726197

Thanks @liuml07 for the review.

> 1. Is it possible to figure out some unit tests (not necessarily NN+JN case) for Server and/or UGI? Even the current code change is straightforward, it may be broken by mistake or misunderstanding in future.

Adding a UT for this scenario is difficult: it is hard to pass a SASL message to the server without a proper channel and make it fail.

> 2. Do we need `Server#canTryForceLogin` to be thread-safe for multiple connections?

Changed `Server#canTryForceLogin` to an `AtomicBoolean`.

> 3. Is it clear to extract the new code in `Server` to a private helper method?

The re-login logic is now extracted into a new method, which is `synchronized` to avoid multiple re-logins in concurrent scenarios.
> > {noformat} > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed){noformat} > Same discussion happened in HADOOP-17996. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17654213#comment-17654213 ]

ASF GitHub Bot commented on HADOOP-18581:
-

liuml07 commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1370342353

It has been a while since I last checked the security code, and my understanding has faded, since security is complex and risky. Consider my comments non-binding here.

Forcing a re-login makes sense as a way to address the issue.

1. Is it possible to write some unit tests (not necessarily for the NN+JN case) for Server and/or UGI? Even though the current code change is straightforward, it may be broken by mistake or misunderstanding in the future.
2. Does `Server#canTryForceLogin` need to be thread-safe for multiple connections?
3. Would it be clearer to extract the new code in `Server` into a private helper method?
> > {noformat} > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed) > Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: > (GSS initiate failed){noformat} > Same discussion happened in HADOOP-17996. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653633#comment-17653633 ]

ASF GitHub Bot commented on HADOOP-18581:
-

hadoop-yetus commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1369031680

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 36s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 23s | | trunk passed |
| +1 :green_heart: | compile | 23m 7s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 20m 35s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 13s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 41s | | trunk passed |
| -1 :x: | javadoc | 1m 15s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/4/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 51s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 44s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 15s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 59s | | the patch passed |
| +1 :green_heart: | compile | 22m 28s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 22m 28s | | the patch passed |
| +1 :green_heart: | compile | 20m 23s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 20m 23s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 8s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 38s | | the patch passed |
| -1 :x: | javadoc | 1m 7s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/4/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 50s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 45s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 14s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 21s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | 205m 52s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5248 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 4f38a72f4e33 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / d2412f13af91a2fe9f91f2890f280223f235c2bc |
| Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| Multi-JDK versions |
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653576#comment-17653576 ]

ASF GitHub Bot commented on HADOOP-18581:

hadoop-yetus commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1368849593

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 33s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 34s | | trunk passed |
| +1 :green_heart: | compile | 25m 27s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 20m 56s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 15s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 42s | | trunk passed |
| -1 :x: | javadoc | 1m 16s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/3/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 50s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 41s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 28s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 58s | | the patch passed |
| +1 :green_heart: | compile | 22m 23s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 22m 23s | | the patch passed |
| +1 :green_heart: | compile | 20m 22s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 20m 22s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/3/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 :green_heart: | checkstyle | 1m 8s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 37s | | the patch passed |
| -1 :x: | javadoc | 1m 7s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/3/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 39s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 21s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 18m 20s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 1s | | The patch does not generate ASF License warnings. |
| | | 208m 53s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5248 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 566e48aa644e 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk /
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652556#comment-17652556 ]

ASF GitHub Bot commented on HADOOP-18581:

cnauroth commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1058507029

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java:
##
@@ -1287,14 +1314,18 @@ private void reloginFromKeytab(boolean checkTGT, boolean ignoreLastLoginTime)
   @InterfaceAudience.Public
   @InterfaceStability.Evolving
   public void reloginFromTicketCache() throws IOException {
-    if (!shouldRelogin() || !isFromTicket()) {
+    reloginFromTicketCache(false);
+  }
+
+  private void reloginFromTicketCache(boolean ignoreLastLoginTime) throws IOException {
+     if (!shouldRelogin() || !isFromTicket()) {

Review Comment:
There is a Checkstyle warning here about indentation:
```
./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java:1321: if (!shouldRelogin() || !isFromTicket()) {: 'if' has incorrect indentation level 5, expected level should be 4. [Indentation]
```

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java:
##
@@ -529,6 +529,16 @@ private void setLogin(LoginContext login) {
     user.setLogin(login);
   }
 
+  /** This method checks for a successful Kerberos login

Review Comment:
This is generating a new JavaDoc warning:
```
[ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-5248/ubuntu-focal/src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java:535: warning: no @return
[ERROR] public boolean isLoginSuccess() {
[ERROR] ^
```
Additionally, I suggest sticking to the existing style of line break right after the opening and an asterisk on each line:
```
/**
 * line 1
 * line 2
 *
 * @return foo
 */
```
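Here is a minimal sketch of the Javadoc layout requested in the review. It includes the `@return` tag whose absence triggered the warning. `LoginStatus` and `recordLoginResult` are hypothetical names used only for illustration, and the initial `true` value is an assumption. Only the `isLoginSuccess()` signature comes from the diff; the patch's actual field and method bodies are not shown in this thread. The sketch uses Hadoop's 2-space indentation.

```java
/**
 * Hypothetical holder for a "last Kerberos login succeeded" flag, written in
 * the comment style suggested in the review.
 */
final class LoginStatus {
  // Assumption: the process starts out with a successful login.
  private volatile boolean loginSuccess = true;

  /**
   * Records the outcome of the most recent login or re-login attempt.
   *
   * @param success true if the attempt obtained valid Kerberos credentials
   */
  public void recordLoginResult(boolean success) {
    this.loginSuccess = success;
  }

  /**
   * Checks for a successful Kerberos login.
   *
   * @return true if the most recent login or re-login attempt succeeded
   */
  public boolean isLoginSuccess() {
    return loginSuccess;
  }
}
```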
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652543#comment-17652543 ]

ASF GitHub Bot commented on HADOOP-18581:

hadoop-yetus commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1366808788

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 46s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | | _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 36s | | trunk passed |
| +1 :green_heart: | compile | 23m 15s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 22m 30s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 15s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 52s | | trunk passed |
| -1 :x: | javadoc | 1m 18s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/2/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 44s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 3m 18s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 32s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 0s | | the patch passed |
| +1 :green_heart: | compile | 22m 20s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 22m 20s | | the patch passed |
| +1 :green_heart: | compile | 20m 29s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 20m 29s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/2/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 1 new + 223 unchanged - 0 fixed = 224 total (was 223) |
| +1 :green_heart: | mvnsite | 1m 40s | | the patch passed |
| -1 :x: | javadoc | 1m 9s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/2/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 41s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 9s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 18m 18s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 3s | | The patch does not generate ASF License warnings. |
| | | 209m 36s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5248 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 8a804bdf1d5c 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652507#comment-17652507 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1058378415

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
             AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
                 + attemptingUser + " (" + e.getLocalizedMessage()
                 + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-            throw tce;
+            if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+              LOG.info("Initiating re-login from IPC Server");

Review Comment:
Changed it to Warn.
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652506#comment-17652506 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1058376241

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
             AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
                 + attemptingUser + " (" + e.getLocalizedMessage()
                 + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-            throw tce;
+            if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+              LOG.info("Initiating re-login from IPC Server");
+              if (UserGroupInformation.isLoginKeytabBased()) {
+                UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
Thanks @cnauroth. I added a new API, `forceReloginFromTicketCache()`, and `Server.java` now uses both force APIs.

> A drawback is that it's potentially a dangerous API if used incorrectly, because it could spam the KDC.

I have added a check so that `Server.java` uses the force login API only once after a failure. If the login fails again, it will wait for 60 seconds. This is handled by adding **canTryForceLogin** in `Server.java`.

> We could add that, but expanding the public API footprint of UserGroupInformation should not be taken lightly.

Mostly people will use it for new development, and they should be aware of the use case.

> Ideally, I'd like to get a second opinion from one more committer.

@liuml07 Could you please give your opinion here, since you reviewed HADOOP-17159?
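The "force login only once" guard described above can be sketched as a small policy object. This is a minimal illustration under assumptions, not the code in PR #5248. `Relogin`, `ServerReloginPolicy`, `onAuthFailure` and `onAuthSuccess` are hypothetical names; only the `canTryForceLogin` flag name comes from the comment. Re-arming the flag after a successful authentication is also an assumption, because the thread does not say when the flag is reset.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical stand-in for the login user's two re-login flavours. The real
 * UGI methods throw IOException; checked exceptions are omitted here.
 */
interface Relogin {
  /** Re-login, subject to the minimum interval since the last attempt. */
  void relogin();

  /** Re-login immediately, ignoring the time of the last attempt. */
  void forceRelogin();
}

/**
 * Sketch of a "force at most once" policy: the first failure may bypass the
 * re-login throttle; further failures use the throttled path (and therefore
 * wait out the minimum interval) until authentication succeeds again.
 */
final class ServerReloginPolicy {
  private final AtomicBoolean canTryForceLogin = new AtomicBoolean(true);
  private final Relogin login;

  ServerReloginPolicy(Relogin login) {
    this.login = login;
  }

  /** Called when SASL authentication fails and the server login looks broken. */
  void onAuthFailure() {
    // compareAndSet lets exactly one caller use the forced attempt, so
    // concurrent failures cannot turn into a burst of KDC requests.
    if (canTryForceLogin.compareAndSet(true, false)) {
      login.forceRelogin();
    } else {
      login.relogin();
    }
  }

  /** Assumed re-arm point: a later successful authentication resets the flag. */
  void onAuthSuccess() {
    canTryForceLogin.set(true);
  }
}
```

`compareAndSet` handles the case where many handler threads see `GSS initiate failed` at the same moment. Only one of them sends an unthrottled request to the KDC, which is the "could spam the KDC" concern raised in the review.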
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652501#comment-17652501 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1058364619

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
             AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
                 + attemptingUser + " (" + e.getLocalizedMessage()
                 + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-            throw tce;
+            if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+              LOG.info("Initiating re-login from IPC Server");
+              if (UserGroupInformation.isLoginKeytabBased()) {
+                UserGroupInformation.getLoginUser().reloginFromKeytab();
+              } else if (UserGroupInformation.isLoginTicketBased()) {
+                UserGroupInformation.getLoginUser().reloginFromTicketCache();
+              }
+              try {
+                // try processing message again
+                saslResponse = processSaslMessage(saslMessage);
+                AUDITLOG.info("Retry " + AUTH_SUCCESSFUL_FOR + this.toString()
+                    + ":" + attemptingUser + " after failure");
+              } catch (IOException exp) {
+                tce = (IOException) getTrueCause(e);

Review Comment:
There is a proper null check inside `getTrueCause()`, so it will not return null.
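The control flow of the `saslProcess` change reduces to a retry-once pattern. In this minimal sketch, `LoginUser`, `SaslStep` and `RetryingSaslProcessor` are hypothetical stand-ins for `UserGroupInformation` and `processSaslMessage`, not the patch's types. One behaviour is a choice made here rather than taken from the patch: the sketch rethrows the first failure with the retry failure attached as suppressed. This loosely mirrors the diff, which re-derives `tce` from `e` in the catch block.

```java
import java.io.IOException;

/** Hypothetical view of the server's own login state (stands in for UGI). */
interface LoginUser {
  boolean isLoginSuccess();

  void relogin() throws IOException;
}

/** Hypothetical SASL step: returns a response token or throws on failure. */
interface SaslStep {
  String process(String message) throws IOException;
}

/**
 * Sketch of "re-login once, then retry the same SASL message once" when the
 * server's own credentials appear to be broken.
 */
final class RetryingSaslProcessor {
  private final LoginUser loginUser;
  private final SaslStep step;

  RetryingSaslProcessor(LoginUser loginUser, SaslStep step) {
    this.loginUser = loginUser;
    this.step = step;
  }

  String processWithRelogin(String message) throws IOException {
    try {
      return step.process(message);
    } catch (IOException first) {
      // A client with bad credentials still fails immediately; only retry
      // when the server's own login is known to be unhealthy.
      if (loginUser.isLoginSuccess()) {
        throw first;
      }
      loginUser.relogin();
      try {
        return step.process(message); // single retry, no loop
      } catch (IOException second) {
        // Report the original failure, keeping the retry failure attached.
        first.addSuppressed(second);
        throw first;
      }
    }
  }
}
```

Checking the server's own login state before retrying preserves the old behaviour for clients that present bad credentials: they fail after a single attempt, as before.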
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17652293#comment-17652293 ]

ASF GitHub Bot commented on HADOOP-18581:

cnauroth commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1057844215

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
             AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
                 + attemptingUser + " (" + e.getLocalizedMessage()
                 + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-            throw tce;
+            if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+              LOG.info("Initiating re-login from IPC Server");
+              if (UserGroupInformation.isLoginKeytabBased()) {
+                UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
For keytab usage, there is `UserGroupInformation#forceReloginFromKeytab()`, which always does the login regardless of time since last login. There is no equivalent `forceReloginFromTicketCache()` though.

We could add that, but expanding the public API footprint of `UserGroupInformation` should not be taken lightly. Ideally, I'd like to get a second opinion from one more committer. I think it's the right thing to do. A drawback is that it's potentially a dangerous API if used incorrectly, because it could spam the KDC.
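The difference between the throttled and forced re-login paths can be sketched with an injectable clock. This is an illustration under assumptions, not `UserGroupInformation`'s implementation. `ThrottledRelogin` and its constructor parameters are hypothetical. The 60-second interval comes from the "Not attempting to re-login ... less than 60 seconds before" message in the test log in this thread. The sketch records the attempt time before the login runs. This matches that log, where `Last Login` appears to equal the time of the failed re-login from the dummy logout thread.

```java
import java.util.function.LongSupplier;

/**
 * Sketch of a throttled re-login: a normal re-login is skipped when the last
 * attempt was less than minIntervalMillis ago; a forced re-login always runs.
 * The clock and the login action are injected so the behaviour is testable.
 */
final class ThrottledRelogin {
  private final long minIntervalMillis;
  private final LongSupplier clockMillis;
  private final Runnable doLogin; // stands in for the real Kerberos login
  private long lastAttemptMillis;
  private boolean attemptedBefore;

  ThrottledRelogin(long minIntervalMillis, LongSupplier clockMillis,
      Runnable doLogin) {
    this.minIntervalMillis = minIntervalMillis;
    this.clockMillis = clockMillis;
    this.doLogin = doLogin;
  }

  /** Throttled path; returns true if a login was actually attempted. */
  public synchronized boolean relogin() {
    return relogin(false);
  }

  /** Forced path; always attempts a login. */
  public synchronized boolean forceRelogin() {
    return relogin(true);
  }

  private boolean relogin(boolean ignoreLastLoginTime) {
    long now = clockMillis.getAsLong();
    if (!ignoreLastLoginTime && attemptedBefore
        && now - lastAttemptMillis < minIntervalMillis) {
      // Corresponds to "Not attempting to re-login since the last re-login
      // was attempted less than 60 seconds before."
      return false;
    }
    // Recorded before logging in, so a failed attempt also opens a new window.
    lastAttemptMillis = now;
    attemptedBefore = true;
    doLogin.run();
    return true;
  }
}
```

With `minIntervalMillis = 60_000`, a `relogin()` issued 1.738 s after a failed attempt returns `false`, while `forceRelogin()` runs immediately. That gap is what a force variant such as `forceReloginFromKeytab()` avoids.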
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651643#comment-17651643 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1056306460

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
             AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
                 + attemptingUser + " (" + e.getLocalizedMessage()
                 + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-            throw tce;
+            if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+              LOG.info("Initiating re-login from IPC Server");
+              if (UserGroupInformation.isLoginKeytabBased()) {
+                UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
@cnauroth, any suggestions for handling this 60-second delay?
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651611#comment-17651611 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1056248810

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
             AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":"
                 + attemptingUser + " (" + e.getLocalizedMessage()
                 + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-            throw tce;
+            if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+              LOG.info("Initiating re-login from IPC Server");
+              if (UserGroupInformation.isLoginKeytabBased()) {
+                UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
> Would that still leave a server potentially in a bad state for up to 60 seconds?

Yes, the server will be in a bad state for up to 60 seconds. Earlier, the only option was to restart the server. Below is the test log covering those 60 seconds from my test cluster; after 60 seconds, the re-login succeeds:
```
2022-12-23 10:27:19,117 INFO ipc.Server - Auth successful for hive/host1-x...@xxx.server.com (auth:KERBEROS)
2022-12-23 10:27:19,121 INFO authorize.ServiceAuthorizationManager - Authorization successful for hive/host1-x...@xxx.server.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-12-23 10:27:27,048 ERROR namenode.NameNode - Dummy logout thread...
org.apache.hadoop.security.KerberosAuthException: Login failure for user: nn/host1-x...@xxx.server.com javax.security.auth.login.LoginException: Re-login failed
    at org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1203)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$2.run(NameNode.java:1590)
Caused by: javax.security.auth.login.LoginException: Re-login failed
    at org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1188)
    ... 1 more
2022-12-23 10:27:28,786 WARN ipc.Server - Auth failed for 10.x.y.z:46879:null (GSS initiate failed) with true cause: (GSS initiate failed)
2022-12-23 10:27:28,786 INFO ipc.Server - Initiating re-login from IPC Server
2022-12-23 10:27:28,786 INFO ipc.Server - Doing login from keytab
2022-12-23 10:27:28,786 WARN security.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1671791247048
. . . . .
2022-12-23 10:28:27,618 WARN ipc.Server - Auth failed for 10.x.y.z:45329:null (GSS initiate failed) with true cause: (GSS initiate failed)
2022-12-23 10:28:27,619 INFO ipc.Server - Initiating re-login from IPC Server
2022-12-23 10:28:27,619 INFO ipc.Server - Doing login from keytab
2022-12-23 10:28:27,652 INFO ipc.Server - Retry Auth successful for 10.x.y.z:45329:null after failure
2022-12-23 10:28:27,655 INFO ipc.Server - Auth successful for hive/hn1-x...@xxx.server.com (auth:KERBEROS)
2022-12-23 10:28:27,667 INFO authorize.ServiceAuthorizationManager - Authorization successful for hive/hn1-x...@xxx.server.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
```
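The 60-second window in the test log can be checked directly from its timestamps. This sketch assumes the log timestamps are UTC. That assumption is consistent with the `Last Login=1671791247048` epoch value decoding to the time of the `Dummy logout thread...` line. `ReloginTimeline` and `reloginPermitted` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;

/** Checks the 60-second re-login window against the test-log timestamps. */
final class ReloginTimeline {
  static final Duration MIN_INTERVAL = Duration.ofSeconds(60);

  /** True once at least MIN_INTERVAL has passed since the last login attempt. */
  static boolean reloginPermitted(Instant lastLogin, Instant attempt) {
    return Duration.between(lastLogin, attempt).compareTo(MIN_INTERVAL) >= 0;
  }

  public static void main(String[] args) {
    // "Last Login=1671791247048" from the UserGroupInformation warning.
    Instant lastLogin = Instant.ofEpochMilli(1671791247048L);
    // The two "Initiating re-login from IPC Server" lines, read as UTC.
    Instant refused = Instant.parse("2022-12-23T10:27:28.786Z");
    Instant allowed = Instant.parse("2022-12-23T10:28:27.619Z");

    System.out.println(lastLogin); // 2022-12-23T10:27:27.048Z
    System.out.println(Duration.between(lastLogin, refused).toMillis()); // 1738
    System.out.println(reloginPermitted(lastLogin, refused)); // false
    System.out.println(Duration.between(lastLogin, allowed).toMillis()); // 60571
    System.out.println(reloginPermitted(lastLogin, allowed)); // true
  }
}
```

The first retry came 1.738 s after the failed re-login and was refused. The second came 60.571 s after it, just past the window, and succeeded.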
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651603#comment-17651603 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1056248810

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");
+        if (UserGroupInformation.isLoginKeytabBased()) {
+          UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
> Would that still leave a server potentially in a bad state for up to 60 seconds?

Yes, for up to 60 seconds the server will be in a bad state. Earlier, the only option was to restart the server. Below is the test log covering those 60 seconds from my test cluster:
`2022-12-23 10:27:19,117 INFO ipc.Server - Auth successful for hive/host1-x...@xxx.server.com (auth:KERBEROS)
2022-12-23 10:27:19,121 INFO authorize.ServiceAuthorizationManager - Authorization successful for hive/host1-x...@xxx.server.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2022-12-23 10:27:27,048 ERROR namenode.NameNode - Dummy logout thread...
org.apache.hadoop.security.KerberosAuthException: Login failure for user: nn/host1-x...@xxx.server.com javax.security.auth.login.LoginException: Re-login failed
    at org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1203)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$2.run(NameNode.java:1590)
Caused by: javax.security.auth.login.LoginException: Re-login failed
    at org.apache.hadoop.security.UserGroupInformation.unprotectedRelogin(UserGroupInformation.java:1188)
    ... 1 more
2022-12-23 10:27:28,786 WARN ipc.Server - Auth failed for 10.x.y.z:46879:null (GSS initiate failed) with true cause: (GSS initiate failed)
2022-12-23 10:27:28,786 INFO ipc.Server - Initiating re-login from IPC Server
2022-12-23 10:27:28,786 INFO ipc.Server - Doing login from keytab
2022-12-23 10:27:28,786 WARN security.UserGroupInformation - Not attempting to re-login since the last re-login was attempted less than 60 seconds before. Last Login=1671791247048
. . . . .
2022-12-23 10:28:27,618 WARN ipc.Server - Auth failed for 10.x.y.z:45329:null (GSS initiate failed) with true cause: (GSS initiate failed)
2022-12-23 10:28:27,619 INFO ipc.Server - Initiating re-login from IPC Server
2022-12-23 10:28:27,619 INFO ipc.Server - Doing login from keytab
2022-12-23 10:28:27,652 INFO ipc.Server - Retry Auth successful for 10.x.y.z:45329:null after failure
2022-12-23 10:28:27,655 INFO ipc.Server - Auth successful for hive/hn1-x...@xxx.server.com (auth:KERBEROS)
2022-12-23 10:28:27,667 INFO authorize.ServiceAuthorizationManager - Authorization successful for hive/hn1-x...@xxx.server.com (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol`
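The log above shows the re-login throttle at work: the IPC server's retry at 10:27:28,786 is skipped because the dummy thread's failed re-login (`Last Login=1671791247048`, i.e. 10:27:27,048 UTC) is less than 60 seconds old, while the retry at 10:28:27,618 goes through. The Java sketch below models that behavior under a simple assumption (one "last attempt" timestamp, updated even when the attempt fails). `ReloginThrottle` and `tryStartRelogin` are hypothetical names, not `UserGroupInformation` internals.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of a minimum-interval re-login throttle. Not Hadoop code;
 * the 60-second value mirrors the "Not attempting to re-login" log message.
 */
class ReloginThrottle {
    private final long minMillisBetweenRelogins;
    // Time of the last re-login attempt (successful or not); 0 means none yet.
    private long lastReloginAttemptMillis = 0L;

    ReloginThrottle(long minSecondsBeforeRelogin) {
        this.minMillisBetweenRelogins = TimeUnit.SECONDS.toMillis(minSecondsBeforeRelogin);
    }

    /**
     * Returns true and records the attempt if a re-login may start at nowMillis.
     * With ignoreLastLoginTime == false (the path the review says the IPC
     * server's retry takes), attempts inside the window are skipped.
     */
    synchronized boolean tryStartRelogin(long nowMillis, boolean ignoreLastLoginTime) {
        boolean tooSoon = lastReloginAttemptMillis != 0L
                && nowMillis - lastReloginAttemptMillis < minMillisBetweenRelogins;
        if (tooSoon && !ignoreLastLoginTime) {
            return false; // skipped: last attempt was less than the minimum interval ago
        }
        lastReloginAttemptMillis = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        ReloginThrottle throttle = new ReloginThrottle(60);
        long dummyThreadRelogin = 1_671_791_247_048L; // "Last Login" from the log (10:27:27,048)
        System.out.println(throttle.tryStartRelogin(dummyThreadRelogin, false));           // true
        System.out.println(throttle.tryStartRelogin(dummyThreadRelogin + 1_738L, false));  // false (10:27:28,786)
        System.out.println(throttle.tryStartRelogin(dummyThreadRelogin + 60_570L, false)); // true (10:28:27,618)
    }
}
```

Passing `ignoreLastLoginTime = true` would close the window the review asks about, at the cost of letting every failing connection trigger a KDC login; the sketch only makes that trade-off visible, it does not settle it.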
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651590#comment-17651590 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1056240772

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");
+        if (UserGroupInformation.isLoginKeytabBased()) {
+          UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
@cnauroth Thanks for the review.

> @surendralilhore, thank you for the patch. I entered a few questions. Additionally, can you please describe if you've been able to simulate the problem in testing to confirm that this patch fixes it? (I assume unit testing isn't practical for this.)

Yes, I simulated it by adding a thread to the NameNode that performs a logout 2 minutes after the NameNode starts.
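The manual test described above (a thread that logs out some time after the NameNode starts) can be approximated outside Hadoop with a scheduled task. This is a hypothetical sketch: `SharedLogin` stands in for the process-wide login user that the NameNode server and its JournalNode client share, and `DummyLogoutInjector` stands in for the test thread; neither is Hadoop code, and the real test went through `UserGroupInformation`.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the login user shared by server and client code in one JVM. */
class SharedLogin {
    private final AtomicBoolean loggedIn = new AtomicBoolean(true);

    void logout() { loggedIn.set(false); }   // what a failed client re-login effectively does
    void relogin() { loggedIn.set(true); }   // what the server-side recovery path restores
    boolean isLoginSuccess() { return loggedIn.get(); }
}

class DummyLogoutInjector {
    /** Schedules a one-shot logout, like the test thread started with the NameNode. */
    static ScheduledFuture<?> scheduleLogout(ScheduledExecutorService pool, SharedLogin login,
                                             long delay, TimeUnit unit) {
        return pool.schedule(login::logout, delay, unit);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        try {
            SharedLogin login = new SharedLogin();
            // The manual test used a 2-minute delay; 100 ms keeps this demo short.
            scheduleLogout(pool, login, 100, TimeUnit.MILLISECONDS).get(); // wait for the injected logout
            System.out.println("after logout: " + login.isLoginSuccess());  // false
            login.relogin();
            System.out.println("after relogin: " + login.isLoginSuccess()); // true
        } finally {
            pool.shutdown();
        }
    }
}
```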
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651589#comment-17651589 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1363842249

> @surendralilhore, thank you for the patch. I entered a few questions. Additionally, can you please describe if you've been able to simulate the problem in testing to confirm that this patch fixes it? (I assume unit testing isn't practical for this.)

Yes, I simulated it by adding a thread to the NameNode that performs a logout 2 minutes after the NameNode starts.
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17651294#comment-17651294 ]

ASF GitHub Bot commented on HADOOP-18581:

steveloughran commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1055492356

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java:
##
@@ -529,6 +529,13 @@ private void setLogin(LoginContext login) {
     user.setLogin(login);
   }
+  /** This method is only helpful for HadoopLoginContext*/
+  public boolean isLoginSuccess() {
+    LoginContext login = user.getLogin();
+    return (login instanceof HadoopLoginContext)
+        ? ((HadoopLoginContext) login).isLoginSuccess() : true;

Review Comment:
Nit: put `:` on a new line.

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");
+        if (UserGroupInformation.isLoginKeytabBased()) {
+          UserGroupInformation.getLoginUser().reloginFromKeytab();
+        } else if (UserGroupInformation.isLoginTicketBased()) {
+          UserGroupInformation.getLoginUser().reloginFromTicketCache();
+        }
+        try {
+          // try processing message again
+          saslResponse = processSaslMessage(saslMessage);
+          AUDITLOG.info("Retry " + AUTH_SUCCESSFUL_FOR + this.toString()
+              + ":" + attemptingUser + " after failure");
+        } catch (IOException exp) {
+          tce = (IOException) getTrueCause(e);

Review Comment:
Big assumption there about the wrapped cause and its class. Does it hold?

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");
+        if (UserGroupInformation.isLoginKeytabBased()) {
+          UserGroupInformation.getLoginUser().reloginFromKeytab();
+        } else if (UserGroupInformation.isLoginTicketBased()) {
+          UserGroupInformation.getLoginUser().reloginFromTicketCache();
+        }
+        try {
+          // try processing message again
+          saslResponse = processSaslMessage(saslMessage);
+          AUDITLOG.info("Retry " + AUTH_SUCCESSFUL_FOR + this.toString()

Review Comment:
Use slf4j syntax.

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");
+        if (UserGroupInformation.isLoginKeytabBased()) {
+          UserGroupInformation.getLoginUser().reloginFromKeytab();
+        } else if (UserGroupInformation.isLoginTicketBased()) {
+          UserGroupInformation.getLoginUser().reloginFromTicketCache();
+        }
+        try {
+          // try processing message again
+          saslResponse = processSaslMessage(saslMessage);

Review Comment:
Maybe log at debug level that this is about to be retried.

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");

Review Comment:
Is this the right log level?
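The patch's control flow, as the hunks above show, is: on a SASL failure, check whether the shared login user is broken, re-login from keytab or ticket cache, and retry `processSaslMessage` once. Below is a dependency-free Java sketch of that retry-once pattern under stated assumptions: `LoginUser`, `SaslStep`, and `RetryAfterRelogin` are hypothetical stand-ins, and the simplified cause unwrapping in `asIOException` only illustrates one way to avoid the unchecked `(IOException)` cast the review questions; it is not Hadoop's `getTrueCause`.

```java
import java.io.IOException;

/** Hypothetical view of the shared login user (stands in for UserGroupInformation). */
interface LoginUser {
    boolean isLoginSuccess();

    /** Re-login; an implementation may be throttled and do nothing. */
    void relogin() throws IOException;
}

/** Hypothetical stand-in for one processSaslMessage(...) call. */
interface SaslStep<T> {
    T process() throws IOException;
}

final class RetryAfterRelogin {
    private RetryAfterRelogin() {
    }

    /** Casts only when the throwable really is an IOException; otherwise wraps it. */
    static IOException asIOException(Throwable t) {
        return (t instanceof IOException) ? (IOException) t : new IOException(t);
    }

    /**
     * Runs the step. If it fails while the shared login is broken, re-logs in
     * and retries exactly once; if the login is healthy, the original failure
     * is rethrown unchanged (a genuine authentication failure).
     */
    static <T> T run(SaslStep<T> step, LoginUser user) throws IOException {
        try {
            return step.process();
        } catch (IOException firstFailure) {
            if (user.isLoginSuccess()) {
                throw firstFailure;
            }
            user.relogin();
            try {
                return step.process(); // single retry, mirroring the patch
            } catch (IOException retryFailure) {
                // Simplified "true cause": prefer the wrapped cause if present,
                // without assuming its class.
                Throwable cause = (retryFailure.getCause() != null)
                        ? retryFailure.getCause()
                        : retryFailure;
                throw asIOException(cause);
            }
        }
    }
}
```

With slf4j, as the review suggests, the audit line in the real code could use placeholders, for example `AUDITLOG.info("Retry {}{}:{} after failure", AUTH_SUCCESSFUL_FOR, this, attemptingUser)`; that line is an illustration, not the merged change.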
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649929#comment-17649929 ]

ASF GitHub Bot commented on HADOOP-18581:

cnauroth commented on code in PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#discussion_r1053628775

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##
@@ -2206,7 +2206,25 @@ private void saslProcess(RpcSaslProto saslMessage)
       AUDITLOG.warn(AUTH_FAILED_FOR + this.toString() + ":" + attemptingUser + " (" + e.getLocalizedMessage() + ") with true cause: (" + tce.getLocalizedMessage() + ")");
-      throw tce;
+      if (!UserGroupInformation.getLoginUser().isLoginSuccess()) {
+        LOG.info("Initiating re-login from IPC Server");
+        if (UserGroupInformation.isLoginKeytabBased()) {
+          UserGroupInformation.getLoginUser().reloginFromKeytab();

Review Comment:
If I trace through the chain of these re-login methods, they end up passing `false` for `ignoreLastLoginTime`. They'll skip the re-login and early exit if insufficient time (default 60 seconds) has elapsed since last login. Would that still leave a server potentially in a bad state for up to 60 seconds?

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java:
##
@@ -529,6 +529,13 @@ private void setLogin(LoginContext login) {
     user.setLogin(login);
   }
+  /** This method is only helpful for HadoopLoginContext*/

Review Comment:
There is a minor checkstyle warning here asking for a period at the end of the sentence. However, perhaps consider expanding a bit. `HadoopLoginContext` is a private inner class, so probably best not to discuss it in a public Javadoc. You could discuss how this method checks for a successful Kerberos login, or defaults to `true` if not using Kerberos.
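A sketch of how the new method might read after both review comments: a Javadoc that ends with a period and describes behavior instead of naming the private `HadoopLoginContext`, plus the ternary's `:` on its own line. `BasicLoginContext`, `TrackingLoginContext`, and `LoginUserView` are hypothetical stand-ins for `LoginContext`, `HadoopLoginContext`, and `UserGroupInformation`; this is not the committed code.

```java
/** Hypothetical stand-in for a generic login context that does not track success. */
class BasicLoginContext {
}

/** Hypothetical stand-in for the Kerberos-aware context that records the last login result. */
class TrackingLoginContext extends BasicLoginContext {
    private volatile boolean loginSuccess;

    void markLoginResult(boolean success) { loginSuccess = success; }

    boolean isLoginSuccess() { return loginSuccess; }
}

class LoginUserView {
    private final BasicLoginContext login;

    LoginUserView(BasicLoginContext login) {
        this.login = login;
    }

    /**
     * Checks whether the most recent Kerberos login succeeded.
     *
     * @return the recorded result of the last Kerberos login, or {@code true}
     *         when the login does not track success (for example, when
     *         Kerberos is not in use).
     */
    public boolean isLoginSuccess() {
        return (login instanceof TrackingLoginContext)
            ? ((TrackingLoginContext) login).isLoginSuccess()
            : true;
    }
}
```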
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649876#comment-17649876 ]

ASF GitHub Bot commented on HADOOP-18581:

hadoop-yetus commented on PR #5248:
URL: https://github.com/apache/hadoop/pull/5248#issuecomment-1359664117

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 51s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 38m 55s | | trunk passed |
| +1 :green_heart: | compile | 23m 15s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | compile | 20m 33s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | checkstyle | 1m 14s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 44s | | trunk passed |
| -1 :x: | javadoc | 1m 17s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/1/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 50s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 47s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 23s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 59s | | the patch passed |
| +1 :green_heart: | compile | 22m 29s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 |
| +1 :green_heart: | javac | 22m 29s | | the patch passed |
| +1 :green_heart: | compile | 20m 27s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | javac | 20m 27s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 9s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/1/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 1 new + 223 unchanged - 0 fixed = 224 total (was 223) |
| +1 :green_heart: | mvnsite | 1m 39s | | the patch passed |
| -1 :x: | javadoc | 1m 7s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/1/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. |
| +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
| +1 :green_heart: | spotbugs | 2m 41s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 5s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 20s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 3s | | The patch does not generate ASF License warnings. |
| | | 207m 7s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5248/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5248 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 8e7a86fb979a 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build
[jira] [Commented] (HADOOP-18581) Handle Server KDC re-login when Server and Client run in same JVM.
[ https://issues.apache.org/jira/browse/HADOOP-18581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17649738#comment-17649738 ]

ASF GitHub Bot commented on HADOOP-18581:

surendralilhore opened a new pull request, #5248:
URL: https://github.com/apache/hadoop/pull/5248

…in same JVM.

### Description of PR

### How was this patch tested?

### For code changes:

- [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?