[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

2021-12-21 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463589#comment-17463589
 ] 

Surendra Singh Lilhore edited comment on HADOOP-17996 at 12/22/21, 5:40 AM:


[~Sushma_28] and [~prabhujoseph].

Looks like this patch is trying to handle two scenarios.
 # Set the last login time after re-login in 
*UserGroupInformation#unprotectedRelogin().*
 # Handle re-login in the Server when the client and server run in the same 
JVM and a client re-login attempt fails, which impacts the server as well.

#1 is not required: a configuration property is already available if 
you want to reduce the wait time.

#2 is a different scenario, and I tried reproducing it by adding some extra 
code in the namenode. I added a new thread which logs out 2 minutes after the 
namenode starts and logs in again after waiting another 2 minutes.
{code:java}
    // Reproduction thread: log out 2 minutes after startup, log in 2 minutes later.
    new Thread() {
      @Override
      public void run() {
        try {
          Thread.sleep(120000); // 2 minutes, in milliseconds
          LOG.info("Logout from UGI");
          UserGroupInformation.getLoginUser().getLogin().logout();
          LOG.info("Waiting for 2 min");
          Thread.sleep(120000); // stay logged out for 2 minutes
          LOG.info("Login again");
          UserGroupInformation.getLoginUser().getLogin().login();
          LOG.info("Relogin success..");
        } catch (LoginException | IOException | InterruptedException e) {
          LOG.error("Failed log out thread ", e);
        }
      }
    }.start(); {code}
During those 2 minutes the namenode was not able to handle any client 
operation and kept printing the exception below.
{code:java}
Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: (GSS 
initiate failed)
Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: (GSS 
initiate failed)
Auth failed for x.x.x.x:42199:null (GSS initiate failed) with true cause: (GSS 
initiate failed) {code}
I suggest raising a new Jira to handle server-side re-login and closing this 
one as Invalid.
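
The second scenario can be illustrated with a minimal, self-contained sketch. It is not Hadoop code: `SharedLoginSketch`, `LOGGED_IN`, `serverAuthenticate` and `clientRelogin` are hypothetical names, and the single flag is only an assumed stand-in for login state that the client and server share inside one JVM.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SharedLoginSketch {
    /** Hypothetical stand-in for login state shared by client and server in one JVM. */
    static final AtomicBoolean LOGGED_IN = new AtomicBoolean(true);

    /** Models the server-side check: it relies on the same shared login state. */
    static String serverAuthenticate(String clientAddress) {
        return LOGGED_IN.get()
            ? "Auth ok for " + clientAddress
            : "Auth failed for " + clientAddress + " (GSS initiate failed)";
    }

    /** Models a client-side relogin: logout first, then a login that may fail. */
    static void clientRelogin(boolean kdcReachable) {
        LOGGED_IN.set(false);          // logout always happens
        if (kdcReachable) {
            LOGGED_IN.set(true);       // login succeeds only if the KDC answers
        }
    }

    public static void main(String[] args) {
        clientRelogin(false);          // client relogin fails
        System.out.println(serverAuthenticate("x.x.x.x:42199"));
        clientRelogin(true);           // a later relogin succeeds
        System.out.println(serverAuthenticate("x.x.x.x:42199"));
    }
}
```

In this model the server has no way to recover on its own; it depends on some client call logging in again, which is the gap a server-side re-login would close.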



> UserGroupInformation#unprotectedRelogin sets the last login time before 
> logging in
> --
>
> Key: HADOOP-17996
> URL: https://issues.apache.org/jira/browse/HADOOP-17996
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.3.1
>Reporter: Prabhu Joseph
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HADOOP-17996.001.patch
>
>
> UserGroupInformation#unprotectedRelogin sets the last login time before 
> logging in. IPC#Client calls reloginFromKeytab when there is a connection 
> reset failure from AD; this logs out, sets the last login time to now, 
> and then tries to log in. The login also fails because AD cannot be reached. 
> Further attempts are then skipped because the kerberosMinSecondsBeforeRelogin 
> check fails. All Client and Server operations fail with *GSS initiate failed*
> {code}
> 2021-10-31 09:50:53,546 WARN  ha.EditLogTailer - Unable to trigger a roll of 
> the active NN
> java.util.concurrent.ExecutionException: 
> 

[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

2021-11-23 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448161#comment-17448161
 ] 

Surendra Singh Lilhore edited comment on HADOOP-17996 at 11/23/21, 5:52 PM:


>> Yes it can be workaround by setting re-login attempt time to a lower value. 
>>Every user has to modify this value after facing this issue. Instead this 
>>patch improves that by reattempting if a previous login failed.

This is not a workaround. This property was added to avoid load on the KDC 
server. If you feel your clusters are not putting much load on the KDC, then 
change the default value to 0.

Changing it to 0 has the same effect as your patch.

>>This Jira is an improvement. Do you see any problem/impact with this patch.

Yes, it will impact the KDC server where the KDC is shared by multiple 
clusters. All the processes will start re-login immediately and the load will 
increase.

 

>> Don't we immediately login into our laptop if the previous login failed? 

     This is a single-user scenario, not a distributed system. :)
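
The load concern can be made concrete with a rough back-of-the-envelope sketch. Everything here is an illustrative assumption: `KdcLoadSketch` and `loginAttempts` are hypothetical names, the process count and outage length are made up, and each process is assumed to retry once per second and to be limited only by the minimum re-login interval.

```java
class KdcLoadSketch {
    /**
     * Estimates login requests sent towards the KDC while it is unreachable.
     * All parameters are illustrative assumptions, not measured values.
     */
    static long loginAttempts(int processes, int outageSeconds,
                              int retryEverySeconds, int minSecondsBeforeRelogin) {
        if (retryEverySeconds < 1) {
            throw new IllegalArgumentException("retryEverySeconds must be >= 1");
        }
        // A process cannot attempt more often than the configured minimum interval.
        int effectiveInterval = Math.max(retryEverySeconds, minSecondsBeforeRelogin);
        // Attempts happen at t = 0, interval, 2 * interval, ... while t < outageSeconds.
        long perProcess = (outageSeconds + effectiveInterval - 1) / effectiveInterval;
        return (long) processes * perProcess;
    }

    public static void main(String[] args) {
        System.out.println("min interval 60s: " + loginAttempts(1000, 60, 1, 60));
        System.out.println("min interval 0s:  " + loginAttempts(1000, 60, 1, 0));
    }
}
```

Under these assumptions, 1000 processes facing a 60-second outage send 1000 requests with the default interval and 60000 with the interval set to 0.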



[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

2021-11-23 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448083#comment-17448083
 ] 

Prabhu Joseph edited comment on HADOOP-17996 at 11/23/21, 3:49 PM:
---

[~surendralilhore] The issue in the existing code is that if a re-login fails 
for some reason, retries are skipped for the next configured re-login attempt 
interval. Yes, it can be worked around by setting the re-login attempt time to 
a lower value, but every user has to modify this value after facing this 
issue. Instead, this patch improves on that by reattempting if the previous 
login failed.

Don't we immediately log in to our laptop again if the previous login failed? 
Do we wait for the configured re-login attempt time after every login failure? 
If so, what is the use of waiting for that period if you are sure you have the 
correct credentials? 

>> One question here, even after 60s second login was not successful ? Is this 
>> going in unnecessary loop ?
It will be successful if AD is available. But for 60s, the HDFS service is 
unavailable. All IPC Server and Client operations will fail with *GSS 
initiate failed*.

This Jira is an improvement. Do you see any problem/impact with this patch?




[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

2021-11-23 Thread Surendra Singh Lilhore (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17447931#comment-17447931
 ] 

Surendra Singh Lilhore edited comment on HADOOP-17996 at 11/23/21, 11:18 AM:
-

[~Sushma_28], the last login time is not the time of the last successful 
login; it is just the time at which a login was attempted. So I don't think 
setting it after login makes any sense. HADOOP-7930 allows you to change the 
relogin attempt time if you need to; by default it is 60 sec.

One question here: even after 60 seconds, was the login still not successful? 
Is this going into an unnecessary loop?



> UserGroupInformation#unprotectedRelogin sets the last login time before 
> logging in
> --
>
> Key: HADOOP-17996
> URL: https://issues.apache.org/jira/browse/HADOOP-17996
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 3.3.1
>Reporter: Prabhu Joseph
>Assignee: Ravuri Sushma sree
>Priority: Major
> Attachments: HADOOP-17996.001.patch
>
>
> UserGroupInformation#unprotectedRelogin sets the last login time before 
> logging in. IPC#Client calls reloginFromKeytab when there is a connection 
> reset failure from AD; this logs out, sets the last login time to now, 
> and then tries to log in. The login also fails because AD cannot be reached. 
> Further attempts are then skipped because the kerberosMinSecondsBeforeRelogin 
> check fails. All Client and Server operations fail with *GSS initiate failed*
> {code}
> 2021-10-31 09:50:53,546 WARN  ha.EditLogTailer - Unable to trigger a roll of 
> the active NN
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.security.KerberosAuthException:  DestHost:destPort 
> namenode0:8020 , LocalHost:localPort namenode1/1.2.3.4:0. Failed on local 
> exception: org.apache.hadoop.security.KerberosAuthException: Login failure 
> for user: nn/nameno...@example.com javax.security.auth.login.LoginException: 
> Connection reset
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:206)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:382)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:441)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:360)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1712)
>   at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:480)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> Caused by: org.apache.hadoop.security.KerberosAuthException:  
> DestHost:destPort namenode0:8020 , LocalHost:localPort namenode1/1.2.3.4:0. 
> Failed on local exception: org.apache.hadoop.security.KerberosAuthException: 
> Login failure for user: nn/nameno...@example.com 
> javax.security.auth.login.LoginException: Connection reset
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
>   at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1501)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1443)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1353)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>   at com.sun.proxy.$Proxy21.rollEditLog(Unknown Source)
>   at 
> 

[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

2021-11-15 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444021#comment-17444021
 ] 

Prabhu Joseph edited comment on HADOOP-17996 at 11/15/21, 6:18 PM:
---

Thanks [~brahmareddy] for reviewing the patch.
{quote}this was just to track the re-login attempt so that so many retries can 
be avoided.?
{quote}
There are two issues the patch addresses:

1. When IPC#Client fails during {{saslConnect}}, it does a re-login from 
{{handleSaslConnectionFailure}}. The re-login sets the last login time to the 
current time irrespective of the login status, then logs out and logs in 
again. When the login fails for some reason, such as an intermittent issue 
connecting to AD, all subsequent Client and Server operations fail with GSS 
initiate failed for the next configured {{kerberosMinSecondsBeforeRelogin}} 
(60 seconds).
{code:java}
// try re-login
if (UserGroupInformation.isLoginKeytabBased()) {
  UserGroupInformation.getLoginUser().reloginFromKeytab();
} else if (UserGroupInformation.isLoginTicketBased()) {
  UserGroupInformation.getLoginUser().reloginFromTicketCache();
}
{code}
This issue is addressed by setting the last login time to the current time 
only after the login succeeds. 

2. Currently the re-login happens only from IPC#Client during 
{{handleSaslConnectionFailure()}}. We have observed cases where the Client has 
logged out and failed to log back in, leading to all IPC#Server operations 
failing in {{processSaslMessage}} with the error below.
{code:java}
2021-11-02 13:28:08,750 WARN  ipc.Server - Auth failed for 
10.25.35.45:37849:null (GSS initiate failed) with true cause: (GSS initiate 
failed)
2021-11-02 13:28:08,767 WARN  ipc.Server - Auth failed for 
10.25.35.46:35919:null (GSS initiate failed) with true cause: (GSS initiate 
failed)
{code}
This patch adds re-login from the Server side as well on any authentication 
failure.
{quote}Configuring kerberosMinSecondsBeforeRelogin with low value will not work 
here if it's needed.?
{quote}
This will work around the first issue.
 
{quote}After this fix , on failure it will continuously retry..?
{quote}
IPC#Client does a re-login on connection failure. This patch adds the same on 
the IPC#Server side. Retries are based on the retry mechanisms of IPC#Client 
and IPC#Server. A real Kerberos login will happen on every retry from 
IPC#Client and IPC#Server until the login succeeds.
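
The difference between recording the last login time before the login and recording it only after a successful login can be sketched as below. This is a simplified model under stated assumptions, not the UserGroupInformation implementation: `ReloginThrottleSketch`, `Relogin`, `tryRelogin` and `attemptsDuringOutage` are hypothetical names, time is passed in explicitly, and a retry is assumed to arrive once per second while the KDC is down.

```java
import java.util.function.BooleanSupplier;

class ReloginThrottleSketch {
    /** Hypothetical stand-in for the re-login bookkeeping; not Hadoop's real class. */
    static final class Relogin {
        private final long minMillisBeforeRelogin;
        private final boolean stampBeforeLogin;
        private long lastLoginMillis = -1;   // -1 means no login recorded yet
        private int attempts;

        Relogin(long minMillisBeforeRelogin, boolean stampBeforeLogin) {
            this.minMillisBeforeRelogin = minMillisBeforeRelogin;
            this.stampBeforeLogin = stampBeforeLogin;
        }

        /** Returns true if a real login was attempted at time now, false if throttled. */
        boolean tryRelogin(long now, BooleanSupplier login) {
            if (lastLoginMillis >= 0 && now - lastLoginMillis < minMillisBeforeRelogin) {
                return false;                // too soon since the recorded login time
            }
            if (stampBeforeLogin) {
                lastLoginMillis = now;       // recorded even if the login then fails
            }
            attempts++;
            boolean succeeded = login.getAsBoolean();
            if (!stampBeforeLogin && succeeded) {
                lastLoginMillis = now;       // recorded only on success
            }
            return true;
        }
    }

    /** Counts real login attempts when a retry arrives every second during an outage. */
    static int attemptsDuringOutage(boolean stampBeforeLogin, int outageSeconds, int minSeconds) {
        Relogin relogin = new Relogin(minSeconds * 1000L, stampBeforeLogin);
        for (int second = 0; second < outageSeconds; second++) {
            relogin.tryRelogin(second * 1000L, () -> false);  // KDC unreachable: login fails
        }
        return relogin.attempts;
    }

    public static void main(String[] args) {
        System.out.println("stamp before login:  " + attemptsDuringOutage(true, 60, 60));
        System.out.println("stamp after success: " + attemptsDuringOutage(false, 60, 60));
    }
}
```

In this model, stamping before the login allows one attempt in a 60-second outage, while stamping only after success lets every retry reach the KDC. That is both the faster recovery and the extra KDC traffic discussed in this thread.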



[jira] [Comment Edited] (HADOOP-17996) UserGroupInformation#unprotectedRelogin sets the last login time before logging in

2021-11-15 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444021#comment-17444021
 ] 

Prabhu Joseph edited comment on HADOOP-17996 at 11/15/21, 6:17 PM:
---

Thanks [~brahmareddy] for reviewing the patch.
{quote}this was just to track the re-login attempt so that so many retries can 
be avoided.?
{quote}
There are two issues the patch tries to address:

1. When IPC#Client fails during {{saslConnect}}, it does a re-login from 
{{handleSaslConnectionFailure}}. The re-login sets the last login time to the 
current time irrespective of the login status, then logs out and logs in 
again. When the login fails for some reason, such as an intermittent issue 
connecting to AD, all subsequent Client and Server operations fail with GSS 
initiate failed for the next configured {{kerberosMinSecondsBeforeRelogin}} 
(60 seconds).
{code:java}
// try re-login
if (UserGroupInformation.isLoginKeytabBased()) {
  UserGroupInformation.getLoginUser().reloginFromKeytab();
} else if (UserGroupInformation.isLoginTicketBased()) {
  UserGroupInformation.getLoginUser().reloginFromTicketCache();
}
{code}
This issue is addressed by setting the last login time to the current time 
only after the login succeeds. 

2. Currently the re-login happens only from IPC#Client during 
{{handleSaslConnectionFailure()}}. We have observed cases where the Client has 
logged out and failed to log back in, leading to all IPC#Server operations 
failing in {{processSaslMessage}} with the error below.
{code:java}
2021-11-02 13:28:08,750 WARN  ipc.Server - Auth failed for 
10.25.35.45:37849:null (GSS initiate failed) with true cause: (GSS initiate 
failed)
2021-11-02 13:28:08,767 WARN  ipc.Server - Auth failed for 
10.25.35.46:35919:null (GSS initiate failed) with true cause: (GSS initiate 
failed)
{code}
This patch adds re-login from the Server side as well on any authentication 
failure.

bq. Configuring kerberosMinSecondsBeforeRelogin with low value will not work 
here if it's needed.?
This will work around the first issue.
 

bq. After this fix , on failure it will continuously retry..?

IPC#Client does a re-login on connection failure. This patch adds the same on 
the IPC#Server side. Retries are based on the retry mechanisms of IPC#Client 
and IPC#Server. A real Kerberos login will happen on every retry from 
IPC#Client and IPC#Server until the login succeeds.

