[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-11-09 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   2.8.0
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and branch-2.8.

Thanks a lot for the reviews [~ste...@apache.org], [~drankye] and 
[~andrew.wang]!

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch, 
> HADOOP-13590.09.patch, HADOOP-13590.10.patch, HADOOP-13590.branch-2.01.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-11-04 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.branch-2.01.patch

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch, 
> HADOOP-13590.09.patch, HADOOP-13590.10.patch, HADOOP-13590.branch-2.01.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-11-04 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.10.patch

Thanks Andrew and Steve.

Patch 10 to:
- Added the space in TestUGIWithMiniKdc
- Added to string to assertion message. Extracted a method to make this cleaner.

Will provide a branch-2 patch shortly.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch, 
> HADOOP-13590.09.patch, HADOOP-13590.10.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: (was: HADOOP-13590.09.patch)

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch, 
> HADOOP-13590.09.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.09.patch

Seems my IDE was confused on indentation by the lambda. This should fix it.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch, 
> HADOOP-13590.09.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-11-01 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.09.patch

Patch 9 to address the comments. I find the lambda {{wait()}} easier to use 
than {{GenericTestUtil#waitFor}}. :) Thank you Steve.

Created HADOOP-13785 for the clean up.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch, 
> HADOOP-13590.09.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-10-31 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.08.patch

Thanks [~andrew.wang] for looking at this! Patch 8 should address all the 
comments:

bq. unit test
Now TestUGI tests the just the retry logic, and the new {{TestUGIWithMiniKdc}} 
tests the 'retries at all'.

bq. Exponential back-off
Crossing your comment and [~ste...@apache.org]'s comment about using 
{{RetryPolicy}}, I changed the code for retry-time calculation. Now it first 
calculates how many max retries could possibly be needed, then creates a 
{{ExponentialBackoffRetry}} object and delegates the calculation to it. This 
way we achieve the random interval + code reuse. The UGI code is (I think) 
harder to read though.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.08.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-10-06 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: (was: HADOOP-13590.07.patch)

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-27 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.07.patch

Fixing checkstyle.
[~ste...@apache.org], please feel free to share your thoughts. Thank you.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.07.patch

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: (was: HADOOP-13590.07.patch)

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-23 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.07.patch

Thanks for the feedback, [~ste...@apache.org].

bq. is there any reason not to use a RetryPolicy here
Good question! The reason is the following:
First of all, we definitely want exponential backoff, to prevent us causing 
ddos on kdc.

In {{RetryPolicies}}, there is no {{RetryUpToMaxmumTimeWithProportinalSleep}}, 
and IMO the reason lacking one there, is it's not feasible/maintainable to 
calculate a {{maxRetries}} inline when invoking the base class ctor. It's 
eventually calculating a taylor series IIUC.

In our case, we could calculate the {{maxRetries}} beforehand, then initialize 
a {{retryUpToMaximumCountWithProportionalSleep}} accordingly. That ends up in 
similar code to {{getNextTgtRenewalTime}} in the caller. Moreover, personally I 
feel the last retry before expiry could be helpful, otherwise the backoff will 
likely miss the end time.

bq. Test can probably import org.apache.hadoop.conf.Configuration rather than 
declare variables that way.
Not really, there's a conflict with {{javax.security.auth.login.Configration}}. 
On a second thought I switched the two to make hadoop's {{Configuration}} the 
default.


Other comments are addressed in patch 7.
Regarding the test, having a real test is brittle and a bit time consuming (due 
to {{TICKET_RENEW_WINDOW}}), but having a fake test as [~drankye] pointed out 
is fake. I don't have a strong option, but if it ends up spamming 
pre-commit, we may switch to the mock test after all.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch, HADOOP-13590.07.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-22 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.06.patch

Thanks for the review and feedback Kai!

Patch 6 attached, addressed your comments, with the exception of:
bq. 1. show now and renewalFailures values in the warning log?
Updated the warning log and removed the debug log in the inner call.
bq. 3. Could we return earlier at the beginning so we can avoid at least 2 
level of indents and make the whole block more readable
Agreed. Feels we should do this separately, created HADOOP-13641.
bq. 4.Just a question. Any other exception than IOException could be thrown 
there?
Checking line by line, I think only IOE and {{InterruptedException}} can be 
thrown.


> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch, 
> HADOOP-13590.06.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-21 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.05.patch

Thanks [~drankye] for reviewing!

Patch 5 to address the comments. Renamed the param and updated the renewal test 
to be based on miniKDC. {{TestUGI}} all passes very fast, so I added a new test 
class for this test. Good news is we can remove the {{shouldSkipSleepForTests}} 
param, making the test more real. :)

Also had minor javadoc/logging improvements.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch, HADOOP-13590.05.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-20 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.04.patch

Thanks for the review, Steve. Attached patch 4 to pass checkstyle.
Any suggestion on whose vote we should ask for?

I'm pinging [~atm] [~rkanter] [~cnauroth] [~lmccay]: dear Kerberos people, 
could you please review? Thanks in advance!

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch, HADOOP-13590.04.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-16 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.03.patch

Patch 3 added exponential backoff to make this nicer. Also added username to 
the log, to be consistent with current code.
Could any of the watchers please take a look? Thanks a lot.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch, 
> HADOOP-13590.03.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.02.patch

oops, rebased.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: (was: HADOOP-13590.02.patch)

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-15 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.02.patch

Improved the test to run faster. And fixed precommit warnings.
The javac for getter method should be ignored IMO, to keep consistency with 
current code. Reviews appreciated!

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch, HADOOP-13590.02.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-13 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Attachment: HADOOP-13590.01.patch

Attaching a patch for the idea.

Current behavior is to just print a exception message on the first failure. 
Feels to me we should retry here, since the UGI had a successful login. This 
would help in scenarios where the renew failure happen to be intermittent.

Patch 1 just to have the retry to be simply every 
{{kerberosMinSecondsBeforeRelogin}} interval, until it succeeds or the tgt 
expires. We could add a more sophisticated retry logic (e.g. max # of retries 
with exponential backoff) if people like this direction.

Also added the exception stack trace to the log message, instead of just the 
exception msg alone.

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-13 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Status: Patch Available  (was: Open)

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.6.4, 2.7.3, 2.8.0
>Reporter: Xiao Chen
>Assignee: Xiao Chen
> Attachments: HADOOP-13590.01.patch
>
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-11 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Component/s: security

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13590) Retry until TGT expires even if the UGI renewal thread encountered exception

2016-09-11 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen updated HADOOP-13590:
---
Affects Version/s: (was: 0.22.0)
   2.8.0
   2.7.3
   2.6.4

> Retry until TGT expires even if the UGI renewal thread encountered exception
> 
>
> Key: HADOOP-13590
> URL: https://issues.apache.org/jira/browse/HADOOP-13590
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.8.0, 2.7.3, 2.6.4
>Reporter: Xiao Chen
>Assignee: Xiao Chen
>
> The UGI has a background thread to renew the tgt. On exception, it 
> [terminates 
> itself|https://github.com/apache/hadoop/blob/bee9f57f5ca9f037ade932c6fd01b0dad47a1296/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1013-L1014]
> If something temporarily goes wrong that results in an IOE, even if it 
> recovered no renewal will be done and client will eventually fail to 
> authenticate. We should retry with our best effort, until tgt expires, in the 
> hope that the error recovers before that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org