[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk-6.patch Fixed 2 test failures introduced by patch 5. {{TestLoadBalancingKMSClientProvider}} and {{TestCommonConfigurationFields}} Remove {{hadoop.security.kms.client.failover.max.retries}} from core-default.xml because the default value is supposed to be the num of providers to preserve original behavior. Adding the value in core-default.xml will change that behavior since it will always override that value. > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, > HDFS-11804-trunk-6.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, > HDFS-11804-trunk-6.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Open (was: Patch Available) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, > HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk-5.patch > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, > HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk-5.patch, > HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Open (was: Patch Available) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk-4.patch Uploading a new patch addressing all the comments and correcting some assert messages. [~daryn] please review. > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk-4.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Open (was: Patch Available) Cancelling the patch to address [~daryn]'s comments. > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk-3.patch Added new patch to address checkstyle warnings. The test failures from the previous build are unrelated : ||Failed test||Comment|| |hadoop.fs.viewfs.TestViewFileSystemWithAuthorityLocalFileSystem |Passing locally| |hadoop.hdfs.TestAclsEndToEnd |Created HDFS-11944| |hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure|Passing locally| |hadoop.hdfs.server.namenode.TestDecommissioningStatus|Passing locally| |hadoop.hdfs.TestDFSStripedOutputStreamWithFailure150|Passing locally| |hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy|Passing locally| |hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy|Passing locally| [~xiaochen] [~daryn] please review the latest patch. > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk-3.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Open (was: Patch Available) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk-2.patch Attaching a new patching addressing all the comments by [~daryn] and [~xiaochen] bq. Should we also treat AuthenticationException as non-retry? Since AuthenticationException is not thrown by {{LoadBalancingKMSClientProvider#doOp}}, we can't catch it explicitly. But it will be thrown as {{WrapperException}} and we don't retry. bq. I think we should retry immediately for all 3 at first. (Same as current LBKMSCP behavior). Addressed in the latest patch. bq. Should just specify a max retries that defaults to the number of load balanced hosts (today's behavior) is probably better. Addressed in the latest patch. bq. The max sleep time seems rather high since it can block server threads for a long time. I'd suggest perhaps 1-2s. Made it 2 seconds. bq. Should probably only sleep after trying all clients instead of between each client. Addressed in the latest patch. bq. Separate try blocks for shouldRetry and sleep to better control exception rethrows. Addressed in the latest patch. bq. Change the sleep catch to rethrow InteruptedIOException. Addressed in the latest patch. bq. Trivial but there's a lot of double punctuation, ex. periods, exclamations. Addressed in the latest patch. One more major change change done in the latest patch. Treat all the operations as non idempotent. Since we catch all the Socket related exceptions in {{FailoverOnNetworkExceptionRetry#shouldRetry}} in the first catch block and Failover, it should be safe to assume that all the exception that passed the first catch block shouldn't be retried. > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk-2.patch, > HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Open (was: Patch Available) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk-1.patch Just quick fixing the compile errors in this patch. Will fix checkstyle and find bugs warning in next patch > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk-1.patch, HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Open (was: Patch Available) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Status: Patch Available (was: Open) > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Attachment: HDFS-11804-trunk.patch This is by no means a committable patch. It needs some rework. Changes done in this patch: 1. Only LoadBalancingKMSClientProvider will be instantiated when createProvider is being called. 2. Added three new config keys similar to dfs client. * {{kms.client.failover.attempts.multiplier}}: This is the multiplier factor. For example: if {{kms.client.failover.attempts.multiplier}} is 2 and there are 2 backend kms servers, then client will retry for maximum 4 times. * {{kms.client.failover.sleep.base.millis}}: To calculate failover sleep time. * {{kms.client.failover.sleep.max.millis}}: To calculate maximum amount of time to sleep. 3. The client is configured to use {{FailoverOnNetworkExceptionRetry}} with fallback policy as {{TryOnceThenFail}}. On encountering AccessControlException or AuthorizationException, the client will stop retrying. This assumes all the servers are configured with identical permissions and key acls. 4. Since the {{RetryPolicy#shouldRetry}} function expects whether the option is idempotent or not, We need to decide what all operations are idempotent. Below is my understanding. ** IdempotentOperations: renewDelegationToken, cancelDelegationToken, decryptEncryptedKey, getKeyVersion, getKeys, getKeysMetadata, getKeyVersions, getCurrentKey, getMetadata ** AtMostOnce: createKey, deleteKey, rollNewVersion ** Not sure: addDelegationTokens, generateEncryptedKey, reencryptEncryptedKey Would like to know your views. There are few TODOs in the patch. Would like to know your opinions on that also. Thanks in advance for reviewing my patch. > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > Attachments: HDFS-11804-trunk.patch > > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-11804) KMS client needs retry logic
[ https://issues.apache.org/jira/browse/HDFS-11804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushabh S Shah updated HDFS-11804: -- Description: The kms client appears to have no retry logic – at all. It's completely decoupled from the ipc retry logic. This has major impacts if the KMS is unreachable for any reason, including but not limited to network connection issues, timeouts, the +restart during an upgrade+. This has some major ramifications: # Jobs may fail to submit, although oozie resubmit logic should mask it # Non-oozie launchers may experience higher rates if they do not already have retry logic. # Tasks reading EZ files will fail, probably be masked by framework reattempts # EZ file creation fails after creating a 0-length file – client receives EDEK in the create response, then fails when decrypting the EDEK # Bulk hadoop fs copies, and maybe distcp, will prematurely fail was: The kms client appears to have no retry logic – at all. It's completely decoupled from the ipc retry logic. This has major impacts if the KMS is unreachable for any reason, including but limited to network connection issues, timeouts, the +restart during an upgrade+. This has some major ramifications: # Jobs may fail to submit, although oozie resubmit logic should mask it # Non-oozie launchers may experience higher rates if they do not already have retry logic. # Tasks reading EZ files will fail, probably be masked by framework reattempts # EZ file creation fails after creating a 0-length file – client receives EDEK in the create response, then fails when decrypting the EDEK # Bulk hadoop fs copies, and maybe distcp, will prematurely fail > KMS client needs retry logic > > > Key: HDFS-11804 > URL: https://issues.apache.org/jira/browse/HDFS-11804 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 2.6.0 >Reporter: Rushabh S Shah >Assignee: Rushabh S Shah > > The kms client appears to have no retry logic – at all. It's completely > decoupled from the ipc retry logic. This has major impacts if the KMS is > unreachable for any reason, including but not limited to network connection > issues, timeouts, the +restart during an upgrade+. > This has some major ramifications: > # Jobs may fail to submit, although oozie resubmit logic should mask it > # Non-oozie launchers may experience higher rates if they do not already have > retry logic. > # Tasks reading EZ files will fail, probably be masked by framework reattempts > # EZ file creation fails after creating a 0-length file – client receives > EDEK in the create response, then fails when decrypting the EDEK > # Bulk hadoop fs copies, and maybe distcp, will prematurely fail -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org