[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica

KUDU-2320 apply exponential back-off while deleting replica

In some scenarios, the replica to remove might be on a tablet server which hasn't yet registered with the master. For example, that happens when the tablet server that hosted the replica went down and is still down when the master is restarted. Such a scenario is exercised by RaftConsensusNonVoterITest::RestartClusterWithNonVoter.

I ran the RaftConsensusNonVoterITest::RestartClusterWithNonVoter scenario before and after the fix. Before the fix there was a steady high rate of messages, and after the fix the rate of messages started following the exponential back-off pattern.

An example of the output before the fix:

I0309 00:07:34.972404 2029 catalog_manager.cc:2697] Scheduling retry of 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a with a delay of 13 ms (attempt = 0)
W0309 00:07:34.972436 2029 catalog_manager.cc:2716] Async tablet task 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
I0309 00:07:34.985633 2029 catalog_manager.cc:2697] Scheduling retry of 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a with a delay of 28 ms (attempt = 0)
W0309 00:07:34.985673 2029 catalog_manager.cc:2716] Async tablet task 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
I0309 00:07:35.014024 2029 catalog_manager.cc:2697] Scheduling retry of 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a with a delay of 26 ms (attempt = 0)
W0309 00:07:35.014062 2029 catalog_manager.cc:2716] Async tablet task 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
I0309 00:07:35.040323 2029 catalog_manager.cc:2697] Scheduling retry of 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a with a delay of 19 ms (attempt = 0)
W0309 00:07:35.040377 2029 catalog_manager.cc:2716] Async tablet task 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
I0309 00:07:35.059588 2029 catalog_manager.cc:2697] Scheduling retry of 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a with a delay of 50 ms (attempt = 0)
W0309 00:07:35.059628 2029 catalog_manager.cc:2716] Async tablet task 832f394938da40ca954da7a842e2279b Delete Tablet RPC for TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a

An example of the output after the fix:

I0308 22:36:59.251387 5428 catalog_manager.cc:2700] Scheduling retry of f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb with a delay of 37 ms (attempt = 2)
W0308 22:36:59.251437 5428 catalog_manager.cc:2719] Async tablet task f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb failed: Not found: failed to reset TS proxy: Could not find TS for UUID 46d5f24c1096492d83b909cd0116edbb
I0308 22:36:59.288799 5428 catalog_manager.cc:2700] Scheduling retry of f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb with a delay of 84 ms (attempt = 3)
W0308 22:36:59.288851 5428 catalog_manager.cc:2719] Async tablet task f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb failed: Not found: failed to reset TS proxy: Could not find TS for UUID 46d5f24c1096492d83b909cd0116edbb
I0308 22:36:59.373152 5428 catalog_manager.cc:2700] Scheduling retry of f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb with a delay of 146 ms (attempt = 4)
W0308 22:36:59.373209 5428 catalog_manager.cc:2719] Async tablet task f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb failed: Not found: failed to reset TS proxy: Could not find TS for UUID 46d5f24c1096492d83b909cd0116edbb
I0308 22:36:59.519738 5428 catalog_manager.cc:2700] Scheduling retry of f259598750084d1db309c1659ee818f9 Delete Tablet RPC for TS=46d5f24c1096492d83b909cd0116edbb with a delay of 267 ms (attempt = 5)
W0308
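
For readers unfamiliar with the pattern, the following is a minimal sketch of exponential back-off with jitter, not the code from this patch. The constants are assumptions inferred only from the "after the fix" log excerpt, where each delay falls between 2^(attempt + 3) ms and that value plus 50 ms (37, 84, 146 and 267 ms for attempts 2 through 5). The names backoff_delay_ms and RetryingTask, and the 60-second cap, are hypothetical.

```python
import random

# Assumptions inferred from the log excerpt, not taken from the patch itself.
BASE_SHIFT = 3          # base delay is 2 ** (attempt + BASE_SHIFT) milliseconds
MAX_JITTER_MS = 50      # jitter is drawn uniformly from [0, MAX_JITTER_MS)
MAX_BASE_MS = 60_000    # hypothetical cap so the base delay cannot grow forever


def backoff_delay_ms(attempt, rng=random):
    """Return a retry delay in ms that doubles with each attempt, plus jitter."""
    base = min(2 ** (attempt + BASE_SHIFT), MAX_BASE_MS)
    return base + rng.randrange(MAX_JITTER_MS)


class RetryingTask:
    """Hypothetical task that tracks how many times it has been retried."""

    def __init__(self):
        self.attempt = 0

    def next_delay_ms(self, rng=random):
        # The attempt counter has to advance on every retry. If it stays at 0,
        # every delay falls in the same small range and the retry rate stays
        # steady, which is the shape of the "before the fix" excerpt.
        delay = backoff_delay_ms(self.attempt, rng)
        self.attempt += 1
        return delay


if __name__ == "__main__":
    task = RetryingTask()
    for _ in range(6):
        attempt = task.attempt
        print(f"attempt = {attempt}, delay = {task.next_delay_ms()} ms")
```

The jitter keeps many tasks that failed at the same moment from retrying in lockstep, while the doubling base bounds the log and RPC volume when the target tablet server stays unregistered for a long time.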
[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica

Patch Set 2: Verified+1

Seems to be a flake in CreateTableStressTest.CreateAndDeleteBigTable. I'll take a look at that separately; most likely it's unrelated to this change.

--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Mon, 12 Mar 2018 18:19:42 +
Gerrit-HasComments: No
[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica
Alexey Serbin has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica

Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)

--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica

Patch Set 2: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Mon, 12 Mar 2018 17:59:53 +
Gerrit-HasComments: No
[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica

Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9561/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9561/1//COMMIT_MSG@7
PS1, Line 7: KUDU-2320 apply exponential back-off while deleting replica
> nit: please don't abbreviate "exponential" - makes it harder to search the
Done.

--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-Comment-Date: Mon, 12 Mar 2018 17:49:01 +
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9561

to look at the new patch set (#2).

Change subject: KUDU-2320 apply exponential back-off while deleting replica