[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica

2018-03-12 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica
..

KUDU-2320 apply exponential back-off while deleting replica

In some scenarios, the replica to remove might be on a tablet
server which hasn't yet registered with the master.  For example,
that happens when the tablet server hosting the replica went down
and stays down while the master is restarted.  Such a scenario
is exercised by RaftConsensusNonVoterITest::RestartClusterWithNonVoter.

I ran the RaftConsensusNonVoterITest::RestartClusterWithNonVoter
scenario before and after the fix.  Before the fix there was a steady
high rate of messages, and after the fix the rate of messages started
following the exponential back-off pattern.

An example of the output before the fix:
  I0309 00:07:34.972404  2029 catalog_manager.cc:2697] Scheduling retry of 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a with a delay of 13 ms (attempt = 0)
  W0309 00:07:34.972436  2029 catalog_manager.cc:2716] Async tablet task 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
  I0309 00:07:34.985633  2029 catalog_manager.cc:2697] Scheduling retry of 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a with a delay of 28 ms (attempt = 0)
  W0309 00:07:34.985673  2029 catalog_manager.cc:2716] Async tablet task 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
  I0309 00:07:35.014024  2029 catalog_manager.cc:2697] Scheduling retry of 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a with a delay of 26 ms (attempt = 0)
  W0309 00:07:35.014062  2029 catalog_manager.cc:2716] Async tablet task 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
  I0309 00:07:35.040323  2029 catalog_manager.cc:2697] Scheduling retry of 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a with a delay of 19 ms (attempt = 0)
  W0309 00:07:35.040377  2029 catalog_manager.cc:2716] Async tablet task 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a
  I0309 00:07:35.059588  2029 catalog_manager.cc:2697] Scheduling retry of 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a with a delay of 50 ms (attempt = 0)
  W0309 00:07:35.059628  2029 catalog_manager.cc:2716] Async tablet task 
832f394938da40ca954da7a842e2279b Delete Tablet RPC for 
TS=76ea4539475745e8983bab0e501d803a failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 76ea4539475745e8983bab0e501d803a

An example of the output after the fix:
  I0308 22:36:59.251387  5428 catalog_manager.cc:2700] Scheduling retry of 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb with a delay of 37 ms (attempt = 2)
  W0308 22:36:59.251437  5428 catalog_manager.cc:2719] Async tablet task 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 46d5f24c1096492d83b909cd0116edbb
  I0308 22:36:59.288799  5428 catalog_manager.cc:2700] Scheduling retry of 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb with a delay of 84 ms (attempt = 3)
  W0308 22:36:59.288851  5428 catalog_manager.cc:2719] Async tablet task 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 46d5f24c1096492d83b909cd0116edbb
  I0308 22:36:59.373152  5428 catalog_manager.cc:2700] Scheduling retry of 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb with a delay of 146 ms (attempt = 4)
  W0308 22:36:59.373209  5428 catalog_manager.cc:2719] Async tablet task 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb failed: Not found: failed to reset TS 
proxy: Could not find TS for UUID 46d5f24c1096492d83b909cd0116edbb
  I0308 22:36:59.519738  5428 catalog_manager.cc:2700] Scheduling retry of 
f259598750084d1db309c1659ee818f9 Delete Tablet RPC for 
TS=46d5f24c1096492d83b909cd0116edbb with a delay of 267 ms (attempt = 5)
  W0308 

[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica

2018-03-12 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica
..


Patch Set 2: Verified+1

Seems to be a flake in CreateTableStressTest.CreateAndDeleteBigTable.  I'll 
take a look at that separately -- most likely it's unrelated to this change.


--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Mar 2018 18:19:42 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica

2018-03-12 Thread Alexey Serbin (Code Review)
Alexey Serbin has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica

2018-03-12 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Mar 2018 17:59:53 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica

2018-03-12 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9561 )

Change subject: KUDU-2320 apply exponential back-off while deleting replica
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9561/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9561/1//COMMIT_MSG@7
PS1, Line 7: KUDU-2320 apply exponential back-off while deleting replica
> nit: please don't abbreviate "exponential" - makes it harder to search the
Done.



--
To view, visit http://gerrit.cloudera.org:8080/9561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia12d261d7270aae7fafe877780b547d262aef16d
Gerrit-Change-Number: 9561
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 12 Mar 2018 17:49:01 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2320 apply exponential back-off while deleting replica

2018-03-12 Thread Alexey Serbin (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9561

to look at the new patch set (#2).

Change subject: KUDU-2320 apply exponential back-off while deleting replica
..

KUDU-2320 apply exponential back-off while deleting replica
