Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11107 )
Change subject: [tools] run rebalancer during 'election storm'
......................................................................


Patch Set 1:

(4 comments)

I'm guessing you already did, but if you haven't, can you run this a bunch of times with dist-test to check it's not flaky, and add some stats about that to the commit msg?

http://gerrit.cloudera.org:8080/#/c/11107/1/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/11107/1/src/kudu/tools/kudu-admin-test.cc@2086
PS1, Line 2086: #if defined(ADDRESS_SANITIZER) || defined(THREAD_SANITIZER)
              :   const auto timeout = MonoDelta::FromSeconds(5);
              : #else
              :   const auto timeout = MonoDelta::FromSeconds(10);
I think you swapped the timeouts here: the sanitizer builds should have the longer timeout.


http://gerrit.cloudera.org:8080/#/c/11107/1/src/kudu/tools/kudu-admin-test.cc@2093
PS1, Line 2093:     auto max_sleep_ms = 2.0;
Can you explain where this number comes from? Obviously it's super low compared to the roughly 1.25 seconds expected between a leader stepdown and the start of an election in an idle cluster, but the Raft heartbeat times are changed for the tests. A couple of sentences explaining the initial setting and the backoff would be good.


http://gerrit.cloudera.org:8080/#/c/11107/1/src/kudu/tools/kudu-admin-test.cc@2109
PS1, Line 2109:         for (const auto& tablet : tablets) {
With RF=3, running over each TS and replica will cause 3 elections to be called on each tablet. Is that what you want? Or do you want one election per tablet per cycle through all the tablets?


http://gerrit.cloudera.org:8080/#/c/11107/1/src/kudu/tools/kudu-admin-test.cc@2164
PS1, Line 2164: usually happens because GetConsensusState requests are dropped due to
              :   // backpressure
As you mentioned to me in person, we should fix ksck to be resilient to this up to some timeout.

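
To make the comment on line 2086 concrete, here is a rough sketch of the ordering I have in mind. It is not meant as the exact change: it uses std::chrono instead of MonoDelta so that it compiles on its own, and kSanitizerBuild, TimeoutFor, and kTimeout are hypothetical names that do not exist in the patch. The 5 and 10 second values are simply the two values from the patch with their branches swapped.

```cpp
#include <chrono>

// Assumption: ADDRESS_SANITIZER / THREAD_SANITIZER are defined by the build
// for sanitizer configurations, as the #if in the patch implies.
#if defined(ADDRESS_SANITIZER) || defined(THREAD_SANITIZER)
constexpr bool kSanitizerBuild = true;
#else
constexpr bool kSanitizerBuild = false;
#endif

// Hypothetical helper: sanitizer builds run slower, so they get the longer
// of the two timeouts; regular builds keep the shorter one.
constexpr std::chrono::seconds TimeoutFor(bool sanitizer_build) {
  return sanitizer_build ? std::chrono::seconds(10)
                         : std::chrono::seconds(5);
}

// Timeout selected for the current build configuration.
constexpr std::chrono::seconds kTimeout = TimeoutFor(kSanitizerBuild);

// Compile-time check of the intended ordering.
static_assert(TimeoutFor(true) > TimeoutFor(false),
              "sanitizer builds should get the longer timeout");
```

In the test itself this would just mean swapping the two MonoDelta::FromSeconds() arguments between the #if and #else branches.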
-- 
To view, visit http://gerrit.cloudera.org:8080/11107
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic98684dbe55049bbc411513faa0b6bbaef20f434
Gerrit-Change-Number: 11107
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 02 Aug 2018 18:15:39 +0000
Gerrit-HasComments: Yes
