Hello Will Berkeley, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11364

to look at the new patch set (#5).

Change subject: [tests] make master-stress-test more stable
......................................................................

[tests] make master-stress-test more stable

The master-stress-test has been flaky for some time. After looking at
those failures closely, I found at least five different issues. This
patch addresses the most prominent one: failures of the test scenario
because of timeout errors in TSAN builds. About 9 out of 10 failures
were due to the issue fixed by this patch.

The timeout errors were triggered by RPC queue overflow and by the
timing of master restarts with respect to the retry/back-off pattern
used by KuduClient and other test utility code.

The remaining issues behind the flakiness will be addressed separately.

This patch also introduces rpc_negotiation_timeout as a member of
ExternalMiniClusterOptions: it customizes the connection negotiation
timeout for the cluster's utility messenger.

Some statistics about the flakiness:

before the fix:
  37 out of 256 failed in TSAN build, where almost all failures were
  due to the issue fixed by this patch:
    http://dist-test.cloudera.org//job?job_id=aserbin.1535666928.86597

after the fix:
  2 out of 256 failed in TSAN build, where the failures were due to [2]
  (not addressed by this change list; it will be addressed separately):
    http://dist-test.cloudera.org/job?job_id=aserbin.1535665784.64065

A few other issues due to which the test is still a bit flaky:
  [1] https://issues.apache.org/jira/browse/KUDU-2561
  [2] https://issues.apache.org/jira/browse/KUDU-2564
  [3] https://issues.apache.org/jira/browse/HIVE-19874

As I understand it, Dan found [3] to be the cause of one type of
HMS-related failure, and there are two more to evaluate.
Change-Id: I6b30d8afd4a24acdbd96481cadeaf8f6a9475adf
---
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
3 files changed, 93 insertions(+), 35 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/11364/5
--
To view, visit http://gerrit.cloudera.org:8080/11364
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b30d8afd4a24acdbd96481cadeaf8f6a9475adf
Gerrit-Change-Number: 11364
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>