Hello Will Berkeley, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11364

to look at the new patch set (#5).

Change subject: [tests] make master-stress-test more stable
......................................................................

[tests] make master-stress-test more stable

The master-stress-test has been flaky for some time.  After looking
at the failures closely, I found at least five different issues.

This patch addresses the most prominent one: the test scenario failing
with timeout errors in TSAN builds.  About 9 out of 10 failures were
due to the issue fixed by this patch.  The timeout errors were
triggered by RPC queue overflow and by the timing of master restarts
with respect to the retry/back-off pattern used by KuduClient and
other test utility code.
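
To illustrate how the timing of a restart can interact with a
retry/back-off loop, here is a minimal, self-contained sketch.  It is
a hypothetical model, not KuduClient's actual retry logic: the function
name, the struct, and all the numbers are assumptions made up for the
example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model for illustration only; this is not how KuduClient
// is implemented, and every name here is an assumption.
struct RetryResult {
  bool succeeded;          // true if some attempt reached a ready server
  int attempts;            // number of attempts made before giving up
  int64_t finished_at_ms;  // simulated time of the successful attempt, or
                           // of the next attempt that fell past the deadline
};

// Simulates a caller that retries with capped exponential back-off until
// 'deadline_ms'.  Every attempt made before 'server_ready_at_ms' fails
// (for example, the master is restarting or its RPC queue is full).
RetryResult SimulateRetries(int64_t server_ready_at_ms,
                            int64_t deadline_ms,
                            int64_t initial_backoff_ms,
                            int64_t max_backoff_ms) {
  assert(initial_backoff_ms > 0);
  assert(max_backoff_ms >= initial_backoff_ms);
  int64_t now_ms = 0;
  int64_t backoff_ms = initial_backoff_ms;
  int attempts = 0;
  while (now_ms <= deadline_ms) {
    ++attempts;
    if (now_ms >= server_ready_at_ms) {
      return {true, attempts, now_ms};
    }
    // The attempt failed: sleep, then double the back-off up to the cap.
    now_ms += backoff_ms;
    backoff_ms = std::min(backoff_ms * 2, max_backoff_ms);
  }
  return {false, attempts, now_ms};
}
```

In this model, with the server becoming ready at 9000 ms, a deadline of
10000 ms and an initial back-off of 1000 ms, a cap of 8000 ms times out
after 4 attempts (the last one at 7000 ms, the next one would land at
15000 ms), whereas a cap of 1000 ms succeeds on the 10th attempt at
9000 ms.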

The rest of the issues behind the flakiness will be addressed
separately.

This patch also introduces rpc_negotiation_timeout as a member of
ExternalMiniClusterOptions to customize the connection negotiation
timeout for the cluster's utility messenger.
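
A rough sketch of the idea, using hypothetical stand-in types rather
than the real Kudu classes: the option lives in the cluster options
struct and is passed through when the utility messenger is built.
SketchClusterOptions, SketchMessenger, BuildUtilityMessenger, and the
placeholder default value are all assumptions for illustration; only
the member name rpc_negotiation_timeout comes from this patch.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the cluster options; not the real
// ExternalMiniClusterOptions definition.
struct SketchClusterOptions {
  int num_masters = 1;
  // Negotiation timeout for the utility messenger.  The default below
  // is a placeholder chosen for this sketch.
  std::chrono::milliseconds rpc_negotiation_timeout{3000};
};

// Hypothetical stand-in for the cluster's utility messenger.
struct SketchMessenger {
  std::chrono::milliseconds negotiation_timeout;
};

// Builds the utility messenger, propagating the timeout from the options.
SketchMessenger BuildUtilityMessenger(const SketchClusterOptions& opts) {
  assert(opts.rpc_negotiation_timeout.count() > 0);
  return SketchMessenger{opts.rpc_negotiation_timeout};
}
```

A test that expects slow connection negotiation (as in TSAN builds)
could then raise the value in its options before starting the cluster.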

Some statistics about the flakiness:

before the fix:
  37 out of 256 runs failed in the TSAN build, where almost all failures
  are due to the issue fixed by this patch:
    http://dist-test.cloudera.org//job?job_id=aserbin.1535666928.86597

after the fix:
  2 out of 256 runs failed in the TSAN build, where the failures were
  due to [2] (not addressed by this changelist; it will be addressed
  separately):
    http://dist-test.cloudera.org/job?job_id=aserbin.1535665784.64065
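
For reference, the counts above correspond to a failure rate of about
14.5% before the fix and about 0.8% after it.  A small helper to
reproduce the arithmetic (FailureRatePercent is a name made up for this
sketch):

```cpp
#include <cassert>

// Returns the share of failed runs as a percentage of all runs.
double FailureRatePercent(int failed, int total) {
  assert(total > 0 && failed >= 0 && failed <= total);
  return 100.0 * failed / total;
}
```

Calling it with (37, 256) and (2, 256) gives the two rates quoted above.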

A few other issues that still make the test a bit flaky:
  [1] https://issues.apache.org/jira/browse/KUDU-2561
  [2] https://issues.apache.org/jira/browse/KUDU-2564
  [3] https://issues.apache.org/jira/browse/HIVE-19874

As I understand it, Dan found [3] to be the reason behind one type
of HMS-related failure, and there are two more to evaluate.

Change-Id: I6b30d8afd4a24acdbd96481cadeaf8f6a9475adf
---
M src/kudu/integration-tests/master-stress-test.cc
M src/kudu/mini-cluster/external_mini_cluster.cc
M src/kudu/mini-cluster/external_mini_cluster.h
3 files changed, 93 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/11364/5
--
To view, visit http://gerrit.cloudera.org:8080/11364
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b30d8afd4a24acdbd96481cadeaf8f6a9475adf
Gerrit-Change-Number: 11364
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com>
