Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7887

to look at the new patch set (#2).

Change subject: WIP [raft_consensus-itest] fix flake in TestSlowLeader
......................................................................

WIP [raft_consensus-itest] fix flake in TestSlowLeader

Under rare conditions, a reader thread of the test workload in the
RaftConsensusITest.TestSlowLeader test might read data from a lagging
follower replica just after the writer threads have switched to a newly
elected leader replica.  When that happened, the test failed with
a stack trace like the following:

F0829 01:47:51.052803  1605 test_workload.cc:230] Check failed: \
  row_count >= expected_row_count (219550 vs. 249850)
*** Check failure stack trace: ***
    @           0x95e83d  google::LogMessage::Fail() \
                            at thirdparty/src/glog-0.3.5/src/logging.cc:1488
    @           0x9606fd  google::LogMessage::SendToLog() \
                            at thirdparty/src/glog-0.3.5/src/logging.cc:1442
    @           0x95e379  google::LogMessage::Flush() \
                            at thirdparty/src/glog-0.3.5/src/logging.cc:1312
    @           0x96119f  google::LogMessageFatal::~LogMessageFatal() \
                            at thirdparty/src/glog-0.3.5/src/logging.cc:2024
    @           0x955809  kudu::TestWorkload::ReadThread() \
                            at tr1/shared_ptr.h:340
    @     0x7fbdbc108a40  (unknown) at ??:0
    @     0x7fbdbd550184  start_thread at ??:0
    @     0x7fbdbbb7637d  clone at ??:0
    @              (nil)  (unknown)

This patch addresses the issue.  Basically, it works around the
read-your-writes (RYW) consistency issue, which is seen here because of
the absence of the leader leases mechanism.
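
For illustration only, here is a minimal sketch of the retry-on-stale-read
idea behind such a work-around.  It is not the code in this patch:
ReadWithRetry, count_rows, and max_attempts are hypothetical names, and the
delay is assumed to play the role of the value passed to
set_read_retry_delay() in the test.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical helper, not the actual TestWorkload implementation.
// Re-reads the row count until it reaches the expected value or the
// attempts run out. Returns true if some read observed at least
// `expected_row_count` rows.
bool ReadWithRetry(const std::function<int64_t()>& count_rows,
                   int64_t expected_row_count,
                   std::chrono::milliseconds retry_delay,
                   int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (count_rows() >= expected_row_count) {
      return true;
    }
    // The replica that served the read may be lagging; wait before
    // re-reading, unless this was the last attempt.
    if (attempt + 1 < max_attempts) {
      std::this_thread::sleep_for(retry_delay);
    }
  }
  return false;
}
```

With a delay on the order of kHbIntervalMs * kMaxMissedHbPeriods, a retried
read is likely to land after the lagging follower has caught up or after
the new leader has been discovered, instead of failing the check on the
first stale result.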

To test the modifications, I ran the test 2K times, repeating that
several times; none of those runs failed (RELEASE build):
  http://dist-test.cloudera.org//job?job_id=aserbin.1504051374.17423

To run the test without the work-around, I applied the patch below
and got 1 failure out of 2K runs:
  http://dist-test.cloudera.org//job?job_id=aserbin.1504053764.1127

WIP: I'm not sure whether we want this workaround now or whether we
had better wait for the leader leases to be implemented.

------------------------------------------------------------------------
--- a/src/kudu/integration-tests/raft_consensus-itest.cc
+++ b/src/kudu/integration-tests/raft_consensus-itest.cc
@@ -2571,7 +2571,7 @@ TEST_F(RaftConsensusITest, TestSlowLeader) {
   if (!AllowSlowTests()) return;

   static const int kHbIntervalMs = 32;
-  static const int kMaxMissedHbPeriods = 3;
+  static const int kMaxMissedHbPeriods = 1;
   const vector<string> tserver_flags = {
     Substitute("--raft_heartbeat_interval_ms=$0", kHbIntervalMs),
     Substitute("--leader_failure_max_missed_heartbeat_periods=$0",
@@ -2586,9 +2586,9 @@ TEST_F(RaftConsensusITest, TestSlowLeader) {
   TestWorkload workload(cluster_.get());
   workload.set_table_name(kTableId);
   workload.set_num_read_threads(2);
-  workload.set_read_retry_enabled(true);
-  workload.set_read_retry_delay(
-      MonoDelta::FromMilliseconds(kHbIntervalMs * kMaxMissedHbPeriods));
+  //workload.set_read_retry_enabled(true);
+  //workload.set_read_retry_delay(
+  //    MonoDelta::FromMilliseconds(kHbIntervalMs * kMaxMissedHbPeriods));
   workload.Setup();
   workload.Start();
   SleepFor(MonoDelta::FromSeconds(60));
------------------------------------------------------------------------

Change-Id: Ie5ee6c5400c947f87b1da2e76d24dd837b1270ca
---
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
3 files changed, 43 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/7887/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7887
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie5ee6c5400c947f87b1da2e76d24dd837b1270ca
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
