Mike Percy has submitted this change and it was merged.

Change subject: Fix flaky test TestRestartWithOrphanedReplicates
......................................................................

Fix flaky test TestRestartWithOrphanedReplicates

Looks like the test was setting a fault injection flag before the setup
which was causing the setup to fail. Moving the flag setting to after
the setup, but before Start seems to have done the trick. We were
getting about 1 failure per 1000 runs before this change, 0 per 1000
after.

Dist test job after patch:
http://dist-test.cloudera.org/job?job_id=efan.1495643022.19373

Dist test job before patch with failures:
http://dist-test.cloudera.org/job?job_id=efan.1495149936.30555

Example from a failure log:

I0518 23:26:00.447599 2346 master_service.cc:195] Signed X509 certificate for tserver {username='slave'} at 127.9.10.0:34667
W0518 23:26:00.448045 2529 fault_injection.cc:38] FAULT INJECTION ENABLED!
W0518 23:26:00.448065 2529 fault_injection.cc:39] THIS SERVER MAY CRASH!
E0518 23:26:00.448072 2529 fault_injection.cc:54] Injecting fault: FLAGS_fault_crash_before_append_commit (process will exit)
I0518 23:26:00.448150 2510 heartbeater.cc:380] Master 127.0.0.1:38357 was elected leader, sending a full tablet report...
W0518 23:26:00.450608 2331 connection.cc:462] server connection from 127.9.10.0:34667 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
W0518 23:26:00.450630 2319 connection.cc:462] client connection to 127.9.10.0:51332 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
W0518 23:26:00.450822 2331 connection.cc:462] client connection to 127.9.10.0:51332 recv error: Network error: failed to read from TLS socket: Connection reset by peer (error 104)
F0518 23:26:30.315480 2314 test_workload.cc:266] Timed out: Timed out waiting for Table Creation
*** Check failure stack trace: ***
    @ 0x7fb089e3b2fd google::LogMessage::Fail() at ??:0
    @ 0x7fb089e3d1bd google::LogMessage::SendToLog() at ??:0
    @ 0x7fb089e3ae39 google::LogMessage::Flush() at ??:0
    @ 0x7fb089e3dc5f google::LogMessageFatal::~LogMessageFatal() at ??:0
    @ 0x7fb0943b7492 kudu::TestWorkload::Setup() at ??:0
    @ 0x40e68e kudu::TsRecoveryITest_TestRestartWithOrphanedReplicates_Test::TestBody() at /home/efan/src/kudu/build/release/../../src/kudu/integration-tests/ts_recovery-itest.cc:107
    @ 0x7fb08ad35af8 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
    @ 0x7fb08ad29fc2 testing::Test::Run() at ??:0
    @ 0x7fb08ad2a108 testing::TestInfo::Run() at ??:0
    @ 0x7fb08ad2a1e5 testing::TestCase::Run() at ??:0
    @ 0x7fb08ad2a4c8 testing::internal::UnitTestImpl::RunAllTests() at ??:0
    @ 0x7fb08ad36008 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0
    @ 0x7fb08ad2a7ad testing::UnitTest::Run() at ??:0
    @ 0x7fb09416566c main at ??:0
    @ 0x7fb088083f45 __libc_start_main at ??:0
    @ 0x40dfb9 (unknown) at ??:?

Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Reviewed-on: http://gerrit.cloudera.org:8080/6976
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <[email protected]>
Reviewed-by: Mike Percy <[email protected]>
---
M src/kudu/integration-tests/ts_recovery-itest.cc
1 file changed, 6 insertions(+), 2 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/6976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ied9a55abd20841d350589ce56aa935ea1feece79
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Edward Fancher <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
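
The ordering problem described in the commit message can be sketched with a toy model. This is not the Kudu test code and does not use Kudu's API: FakeServer, SetupWorkload, RunFlagBeforeSetup, and RunFlagAfterSetup are hypothetical names invented for illustration. The sketch also makes the fault deterministic, whereas the real failure showed up in about 1 of 1000 runs.

```cpp
#include <cassert>

// Hypothetical stand-in for a server process under test (not Kudu's API).
class FakeServer {
 public:
  // Models a crash-style fault injection flag, in the spirit of
  // fault_crash_before_append_commit from the failure log.
  void SetCrashBeforeAppendCommit(bool enabled) {
    crash_before_append_commit_ = enabled;
  }

  // Models any operation that has to append a commit record.
  // Returns false if the injected fault "crashed" the server.
  bool AppendCommit() {
    if (crash_before_append_commit_) {
      alive_ = false;
    }
    return alive_;
  }

  bool alive() const { return alive_; }

 private:
  bool crash_before_append_commit_ = false;
  bool alive_ = true;
};

// Models workload setup (for example table creation), which is assumed
// here to need a successful commit on the server.
bool SetupWorkload(FakeServer* server) {
  return server->AppendCommit();
}

// Problematic ordering: the fault flag is armed before setup, so setup
// itself can trip the fault. Returns whether setup succeeded.
bool RunFlagBeforeSetup() {
  FakeServer server;
  server.SetCrashBeforeAppendCommit(true);
  return SetupWorkload(&server);
}

// Fixed ordering: setup runs on a healthy server, then the flag is armed,
// then the workload starts. Returns true if setup succeeded and the fault
// fired only during the workload phase.
bool RunFlagAfterSetup() {
  FakeServer server;
  if (!SetupWorkload(&server)) {
    return false;
  }
  assert(server.alive());
  server.SetCrashBeforeAppendCommit(true);
  const bool write_ok = server.AppendCommit();  // first workload write
  return !write_ok && !server.alive();
}
```

In this model RunFlagBeforeSetup() reports a failed setup, which corresponds to the "Timed out waiting for Table Creation" fatal in the log, while RunFlagAfterSetup() lets setup finish and triggers the injected crash only once the workload is writing.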
