Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6211

to look at the new patch set (#4).

Change subject: Make ExternalDaemon::StartProcess() handle fault injection
......................................................................

Make ExternalDaemon::StartProcess() handle fault injection

external_mini_cluster-test is failing frequently with:
F0302 18:10:29.412564 22826 ts_itest-base.h:487] Check failed: _s.ok() Bad status: Runtime error: /tmp/run_tha_testm13fcv/build/debug/bin/kudu-tserver: process exited on signal 13

The failure occurs because, under load, tests that inject faults
might fail before StartProcess() completes. These errors are
sometimes misreported as termination by a SIGPIPE signal
(number 13).
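
For readers unfamiliar with the "exited on signal 13" wording, here is a
minimal POSIX sketch of how a parent process sees a child that was killed
by SIGPIPE. It is not taken from the patch and makes no claim about where
the SIGPIPE comes from in the Kudu test; DescribeWaitStatus() and
RunChildThatHitsSigpipe() are hypothetical names used only for illustration.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: turns a waitpid() status into a message similar in
// shape to the one in the log above.
std::string DescribeWaitStatus(int status) {
  if (WIFEXITED(status)) {
    return "process exited with status " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    return "process exited on signal " + std::to_string(WTERMSIG(status));
  }
  return "process in unknown state";
}

// Hypothetical demo: forks a child that writes to a pipe with no reader.
// With the default SIGPIPE disposition the kernel terminates the child.
std::string RunChildThatHitsSigpipe() {
  int fds[2];
  if (pipe(fds) != 0) return "pipe failed";
  // Close the read end before forking so that no process holds a reader.
  close(fds[0]);
  pid_t pid = fork();
  if (pid < 0) {
    close(fds[1]);
    return "fork failed";
  }
  if (pid == 0) {
    // Child: make sure SIGPIPE is not ignored, then write into the pipe.
    signal(SIGPIPE, SIG_DFL);
    ssize_t n = write(fds[1], "x", 1);
    (void)n;
    _exit(0);  // Only reached if the write did not raise SIGPIPE.
  }
  // Parent: release the write end and collect the child's status.
  close(fds[1]);
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return "waitpid failed";
  return DescribeWaitStatus(status);
}
```

On Linux, SIGPIPE is signal number 13, which matches the number in the
reported status.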
     
The fix is to run StartProcess() in loop until it is
successful as long as there is a flag with 'fault' passed
to the daemon.
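
The retry idea can be sketched roughly as follows. This is a simplified
illustration under stated assumptions, not the code in the patch:
HasFaultInjectionFlag(), StartWithRetries(), the start_once callback, and
the max_attempts bound are all hypothetical names introduced here, and the
real change lives in external_mini_cluster.cc.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns true if any daemon flag contains "fault".
bool HasFaultInjectionFlag(const std::vector<std::string>& flags) {
  for (const std::string& flag : flags) {
    if (flag.find("fault") != std::string::npos) return true;
  }
  return false;
}

// Hypothetical stand-in for the retry logic. 'start_once' plays the role of
// a single start attempt and returns true on success. 'max_attempts' is a
// safety bound added for this sketch; the commit message does not mention one.
bool StartWithRetries(const std::vector<std::string>& flags,
                      const std::function<bool()>& start_once,
                      int max_attempts) {
  const bool retry_on_failure = HasFaultInjectionFlag(flags);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (start_once()) return true;
    // Without a fault injection flag, a failed start is a real error.
    if (!retry_on_failure) return false;
  }
  return false;
}
```

With a flag such as the made-up "--fault_example=1.0", a start attempt that
fails twice and then succeeds is retried until the third call; without any
'fault' flag, the first failure is returned to the caller unchanged.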

To reproduce the error, I ran this test on dist-test under stress
with the following command:

build-support/dist_test.py loop -n 500 \
build/debug/bin/external_mini_cluster-test \
--stress_cpu_threads=10 --gtest_repeat=10 \
--gtest_break_on_failure

Without this fix, the test failed 500/500 times. I inspected
some of the logs and found the same error as above. The results
can be found at:
http://dist-test.cloudera.org//job?job_id=david.alves.1488478196.9127

With this fix, the test passes 500/500 times. The results
can be found at:
http://dist-test.cloudera.org//job?job_id=david.alves.1488481249.22125

Change-Id: I6046e34a321de3e324e20e3d63249e4073712447
---
M src/kudu/integration-tests/external_mini_cluster.cc
M src/kudu/integration-tests/external_mini_cluster.h
2 files changed, 56 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/6211/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6046e34a321de3e324e20e3d63249e4073712447
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jdcry...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
