Hello David Ribeiro Alves,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/3321

to review the following change.

Change subject: KUDU-1477. Pending COMMIT message for failed write operation 
can prevent tablet startup
......................................................................

KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet 
startup

This fixes a bug seen in a recent YCSB stress test that I ran
in which I was accidentally writing tens of thousands of duplicate
keys per second. After a tablet server restarted, it failed to come
up due to a pending commit which referred to no mutated stores.

This patch tweaks the logic for this safety check: a commit with no
mutated stores trivially has "no active stores". However, that's not
the same as having "only inactive stores" -- the subtlety is in the
difference in behavior when a commit has no stores at all.

The patch adds a new targeted test in tablet_bootstrap-test as well as
a more end-to-end test in ts_recovery-itest. Both reproduced the bug
reliably before this patch.

Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
---
M src/kudu/consensus/log-test-base.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/test_workload.cc
M src/kudu/integration-tests/test_workload.h
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/tablet/tablet_bootstrap-test.cc
M src/kudu/tablet/tablet_bootstrap.cc
7 files changed, 136 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/3321/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3321
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <t...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <david.al...@cloudera.com>

Reply via email to