Hello Tidy Bot, Mike Percy, David Ribeiro Alves, Kudu Jenkins, Todd Lipcon,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7439
to look at the new patch set (#13).
Change subject: mvcc: allow tablet shutdown without completing txs
......................................................................
mvcc: allow tablet shutdown without completing txs
Currently, the only way to stop an Applying transaction is to wait for
it to finish and Commit. This constraint was put in place to guarantee
on-disk correctness, but it is sometimes too strict. For example, if the
tablet is shutting down, the Apply doesn't need to finish.
This patch adds the ability to "stop" a tablet by shutting down its MVCC
manager. Once this happens, Applies will return and not move on to the
Commit phase, and any methods waiting for the tablet's Applies to Commit
(e.g. new snapshot scans, FlushMRS) will respond with an error
immediately. Applies that are already underway may still Commit, but
this does not affect consistency: having some in-flight transactions
Commit and others not is the same outcome as the server crashing between
the Commits of two transactions.
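
As a rough illustration of the behavior described above, here is a minimal
sketch of a manager whose waiters return an error once it is closed. It is
not the code in this patch: ToyMvccManager, WaitResult, and every method
name below are hypothetical, and the real MvccManager tracks far more state.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical, simplified stand-in for an MVCC manager (not Kudu's class).
enum class WaitResult { kCommitted, kAborted };

class ToyMvccManager {
 public:
  // Registers a transaction as in-flight (Applying).
  void StartApplying(int64_t ts) {
    std::lock_guard<std::mutex> l(lock_);
    applying_.insert(ts);
  }

  // Marks a transaction as Committed and wakes any waiters.
  void Commit(int64_t ts) {
    std::lock_guard<std::mutex> l(lock_);
    applying_.erase(ts);
    cv_.notify_all();
  }

  // Releases a transaction without Committing it, e.g. after Close().
  void Abort(int64_t ts) {
    std::lock_guard<std::mutex> l(lock_);
    applying_.erase(ts);
    cv_.notify_all();
  }

  // Stops the manager: waiters wake up immediately and see kAborted.
  void Close() {
    std::lock_guard<std::mutex> l(lock_);
    open_ = false;
    cv_.notify_all();
  }

  // Apply code can poll this to decide whether to bail out early.
  bool is_open() const {
    std::lock_guard<std::mutex> l(lock_);
    return open_;
  }

  // Blocks until all in-flight transactions finish or the manager is closed.
  // A closed manager always yields kAborted, even if nothing is in flight.
  WaitResult WaitForApplyingToCommit() {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return !open_ || applying_.empty(); });
    return open_ ? WaitResult::kCommitted : WaitResult::kAborted;
  }

 private:
  mutable std::mutex lock_;
  std::condition_variable cv_;
  bool open_ = true;
  std::set<int64_t> applying_;
};
```

In this sketch a caller such as a snapshot scan would treat kAborted as an
error and return it to the client rather than blocking on a tablet that is
going away.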
This will be particularly useful in handling disk failures: if a tablet
needs to be shut down due to disk failure, its MVCC manager can be
stopped immediately, allowing currently-Applying transactions to abort
without Committing.
This patch only includes the behavior when shutting down a tablet, with
the assumption that a tablet will only be shut down when it's being
deleted and we don't care too much about its in-flight transactions
Committing. Code paths that previously crashed Kudu if Applies did not
succeed will now log a warning instead of crashing if the MVCC manager
has been shut down.
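
The crash-versus-warn decision might look roughly like the following. This
is only a sketch under assumptions: FailureAction, ClassifyApplyFailure, and
HandleApplyFailure are hypothetical names, and plain std::cerr and
std::abort stand in for whatever logging and fatal-check facilities the
real code paths use.

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical outcome of a failed Apply (names are illustrative only).
enum class FailureAction { kLogWarning, kCrash };

// A failed Apply is tolerated only when the MVCC manager has been shut
// down; while it is still open, the failure remains fatal.
FailureAction ClassifyApplyFailure(bool mvcc_is_open) {
  return mvcc_is_open ? FailureAction::kCrash : FailureAction::kLogWarning;
}

// Acts on the classification: warn and return, or abort the process.
void HandleApplyFailure(bool mvcc_is_open, const std::string& msg) {
  if (ClassifyApplyFailure(mvcc_is_open) == FailureAction::kLogWarning) {
    std::cerr << "WARNING: Apply did not complete (MVCC shut down): "
              << msg << "\n";
    return;
  }
  std::cerr << "FATAL: Apply failed: " << msg << "\n";
  std::abort();
}
```

Separating the classification from the side effect keeps the policy easy to
test without having to trigger a process abort.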
Testing is done by adding the following:
- a test in mvcc-test to shut down MVCC and delete an Applying
transaction, ensuring that there are no errors when it leaves scope.
- a test in mvcc-test to wait on an Applying transaction, shut down
MVCC, and ensure that any waiters will return with an error.
- a test in tablet_replica-test to register a WriteTransaction,
shut down the tablet's MvccManager, and begin Applying. The
transaction exits early and releases itself from MVCC.
- integration tests in ts_tablet_manager-itest that ensure workloads
complete despite having failed tablets.
Change-Id: I983620f27e7226806a2cca253db7619731914d42
---
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
15 files changed, 401 insertions(+), 106 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/7439/13
--
To view, visit http://gerrit.cloudera.org:8080/7439
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-Change-Number: 7439
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>