Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/7440
to look at the new patch set (#9).
Change subject: KUDU-1407: reassign failed tablets
......................................................................
KUDU-1407: reassign failed tablets
Tablets put into the state tablet::FAILED are left until they are
manually deleted; they are not evicted and reassigned. If a tablet
fails to bootstrap, it will sit, responding to heartbeats, doing nothing
else. This patch ensures failed tablets will be reassigned.
As the tablets are not used, rather than directly setting replicas to
FAILED, an error is first recorded and the TabletReplica::Shutdown(), leaving
the final state as FAILED. A replica can no longer leave the FAILED
state (calls to Shutdown() leave it FAILED). The tserver response
generated by FAILED tablets is now TABLET_FAILED. Upon receiving this, a
leader will immediately evict the peer.
Prior to this patch, a tablet was marked FAILED if its WAL or metadata
failed to delete (after already shutting down). If this occurs, there
may be an inconsistency on-disk. This has been made fatal.
Testing is done in a few places:
- raft_consensus-itest is updated to ensure that tablets that fail to
bootstrap are evicted and replaced.
- tablet_server-test is also updated to ensure that, instead of
TABLET_NOT_RUNNING, TABLET_FAILED is returned by failed tablets.
- a test is added to ts_tablet_manager-itest is added to test that
failed tablets while running are evicted and replaced.
This patch is a part of a series of patches to handle disk failure. See
section 2.5 in this doc:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit
Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
---
M src/kudu/client/scanner-internal.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/consensus/consensus_queue.h
M src/kudu/integration-tests/raft_consensus-itest.cc
M src/kudu/integration-tests/ts_recovery-itest.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tserver/tablet_server-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver.proto
14 files changed, 160 insertions(+), 59 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/9
--
To view, visit http://gerrit.cloudera.org:8080/7440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>