Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8402

to look at the new patch set (#4).

Change subject: [docs] Document how to recover from a majority failed tablet
......................................................................

[docs] Document how to recover from a majority failed tablet

This adds some docs on how to recover when a tablet can no longer find
a majority due to the permanent failure of replicas. Manual
intervention is required, and boils down to two steps:

1. Tombstone the failed replicas. This deletes their data and
allows Kudu to overwrite the failed replicas, if necessary. Failing
to do this in certain situations prevents the automatic recovery of the
tablet after step 2.
2. Eject the failed replicas from the consensus configuration, so the
remaining healthy replicas can elect a leader. From this point, the
master orchestrates automatic re-replication of the tablet.
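
As a rough illustration of the two steps, the sketch below builds the
command lines an operator might run. It is an assumption that the steps
map onto the `kudu local_replica delete` and
`kudu remote_replica unsafe_change_config` tools; this message does not
name the commands, and the docs in the patch are authoritative. The tablet
id, addresses, directories, and UUIDs are hypothetical placeholders.

```python
import shlex


def tombstone_cmd(tablet_id, wal_dir, data_dirs):
    """Step 1 (assumed tool): tombstone a failed replica on its own server.

    Assumes `kudu local_replica delete` leaves a tombstone by default and
    is run on the affected host.
    """
    return [
        "kudu", "local_replica", "delete", tablet_id,
        f"--fs_wal_dir={wal_dir}",
        f"--fs_data_dirs={','.join(data_dirs)}",
    ]


def eject_cmd(healthy_tserver_addr, tablet_id, healthy_uuids):
    """Step 2 (assumed tool): shrink the config to the healthy replicas.

    Assumes the trailing arguments are the UUIDs of the peers to KEEP,
    so the failed replicas are ejected by omission.
    """
    return [
        "kudu", "remote_replica", "unsafe_change_config",
        healthy_tserver_addr, tablet_id, *healthy_uuids,
    ]


def recovery_plan(tablet_id, failed, healthy):
    """Return shell-quoted command strings in the order of the two steps.

    `failed` is a list of (wal_dir, data_dirs) pairs, one per failed replica;
    `healthy` is a list of (tserver_address, uuid) pairs.
    """
    plan = [shlex.join(tombstone_cmd(tablet_id, wal, dirs)) for wal, dirs in failed]
    uuids = [uuid for _, uuid in healthy]
    plan += [shlex.join(eject_cmd(addr, tablet_id, uuids)) for addr, _ in healthy]
    return plan
```

Calling `recovery_plan` only produces strings for review; nothing is
executed, which seems prudent given that both steps are destructive.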

I tested this procedure by failing tablets in various ways:
- deleting important bits like cmeta or tablet metadata
- deleting entire data dirs
- tombstoning 2/3 replicas (and disabling tombstoned voting)
and I was always able to recover using these instructions.

Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
---
M docs/administration.adoc
1 file changed, 80 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/8402/4
--
To view, visit http://gerrit.cloudera.org:8080/8402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
Gerrit-Change-Number: 8402
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
