Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8402

to look at the new patch set (#3).

Change subject: [docs] Document how to recover from a majority failed tablet
......................................................................

[docs] Document how to recover from a majority failed tablet

This adds docs on how to recover when a tablet can no longer find
a majority due to the permanent failure of replicas. Manual
intervention is required, and it boils down to two steps:

1. Copy the data from a healthy replica to where the revived replicas
   will be.
2. Set the consensus configuration of the tablet so it matches the new
   locations of the replicas.

Step 2 requires downtime even for healthy replicas, since new servers
can't be added to consensus configs without either rewriting the on-disk
cmeta or having a majority available. It might be worth allowing a tool
to bypass this restriction so that healthy tablet servers don't need to
be shut down in order to recover tablets on unhealthy ones.

I tested this procedure by failing tablets in various ways:
- deleting important bits like cmeta or tablet metadata
- deleting entire data dirs
- tombstoning 2/3 replicas (and disabling tombstoned voting)
and I was always able to recover using these instructions.
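
As an illustration only (not part of the patch), the majority condition
behind these failure scenarios can be sketched as below. The function
name and the replica counts are hypothetical, and the sketch assumes the
usual Raft rule that a strict majority of replicas must be available.

```python
def has_majority(total_replicas: int, healthy_replicas: int) -> bool:
    """Return True if the healthy replicas form a strict majority.

    Hypothetical helper for illustration; it is not a Kudu API.
    """
    if total_replicas <= 0 or not 0 <= healthy_replicas <= total_replicas:
        raise ValueError("replica counts out of range")
    # A strict majority is more than half of all replicas.
    return healthy_replicas >= total_replicas // 2 + 1


# Tombstoning 2 of 3 replicas leaves 1 healthy replica: no majority,
# so the tablet needs the manual recovery described above.
print(has_majority(3, 1))  # False
# Losing only 1 of 3 replicas still leaves a majority.
print(has_majority(3, 2))  # True
```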

Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
---
M docs/administration.adoc
1 file changed, 104 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/8402/3
--
To view, visit http://gerrit.cloudera.org:8080/8402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
Gerrit-Change-Number: 8402
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
