Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8402

to look at the new patch set (#2).

Change subject: [docs] Document how to recovery from a majority failed tablet
......................................................................

[docs] Document how to recovery from a majority failed tablet

This adds some docs on how to recover when a tablet can no longer find
a majority due to the permanent failure of replicas. Manual intervention
is required, and basically boils down to

1. copy the data from a healthy replica to where the revived
   replicas will be
2. set the consensus configuration of the tablet so it matches
   the new locations of replicas

Step 2 requires downtime even for healthy replicas, since new servers
can't be added to consensus configs without either rewriting the
on-disk cmeta or having a majority available. It might be worth
allowing a tool to bypass this restriction so that healthy tablet
servers don't need to be shut down in order to recover tablet on
unhealthy ones.

I tested this procedure by failing tablets in various ways:
- deleting important bits like cmeta or tablet metadata
- deleting entire data dirs
- tombstoning 2/3 replicas (and disabling tombstoned voting)
and I was always able to recover using these instructions.

Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
---
M docs/administration.adoc
1 file changed, 104 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/8402/2
--
To view, visit http://gerrit.cloudera.org:8080/8402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
Gerrit-Change-Number: 8402
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wdberke...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
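
As background for the majority requirement the commit message relies on, here is a minimal sketch of the arithmetic. It is not Kudu code: the function names `majority_size` and `has_majority` are hypothetical, and the only assumption is that a consensus config needs a strict majority of its voters to make progress.

```python
def majority_size(num_voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return num_voters // 2 + 1


def has_majority(num_voters: int, num_healthy: int) -> bool:
    """True if the healthy replicas alone can still form a majority."""
    return num_healthy >= majority_size(num_voters)


# A 3-replica tablet that has permanently lost 2 replicas has 1 healthy
# voter, fewer than the 2 it needs. It cannot accept a config change
# through consensus, which is why the on-disk cmeta has to be rewritten.
print(has_majority(3, 1))  # False
print(has_majority(3, 2))  # True
```

With 2 of 3 replicas healthy, a normal config change would still go through, so the manual procedure in this change applies only once the healthy count drops below `majority_size`.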