Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8402 )
Change subject: [docs] Document how to recover from a majority failed tablet
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc
File docs/administration.adoc:

http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@709
PS4, Line 709: Reviving a tablet that's lost a majority of replicas
how about: Bringing a tablet that's lost a majority of replicas back online


http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@711
PS4, Line 711: If a tablet has permanently lost a majority of its replicas, it cannot recover
It is critical to emphasize that in a majority-lost scenario, permanent data loss is likely, and in fact there is no guarantee that any data can be recovered. It may only be due to luck that they get some or all of their data back after this procedure.

We should also emphasize that this procedure should only be performed if it is not possible to bring the majority back online.


http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@723
PS4, Line 723: 638a20403e3e4ae3b55d4d07d920e6de (tserver-00:7150): RUNNING [LEADER]
This is kind of a cool scenario but this whole thing only works if the leader survives. I think it's worth indicating how to handle this when the leader did not survive as well and a discussion around the implications of that.
Actually, if the leader survives, the likelihood of losing data is much lower (although not zero, because it could have been an old, partitioned leader in some nasty cases).


http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@760
PS4, Line 760: $ kudu remote_replica delete tserver-01:7150 e822cab6c0584bc0858219d1539a17e6 "delete failed replica"
This is not actually required; the master should do it automatically once the failed replicas get evicted when we do the unsafe config change.


http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@767
PS4, Line 767: [source,bash]
             : ----
             : $ kudu remote_replica unsafe_change_config <tserver address> <tablet id> <uuid 1> <uuid 2> ...
             : ----
I found this confusing. It seems like a command, I was trying to figure out who uuid1 and uuid2 were and why we're changing the config to those two, etc.

I think we need to pick one of the "prototype" or the "example" for the same command. I actually think the prototype (this example) is more useful than the one below, except that you indicate a "uuid2" which doesn't apply here.


http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@775
PS4, Line 775: [source,bash]
If you are going to put this in, at least mark it with a label like "Example:"


http://gerrit.cloudera.org:8080/#/c/8402/4/docs/administration.adoc@777
PS4, Line 777: $ kudu remote_replica unsafe_change_config tserver-00:7150 e822cab6c0584bc0858219d1539a17e6 638a20403e3e4ae3b55d4d07d920e6de
Because having a long UUID for tablet_id and UUID for tablet server id can be confusing, and these example uuids are never going to actually be what a user would paste in, I think something that is sort of a compromise of what you wrote on line 770 and what is here on line 777 would be ideal:

  $ kudu remote_replica unsafe_change_config tserver-00:7150 <tablet_id> <tserver-00-uuid>

explaining that tserver-00-uuid would be the tablet server UUID of the remaining replica on tserver-00


--
To view, visit http://gerrit.cloudera.org:8080/8402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6326f65d029a1cd75e487b16ce5be4baea2f215
Gerrit-Change-Number: 8402
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Fri, 15 Dec 2017 22:46:36 +0000
Gerrit-HasComments: Yes
