Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11218 )

Change subject: KUDU-2538: [docs] Document how to manually recover from Cfile 
corruption
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11218/3/docs/troubleshooting.adoc
File docs/troubleshooting.adoc:

http://gerrit.cloudera.org:8080/#/c/11218/3/docs/troubleshooting.adoc@640
PS3, Line 640: Until 
link:https://issues.apache.org/jira/browse/KUDU-2469[KUDU-2469] is
> Do we want to go into so many details as to list JIRAs?
I want to list JIRAs so users can see that we are working on fixes and which 
versions they are in once complete. We can update this document once all the 
fixes are in.


http://gerrit.cloudera.org:8080/#/c/11218/3/docs/troubleshooting.adoc@645
PS3, Line 645: 
link:https://kudu.apache.org/docs/command_line_tools_reference.html#cluster-ksck[ksck
 tool].
> These links should be relative:
Done


http://gerrit.cloudera.org:8080/#/c/11218/3/docs/troubleshooting.adoc@655
PS3, Line 655: 
`link:https://kudu.apache.org/docs/command_line_tools_reference.html#remote_replica-delete[remote_replica
 delete tool]`.
> Same here + the backticks should wrap only `remote_replica delete`:
Done


http://gerrit.cloudera.org:8080/#/c/11218/3/docs/troubleshooting.adoc@668
PS3, Line 668: 
`link:https://kudu.apache.org/docs/command_line_tools_reference.html#tablet-unsafe_replace_tablet[unsafe_replace_tablet
 tool]`.
> Same here
Done


http://gerrit.cloudera.org:8080/#/c/11218/3/docs/troubleshooting.adoc@671
PS3, Line 671: sudo -u kudu kudu tablet unsafe_replace_tablet 
<master_addresses> <tablet_id>
> As this tool will only be available in 1.8.0 and above, maybe we
 > should put a note about it and suggest upgrading in case the users
 > run into this corruption on an older release with the only
 > workaround being deleting the whole range partition (or table in
 > case there's no range partitioning).

I don't think suggesting an upgrade will help, given that users won't come to 
the troubleshooting page unless they have already hit the issue. This is the 
only solution that is straightforward and reliable. All other solutions are 
case by case, and I didn't want to write a long list of conditional options.

 > Alternatively, we could also describe how to remove the affected
 > rowset manually from the tablet metadata, but if we do that, we
 > should add a huge warning about it being unsafe.

That is very low-level and error-prone. I don't think a list of steps that long 
and complicated belongs in the docs. Hopefully users will ask on the mailing 
list or Slack if they are in a situation that this documentation doesn't cover 
or fix.



--
To view, visit http://gerrit.cloudera.org:8080/11218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieefd472bef104921de7cab442fd49ab32c0fe81b
Gerrit-Change-Number: 11218
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Wed, 15 Aug 2018 14:13:49 +0000
Gerrit-HasComments: Yes
