Zoltan Martonka has posted comments on this change. ( http://gerrit.cloudera.org:8080/20067 )
Change subject: [WIP] KUDU-3458 Startup tserver even if some metadata is corrupted ...................................................................... Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/20067/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20067/4//COMMIT_MSG@14 PS4, Line 14: 3 tserver setup : with RF=3 > If it is a single server cluster, can the cluster become healthy finally? I don't know what you mean by single server cluster. If it is a 1 tserver environment with RF=1, then we still lost the tablet and the table is unusable. So nothing changed. If it is a 3 tserver environment with RF=3, and 1 tserver can't restart ONLY due to some corrupted metadata file(s), and the corresponding tablet(s) still has healthy replicas elsewhere, then yes now it can. http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc@949 PS4, Line 949: // segment 11 and thus making the new replica unable to catch up. Yes, it's : // not optimal: we're anchoring 0 and we might only need to anchor 10/11. : // However, having no anchor at all is likely to cause replicas to start : // fail copying. : // : // > Also, why is it necessary to remove those files at all here? I thought tha We don't have a metadata, so we don't call the SetTabletToReplace function, so the cleanups won't run. Both the meta and consensus file creation checks for existence. And we did not clean these files at startup. http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc@949 PS4, Line 949: // segment 11 and thus making the new replica unable to catch up. Yes, it's : // not optimal: we're anchoring 0 and we might only need to anchor 10/11. : // However, having no anchor at all is likely to cause replicas to start : // fail copying. : // : // > I'm not sure that's the right way to do this. At least, there is TabletMe I think some non trivial decisions has to be made, so I added WIP to the commit message. If I understand correctly, if we lost the tablet metadata, then we don't know which blocks to remove. We can't even hunt for unassigned blocks, because we might have an other unreplicated tablet replica corrupted too, and we should not delete that ones. We either have to just accept that this garbage will remain there, or move the belonging files into a ".backup", so some manual cleanup might be possible. I see no way for automatic cleanup. At the beginning of the copy request, we have no metadata, so calling TabletMetadata::DeleteSuperBlock is not possible. We could wait until we receive the metadata from the remote, but since block ids are local, TabletMetadata::DeleteSuperBlock would result in just deleting the metadata file. We could alternatively delete the files right before creating the new ones or add an additional "don't check for existence" param to the create functions, something like this (This is just an example, and it works badly if we crash during the tablet copy): https://github.com/apache/kudu/commit/4561ff9b83d988655a6aa3d31e40a27405ff683a Alternatively we could create some fake tombstones state for the tablet so the cleanup functions would delete the consensus and metadata (still no block cleanup). But for me this feels much more ugly, than just deleting the files. I think it is cleaner and safer to just pretend that the tablet is gone with everything, if we do not have metadata. But only delete the files if we receive a copy request, which tells us that there is still a healthy replica, so we don't have to worry about keeping our corrupted files (unless it crashes in the near future). I think the first 2 question that needs to be answered are: 1. Is it OK to behave like we completely lost the tablet, or should we still load the consensus data and maybe contribute something to elections? 2. Should we allow re-replication over the corrupted data, or should we refuse it? If we save the corrupted files for possible manual cleanup later, then I see no point in forbidding it. -- To view, visit http://gerrit.cloudera.org:8080/20067 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id44abaef5238b1de99350d8db43494a7120de60 Gerrit-Change-Number: 20067 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Martonka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Marton Greber <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wang Xixu <[email protected]> Gerrit-Reviewer: Zoltan Martonka <[email protected]> Gerrit-Comment-Date: Fri, 21 Jul 2023 13:39:59 +0000 Gerrit-HasComments: Yes
