Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20067 )

Change subject: [WIP] KUDU-3458 Startup tserver even if some metadata is 
corrupted
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20067/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20067/4//COMMIT_MSG@14
PS4, Line 14: 3 tserver setup
            : with RF=3
> If it is a single server cluster, can the cluster become healthy finally?
I don't know what you mean by single server cluster.
If it is a 1 tserver environment with RF=1, then we still lost the tablet and 
the table is unusable. So nothing changed.
If it is a 3 tserver environment with RF=3, and 1 tserver can't restart ONLY 
due to some corrupted metadata file(s), and the corresponding tablet(s) still 
has healthy replicas elsewhere, then yes now it can.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc@949
PS4, Line 949:   //   segment 11 and thus making the new replica unable to 
catch up. Yes, it's
             :   //   not optimal: we're anchoring 0 and we might only need to 
anchor 10/11.
             :   //   However, having no anchor at all is likely to cause 
replicas to start
             :   //   fail copying.
             :   //
             :   //
> Also, why is it necessary to remove those files at all here?  I thought tha
We don't have a metadata, so we don't call the SetTabletToReplace function, so 
the cleanups won't run. Both the meta and consensus file creation checks for 
existence. And we did not clean these files at startup.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc@949
PS4, Line 949:   //   segment 11 and thus making the new replica unable to 
catch up. Yes, it's
             :   //   not optimal: we're anchoring 0 and we might only need to 
anchor 10/11.
             :   //   However, having no anchor at all is likely to cause 
replicas to start
             :   //   fail copying.
             :   //
             :   //
> I'm not sure that's the right way to do this.  At least, there is  TabletMe
I think some non trivial decisions has to be made, so I added WIP to the commit 
message.

If I understand correctly, if we lost the tablet metadata, then we don't know 
which blocks to remove. We can't even hunt for unassigned blocks, because we 
might have an other unreplicated tablet replica corrupted too, and we should 
not delete that ones. We either have to just accept that this garbage will 
remain there, or move the belonging files into a ".backup", so some manual 
cleanup might be possible. I see no way for automatic cleanup.

At the beginning of the copy request, we have no metadata, so calling  
TabletMetadata::DeleteSuperBlock is not possible. We could wait until we 
receive the metadata from the remote, but since block ids are local, 
TabletMetadata::DeleteSuperBlock would result in just deleting the metadata 
file.

We could alternatively delete the files right before creating the new ones or 
add an additional "don't check for existence" param to the create functions, 
something like this (This is just an example, and it works badly if we crash 
during the tablet copy):
https://github.com/apache/kudu/commit/4561ff9b83d988655a6aa3d31e40a27405ff683a

Alternatively we could create some fake tombstones state for the tablet so the 
cleanup functions would delete the consensus and metadata (still no block 
cleanup). But for me this feels much more ugly, than just deleting the files.

I think it is cleaner and safer to just pretend that the tablet is gone with 
everything, if we do not have metadata. But only delete the files if we receive 
a copy request, which tells us that there is still a healthy replica, so we 
don't have to worry about keeping our corrupted files (unless it crashes in the 
near future).

I think the first 2 question that needs to be answered are:

1. Is it OK to behave like we completely lost the tablet, or should we still 
load the consensus data and maybe contribute something to elections?

2. Should we allow re-replication over the corrupted data, or should we refuse 
it? If we save the corrupted files for possible manual cleanup later, then I 
see no point in forbidding it.



--
To view, visit http://gerrit.cloudera.org:8080/20067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id44abaef5238b1de99350d8db43494a7120de60
Gerrit-Change-Number: 20067
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Fri, 21 Jul 2023 13:39:59 +0000
Gerrit-HasComments: Yes

Reply via email to