Zoltan Martonka has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20067 )

Change subject: [WIP] KUDU-3458 Startup tserver even if some metadata is 
corrupted
......................................................................


Patch Set 5:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc
File src/kudu/integration-tests/ts_tablet_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1009
PS4, Line 1009: pecta
> tablet?
Thank you, I will fix the table/tablet/replica mess-up.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1010
PS4, Line 1010: s all 3
> It's called 're-replicated' in Kudu's terminology.
Thanks


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1020
PS4, Line 1020:   constexpr char kTableNameR3Failed1[] = "test-table-r3-1f";
              :   constexpr char kTableNameR1[] = "test-table-r1";
              :   constexpr int kExpectedTabletCountsBefore = 3 + 3 + 1;
              :   constexpr int kExpectedTabletCountsAfter = 
kExpectedTabletCountsBefore - 4;
              :   constexpr int kNumTabletServers = 3;
              :   constexpr int kRowsCount = 100;
              :   // If a tablet can not be re-replicated properly, then kudu 
should not touch the corrupted files.
              :   // There is no good way to
> nit: maybe, consider adding the constexpr specifier?
ok


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1029
PS4, Line 1029:
> Would be great to comment on why customizing these flags.
Because they were customised in re-replication tests where a tserver 
disappears, and I wrongly thought the master will wait with re-replication for 
the same time if only a tablet is missing from the restarted tserver. It seem 
to  notice the missing tablet instantly.
I will remove the flag settings.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1090
PS4, Line 1090:
> nit for here and elsewhere: could drop the 'std' prefix since there are cor
ok


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1094
PS4, Line 1094:         ->fs_manager()
              :         ->GetTabletMetadataPath(tablet_name);
              :   };
              :
              :   create_table_with_some_data(kTableNameR3Failed1, 3);
              :   create_table_with_some_data(kTableNameR3Failed3, 3);
              :   create_table_with_some_data(kTableNameR1, 1);
              :
              :   ASSERT_EVENTUALLY([&] { 
ASSERT_EQ(kExpectedTabletCountsBefore, all_tablet_count()); });
              :
              :   
ASSERT_OK(cluster_->mini_master()->WaitForCatalogManagerInit());
              :
              :   string corrupt_1_tablet_id = 
get_tablet_id(kTableNameR3Failed1);
              :   string corrupt_3_tablet_id = 
get_tablet_id(kTableNameR3Failed3);
              :   string corrupt_unreplicated_tablet_id = 
get_tablet_id(kTableNameR1);
              :
              :   A
> You could get rid of this complexity by creating test tables with just a si
ok


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1144
PS4, Line 1144:
> nit: should this be const?
ok


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1166
PS4, Line 1166:   for (int i = 0; i <
> nit: isn't the stream object going to flush automatically on destruction wh
yes it will.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/integration-tests/ts_tablet_manager-itest.cc@1186
PS4, Line 1186:                          std::istreambuf_iterator<char>(f),
> Could this be a point for flakiness due to scheduler anomalies, etc.?  Did
We wait for re-replication in the previous line. ASSERT_EVENTUALY has a 30 sec 
timeout by default, which should be sufficient.
I wanted to test if tserver leaves the other unrestorable metadata files 
untouched. I don't know if there is any better method to test if something does 
not happen. It can't be flaky, because if the logic is right then it will 
always pass, and if the logic is wrong, then we might get a false PASSED for 
not waiting enough for the server to do what it should not.

I run it 100 times using dist-test, it seems to be stable.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc@942
PS4, Line 942:   // * When tablet copy is done the tablet still needs to 
bootstrap which will
             :   //   take some time.
> But the elements are deleted just below, and it can be done in multiple thr
No, it is not. I will fix this.


http://gerrit.cloudera.org:8080/#/c/20067/4/src/kudu/tserver/ts_tablet_manager.cc@944
PS4, Line 944:   //
> In addition to details about locking, it would be great to comment on why i
ok



--
To view, visit http://gerrit.cloudera.org:8080/20067
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id44abaef5238b1de99350d8db43494a7120de60
Gerrit-Change-Number: 20067
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Marton Greber <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Wang Xixu <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>
Gerrit-Comment-Date: Fri, 21 Jul 2023 12:24:44 +0000
Gerrit-HasComments: Yes

Reply via email to