Yuqi Du has posted comments on this change. ( http://gerrit.cloudera.org:8080/19571 )
Change subject: [KUDU-3452] Create tablet without enough healthy tservers ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc@5930 PS2, Line 5930: master_->ts_manager()->GetDescriptorsAvailableForPlacement(&ts_descs); Why move the statement from outer to inner loop? I infer that at a unsteady kudu cluster it can pick healthy tserver ASAP? At this, I have different opinion about this which is relation to lines from L5933 to L5938, you can see below http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc@5931 PS2, Line 5931: TableMetadataLock table_guard(tablet->table().get(), LockMode::READ); Whether the lock can put into { TableMetadataLock table_guard(tablet->table().get(), LockMode::READ); ... } to reduce lock code ranges. http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc@5932 PS2, Line 5932: // Healthy tablet servers are not enough, use unhealthy tablet servers. : if (FLAGS_enable_create_tablet_without_enough_healthy_tservers && : ts_descs.size() < table_guard.data().pb.num_replicas()) { : ts_descs.clear(); : master_->ts_manager()->GetAllDescriptors(&ts_descs); : } At this, I have different opinion about this which may relation to lines from L5930. For example, pick an unhealthy tserver, that means the left replica must create at the specific unhealthy tserver, but the tserver may dead for a long time, and during this time, a new tserver joins cluster, the tablet should have been unhealthy. So IMO, if healthy tservers are not enough by 'GetDescriptorsAvailableForPlacement', if the number statisfies the (n+1)/2, that's ok. When a third server joins or resume from fault, it can become health by recovering flow. And it does not matter putting L5930 before the loop. http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9182 PS2, Line 9182: catalog_manager_check_ts_count_for_create_table 'catalog_manager_check_ts_count_for_create_table''s default value is true. If the test case set it false. That means the feature should change the flag, right? http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9203 PS2, Line 9203: SleepFor(MonoDelta::FromMilliseconds(3000)); replacing the 'SleepFor' by checking 'ksck not OK' ? http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9209 PS2, Line 9209: NO_FATALS(workload2.Start()); IMO, it's better start a write thread and write some records to prove the table is avaliable. http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9213 PS2, Line 9213: FromMilliseconds replacing the 'SleepFor' by checking 'ksck not OK' ? -- To view, visit http://gerrit.cloudera.org:8080/19571 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I742ba1ff770f5c8b1be5800334c29bec96e195c6 Gerrit-Change-Number: 19571 Gerrit-PatchSet: 2 Gerrit-Owner: Wang Xixu <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yuqi Du <[email protected]> Gerrit-Comment-Date: Tue, 07 Mar 2023 06:14:27 +0000 Gerrit-HasComments: Yes
