Yuqi Du has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/19571 )

Change subject: [KUDU-3452] Create tablet without enough healthy tservers
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc@5930
PS2, Line 5930:       
master_->ts_manager()->GetDescriptorsAvailableForPlacement(&ts_descs);
Why move the statement from outer to inner loop? I infer that at a unsteady 
kudu cluster it can pick healthy tserver ASAP?

At this, I have different opinion about this which is relation to lines from 
L5933 to L5938, you can see below


http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc@5931
PS2, Line 5931:       TableMetadataLock table_guard(tablet->table().get(), 
LockMode::READ);
Whether the lock can put into {   TableMetadataLock 
table_guard(tablet->table().get(), LockMode::READ); ...  }  to reduce lock code 
ranges.


http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/master/catalog_manager.cc@5932
PS2, Line 5932:       // Healthy tablet servers are not enough, use unhealthy 
tablet servers.
              :       if 
(FLAGS_enable_create_tablet_without_enough_healthy_tservers &&
              :         ts_descs.size() < table_guard.data().pb.num_replicas()) 
{
              :         ts_descs.clear();
              :         master_->ts_manager()->GetAllDescriptors(&ts_descs);
              :       }
At this, I have different opinion about this which may relation to lines from 
L5930.

For example, pick an unhealthy tserver, that means the left replica must create 
at the specific unhealthy tserver, but the tserver may dead for a long time, 
and during this time, a new tserver joins cluster, the tablet should have been 
unhealthy.

So IMO, if  healthy tservers are not enough by 
'GetDescriptorsAvailableForPlacement', if the number statisfies the (n+1)/2, 
that's ok. When a third server joins or resume from fault, it can become health 
by recovering flow. And it does not matter putting  L5930  before the loop.


http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc
File src/kudu/tools/kudu-tool-test.cc:

http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9182
PS2, Line 9182: catalog_manager_check_ts_count_for_create_table
'catalog_manager_check_ts_count_for_create_table''s default value is true.  If 
the test case set it false. That means the feature should change the flag, 
right?


http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9203
PS2, Line 9203:   SleepFor(MonoDelta::FromMilliseconds(3000));
replacing the 'SleepFor' by checking 'ksck not OK' ?


http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9209
PS2, Line 9209:   NO_FATALS(workload2.Start());
IMO, it's better start a write thread and write some records to prove the table 
is avaliable.


http://gerrit.cloudera.org:8080/#/c/19571/2/src/kudu/tools/kudu-tool-test.cc@9213
PS2, Line 9213: FromMilliseconds
replacing the 'SleepFor' by checking 'ksck not OK' ?



--
To view, visit http://gerrit.cloudera.org:8080/19571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I742ba1ff770f5c8b1be5800334c29bec96e195c6
Gerrit-Change-Number: 19571
Gerrit-PatchSet: 2
Gerrit-Owner: Wang Xixu <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Tue, 07 Mar 2023 06:14:27 +0000
Gerrit-HasComments: Yes

Reply via email to