Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17528 )
Change subject: [master] AddMaster and copy sys catalog automatically ...................................................................... Patch Set 4: (27 comments) http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@33 PS4, Line 33: $ mkdir master-1 : $ mkdir master-1/log nit: this could be done in one command: mkdir -p master-1/log http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@35 PS4, Line 35: 7050 nit: maybe 8767 is a better choice here to be more consistent with the port numbering scheme in start_kudu.sh? http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@35 PS4, Line 35: nit: if showing the shell prompt for command lines above, maybe it makes sense to do so for this line as well? http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@35 PS4, Line 35: 7050 nit: would it be better to switch to port number 8766 to be aligned with the start_kudu.sh? BTW, 7050 is also the default port for tablet server's RPC endpoint (but start_kudu.sh doesn't use the default port numbers anyways). http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@123 PS4, Line 123: nit: the indent is off http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@218 PS4, Line 218: if (master_hps) { : master_hps->emplace_back(HostPortFromPB(master.registration().rpc_addresses(0))); : } nit: not exactly a part of this changelist, but consider using a local vector<HostPort> and swap with master_hps only in the end before returning Status::OK() (feel free to ignore, though) http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@226 PS4, Line 226: asynchronous request nit: the LeaderStepDown request is actually sent synchronously, and this method checks for the status IIUC http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@251 PS4, Line 251: LeaderStepDown request is asynchronous nit: the LeaderStepDown request is sent synchronously and starts the process of leadership transfer, but the process of leadership transfer takes time http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1414 PS4, Line 1414: & nit: why to keep this 64-bit reference over 32-bit args.orig_num_masters when it used only once in this scope? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1421 PS4, Line 1421: 10 nit: I recall 10 seconds sometimes wasn't enough in case of TSAN builds -- does it seem to be enough here? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1466 PS4, Line 1466: SleepFor(MonoDelta::FromSeconds(3)); : ASSERT_OK(VerifyVoterMastersForCluster(cluster_->num_masters(), nullptr, cluster_.get())); Would be nice to add a comment why this is necessary given VerifyVoterMastersForCluster() successfully completed just above in the ASSERT_EVENTUALLY clause. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1473 PS4, Line 1473: cluster_->num_masters() nit: these do actually include the current leader master, don't they? If so, does it make sense to refine the comment above? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1477 PS4, Line 1477: an nit: and http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master.proto@1143 PS4, Line 1143: AuthorizeClientOrServiceUser It would be great to clarify why this becomes necessary: probably, the commit description is the best place for that? Also, it looks a bit off compared with the coarse-grained authz requirements for the RemoveMaster() RPC below. Should the latter be updated as well? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc File src/kudu/master/master_runner.cc: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@367 PS4, Line 367: RETURN_NOT_OK(VerifyMastersGetHostPorts(opts.master_addresses(), : local_uuid, server->messenger(), : &leader_hp, &local_hp, : &try_again, &needs_add)); I'm concerned how this is going to play out if it's a restart of 2 out of 3 masters, where one master is unavailable. In that situation, my expectation it's possible to start 2 masters and have the cluster in operable state, so the 3rd failed master could be started (or replaced) afterwards. Does it work the as expected? Anyways, I guess it would be great to add a test scenario to make sure that's the case with these changes. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@374 PS4, Line 374: SleepFor(MonoDelta::FromSeconds(5)); Is it worth adding a LOG about the retry or there will be enough information output into the log to understand why it takes some time to start up a process? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@401 PS4, Line 401: MonoDelta::FromSeconds(30) Should this be controlled by a new run-time flag? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@413 PS4, Line 413: server.reset(); nit: would be nice to add a comment to explain why explicit reset here is necessary given the smart pointer is reset later on at line 418 http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h File src/kudu/mini-cluster/external_mini_cluster.h: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h@283 PS4, Line 283: Status AddMaster(const std::vector<std::string>& extra_flags = {}); Do you think it makes sense to add generic test coverage for this newly added functionality (number of masters, result flags, etc.) into external_mini_cluster-test.cc beyond the related test scenarios in dynamic_multi_master-test.cc? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h@501 PS4, Line 501: input nit: drop? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h@503 PS4, Line 503: Callers are expected nit: does it make sense to add a blurb about the necessity to reserve a port for a new master? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc File src/kudu/mini-cluster/external_mini_cluster.cc: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@663 PS4, Line 663: vector<string> master_addrs_list; Is it used at all in this scope? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@664 PS4, Line 664: for (const auto& master : masters_) { : master_rpc_addrs.emplace_back(master->bound_rpc_hostport()); : } Maybe, re-use ExternalMiniCluster::master_rpc_addrs() instead? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@679 PS4, Line 679: Unable nit: make it lowercase to match with error message at line 668? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@679 PS4, Line 679: Master ditto http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@680 PS4, Line 680: ++opts_.num_masters; nit: maybe, move this down to be along with updating the 'masters_' member field? http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@686 PS4, Line 686: Substitute("--master_addresses=$0", new_master_addrs_list)); Would be nice to add a comment to clarify how --master_addresses is set/updated for the master just added. -- To view, visit http://gerrit.cloudera.org:8080/17528 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7635723f30317a45799ad7b9c9b725eafbd735b Gerrit-Change-Number: 17528 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Wed, 07 Jul 2021 18:44:55 +0000 Gerrit-HasComments: Yes
