Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17528 )
Change subject: [master] AddMaster and copy sys catalog automatically ...................................................................... Patch Set 4: (28 comments) http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@33 PS4, Line 33: $ mkdir master-1 : $ mkdir master-1/log > nit: this could be done in one command: Done http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@35 PS4, Line 35: > nit: if showing the shell prompt for command lines above, maybe it makes se Done http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@35 PS4, Line 35: 7050 > nit: maybe 8767 is a better choice here to be more consistent with the port Done http://gerrit.cloudera.org:8080/#/c/17528/4//COMMIT_MSG@35 PS4, Line 35: 7050 > nit: would it be better to switch to port number 8766 to be aligned with th Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@123 PS4, Line 123: > nit: the indent is off Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@218 PS4, Line 218: if (master_hps) { : master_hps->emplace_back(HostPortFromPB(master.registration().rpc_addresses(0))); : } > nit: not exactly a part of this changelist, but consider using a local vect Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@226 PS4, Line 226: asynchronous request > nit: the LeaderStepDown request is actually sent synchronously, and this me Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@251 PS4, Line 251: LeaderStepDown request is asynchronous > nit: the LeaderStepDown request is sent synchronously and starts the proces Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1414 PS4, Line 1414: & > nit: why to keep this 64-bit reference over 32-bit args.orig_num_masters wh It was used more before, but now it's not. Removed http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1421 PS4, Line 1421: 10 > nit: I recall 10 seconds sometimes wasn't enough in case of TSAN builds -- I haven't seen it failing yet in TSAN mode -- especially given there's only a single tablet server by default. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1466 PS4, Line 1466: SleepFor(MonoDelta::FromSeconds(3)); : ASSERT_OK(VerifyVoterMastersForCluster(cluster_->num_masters(), nullptr, cluster_.get())); > Would be nice to add a comment why this is necessary given VerifyVoterMaste Done, it's just a sanity check that the cluster is somewhat stable. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1473 PS4, Line 1473: cluster_->num_masters() > nit: these do actually include the current leader master, don't they? If s Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/dynamic_multi_master-test.cc@1477 PS4, Line 1477: an > nit: and Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master.proto@1143 PS4, Line 1143: AuthorizeClientOrServiceUser > It would be great to clarify why this becomes necessary: probably, the comm Done Maybe at some point, but it isn't necessary for this patch. I erred on the side of only relaxing what's needed. Added a note explaining why the service user is important here. If you feel strongly about keeping them consistent I can change it though. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc File src/kudu/master/master_runner.cc: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@218 PS4, Line 218: if (cstate.has_pending_config()) { : LOG(INFO) << Substitute("Existing masters have pending config: $0", : SecureShortDebugString(cstate.pending_config())); : *needs_retry = true; : return Status::OK(); > Probably thought of already, but does it make sense to verify the raft conf Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@367 PS4, Line 367: RETURN_NOT_OK(VerifyMastersGetHostPorts(opts.master_addresses(), : local_uuid, server->messenger(), : &leader_hp, &local_hp, : &try_again, &needs_add)); > I'm concerned how this is going to play out if it's a restart of 2 out of 3 Added. Keep in mind that the masters are still up, per L352. These additional RPCs are running in the background. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@374 PS4, Line 374: SleepFor(MonoDelta::FromSeconds(5)); > Is it worth adding a LOG about the retry or there will be enough informatio Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@401 PS4, Line 401: MonoDelta::FromSeconds(30) > Should this be controlled by a new run-time flag? Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/master/master_runner.cc@413 PS4, Line 413: server.reset(); > nit: would be nice to add a comment to explain why explicit reset here is n It was an overly conservative reset to ensure we didn't leave anything behind before proceeding to copy. Removed. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h File src/kudu/mini-cluster/external_mini_cluster.h: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h@283 PS4, Line 283: Status AddMaster(const std::vector<std::string>& extra_flags = {}); > Do you think it makes sense to add generic test coverage for this newly add Added some basic coverage, similar to what was already there. Opted not to do more extensive coverage, given these more explicit AddMaster tests exist. http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h@501 PS4, Line 501: input > nit: drop? Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.h@503 PS4, Line 503: Callers are expected > nit: does it make sense to add a blurb about the necessity to reserve a por Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc File src/kudu/mini-cluster/external_mini_cluster.cc: http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@663 PS4, Line 663: vector<string> master_addrs_list; > Is it used at all in this scope? Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@664 PS4, Line 664: for (const auto& master : masters_) { : master_rpc_addrs.emplace_back(master->bound_rpc_hostport()); : } > Maybe, re-use ExternalMiniCluster::master_rpc_addrs() instead? Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@679 PS4, Line 679: Master > ditto Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@679 PS4, Line 679: Unable > nit: make it lowercase to match with error message at line 668? Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@680 PS4, Line 680: ++opts_.num_masters; > nit: maybe, move this down to be along with updating the 'masters_' member Done http://gerrit.cloudera.org:8080/#/c/17528/4/src/kudu/mini-cluster/external_mini_cluster.cc@686 PS4, Line 686: Substitute("--master_addresses=$0", new_master_addrs_list)); > Would be nice to add a comment to clarify how --master_addresses is set/upd Done -- To view, visit http://gerrit.cloudera.org:8080/17528 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7635723f30317a45799ad7b9c9b725eafbd735b Gerrit-Change-Number: 17528 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Abhishek Chennaka <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 10 Jul 2021 00:51:13 +0000 Gerrit-HasComments: Yes
