Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/16936 )
Change subject: [master] KUDU-2181 Raft ChangeConfig request to remove a master ...................................................................... Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16936/1/src/kudu/master/dynamic_multi_master-test.cc@805 PS1, Line 805: auto first_master_uuid = cluster_->master(0)->uuid(); > Ah, would running something like WaitForCatalogManager() help? If not, leav I took a look at WaitForCatalogManager() and it relies on ListTables() call so it's possible there could still be a race if using WaitForCatalogManager(). So I'll leave it as is. http://gerrit.cloudera.org:8080/#/c/16936/2/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16936/2/src/kudu/master/dynamic_multi_master-test.cc@779 PS2, Line 779: // Whether the master to be > As a followup, would it make sense, after removing the master, to delete th That's a good point and would serve as a test for recovering a dead master at the same hostport. Let me add that as a test in a follow-up change. http://gerrit.cloudera.org:8080/#/c/16936/2/src/kudu/master/dynamic_multi_master-test.cc@952 PS2, Line 952: > Bug: Should be leader_master_idx. Done http://gerrit.cloudera.org:8080/#/c/16936/3/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16936/3/src/kudu/master/dynamic_multi_master-test.cc@253 PS3, Line 253: returns > nit: return? Done http://gerrit.cloudera.org:8080/#/c/16936/3/src/kudu/master/dynamic_multi_master-test.cc@950 PS3, Line 950: ASSERT_TRUE(s.IsRemoteError()); > nit here and elsewhere in similar ASSERT: just in case if it ever fails/tri Very good point. Done. http://gerrit.cloudera.org:8080/#/c/16936/3/src/kudu/master/master.cc File src/kudu/master/master.cc: http://gerrit.cloudera.org:8080/#/c/16936/3/src/kudu/master/master.cc@533 PS3, Line 533: > nit: stick the ampersand to the type (i.e. to the 'auto') Done http://gerrit.cloudera.org:8080/#/c/16936/3/src/kudu/master/master.cc@575 PS3, Line 575: return Status::IllegalState(Substitute("Found multiple masters with same RPC address $0 " : "and UUID $1", hp.ToString(), uuid)); > I guess this might be changed into LOG(FATAL)? First of, this should not b I'm not really sure how to get out of this situation and will need closer inspection of the cluster as a whole. LOG(FATAL) and crashing the leader looks right. http://gerrit.cloudera.org:8080/#/c/16936/2/src/kudu/mini-cluster/external_mini_cluster.h File src/kudu/mini-cluster/external_mini_cluster.h: http://gerrit.cloudera.org:8080/#/c/16936/2/src/kudu/mini-cluster/external_mini_cluster.h@483 PS2, Line 483: // Removes the master from the ExternalMiniCluster when the master has been removed : // externally by the test after bringing up the ExternalMiniCluster. This helps : // keep the state of the actual cluster in sync with the state in ExternalMiniCluster. > nit: IMO this is still a little confusing, given it's very easy to conflate Done -- To view, visit http://gerrit.cloudera.org:8080/16936 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I76c03b8850faef60b65f85184c0a4db7cc6759ee Gerrit-Change-Number: 16936 Gerrit-PatchSet: 3 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 02 Feb 2021 23:12:18 +0000 Gerrit-HasComments: Yes
