Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16830 )
Change subject: [consensus] KUDU-2181 Allow sending status msgs to FAILED peers ...................................................................... Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/16830/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16830/4//COMMIT_MSG@28 PS4, Line 28: - Shutdown the new ma > My earlier plan was to split the Raft change and the test procedure. But th I haven't looked in much depth, but I grepped around for FAILED_UNRECOVERABLE and some some tests in consensus_queue-test that test this state. Did you look into testing it there? Were those unit tests not a good fit for the targeted consensus-related functionality we're introducing here? http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/consensus/consensus_queue.cc@74 PS4, Line 74: : DEFINE_bool(consensus_allow_status_msg_for_failed_peer, false, : "Allows status only Raft messages to be sent to a peer in FAILED_UNRECOVERABLE state."); : TAG_FLAG(consensus_allow_status_msg_for_failed_peer, experimental); : TAG_FLAG(consensus_allow_status_msg_for_failed_peer, runtime) > Is the intention to restrict turning on this flag only for masters? Yes, or more strongly, don't expose it as a flag at all, and instead pass it as an argument to RaftConsensus, and have only masters set that argument. http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc File src/kudu/master/dynamic_multi_master-test.cc: http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@139 PS4, Line 139: // has been GC'ed. > With external mini cluster the test will be very similar to the actual proc Ah I see. Yeah I think the difficulty with IMC is providing different flags to each process. I do think it's possible in that both mini cluster types have Shutdown() and Restart() methods, but the flags may be more difficult to wrangle -- especially the runtime ones. http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@154 PS4, Line 154: > Can't use ASSERT_OK() with a lambda that returns a non-void data type. Ah, indeed I overlooked that this was a lambda. http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@416 PS4, Line 416: new_master_->Shutdown(); : cluster_.reset(); : : LOG(INFO) << "Bringing up the migrated cluster"; : opts_.num_masters = orig_num_masters_ + 1; : opts_.master_rpc_addresses = master_hps; : ExternalMiniCluster migrated_cluster(opts_); : ASSERT_OK(migrated_cluster.Start()); : for (int i = 0; i < migrated_cluster.num_masters(); i++) { : ASSERT_OK(migrated_cluster.master(i)->WaitForCatalogManager()); : } : : // Verify the cluster still has the same 3 masters. : { : ListMastersResponsePB resp; : NO_FATALS(RunListMaste > This verification function is called after verifying that the new master ha Right, though this iteration of the test isn't checking any non-system tablets either. Maybe I missed it -- why not use the ClusterVerifier? This code seems like it could be replaced with ClusterVerifier::CheckRowCount(). -- To view, visit http://gerrit.cloudera.org:8080/16830 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I142c1dec442ec72c38c5be9d62cdf270e441d6e3 Gerrit-Change-Number: 16830 Gerrit-PatchSet: 5 Gerrit-Owner: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 15 Dec 2020 21:56:15 +0000 Gerrit-HasComments: Yes
