Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16830 )
Change subject: [master][consensus] Procedure for copying system catalog
......................................................................

Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16830/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16830/4//COMMIT_MSG@28
PS4, Line 28: One functional change
nit: it seems this patch should focus more on this aspect, rather than being touted as a test change, since this won't be a test-only thing if we expect to use the new flag in a multi-master recovery. Or is the idea to get the test working and then iron out the approach for status messages?

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/consensus/consensus_queue.cc@74
PS4, Line 74:
: DEFINE_bool(consensus_allow_status_msg_for_failed_peer, false,
:     "Allows status only Raft messages to be sent to a peer in FAILED_UNRECOVERABLE state.");
: TAG_FLAG(consensus_allow_status_msg_for_failed_peer, hidden);
: TAG_FLAG(consensus_allow_status_msg_for_failed_peer, unsafe);
Do we actually intend to use this flag for our documented procedure to recover masters? If so, it shouldn't be unsafe. Also mark it runtime? Alternatively, would it make sense to pass some option to the Raft implementation from the master that enables this?

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc
File src/kudu/master/dynamic_multi_master-test.cc:

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@139
PS4, Line 139: void StartClusterWithSysCatalogGCed(vector<HostPort>* master_hps,
I seem to recall discussing the need to GC in order to have the replica _not_ be catch-up-able through the WALs. But I don't recall why not just use an internal mini cluster and call the GC functions directly?
I suppose there might be issues with --master_support_change_config, but it isn't an obvious blocker to using an IMC? I don't feel strongly enough about it to request a rewrite, but I'm curious what the rationale is, since it seems like it might simplify the tests.

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@154
PS4, Line 154: CHECK_OK
nit: ASSERT_OK?

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@171
PS4, Line 171: int64_t time_left_to_sleep_msecs = 2000;
nit: more idiomatic to use a MonoTime as a deadline and compare against MonoTime::Now()

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@177
PS4, Line 177: ASSERT_GT(time_left_to_sleep_msecs, 0)
:     << "Timed out waiting for system catalog WAL to be GC'ed";
nit: how about asserting on the GC count instead? As long as we GC'ed, it's not a big deal if we just missed the timeout.

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@343
PS4, Line 343: ASSERT_EVENTUALLY([&] {
nit: I typically find it cleaner to have callers do the ASSERT_EVENTUALLY() and have test methods like this be single-try functions. We could even reuse the single-try variant if we do have stronger guarantees. As is, it imposes assumptions about how the method will be used (i.e. that it's likely going to take multiple tries).

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@416
PS4, Line 416: LOG(INFO) << "Verifying the first table";
: // Verify the table created before adding the new master.
: shared_ptr<KuduTable> table;
: ASSERT_OK(client->OpenTable(kTableName, &table));
: KuduScanner scanner1(table.get());
: size_t row_count = 0;
: ASSERT_OK(client::CountRowsWithRetries(&scanner1, &row_count));
: ASSERT_EQ(0, row_count);
:
: LOG(INFO) << "Creating and verifying the second table";
: // Perform an operation that requires replication to masters.
: ASSERT_OK(CreateTable(&migrated_cluster, "second_table"));
: ASSERT_OK(client->OpenTable("second_table", &table));
: KuduScanner scanner2(table.get());
: ASSERT_OK(client::CountRowsWithRetries(&scanner2, &row_count));
: ASSERT_EQ(0, row_count);
We're still not necessarily testing that the new master is caught up with the rest of the masters here, since the OpenTable() and GetTableLocations() requests go through whichever master is elected leader, which may or may not be the new master.

Another thought is to use a ClusterVerifier, which runs ksck and runs some scans, which seems like a superset of what's done here.

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@433
PS4, Line 433: LOG(INFO) << "Pausing and resuming individual masters";
nit: move this into the if block?

http://gerrit.cloudera.org:8080/#/c/16830/4/src/kudu/master/dynamic_multi_master-test.cc@588
PS4, Line 588:
: SKIP_IF_SLOW_NOT_ALLOWED();
:
: vector<HostPort> master_hps;
: NO_FATALS(StartClusterWithSysCatalogGCed(&master_hps,
:     {"--consensus_allow_status_msg_for_failed_peer"}));
: ASSERT_OK(CreateTable(cluster_.get(), kTableName));
: // Bring up the new master and add to the cluster.
: master_hps.emplace_back(reserved_hp_);
: NO_FATALS(StartNewMaster(master_hps));
: ASSERT_OK(AddMasterToCluster(reserved_hp_));
:
: // Newly added master will be added to the master Raft config but won't be caught up
: // from the WAL and hence remain as a NON_VOTER.
: ListMastersResponsePB list_resp;
: NO_FATALS(RunListMasters(&list_resp));
: ASSERT_EQ(orig_num_masters_ + 1, list_resp.masters_size());
:
: for (const auto& master : list_resp.masters()) {
:   ASSERT_EQ(1, master.registration().rpc_addresses_size());
:   auto hp = HostPortFromPB(master.registration().rpc_addresses(0));
:   if (hp == reserved_hp_) {
:     ASSERT_EQ(consensus::RaftPeerPB::NON_VOTER, master.member_type());
:     ASSERT_TRUE(master.role() == consensus::RaftPeerPB::LEARNER);
:   }
: }
:
: // Double check by directly contacting the new master.
: NO_FATALS(VerifyNewMasterDirectly({ consensus::RaftPeerPB::LEARNER },
:     consensus::RaftPeerPB::NON_VOTER));
:
: // Verify new master is in FAILED_UNRECOVERABLE state.
: NO_FATALS(VerifyNewMasterInFailedUnrecoverableState());
If this is for the most part a copy of TestAddMasterCatchupFromWALNotPossible, why not just have this test supersede it? We could even set --consensus_allow_status_msg_for_failed_peer at runtime if the idea is to test the --consensus_allow_status_msg_for_failed_peer=false case.

--
To view, visit http://gerrit.cloudera.org:8080/16830
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I142c1dec442ec72c38c5be9d62cdf270e441d6e3
Gerrit-Change-Number: 16830
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Dec 2020 01:28:35 +0000
Gerrit-HasComments: Yes
