Mike Percy has posted comments on this change.

Change subject: consensus: KUDU-2147. Unknown leader should not be treated as valid UUID
......................................................................

Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8109/1/src/kudu/consensus/consensus_meta-test.cc
File src/kudu/consensus/consensus_meta-test.cc:

PS1, Line 286: cmeta->set_leader_uuid("");
> maybe we should have something like cmeta->clear_leader_uuid() then?
We could do that, but I would like to do it in a separate change. I filed KUDU-2150 to track that.

http://gerrit.cloudera.org:8080/#/c/8109/1/src/kudu/integration-tests/raft_config_change-itest.cc
File src/kudu/integration-tests/raft_config_change-itest.cc:

PS1, Line 56: The
: // master should update its record of which replica is the leader after a new
: // leader is elected.
> Which part of the test asserts this? Would the last WaitForServersToAgree f
Correct - the last WaitForServersToAgree() would fail because the master would be unable to add the removed server back, which is missing the record that removed it.

Line 96: TServerDetails* leader = nullptr;
> Nit: don't need to initialize this; it'll always be written to by FindTable
I generally don't like defensive programming, but I've been bitten by bad pointers before and I hate leaving pointers around with random stack memory addresses, so I typically never leave them uninitialized. But done.

Line 113: // Leader delays heartbeats for 2 sec; followers delay by even longer.
> You already wrote this on L102.
Done

PS1, Line 114: if (cluster_->tablet_server(i)->uuid() != leader->uuid()) {
:   followers.push_back(ts_map_[cluster_->tablet_server(i)->uuid()]);
: }
> How about doing this work in the loop on L101?
Done

PS1, Line 124: When it
: // sends a tablet report to the master with the new configuration excluding
: // the removed tablet it will report an unknown leader in the new term.
> Is there a way to assert on this programmatically?
We are asserting on it at a high level (see my comment above), but I don't see an easy way to inspect what the master actually receives.
I looked at the logs when developing this test and I can tell you that it generally works as described.

PS1, Line 130: // Wait until the master re-adds the evicted replica and it is back up and
: // running.
: ASSERT_OK(WaitForServersToAgree(kTimeout, ts_map_, tablet_id, 1));
> I might be missing something, but shouldn't it be under the ASSERT_EVENTUAL
It wouldn't buy us much. This function retries.

http://gerrit.cloudera.org:8080/#/c/8109/1/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

> If the conclusion is that cstate.has_leader_uuid() is untrustworthy, there
Done

http://gerrit.cloudera.org:8080/#/c/8109/1/src/kudu/tserver/heartbeater.cc
File src/kudu/tserver/heartbeater.cc:

PS1, Line 384: // Inject latency for testing purposes.
: if (PREDICT_FALSE(FLAGS_heartbeat_inject_latency_before_heartbeat_ms > 0)) {
:   TRACE("Injecting $0ms of latency due to --heartbeat_inject_latency_before_heartbeat_ms",
:         FLAGS_heartbeat_inject_latency_before_heartbeat_ms);
:   SleepFor(MonoDelta::FromMilliseconds(FLAGS_heartbeat_inject_latency_before_heartbeat_ms));
: }
> side note we should come up with some reasonable macros for these things
Yeah, this is the third time I've done this, so it's a pattern. But it's only a few lines of code, so the benefit is marginal.

--
To view, visit http://gerrit.cloudera.org:8080/8109

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: David Ribeiro Alves
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Todd Lipcon
Gerrit-HasComments: Yes
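
Regarding the heartbeater.cc side note about macros for latency injection: a minimal sketch of what such a helper might look like, for illustration only. Everything named here is hypothetical and not taken from the patch or from Kudu: `FLAGS_example_inject_latency_ms` stands in for a gflags-defined flag, `MaybeInjectLatencyMs()` and `EXAMPLE_INJECT_LATENCY_MS` are made-up names, and `std::this_thread::sleep_for` replaces `SleepFor(MonoDelta::FromMilliseconds(...))` so the snippet compiles without Kudu headers. The TRACE call is omitted for the same reason.

```cpp
#include <chrono>
#include <cstdint>
#include <thread>

// Stand-in for a gflags-defined test flag (hypothetical; the real flag in the
// patch is FLAGS_heartbeat_inject_latency_before_heartbeat_ms).
static int32_t FLAGS_example_inject_latency_ms = 0;

// Hypothetical helper, not part of Kudu: sleeps for 'ms' milliseconds when
// ms > 0 and returns true if it slept, false otherwise.
inline bool MaybeInjectLatencyMs(int32_t ms) {
  if (ms <= 0) {
    return false;  // Common case: no latency injection configured.
  }
  std::this_thread::sleep_for(std::chrono::milliseconds(ms));
  return true;
}

// Hypothetical macro so that each call site collapses to a single line.
#define EXAMPLE_INJECT_LATENCY_MS(flag) MaybeInjectLatencyMs(flag)
```

A call site would then reduce to `EXAMPLE_INJECT_LATENCY_MS(FLAGS_example_inject_latency_ms);`. Since the original block is only a few lines, this mostly saves repetition rather than complexity, which matches the "benefit is marginal" assessment above.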
