Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
......................................................................


Patch Set 5: Code-Review+1

(2 comments)

Overall looks good, but it would be great to add a test scenario which would 
exercise one other realistic scenario when the names many other tablet servers 
in the cluster are still resolvable, so the system replaces the failed peer 
with a new one for the time of the DNS resolver's malfunction for one 
particular kudu-tserver node.

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167
PS4, Line 167: TEST_F(RaftConsensusElectionITest, 
TestNewLeaderCantResolvePeers) {
I'm curious: in the scope of this test, is it important or not to have 
connections already established between Raft peers?  Will anything change if 
adding --rpc_reopen_outbound_connections=true in this scenario?


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247
PS4, Line 247:   // Once we stop failing DNS resolution, the peer should become 
healthy.
             :   for (const auto& ts : tablet_servers_) {
             :     
ASSERT_OK(cluster_->SetFlag(cluster_->tablet_server_by_uuid(ts.first),
             :               "fail_dns_resolution", "false"));
             :   }
             :   NO_FATALS(check_for_failed_peer(/*expect_failed*/false));
In addition to this happy-ending, could you add a scenario when there is an 
extra tablet servers to spawn a replacement replica?  I guess the expected 
result would be having the peer with failed DNS name replaced, and it's 
important to make sure everything works as expected in that case.



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 11 Jun 2021 19:52:17 +0000
Gerrit-HasComments: Yes

Reply via email to