[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217 PS8, Line 217: bool has_failed_peer = false; > I appreciate you updated this scenario to make it reliable and robust! Thanks for the suggestion! -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 12 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 12 Jun 2021 18:14:54 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Reviewed-on: http://gerrit.cloudera.org:8080/17585 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 11 files changed, 233 insertions(+), 43 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 12 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 11: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217 PS8, Line 217: bool has_failed_peer = false; > (the tuning did not work well, as it turns out) I appreciate you updated this scenario to make it reliable and robust! -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 12 Jun 2021 17:29:17 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#11). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 11 files changed, 233 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/11 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 11 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217 PS8, Line 217: for (auto& ts : tablet_servers_) { > Yep, I spent some time tuning some flag values, but this is a better approa (the tuning did not work well, as it turns out) -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 12 Jun 2021 06:24:09 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217 PS8, Line 217: for (auto& ts : tablet_servers_) { > It seems this fails rather often, even in RELEASE build: Yep, I spent some time tuning some flag values, but this is a better approach. -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 12 Jun 2021 06:23:37 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#10). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 11 files changed, 233 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/10 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 10 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#9). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 11 files changed, 232 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/9 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 8: Code-Review+1 (3 comments) Overall looks good to me, but it seems the rate of flakiness for the newly added test is a bit high. http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167 PS4, Line 167: TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) { > Tried it and the test passes. Cool, thanks a lot for verifying that! http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247 PS4, Line 247: }); : }; : NO_FATALS(wait_for_some_elections()); : NO_FATALS(cluster_->AssertNoCrashes()); : : // Stop injecting latency so we can get a stable leader. > Done Thank you! http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217 PS8, Line 217: ASSERT_LT(initial_cstate.current_term() + 10, cstate.current_term()); It seems this fails rather often, even in RELEASE build: src/kudu/integration-tests/raft_consensus_election-itest.cc:218 Expected: (initial_cstate.current_term() + 10) < (cstate.current_term()), actual: 30 vs 22 Maybe, it's worth switch this scenario into manual election mode? -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 12 Jun 2021 01:55:06 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#8). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 9 files changed, 241 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/8 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#7). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 9 files changed, 241 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/7 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167 PS4, Line 167: TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) { > I'm curious: in the scope of this test, is it important or not to have conn Tried it and the test passes. http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247 PS4, Line 247: if (peer.health_report().overall_health() == consensus::HealthReportPB::FAILED) { : has_failed_peer = true; : break; : } : } : } > In addition to this happy-ending, could you add a scenario when there is an Done -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 11 Jun 2021 23:54:33 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#6). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 9 files changed, 247 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/6 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 5: Code-Review+1 (2 comments) Overall looks good, but it would be great to add a test scenario which would exercise one other realistic scenario when the names many other tablet servers in the cluster are still resolvable, so the system replaces the failed peer with a new one for the time of the DNS resolver's malfunction for one particular kudu-tserver node. http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc File src/kudu/integration-tests/raft_consensus_election-itest.cc: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167 PS4, Line 167: TEST_F(RaftConsensusElectionITest, TestNewLeaderCantResolvePeers) { I'm curious: in the scope of this test, is it important or not to have connections already established between Raft peers? Will anything change if adding --rpc_reopen_outbound_connections=true in this scenario? http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247 PS4, Line 247: // Once we stop failing DNS resolution, the peer should become healthy. : for (const auto& ts : tablet_servers_) { : ASSERT_OK(cluster_->SetFlag(cluster_->tablet_server_by_uuid(ts.first), : "fail_dns_resolution", "false")); : } : NO_FATALS(check_for_failed_peer(/*expect_failed*/false)); In addition to this happy-ending, could you add a scenario when there is an extra tablet servers to spawn a replacement replica? I guess the expected result would be having the peer with failed DNS name replaced, and it's important to make sure everything works as expected in that case. -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 11 Jun 2021 19:52:17 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h File src/kudu/consensus/consensus-test-util.h: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h@341 PS4, Line 341: class NoOpTestPeerProxy : public TestPeerProxy { > warning: parameter 'peer_pb' is unused [misc-unused-parameters] Done http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h@342 PS4, Line 342: public: > This could be dangerous. The ownership of the raw ptr has been transferred Existing test usage of Peers has been unsafe in similar ways (e.g. creating a proxy as a raw pointer and passing ownership to the Peers). Updated this impl a bit to at least pass ASAN, and added a comment that this can be unsafe and that users should be wary. If you feel strongly that this isn't sufficient for tests, I can go about making more drastic changes. http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc File src/kudu/consensus/consensus_peers-test.cc: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc@152 PS4, Line 152: , > This piece looks problematic even before this change. Done -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 11 Jun 2021 19:47:05 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Tidy Bot, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#5). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 9 files changed, 194 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/5 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h File src/kudu/consensus/consensus-test-util.h: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h@342 PS4, Line 342: *proxy = std::unique_ptr(proxy_); This could be dangerous. The ownership of the raw ptr has been transferred to the unique_ptr but the proxy_ still points to a location which can be deleted by the caller of NewProxy() holding the unique_ptr. It'd be better if proxy_ itself is a unique_ptr. http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc File src/kudu/consensus/consensus_peers-test.cc: http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc@152 PS4, Line 152: return proxy_ptr; > warning: Use of memory after it is freed [clang-analyzer-cplusplus.NewDelet This piece looks problematic even before this change. -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 11 Jun 2021 19:21:18 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc@508 PS2, Line 508: d::move(peer_proxy) > From the test code looks like this could be null. Do we need to check for n Done http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc File src/kudu/consensus/peer_manager.cc: http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc@77 PS2, Line 77: unique_ptr peer_proxy; : WARN_NOT_OK(peer_proxy_factory_->NewProxy(peer_pb, _proxy), : Substitute("Unable to create proxy for $0", peer_pb.permanent_uuid())); > Since the proxy factory is being supplied to NewRemotePeer, this could move I wanted to not be too invasive with the API for tests originally, but turns out this was only used in one test. -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 11 Jun 2021 18:30:37 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#4). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 9 files changed, 207 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/4 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h File src/kudu/consensus/peer_manager.h: http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h@60 PS2, Line 60: // proxies also re-attempt to re-resolve their addresses. > What happens currently when the peers are not retried? Ah I left this from before I started re-resolving. Peers are retried in this patch. -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 11 Jun 2021 17:43:47 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#3). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 8 files changed, 161 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/3 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc@508 PS2, Line 508: peer_proxy_factory_ >From the test code looks like this could be null. Do we need to check for >not-null ptr or return an error? http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc File src/kudu/consensus/peer_manager.cc: http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc@77 PS2, Line 77: unique_ptr peer_proxy; : WARN_NOT_OK(peer_proxy_factory_->NewProxy(peer_pb, _proxy), : Substitute("Unable to create proxy for $0", peer_pb.permanent_uuid())); Since the proxy factory is being supplied to NewRemotePeer, this could move to NewRemotePeer() itself utilizing the CreateIfNeeded() function and no need to supply peer_proxy to NewRemotePeer(). -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 11 Jun 2021 17:40:03 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 2: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h File src/kudu/consensus/peer_manager.h: http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h@60 PS2, Line 60: // TODO(awong): those peers should be retried later. What happens currently when the peers are not retried? -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 11 Jun 2021 12:55:42 + Gerrit-HasComments: Yes
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17585 ) Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Patch Set 2: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 11 Jun 2021 12:51:27 + Gerrit-HasComments: No
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Grant Henke has removed a vote on this change. Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17585 to look at the new patch set (#2). Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 8 files changed, 161 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/2 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17585 Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve peer .. [consensus] KUDU-2302: don't crash if new leader can't resolve peer When a tablet replica is elected leader, it constructs Peer objects for each replica in the Raft config for the sake of sending RPCs to each. If, during this construction, any remote peer cannot be reached for whatever reason, this would result in a crash. Rather than crashing, this patch allows us to start Peers without a proxy, and retries constructing the proxy the next time a proxy is required. Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 --- M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/peer_manager.cc M src/kudu/consensus/peer_manager.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/raft_consensus_election-itest.cc 8 files changed, 160 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/1 -- To view, visit http://gerrit.cloudera.org:8080/17585 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7 Gerrit-Change-Number: 17585 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong