[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217
PS8, Line 217:   bool has_failed_peer = false;
> I appreciate you updated this scenario to make it reliable and robust!
Thanks for the suggestion!



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 12 Jun 2021 18:14:54 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Reviewed-on: http://gerrit.cloudera.org:8080/17585
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
11 files changed, 233 insertions(+), 43 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 11: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217
PS8, Line 217:   bool has_failed_peer = false;
> (the tuning did not work well, as it turns out)
I appreciate you updated this scenario to make it reliable and robust!



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 12 Jun 2021 17:29:17 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#11).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
11 files changed, 233 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/11
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217
PS8, Line 217:   for (auto& ts : tablet_servers_) {
> Yep, I spent some time tuning some flag values, but this is a better approa
(the tuning did not work well, as it turns out)



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 12 Jun 2021 06:24:09 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217
PS8, Line 217:   for (auto& ts : tablet_servers_) {
> It seems this fails rather often, even in RELEASE build:
Yep, I spent some time tuning some flag values, but this is a better approach.



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 12 Jun 2021 06:23:37 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#10).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
11 files changed, 233 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/10
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-12 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#9).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/cluster_itest_util.cc
M src/kudu/integration-tests/cluster_itest_util.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
11 files changed, 232 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/9
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 8: Code-Review+1

(3 comments)

Overall looks good to me, but it seems the rate of flakiness for the newly 
added test is a bit high.

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167
PS4, Line 167: TEST_F(RaftConsensusElectionITest, 
TestNewLeaderCantResolvePeers) {
> Tried it and the test passes.
Cool, thanks a lot for verifying that!


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247
PS4, Line 247: });
 :   };
 :   NO_FATALS(wait_for_some_elections());
 :   NO_FATALS(cluster_->AssertNoCrashes());
 :
 :   // Stop injecting latency so we can get a stable leader.
> Done
Thank you!


http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/8/src/kudu/integration-tests/raft_consensus_election-itest.cc@217
PS8, Line 217:   ASSERT_LT(initial_cstate.current_term() + 10, 
cstate.current_term());
It seems this fails rather often, even in RELEASE build:

src/kudu/integration-tests/raft_consensus_election-itest.cc:218
Expected: (initial_cstate.current_term() + 10) < (cstate.current_term()), 
actual: 30 vs 22

Maybe, it's worth switch this scenario into manual election mode?



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 12 Jun 2021 01:55:06 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#8).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
9 files changed, 241 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#7).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
9 files changed, 241 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167
PS4, Line 167: TEST_F(RaftConsensusElectionITest, 
TestNewLeaderCantResolvePeers) {
> I'm curious: in the scope of this test, is it important or not to have conn
Tried it and the test passes.


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247
PS4, Line 247:   if (peer.health_report().overall_health() == 
consensus::HealthReportPB::FAILED) {
 : has_failed_peer = true;
 : break;
 :   }
 : }
 :   }
> In addition to this happy-ending, could you add a scenario when there is an
Done



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 11 Jun 2021 23:54:33 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#6).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
9 files changed, 247 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 5: Code-Review+1

(2 comments)

Overall looks good, but it would be great to add a test scenario which would 
exercise one other realistic scenario when the names many other tablet servers 
in the cluster are still resolvable, so the system replaces the failed peer 
with a new one for the time of the DNS resolver's malfunction for one 
particular kudu-tserver node.

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc
File src/kudu/integration-tests/raft_consensus_election-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@167
PS4, Line 167: TEST_F(RaftConsensusElectionITest, 
TestNewLeaderCantResolvePeers) {
I'm curious: in the scope of this test, is it important or not to have 
connections already established between Raft peers?  Will anything change if 
adding --rpc_reopen_outbound_connections=true in this scenario?


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/integration-tests/raft_consensus_election-itest.cc@247
PS4, Line 247:   // Once we stop failing DNS resolution, the peer should become 
healthy.
 :   for (const auto& ts : tablet_servers_) {
 : 
ASSERT_OK(cluster_->SetFlag(cluster_->tablet_server_by_uuid(ts.first),
 :   "fail_dns_resolution", "false"));
 :   }
 :   NO_FATALS(check_for_failed_peer(/*expect_failed*/false));
In addition to this happy-ending, could you add a scenario when there is an 
extra tablet servers to spawn a replacement replica?  I guess the expected 
result would be having the peer with failed DNS name replaced, and it's 
important to make sure everything works as expected in that case.



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 11 Jun 2021 19:52:17 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h
File src/kudu/consensus/consensus-test-util.h:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h@341
PS4, Line 341: class NoOpTestPeerProxy : public TestPeerProxy {
> warning: parameter 'peer_pb' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h@342
PS4, Line 342:  public:
> This could be dangerous. The ownership of the raw ptr has been transferred
Existing test usage of Peers has been unsafe in similar ways (e.g. creating a 
proxy as a raw pointer and passing ownership to the Peers). Updated this impl a 
bit to at least pass ASAN, and added a comment that this can be unsafe and that 
users should be wary. If you feel strongly that this isn't sufficient for 
tests, I can go about making more drastic changes.


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc
File src/kudu/consensus/consensus_peers-test.cc:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc@152
PS4, Line 152: ,
> This piece looks problematic even before this change.
Done



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 11 Jun 2021 19:47:05 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Tidy Bot, Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#5).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
9 files changed, 194 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/5
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h
File src/kudu/consensus/consensus-test-util.h:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus-test-util.h@342
PS4, Line 342: *proxy = std::unique_ptr(proxy_);
This could be dangerous. The ownership of the raw ptr has been transferred to 
the unique_ptr but the proxy_ still points to a location which can be deleted 
by the caller of NewProxy() holding the unique_ptr.

It'd be better if proxy_ itself is a unique_ptr.


http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc
File src/kudu/consensus/consensus_peers-test.cc:

http://gerrit.cloudera.org:8080/#/c/17585/4/src/kudu/consensus/consensus_peers-test.cc@152
PS4, Line 152: return proxy_ptr;
> warning: Use of memory after it is freed [clang-analyzer-cplusplus.NewDelet
This piece looks problematic even before this change.



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 11 Jun 2021 19:21:18 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc@508
PS2, Line 508: d::move(peer_proxy)
> From the test code looks like this could be null. Do we need to check for n
Done


http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc
File src/kudu/consensus/peer_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc@77
PS2, Line 77: unique_ptr peer_proxy;
: WARN_NOT_OK(peer_proxy_factory_->NewProxy(peer_pb, 
_proxy),
: Substitute("Unable to create proxy for $0", 
peer_pb.permanent_uuid()));
> Since the proxy factory is being supplied to NewRemotePeer, this could move
I wanted to not be too invasive with the API for tests originally, but turns 
out this was only used in one test.



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Jun 2021 18:30:37 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#4).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus-test-util.h
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
9 files changed, 207 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/4
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h
File src/kudu/consensus/peer_manager.h:

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h@60
PS2, Line 60:   // proxies also re-attempt to re-resolve their addresses.
> What happens currently when the peers are not retried?
Ah I left this from before I started re-resolving. Peers are retried in this 
patch.



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Jun 2021 17:43:47 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins, Grant Henke, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#3).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
8 files changed, 161 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/3
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Bankim Bhavsar (Code Review)
Bankim Bhavsar has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/consensus_peers.cc@508
PS2, Line 508: peer_proxy_factory_
>From the test code looks like this could be null. Do we need to check for 
>not-null ptr or return an error?


http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc
File src/kudu/consensus/peer_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.cc@77
PS2, Line 77: unique_ptr peer_proxy;
: WARN_NOT_OK(peer_proxy_factory_->NewProxy(peer_pb, 
_proxy),
: Substitute("Unable to create proxy for $0", 
peer_pb.permanent_uuid()));
Since the proxy factory is being supplied to NewRemotePeer, this could move to 
NewRemotePeer() itself utilizing the CreateIfNeeded() function and no need to 
supply peer_proxy to NewRemotePeer().



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Jun 2021 17:40:03 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 2: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h
File src/kudu/consensus/peer_manager.h:

http://gerrit.cloudera.org:8080/#/c/17585/2/src/kudu/consensus/peer_manager.h@60
PS2, Line 60:   // TODO(awong): those peers should be retried later.
What happens currently when the peers are not retried?



--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Jun 2021 12:55:42 +
Gerrit-HasComments: Yes


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17585 )

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Patch Set 2: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 11 Jun 2021 12:51:27 +
Gerrit-HasComments: No


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Grant Henke (Code Review)
Grant Henke has removed a vote on this change.

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17585

to look at the new patch set (#2).

Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
8 files changed, 161 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/2
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [consensus] KUDU-2302: don't crash if new leader can't resolve peer

2021-06-11 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/17585


Change subject: [consensus] KUDU-2302: don't crash if new leader can't resolve 
peer
..

[consensus] KUDU-2302: don't crash if new leader can't resolve peer

When a tablet replica is elected leader, it constructs Peer objects for
each replica in the Raft config for the sake of sending RPCs to each.
If, during this construction, any remote peer cannot be reached for
whatever reason, this would result in a crash.

Rather than crashing, this patch allows us to start Peers without a
proxy, and retries constructing the proxy the next time a proxy is
required.

Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
---
M src/kudu/consensus/consensus_peers-test.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_peers.h
M src/kudu/consensus/peer_manager.cc
M src/kudu/consensus/peer_manager.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/integration-tests/raft_consensus_election-itest.cc
8 files changed, 160 insertions(+), 19 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/17585/1
--
To view, visit http://gerrit.cloudera.org:8080/17585
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I22d870ecc526fa47b97f6856c3b023bc1ec029c7
Gerrit-Change-Number: 17585
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong