Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11251 )

Change subject: [WIP] KUDU-2245 Graceful leadership transfer
......................................................................


Patch Set 4:

(14 comments)

Just skimming through.  I'm going to take another look tomorrow.

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto
File src/kudu/consensus/consensus.proto:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto@462
PS2, Line 462: Do not set in ABRUPT mode.
nit: Does it mean it's not effective in ABRUPT mode (i.e. ignored)?

Maybe, change to

'Ignored in ABRUPT mode, do not set.'


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h
File src/kudu/consensus/consensus_peers.h:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@82
PS2, Line 82: does not count as a pending request
Not sure I understand this.  Did you mean pending Raft config change?


http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc
File src/kudu/consensus/consensus_peers.cc:

http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc@293
PS2, Line 293: &
nit: is it even needed?  I would expect we don't need to capture anything for 
that.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1040
PS4, Line 1040: TrackedPeer* peer
Maybe, a const reference instead of a pointer?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1455
PS4, Line 1455: Unable to notify RaftConsensus of peer to promote.
This messages seems to be outdated.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h
File src/kudu/consensus/peer_manager.h:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h@66
PS4, Line 66: void
Why not to return Status ?  That signature would be more robust, and callers, 
if needed, might explicitly ignore the result status.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@608
PS4, Line 608: BeginLeaderTransferPeriodUnlocked(new_leader_uuid);
Is it worth verifying that new_leader_uuid != peer_uuid() prior to kicking off 
the process?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@984
PS4, Line 984:     LOG_WITH_PREFIX_UNLOCKED(INFO) << "Not signalling peer " << 
peer_uuid
             :                                    << "to start an election: 
it's not a voter "
             :                                    << "in the active config.";
Add 'return;' after logging ?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@990
PS4, Line 990: peer_manager_->StartElection(peer_uuid);
What if StartElection() returns non-OK status?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

PS4:
As discussed offline, it's worth adding some negative testcases as well:

* attempt transferring leadership to a replica that has just been evicted from 
the config
* attempt transferring leadership to non-voter replica
* attempt transferring leadership to a replica which fell behind WAL segment GC 
threshold

Also, what it the behavior when asking a leader during transfer period to 
abruptly step down?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc@1282
PS4, Line 1282: ASSERT_TRUE(s.ok()) << s.ToString()
Here and elsewhere in this patch: if something goes wrong, it helps a lot to 
have the stderr printed out as well.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/kudu-admin-test.cc@1288
PS4, Line 1288:     ASSERT_OK(GetLeaderReplicaWithRetries(tablet_id_, 
&master_observed_leader));
              :     ASSERT_TRUE(ContainsKey(possible_new_leaders, 
master_observed_leader->uuid()));
Shouldn't RunKuduTool be under this ASSERT_EVENTUALLY as well?  Don't you 
expect it to be flaky a bit otherwise?


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h
File src/kudu/tools/tool_replica_util.h:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h@57
PS4, Line 57: that the replica with UUID 'leader_uuid' step down.
nit: the replica with UUID 'leader_uuid' to step down.


http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc@1146
PS4, Line 1146: uuid
nit: UUID



--
To view, visit http://gerrit.cloudera.org:8080/11251
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083
Gerrit-Change-Number: 11251
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Fengling Wang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 29 Aug 2018 05:09:18 +0000
Gerrit-HasComments: Yes

Reply via email to