Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11251 )
Change subject: [WIP] KUDU-2245 Graceful leadership transfer ...................................................................... Patch Set 4: (17 comments) I think I hit most of the small stuff. I need to come back later and add some more tests. http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto File src/kudu/consensus/consensus.proto: http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus.proto@462 PS2, Line 462: Do not set in ABRUPT mode. > nit: Does it mean it's not effective in ABRUPT mode (i.e. ignored)? No. It means the request is rejected as invalid. http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h File src/kudu/consensus/consensus_peers.h: http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@82 PS2, Line 82: does not count as a pending request > Not sure I understand this. Did you mean pending Raft config change? This class normally can only handle one request outstanding at a time. I meant that the StartElection does not count as that one request. I updated the comment to make it clearer. http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@223 PS2, Line 223: virtual void StartTabletCopy(const StartTabletCopyRequestPB* request, > warning: parameter 'request' is unused [misc-unused-parameters] Done http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.h@224 PS2, Line 224: StartTabletCopyResponsePB* response, > warning: parameter 'response' is unused [misc-unused-parameters] Done http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc@293 PS2, Line 293: & > nit: is it even needed? I would expect we don't need to capture anything f Done http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_peers.cc@501 PS2, Line 501: const rpc::ResponseCallback& callback) { > warning: parameter 'callback' is unused [misc-unused-parameters] Done http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_queue.h File src/kudu/consensus/consensus_queue.h: http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/consensus/consensus_queue.h@556 PS2, Line 556: mutable simple_spinlock queue_lock_; // TODO: rename > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1040 PS4, Line 1040: TrackedPeer* peer > Maybe, a const reference instead of a pointer? Done http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/consensus_queue.cc@1455 PS4, Line 1455: Unable to notify RaftConsensus of peer to promote. > This messages seems to be outdated. Done http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h File src/kudu/consensus/peer_manager.h: http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/peer_manager.h@66 PS4, Line 66: void > Why not to return Status ? That signature would be more robust, and caller Because the current caller won't do anything useful with the status. It can always be changed in the future if that changes. The method itself logs its reason for failure if it fails. http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@608 PS4, Line 608: BeginLeaderTransferPeriodUnlocked(new_leader_uuid); > Is it worth verifying that new_leader_uuid != peer_uuid() prior to kicking Yes. We also verify that on the client side and short-circuit in that case. http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@984 PS4, Line 984: LOG_WITH_PREFIX_UNLOCKED(INFO) << "Not signalling peer " << peer_uuid : << "to start an election: it's not a voter " : << "in the active config."; > Add 'return;' after logging ? Done http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/consensus/raft_consensus.cc@990 PS4, Line 990: peer_manager_->StartElection(peer_uuid); > What if StartElection() returns non-OK status? Leadership transfer failed. StartElection will log the reason why. This is already after we responded to the RPC (transfer is async) so we can't send an error back. http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/tools/kudu-admin-test.cc@1205 PS2, Line 1205: tablet_id_, /*index=*/1)); > warning: argument name 'index' in comment does not match parameter name 'mi Done http://gerrit.cloudera.org:8080/#/c/11251/2/src/kudu/tools/kudu-admin-test.cc@1252 PS2, Line 1252: tablet_id_, /*index=*/1)); > warning: argument name 'index' in comment does not match parameter name 'mi Done http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h File src/kudu/tools/tool_replica_util.h: http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tools/tool_replica_util.h@57 PS4, Line 57: that the replica with UUID 'leader_uuid' step down. > nit: the replica with UUID 'leader_uuid' to step down. Hrm, my phrasing sounds more grammatical to my ear, but I don't have a precise argument for why. http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/11251/4/src/kudu/tserver/tablet_service.cc@1146 PS4, Line 1146: uuid > nit: UUID Done -- To view, visit http://gerrit.cloudera.org:8080/11251 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic97343af9eb349556424c999799ed5e2941f0083 Gerrit-Change-Number: 11251 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Fri, 31 Aug 2018 18:30:52 +0000 Gerrit-HasComments: Yes