Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting ......................................................................
Patch Set 4: (32 comments) http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 188: Status RaftConsensus::Init() { > I think the DCHECK on state here was useful, even if it needs to be broaden Done PS4, Line 1493: tombstone_last_logged_opid ? *tombstone_last_logged_opid : : GetLatestOpIdFromLog(); > could we simplify by always using 'max()' of the entry from the metadata an I don't think so; see my response to the same question in tablet_service.cc However, I do think we should use the latest opid in the log if Consensus is currently running, since it is guaranteed to be correct in that case. Line 1497: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned"; > any way to improve this with a bit more info? I'll add the tombstone_last_logged_opid here but I can't think of a reason to log more than that because we already log the details whenever we vote. Line 2004: state_ = new_state; > given this is in every case, move it down below the switch? Done Line 2513: Status RaftConsensus::CheckSafeToVoteUnlocked(optional<OpId> tombstone_last_logged_opid) const { > warning: the parameter 'tombstone_last_logged_opid' is copied for each invo Done PS4, Line 2515: (!tombstone_last_logged_opid > not 100% following why this portion of the condition is necessary if we are voting while tombstoned then we don't have to be running PS4, Line 2520: state_ == kShutdown > should we instead be checking the expected state (ie kStopped)? aren't ther We can tombstoned vote in kInit and kStopped state, even in kRunning actually due to a race between the check in TabletService and execution in RaftConsensus. It's not possible to expose RaftConsensus in kNew state due to code in TabletReplica however it would be better to encapsulate that guarantee within this class so I'll add a Create() factory method. Line 2634: return kMinimumTerm; > under what circumstances do we hit this case? It seems like cmeta_ should b Good catch; this was left over from a previous attempt on this patch and is not possible with the current rev. However I agree with you that a factory method for RaftConsensus would be preferable here. Line 2685: if (cmeta_) { > same removed http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: Line 237: const int64_t candidate_term, > warning: parameter 'candidate_term' is const-qualified in the function decl Done PS4, Line 239: boost::optional<bool> ignore_live_leader, : boost::optional<bool> is_pre_election, > what's the point of making these optional? don't they have well-defined def I thought it was nice to pass them as optional so that if you don't specify them you get the default behavior as specified in the proto file. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 38: using kudu::consensus::MakeOpId; > warning: using decl 'MakeOpId' is unused [misc-unused-using-decls] Done Line 39: using kudu::consensus::LeaderStepDownResponsePB; > warning: using decl 'LeaderStepDownResponsePB' is unused [misc-unused-using Done Line 41: using kudu::consensus::RaftConsensus; > warning: using decl 'RaftConsensus' is unused [misc-unused-using-decls] Done Line 42: using kudu::consensus::RaftPeerPB; > warning: using decl 'RaftPeerPB' is unused [misc-unused-using-decls] Done Line 47: using kudu::tablet::TabletStatePB; > warning: using decl 'TabletStatePB' is unused [misc-unused-using-decls] Done Line 48: using kudu::tserver::TabletServerErrorPB; > warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: Line 135: // Tablet states are sent in TabletReports and kept in TabletReplica. > what effect will these changes have if there is a rolling upgrade with an o Based on a grep through the code, the master only knows about tablet::RUNNING and tablet::FAILED so it shouldn't even notice. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: Line 216: void SetPreFlushCallback(StatusClosure callback) { pre_flush_callback_ = callback; } > warning: parameter 'callback' is passed by value and only copied once; cons Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 133: set_state(INITIALIZED); > maybe moot based on the suggestion of a factory method, but if a method lik Done Line 247: // TODO: KUDU-183: Keep track of the pending tasks and send an "abort" message. > warning: missing username/bug in TODO [google-readability-todo] Done Line 272: // Release mem tracker resources. > is this comment stale? Maybe stale? Either way it seems pretty irrelevant. I'll remove it. PS4, Line 407: LOG_WITH_PREFIX(INFO) << "Fetched status for addr " << &state_ : << ": " << SecureDebugString(*status_pb_out); > leftover debug print? yep. removed Line 427: CHECK_EQ(INITIALIZED, state_); > redundant with the DCHECK in set_state Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS4, Line 75: TabletReplica(scoped_refptr<TabletMetadata> meta, : scoped_refptr<consensus::ConsensusMetadataManager> cmeta_manager, : consensus::RaftPeerPB local_peer_pb, : ThreadPool* apply_pool, : Callback<void(const std::string& reason)> mark_dirty_clbk); : : // Initializes RaftConsensus. : Status Init(ThreadPool* raft_pool); > instead of having this split "construct and then init" how about a factory The problem with a monolithic construction and initialization of a TabletReplica is that it prevents replicas in certain failure states from being visible to the master or the web UI. For example, replicas that fail to Init() at startup time due to a missing or corrupted cmeta should still be reported on the web UI and to the master -- at least, I'm pretty sure that's what we want. I suppose an alternative would be to automatically tombstone a tablet that failed to start up but that seems awfully aggressive, especially in the case where there is only 1 replica per tablet. PS4, Line 96: tombstoned > voting in general, right? ie what would happen if you called Stop() but the The log would be closed so we wouldn't be able to vote on a stopped-but-not-tombstoned replica. PS4, Line 169: // If any peers in the consensus configuration lack permanent uuids, get them via an : // RPC call and update. : // TODO: move this to raft_consensus.h. : Status UpdatePermanentUuids(); > can you remove this while you're here? seems to be dead Done Line 277: // Wait until the TabletReplica is fully in STOPPED state. > or SHUTDOWN Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: Line 136: using kudu::tablet::TabletStatusPB; > warning: using decl 'TabletStatusPB' is unused [misc-unused-using-decls] Done Line 947: tablet::TabletDataState data_state = replica->tablet_metadata()->tablet_data_state(); > aren't all these state checks racy? after you grab this here, there's nothi Yeah, it's racy, but it's safe because we're checking after we have a reference to the replica, and the replica can only have one consensus object, which will be shut down if the tablet starts copying again, for example. I'll add a block comment here to explain; LMK what you think. I plan to move the last-logged opid from TabletMetadata to ConsensusMetadata in a follow-up change. PS4, Line 966: if (OpIdBiggerThan(tmp_last_logged_opid, MinimumOpId())) { > when would this equal MinimumOpId? if it's never been tombstoned? > when would this equal MinimumOpId? if it's never been tombstoned? That's right. > Would it be safe to just pass this in always as a non-optional, and then in > RequestVote always just take max(log_->GetLastLoggedOpId(), > tombstone_last_logged_opid())? I don't think so. The case I am worried about is if a tablet fails to start up, so it doesn't have a log running, and then it gets tombstoned by the master. In that case we will not currently store the last-logged opid at DeleteTablet() time. So if we don't know what our last-logged opid was (even though we did actually have one) then I don't think it's safe for us to simply start voting based on a last-logged opid of (0,0). Additionally, just using max() seems wrong in case there was a log truncation involved. We want to use our latest opid state. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tserver/tserver-path-handlers.cc File src/kudu/tserver/tserver-path-handlers.cc: Line 460: if (!consensus || !consensus->IsRunning()) { > instead of this, can you make Consensus::DumpStatusHtml dump something reas good catch -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
