David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/7221 )
Change subject: Simplify OpId/Timestamp assignment and make it atomic ...................................................................... Patch Set 8: (36 comments) http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h File src/kudu/consensus/consensus-test-util.h: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@64 PS17, Line 64: OpId* id = msg->mutable_id(); : id->set_term(term); > nit: usually it's not a good idea to have 'using' in header files, unless t Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@67 PS17, Line 67: > As I see, the pattern is always CreateDummyReplicateWithOpId().release(). I essentially split CreateDummyReplicate in 2 versions one that sets an opid and one that doesn't. I can refactor the pointer ownership in a follow up if you feel strongly, but would like to keep that out of this straightforward change. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@82 PS17, Line 82: inline RaftPeerPB FakeRaftPeerPB(const std::string& uuid) { > ditto: maybe, just return a raw pointer. same http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@115 PS17, Line 115: continue; > Instead, is it possible to rely on the mode of the object pointed to by the not sure I follow... http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@133 PS17, Line 133: _ > an extra space Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_meta.cc File src/kudu/consensus/consensus_meta.cc: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_meta.cc@153 PS17, Line 153: return leader_uuid_; > Maybe, just use DCHECK() here? As I understand, the committed configuratio tbh honest I don't see a problem in having this here and being extra defensive. the point of this patch is that the pending config is no longer assigned an op_id before being marked as pendign which must be set before marking it as commited. this is here to make sure we always enforce that last bit. added a comment to that regard. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.h File src/kudu/consensus/consensus_queue.h: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.h@231 PS17, Line 231: // it is alive and making progress. > update this to indicate it also does assignment of OpId and Timestamp Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.h@238 PS17, Line 238: bool* more_pending); : : // Called by the consensus implementation to update the queu > Is it possible just to rely on the status of the queue and have a single me the latter. I pondered the "uglyness" of having two methods/extra params versus just relying on the state of the queue, but ultimately decided on this to make it clear/explicit and have extra checks. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc@361 PS17, Line 361: // Maintain a thread-safe copy of necessary members. : OpId preceding_id; > per comment below about GetNextOpId having a side effect, here's an example Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc@421 PS17, Line 421: // We try to get the follower's next_index from our log. > nit: indent Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc@440 PS17, Line 440: << "while preparing peer request: " : << s.ToString() << ". Destination peer: " : > hrm, this seems a little surprising that GetNextOpId has this side effect r Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/quorum_util.cc File src/kudu/consensus/quorum_util.cc: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/quorum_util.cc@158 PS17, Line 158: > add: DCHECK(type == COMMITTED_CONFIG || type == PENDING_CONFIG) Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.h@449 PS17, Line 449: t, > must Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: PS17: > Does it make sense to add some specific test for this new behavior or it's for writes the behavior is undistinguishable. for non-writes the behavior is now "correct". there's a patch further down the series that makes sure this is actually what happens. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@648 PS17, Line 648: > or just: if (round->replicate_msg()->op_type() == CHANGE_CONFIG_OP) { ... } went with todds suggestion http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@648 PS17, Line 648: > although this is right, it looks confusing because round probably isn't a c Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@662 PS17, Line 662: // Check if the pending Raft config has an OpId and whether the index is less than the > Due to the method name, seems less surprising to add: DCHECK_EQ(CHANGE_CONF went with todds naming suggestion http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@673 PS17, Line 673: return Status::OK(); > nit: extra line Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@691 PS17, Line 691: h > nit: no need for inner parentheses Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2609 PS17, Line 2609: ClearLeaderU > this and new_config can be const refs new_config can't we set the opid below before we set it as the committed config http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2622 PS17, Line 2622: > This could be const reference as well. Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2623 PS17, Line 2623: bool RaftConsensus::HasLeaderUnlocked() const { > nit: add a space after 'and' Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2624 PS17, Line 2624: > nit: replace with colon? Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2633 PS17, Line 2633: const bool RaftConsensus::HasVotedCurrentTermUnlocked() const { : DCHECK(lock_.is_locked()); > it should still have an opid right? ie if we get to this path it should hav Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2844 PS17, Line 2844: : : > Since this is just for comparing configs in the if() closure below, move it Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager-test.cc File src/kudu/consensus/time_manager-test.cc: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager-test.cc@a176 PS17, Line 176: > Maybe, add an assertion instead: this is testing pinning safetime advancement, which no longer happens http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc File src/kudu/consensus/time_manager.cc: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc@110 PS17, Line 110: // : // The only opportunity for unrepeatable reads in this setup is if an old leader > hrm, is this method still even very useful? Maybe it would be clearer to ju would prefer to keep this, even if now it's just really checking the stage. as the docs say it's necessary for leader leases, also consensus doesn't directly interact with the clock anymore. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc@294 PS17, Line 294: : > this looks wrong. you are falling through to the next case which breaks any you're right that this looked confusing. fixed that. the point is to either get the current clock value or the last safe ts. hopefully it's clear now. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc@307 PS17, Line 307: > can you DCHECK_EQ(mode_, LEADER) here? Done http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc@267 PS8, Line 267: prepare_state_ = PREPARED; > it's sketching me out that the state transition to PREPARED happens under t Left a comment related to that in LOC 330. The point is that the transaction is now started after being submitted to consensus, and consensus replication and its callback will race with this thread, so we need to make sure we've started the txn before marking it PREPARED signaling it's ready to move to apply. Done. http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc@295 PS8, Line 295: // Register a dummy transaction with mvcc manager that we'll abort later. > Did you see this comment and the one above? Done http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc@295 PS8, Line 295: // Register a dummy transaction with mvcc manager that we'll abort later. > instead of using a dummy transaction, would it be possible to register it e hum, that might work, but for us to have different timestamps for the same txn we'd likely have to add a timestamp state or mode so that we don't believe the timestamp elsewhere if its not definitive. I'll just bite the bullet and add a pinning mechanism to mvcc (this was a lazy way of doing that). Update: Added a scoped clean time advancement pin mechanism. http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@336 PS17, Line 336: > s/rule/invariant/ Done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@360 PS17, Line 360: > s/constraint/above invariant/ Done mvcc can't know of all transactions, but if does crash if we try to either submit a transaction with a ts lower than clean time or if we commit a transaction with a ts lower than clean time http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@371 PS17, Line 371: $0)", s.ToStrin > This variable masks a variable of the same name in an outer scope ah, good catch. done http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@384 PS17, Line 384: { > can we still get to this case? I don't see why not -- To view, visit http://gerrit.cloudera.org:8080/7221 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2bedcc5c3708d06f48131f34ea14c835867b4800 Gerrit-Change-Number: 7221 Gerrit-PatchSet: 8 Gerrit-Owner: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Wed, 14 Mar 2018 21:40:07 +0000 Gerrit-HasComments: Yes
