Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7439 )
Change subject: mvcc: allow tablet shutdown without completing txs ...................................................................... Patch Set 15: (3 comments) > Patch Set 15: > > > Patch Set 15: > > > > (1 comment) > > The behavior I saw before was that when I stopped MVCC on a leader and > continued to write to that leader, something would successfully complete and > I would see a mismatch in attempt numbers, resulting in a check failure. > > I think this is being caused by the following race with three nodes A*, B, > and C: > 1. A* gets a client request for rpc0 > 2. rpc0 gets replicated and the nodes begin applying with attempt 0 > 3. A* fails. The client will try different nodes, but if A* remains the > leader, it will eventually cycle back to A* and try again, so it replicates > again. > 4. B and C begin to apply rpc0 attempt 1, registering it with the result > tracker and preempting the follower transaction rpc0 attempt 0 > 5. B and C finish Applying rpc0 attempt 0 and fail because attempt 0 is no > longer the driver > > If I put a sleep in ResultTracker::RecordCompletionAndRespond(), the above > scenario is triggered consistently. We do this preemption to ensure that we > don't respond to a leader transaction that may no longer be valid, but I > don't see where we might be aborting this follower transaction. > > Currently dong more testing around this. We just had another in-person chat that I thought I'd summarize: This race is dependent on the fact that we're aborting _after_ replicating to followers. This may not be possible on the master branch. So A* is aborting during the leader's Apply phase, causing the client to retry and cycle through replicas searching for the leader (since before we would abort if the replica were not a leader). Eventually it would reach the same replica and successfully replicate the same RPC _again_. The followers OTOH are on their merry way Applying this successfully-replicated attempt0. When the RPC gets replicated the second time, followers preempt the current attempt and eventually crash because of it. Getting back to the question of where to abort, I think Init() or Prepare() would do the trick, I would think the earlier the better. Either _should_ prevent this, but I don't see the point in even submitting to the Prepare pool. http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/mvcc.h File src/kudu/tablet/mvcc.h: http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/mvcc.h@277 PS15, Line 277: SetShuttingDown > since there is no "action" this entity is doing maybe "closed" with be a be Done http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.h@405 PS15, Line 405: // Stops the tablet. Once called, currently-Applying operations should : // terminate early, and further transactions should be aborted. : void Stop(); : : bool Stopped() const { : return mvcc_.IsShuttingDown(); : } > I don't think we need this in tablet, since it's a plain proxy to mvcc whic I added this in per Todd's request. You're right that we don't _need_ this in tablet, but it's a cleaner API than reaching into MVCC directly. I suppose the argument for it would be more compelling if there were more things that needed to be stopped on Stop(). http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.cc@438 PS15, Line 438: gscoped_ptr<ScopedTransaction> mvcc_tx; : DCHECK(tx_state->has_timestamp()); : mvcc_tx.reset(new ScopedTransaction(&mvcc_, tx_state->timestamp())); : tx_state->SetMvccTx(std::move(mvcc_tx)); > I though we had discussed having transactions fail on the prepare phase if My thinking was that Prepare could only be run if it were first Inited. There may be some scenario between Init() and Prepare() the disk fails, but in that case, once the transaction is replicated, this should fail in the Apply phase. I.e. nothing should really change in the Prepare phase (yet) because the Prepare phase only touches the WAL dir. -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-Change-Number: 7439 Gerrit-PatchSet: 15 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 03 Oct 2017 21:08:03 +0000 Gerrit-HasComments: Yes
