Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7439 )

Change subject: mvcc: allow tablet shutdown without completing txs
......................................................................


Patch Set 15:

(3 comments)

> Patch Set 15:
>
> > Patch Set 15:
> >
> > (1 comment)
>
> The behavior I saw before was that when I stopped MVCC on a leader and 
> continued to write to that leader, something would successfully complete and 
> I would see a mismatch in attempt numbers, resulting in a check failure.
>
> I think this is being caused by the following race with three nodes A*, B, 
> and C:
> 1. A* gets a client request for rpc0
> 2. rpc0 gets replicated and the nodes begin applying with attempt 0
> 3. A* fails. The client will try different nodes, but if A* remains the 
> leader, it will eventually cycle back to A* and try again, so it replicates 
> again.
> 4. B and C begin to apply rpc0 attempt 1, registering it with the result 
> tracker and preempting the follower transaction rpc0 attempt 0
> 5. B and C finish Applying rpc0 attempt 0 and fail because attempt 0 is no 
> longer the driver
>
> If I put a sleep in ResultTracker::RecordCompletionAndRespond(), the above 
> scenario is triggered consistently. We do this preemption to ensure that we 
> don't respond to a leader transaction that may no longer be valid, but I 
> don't see where we might be aborting this follower transaction.
>
> Currently dong more testing around this.

We just had another in-person chat that I thought I'd summarize:
This race is dependent on the fact that we're aborting _after_ replicating to 
followers. This may not be possible on the master branch.

So A* is aborting during the leader's Apply phase, causing the client to retry 
and cycle through replicas searching for the leader (since before we would 
abort if the replica were not a leader). Eventually it would reach the same 
replica and successfully replicate the same RPC _again_.

The followers OTOH are on their merry way Applying this successfully-replicated 
attempt0. When the RPC gets replicated the second time, followers preempt the 
current attempt and eventually crash because of it.

Getting back to the question of where to abort, I think Init() or Prepare() 
would do the trick, I would think the earlier the better. Either _should_ 
prevent this, but I don't see the point in even submitting to the Prepare pool.

http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/mvcc.h
File src/kudu/tablet/mvcc.h:

http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/mvcc.h@277
PS15, Line 277: SetShuttingDown
> since there is no "action" this entity is doing maybe "closed" with be a be
Done


http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.h@405
PS15, Line 405:   // Stops the tablet. Once called, currently-Applying 
operations should
              :   // terminate early, and further transactions should be 
aborted.
              :   void Stop();
              :
              :   bool Stopped() const {
              :     return mvcc_.IsShuttingDown();
              :   }
> I don't think we need this in tablet, since it's a plain proxy to mvcc whic
I added this in per Todd's request. You're right that we don't _need_ this in 
tablet, but it's a cleaner API than reaching into MVCC directly.

I suppose the argument for it would be more compelling if there were more 
things that needed to be stopped on Stop().


http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/7439/15/src/kudu/tablet/tablet.cc@438
PS15, Line 438:   gscoped_ptr<ScopedTransaction> mvcc_tx;
              :   DCHECK(tx_state->has_timestamp());
              :   mvcc_tx.reset(new ScopedTransaction(&mvcc_, 
tx_state->timestamp()));
              :   tx_state->SetMvccTx(std::move(mvcc_tx));
> I though we had discussed having transactions fail on the prepare phase if
My thinking was that Prepare could only be run if it were first Inited. There 
may be some scenario between Init() and Prepare() the disk fails, but in that 
case, once the transaction is replicated, this should fail in the Apply phase. 
I.e. nothing should really change in the Prepare phase (yet) because the 
Prepare phase only touches the WAL dir.



--
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-Change-Number: 7439
Gerrit-PatchSet: 15
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 03 Oct 2017 21:08:03 +0000
Gerrit-HasComments: Yes

Reply via email to