David Ribeiro Alves has posted comments on this change.

Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the 
consensus thread
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5294/4//COMMIT_MSG
Commit Message:

PS4, Line 12: which we know
> Not sure this is true, see comments inline
I meant "replica" transactions, will clarify


PS4, Line 18: consensus thread
> Is the consensus thread the prepare thread? Not sure which pool we are talk
by "consensus thread" I mean the thread that is updating consensus and that is 
triggering replica transactions. This doesn't change the "leader transactions" 
path in terms of Start()


http://gerrit.cloudera.org:8080/#/c/5294/3/src/kudu/tablet/transactions/transaction_driver.cc
File src/kudu/tablet/transactions/transaction_driver.cc:

PS3, Line 431:   if (PREDICT_FALSE(replication_state_copy == 
REPLICATION_FAILED)) {
             :     prepare_latch_.Wait();
             :   }
> at first glance this latch thing seems a bit sketchy, but I need to spend a
I tried a couple of different things before I resorted to the latch. The main 
point here is that we need to make sure that the transaction is no longer on 
mvcc before returning from this method.

Because of the race between the Prepare(), running on the prepare pool, and 
this method, running on the consensus update thread, to call ApplyAsync() it's 
hard to have this guarantee.

I tried different ways of fudging with the vars but ultimately could not find a 
non-racy way to make sure that, If we changed the state to REPLICATION_FAILED 
above in time for the prepare thread to see it (and move on to apply, which 
will actually abort), making sure that it completes before returning from this 
method.

Note that we only wait on the latch for replica transactions that have failed.


http://gerrit.cloudera.org:8080/#/c/5294/4/src/kudu/tablet/transactions/transaction_driver.cc
File src/kudu/tablet/transactions/transaction_driver.cc:

Line 102:       prepare_latch_(0) {
> Since a given transaction prepares only once, why not initialize the prepar
we need to only set the latch if the transaction actually Init'ed want to be 
double sure that it's submitted to prepare before we risk waiting.


PS4, Line 118: in the consensus thread
> I'm confused by this comment. I'm pretty sure this is executed on a TabletS
I meant by the thread that is updating consensus. note that this is the 
"REPLICA" path. Will make this clearer in the comments.


PS4, Line 148: innited
> typo: inited
Done


Line 391:   scoped_refptr<TransactionDriver> ref(this);
> This is pretty smelly. Can we not ensure the caller has a ref, instead? Als
the object won't be destroyed before we change the state below.
I agree that lifetime of this class is screwed up, but we seem to resort to 
increasing the refcount in several other places and would like not to put 
sorting that out on the critical path.


Line 435: 
> nit: spurious line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5294
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-HasComments: Yes

Reply via email to