David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread ......................................................................
Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/5294/4//COMMIT_MSG Commit Message: PS4, Line 12: which we know > Not sure this is true, see comments inline I meant "replica" transactions, will clarify PS4, Line 18: consensus thread > Is the consensus thread the prepare thread? Not sure which pool we are talk by "consensus thread" I mean the thread that is updating consensus and that is triggering replica transactions. This doesn't change the "leader transactions" path in terms of Start() http://gerrit.cloudera.org:8080/#/c/5294/3/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: PS3, Line 431: if (PREDICT_FALSE(replication_state_copy == REPLICATION_FAILED)) { : prepare_latch_.Wait(); : } > at first glance this latch thing seems a bit sketchy, but I need to spend a I tried a couple of different things before I resorted to the latch. The main point here is that we need to make sure that the transaction is no longer on mvcc before returning from this method. Because of the race between the Prepare(), running on the prepare pool, and this method, running on the consensus update thread, to call ApplyAsync() it's hard to have this guarantee. I tried different ways of fudging with the vars but ultimately could not find a non-racy way to make sure that, If we changed the state to REPLICATION_FAILED above in time for the prepare thread to see it (and move on to apply, which will actually abort), making sure that it completes before returning from this method. Note that we only wait on the latch for replica transactions that have failed. http://gerrit.cloudera.org:8080/#/c/5294/4/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 102: prepare_latch_(0) { > Since a given transaction prepares only once, why not initialize the prepar we need to only set the latch if the transaction actually Init'ed want to be double sure that it's submitted to prepare before we risk waiting. PS4, Line 118: in the consensus thread > I'm confused by this comment. I'm pretty sure this is executed on a TabletS I meant by the thread that is updating consensus. note that this is the "REPLICA" path. Will make this clearer in the comments. PS4, Line 148: innited > typo: inited Done Line 391: scoped_refptr<TransactionDriver> ref(this); > This is pretty smelly. Can we not ensure the caller has a ref, instead? Als the object won't be destroyed before we change the state below. I agree that lifetime of this class is screwed up, but we seem to resort to increasing the refcount in several other places and would like not to put sorting that out on the critical path. Line 435: > nit: spurious line Done -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-HasComments: Yes