[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5294 to look at the new patch set (#8). Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread In order for a consensus replica to safely move "safe" time, it needs to know that all transactions that come before it have at least started. One way to do this is to call transaction_->Start() on transaction_driver_->Init() which we know is called by the thread updating consensus for non-leader transctions. This should be ok, as the leader has already correctly serialized the transactions. However, if transactions are now started on Init() then, in the case they abort (i.e. when transaction_driver_->ReplicationFinished() is called with a non-ok status, again done by the thread updating consensus), we now must make sure that the transaction is actually removed from mvcc before that method returns. Otherwise consensus might call Start() on another transaction with the same timestamp before the failed transaction is removed from mvcc. To do this, this patch adds a latch to make sure that, if replication failed, the prepare phase has completed before returning from ReplicationFinished(). Finally this patch adds a couple of statements to increase the driver's refcount so that it survives until the latch may be called. Ran this in dist-test, slow mode, asan, 1000 times. No failures. Results: exactly_once_writes-itest: http://dist-test.cloudera.org//job?job_id=david.alves.1480659005.9899 raft_consensus-itest : http://dist-test.cloudera.org//job?job_id=david.alves.1480659414.10085 Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c --- M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h 2 files changed, 58 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/5294/8 -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5294 to look at the new patch set (#7). Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread In order for a consensus replica to safely move "safe" time, it needs to know that all transactions that come before it have at least started. One way to do this is to call transaction_->Start() on transaction_driver_->Init() which we know is called by the thread updating consensus, for non-leader transctions. This should be ok as the leader has already correctly serialized the transactions. However, if transactions are now started on Init() then, in the case they abort (i.e. when transaction_driver_->ReplicationFinished() is called with a non-ok status, again done by the thread updating consensus), we now must make sure that the transaction is actually removed from mvcc before that method returns. Otherwise consensus might call Start() on another transaction with the same timestamp before the failed transaction is removed from mvcc. To do this, this patch adds a latch to make sure that, if replication failed, the prepare phase has completed before returning from ReplicationFinished(). Finally this patch adds a couple of statements to increase the driver's refcount so that it survives until the latch may be called. Ran this in dist-test, slow mode, asan, 1000 times. No failures. Results: exactly_once_writes-itest: http://dist-test.cloudera.org//job?job_id=david.alves.1480659005.9899 raft_consensus-itest : http://dist-test.cloudera.org//job?job_id=david.alves.1480659414.10085 Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c --- M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h 2 files changed, 58 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/5294/7 -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/5294/6/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 148: RETURN_NOT_OK(txn_tracker_->Add(this)); > if this fails, then it seems like transaction_->Start() would have been cal you're right, I will add a test to make sure this doesn't happen. maybe even solve KUDU-625 PS6, Line 421: If replication has failed we need to make sure the transaction is removed from mvcc before we : // return from this method, > seems like it would be cleaner to just be using some kind of locking here.. yeah, that makes sense to me. I think this class is due for a new overall. PS6, Line 429: replication_state_copy > why not just read 'status' directly? Done -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/5294/6/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 148: RETURN_NOT_OK(txn_tracker_->Add(this)); if this fails, then it seems like transaction_->Start() would have been called, and then we're relying on the destructor here to abort the mvcc transaction? is that path well covered? not sure if we have a test which makes a replica transaction tracker hit its memory limit, and KUDU-625 makes me nervous PS6, Line 421: If replication has failed we need to make sure the transaction is removed from mvcc before we : // return from this method, seems like it would be cleaner to just be using some kind of locking here... I don't want to block progress on this if we have stress tests showing that it works OK, but let's think more about how to restructure the different prepare/replicate/etc interleavings to be more comprehensible. PS6, Line 429: replication_state_copy why not just read 'status' directly? -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5294/4/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 102: prepare_latch_(0) { > we need to only set the latch if the transaction actually Init'ed want to b Actually I changed my mind. I think you're right, if submitting to prepare fails then calling ReplicationFinished() is wrong. Will start the latch at 1 -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5294 to look at the new patch set (#5). Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread In order for a consensus replica to safely move "safe" time, it needs to know that all transactions that come before it have at least started. One way to do this is to call transaction_->Start() on transaction_driver_->Init() which we know is called by the thread updating consensus, for non-leader transctions. This should be ok as the leader has already correctly serialized the transactions. However, if transactions are now started on Init() then, in the case they abort (i.e. when transaction_driver_->ReplicationFinished() is called with a non-ok status, again done by the thread updating consensus), we now must make sure that the transaction is actually removed from mvcc before that method returns. Otherwise consensus might call Start() on another transaction with the same timestamp before the failed transaction is removed from mvcc. To do this, this patch adds a latch to make sure that, if replication failed, the prepare phase has completed before returning from ReplicationFinished(). Finally this patch adds a couple of statements to increase the driver's refcount so that it survives until the latch may be called. Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c --- M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h 2 files changed, 61 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/5294/5 -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/5294/4//COMMIT_MSG Commit Message: PS4, Line 12: which we know > Not sure this is true, see comments inline I meant "replica" transactions, will clarify PS4, Line 18: consensus thread > Is the consensus thread the prepare thread? Not sure which pool we are talk by "consensus thread" I mean the thread that is updating consensus and that is triggering replica transactions. This doesn't change the "leader transactions" path in terms of Start() http://gerrit.cloudera.org:8080/#/c/5294/3/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: PS3, Line 431: if (PREDICT_FALSE(replication_state_copy == REPLICATION_FAILED)) { : prepare_latch_.Wait(); : } > at first glance this latch thing seems a bit sketchy, but I need to spend a I tried a couple of different things before I resorted to the latch. The main point here is that we need to make sure that the transaction is no longer on mvcc before returning from this method. Because of the race between the Prepare(), running on the prepare pool, and this method, running on the consensus update thread, to call ApplyAsync() it's hard to have this guarantee. I tried different ways of fudging with the vars but ultimately could not find a non-racy way to make sure that, If we changed the state to REPLICATION_FAILED above in time for the prepare thread to see it (and move on to apply, which will actually abort), making sure that it completes before returning from this method. Note that we only wait on the latch for replica transactions that have failed. http://gerrit.cloudera.org:8080/#/c/5294/4/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 102: prepare_latch_(0) { > Since a given transaction prepares only once, why not initialize the prepar we need to only set the latch if the transaction actually Init'ed want to be double sure that it's submitted to prepare before we risk waiting. PS4, Line 118: in the consensus thread > I'm confused by this comment. I'm pretty sure this is executed on a TabletS I meant by the thread that is updating consensus. note that this is the "REPLICA" path. Will make this clearer in the comments. PS4, Line 148: innited > typo: inited Done Line 391: scoped_refptr ref(this); > This is pretty smelly. Can we not ensure the caller has a ref, instead? Als the object won't be destroyed before we change the state below. I agree that lifetime of this class is screwed up, but we seem to resort to increasing the refcount in several other places and would like not to put sorting that out on the critical path. Line 435: > nit: spurious line Done -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Todd Lipcon has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/5294/3/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: PS3, Line 431: if (PREDICT_FALSE(replication_state_copy == REPLICATION_FAILED)) { : prepare_latch_.Wait(); : } at first glance this latch thing seems a bit sketchy, but I need to spend an hour or so looking at it to see if I can come up with a better idea... sorry I didn't get a chance to look in detail this evening. -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Mike Percy has posted comments on this change. Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. Patch Set 4: (7 comments) http://gerrit.cloudera.org:8080/#/c/5294/4//COMMIT_MSG Commit Message: PS4, Line 12: which we know Not sure this is true, see comments inline PS4, Line 18: consensus thread Is the consensus thread the prepare thread? Not sure which pool we are talking about here. http://gerrit.cloudera.org:8080/#/c/5294/4/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 102: prepare_latch_(0) { Since a given transaction prepares only once, why not initialize the prepare_latch_ to 1? Then on abort we can count it down. PS4, Line 118: in the consensus thread I'm confused by this comment. I'm pretty sure this is executed on a TabletService service pool thread: TabletServiceImpl::Write() -> TabletPeer::SubmitWrite() -> TabletPeer::NewLeaderTransactionDriver() -> TransactionDriver::Init() PS4, Line 148: innited typo: inited Line 391: scoped_refptr ref(this); This is pretty smelly. Can we not ensure the caller has a ref, instead? Also, if this is so racy, what guarantees that this object is not destroyed before we take the ref on this line? If we really must do this then we should do a more careful job of documenting the lifecycle guarantees. Line 435: nit: spurious line -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5294 to look at the new patch set (#3). Change subject: KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread .. KUDU-798 (part 3) Make replica transactions start/abort on the consensus thread In order for a replica to safely move "safe" time, it needs to know that all transactions that come before it have at least started. One way to do this is to call transaction_->Start() on transaction_driver_->Init() which we know is called by the consensus thread. This should be ok as the leader has already correctly serialized the transactions. However, if transactions are now started on Init() then, in the case they abort (i.e. when transaction_driver_->ReplicationFinished() is called with a non-ok status, again done on the consensus thread), we now must make sure that the transaction is actually removed from mvcc in-line. Otherwise the consensus thread might call Start() on another transaction with the same timestamp before the failed transaction is removed from mvcc. To do this, this patch adds a latch to make sure that, if replication failed, the prepare phase has completed. Finally this patch adds a couple of statements to increase the driver's refcount so that it survives until the latch may be called. Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c --- M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h 2 files changed, 60 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/94/5294/3 -- To view, visit http://gerrit.cloudera.org:8080/5294 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie360e597eea86551c453717d7a1a000848027f4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Kudu Jenkins