Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
......................................................................


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc
File src/kudu/integration-tests/txn_commit-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302
PS7, Line 302: DeleteTable
> Does it work as expected as well if dropping a tablet (just a part of a tab
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@316
PS7, Line 316: TestCommitAfterDroppingRangeP
> I think it would be great to add a scenario when TxnStatusManager instance
Yeah, I can add that as a follow-up if you dont mind it being in a separate 
patch. Pausing is tricky with my current test harnesses, so I'd like to add 
some additional coverage using an EMC, per your other suggestions.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@565
PS7, Line 565:   for (const auto& s : results) {
             :     EXPECT_OK(s);
             :   }
> Is this going to put the txn keepalive thread to sleep as well?  I'm not su
Ah indeed. Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690
PS7, Line 690: // Shut down our par
> I guess a real 'abrupt' way of terminating the process is sending SIGKILL :
Yeah, calling Shutdown() from the mini server helps exercise any potential 
races in the destructor, but SIGKILL would be more organic.

I'll add that as a follow-up change.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754
PS7, Line 754:   }
> It would be great to add at least one failure scenario, when finalizing the
That case is very similar to the TestCommitWhenParticipantsAreDown case, though 
less trivial to implement with 1, 2, and 3 node clusters. I can add that as a 
follow-up.


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152
PS4, Line 152:
> Ah, indeed.  It seems I misunderstood where the status comes from (even if
Added a comment


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201:     if (PREDICT_FALSE(s.IsTimedOut())) {
             :       // Retry timeout errors. Other transient errors should be 
retried by the
             :       // client until timeout.
> After some consideration, I'm not quite sure this 'good-riddance' semantic
The case we're thinking about here is if a transaction is racing with a drop 
partition. The observation I'm relying on here is that the result of deleting 
the partition after committing and deleting the partition before committing are 
the same -- the end result is the partition is deleted with no way of reading 
the data.

And if the table were deleted while inserting data, it's the onus of the 
writing application to detect that, since there'd be NotFound errors well 
before the commit phase.

It's also important to pass through this case rather than aborting for the 
following case:
1. begin committing a transaction with participants 1..5, running the 
BEGIN_COMMIT op on all participants
2. finalize commit on participants 1..5, running the FINALIZE_COMMIT op on all 
participants
3. leadership changes or the node crashes, a new leader is elected and the 
commit task is retried
4. partition 5 is deleted
5. we notice that partition 5 is deleted. Should we abort the transaction? But 
we've already finalized the commit on some participants!

I added a comment trying to explain that.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@286
PS7, Line 286:   txn_client_->ParticipateInTransactionAsync(
             :       participant_ids_[participant_idx],
> nit: just for the sake of working with std::atomic in a consistent way, may
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@463
PS7, Line 463:
> Should this be a parameter controlled by a run-time flag?
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@471
PS7, Line 471:  now, just
> nit: maybe, move this outside of the 'for ()' cycle?
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@492
PS7, Line 492:   {
             :     std::lock_guard<simple_spinlock> l(lock_);
             :     highest_txn_id_ = std::max(highest_txn_id, highest_txn_id_);
             :
> Why not to do so prior to starting new commit tasks?  I guess it might be l
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Feb 2021 20:28:57 +0000
Gerrit-HasComments: Yes

Reply via email to