[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-05 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 16: Code-Review+2

(3 comments)

Looks good to me. I didn't get a chance to check the tests closely. But I 
believe Alexey has done so. Thanks a lot Andrew for the patch!

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@257
PS14, Line 257:   participant_ids_[participant_idx],
  :   std::move(op_pb),
  :   
MonoDelta::FromMilliseconds(FLAGS_txn_background_rpc_timeout_ms),
  :   std::move(participated_cb),
  :   &begin_commit_timestamps_[participant_idx]);
  : }
> It's not identical everywhere, but sure.
Ack


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@268
PS14, Line 268: pated_cb = [this, scoped_this = std::move(scoped_this),
  :   participant_idx, commit_timestamp] 
(const Status& s) {
  : if (IsShuttingDownCleanupIfLastOp()) {
  :
> Yeah, added a comment to make that clear that it's intentional.
Ack


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@787
PS14, Line 787:
  :   std::lock_guar
> There is always at least the BEGIN_COMMIT op timestamp, regardless of any w
Ack



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 16
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 05 Feb 2021 08:12:29 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 16:

(1 comment)

Ah oops, clicked submit without another round from Hao. I can address any 
further feedback in follow-up patches.

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@205
PS7, Line 205: CK_LT(participant_idx, participant_ids_.size());
 :   // Status callback ca
> nit: should this comment be moved to L201 to cover 's.IsNotFound()' case?
Done



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 16
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 05 Feb 2021 07:21:37 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-04 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Reviewed-on: http://gerrit.cloudera.org:8080/16952
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,366 insertions(+), 113 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 16
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-04 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 15: Code-Review+2

(5 comments)

Looks good to me!  Maybe, Hao has more feedback?

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc
File src/kudu/integration-tests/txn_commit-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302
PS7, Line 302: , TestCommi
> Done
Thank you!


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690
PS7, Line 690: shared_ptr Yeah, calling Shutdown() from the mini server helps exercise any potential
Thanks!


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754
PS7, Line 754: // TxnStatusManager.
> That case is very similar to the TestCommitWhenParticipantsAreDown case, th
Sure -- it seems ExternalMiniCluster-based harness fits better for that 
purpose.  Also, if you are sure the control paths and everything else is the 
same, probably having just TestCommitWhenParticipantsAreDown should be good 
enough.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201:   return false;
 : }
 :
> We discussed this further with Hao and agreed that it's worth exploring an
Thank you!


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201:   return false;
 : }
 :
> The case we're thinking about here is if a transaction is racing with a dro
Dropping a partition after commit is complete is a bit different because at 
least at the commit timestamp all the written data was persisted in the 
database.  However, since right now DDL operations doesn't participate in MVCC 
story in Kudu, there isn't any difference -- I agree.



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 15
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 05 Feb 2021 06:55:47 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 15:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@92
PS14, Line 92: BeginCommitAsyn
> nit: maybe name it BeginCommitAsyncTask if it is asynchronously as well. Sa
Done


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@102
PS14, Line 102: commit_timestamp
> nit: can you comment on how the timestamp will be used?
Done


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@321
PS14, Line 321:   std::lock_guard l(lock_);
> nit: can you update the comment to add that we will also load commits in fl
Done


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@123
PS14, Line 123:
> nit: space.
Done


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@215
PS14, Line 215: urn;
  : }
> nit: remove?
Done


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@257
PS14, Line 257:   participant_ids_[participant_idx],
  :   std::move(op_pb),
  :   
MonoDelta::FromMilliseconds(FLAGS_txn_background_rpc_timeout_ms),
  :   std::move(participated_cb),
  :   &begin_commit_timestamps_[participant_idx]);
  : }
> nit: do you think this block of code worth writing to a method of own? As I
It's not identical everywhere, but sure.


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@268
PS14, Line 268: pated_cb = [this, scoped_this = std::move(scoped_this),
  :   participant_idx, commit_timestamp] 
(const Status& s) {
  : if (IsShuttingDownCleanupIfLastOp()) {
  :
> So we proceed to ScheduleFinalizeCommitWrite even we encounter error here?
Yeah, added a comment to make that clear that it's intentional.


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@787
PS14, Line 787:
  :   std::lock_guar
> This happens when there is no ops in the transaction? What does invalid tim
There is always at least the BEGIN_COMMIT op timestamp, regardless of any write 
ops.

Invalid timestamp is just a dummy timestamp.



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 15
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 05 Feb 2021 05:28:33 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-04 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#15).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,366 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/15
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 15
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-04 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 14:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@92
PS14, Line 92: BeginCommitTask
nit: maybe name it BeginCommitAsyncTask if it is asynchronously as well. Same 
for below.


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@102
PS14, Line 102: commit_timestamp
nit: can you comment on how the timestamp will be used?


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.h@321
PS14, Line 321: // Loads the contents of the transaction status tablet into 
memory.
nit: can you update the comment to add that we will also load commits in flight 
and resume these commits?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@205
PS7, Line 205: return;
 : }
nit: should this comment be moved to L201 to cover 's.IsNotFound()' case?


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@123
PS14, Line 123: "
nit: space.


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@215
PS14, Line 215: If the participant has been deleted (it's not found), proceed 
as if it
  :   // were committed.
nit: remove?


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@257
PS14, Line 257: if (stop_task_ || txn_status_manager_->shutting_down()) {
  :   if (--ops_in_flight_ == 0) {
  : 
txn_status_manager_->RemoveCommitTask(this->txn_id_.value(), this);
  :   }
  :   return;
  : }
nit: do you think this block of code worth writing to a method of own? As I see 
it is repeating serval places?


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@268
PS14, Line 268: (PREDICT_FALSE(!s.ok())) {
  :   LOG(WARNING) << Substitute("Participant $0 
FINALIZE_COMMIT op returned $1",
  :  
participant_ids_[participant_idx], s.ToString());
  : }
So we proceed to ScheduleFinalizeCommitWrite even we encounter error here?


http://gerrit.cloudera.org:8080/#/c/16952/14/src/kudu/transactions/txn_status_manager.cc@787
PS14, Line 787: // If there are no participants for this transaction; just 
write an invalid
  : // timestamp
This happens when there is no ops in the transaction? What does invalid 
timestamp mean in this case?



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 14
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 05 Feb 2021 01:35:35 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#14).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,331 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/14
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 14
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#13).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,329 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/13
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 13
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#12).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,328 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/12
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 12
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#11).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,326 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/11
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 11
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#10).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,325 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/10
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 10
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#9).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

Some follow-ups to expect:
- This isn't as robust to failures as an approach that writes an
  intermediate state to the TxnStatusManager in between steps 3 and 4. A
  follow-up patch will implement that.
- A separate patch will implement aborting transactions.
- I disabled the background tasks in some tests that assume state
  changes are entirely controlled by clients. A follow-up change will
  address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,324 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/9
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 9
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201: if (PREDICT_FALSE(s.IsTimedOut())) {
 :   // Retry timeout errors. Other transient errors should be 
retried by the
 :   // client until timeout.
> The case we're thinking about here is if a transaction is racing with a dro
We discussed this further with Hao and agreed that it's worth exploring an 
implementation in which we write an intermediate record to the TSM before 
starting to send out the FINALIZE_COMMIT ops. This would allow us to be more 
robust against unforeseen issues that might require us to abort 
post-commit-call.



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Feb 2021 21:58:19 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc
File src/kudu/integration-tests/txn_commit-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302
PS7, Line 302: DeleteTable
> Does it work as expected as well if dropping a tablet (just a part of a tab
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@316
PS7, Line 316: TestCommitAfterDroppingRangeP
> I think it would be great to add a scenario when TxnStatusManager instance
Yeah, I can add that as a follow-up if you dont mind it being in a separate 
patch. Pausing is tricky with my current test harnesses, so I'd like to add 
some additional coverage using an EMC, per your other suggestions.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@565
PS7, Line 565:   for (const auto& s : results) {
 : EXPECT_OK(s);
 :   }
> Is this going to put the txn keepalive thread to sleep as well?  I'm not su
Ah indeed. Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690
PS7, Line 690: // Shut down our par
> I guess a real 'abrupt' way of terminating the process is sending SIGKILL :
Yeah, calling Shutdown() from the mini server helps exercise any potential 
races in the destructor, but SIGKILL would be more organic.

I'll add that as a follow-up change.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754
PS7, Line 754:   }
> It would be great to add at least one failure scenario, when finalizing the
That case is very similar to the TestCommitWhenParticipantsAreDown case, though 
less trivial to implement with 1, 2, and 3 node clusters. I can add that as a 
follow-up.


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152
PS4, Line 152:
> Ah, indeed.  It seems I misunderstood where the status comes from (even if
Added a comment


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201: if (PREDICT_FALSE(s.IsTimedOut())) {
 :   // Retry timeout errors. Other transient errors should be 
retried by the
 :   // client until timeout.
> After some consideration, I'm not quite sure this 'good-riddance' semantic
The case we're thinking about here is if a transaction is racing with a drop 
partition. The observation I'm relying on here is that the result of deleting 
the partition after committing and deleting the partition before committing are 
the same -- the end result is the partition is deleted with no way of reading 
the data.

And if the table were deleted while inserting data, it's the onus of the 
writing application to detect that, since there'd be NotFound errors well 
before the commit phase.

It's also important to pass through this case rather than aborting for the 
following case:
1. begin committing a transaction with participants 1..5, running the 
BEGIN_COMMIT op on all participants
2. finalize commit on participants 1..5, running the FINALIZE_COMMIT op on all 
participants
3. leadership changes or the node crashes, a new leader is elected and the 
commit task is retried
4. partition 5 is deleted
5. we notice that partition 5 is deleted. Should we abort the transaction? But 
we've already finalized the commit on some participants!

I added a comment trying to explain that.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@286
PS7, Line 286:   txn_client_->ParticipateInTransactionAsync(
 :   participant_ids_[participant_idx],
> nit: just for the sake of working with std::atomic in a consistent way, may
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@463
PS7, Line 463:
> Should this be a parameter controlled by a run-time flag?
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@471
PS7, Line 471:  now, just
> nit: maybe, move this outside of the 'for ()' cycle?
Done


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@492
PS7, Line 492:   {
 : std::lock_guard l(lock_);
 : highest_txn_id_ = std::max(highest_txn_id, highest_txn_id_);
 :
> Why not to do so prior to starting new commit tasks?  I g

[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-02 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#8).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

A separate patch will implement aborting transactions.

I also disabled the background tasks in some tests that assume state
changes are entirely controlled by clients. A follow-up change will
address these to account for the state changes more organically.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduTransaction.java
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 1,323 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/8
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 7: Code-Review+1

(11 comments)

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc
File src/kudu/integration-tests/txn_commit-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@302
PS7, Line 302: DeleteTable
Does it work as expected as well if dropping a tablet (just a part of a table), 
say, dropping one range of a range-partitioned table?  Is it worth adding a 
test scenario for such case?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@316
PS7, Line 316: TestRestartingWhileCommitting
I think it would be great to add a scenario when TxnStatusManager instance is 
frozen/unfrozen (i.e. sent SIGSTOP/SIGCONT) during the commit phase.  That 
should include a case when the leadership is transferred to another instance 
because of the unresponsive process hosting the former TxnStatusManager leader 
replica, but once the commit finalization is picked up by the new leader, the 
former leader is back and tries to continue its the commit phase finalization 
routine.

It guess it might be a good test to verify the robustness of the commit phase 
and re-enterability of corresponding methods.

What do you think?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@565
PS7, Line 565:
 :   // Wait a bit to allow would-be background aborts to happen.
 :   SleepFor(MonoDelta::FromSeconds(1));
Is this going to put the txn keepalive thread to sleep as well?  I'm not sure.

Instead, why not to move the 'txn' handle into a sub-scope, and serialize it 
into a string which is stored in the top-level scope.  Then, when the original 
'txn' scope goes out of scope, it's possible to de-serialize the handle and use 
the de-serialized handle for calls like KuduTransaction::IsCommitComplete() and 
alike.

What do you think?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@690
PS7, Line 690: tsm_ts_->Shutdown();
I guess a real 'abrupt' way of terminating the process is sending SIGKILL :)

Do you think it's worth adding a scenario like that in this test suite?  Doing 
that in a separate change list might be the best option, I guess.


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/integration-tests/txn_commit-itest.cc@754
PS7, Line 754: }
It would be great to add at least one failure scenario, when finalizing the 
commit phase fails to verify that TxnManager doesn't crash and the transaction 
fails to finalize:
  * one participant responses with an error to its BEGIN_COMMIT request (e.g., 
only one replica out of 3 is running -- BTW, is it going to turn into 
Status::TimedOut() seen at the TxnStatusManager's side or that would be 
something else?)
  * sending the commit timestamp from a participant fails (again, because of 
only one replica out of 3 is running)


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152
PS4, Line 152:
> I don't think so, since I'm pretty sure the client automatically retries Se
Ah, indeed.  It seems I misunderstood where the status comes from (even if you 
added a comment :) ).

Thank you for the clarification!


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@201
PS7, Line 201: if (PREDICT_FALSE(s.IsNotFound())) {
 :   LOG(INFO) << Substitute("Participant $0 was not found: $1",
 :   participant_ids_[participant_idx], 
s.ToString());
After some consideration, I'm not quite sure this 'good-riddance' semantic is a 
good fit for all cases.

E.g., what if some batch of data is being inserted, and a tablet (or a whole 
table) is dropped unintentionally during that process?  From the user 
perspective (who isn't aware of the dropped tablet/table) that should be 
certainly an error -- the data that they thought is reliably persisted after 
the commit has successfully finalized is actually gone (what a surprise, isn't 
it?).

Should we instead fail a transaction is such a case?  Maybe, at least add a 
flag to make this behavior configurable, where by default a transaction fails 
to commit if one of its actors with some added data disappeared during the 
commit phase?


http://gerrit.cloudera.org:8080/#/c/16952/7/src/kudu/transactions/txn_status_manager.cc@286
PS7, Line 286:   DCHECK_EQ(0, ops_in_flight_);
 :   o

[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152
PS4, Line 152:
> Does it make sense to retry Status::ServiceUnavailable() as well?
I don't think so, since I'm pretty sure the client automatically retries 
ServiceUnavailable errors until the timeout. Unless there's a specific code 
path you're referring to?



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 02 Feb 2021 01:13:47 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#7).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

A separate patch will implement aborting transactions.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 1,264 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/7
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#6).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

A separate patch will implement aborting transactions.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 1,263 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/6
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@146
PS4, Line 146: (--ops_in_flight_ == 0)
> I'm not sure it's safe even if ops_in_flight_ is an atomic.  What happens i
Ah, operator--() is actually fetch_sub(1)-1.  In such case, it should be safe.

Then the only concern is not to decrement it past 0, but it seems it cannot be 
the case in this code.



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 01 Feb 2021 09:04:11 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@146
PS4, Line 146: (--ops_in_flight_ == 0)
I'm not sure it's safe even if ops_in_flight_ is an atomic.  What happens if 
two threads do this simultaneously when the value of 'ops_in_flight_' was 2?  
As I understand, atomic guarantees atomic read and write, but I'm not sure 
about the serialization of the access.  Should it be compare_exchange_strong() 
instead?


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@152
PS4, Line 152: Retry timeout errors.
Does it make sense to retry Status::ServiceUnavailable() as well?



--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 01 Feb 2021 08:43:29 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16952 )

Change subject: KUDU-2612: background task to commit transaction
..


Patch Set 5:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7200
PS4, Line 7200:
> nit here and below where changed from 'false' --> 'true': consider not spec
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7201
PS4, Line 7201:   shared_ptr txn;
  :   ASSERT_OK(client_->NewTransaction(&txn));
  :   ASSERT_OK(txn->Commit());
  :   bool is_complete = false;
> nit: I guess this should be removed once the scenario updated to -- I guess
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/client/client-test.cc@7217
PS4, Line 7217:
> nit: maybe, set it to 'false' to be complete sure that IsCommitComplete() w
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc
File src/kudu/integration-tests/txn_commit-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@222
PS4, Line 222: KuduScanner scanner(table.get());
 : RETURN_NOT_OK(scanner.Open());
 : int rows = 0;
> Actually, there is way of doing this without exposing transaction identifie
Ah, thanks for the pointer! Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@283
PS4, Line 283: kN
> nit: consider introducing a constant and using it here and below
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@291
PS4, Line 291: itComplete(&
> nit: if it makes sense, consider omitting the 'wait' parameter since here s
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@316
PS4, Line 316:   shared_ptr txn;
 :   shared_ptr txn_session;
 :   ASSERT_OK(BeginTransaction(participant_ids_, &txn, 
&txn_session));
 :   FLAGS_txn_status_manager_inject_latency_fina
> nit: how long does it take to complete?  If more than 3 seconds, consider d
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@663
PS4, Line 663:   ASSERT_TRUE(completion_status.IsIncomplete()) << 
completion_status.ToString();
> nit: maybe, add ' << completion_status.ToString()' to diagnose a failure (i
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/integration-tests/txn_commit-itest.cc@680
PS4, Line 680:
> Just in case I didn't miss it among those scenarios: it would be great to a
Done


http://gerrit.cloudera.org:8080/#/c/16952/1/src/kudu/transactions/txn_status_entry.cc
File src/kudu/transactions/txn_status_entry.cc:

http://gerrit.cloudera.org:8080/#/c/16952/1/src/kudu/transactions/txn_status_entry.cc@a45
PS1, Line 45:
> This condition no longer hold?
No, e.g. when reloading the tablet after committing. Seems our tests didn't 
cover those codepaths previously.


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.h@80
PS4, Line 80:
> nit: it would be nice to document the parameter and give it a bit more mean
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@137
PS4, Line 137: using kudu::pb_util::SecureShortDebugString;
> nit: add a DCHECK() for the index 'i' in 'participant_ids_'?
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@208
PS4, Line 208: such errors wou
> nit: if it makes sense, unify using 'finalize commit' and 'FINALIZE_COMMIT'
Done


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_status_manager.cc@253
PS4, Line 253:   if (PR
> What if the commit_pool_ is being shut down?  Is it supposed to crash?
Nice catch. I've updated the shutdown logic such that we always finish the 
commit tasks before shutting down the threadpool.


http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_system_client.cc
File src/kudu/transactions/txn_system_client.cc:

http://gerrit.cloudera.org:8080/#/c/16952/4/src/kudu/transactions/txn_system_client.cc@449
PS4, Line 449:   Status s;
 :   do {
 : s = GetClient(client);
 : if (PREDICT_TRUE(s.ok())) {
 :   DCHECK(*client);
 :   return Status::OK();
 : }
 : SleepFor(MonoDelta::FromMilliseconds(100));
 :   }

[kudu-CR] KUDU-2612: background task to commit transaction

2021-02-01 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16952

to look at the new patch set (#5).

Change subject: KUDU-2612: background task to commit transaction
..

KUDU-2612: background task to commit transaction

This patch introduces background tasks that get run when
KuduTransaction::Commit() is called. The typical workflow is as follows:
1. Commit() is called, resulting in a BeginCommitTransaction() call on
   the TxnStatusManager.
2. An update is made to the transaction status table, marking the
   transaction's state as COMMIT_IN_PROGRESS.
3. The commit tasks are initiated -- BEGIN_COMMIT ops are sent
   asynchronously to every participant of the transaction.
4. Once all responses are received from the participants, a commit
   timestamp is determined, and FINALIZE_COMMIT ops are sent
   asynchronously to every participant.
5. Once all responses are received from the participants, an update is
   made to the transaction status table, marking the transaction's state
   as COMMITTED.

There are some nuances here around error handling. Namely, what do we do
if there are errors in sending the above requests? Well, it depends on
the error. Transient errors (i.e. timeouts) are simply retried. More
permanent errors need a bit more thought though:
- If a participant has been deleted, what do we do? This patch makes a
  best effort attempt to abort the transaction if so.
- Any other kinds of errors (e.g. illegal state errors from a
  participant) aren't expected in normal operation of a cluster. For
  this, we stop the commit task and log a warning. Hopefully an operator
  can intervene.

A separate patch will implement aborting transactions.

Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/transactions.proto
M src/kudu/transactions/txn_status_entry.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_system_client.cc
M src/kudu/transactions/txn_system_client.h
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 1,258 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/16952/5
--
To view, visit http://gerrit.cloudera.org:8080/16952
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie2258dded3ab3d527cb5d0abdc7d5e7deb4da15e
Gerrit-Change-Number: 16952
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)