[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..

KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run

In testing an in-flight patch[1], it was found that our current handling of
errors when participant registration hasn't completed can lead to
incorrect behavior: the path in which we handle NOT_FOUND errors while
committing expects that such errors only occur if the tablet has been
deleted, and we currently proceed with the commit. The incorrect
assumption here was that NOT_FOUND errors are only produced when the
tablet has been deleted, which is not the case.

This behavior will be amended in an upcoming patch[2], and we'll
actually abort the transaction on NOT_FOUND errors. Until then, this
patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND,
so at least the error type is consistent with the rest of the
participant op lifecycle errors.

[1] https://gerrit.cloudera.org/c/17037/
[2] https://gerrit.cloudera.org/c/17022

Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Reviewed-on: http://gerrit.cloudera.org:8080/17127
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
Reviewed-by: Hao Hao 
---
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.h
2 files changed, 3 insertions(+), 3 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved
  Hao Hao: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Thanks!  This sounds like a reasonable plan, indeed.
Yep, sounds good. I'll merge this and follow it up then.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 22:07:37 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Ah great question. In that case, I think the correct course of action would
Thanks!  This sounds like a reasonable plan, indeed.

I guess we can keep this patch as is, and address the issue in a separate 
patch.  Whatever you think is best choice here.

Exactly: if the write request is retried and TxnOpDispatcher tries to register 
a participant and issue BEGIN_TXN for the participant again, it will fail at 
registering the participant.  Then TxnOpDispatcher will respond with 
appropriate error back to client, so the client will know that its write failed.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 21:39:51 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Right: I meant the case when a participant is registered, but BEGIN_TXN has
Ah great question. In that case, I think the correct course of action would be 
to treat the non-existent transaction on the participant as OK, since the end 
result is still that the transaction doesn't exist once the ABORT_TXN returns, 
and I can update this patch and my following to reflect that. Then, the 
question is what happens if the write is retried after that happens, and the 
answer there is that the new dispatcher's call to RegisterParticipant would 
fail.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 20:02:43 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-26 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 19:56:59 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Right: I meant the case when either the participant is registered, but BEGI
Right: I meant the case when a participant is registered, but BEGIN_TXN hasn't 
been issued (say, tablet server with corresponding TxnOpDispatcher has 
crashed), and corresponding  client doesn't retry its write operation (e.g., it 
has timed out).

I'm curious how to get out of such situation?

BTW, TxnOpDispatcher clears its ops queue when receiving ABORT_TXN by 
responding with Status::Aborted() to all pending write requests.  So, it 
doesn't sent ServiceUnavailable() when receiving ABORT_TXN and pending 
operations are present in its queue.



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 06:04:35 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> Do you mean a scenario in which we haven't successfully called BEGIN_TXN (o
Right: I meant the case when either the participant is registered, but 
BEGIN_TXN hasn't been issued (say, tablet server with corresponding 
TxnOpDispatcher has crashed) and that particular client doesn't retry it's 
write operation because it has timed out.

In that case, how is aborting such a transaction would work?



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 05:21:07 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
> As a second thought, I realized I don't clearly see how we can cleanly abor
Do you mean a scenario in which we haven't successfully called BEGIN_TXN (or 
it's still in progress)? In that case, would there still be an active 
dispatcher? I thought in that case ABORT_TXN and all other participant ops 
would just be retried, since the dispatcher would return ServiceUnavailable?



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 04:59:55 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266
PS1, Line 266: ABORT_TXN
As a second thought, I realized I don't clearly see how we can cleanly abort a 
transaction which faltered at one of its participants and doesn't have metadata 
about started transaction if ABORT_TXN returns ILLEGAL_STATE as well.

Is it going to be addressed in a separate patch or 
https://gerrit.cloudera.org/#/c/17022/ will take care of this as well?



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 03:00:25 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Patch Set 1: Code-Review+2

(1 comment)

Thank you for fixing this!

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@248
PS1, Line 248: TestTransactionNotFound
nit: does it makes sense to rename this?  Or, maybe, at least add some text 
blurb to explain what these sub-scenarios mean in the bigger picture



--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 26 Feb 2021 02:50:59 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Andrew Wong (Code Review)
Andrew Wong has removed Anonymous Coward (314) from this change.  ( 
http://gerrit.cloudera.org:8080/17127 )

Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..


Removed reviewer null.
--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run

2021-02-25 Thread Andrew Wong (Code Review)
Hello Alexey Serbin, Anonymous Coward (314),

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/17127

to review the following change.


Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run
..

KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run

In testing an in-flight patch[1], it was found that our current handling of
errors when participant registration hasn't completed can lead to
incorrect behavior: the path in which we handle NOT_FOUND errors while
committing expects that such errors only occur if the tablet has been
deleted, and we currently proceed with the commit. The incorrect
assumption here was that NOT_FOUND errors are only produced when the
tablet has been deleted, which is not the case.

This behavior will be amended in an upcoming patch[2], and we'll
actually abort the transaction on NOT_FOUND errors. Until then, this
patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND,
so at least the error type is consistent with the rest of the
participant op lifecycle errors.

[1] https://gerrit.cloudera.org/c/17037/
[2] https://gerrit.cloudera.org/c/17022

Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
---
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.h
2 files changed, 3 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/17127/1
--
To view, visit http://gerrit.cloudera.org:8080/17127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d
Gerrit-Change-Number: 17127
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Anonymous Coward (314)