[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run In testing an in-flight patch[1], it was found that our current handling of errors when participant registration hasn't completed can lead to incorrect behavior: the path in which we handle NOT_FOUND errors while committing expects that such errors only occur if the tablet has been deleted, and we currently proceed with the commit. The incorrect assumption here was that NOT_FOUND errors are only produced when the tablet has been deleted, which is not the case. This behavior will be amended in an upcoming patch[2], and we'll actually abort the transaction on NOT_FOUND errors. Until then, this patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND, so at least the error type is consistent with the rest of the participant op lifecycle errors. [1] https://gerrit.cloudera.org/c/17037/ [2] https://gerrit.cloudera.org/c/17022 Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Reviewed-on: http://gerrit.cloudera.org:8080/17127 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin Reviewed-by: Hao Hao --- M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.h 2 files changed, 3 insertions(+), 3 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved Hao Hao: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Thanks! This sounds like a reasonable plan, indeed. Yep, sounds good. I'll merge this and follow it up then. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 22:07:37 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Ah great question. In that case, I think the correct course of action would Thanks! This sounds like a reasonable plan, indeed. I guess we can keep this patch as is, and address the issue in a separate patch. Whatever you think is best choice here. Exactly: if the write request is retried and TxnOpDispatcher tries to register a participant and issue BEGIN_TXN for the participant again, it will fail at registering the participant. Then TxnOpDispatcher will respond with appropriate error back to client, so the client will know that its write failed. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 21:39:51 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Right: I meant the case when a participant is registered, but BEGIN_TXN has Ah great question. In that case, I think the correct course of action would be to treat the non-existent transaction on the participant as OK, since the end result is still that the transaction doesn't exist once the ABORT_TXN returns, and I can update this patch and my following to reflect that. Then, the question is what happens if the write is retried after that happens, and the answer there is that the new dispatcher's call to RegisterParticipant would fail. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 20:02:43 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 19:56:59 + Gerrit-HasComments: No
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Right: I meant the case when either the participant is registered, but BEGI Right: I meant the case when a participant is registered, but BEGIN_TXN hasn't been issued (say, tablet server with corresponding TxnOpDispatcher has crashed), and corresponding client doesn't retry its write operation (e.g., it has timed out). I'm curious how to get out of such situation? BTW, TxnOpDispatcher clears its ops queue when receiving ABORT_TXN by responding with Status::Aborted() to all pending write requests. So, it doesn't sent ServiceUnavailable() when receiving ABORT_TXN and pending operations are present in its queue. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 06:04:35 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > Do you mean a scenario in which we haven't successfully called BEGIN_TXN (o Right: I meant the case when either the participant is registered, but BEGIN_TXN hasn't been issued (say, tablet server with corresponding TxnOpDispatcher has crashed) and that particular client doesn't retry it's write operation because it has timed out. In that case, how is aborting such a transaction would work? -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 05:21:07 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN > As a second thought, I realized I don't clearly see how we can cleanly abor Do you mean a scenario in which we haven't successfully called BEGIN_TXN (or it's still in progress)? In that case, would there still be an active dispatcher? I thought in that case ABORT_TXN and all other participant ops would just be retried, since the dispatcher would return ServiceUnavailable? -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 04:59:55 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@266 PS1, Line 266: ABORT_TXN As a second thought, I realized I don't clearly see how we can cleanly abort a transaction which faltered at one of its participants and doesn't have metadata about started transaction if ABORT_TXN returns ILLEGAL_STATE as well. Is it going to be addressed in a separate patch or https://gerrit.cloudera.org/#/c/17022/ will take care of this as well? -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 03:00:25 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Patch Set 1: Code-Review+2 (1 comment) Thank you for fixing this! http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc File src/kudu/tablet/txn_participant-test.cc: http://gerrit.cloudera.org:8080/#/c/17127/1/src/kudu/tablet/txn_participant-test.cc@248 PS1, Line 248: TestTransactionNotFound nit: does it makes sense to rename this? Or, maybe, at least add some text blurb to explain what these sub-scenarios mean in the bigger picture -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Fri, 26 Feb 2021 02:50:59 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Andrew Wong has removed Anonymous Coward (314) from this change. ( http://gerrit.cloudera.org:8080/17127 ) Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. Removed reviewer null. -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2612: don't return NOT FOUND when BEGIN TXN has not yet run
Hello Alexey Serbin, Anonymous Coward (314), I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/17127 to review the following change. Change subject: KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run .. KUDU-2612: don't return NOT_FOUND when BEGIN_TXN has not yet run In testing an in-flight patch[1], it was found that our current handling of errors when participant registration hasn't completed can lead to incorrect behavior: the path in which we handle NOT_FOUND errors while committing expects that such errors only occur if the tablet has been deleted, and we currently proceed with the commit. The incorrect assumption here was that NOT_FOUND errors are only produced when the tablet has been deleted, which is not the case. This behavior will be amended in an upcoming patch[2], and we'll actually abort the transaction on NOT_FOUND errors. Until then, this patch adjust the behavior to return ILLEGAL_STATE instead of NOT_FOUND, so at least the error type is consistent with the rest of the participant op lifecycle errors. [1] https://gerrit.cloudera.org/c/17037/ [2] https://gerrit.cloudera.org/c/17022 Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d --- M src/kudu/tablet/txn_participant-test.cc M src/kudu/tablet/txn_participant.h 2 files changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/17127/1 -- To view, visit http://gerrit.cloudera.org:8080/17127 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I8fa8caba384ee30536114a3e6466ad90b6d8e45d Gerrit-Change-Number: 17127 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward (314)