[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17547 ) Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. KUDU-3291: properly disambiguate between deltas of a row with the same timestamp In performing a diff scan, Kudu iterates in small batches of rows, selecting deltas associated with each row that are relevant to the scan's timestamp bounds. Once all selected deltas are collected for a given row, the oldest and newest deltas are found by a sorting criteria meant to sort deltas by their application order: 1. Deltas of lower timestamps are less than deltas of higher timestmaps. 2. UNDO deltas are less than REDO deltas. 3. Within each delta store's iterator, a counter of selected deltas is used to disambiguate between rows that have the same timestamp. A critical assumption here is that this disambiguator is only used for deltas of the same delta store. For REDOs, a lower counter implies a lower application order -- the opposite is true for UNDOs. What the above criteria don't account for is the fact that certain iterators can iterate over separate delta stores that have deltas of the same timestamp and type. If Kudu delta flushes while applying a large batch of updates to the same row, the result is that some of the batch's updates can land in the newly flushed REDO delta file, while the rest land in the new DMS. In iterating over these stores with a DeltaIteratorMerger, which combines the deltas of several delta stores, this breaks the assumption described above, resulting in the crash reported in KUDU-3291. To remediate this, in iterators that merge multiple delta stores, namely, the DeltaIteratorMerger, a single top-level counter is used to guide the disambiguators generated by each sub-iterator. Before iterating over a new batch of deltas in a given sub-iterator, this counter is propagated to the sub-iterators as the new starting point of its counter. The result is that the disambiguators generated by the DeltaIteratorMerger can be used to define a total ordering of the deltas selected. Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Reviewed-on: http://gerrit.cloudera.org:8080/17547 Reviewed-by: Alexey Serbin Tested-by: Andrew Wong Reviewed-by: Grant Henke --- M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/diff_scan-test.cc M src/kudu/tablet/tablet-test-base.h 8 files changed, 157 insertions(+), 32 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Andrew Wong: Verified Grant Henke: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/17547 ) Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 08 Jun 2021 14:45:14 + Gerrit-HasComments: No
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17547 ) Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. Patch Set 4: Verified+1 Test failure looks unrelated: ClientTest.TxnKeepAliveAndUnavailableTxnManagerLongTime -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 08 Jun 2021 03:38:00 + Gerrit-HasComments: No
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Andrew Wong has removed a vote on this change. Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17547 ) Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 08 Jun 2021 01:08:27 + Gerrit-HasComments: No
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17547 to look at the new patch set (#4). Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. KUDU-3291: properly disambiguate between deltas of a row with the same timestamp In performing a diff scan, Kudu iterates in small batches of rows, selecting deltas associated with each row that are relevant to the scan's timestamp bounds. Once all selected deltas are collected for a given row, the oldest and newest deltas are found by a sorting criteria meant to sort deltas by their application order: 1. Deltas of lower timestamps are less than deltas of higher timestmaps. 2. UNDO deltas are less than REDO deltas. 3. Within each delta store's iterator, a counter of selected deltas is used to disambiguate between rows that have the same timestamp. A critical assumption here is that this disambiguator is only used for deltas of the same delta store. For REDOs, a lower counter implies a lower application order -- the opposite is true for UNDOs. What the above criteria don't account for is the fact that certain iterators can iterate over separate delta stores that have deltas of the same timestamp and type. If Kudu delta flushes while applying a large batch of updates to the same row, the result is that some of the batch's updates can land in the newly flushed REDO delta file, while the rest land in the new DMS. In iterating over these stores with a DeltaIteratorMerger, which combines the deltas of several delta stores, this breaks the assumption described above, resulting in the crash reported in KUDU-3291. To remediate this, in iterators that merge multiple delta stores, namely, the DeltaIteratorMerger, a single top-level counter is used to guide the disambiguators generated by each sub-iterator. Before iterating over a new batch of deltas in a given sub-iterator, this counter is propagated to the sub-iterators as the new starting point of its counter. The result is that the disambiguators generated by the DeltaIteratorMerger can be used to define a total ordering of the deltas selected. Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 --- M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/diff_scan-test.cc M src/kudu/tablet/tablet-test-base.h 8 files changed, 157 insertions(+), 32 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/17547/4 -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17547 ) Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. Patch Set 3: (8 comments) http://gerrit.cloudera.org:8080/#/c/17547/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17547/3//COMMIT_MSG@36 PS3, Line 36: this counter is : propagated to the sub-iterators > Just to make sure I understand this correctly: is there any guarantee that Yeah, right now, each sub-iterator only keeps track of the deltas within the delta store being iterated over. In that sense, the disambiguators assigned by each sub-iterator are correct only for that delta store. http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h@144 PS3, Line 144: a.disambiguator != b.disambiguator > nit: wrap into PREDICT_TRUE()? Done http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h@147 PS3, Line 147: "Could not differentiate between two deltas"; > Not a part of this changelist, but does it make sense to drump information Done http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h@473 PS3, Line 473: DCHECK_GE > Can we put DCHECK_GT() here? Or it can be that no deltas are selected in a It is possible that no deltas are selected in the sub-iterators. http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc File src/kudu/tablet/diff_scan-test.cc: http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@228 PS3, Line 228: & > nit: maybe, store it by value? Done http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@233 PS3, Line 233: ASSERT_OK > nit: could it ever fail? If yes, then it'd necessary to join the thread si Done http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@243 PS3, Line 243: opts.order = ORDERED;; : opts.include_deleted_rows = true; > How crucial are these custom settings for the reproduction of the issue? I Done http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@246 PS3, Line 246: static const bool > nit: as for 'kRowKey', could use constexpr here as well (just for uniformit This is removed. -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Tue, 08 Jun 2021 00:36:20 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17547 ) Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. Patch Set 3: Code-Review+1 (8 comments) LGTM overall, just a couple of questions (just to make sure my understanding is correct) and nits. http://gerrit.cloudera.org:8080/#/c/17547/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17547/3//COMMIT_MSG@36 PS3, Line 36: this counter is : propagated to the sub-iterators Just to make sure I understand this correctly: is there any guarantee that those counters would be different for sub-iterators? Is that because every selected delta increases the count? http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h@144 PS3, Line 144: a.disambiguator != b.disambiguator nit: wrap into PREDICT_TRUE()? http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h@147 PS3, Line 147: "Could not differentiate between two deltas"; Not a part of this changelist, but does it make sense to drump information on deltas into the fatal error message? http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/delta_store.h@473 PS3, Line 473: DCHECK_GE Can we put DCHECK_GT() here? Or it can be that no deltas are selected in a sub-iterator? http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc File src/kudu/tablet/diff_scan-test.cc: http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@228 PS3, Line 228: & nit: maybe, store it by value? http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@233 PS3, Line 233: ASSERT_OK nit: could it ever fail? If yes, then it'd necessary to join the thread since line 234 wouldn't be reachable http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@243 PS3, Line 243: opts.order = ORDERED;; : opts.include_deleted_rows = true; How crucial are these custom settings for the reproduction of the issue? It would be nice to add a comment about that. http://gerrit.cloudera.org:8080/#/c/17547/3/src/kudu/tablet/diff_scan-test.cc@246 PS3, Line 246: static const bool nit: as for 'kRowKey', could use constexpr here as well (just for uniformity) -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 07 Jun 2021 20:17:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17547 to look at the new patch set (#3). Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. KUDU-3291: properly disambiguate between deltas of a row with the same timestamp In performing a diff scan, Kudu iterates in small batches of rows, selecting deltas associated with each row that are relevant to the scan's timestamp bounds. Once all selected deltas are collected for a given row, the oldest and newest deltas are found by a sorting criteria meant to sort deltas by their application order: 1. Deltas of lower timestamps are less than deltas of higher timestmaps. 2. UNDO deltas are less than REDO deltas. 3. Within each delta store's iterator, a counter of selected deltas is used to disambiguate between rows that have the same timestamp. A critical assumption here is that this disambiguator is only used for deltas of the same delta store. For REDOs, a lower counter implies a lower application order -- the opposite is true for UNDOs. What the above criteria don't account for is the fact that certain iterators can iterate over separate delta stores that have deltas of the same timestamp and type. If Kudu delta flushes while applying a large batch of updates to the same row, the result is that some of the batch's update can land in the newly flushed REDO delta file, while the rest land in the new DMS. In iterating over these stores with a DeltaIteratorMerger, which combines the deltas of several delta stores, this breaks the assumption described above, resulting in the crash reported in KUDU-3291. To remediate this, in iterators that merge multiple delta stores, namely, the DeltaIteratorMerger, a single top-level counter is used to define the disambiguators generated by sub-iterator. Before iterating over a new batch of deltas in a given sub-iterator, this counter is propagated to the sub-iterators as the new starting point of its counter. The result is that the disambiguators generated by the DeltaIteratorMerger can be used to define a total ordering of the deltas selected. Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 --- M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/diff_scan-test.cc M src/kudu/tablet/tablet-test-base.h 8 files changed, 152 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/17547/3 -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17547 to look at the new patch set (#2). Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. KUDU-3291: properly disambiguate between deltas of a row with the same timestamp In performing a diff scan, Kudu iterates in small batches of rows, selecting deltas associated with each row that are relevant to the scan's timestamp bounds. Once all selected deltas are collected for a given row, the oldest and newest deltas are found by a sorting criteria meant to sort deltas by their application order: 1. Deltas of lower timestamps are less than deltas of higher timestmaps. 2. UNDO deltas are less than REDO deltas. 3. Within each delta store's iterator, a counter of selected deltas is used to disambiguate between rows that have the same timestamp. A critical assumption here is that this disambiguator is only used for deltas of the same delta store. For REDOs, a lower counter implies a lesser application order -- the opposite is true for UNDOs. What the above criteria don't account for is the fact that certain iterators can iterate over separate delta stores have deltas of the same timestamp and type. If Kudu delta flushes while applying a large batch of updates to the same row, the result is that some of the batch's update can land in the newly flushed REDO delta file, while the rest land in the new DMS. In iterating over these stores with a DeltaIteratorMerger, which combines the deltas of several delta stores, this breaks the assumption described above, resulting in the crash reported in KUDU-3291. To remediate this, in iterators that merge multiple delta stores, namely, the DeltaIteratorMerger, a single top-level counter is used to define the disambiguators generated by sub-iterator. Before iterating over a new batch of deltas in a given sub-iterator, this counter is propagated to the sub-iterators as the new starting point of its counter. The result is that the disambiguators generated by the DeltaIteratorMerger can be used to define a total ordering of the deltas selected. Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 --- M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/diff_scan-test.cc M src/kudu/tablet/tablet-test-base.h 8 files changed, 151 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/17547/2 -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)
[kudu-CR] KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17547 Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp .. KUDU-3291: properly disambiguate between deltas of a row with the same timestamp In performing a diff scan, Kudu iterates in small batches of rows, selecting deltas associated with each row that are relevant to the scan's timestamp bounds. Once all selected deltas are collected for a given row, the oldest and newest deltas are found by a sorting criteria meant to sort deltas by their application order: 1. Deltas of lower timestamps are less than deltas of higher timestmaps. 2. UNDO deltas are less than REDO deltas. 3. Within each delta store's iterator, a counter of selected deltas is used to disambiguate between rows that have the same timestamp. A critical assumption here is that this disambiguator is only used for deltas of the same delta store. For REDOs, a lower counter implies a lesser application order -- the opposite is true for UNDOs. What the above criteria don't account for is the fact that certain iterators can iterate over separate delta stores have deltas of the same timestamp and type. If Kudu delta flushes while applying a large batch of updates to the same row, the result is that some of the batch's update can land in the newly flushed REDO delta file, while the rest land in the new DMS. In iterating over these stores with a DeltaIteratorMerger, which combines the deltas of several delta stores, this breaks the assumption described above, resulting in the crash reported in KUDU-3291. To remediate this, in iterators that merge multiple delta stores, namely, the DeltaIteratorMerger, a single top-level counter is used to define the disambiguators generated by sub-iterator. Before iterating over a new batch of deltas in a given sub-iterator, this counter is propagated to the sub-iterators as the new starting point of its counter. The result is that the disambiguators generated by the DeltaIteratorMerger can be used to define a total ordering of the deltas selected. Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 --- M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/diff_scan-test.cc M src/kudu/tablet/tablet-test-base.h 8 files changed, 144 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/17547/1 -- To view, visit http://gerrit.cloudera.org:8080/17547 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55 Gerrit-Change-Number: 17547 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong