Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17561 )

Change subject: KUDU-3291: properly disambiguate between deltas of a row with the same timestamp
......................................................................

KUDU-3291: properly disambiguate between deltas of a row with the same timestamp

In performing a diff scan, Kudu iterates in small batches of rows,
selecting the deltas associated with each row that are relevant to the
scan's timestamp bounds. Once all selected deltas are collected for a
given row, the oldest and newest deltas are found using sorting criteria
meant to order deltas by their application order:
1. Deltas of lower timestamps are less than deltas of higher timestamps.
2. UNDO deltas are less than REDO deltas.
3. Within each delta store's iterator, a counter of selected deltas is
   used to disambiguate between deltas of a row that have the same
   timestamp. A critical assumption here is that this disambiguator is
   only used for deltas of the same delta store. For REDOs, a lower
   counter implies a lower application order; the opposite is true for
   UNDOs.

What the above criteria don't account for is the fact that certain
iterators can iterate over separate delta stores that have deltas of the
same timestamp and type. If Kudu performs a delta flush while applying a
large batch of updates to the same row, some of the batch's updates can
land in the newly flushed REDO delta file, while the rest land in the
new DMS. Iterating over these stores with a DeltaIteratorMerger, which
combines the deltas of several delta stores, breaks the assumption
described above, resulting in the crash reported in KUDU-3291.

To remediate this, iterators that merge multiple delta stores, namely
the DeltaIteratorMerger, use a single top-level counter to guide the
disambiguators generated by each sub-iterator. Before iterating over a
new batch of deltas in a given sub-iterator, this counter is propagated
to that sub-iterator as the new starting point of its counter.
The result is that the disambiguators generated by the
DeltaIteratorMerger can be used to define a total ordering of the deltas
selected.

Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55
Reviewed-on: http://gerrit.cloudera.org:8080/17547
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Grant Henke <granthe...@apache.org>
(cherry picked from commit 9ecee7ba065f1f7c844f1afd7136ab0565ce2340)
---
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diff_scan-test.cc
M src/kudu/tablet/tablet-test-base.h
8 files changed, 157 insertions(+), 32 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/17561/1
--
To view, visit http://gerrit.cloudera.org:8080/17561
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.15.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iccfc518999d36679f85ed901ba65cf7b4894cd55
Gerrit-Change-Number: 17561
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
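
Illustrative note on the commit message above: the three ordering criteria and the shared top-level counter can be sketched roughly as follows. This is a simplified model, not the code in this patch. All names (SelectedDelta, AppliedBefore, RedoStore, MergeSelect, Oldest, Newest) are hypothetical, only REDO stores are modeled, and the stores are assumed to be visited from oldest to newest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration; they do not mirror Kudu's classes.
enum class DeltaType { kUndo, kRedo };

struct SelectedDelta {
  uint64_t timestamp;
  DeltaType type;
  int64_t disambiguator;  // value handed out by the counter
};

// Returns true if `a` is applied before `b`:
// 1. lower timestamps first, 2. UNDOs before REDOs,
// 3. by counter: ascending for REDOs, descending for UNDOs.
bool AppliedBefore(const SelectedDelta& a, const SelectedDelta& b) {
  if (a.timestamp != b.timestamp) return a.timestamp < b.timestamp;
  if (a.type != b.type) return a.type == DeltaType::kUndo;
  if (a.type == DeltaType::kRedo) return a.disambiguator < b.disambiguator;
  return a.disambiguator > b.disambiguator;
}

// A toy REDO store: timestamps of one row's deltas, in application order.
struct RedoStore {
  std::vector<uint64_t> timestamps;

  // Numbers this store's deltas starting at `next_counter` and returns the
  // counter value the next store should start from.
  int64_t Select(int64_t next_counter, std::vector<SelectedDelta>* out) const {
    for (uint64_t ts : timestamps) {
      out->push_back({ts, DeltaType::kRedo, next_counter++});
    }
    return next_counter;
  }
};

// Toy merger: threads a single counter through every sub-store so that
// disambiguators stay unique across stores (assumes oldest store first).
std::vector<SelectedDelta> MergeSelect(const std::vector<RedoStore>& stores) {
  std::vector<SelectedDelta> out;
  int64_t counter = 0;
  for (const RedoStore& store : stores) {
    counter = store.Select(counter, &out);
  }
  return out;
}

// Oldest and newest selected deltas under the application order.
// Both require a non-empty input.
const SelectedDelta& Oldest(const std::vector<SelectedDelta>& deltas) {
  assert(!deltas.empty());
  return *std::min_element(deltas.begin(), deltas.end(), AppliedBefore);
}

const SelectedDelta& Newest(const std::vector<SelectedDelta>& deltas) {
  assert(!deltas.empty());
  return *std::max_element(deltas.begin(), deltas.end(), AppliedBefore);
}
```

In this model, if each store restarted its counter at zero, two stores holding same-timestamp REDOs for one row would both produce disambiguator 0, and the comparison could not tell which delta was applied first. Threading one counter through the stores avoids that collision.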