Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12205 )
Change subject: KUDU-2645: tablet: Support deduplication of deleted rows in MergeIterator
......................................................................

KUDU-2645: tablet: Support deduplication of deleted rows in MergeIterator

This patch makes it possible to do an incremental diff scan of an entire
tablet. A follow-up patch will expose this capability to scanners at the
RPC level.

Included is new support in the MergeIterator for deduplicating deleted
rows resulting from when a row with a particular primary key is deleted
from one rowset and reinserted into a different rowset. These duplicates
may be returned when the 'include_deleted_rows' option is enabled in the
row iterator options.

Because of this new behavior, while the MergeIterator will deduplicate
deleted rows if they are included in the result set, the UnionIterator
will not deduplicate them and instead return all instances found,
because it's not possible to efficiently support such deduplication
without a merge-like process.

One tangentially-related change in this patch is that TestMerge in
generic_iterators-test.cc was modified to no longer generate duplicate
non-deleted keys for merge testing. Duplicate non-deleted row keys are
no longer supported in the MergeIterator since there is currently no
practical use for that, and it's more efficient not to support them
since at the time of writing it isn't possible for them to appear in a
real tablet.

This patch adds tests at a couple of different levels, including a
VectorIterator-based test that is useful for benchmarking MergeIterator
performance on a simple schema, as well as a higher-level diff scan test
that operates at the Tablet level.
Change-Id: I00614b3fa5c6b4e7b620bb78489e24c5ad44daee
Reviewed-on: http://gerrit.cloudera.org:8080/12205
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Mike Percy <[email protected]>
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
M src/kudu/tablet/diff_scan-test.cc
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/tablet.cc
8 files changed, 308 insertions(+), 79 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Mike Percy: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12205
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I00614b3fa5c6b4e7b620bb78489e24c5ad44daee
Gerrit-Change-Number: 12205
Gerrit-PatchSet: 10
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
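
For readers unfamiliar with why deduplication needs a merge-like process, the
idea in the commit message can be illustrated with a small standalone sketch.
This is not the Kudu MergeIterator code. The Row struct, the MergeDedup
function, and the tie-breaking policy (prefer a live row, otherwise keep one
deleted instance) are all assumptions made for illustration. Each input vector
stands in for one rowset and is assumed to be sorted by key, with at most one
non-deleted row per key across all inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical, simplified row: a primary key, a deleted flag, and a payload.
struct Row {
  int key;
  bool is_deleted;
  std::string payload;
};

// K-way merge of key-sorted inputs that emits exactly one row per key.
// Assumed policy (illustrative only): if any instance of a key is live,
// emit it; otherwise emit a single deleted instance.
std::vector<Row> MergeDedup(const std::vector<std::vector<Row>>& inputs) {
  struct Cursor {
    std::size_t input;  // which input vector
    std::size_t pos;    // position within that vector
  };
  // Min-heap on key: the comparator returns true when 'a' sorts after 'b'.
  auto key_after = [&inputs](const Cursor& a, const Cursor& b) {
    return inputs[a.input][a.pos].key > inputs[b.input][b.pos].key;
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(key_after)>
      heap(key_after);
  for (std::size_t i = 0; i < inputs.size(); ++i) {
    if (!inputs[i].empty()) heap.push({i, 0});
  }

  std::vector<Row> out;
  while (!heap.empty()) {
    const int key = inputs[heap.top().input][heap.top().pos].key;
    const Row* chosen = nullptr;
    // Because the heap orders by key, all instances of 'key' surface
    // together here, which is what makes deduplication cheap.
    while (!heap.empty() &&
           inputs[heap.top().input][heap.top().pos].key == key) {
      const Cursor c = heap.top();
      heap.pop();
      const Row& r = inputs[c.input][c.pos];
      if (chosen == nullptr || (chosen->is_deleted && !r.is_deleted)) {
        chosen = &r;
      }
      if (c.pos + 1 < inputs[c.input].size()) {
        heap.push({c.input, c.pos + 1});
      }
    }
    assert(chosen != nullptr);
    out.push_back(*chosen);
  }
  return out;
}
```

A plain concatenation of the inputs, which is roughly what a union does, has
no point at which all instances of a key are visible together, so it would
need extra state (for example a set of keys already seen) to achieve the same
result.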
