Hello Mike Percy, Todd Lipcon, I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/13011 to review the following change. Change subject: generic_iterators: MergeIterator whole block copy optimization ...................................................................... generic_iterators: MergeIterator whole block copy optimization This patch adds the "whole block copy" optimization to the MergeIterator. The idea is simple: whenever there's only one sub-iterator in the merge window, we can copy the entire block out of that sub-iterator instead of copying row-by-row. The challenging aspect was properly handling the client's SelectionVector: - When copying row-by-row, the MergeIterator must skip deselected rows in order to always return the next smallest row. The MergeIterState bookkeeping helped enforce this invariant. - When copying block-by-block, skipping deselected rows is harder (and potentially less performant) than the simple bitmap copy we currently do. Plus it's not necessary for correctness; the scan endpoint will skip deselected rows when serializing the RowBlock. So I opted to retain deselected rows in the block-by-block case, and updated the MergeIterState bookkeeping to cope. I also changed the default RowBlock sizes in various merge-related paths to be a power of 2. This should increase the likelihood of hitting the BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows, then one sub-iterator has an additional 65 rows: the BitmapCopy operations in the block copy could not be byte-aligned). It yielded a slight improvement in microbenchmarks, though not in the macrobenchmark. Finally, I tweaked the heuristic for whole-block-copying such that it only happens when there's more than one row to copy. I don't totally buy the rationale, but it did yield an improvement in both the micro and macro benchmarks, so I must be onto something. Below are the micro and macrobenchmark results. The microbenchmark was generic_iterators-test's overlapping and non-overlapping MergeIterator tests with 10 iterations, 1000 lists, and 10000 rows per list, collecting the wallclock time to scan and averaging it across the runs. The macrobenchmark was running 'kudu perf tablet_scan' on a representative 40GB tablet six times, dropping the first time (to reduce cache effects), and averaging the remaining wallclock times. no whole-block-copy: - non-overlapping: 0.9437 - overlapping: 9.5113 - representative tablet: 786.732 whole-block-copy, no power-of-2 RowBlocks - non-overlapping: 0.6301 - overlapping: 9.7518 - representative tablet: 751.297 whole-block-copy, power-of-2 RowBlocks - non-overlapping: 0.5316 - overlapping: 9.5718 - representative tablet: 754.961 whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left: - non-overlapping: 0.5247 - overlapping: 9.1287 - representative tablet: 720.534 Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d --- M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/rowblock.h M src/kudu/tserver/tablet_service.cc 4 files changed, 126 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/1 -- To view, visit http://gerrit.cloudera.org:8080/13011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d Gerrit-Change-Number: 13011 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>