Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/13011 to look at the new patch set (#4). Change subject: generic_iterators: MergeIterator whole block copy optimization ...................................................................... generic_iterators: MergeIterator whole block copy optimization This patch adds the "whole block copy" optimization to the MergeIterator. The idea is simple: whenever there's only one sub-iterator in the merge window, we can copy the entire block out of that sub-iterator instead of copying row-by-row. The challenging aspect was properly handling the client's SelectionVector: - When copying row-by-row, the MergeIterator must skip deselected rows in order to always return the next smallest row. The MergeIterState bookkeeping helped enforce this invariant. - When copying block-by-block, skipping deselected rows is harder (and potentially less performant) than the simple bitmap copy we currently do. Plus it's not necessary for correctness; the scan endpoint will skip deselected rows when serializing the RowBlock. So I opted to retain deselected rows in the block-by-block case, and updated the MergeIterState bookkeeping to cope. I also changed the default RowBlock sizes in various merge-related paths to be a power of 2. This should increase the likelihood of hitting the BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows, then one sub-iterator has an additional 65 rows: the BitmapCopy operations in the block copy could not be byte-aligned). It yielded a slight improvement in microbenchmarks, though not in the macrobenchmark. Finally, I tweaked the heuristic for whole-block-copying such that it only happens when there's more than one row to copy. I don't totally buy the rationale, but it did yield an improvement in both the micro and macro benchmarks, so I must be onto something. Below are the micro and macrobenchmark results. The microbenchmark was generic_iterators-test's overlapping and non-overlapping MergeIterator tests with 10 iterations, 1000 lists, and 10000 rows per list, collecting the wallclock time to scan and averaging it across the runs. The macrobenchmark was running 'kudu perf tablet_scan' on a representative 40GB tablet six times, dropping the first time (to reduce cache effects), and averaging the remaining wallclock times. no whole-block-copy: - non-overlapping: 0.9437 - overlapping: 9.5113 - representative tablet: 786.732 whole-block-copy, no power-of-2 RowBlocks - non-overlapping: 0.6301 - overlapping: 9.7518 - representative tablet: 751.297 whole-block-copy, power-of-2 RowBlocks - non-overlapping: 0.5316 - overlapping: 9.5718 - representative tablet: 754.961 whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left: - non-overlapping: 0.5247 - overlapping: 9.1287 - representative tablet: 720.534 Todd and I discussed another possible optimization which I want to note here for posterity: rather than copying blocks, we could "attach" the data to the client's RowbBlock and serialize it to the wire directly. The attachment could be RowBlock-based by allowing RowBlocks to work like a refcounted "iovec", or it could be done more deeply to enable direct serialization of cached decoder data. Either way, anything that helps us avoid copying data is likely to be a performance win. Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d --- M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/rowblock.h M src/kudu/tserver/tablet_service.cc 4 files changed, 129 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/4 -- To view, visit http://gerrit.cloudera.org:8080/13011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d Gerrit-Change-Number: 13011 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Todd Lipcon <t...@apache.org>