Hello Mike Percy, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/13011

to review the following change.


Change subject: generic_iterators: MergeIterator whole block copy optimization
......................................................................

generic_iterators: MergeIterator whole block copy optimization

This patch adds the "whole block copy" optimization to the MergeIterator.
The idea is simple: whenever there's only one sub-iterator in the merge
window, we can copy the entire block out of that sub-iterator instead of
copying row-by-row.

The challenging aspect was properly handling the client's SelectionVector:
- When copying row-by-row, the MergeIterator must skip deselected rows in
  order to always return the next smallest row. The MergeIterState
  bookkeeping helped enforce this invariant.
- When copying block-by-block, skipping deselected rows is harder (and
  potentially less performant) than the simple bitmap copy we currently do.
  Plus it's not necessary for correctness; the scan endpoint will skip
  deselected rows when serializing the RowBlock.

So I opted to retain deselected rows in the block-by-block case, and updated
the MergeIterState bookkeeping to cope.

I also changed the default RowBlock sizes in various merge-related paths to
be a power of 2. This should increase the likelihood of hitting the
BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee
it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows,
then one sub-iterator has an additional 65 rows: the BitmapCopy operations
in the block copy could not be byte-aligned). It yielded a slight
improvement in microbenchmarks, though not in the macrobenchmark.

Finally, I tweaked the heuristic for whole-block-copying such that it only
happens when there's more than one row to copy. I don't totally buy the
rationale, but it did yield an improvement in both the micro and macro
benchmarks, so I must be onto something.

Below are the micro and macrobenchmark results. The microbenchmark was
generic_iterators-test's overlapping and non-overlapping MergeIterator tests
with 10 iterations, 1000 lists, and 10000 rows per list, collecting the
wallclock time to scan and averaging it across the runs. The macrobenchmark
was running 'kudu perf tablet_scan' on a representative 40GB tablet six
times, dropping the first time (to reduce cache effects), and averaging the
remaining wallclock times.

no whole-block-copy:
- non-overlapping: 0.9437
- overlapping: 9.5113
- representative tablet: 786.732

whole-block-copy, no power-of-2 RowBlocks
- non-overlapping: 0.6301
- overlapping: 9.7518
- representative tablet: 751.297

whole-block-copy, power-of-2 RowBlocks
- non-overlapping: 0.5316
- overlapping: 9.5718
- representative tablet: 754.961

whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left:
- non-overlapping: 0.5247
- overlapping: 9.1287
- representative tablet: 720.534

Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/rowblock.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 126 insertions(+), 62 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/1
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to