[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Patch Set 6: Verified+1

Overriding Jenkins, unrelated test failure.


--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 17 Apr 2019 21:46:13 +
Gerrit-HasComments: No


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..

generic_iterators: MergeIterator whole block copy optimization

This patch adds the "whole block copy" optimization to the MergeIterator.
The idea is simple: whenever there's only one sub-iterator in the merge
window, we can copy the entire block out of that sub-iterator instead of
copying row-by-row.

The challenging aspect was properly handling the client's SelectionVector:
- When copying row-by-row, the MergeIterator must skip deselected rows in
  order to always return the next smallest row. The MergeIterState
  bookkeeping helped enforce this invariant.
- When copying block-by-block, skipping deselected rows is harder (and
  potentially less performant) than the simple bitmap copy we currently do.
  Plus it's not necessary for correctness; the scan endpoint will skip
  deselected rows when serializing the RowBlock.

So I opted to retain deselected rows in the block-by-block case, and updated
the MergeIterState bookkeeping to cope.

I also changed the default RowBlock sizes in various merge-related paths to
be a power of 2. This should increase the likelihood of hitting the
BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee
it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows,
then one sub-iterator has an additional 65 rows: the BitmapCopy operations
in the block copy could not be byte-aligned). It yielded a slight
improvement in microbenchmarks, though not in the macrobenchmark.

Finally, I tweaked the heuristic for whole-block-copying such that it only
happens when there's more than one row to copy. I don't totally buy the
rationale, but it did yield an improvement in both the micro and macro
benchmarks, so I must be onto something.

Below are the micro and macrobenchmark results. The microbenchmark was
generic_iterators-test's overlapping and non-overlapping MergeIterator tests
with 10 iterations, 1000 lists, and 1 rows per list, collecting the
wallclock time to scan and averaging it across the runs. The macrobenchmark
was running 'kudu perf tablet_scan' on a representative 40GB tablet six
times, dropping the first time (to reduce cache effects), and averaging the
remaining wallclock times.

no whole-block-copy:
- non-overlapping: 0.9437
- overlapping: 9.5113
- representative tablet: 786.732

whole-block-copy, no power-of-2 RowBlocks
- non-overlapping: 0.6301
- overlapping: 9.7518
- representative tablet: 751.297

whole-block-copy, power-of-2 RowBlocks
- non-overlapping: 0.5316
- overlapping: 9.5718
- representative tablet: 754.961

whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left:
- non-overlapping: 0.5247
- overlapping: 9.1287
- representative tablet: 720.534

Todd and I discussed another possible optimization which I want to note here
for posterity: rather than copying blocks, we could "attach" the data to the
client's RowbBlock and serialize it to the wire directly. The attachment
could be RowBlock-based by allowing RowBlocks to work like a refcounted
"iovec", or it could be done more deeply to enable direct serialization of
cached decoder data. Either way, anything that helps us avoid copying data
is likely to be a performance win.

Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Reviewed-on: http://gerrit.cloudera.org:8080/13011
Reviewed-by: Mike Percy 
Tested-by: Adar Dembo 
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/rowblock.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 129 insertions(+), 62 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Adar Dembo: Verified

--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@262
PS5, Line 262:   
read_block_->selection_vector()->FindFirstRowSelected(next_row_idx_, )) {
> No, this is a more efficient implementation of the existing code from L257-
Got it, thanks for clarifying.



--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 17 Apr 2019 21:31:04 +
Gerrit-HasComments: Yes


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Adar Dembo (Code Review)
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13011

to look at the new patch set (#6).

Change subject: generic_iterators: MergeIterator whole block copy optimization
..

generic_iterators: MergeIterator whole block copy optimization

This patch adds the "whole block copy" optimization to the MergeIterator.
The idea is simple: whenever there's only one sub-iterator in the merge
window, we can copy the entire block out of that sub-iterator instead of
copying row-by-row.

The challenging aspect was properly handling the client's SelectionVector:
- When copying row-by-row, the MergeIterator must skip deselected rows in
  order to always return the next smallest row. The MergeIterState
  bookkeeping helped enforce this invariant.
- When copying block-by-block, skipping deselected rows is harder (and
  potentially less performant) than the simple bitmap copy we currently do.
  Plus it's not necessary for correctness; the scan endpoint will skip
  deselected rows when serializing the RowBlock.

So I opted to retain deselected rows in the block-by-block case, and updated
the MergeIterState bookkeeping to cope.

I also changed the default RowBlock sizes in various merge-related paths to
be a power of 2. This should increase the likelihood of hitting the
BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee
it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows,
then one sub-iterator has an additional 65 rows: the BitmapCopy operations
in the block copy could not be byte-aligned). It yielded a slight
improvement in microbenchmarks, though not in the macrobenchmark.

Finally, I tweaked the heuristic for whole-block-copying such that it only
happens when there's more than one row to copy. I don't totally buy the
rationale, but it did yield an improvement in both the micro and macro
benchmarks, so I must be onto something.

Below are the micro and macrobenchmark results. The microbenchmark was
generic_iterators-test's overlapping and non-overlapping MergeIterator tests
with 10 iterations, 1000 lists, and 1 rows per list, collecting the
wallclock time to scan and averaging it across the runs. The macrobenchmark
was running 'kudu perf tablet_scan' on a representative 40GB tablet six
times, dropping the first time (to reduce cache effects), and averaging the
remaining wallclock times.

no whole-block-copy:
- non-overlapping: 0.9437
- overlapping: 9.5113
- representative tablet: 786.732

whole-block-copy, no power-of-2 RowBlocks
- non-overlapping: 0.6301
- overlapping: 9.7518
- representative tablet: 751.297

whole-block-copy, power-of-2 RowBlocks
- non-overlapping: 0.5316
- overlapping: 9.5718
- representative tablet: 754.961

whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left:
- non-overlapping: 0.5247
- overlapping: 9.1287
- representative tablet: 720.534

Todd and I discussed another possible optimization which I want to note here
for posterity: rather than copying blocks, we could "attach" the data to the
client's RowbBlock and serialize it to the wire directly. The attachment
could be RowBlock-based by allowing RowBlocks to work like a refcounted
"iovec", or it could be done more deeply to enable direct serialization of
cached decoder data. Either way, anything that helps us avoid copying data
is likely to be a performance win.

Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/rowblock.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 129 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/6
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@262
PS5, Line 262:   
read_block_->selection_vector()->FindFirstRowSelected(next_row_idx_, )) {
> Is this the additional bookkeeping you are referring to in the commit messa
No, this is a more efficient implementation of the existing code from L257-264.

The bookkeeping change I alluded to was the removal of rows_advanced_ and 
rows_valid_, because both operated in terms of selected rows only, but now we 
don't care as much about distinguishing between selected and deselected rows, 
so I removed them in favor of using next_row_idx_ for "where are we?" and 
read_block_->nrows() for "how many rows exist?".


http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@792
PS5, Line 792: blo
> block
Done


http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@897
PS5, Line 897:
> nit: /*num_rows_to_advance=*/
Done



--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 17 Apr 2019 21:10:34 +
Gerrit-HasComments: Yes


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-17 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Patch Set 5: Code-Review+1

(3 comments)

Nice work

http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@262
PS5, Line 262:   
read_block_->selection_vector()->FindFirstRowSelected(next_row_idx_, )) {
Is this the additional bookkeeping you are referring to in the commit message?


http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@792
PS5, Line 792: row
block


http://gerrit.cloudera.org:8080/#/c/13011/5/src/kudu/common/generic_iterators.cc@897
PS5, Line 897:
nit: /*num_rows_to_advance=*/



--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 5
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 17 Apr 2019 20:45:42 +
Gerrit-HasComments: Yes


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Patch Set 4: Verified+1

Overriding Jenkins, unrelated (probably flaky?) test failure.


--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 15 Apr 2019 23:21:33 +
Gerrit-HasComments: No


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13011

to look at the new patch set (#4).

Change subject: generic_iterators: MergeIterator whole block copy optimization
..

generic_iterators: MergeIterator whole block copy optimization

This patch adds the "whole block copy" optimization to the MergeIterator.
The idea is simple: whenever there's only one sub-iterator in the merge
window, we can copy the entire block out of that sub-iterator instead of
copying row-by-row.

The challenging aspect was properly handling the client's SelectionVector:
- When copying row-by-row, the MergeIterator must skip deselected rows in
  order to always return the next smallest row. The MergeIterState
  bookkeeping helped enforce this invariant.
- When copying block-by-block, skipping deselected rows is harder (and
  potentially less performant) than the simple bitmap copy we currently do.
  Plus it's not necessary for correctness; the scan endpoint will skip
  deselected rows when serializing the RowBlock.

So I opted to retain deselected rows in the block-by-block case, and updated
the MergeIterState bookkeeping to cope.

I also changed the default RowBlock sizes in various merge-related paths to
be a power of 2. This should increase the likelihood of hitting the
BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee
it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows,
then one sub-iterator has an additional 65 rows: the BitmapCopy operations
in the block copy could not be byte-aligned). It yielded a slight
improvement in microbenchmarks, though not in the macrobenchmark.

Finally, I tweaked the heuristic for whole-block-copying such that it only
happens when there's more than one row to copy. I don't totally buy the
rationale, but it did yield an improvement in both the micro and macro
benchmarks, so I must be onto something.

Below are the micro and macrobenchmark results. The microbenchmark was
generic_iterators-test's overlapping and non-overlapping MergeIterator tests
with 10 iterations, 1000 lists, and 1 rows per list, collecting the
wallclock time to scan and averaging it across the runs. The macrobenchmark
was running 'kudu perf tablet_scan' on a representative 40GB tablet six
times, dropping the first time (to reduce cache effects), and averaging the
remaining wallclock times.

no whole-block-copy:
- non-overlapping: 0.9437
- overlapping: 9.5113
- representative tablet: 786.732

whole-block-copy, no power-of-2 RowBlocks
- non-overlapping: 0.6301
- overlapping: 9.7518
- representative tablet: 751.297

whole-block-copy, power-of-2 RowBlocks
- non-overlapping: 0.5316
- overlapping: 9.5718
- representative tablet: 754.961

whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left:
- non-overlapping: 0.5247
- overlapping: 9.1287
- representative tablet: 720.534

Todd and I discussed another possible optimization which I want to note here
for posterity: rather than copying blocks, we could "attach" the data to the
client's RowbBlock and serialize it to the wire directly. The attachment
could be RowBlock-based by allowing RowBlocks to work like a refcounted
"iovec", or it could be done more deeply to enable direct serialization of
cached decoder data. Either way, anything that helps us avoid copying data
is likely to be a performance win.

Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/rowblock.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 129 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/4
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-12 Thread Adar Dembo (Code Review)
Hello Mike Percy, Todd Lipcon,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/13011

to review the following change.


Change subject: generic_iterators: MergeIterator whole block copy optimization
..

generic_iterators: MergeIterator whole block copy optimization

This patch adds the "whole block copy" optimization to the MergeIterator.
The idea is simple: whenever there's only one sub-iterator in the merge
window, we can copy the entire block out of that sub-iterator instead of
copying row-by-row.

The challenging aspect was properly handling the client's SelectionVector:
- When copying row-by-row, the MergeIterator must skip deselected rows in
  order to always return the next smallest row. The MergeIterState
  bookkeeping helped enforce this invariant.
- When copying block-by-block, skipping deselected rows is harder (and
  potentially less performant) than the simple bitmap copy we currently do.
  Plus it's not necessary for correctness; the scan endpoint will skip
  deselected rows when serializing the RowBlock.

So I opted to retain deselected rows in the block-by-block case, and updated
the MergeIterState bookkeeping to cope.

I also changed the default RowBlock sizes in various merge-related paths to
be a power of 2. This should increase the likelihood of hitting the
BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee
it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows,
then one sub-iterator has an additional 65 rows: the BitmapCopy operations
in the block copy could not be byte-aligned). It yielded a slight
improvement in microbenchmarks, though not in the macrobenchmark.

Finally, I tweaked the heuristic for whole-block-copying such that it only
happens when there's more than one row to copy. I don't totally buy the
rationale, but it did yield an improvement in both the micro and macro
benchmarks, so I must be onto something.

Below are the micro and macrobenchmark results. The microbenchmark was
generic_iterators-test's overlapping and non-overlapping MergeIterator tests
with 10 iterations, 1000 lists, and 1 rows per list, collecting the
wallclock time to scan and averaging it across the runs. The macrobenchmark
was running 'kudu perf tablet_scan' on a representative 40GB tablet six
times, dropping the first time (to reduce cache effects), and averaging the
remaining wallclock times.

no whole-block-copy:
- non-overlapping: 0.9437
- overlapping: 9.5113
- representative tablet: 786.732

whole-block-copy, no power-of-2 RowBlocks
- non-overlapping: 0.6301
- overlapping: 9.7518
- representative tablet: 751.297

whole-block-copy, power-of-2 RowBlocks
- non-overlapping: 0.5316
- overlapping: 9.5718
- representative tablet: 754.961

whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left:
- non-overlapping: 0.5247
- overlapping: 9.1287
- representative tablet: 720.534

Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/rowblock.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 126 insertions(+), 62 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/1
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon