[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

2018-11-08 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11856 )

Change subject: (01/05) delta_store: avoid copying deleted row data in 
ApplyUpdates
..

(01/05) delta_store: avoid copying deleted row data in ApplyUpdates

When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.

To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.

I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).

  Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s 
sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.515suser 0.516s 
sys 0.000s
  Time spent running 1000 scans with 100 deletes: real 0.512s   user 0.512s 
sys 0.000s
  Time spent running 1000 scans with 1000 deletes: real 0.553s  user 0.552s 
sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test 
--gtest_filter=*Varying* --benchmark_num_passes=1000':

   2201.029290  task-clock (msec) #0.991 CPUs utilized
 5  context-switches  #0.002 K/sec
 0  cpu-migrations#0.000 K/sec
 4,950  page-faults   #0.002 M/sec
 8,276,723,832  cycles#3.760 GHz
24,539,935,941  instructions  #2.96  insn per cycle
 4,709,709,705  branches  # 2139.776 M/sec
12,631,579  branch-misses #0.27% of all branches

   2.220370506 seconds time elapsed

  Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s 
sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.475suser 0.472s 
sys 0.004s
  Time spent running 1000 scans with 100 deletes: real 0.478s   user 0.476s 
sys 0.004s
  Time spent running 1000 scans with 1000 deletes: real 0.550s  user 0.552s 
sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test 
--gtest_filter=*Varying* --benchmark_num_passes=1000':

   2074.795741  task-clock (msec) #0.990 CPUs utilized
23  context-switches  #0.011 K/sec
 1  cpu-migrations#0.000 K/sec
 4,951  page-faults   #0.002 M/sec
 7,675,100,058  cycles#3.699 GHz
23,100,692,252  instructions  #3.01  insn per cycle
 4,539,777,117  branches  # 2188.060 M/sec
11,819,267  branch-misses #0.26% of all branches

   2.096193851 seconds time elapsed

Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.

Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Reviewed-on: http://gerrit.cloudera.org:8080/11856
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke 
Reviewed-by: Mike Percy 
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, but someone else must approve
  Mike Percy: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

2018-11-08 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11856 )

Change subject: (01/05) delta_store: avoid copying deleted row data in 
ApplyUpdates
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Nov 2018 19:52:00 +
Gerrit-HasComments: No


[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

2018-11-06 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11856 )

Change subject: (01/05) delta_store: avoid copying deleted row data in 
ApplyUpdates
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 06 Nov 2018 17:48:35 +
Gerrit-HasComments: No


[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

2018-11-01 Thread Adar Dembo (Code Review)
Hello Mike Percy, David Ribeiro Alves, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11856

to look at the new patch set (#2).

Change subject: (01/05) delta_store: avoid copying deleted row data in 
ApplyUpdates
..

(01/05) delta_store: avoid copying deleted row data in ApplyUpdates

When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.

To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.

I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).

  Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s 
sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.515suser 0.516s 
sys 0.000s
  Time spent running 1000 scans with 100 deletes: real 0.512s   user 0.512s 
sys 0.000s
  Time spent running 1000 scans with 1000 deletes: real 0.553s  user 0.552s 
sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test 
--gtest_filter=*Varying* --benchmark_num_passes=1000':

   2201.029290  task-clock (msec) #0.991 CPUs utilized
 5  context-switches  #0.002 K/sec
 0  cpu-migrations#0.000 K/sec
 4,950  page-faults   #0.002 M/sec
 8,276,723,832  cycles#3.760 GHz
24,539,935,941  instructions  #2.96  insn per cycle
 4,709,709,705  branches  # 2139.776 M/sec
12,631,579  branch-misses #0.27% of all branches

   2.220370506 seconds time elapsed

  Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s 
sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.475suser 0.472s 
sys 0.004s
  Time spent running 1000 scans with 100 deletes: real 0.478s   user 0.476s 
sys 0.004s
  Time spent running 1000 scans with 1000 deletes: real 0.550s  user 0.552s 
sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test 
--gtest_filter=*Varying* --benchmark_num_passes=1000':

   2074.795741  task-clock (msec) #0.990 CPUs utilized
23  context-switches  #0.011 K/sec
 1  cpu-migrations#0.000 K/sec
 4,951  page-faults   #0.002 M/sec
 7,675,100,058  cycles#3.699 GHz
23,100,692,252  instructions  #3.01  insn per cycle
 4,539,777,117  branches  # 2188.060 M/sec
11,819,267  branch-misses #0.26% of all branches

   2.096193851 seconds time elapsed

Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.

Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/11856/2
--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon