[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11856 ) Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates .. (01/05) delta_store: avoid copying deleted row data in ApplyUpdates When applying deltas, the scan path will first populate a selection vector with deletes (i.e. a row is unset if there's a relevant DELETE for it), and then use it to enable two optimizations: 1. Short-circuit out if all rows in the block have been deleted. 2. Skip predicate evaluation and base data copying for deleted rows. To this we can add a third optimization: don't apply a delta whose mutation is to a deleted row. Note that it's not incorrect to apply such deltas (as we'll skip deleted rows when serializing the scan response to the client); it's just unnecessary work. I wrote a microbenchmark to evaluate the impact of this optimization. Below is the timing and perf stat output of the microbenchmark before and after applying the optimization (specifically, applying this patch, then commenting out the filtering in DeltaPrepare::ApplyUpdates for the first run, and uncommenting it for the next run). Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s sys 0.000s Time spent running 1000 scans with 10 deletes: real 0.515suser 0.516s sys 0.000s Time spent running 1000 scans with 100 deletes: real 0.512s user 0.512s sys 0.000s Time spent running 1000 scans with 1000 deletes: real 0.553s user 0.552s sys 0.000s Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000': 2201.029290 task-clock (msec) #0.991 CPUs utilized 5 context-switches #0.002 K/sec 0 cpu-migrations#0.000 K/sec 4,950 page-faults #0.002 M/sec 8,276,723,832 cycles#3.760 GHz 24,539,935,941 instructions #2.96 insn per cycle 4,709,709,705 branches # 2139.776 M/sec 12,631,579 branch-misses #0.27% of all branches 2.220370506 seconds time elapsed Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s sys 0.000s Time spent running 1000 scans with 10 deletes: real 0.475suser 0.472s sys 0.004s Time spent running 1000 scans with 100 deletes: real 0.478s user 0.476s sys 0.004s Time spent running 1000 scans with 1000 deletes: real 0.550s user 0.552s sys 0.000s Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000': 2074.795741 task-clock (msec) #0.990 CPUs utilized 23 context-switches #0.011 K/sec 1 cpu-migrations#0.000 K/sec 4,951 page-faults #0.002 M/sec 7,675,100,058 cycles#3.699 GHz 23,100,692,252 instructions #3.01 insn per cycle 4,539,777,117 branches # 2188.060 M/sec 11,819,267 branch-misses #0.26% of all branches 2.096193851 seconds time elapsed Note: I originally wrote this patch thinking it was necessary for diff scan correctness, but have since convinced myself that it's just an optimization. Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144 Reviewed-on: http://gerrit.cloudera.org:8080/11856 Tested-by: Kudu Jenkins Reviewed-by: Grant Henke Reviewed-by: Mike Percy --- M src/kudu/tablet/delta_applier.cc M src/kudu/tablet/delta_applier.h M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/deltamemstore.cc M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/tablet-test-util.h 13 files changed, 116 insertions(+), 46 deletions(-) Approvals: Kudu Jenkins: Verified Grant Henke: Looks good to me, but someone else must approve Mike Percy: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/11856 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144 Gerrit-Change-Number: 11856 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11856 ) Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/11856 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144 Gerrit-Change-Number: 11856 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 08 Nov 2018 19:52:00 + Gerrit-HasComments: No
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11856 ) Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/11856 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144 Gerrit-Change-Number: 11856 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 06 Nov 2018 17:48:35 + Gerrit-HasComments: No
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Hello Mike Percy, David Ribeiro Alves, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/11856 to look at the new patch set (#2). Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates .. (01/05) delta_store: avoid copying deleted row data in ApplyUpdates When applying deltas, the scan path will first populate a selection vector with deletes (i.e. a row is unset if there's a relevant DELETE for it), and then use it to enable two optimizations: 1. Short-circuit out if all rows in the block have been deleted. 2. Skip predicate evaluation and base data copying for deleted rows. To this we can add a third optimization: don't apply a delta whose mutation is to a deleted row. Note that it's not incorrect to apply such deltas (as we'll skip deleted rows when serializing the scan response to the client); it's just unnecessary work. I wrote a microbenchmark to evaluate the impact of this optimization. Below is the timing and perf stat output of the microbenchmark before and after applying the optimization (specifically, applying this patch, then commenting out the filtering in DeltaPrepare::ApplyUpdates for the first run, and uncommenting it for the next run). Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s sys 0.000s Time spent running 1000 scans with 10 deletes: real 0.515suser 0.516s sys 0.000s Time spent running 1000 scans with 100 deletes: real 0.512s user 0.512s sys 0.000s Time spent running 1000 scans with 1000 deletes: real 0.553s user 0.552s sys 0.000s Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000': 2201.029290 task-clock (msec) #0.991 CPUs utilized 5 context-switches #0.002 K/sec 0 cpu-migrations#0.000 K/sec 4,950 page-faults #0.002 M/sec 8,276,723,832 cycles#3.760 GHz 24,539,935,941 instructions #2.96 insn per cycle 4,709,709,705 branches # 2139.776 M/sec 12,631,579 branch-misses #0.27% of all branches 2.220370506 seconds time elapsed Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s sys 0.000s Time spent running 1000 scans with 10 deletes: real 0.475suser 0.472s sys 0.004s Time spent running 1000 scans with 100 deletes: real 0.478s user 0.476s sys 0.004s Time spent running 1000 scans with 1000 deletes: real 0.550s user 0.552s sys 0.000s Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000': 2074.795741 task-clock (msec) #0.990 CPUs utilized 23 context-switches #0.011 K/sec 1 cpu-migrations#0.000 K/sec 4,951 page-faults #0.002 M/sec 7,675,100,058 cycles#3.699 GHz 23,100,692,252 instructions #3.01 insn per cycle 4,539,777,117 branches # 2188.060 M/sec 11,819,267 branch-misses #0.26% of all branches 2.096193851 seconds time elapsed Note: I originally wrote this patch thinking it was necessary for diff scan correctness, but have since convinced myself that it's just an optimization. Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144 --- M src/kudu/tablet/delta_applier.cc M src/kudu/tablet/delta_applier.h M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.cc M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile-test.cc M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore-test.cc M src/kudu/tablet/deltamemstore.cc M src/kudu/tablet/deltamemstore.h M src/kudu/tablet/tablet-test-util.h 13 files changed, 116 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/11856/2 -- To view, visit http://gerrit.cloudera.org:8080/11856 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144 Gerrit-Change-Number: 11856 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon