Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11395 )

Change subject: KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator
......................................................................

KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator

This patch rewrites much of DeltaFileIterator to leverage DeltaPreparer.
Besides sharing code, the motivation is to take advantage of the performance
improvement inherent to DeltaPreparer: decoding a batch of deltas just once
in PrepareBatch() as opposed to on each call to ApplyUpdates().

Seeing as DeltaPreparer was originally built for DMSIterator, here are the
various augmentations that were necessary to support DeltaFileIterator:
- REINSERT support, which meant more complicated deletion state tracking.
- FilterColumnIdsAndCollectDeltas support, cribbed from DeltaFileIterator.
- A templatized traits system to control which features were enabled. This
  also meant templatizing both DeltaPreparer and DeltaFileIterator.
- Early out from the "apply all deltas for a row" loop when the timestamp
  is no longer relevant. There's an opportunity to seek here and skip any
  remaining deltas belonging to the row, but testing with a new DMS iterator
  microbenchmark showed that this is only an improvement when the number of
  deltas skipped is sufficiently high.

The improvement should be most noticeable on tables with wide schemas where
multiple columns are projected. In these situations, the column-by-column
ApplyUpdates() approach incurred a lot of unnecessary delta decoding. To
prove it, I included a new deltafile iterator microbenchmark that scans a
subset of a wide schema's columns as a DeltaApplier would.

Before:

  Performance counter stats for 'bin/deltafile-test --gtest_filter=*Benchmark*' 
(5 runs):

      11358.256100      task-clock (msec)         #    0.998 CPUs utilized      
      ( +-  3.39% )
               140      context-switches          #    0.012 K/sec              
      ( +- 27.37% )
                 6      cpu-migrations            #    0.001 K/sec              
      ( +- 52.36% )
            34,231      page-faults               #    0.003 M/sec              
      ( +-  0.42% )
    42,288,292,153      cycles                    #    3.723 GHz                
      ( +-  4.12% )
   100,853,942,114      instructions              #    2.38  insn per cycle     
      ( +-  5.35% )
    19,689,789,259      branches                  # 1733.522 M/sec              
      ( +-  5.49% )
        69,419,478      branch-misses             #    0.35% of all branches    
      ( +-  5.14% )

      11.378958537 seconds time elapsed                                         
 ( +-  3.38% )

After:

  Performance counter stats for 'bin/deltafile-test --gtest_filter=*Benchmark*' 
(5 runs):

       4089.419224      task-clock (msec)         #    0.995 CPUs utilized      
      ( +-  1.25% )
                43      context-switches          #    0.011 K/sec              
      ( +-  4.10% )
                 2      cpu-migrations            #    0.000 K/sec              
      ( +- 32.39% )
            34,948      page-faults               #    0.009 M/sec              
      ( +-  0.22% )
    15,269,907,971      cycles                    #    3.734 GHz                
      ( +-  1.30% )
    32,409,048,370      instructions              #    2.12  insn per cycle     
      ( +-  1.85% )
     5,848,268,795      branches                  # 1430.098 M/sec              
      ( +-  1.85% )
        32,900,262      branch-misses             #    0.56% of all branches    
      ( +-  2.80% )

       4.111096224 seconds time elapsed                                         
 ( +-  1.18% )

It's interesting to see the number of page faults increase while everything
else has gone down, but that makes sense: the new implementation allocates
memory in PrepareBatch() in order to optimize the structure of the deltas.

To be extra safe, I also looped fuzz-itest 1000 times in slow mode. All of
the runs passed.

Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5
Reviewed-on: http://gerrit.cloudera.org:8080/11395
Tested-by: Adar Dembo <a...@cloudera.com>
Reviewed-by: Adar Dembo <a...@cloudera.com>
---
M src/kudu/tablet/delta_key.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet-test-util.h
12 files changed, 525 insertions(+), 403 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11395
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5
Gerrit-Change-Number: 11395
Gerrit-PatchSet: 13
Gerrit-Owner: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to