Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-5347: Parquet scanner microoptimizations
......................................................................

IMPALA-5347: Parquet scanner microoptimizations

A mix of microoptimizations that profiling the Parquet scanner revealed.
Each led to a measurable improvement, and together they added up to
significant speedups for various scans.

* Add ALWAYS_INLINE to hot functions that GCC was failing to inline in
  some cases.
* Apply __restrict__ in a few places so the compiler knows that it is
  safe to cache values accessed via those pointers.
* memset() the whole batch instead of only the null indicators in cases
  where that is almost certainly cheaper.
* Avoid updating two correlated loop variables in MaterializeValueBatch().
* Avoid unnecessary initialization of often-unused 'val' in ReadSlot().
* Shave a few instructions off the (still very expensive) bit unpacking
  and dict decoding logic.
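A minimal sketch of how a few of the patterns above (forced inlining, __restrict__, a whole-batch memset() and a single loop induction variable) can look in a materialization loop. The Tuple layout, MaterializeBatch(), WriteSlot() and the SKETCH_ALWAYS_INLINE macro are hypothetical illustrations assuming GCC or Clang, not the code in this patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an ALWAYS_INLINE macro; assumes GCC/Clang,
// where this attribute asks the compiler to always inline the function.
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))

// Hypothetical tuple layout: one null-indicator byte plus one BIGINT slot.
struct Tuple {
  uint8_t null_indicators;
  int64_t slot;
};

// Copies one decoded value into one tuple. Forced inline because it runs
// once per value inside the hot loop below.
SKETCH_ALWAYS_INLINE void WriteSlot(const int64_t* __restrict__ src,
                                    Tuple* __restrict__ dst) {
  dst->slot = *src;
}

// Materializes 'num_values' decoded values into a batch of tuples.
//  * One memset() clears the whole batch (including every null-indicator
//    byte) instead of clearing each null indicator separately.
//  * __restrict__ promises that 'values' and 'tuples' do not alias, so the
//    compiler need not reload values after each store into a tuple.
//  * A single induction variable 'i' indexes both arrays, rather than
//    keeping a value counter and a tuple pointer updated in lockstep.
int MaterializeBatch(const int64_t* __restrict__ values, int num_values,
                     Tuple* __restrict__ tuples) {
  assert(num_values >= 0);
  std::memset(tuples, 0, sizeof(Tuple) * num_values);
  for (int i = 0; i < num_values; ++i) {
    WriteSlot(&values[i], &tuples[i]);
  }
  return num_values;
}
```

For example, calling MaterializeBatch() on three values fills three tuples whose null indicators are all cleared. Whether the whole-batch memset() actually wins depends on the tuple width, which is why the patch only applies it where it is almost certainly cheaper.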

Performance:

Some local TPC-H and targeted-perf benchmarks showed average speedups of
~5%.

I ran some benchmarks targeted at measuring column materialisation
performance, using a version of lineitem with duplicated data to make
it bigger. These queries all got significantly faster.

Dict-encoded DECIMAL: 2.23s -> 1.23s

  SELECT count(*) FROM biglineitem WHERE l_quantity > 49

Plain-encoded BIGINT: 2.33s -> 1.62s

  SELECT count(*) FROM biglineitem WHERE l_orderkey != 10

Dict-encoded STRING: 2.73s -> 1.72s

  SELECT count(*) FROM biglineitem WHERE l_returnflag = 'A'

Multiple columns: 5.15s -> 3.74s

  SELECT count(*) FROM biglineitem
  WHERE l_quantity > 49 and l_partkey != 199 and l_tax < 100
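The timings above can be turned into speedup factors with a small helper like the one below. It only rearranges the numbers reported in this message; the Result struct, kResults, Speedup(), TimeSaved() and PrintResults() are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Before/after wall-clock times in seconds for one benchmark query.
struct Result {
  const char* name;
  double before_s;
  double after_s;
};

// Timings copied from the benchmark results listed above.
const Result kResults[] = {
    {"Dict-encoded DECIMAL", 2.23, 1.23},
    {"Plain-encoded BIGINT", 2.33, 1.62},
    {"Dict-encoded STRING", 2.73, 1.72},
    {"Multiple columns", 5.15, 3.74},
};
const int kNumResults = sizeof(kResults) / sizeof(kResults[0]);

// Speedup as old time / new time: 2.0 would mean twice as fast.
double Speedup(const Result& r) {
  assert(r.after_s > 0.0);
  return r.before_s / r.after_s;
}

// Fraction of the original runtime saved: 0.25 would mean 25% less time.
double TimeSaved(const Result& r) {
  assert(r.before_s > 0.0);
  return 1.0 - r.after_s / r.before_s;
}

// Prints one line per query with its speedup and runtime reduction.
void PrintResults(const Result* results, int n) {
  for (int i = 0; i < n; ++i) {
    std::printf("%-22s %.2fx faster, %.0f%% less time\n", results[i].name,
                Speedup(results[i]), 100.0 * TimeSaved(results[i]));
  }
}
```

Calling PrintResults(kResults, kNumResults) gives roughly 1.81x, 1.44x, 1.59x and 1.38x for the four queries, i.e. about 27% to 45% less time.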

Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/parquet-column-readers.cc
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/dict-encoding.h
M be/src/util/rle-encoding.h
6 files changed, 56 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/6950/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6950

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49ec523a65542fdbabd53fbcc4a8901d769e5cd5
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: anujphadke <apha...@cloudera.com>
