Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18359

to look at the new patch set (#2).

Change subject: IMPALA-11204: template implementation for OrcStringColumnReader
......................................................................

IMPALA-11204: template implementation for OrcStringColumnReader

OrcStringColumnReader::ReadValue() performs some checks whose results
can be determined outside the method. They should be hoisted out, since
ReadValue() is a critical method that is executed for every row of every
string column. With these checks in place, the method is too complex for
the compiler to inline into OrcBatchedReader::ReadValueBatch().

This patch templatizes OrcStringColumnReader on two parameters: whether
the column is dictionary encoded, and the target slot type (STRING, CHAR
or VARCHAR). With this patch, the compiler is able to inline
OrcStringColumnReader::ReadValue().
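
To illustrate the idea, here is a minimal standalone sketch of moving
per-row checks into template parameters. It is not the code in this
patch: Batch, SlotType and StringReaderSketch are hypothetical names,
std::string stands in for Impala's slot representation, and C++17
(if constexpr) is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not Impala's real types.
enum class SlotType { STRING, CHAR, VARCHAR };

struct Batch {
  std::vector<std::string> dict;        // dictionary entries (dictionary encoding)
  std::vector<std::size_t> dict_index;  // per-row index into dict
  std::vector<std::string> values;      // per-row values (direct encoding)
};

// Both properties are fixed at compile time, so ReadValue() contains no
// per-row branch on the encoding or on the target slot type.
template <bool DICT_ENCODED, SlotType TARGET>
class StringReaderSketch {
 public:
  StringReaderSketch(const Batch* batch, std::size_t max_len)
    : batch_(batch), max_len_(max_len) {
    assert(batch != nullptr);
  }

  std::string ReadValue(std::size_t row) const {
    const std::string& src = Source(row);
    if constexpr (TARGET == SlotType::STRING) {
      return src;  // no length limit
    } else if constexpr (TARGET == SlotType::VARCHAR) {
      return src.substr(0, max_len_);  // truncate only
    } else {
      // CHAR: truncate, then pad with spaces to the fixed length.
      std::string out = src.substr(0, max_len_);
      out.resize(max_len_, ' ');
      return out;
    }
  }

 private:
  const std::string& Source(std::size_t row) const {
    if constexpr (DICT_ENCODED) {
      return batch_->dict[batch_->dict_index[row]];
    } else {
      return batch_->values[row];
    }
  }

  const Batch* batch_;
  std::size_t max_len_;  // ignored for STRING
};
```

Each instantiation compiles down to a single straight-line path, which
is the property that makes a small per-row method easier for the
compiler to inline into a batch loop.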

The encoding of a column can differ between ORC stripes, so we have to
re-create the column readers for each stripe. Note that we already do
so for orc::RowReader. This patch therefore changes the life-cycle of
OrcColumnReaders to match the processing of each stripe. They are now
managed by std::unique_ptr, which requires OrcStructReader to be defined
before HdfsOrcScanner. So we include orc-column-readers.h in
hdfs-orc-scanner.h and move all code in orc-column-readers.h that
depends on the scanner implementation to the source file.
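
The per-stripe life-cycle can be sketched roughly as follows. This is
an illustration under assumed names, not the patch itself: StripeInfo,
ColumnReaderBase, StringColumnReaderSketch, CreateReader and
ScannerSketch are all hypothetical, and C++14 (std::make_unique) is
assumed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical description of a stripe; the real encoding information
// comes from the ORC library.
struct StripeInfo {
  bool dict_encoded;
};

class ColumnReaderBase {
 public:
  virtual ~ColumnReaderBase() = default;
  virtual bool IsDictEncoded() const = 0;
};

template <bool DICT_ENCODED>
class StringColumnReaderSketch final : public ColumnReaderBase {
 public:
  bool IsDictEncoded() const override { return DICT_ENCODED; }
};

// The runtime check happens once per stripe, when choosing which template
// instantiation to create, instead of once per row.
std::unique_ptr<ColumnReaderBase> CreateReader(const StripeInfo& stripe) {
  if (stripe.dict_encoded) {
    return std::make_unique<StringColumnReaderSketch<true>>();
  }
  return std::make_unique<StringColumnReaderSketch<false>>();
}

class ScannerSketch {
 public:
  // Called when a new stripe starts. Assigning to the unique_ptr destroys
  // the previous stripe's reader.
  void StartStripe(const StripeInfo& stripe) {
    reader_ = CreateReader(stripe);
    ++readers_created_;
  }

  const ColumnReaderBase* reader() const { return reader_.get(); }
  int readers_created() const { return readers_created_; }

 private:
  std::unique_ptr<ColumnReaderBase> reader_;
  int readers_created_ = 0;
};
```

Holding the reader in a std::unique_ptr member means the scanner's
header needs the complete reader type wherever the member is destroyed,
which is consistent with the header reordering described above.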

Ran a single node perf test on TPCH(30) on my dev box using 3 impalad
instances. There are some improvements and no significant regressions:
+----------+--------+-------------+------------+
| Query    | Avg(s) | Base Avg(s) | Delta(Avg) |
+----------+--------+-------------+------------+
| TPCH-Q19 | 5.42   | 5.78        | I -6.21%   |
| TPCH-Q4  | 3.43   | 3.69        | I -7.25%   |
| TPCH-Q6  | 2.25   | 2.45        | I -8.18%   |
| TPCH-Q12 | 3.95   | 4.54        | I -13.04%  |
+----------+--------+-------------+------------+
File Format: orc/snap/block

Tests:
 - Ran CORE tests.

Change-Id: I166b8ad3a959e97a3911da968b8e76bc337e5fa4
---
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
4 files changed, 240 insertions(+), 196 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/18359/2
--
To view, visit http://gerrit.cloudera.org:8080/18359
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I166b8ad3a959e97a3911da968b8e76bc337e5fa4
Gerrit-Change-Number: 18359
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
