Hello Zoltan Borok-Nagy, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/15350
to look at the new patch set (#2).
Change subject: IMPALA-6506: Codegen in ORC scanner for primitives and struct
......................................................................
IMPALA-6506: Codegen in ORC scanner for primitives and struct
IMPALA-9228 introduced scratch batch handling for struct and
primitive types in the ORC scanner, and the existing scratch batch
logic already supports codegen for the ProcessScratchBatch() function.
This change turns on that codegen logic for primitive types and
structs in the ORC scanner.
Note that if the query involves collection types,
ProcessScratchBatch() is still codegen'd, but the codegen'd function
isn't used: in that case the scanner follows the regular row-by-row
approach, which doesn't use a scratch batch.
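The dispatch described above can be pictured roughly as follows. This
is only an illustrative sketch, not the code in this patch: the
types, the ChooseScanPath() and TransferScratchBatch() helpers and
the function-pointer signature are all hypothetical stand-ins, and
the "filter" is a placeholder for real conjunct evaluation.

```cpp
#include <vector>

// Hypothetical stand-ins; these are NOT Impala's real classes or signatures.
struct ScratchBatch { std::vector<int> values; };
struct RowBatch { std::vector<int> rows; };

// Assumed shape of a ProcessScratchBatch()-like function: copies surviving
// rows from the scratch batch to the output batch, returns how many it added.
using ProcessScratchBatchFn = int (*)(const ScratchBatch&, RowBatch*);

// Interpreted fallback. The "v >= 0" test is a placeholder for evaluating
// the query's conjuncts.
int ProcessScratchBatchInterpreted(const ScratchBatch& scratch, RowBatch* out) {
  int added = 0;
  for (int v : scratch.values) {
    if (v >= 0) {
      out->rows.push_back(v);
      ++added;
    }
  }
  return added;
}

enum class ScanPath { kRowByRow, kScratchBatchInterpreted, kScratchBatchCodegen };

// Collection types bypass the scratch batch entirely, so a codegen'd
// function may exist without ever being called on that path.
ScanPath ChooseScanPath(bool has_collection_types,
                        ProcessScratchBatchFn codegend_fn) {
  if (has_collection_types) return ScanPath::kRowByRow;
  return codegend_fn != nullptr ? ScanPath::kScratchBatchCodegen
                                : ScanPath::kScratchBatchInterpreted;
}

// On the scratch batch path, prefer the codegen'd function when one is
// available (nullptr models "codegen disabled or failed").
int TransferScratchBatch(const ScratchBatch& scratch, RowBatch* out,
                         ProcessScratchBatchFn codegend_fn) {
  ProcessScratchBatchFn fn =
      codegend_fn != nullptr ? codegend_fn : &ProcessScratchBatchInterpreted;
  return fn(scratch, out);
}
```

In this sketch a query that touches collection types gets kRowByRow
regardless of whether a codegen'd function pointer was produced,
which mirrors the behaviour noted above.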
Testing:
- Re-ran the whole test suite to check for regressions.
- Checked the performance on a scale factor 25 TPCH workload in ORC
format using single_node_perf_run.py. Comparing the query runtimes,
codegen appears to bring a 1-21% improvement for most of the
queries. Three queries that are not scan-heavy get slightly slower,
since codegen doesn't provide any help for scanning there. However,
these are short queries and the degradation is under a second, so
the decrease is negligible.
- Manually checked a table that contains both Parquet and ORC
partitions. Verified that in this case ProcessScratchBatch() is
codegen'd for both formats and the query results are as expected.
Change-Id: I2352d0c8fc75ff722e931bc8c866b3e43d3636f4
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
6 files changed, 58 insertions(+), 47 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/15350/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2352d0c8fc75ff722e931bc8c866b3e43d3636f4
Gerrit-Change-Number: 15350
Gerrit-PatchSet: 2
Gerrit-Owner: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>