Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#9).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

For performance reasons, this change enhances the ORC scanner to populate a
scratch batch in a column-by-column manner using data from the column
readers. Once this is done, the Parquet code is reused to apply runtime
filters and conjuncts and to populate the outgoing row batch.

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to improve scan
performance. Additionally, introducing the scratch batch concept opens the
door to codegen for runtime filtering and conjunct evaluation.

Note: this change covers only primitive types and structs, not collection
types. Collection types will continue to follow the previous row-by-row
approach.

Testing:
- Re-ran the full test suite to verify that no regression was introduced.
- Checked the performance impact by running the TPCH workload on a scale 25
  database using single_node_perf_run.py. Total query runtime decreased by
  0-20%, depending on how scan-heavy the particular query was: the more
  scan-heavy the query, the larger the performance gain observed.
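
The column-by-column flow described in the commit message can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code in this patch: `Int64Column`, `Tuple`, `ScratchBatch`, `FillColumn` and `TransferScratchTuples` are hypothetical names chosen for the example, and the real scanner works with Impala's own tuple, row batch and column reader classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one decoded ORC column of a batch.
struct Int64Column {
  std::vector<int64_t> values;
};

// Hypothetical row layout with two primitive slots.
struct Tuple {
  int64_t id;
  int64_t amount;
};

// Hypothetical scratch batch: a staging area holding a fixed number of tuples.
struct ScratchBatch {
  std::vector<Tuple> tuples;
  explicit ScratchBatch(size_t num_rows) : tuples(num_rows) {}
};

// Step 1: fill a single slot for every row in the scratch batch before moving
// on to the next column. The loop touches one column's values sequentially and
// makes no per-value virtual call.
void FillColumn(const Int64Column& col, int64_t Tuple::* slot,
                ScratchBatch* scratch) {
  assert(col.values.size() >= scratch->tuples.size());
  for (size_t i = 0; i < scratch->tuples.size(); ++i) {
    scratch->tuples[i].*slot = col.values[i];
  }
}

// Hypothetical predicate type standing in for runtime filters and conjuncts.
using Conjunct = std::function<bool(const Tuple&)>;

// Step 2: once all columns are materialized, evaluate the predicates on each
// scratch tuple and copy the survivors to the outgoing batch.
// Returns the number of tuples added to 'out'.
size_t TransferScratchTuples(const ScratchBatch& scratch,
                             const std::vector<Conjunct>& conjuncts,
                             std::vector<Tuple>* out) {
  size_t added = 0;
  for (const Tuple& t : scratch.tuples) {
    bool pass = true;
    for (const Conjunct& c : conjuncts) {
      if (!c(t)) {
        pass = false;
        break;
      }
    }
    if (pass) {
      out->push_back(t);
      ++added;
    }
  }
  return added;
}
```

A caller would invoke `FillColumn` once per column (for example with `&Tuple::id`, then `&Tuple::amount`) and then call `TransferScratchTuples` once per scratch batch. In a row-by-row design, by contrast, every row would visit every column reader in turn, which is where the extra per-value dispatch cost comes from.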

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 425 insertions(+), 144 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/9
--
To view, visit http://gerrit.cloudera.org:8080/15104
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 9
Gerrit-Owner: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <gaborkas...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>