Amogh Margoor has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17860 )
Change subject: [WIP] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.
......................................................................

[WIP] IMPALA-9873: Avoid materialization of columns for filtered out rows in Parquet table.

Currently, the entire row is materialized before it is filtered during the scan. Instead, cost can be saved by first materializing only the columns required for filtering and then materializing the remaining columns only for rows that survive the filter.

Performance:
Performance was measured on a single-daemon, single-threaded impalad against the TPCH scale 42 lineitem table with 252 million rows of unsorted data. Up to 2.5x improvement was seen for non-page-indexed queries and up to 4x improvement for page-indexed queries. Queries for page index were borrowed from the blog:
https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/
More details:
https://docs.google.com/spreadsheets/d/17s5OLaFOPo-64kimAPP6n3kJA42vM-iVT24OvsQgfuA/edit?usp=sharing

Testing: TBD

Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/hdfs-columnar-scanner-ir.cc
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-collection-column-reader.cc
M be/src/exec/parquet/parquet-collection-column-reader.h
M be/src/exec/parquet/parquet-column-chunk-reader.cc
M be/src/exec/parquet/parquet-column-chunk-reader.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/scratch-tuple-batch.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
18 files changed, 774 insertions(+), 121 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/17860/4
--
To view, visit
http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 4
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
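
Illustrative note on the approach described in the commit message: the two-phase scan (materialize the filter columns first, then materialize the remaining columns only for surviving rows) can be sketched roughly as below. This is a minimal, standalone sketch and not the code in this patch set. ColumnarBatch, Row, ScanStats and LateMaterializeScan are hypothetical names invented for illustration; they do not correspond to Impala's scanner or column-reader classes, and a real Parquet reader would skip encoded values rather than index into decoded vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical column-wise batch with one filter column and one payload column.
// These are illustration-only types, not Impala's scanner structures.
struct ColumnarBatch {
  std::vector<int32_t> filter_col;   // column referenced by the predicate
  std::vector<int64_t> payload_col;  // column needed only for surviving rows
};

// A fully materialized output row.
struct Row {
  int32_t filter_val;
  int64_t payload_val;
};

// Counts payload values that were materialized, to make the saving visible.
struct ScanStats {
  std::size_t payload_values_materialized = 0;
};

// Phase 1 evaluates the predicate on the filter column only and records the
// indices of surviving rows. Phase 2 materializes the payload column for
// those indices alone, so filtered-out rows never touch the payload column.
template <typename Pred>
std::vector<Row> LateMaterializeScan(const ColumnarBatch& batch, Pred pred,
                                     ScanStats* stats) {
  assert(batch.filter_col.size() == batch.payload_col.size());

  // Phase 1: filter using only the predicate column.
  std::vector<std::size_t> surviving;
  for (std::size_t i = 0; i < batch.filter_col.size(); ++i) {
    if (pred(batch.filter_col[i])) surviving.push_back(i);
  }

  // Phase 2: materialize the remaining column for surviving rows only.
  std::vector<Row> out;
  out.reserve(surviving.size());
  for (std::size_t idx : surviving) {
    out.push_back(Row{batch.filter_col[idx], batch.payload_col[idx]});
    ++stats->payload_values_materialized;
  }
  return out;
}
```

With a selective predicate, payload_values_materialized stays far below the row count of the batch, which is the kind of saving the commit message describes. The actual gain depends on predicate selectivity and on how cheaply the reader can skip values.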
