Xuebin Su has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/23012 )

Change subject: IMPALA-9874: Skip IO for late materialized columns
......................................................................

IMPALA-9874: Skip IO for late materialized columns

This patch implements IO skipping at the column chunk level for Parquet
tables. Specifically, for late materialized columns, `StartScan()` is not
called until the predicates have been evaluated, and is skipped entirely
if no row in the current row group is selected. As a result, IO-bound
queries whose predicates select few rows can run significantly faster.
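
The control flow described above can be illustrated with a minimal,
self-contained sketch. It is not the code in this patch: the names
`SketchColumnReader` and `ScanRowGroup` are hypothetical, and the
`StartScan()` here only simulates IO by returning a byte count, on the
assumption that starting a scan is the step that triggers the read of a
column chunk.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a column reader; not Impala's actual class.
struct SketchColumnReader {
  std::vector<int> values;  // values stored in this column chunk
  bool started = false;     // whether IO was issued for this chunk

  // Simulates issuing the IO for the column chunk; returns bytes "read".
  int64_t StartScan() {
    started = true;
    return static_cast<int64_t>(values.size() * sizeof(int));
  }
};

// Scans one row group. The predicate column is read eagerly; the late
// materialized column is started only if at least one row passes the
// predicate (value > threshold). Returns the total bytes read.
int64_t ScanRowGroup(SketchColumnReader& predicate_col,
                     SketchColumnReader& late_col, int threshold,
                     std::vector<int>* out) {
  int64_t bytes_read = predicate_col.StartScan();

  // Evaluate the predicate before touching the late materialized column.
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < predicate_col.values.size(); ++i) {
    if (predicate_col.values[i] > threshold) selected.push_back(i);
  }

  // No row selected in this row group: skip the IO for the late column.
  if (selected.empty()) return bytes_read;

  bytes_read += late_col.StartScan();
  for (std::size_t i : selected) out->push_back(late_col.values[i]);
  return bytes_read;
}
```

In this sketch, a row group in which no row passes the predicate costs
only the bytes of the predicate column, which is the kind of reduction
the TotalBytesRead check in the tests below is meant to observe.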

Testing:
- Added e2e tests in test_parquet_late_materialization.py to make sure
  that TotalBytesRead is reduced with late materialization.

Change-Id: I4a052b028220517503e634e3f916d1fbd60eb65d
---
M be/src/exec/hdfs-columnar-scanner.cc
M be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/orc/hdfs-orc-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/exec/parquet/parquet-column-chunk-reader.h
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-column-readers.h
M be/src/exec/parquet/parquet-complex-column-reader.h
M be/src/exec/parquet/parquet-page-reader.cc
M be/src/exec/parquet/parquet-page-reader.h
M be/src/exec/scratch-tuple-batch.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization.test
M tests/query_test/test_parquet_late_materialization.py
14 files changed, 239 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/23012/5
--
To view, visit http://gerrit.cloudera.org:8080/23012
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a052b028220517503e634e3f916d1fbd60eb65d
Gerrit-Change-Number: 23012
Gerrit-PatchSet: 5
Gerrit-Owner: Xuebin Su <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
