Lars Volker has uploaded a new change for review.
http://gerrit.cloudera.org:8080/7354
Change subject: DRAFT IMPALA-5185: Skip pages based on Parquet::Statistics
......................................................................
DRAFT IMPALA-5185: Skip pages based on Parquet::Statistics
Already done:
- Refactor row group skipping into context creation and processing
- Split root level readers into constrained and non-constrained
- Basic row skipping logic in scanner
- Switch to absolute row numbers in column readers
- Add NextValueToRead() logic to column readers
- Skip rows in CollectionColumnReaders
- Skip pages in BoolColumnReader
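For reviewers, here is a rough sketch of the general idea behind skipping pages by min/max statistics with absolute row numbers. It is not the code in this change. PageStats, RowRange, CandidateRowRanges() and NextRowToRead() are hypothetical names made up for illustration, values are assumed to be int64_t, and pages are assumed to be ordered by their first row.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-page summary, not Impala's actual data structure.
struct PageStats {
  int64_t first_row;  // absolute index of the first row in the page
  int64_t num_rows;   // number of rows stored in the page
  int64_t min_value;  // smallest value in the page
  int64_t max_value;  // largest value in the page
};

// Half-open range of absolute row numbers: [begin, end).
struct RowRange {
  int64_t begin;
  int64_t end;
};

// Returns the row ranges of all pages whose [min_value, max_value] interval
// overlaps the predicate interval [lo, hi]. Pages that cannot contain a
// match are skipped. Adjacent surviving pages are merged into one range.
// Assumes 'pages' is sorted by first_row.
std::vector<RowRange> CandidateRowRanges(const std::vector<PageStats>& pages,
                                         int64_t lo, int64_t hi) {
  assert(lo <= hi);
  std::vector<RowRange> ranges;
  for (const PageStats& p : pages) {
    // No overlap with [lo, hi]: the page can be skipped entirely.
    if (p.max_value < lo || p.min_value > hi) continue;
    const int64_t begin = p.first_row;
    const int64_t end = p.first_row + p.num_rows;
    if (!ranges.empty() && ranges.back().end == begin) {
      ranges.back().end = end;  // extend the previous range
    } else {
      ranges.push_back({begin, end});
    }
  }
  return ranges;
}

// Returns the first absolute row number >= current_row that lies inside a
// candidate range, or -1 if no such row exists. Assumes 'ranges' is sorted
// and non-overlapping, as produced by CandidateRowRanges().
int64_t NextRowToRead(const std::vector<RowRange>& ranges,
                      int64_t current_row) {
  for (const RowRange& r : ranges) {
    if (current_row < r.end) {
      return current_row < r.begin ? r.begin : current_row;
    }
  }
  return -1;
}
```

Since every column reader works with the same absolute row numbers, each one could consult the same list of ranges and skip forward independently. That is the motivation for absolute row numbers assumed in this sketch.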
What's still missing:
- Clean up the SkipValue/SkipValueBatch methods
- Add skipping support to the Parquet deserializer
Change-Id: I8eec838c5baf22167049f570dd0ef9762c5ae0a6
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
M be/src/exec/parquet-column-stats.cc
M be/src/exec/parquet-column-stats.h
M be/src/util/parquet-reader.cc
A gen_data.py
M tests/query_test/test_parquet_stats.py
9 files changed, 860 insertions(+), 90 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/54/7354/3
--
To view, visit http://gerrit.cloudera.org:8080/7354
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8eec838c5baf22167049f570dd0ef9762c5ae0a6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Pooja Nilangekar <[email protected]>