Hello Quanlong Huang, Daniel Becker, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/18700
to look at the new patch set (#2).
Change subject: IMPALA-11414: Off-by-one error in Parquet late materialization
......................................................................
IMPALA-11414: Off-by-one error in Parquet late materialization
PARQUET_LATE_MATERIALIZATION sets the minimum number of consecutive
filtered-out rows required before we skip materializing the
corresponding rows in the other Parquet columns.
E.g. if PARQUET_LATE_MATERIALIZATION is 10 and a filtered column has at
least 10 consecutive rows that don't pass the predicates, we avoid
materializing the corresponding rows in the other columns.
But due to an off-by-one error, only
(PARQUET_LATE_MATERIALIZATION - 1) consecutive filtered-out rows were
actually needed. So with PARQUET_LATE_MATERIALIZATION set to 1, zero
consecutive filtered-out rows were enough, which leads to a
crash/DCHECK. The bug is in the GetMicroBatches() algorithm, which
produces the micro batches based on the selected rows.
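The intended grouping rule can be sketched as follows. This is a minimal
Python illustration, not the C++ code in scratch-tuple-batch.h: the
function name get_micro_batches, the list-of-booleans input and the
inclusive (start, end) tuples are assumptions made for the example.

```python
def get_micro_batches(selected_rows, skip_length):
    """Group selected row indices into inclusive (start, end) ranges.

    A new range starts only when at least `skip_length` consecutive
    rows were filtered out since the previous selected row.
    Hypothetical helper for illustration only.
    """
    if skip_length < 1:
        raise ValueError("skip_length must be at least 1")
    batches = []
    start = last = None
    for i, selected in enumerate(selected_rows):
        if not selected:
            continue
        if start is None:
            # First selected row opens the first range.
            start = last = i
        elif i - last - 1 < skip_length:
            # Fewer than skip_length filtered rows in the gap: extend.
            # An off-by-one variant would test `i - last < skip_length`
            # and split after only skip_length - 1 filtered rows.
            last = i
        else:
            # Gap is long enough: close the range and open a new one.
            batches.append((start, last))
            start = last = i
    if start is not None:
        batches.append((start, last))
    return batches
```

In this sketch, a threshold of 1 splits ranges at every filtered-out
row, while a threshold of 2 keeps rows separated by a single filtered
row in the same range.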
Setting PARQUET_LATE_MATERIALIZATION to 0 doesn't make sense, so it is
no longer allowed.
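The restriction on 0 could be enforced when the option value is parsed.
The following is a hypothetical Python sketch of such a check, not the
logic in query-options.cc: the function name and error messages are
assumptions, and it encodes only the rule stated here (0 is rejected).
How other values are treated is outside the scope of the example.

```python
def parse_late_materialization(value):
    """Parse the option string and reject 0 (illustrative only)."""
    try:
        threshold = int(value)
    except ValueError:
        raise ValueError(f"not an integer: {value!r}") from None
    if threshold == 0:
        # Zero consecutive filtered-out rows is not a meaningful threshold.
        raise ValueError("PARQUET_LATE_MATERIALIZATION must not be 0")
    return threshold
```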
Testing
* e2e test with PARQUET_LATE_MATERIALIZATION=1
* e2e test for checking SET PARQUET_LATE_MATERIALIZATION=N
Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
---
M be/src/exec/scratch-tuple-batch-test.cc
M be/src/exec/scratch-tuple-batch.h
M be/src/service/query-options.cc
A testdata/workloads/functional-query/queries/QueryTest/parquet-late-materialization-unique-db.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
M tests/query_test/test_parquet_late_materialization.py
6 files changed, 46 insertions(+), 8 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/18700/2
--
To view, visit http://gerrit.cloudera.org:8080/18700
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38f95ad48c4ac8c1e06651565ab5c496283b29fa
Gerrit-Change-Number: 18700
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>