Hello Csaba Ringhofer, Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/17071
to look at the new patch set (#2).
Change subject: IMPALA-10501: Hit DCHECK in parquet-column-readers.cc:
def_levels_.CacheRemaining() <= num_buffered_values_
......................................................................
IMPALA-10501: Hit DCHECK in parquet-column-readers.cc:
def_levels_.CacheRemaining() <= num_buffered_values_
We had a DCHECK in ScalarColumnReader::MaterializeValueBatch() that
checked that 'num_buffered_values_' is greater than or equal to the
number of cached values in the Parquet definition level decoder.
In SkipTopLevelRows() we used decoder.ReadLevel(), which could load
the decoder's cache with more values than the actual value count.
This is because literal (bit-packed) runs are stored in groups of 8
values, so there might be padding zeros at the end of the last group.
Instead, we can fill the cache of the decoder with
CacheNextBatch(num_vals). This way we never load more values than
the actual value count.
Testing
* Before this patch, TestParquetStats::test_page_index was flaky
because of this issue
* I tested the solution on a hacked Impala that randomly generated
skip ranges
Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f
---
M be/src/exec/parquet/parquet-column-readers.cc
M be/src/exec/parquet/parquet-level-decoder.h
2 files changed, 18 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/17071/2
--
To view, visit http://gerrit.cloudera.org:8080/17071
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic071473e7b315300fd5e163225d3e39735f09c4f
Gerrit-Change-Number: 17071
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins