Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16771

to look at the new patch set (#4).

Change subject: IMPALA-10345: Impala hits DCHECK in parquet-column-stats.inline.h
......................................................................

IMPALA-10345: Impala hits DCHECK in parquet-column-stats.inline.h

During Parquet file writing, a DCHECK verifies that the row group stats
have copied the min/max string values into their internal buffers. This
check runs when each page is finalized, but the string values were only
copied at the end of each row batch.

Thus, if a row batch spans multiple pages, the min/max string values
have not yet been copied when a page is finalized. Since the memory is
attached to the row batch, this isn't really an error.

As a workaround, this commit also copies the min/max string values
at the end of the page if they haven't been copied yet.
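
The workaround can be illustrated with a small standalone sketch. This is
not the Impala code: StringMinMaxStats, CopyToInternalBuffers, OwnsValues
and FinalizePage are hypothetical names chosen for illustration, and
std::string_view stands in for string values whose memory is owned by the
row batch.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in for a stats object that tracks string min/max.
// The values initially reference memory owned by the caller (the "row batch").
class StringMinMaxStats {
 public:
  // Records a value without copying its bytes.
  void Update(std::string_view value) {
    if (!has_values_ || value < min_) { min_ = value; min_copied_ = false; }
    if (!has_values_ || value > max_) { max_ = value; max_copied_ = false; }
    has_values_ = true;
  }

  // Copies any value that still references external memory into an owned buffer.
  void CopyToInternalBuffers() {
    if (!min_copied_) { min_buffer_.assign(min_); min_ = min_buffer_; min_copied_ = true; }
    if (!max_copied_) { max_buffer_.assign(max_); max_ = max_buffer_; max_copied_ = true; }
  }

  // True when neither min nor max references external memory.
  bool OwnsValues() const { return min_copied_ && max_copied_; }

  std::string_view min() const { return min_; }
  std::string_view max() const { return max_; }

 private:
  bool has_values_ = false;
  bool min_copied_ = true;  // nothing to copy until a value is recorded
  bool max_copied_ = true;
  std::string_view min_, max_;
  std::string min_buffer_, max_buffer_;
};

// Hypothetical page finalization: copy first if needed, then run the check
// that previously failed when a row batch spanned several pages.
void FinalizePage(StringMinMaxStats& stats) {
  if (!stats.OwnsValues()) stats.CopyToInternalBuffers();
  assert(stats.OwnsValues());
}
```

In this sketch, calling FinalizePage in the middle of a row batch no
longer trips the assertion, and the stats stay valid after the batch
memory is released.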

Testing
 * Added e2e test

Change-Id: I4289bd743e951cc4c607d5a5ea75d27825a1c12b
---
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
M tests/query_test/test_parquet_stats.py
3 files changed, 23 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/71/16771/4
--
To view, visit http://gerrit.cloudera.org:8080/16771
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4289bd743e951cc4c607d5a5ea75d27825a1c12b
Gerrit-Change-Number: 16771
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
