Hello Attila Jeges, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11431

to look at the new patch set (#4).

Change subject: IMPALA-7559: Disable stat filtering for UTC-normalized 
timestamp columns
......................................................................

IMPALA-7559: Disable stat filtering for UTC-normalized timestamp columns

If convert_legacy_hive_parquet_utc_timestamps=true and the Parquet
file was written by parquet-mr (also used by Hive), then timestamps are
converted from UTC to local time during scanning. Stat filtering
did not handle this case correctly and compared UTC min/max values
from stats with local min/max values from predicates. This could
lead to skipping row groups incorrectly.
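
The mismatch can be illustrated with a minimal Python sketch. This is not
the Impala implementation: the fixed UTC+1 zone and the helper names
utc_to_local and can_skip_row_group are assumptions made only for
illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset "local" zone (UTC+1); real zones also have DST.
LOCAL = timezone(timedelta(hours=1))

def utc_to_local(ts_utc):
    """Convert a naive UTC timestamp to a naive local timestamp."""
    aware = ts_utc.replace(tzinfo=timezone.utc)
    return aware.astimezone(LOCAL).replace(tzinfo=None)

def can_skip_row_group(stat_min, stat_max, pred_value):
    """Min/max pruning for an equality predicate `col = pred_value`."""
    return pred_value < stat_min or pred_value > stat_max

# Column chunk where every value is identical, stored in UTC.
stored_utc = datetime(2018, 9, 1, 12, 0, 0)
stat_min = stat_max = stored_utc

# The scanner returns converted values, so the predicate is in local time.
pred_local = utc_to_local(stored_utc)

# Comparing UTC stats with a local predicate wrongly prunes the row group.
buggy_skip = can_skip_row_group(stat_min, stat_max, pred_local)

# Comparing in the same domain keeps the row group.
consistent_skip = can_skip_row_group(
    utc_to_local(stat_min), utc_to_local(stat_max), pred_local)
```

Here buggy_skip is true although the row group contains the matching
value, while consistent_skip is false. The sketch converts the stats
only to show the domain mismatch; this change avoids the problem by
not using stats for such columns.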

Note that parquet-mr only writes stats if min and max are equal,
because it cannot order timestamps correctly, so the only case
affected here is when every value is the same in the column chunk.

It would be possible to implement stat filtering correctly, but
this is non-trivial because of DST and historical timezone rule
changes.
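
One reason it is non-trivial: UTC-to-local conversion is not monotonic
around a DST "fall back", so converting min and max separately does not
bound the converted values. A minimal sketch, assuming a toy zone with a
single hard-coded switch (SWITCH_UTC and toy_utc_to_local are
hypothetical names, not Impala code):

```python
from datetime import datetime, timedelta

# Hypothetical toy zone: UTC+2 before the switch, UTC+1 from the switch on.
SWITCH_UTC = datetime(2018, 10, 28, 1, 0, 0)

def toy_utc_to_local(ts_utc):
    """Apply the toy zone's offset to a naive UTC timestamp."""
    if ts_utc < SWITCH_UTC:
        return ts_utc + timedelta(hours=2)
    return ts_utc + timedelta(hours=1)

earlier_utc = datetime(2018, 10, 28, 0, 45)
later_utc = datetime(2018, 10, 28, 1, 15)

local_of_earlier = toy_utc_to_local(earlier_utc)
local_of_later = toy_utc_to_local(later_utc)

# The UTC order and the local order disagree across the switch.
order_flipped = (earlier_utc < later_utc
                 and local_of_earlier > local_of_later)
```

In this toy zone the earlier UTC value maps to a later local value, so
a min/max pair that is ordered in UTC can be out of order after
conversion.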

Testing:
- added a Hive-generated Parquet file and a custom cluster test
  that reproduces this issue

Change-Id: Id4c02230993f2390c03d513f08bae2e9d3d538fa
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M testdata/data/README
A testdata/data/hive_single_value_timestamp.parq
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
6 files changed, 74 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/11431/4
--
To view, visit http://gerrit.cloudera.org:8080/11431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id4c02230993f2390c03d513f08bae2e9d3d538fa
Gerrit-Change-Number: 11431
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
