Hello Attila Jeges, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11431

to look at the new patch set (#4).

Change subject: IMPALA-7559: Disable stat filtering for UTC-normalized timestamp columns
......................................................................

IMPALA-7559: Disable stat filtering for UTC-normalized timestamp columns

If convert_legacy_hive_parquet_utc_timestamps=true and the Parquet
file was written by parquet-mr (also used by Hive), then timestamps
are converted from UTC to local time during scanning. Stat filtering
did not handle this case correctly: it compared the UTC min/max
values from the stats with the local-time values from the predicates.
This could lead to row groups being skipped incorrectly.

Note that parquet-mr only writes stats if min and max are equal,
because it cannot order timestamps correctly, so the only case
affected here is when every value in the column chunk is the same.

It would be possible to implement stat filtering correctly, but
this is non-trivial because of DST and historical timezone rule
changes.

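
The mismatch described above can be illustrated with a minimal sketch. This is not the Impala scanner code: the fixed two-hour offset and the names utc_to_local, can_skip_row_group and should_use_stats are hypothetical and chosen only for illustration. It assumes an equality predicate and a column chunk in which every value is the same, which is the only case where parquet-mr writes stats.

```python
from datetime import datetime, timedelta

# Hypothetical fixed offset standing in for the local time zone.
# Real zones have DST and historical rule changes, which is what
# makes a correct implementation non-trivial.
LOCAL_OFFSET = timedelta(hours=2)


def utc_to_local(ts):
    """Mimic the scan-time UTC-to-local conversion (illustrative only)."""
    return ts + LOCAL_OFFSET


def can_skip_row_group(stat_min, stat_max, predicate_value):
    """For `col = predicate_value`: skip if the value is outside [min, max]."""
    return predicate_value < stat_min or predicate_value > stat_max


def should_use_stats(convert_utc_timestamps, written_by_parquet_mr):
    """Sketch of the approach in this change: no stat filtering in this case."""
    return not (convert_utc_timestamps and written_by_parquet_mr)


# Every value in the column chunk is the same, stored as UTC.
stored_utc = datetime(2018, 9, 10, 10, 0, 0)
stat_min = stat_max = stored_utc

# The query sees the converted value and filters on it.
predicate = utc_to_local(stored_utc)

# Comparing the local-time predicate with the UTC stats skips the
# row group even though it contains the matching row.
wrongly_skipped = can_skip_row_group(stat_min, stat_max, predicate)

# Comparing in the same domain would keep the row group.
skipped_same_domain = can_skip_row_group(
    utc_to_local(stat_min), utc_to_local(stat_max), predicate)

print(wrongly_skipped, skipped_same_domain)
print(should_use_stats(True, True))
```

With a fixed offset, converting the stats before comparing would be enough. With real time zone rules the conversion is not a constant shift, so the sketch only shows why the mixed comparison is wrong, not how to do the comparison correctly.
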
Testing:
- added a Hive generated parquet file + custom cluster test
  that could reproduce this issue

Change-Id: Id4c02230993f2390c03d513f08bae2e9d3d538fa
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M be/src/exec/parquet-column-readers.cc
M testdata/data/README
A testdata/data/hive_single_value_timestamp.parq
M tests/custom_cluster/test_hive_parquet_timestamp_conversion.py
6 files changed, 74 insertions(+), 12 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/31/11431/4
--
To view, visit http://gerrit.cloudera.org:8080/11431
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id4c02230993f2390c03d513f08bae2e9d3d538fa
Gerrit-Change-Number: 11431
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Attila Jeges <atti...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>