Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................

Patch Set 2:

It is really unfortunate that parquet-mr treats our timestamps as byte arrays; it makes the min/max stats mostly useless for pruning files. I feel like this is a bug in parquet-mr, since INT96 is in the spec (https://github.com/apache/parquet-format/blob/98c5e2b8575a809b09d996080428be730614d374/Encodings.md) and it is being treated inconsistently with int32/int64. Common sense would dictate that min/max of int96 should be treated the same as for int32/int64.

Seems like something we should open an issue against Parquet for, and against Hive as well? Otherwise our timestamp stats will be pretty useless. In any case, we should clarify this before writing out our own incompatible stats.

--
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: No
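
For illustration only, here is a minimal Python sketch of why min/max computed over the raw bytes of an INT96 timestamp may not help with pruning. It assumes the 12-byte layout commonly used for such timestamps (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, both little-endian) and it assumes a plain lexicographic comparison of the bytes; the comparison parquet-mr actually applies is not confirmed here. The helper names encode_int96 and decode_int96 are hypothetical and are not part of Impala or parquet-mr.

```python
import struct
from datetime import datetime, timedelta

# Julian day number of 1970-01-01.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
EPOCH = datetime(1970, 1, 1)


def encode_int96(ts: datetime) -> bytes:
    """Pack a naive datetime as nanos-of-day (int64 LE) + Julian day (int32 LE).

    The layout is an assumption made for this sketch.
    """
    delta = ts - EPOCH
    julian_day = JULIAN_DAY_OF_UNIX_EPOCH + delta.days
    nanos_of_day = (delta.seconds * 1_000_000 + delta.microseconds) * 1_000
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw: bytes) -> datetime:
    """Inverse of encode_int96 (microsecond precision)."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return EPOCH + timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos_of_day // 1_000,
    )


a = datetime(2017, 1, 1, 0, 0, 1)   # chronologically earliest
b = datetime(2017, 1, 2, 0, 0, 0)
c = datetime(2017, 1, 3, 12, 0, 0)  # chronologically latest
timestamps = [a, b, c]
encoded = [encode_int96(ts) for ts in timestamps]

# Stats as a byte-array column would compute them: lexicographic on bytes.
byte_min = decode_int96(min(encoded))
byte_max = decode_int96(max(encoded))

# Stats as a timestamp-aware writer would compute them.
chrono_min = min(timestamps)
chrono_max = max(timestamps)

print("byte-wise min/max:    ", byte_min, byte_max)
print("chronological min/max:", chrono_min, chrono_max)
```

Because the time-of-day bytes come first and are little-endian, the byte-wise ordering is driven by the low-order bytes of the nanosecond field rather than by the date, so a reader cannot safely use such a min/max pair to skip files for a timestamp range predicate.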
