Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 2:

It is really unfortunate that parquet-mr treats our timestamps as byte arrays: 
it makes the min/max stats mostly useless for pruning files. I think this is a 
bug in parquet-mr, since INT96 is in the spec 
(https://github.com/apache/parquet-format/blob/98c5e2b8575a809b09d996080428be730614d374/Encodings.md) 
and is being handled inconsistently with int32/int64. I would expect min/max 
of INT96 to be computed the same way as for int32/int64. Should we open an 
issue against Parquet for this, and one against Hive? Otherwise our timestamp 
stats will be of little use. In any case, we should clarify this before 
writing out our own incompatible stats.
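
To illustrate the concern, here is a minimal Python sketch of why a byte-wise 
min over INT96 timestamps can disagree with the chronological min. It assumes 
the INT96 timestamp layout is 8 little-endian bytes of nanoseconds within the 
day followed by 4 little-endian bytes of Julian day, and that the byte-array 
comparison is a plain lexicographic one over unsigned bytes; the exact 
comparison parquet-mr uses may differ. The helper names and the sample values 
are hypothetical.

```python
import struct


def encode_int96(julian_day, nanos_of_day):
    # Assumed layout: 8 bytes nanoseconds-of-day, then 4 bytes Julian day,
    # both little-endian, 12 bytes in total.
    return struct.pack("<qi", nanos_of_day, julian_day)


def decode_int96(raw):
    # Return (julian_day, nanos_of_day) so that tuple ordering is chronological.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


# Two hypothetical timestamps on consecutive days.
earlier = encode_int96(2457755, 2)  # first day, 2 ns after midnight
later = encode_int96(2457756, 1)    # next day, 1 ns after midnight

# Lexicographic comparison looks at the low byte of the nanoseconds first,
# so the later timestamp is picked as the minimum.
bytewise_min = min(earlier, later)

# Comparing the decoded (day, nanos) pairs gives the chronological minimum.
chronological_min = min(earlier, later, key=decode_int96)

print(bytewise_min == later, chronological_min == earlier)
```

With a byte-wise min/max, a reader cannot safely skip a file based on a 
timestamp predicate, which is why the stats end up being of little use for 
pruning.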

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: No
