Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................


Patch Set 1:

> That is really unfortunate that our timestamps are treated as byte
 > arrays by parquet-mr - it makes the min/max stats mostly useless
 > for pruning files. I feel like this is a bug in parquet-mr, since
 > INT96 is in the spec 
 > (https://github.com/apache/parquet-format/blob/98c5e2b8575a809b09d996080428be730614d374/Encodings.md)
 > and it's being treated inconsistently with int32/int64. Common
 > sense would dictate that min/max of int96 should be treated the
 > same as int32/int64. Seems like something we should open an issue
 > against Parquet for? And Hive? Otherwise our timestamp stats will
 > be pretty useless. In any case we should clarify this before
 > writing out our own incompatible stats.

I agree. In fact, this may be two separate bugs.

1) parquet-mr uses Binary internally to store INT96, and therefore uses 
BinaryStatistics for those values 
(https://github.com/Parquet/parquet-mr/blob/fa8957d7939b59e8d391fa17000b34e865de015d/parquet-column/src/main/java/parquet/column/statistics/Statistics.java#L61).
2) Hive also hands Timestamps over to parquet-mr as BINARY instead of using 
INT96. Currently this makes no difference, but once statistics support for 
INT96 is fixed in parquet-mr, Hive will need to catch up.
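
To illustrate why byte-array ordering makes these stats unhelpful for pruning, 
here is a minimal Python sketch. It assumes the INT96 timestamp layout is 8 
bytes of nanoseconds within the day followed by a 4-byte Julian day number, 
both little-endian, and it assumes binary stats are ordered by a plain 
lexicographic byte comparison. The helper names (encode_int96, logical_key) 
are made up for this example and are not taken from Impala or parquet-mr.

```python
import struct
from datetime import date

# Julian day number of 1970-01-01, used to convert calendar dates.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588


def encode_int96(day: date, nanos_of_day: int) -> bytes:
    """Hypothetical helper: pack a timestamp into the assumed 12-byte layout."""
    days_since_epoch = day.toordinal() - date(1970, 1, 1).toordinal()
    julian_day = days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH
    # "<" = little-endian, no padding; q = 8-byte nanos, i = 4-byte Julian day.
    return struct.pack("<qi", nanos_of_day, julian_day)


def logical_key(raw: bytes):
    """Hypothetical helper: sort key that orders by day first, then by nanos."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day, nanos_of_day)


# 2017-01-01 00:00:01 is logically earlier than 2017-01-02 00:00:00.
early = encode_int96(date(2017, 1, 1), 1_000_000_000)
late = encode_int96(date(2017, 1, 2), 0)

# Comparing raw bytes looks at the low-order nanos bytes first, so the later
# timestamp (nanos == 0) sorts before the earlier one.
bytewise_min = min(early, late)
logical_min = min(early, late, key=logical_key)

print("bytewise min is the later timestamp:", bytewise_min == late)
print("logical min is the earlier timestamp:", logical_min == early)
```

Under these assumptions the byte-wise min is the logically later value, so a 
reader that prunes on such min/max stats could skip files that contain 
matching rows.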

@Zoltan, should I go ahead and open an issue with each project to sort this 
out?

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Zoltan Ivanfi <zi+ger...@cloudera.com>
Gerrit-HasComments: No
