Lars Volker has posted comments on this change.

Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer
......................................................................
Patch Set 2:

> (1 comment)

Apologies for the delayed reply. Hive writes timestamps as 12 bytes in little-endian order. It then passes them to parquet-mr as a BINARY string, which means it hits PARQUET-251. This explains the odd min/max values I saw in my tests. Internally, parquet-mr orders BINARY values by byte comparison, so the recorded min/max may not be the semantically smallest/largest value in a set of values. I am inclined to call this a bug in Hive, but I'm curious to hear what you think about this.

-- 
To view, visit http://gerrit.cloudera.org:8080/5611
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Ivanfi <[email protected]>
Gerrit-HasComments: No
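To illustrate the byte-comparison problem described in the comment above, here is a minimal Python sketch. It is not taken from Hive or parquet-mr. It assumes a 12-byte layout of 8 little-endian bytes for nanoseconds-of-day followed by 4 little-endian bytes for the Julian day, and it uses Python's unsigned lexicographic `bytes` comparison as a stand-in for the byte-wise ordering. The helper names `encode_timestamp` and `semantic_key` and the sample values are hypothetical.

```python
import struct


def encode_timestamp(julian_day, nanos_of_day):
    # Hypothetical helper. Assumed layout: 8-byte little-endian
    # nanoseconds-of-day, then 4-byte little-endian Julian day (12 bytes).
    return struct.pack("<qi", nanos_of_day, julian_day)


def semantic_key(encoded):
    # Decode the 12 bytes and order by (day, nanoseconds), i.e. by time.
    nanos_of_day, julian_day = struct.unpack("<qi", encoded)
    return (julian_day, nanos_of_day)


# Two sample timestamps one day apart (values chosen for illustration).
earlier = encode_timestamp(2457754, 1)
later = encode_timestamp(2457755, 256)
values = [earlier, later]

# Byte-wise ordering looks at the least significant byte first, because
# the encoding is little-endian.
bytewise_min = min(values)

# Semantic ordering compares the decoded fields.
semantic_min = min(values, key=semantic_key)

print("byte-wise min is the later timestamp:", bytewise_min == later)
print("semantic min is the earlier timestamp:", semantic_min == earlier)
```

Under these assumptions, the byte-wise minimum is the later timestamp, because its first (least significant) byte is 0x00 while the earlier timestamp's first byte is 0x01. A reader that trusted such a min/max for filtering could skip data that actually matches.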
