Tim Armstrong has posted comments on this change. Change subject: IMPALA-3909: Populate min/max statistics in Parquet writer ......................................................................
Patch Set 2: (1 comment) It seems like we really need to talk through the big picture of what to do here, since there are so many unresolved problems in the parquet spec. It seems like min/max for int, float and bool is straightforward but all other cases are problematic. http://gerrit.cloudera.org:8080/#/c/5611/2/be/src/exec/parquet-column-stats.h File be/src/exec/parquet-column-stats.h: PS2, Line 35: DECIMAL > I added to the comment to clarify on this. No I don't think we need to support writing that encoding for decimals. Henry had a patch for reading that decimal encoding, so we may need to tread carefully when implementing the read path though. That all said, I think the Parquet/parquet-mr's decimal min/max stats are broken. It looks like they compute min/max for decimal based on their default binary ordering. This has two problems: * Ordering by unsigned bytes puts negative numbers ahead of positive numbers for twos-complement integers. We really need to . Maybe this is tolerable, although it probably makes it less effective (e.g. min: 0, max: -1 covers all possible values) * Parquet-mr does something even stranger by treating the bytes as signed: https://issues.apache.org/jira/browse/PARQUET-686 I think essentially parquet-mr's stats are broken in all cases where BinaryStatistics is used here: https://github.com/Parquet/parquet-mr/blob/master/parquet-column/src/main/java/parquet/column/statistics/Statistics.java#L64 -- To view, visit http://gerrit.cloudera.org:8080/5611 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8368ee58daa50c07a3b8ef65be70203eb941f619 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Ivanfi <[email protected]> Gerrit-HasComments: Yes
