[ https://issues.apache.org/jira/browse/PARQUET-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julien Le Dem resolved PARQUET-839.
-----------------------------------
    Resolution: Duplicate

> Min-max should be computed based on logical type
> ------------------------------------------------
>
>                 Key: PARQUET-839
>                 URL: https://issues.apache.org/jira/browse/PARQUET-839
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.3.1
>            Reporter: Tim Armstrong
>
> The min/max stats are currently underspecified - it is not clear in any case
> from the spec what the expected ordering is.
> There are some related issues, like PARQUET-686, that fix specific problems,
> but there seems to be a general assumption that the min/max should be defined
> based on the primitive type instead of the logical type.
> However, this makes the stats nearly useless for some logical types. E.g.
> consider a DECIMAL encoded into a (variable-length) BINARY. The min-max of
> the underlying binary type is based on the lexical order of the byte string,
> but that does not correspond to any reasonable ordering of the decimal
> values. E.g. 256 (0x01 0x00) will be ordered between 1 (0x01) and 2 (0x02).
> This makes min-max filtering a lot less effective and would force query
> engines using parquet to implement workarounds to produce correct results
> (e.g. custom comparators).

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
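
The ordering mismatch described in the issue can be reproduced with a few lines of Python. This is only an illustrative sketch, not code from Parquet or any query engine: it assumes the DECIMAL unscaled value is stored as big-endian two's-complement bytes, and the helper names `decode_unscaled` and `can_skip` are hypothetical.

```python
def decode_unscaled(raw: bytes) -> int:
    """Decode big-endian two's-complement bytes into the unscaled integer."""
    return int.from_bytes(raw, byteorder="big", signed=True)


def can_skip(stat_min: int, stat_max: int, target: int) -> bool:
    """Hypothetical min/max filter: skip the chunk if target is out of range."""
    return target < stat_min or target > stat_max


# Three variable-length BINARY values holding the unscaled decimals 1, 256, 2.
column = [b"\x01", b"\x01\x00", b"\x02"]

# Stats computed on the primitive type: Python compares bytes lexicographically.
byte_stats = (decode_unscaled(min(column)), decode_unscaled(max(column)))

# Stats computed on the logical type: compare the decoded decimal values.
logical_stats = (
    decode_unscaled(min(column, key=decode_unscaled)),
    decode_unscaled(max(column, key=decode_unscaled)),
)

# A reader that interprets the byte-ordered stats as decimal bounds would
# skip a chunk that actually contains 256.
wrongly_skipped = can_skip(*byte_stats, 256)
kept_with_logical_stats = not can_skip(*logical_stats, 256)

print("byte-ordered stats:   ", byte_stats)
print("logical-ordered stats:", logical_stats)
```

With byte-wise ordering the recorded maximum is the value 2, so a predicate looking for 256 falls outside the range unless the reader applies its own comparator, which is the kind of workaround the issue mentions.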