[ 
https://issues.apache.org/jira/browse/PARQUET-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PARQUET-839.
-----------------------------------
    Resolution: Duplicate

> Min-max should be computed based on logical type
> ------------------------------------------------
>
>                 Key: PARQUET-839
>                 URL: https://issues.apache.org/jira/browse/PARQUET-839
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>    Affects Versions: format-2.3.1
>            Reporter: Tim Armstrong
>
> The min/max stats are currently underspecified: the spec never makes clear 
> what the expected ordering is.
> There are some related issues, like PARQUET-686 to fix specific problems, but 
> there seems to be a general assumption that the min/max should be defined 
> based on the primitive type instead of the logical type.
> However, this makes the stats nearly useless for some logical types. E.g. 
> consider a DECIMAL encoded into a (variable-length) BINARY. The min-max of 
> the underlying binary type is based on the lexical order of the byte string, 
> but that does not correspond to any reasonable ordering of the decimal 
> values. E.g. 256 (0x01 0x00) will be ordered between 1 (0x01) and 2 (0x02). 
> This makes min-max filtering a lot less effective and would force query 
> engines using parquet to implement workarounds to produce correct results 
> (e.g. custom comparators).
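
The mismatch described above can be reproduced with a minimal Python sketch. It assumes the unscaled decimal value is stored as a big-endian two's-complement byte string, as the Parquet DECIMAL logical type specifies for BINARY; the helper name `encode_unscaled` and the sample values are hypothetical and not taken from any Parquet implementation.

```python
def encode_unscaled(value: int) -> bytes:
    """Encode an unscaled decimal value as big-endian two's complement.

    Hypothetical helper for illustration; the length leaves room for the
    sign bit and is not always the shortest possible encoding.
    """
    length = (value.bit_length() + 8) // 8
    return value.to_bytes(length, "big", signed=True)


values = [-1, 1, 2, 256]

# Ordering by the primitive type: Python compares bytes objects
# lexicographically, treating each byte as unsigned.
by_bytes = sorted(values, key=encode_unscaled)

# Ordering by the logical type: plain numeric comparison.
by_value = sorted(values)

print("byte order:   ", by_bytes)
print("numeric order:", by_value)
```

Under byte-wise comparison, 256 (0x01 0x00) lands between 1 and 2, and -1 (0xff) sorts after every positive value, so a min/max pair computed this way does not bound the decimal values in the column.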



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
