[ https://issues.apache.org/jira/browse/ARROW-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932399#comment-16932399 ]
Antoine Pitrou commented on ARROW-6149:
---------------------------------------

cc [~wesmckinn]

> [Parquet] Decimal comparisons used for min/max statistics are not correct
> -------------------------------------------------------------------------
>
>                 Key: ARROW-6149
>                 URL: https://issues.apache.org/jira/browse/ARROW-6149
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.14.1
>            Reporter: Philip Felton
>            Priority: Major
>             Fix For: 1.0.0
>
>
> The [Parquet Format specification|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md] says
> bq. If the column uses int32 or int64 physical types, then signed comparison of the integer values produces the correct ordering. If the physical type is fixed, then the correct ordering can be produced by flipping the most-significant bit in the first byte and then using unsigned byte-wise comparison.
> However, this isn't followed in the C++ Parquet code. 16-byte decimal comparison is implemented using a lexicographical comparison of signed chars.
> This appears to be because the function [https://github.com/apache/arrow/blob/master/cpp/src/parquet/statistics.cc#L183] just goes off the sort_order (signed) and physical_type (FIXED_LENGTH_BYTE_ARRAY); there is no override for decimal.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
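
For illustration, the ordering rule quoted from the Parquet format specification can be sketched as below. This is a minimal sketch, not the code in statistics.cc: the function names DecimalFlbaLess and SignedCharLess are hypothetical, and it assumes both values are big-endian two's-complement byte arrays of the same length. SignedCharLess mimics the signed-char lexicographic comparison described in the issue, for contrast only.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the Parquet C++ API.
// Returns true when a < b, treating both as big-endian two's-complement
// values of the same length `len`.
bool DecimalFlbaLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  if (len == 0) return false;
  // Flip the most-significant bit of the first byte so that negative
  // values order before non-negative ones under unsigned comparison.
  const uint8_t a0 = static_cast<uint8_t>(a[0] ^ 0x80);
  const uint8_t b0 = static_cast<uint8_t>(b[0] ^ 0x80);
  if (a0 != b0) return a0 < b0;
  // Remaining bytes: plain unsigned byte-wise comparison.
  for (std::size_t i = 1; i < len; ++i) {
    if (a[i] != b[i]) return a[i] < b[i];
  }
  return false;
}

// Hypothetical contrast case: lexicographic comparison of signed chars,
// as the issue describes. Every byte is treated as signed, not only the
// first one, so bytes >= 0x80 in later positions are ordered incorrectly.
bool SignedCharLess(const uint8_t* a, const uint8_t* b, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    const int8_t sa = static_cast<int8_t>(a[i]);
    const int8_t sb = static_cast<int8_t>(b[i]);
    if (sa != sb) return sa < sb;
  }
  return false;
}
```

With the two-byte values {0x00, 0x80} (128) and {0x00, 0x01} (1), the signed-char variant reports 128 < 1 because 0x80 reads as -128, while the variant following the specification orders them correctly.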