https://github.com/apache/parquet-format/pull/185 has been merged.
On Fri, Nov 4, 2022 at 9:54 PM Micah Kornfield
wrote:
> A new proposal for adding a logical annotation to support Float16 values
> [1] reopened the discussion on specifying how parquet should deal with
> edge cases for floating point types (PARQUET-1222 [2]).
>
> To try to resolve this, the consensus from the JIRA is to not specify an
> ordering when writing, but rather to only specify rules for reading data.
> The rules were already present in the parquet.thrift file [3]. They are:
>
>>
>> - If the min is a NaN, it should be ignored.
>> - If the max is a NaN, it should be ignored.
>> - If the min is +0, the row group may contain -0 values as well.
>> - If the max is -0, the row group may contain +0 values as well.
>> - When looking for NaN values, min and max should be ignored.
>
>
> I've created a PR [4] to update README.md in parquet-format that:
> 1. Specifies statistics should not be used when a column has an unknown
> logical type since correct comparisons cannot be performed.
> 2. Specifies the ordering for primitive types and references the
> parquet.thrift for the details on how to handle floating point values.
>
> Feedback and other ideas are welcome.
>
> Thanks,
> Micah
>
> [1] https://github.com/apache/parquet-format/pull/184
> [2] https://issues.apache.org/jira/browse/PARQUET-1222
> [3]
> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L897
> [4] https://github.com/apache/parquet-format/pull/185
>
>
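
For illustration only, here is a rough sketch of how a reader might apply
the five quoted rules when deciding whether a row group can be skipped for
an equality lookup on a floating point column. The function names
(usable_bounds, may_contain) are hypothetical and are not part of any
Parquet library; a missing statistic is modeled as None, which is also an
assumption.

```python
import math
from typing import Optional, Tuple


def usable_bounds(
    stat_min: Optional[float], stat_max: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Return (lower, upper) bounds that are safe to prune with.

    None means "no usable bound on this side".
    """
    # Rules 1 and 2: a NaN min or max carries no information, so ignore it.
    lower = None if stat_min is None or math.isnan(stat_min) else stat_min
    upper = None if stat_max is None or math.isnan(stat_max) else stat_max

    # Rule 3: a min of +0 may hide -0 values, so widen the lower bound to -0.
    if lower is not None and lower == 0.0 and math.copysign(1.0, lower) > 0:
        lower = -0.0
    # Rule 4: a max of -0 may hide +0 values, so widen the upper bound to +0.
    if upper is not None and upper == 0.0 and math.copysign(1.0, upper) < 0:
        upper = 0.0
    return lower, upper


def may_contain(
    value: float, stat_min: Optional[float], stat_max: Optional[float]
) -> bool:
    """Return False only when the statistics prove the value is absent."""
    # Rule 5: min and max say nothing about NaN, so never prune a NaN lookup.
    if math.isnan(value):
        return True
    lower, upper = usable_bounds(stat_min, stat_max)
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

Python's < and > already treat -0.0 and +0.0 as equal, so the widening in
usable_bounds matters mainly for readers that compare with a total order
(for example on the raw bit pattern), where -0 sorts strictly below +0.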