[jira] [Updated] (PARQUET-1222) Definition of float and double sort order is ambiguous
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Szadovszky updated PARQUET-1222:
--------------------------------------
    Fix Version/s:     (was: format-2.5.0)

> Definition of float and double sort order is ambiguous
> ------------------------------------------------------
>
>                 Key: PARQUET-1222
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1222
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Zoltan Ivanfi
>            Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers
> as follows:
> {code:java}
>    * FLOAT - signed comparison of the represented value
>    * DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a
> partial ordering with strange behaviour in specific corner cases. For
> example, according to IEEE 754, -0 is neither less nor more than +0 and
> comparing NaN to anything always returns false. This ordering is not suitable
> for statistics. Additionally, the Java implementation already uses a
> different (total) ordering that handles these cases correctly but differently
> than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new
> TotalFloatingPointOrder should be introduced. The default for writing doubles
> and floats would be the new TotalFloatingPointOrder. This ordering should be
> effective and easy to implement in all programming languages.
> For reading existing stats created using TypeDefinedOrder, the following
> compatibility rules should be applied:
> * When looking for NaN values, min and max should be ignored.
> * If the min is a NaN, it should be ignored.
> * If the max is a NaN, it should be ignored.
> * If the min is +0, the row group may contain -0 values as well.
> * If the max is -0, the row group may contain +0 values as well.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
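To make the corner cases in the issue description concrete, here is a minimal Python sketch. It first shows the IEEE 754 behaviour the report describes (-0 and +0 compare as equal, every ordered comparison with NaN is false), then one possible total order built by reinterpreting the bits of a double. The helper name `total_order_key` is hypothetical, and the ordering it produces is only an illustration in the spirit of the IEEE 754 totalOrder predicate; it is not necessarily what a TotalFloatingPointOrder in parquet-format would specify.

```python
import math
import struct

# A NaN with the sign bit cleared, so the demonstration does not depend on
# which NaN bit pattern the platform hands out for math.nan.
POS_NAN = math.copysign(math.nan, 1.0)


def total_order_key(x: float) -> int:
    """Map a double to an unsigned integer whose natural order is total.

    Hypothetical helper, for illustration only: negative values have all 64
    bits flipped, non-negative values only get the sign bit set. The result
    orders -NaN < -inf < ... < -0 < +0 < ... < +inf < +NaN.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):          # sign bit set: negative number, -0 or -NaN
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)       # sign bit clear: shift above all negatives


# IEEE 754 comparison is only a partial order:
zero_is_unordered = not (-0.0 < 0.0) and not (0.0 < -0.0)
nan_is_unordered = not (POS_NAN < 1.0) and not (POS_NAN > 1.0)

# With the key above, min/max over a column become well defined:
values = [1.5, POS_NAN, -0.0, 0.0, -math.inf, -2.0]
ordered = sorted(values, key=total_order_key)
```

Under this key, `ordered` places negative infinity first, -0 directly before +0, and the positive NaN last. Java's `Double.compare` similarly places -0 before +0 and NaN above positive infinity, which is why its statistics can differ from those of a writer that uses plain `<`.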
[jira] [Updated] (PARQUET-1222) Definition of float and double sort order is ambiguous
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem updated PARQUET-1222:
-----------------------------------
    Summary: Definition of float and double sort order is ambiguous  (was: Definition of float and double sort order is ambigious)

> Definition of float and double sort order is ambiguous
> ------------------------------------------------------
>
>                 Key: PARQUET-1222
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1222
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-format
>            Reporter: Zoltan Ivanfi
>            Priority: Critical
>             Fix For: format-2.5.0
>
> Currently parquet-format specifies the sort order for floating point numbers
> as follows:
> {code:java}
>    * FLOAT - signed comparison of the represented value
>    * DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a
> partial ordering with strange behaviour in specific corner cases. For
> example, according to IEEE 754, -0 is neither less nor more than +0 and
> comparing NaN to anything always returns false. This ordering is not suitable
> for statistics. Additionally, the Java implementation already uses a
> different (total) ordering that handles these cases correctly but differently
> than the C++ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new
> TotalFloatingPointOrder should be introduced. The default for writing doubles
> and floats would be the new TotalFloatingPointOrder. This ordering should be
> effective and easy to implement in all programming languages.
> For reading existing stats created using TypeDefinedOrder, the following
> compatibility rules should be applied:
> * When looking for NaN values, min and max should be ignored.
> * If the min is a NaN, it should be ignored.
> * If the max is a NaN, it should be ignored.
> * If the min is +0, the row group may contain -0 values as well.
> * If the max is -0, the row group may contain +0 values as well.
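The five compatibility rules for reading statistics written under TypeDefinedOrder can be sketched as a small row-group filter. This is a minimal Python illustration under stated assumptions: the function names `sanitize_legacy_bounds` and `row_group_may_contain` are hypothetical and do not come from any Parquet library, `None` stands for "no usable bound", and the check uses plain IEEE 754 comparison.

```python
import math
from typing import Optional, Tuple


def sanitize_legacy_bounds(
    min_v: Optional[float], max_v: Optional[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Apply the NaN and signed-zero rules to legacy min/max statistics."""
    # A NaN min or max carries no information, so it is ignored.
    lower = None if min_v is None or math.isnan(min_v) else min_v
    upper = None if max_v is None or math.isnan(max_v) else max_v
    # A zero min may hide -0 values, a zero max may hide +0 values, so any
    # zero bound is widened to the zero with the outermost sign.
    if lower is not None and lower == 0.0:
        lower = -0.0
    if upper is not None and upper == 0.0:
        upper = 0.0
    return lower, upper


def row_group_may_contain(
    value: float, min_v: Optional[float], max_v: Optional[float]
) -> bool:
    """Return False only when the statistics prove the value is absent."""
    if math.isnan(value):
        return True  # min and max say nothing about NaN values
    lower, upper = sanitize_legacy_bounds(min_v, max_v)
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True
```

Because IEEE 754 `<` already treats -0 and +0 as equal, the zero widening does not change the result of this particular check. It matters for a reader that compares the returned bounds with a total order that distinguishes the two zeros, as the issue says the Java implementation does.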