srowen commented on issue #26654: [SPARK-30009][CORE][SQL] Support different floating-point Ordering for Scala 2.12 / 2.13 URL: https://github.com/apache/spark/pull/26654#issuecomment-559238744 Some extra context. Here's the key difference between the two 2.13 orderings: ``` scala> Ordering.Double.TotalOrdering.compare(Double.NaN, Double.MaxValue) res5: Int = 1 scala> Ordering.Double.TotalOrdering.gt(Double.NaN, Double.MaxValue) res6: Boolean = true scala> Ordering.Double.IeeeOrdering.compare(Double.NaN, Double.MaxValue) res7: Int = 1 scala> Ordering.Double.IeeeOrdering.gt(Double.NaN, Double.MaxValue) res8: Boolean = false ``` The weird thing is `IeeeOrdering` tries to work like the `<`, `<=`, etc operators, but still uses `java.lang.Double.compare` for `compare`, which seems internally inconsistent. Scala 2.12's implementation of `Ordering.Double` looks the same as `IeeeOrdering`: ``` trait DoubleOrdering extends Ordering[Double] { outer => def compare(x: Double, y: Double) = java.lang.Double.compare(x, y) override def lteq(x: Double, y: Double): Boolean = x <= y override def gteq(x: Double, y: Double): Boolean = x >= y override def lt(x: Double, y: Double): Boolean = x < y override def gt(x: Double, y: Double): Boolean = x > y ... ``` `TotalOrdering` delegates all of that to the result of `java.lang.Double.compare`, which appears to already handle NaNs consistently by sorting them above all other non-NaN values. Actually, do we need `Utils.nanSafeCompareDoubles` then, if it's the same? So, back to the point: the question is whether this difference matters in the current build. The difference arises in a few places: - Double/FloatExactNumeric in numerics.scala used in Double/FloatType as their 'exactNumeric' implementation - Double/FloatType, which define an ordering based on Utils.nanSafeCompareXXX, but also use things like `DoubleAsIfIntegral` which inherits from `DoubleOrdering`. I don't know if these two occurrences' ordering actually matters, as on a glance, it looks like these objects do not use them for ordering. I don't know the details enough to figure whether there is a subtle issue there. I didn't immediately see a test which exercises something like sorting NaNs with non-NaN values, and am not sure how to write it to check code gen vs interpreted. But that would tell us one way or the other whether there is an issue. In any event, it seems like: - The choice of TotalOrdering in 2.13 is good as it's consistent with Double.compare, Utils.nanSafeCompareDoubles - We will have to see whether 2.13 tests pass later! and revisit if not - We might even be able to remove nanSafeCompareDoubles for simplicity, but not important
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
