[GitHub] [spark] srowen commented on issue #26654: [SPARK-30009][CORE][SQL] Support different floating-point Ordering for Scala 2.12 / 2.13

GitBox Wed, 27 Nov 2019 12:22:59 -0800

srowen commented on issue #26654: [SPARK-30009][CORE][SQL] Support different 
floating-point Ordering for Scala 2.12 / 2.13
URL: https://github.com/apache/spark/pull/26654#issuecomment-559238744
 
 
   Some extra context. Here's the key difference between the two 2.13 orderings:
   
   ```
   scala> Ordering.Double.TotalOrdering.compare(Double.NaN, Double.MaxValue)
   res5: Int = 1
   scala> Ordering.Double.TotalOrdering.gt(Double.NaN, Double.MaxValue)
   res6: Boolean = true
   scala> Ordering.Double.IeeeOrdering.compare(Double.NaN, Double.MaxValue)
   res7: Int = 1
   scala> Ordering.Double.IeeeOrdering.gt(Double.NaN, Double.MaxValue)
   res8: Boolean = false
   ```
   
   The weird thing is `IeeeOrdering` tries to work like the `<`, `<=`, etc 
operators, but still uses `java.lang.Double.compare` for `compare`, which seems 
internally inconsistent. 
   
   Scala 2.12's implementation of `Ordering.Double` looks the same as 
`IeeeOrdering`:
   
   ```
     trait DoubleOrdering extends Ordering[Double] {
       outer =>
   
       def compare(x: Double, y: Double) = java.lang.Double.compare(x, y)
   
       override def lteq(x: Double, y: Double): Boolean = x <= y
       override def gteq(x: Double, y: Double): Boolean = x >= y
       override def lt(x: Double, y: Double): Boolean = x < y
       override def gt(x: Double, y: Double): Boolean = x > y
      ...
   ```
   
   `TotalOrdering` delegates all of that to the result of 
`java.lang.Double.compare`, which appears to already handle NaNs consistently 
by sorting them above all other non-NaN values.
   
   Actually, do we need `Utils.nanSafeCompareDoubles` then, if it's the same?
   
   So, back to the point: the question is whether this difference matters in 
the current build. The difference arises in a few places:
   
   - Double/FloatExactNumeric in numerics.scala used in Double/FloatType as 
their 'exactNumeric' implementation
   - Double/FloatType, which define an ordering based on 
Utils.nanSafeCompareXXX, but also use things like `DoubleAsIfIntegral` which 
inherits from `DoubleOrdering`.
   
   I don't know if these two occurrences' ordering actually matters, as on a 
glance, it looks like these objects do not use them for ordering. I don't know 
the details enough to figure whether there is a subtle issue there.
   
   I didn't immediately see a test which exercises something like sorting NaNs 
with non-NaN values, and am not sure how to write it to check code gen vs 
interpreted. But that would tell us one way or the other whether there is an 
issue.
   
   In any event, it seems like:
   - The choice of TotalOrdering in 2.13 is good as it's consistent with 
Double.compare, Utils.nanSafeCompareDoubles
   - We will have to see whether 2.13 tests pass later! and revisit if not
   - We might even be able to remove nanSafeCompareDoubles for simplicity, but 
not important


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] srowen commented on issue #26654: [SPARK-30009][CORE][SQL] Support different floating-point Ordering for Scala 2.12 / 2.13

Reply via email to