GitHub user bavardage commented on the issue:

    https://github.com/apache/spark/pull/21794
  
    It does seem that Spark currently does distinguish -0.0 and 0.0, at least as far as group-bys go:
    
    ```
    scala> case class Thing(x : Float)
    defined class Thing
    
    scala> val df = Seq(Thing(0.0f), Thing(-0.0f), Thing(0.0f), Thing(-0.0f), Thing(0.0f), Thing(-0.0f), Thing(0.0f), Thing(-0.0f)).toDF
    df: org.apache.spark.sql.DataFrame = [x: float]
    
    scala> df.groupBy($"x").count
    res0: org.apache.spark.sql.DataFrame = [x: float, count: bigint]
    
    scala> res0.collect
    res1: Array[org.apache.spark.sql.Row] = Array([-0.0,4], [0.0,4])
    ```
    
    Doubles are hashed via `doubleToLongBits` (https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L338), and floats analogously via `floatToIntBits`, which yield different bitwise representations for positive and negative zero, hence the two separate groups above.
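    For reference, a minimal sketch in plain Scala (outside Spark; the names and comments are mine, not from the codebase) showing the two bit patterns:
    
    ```
    // doubleToLongBits preserves the sign bit of negative zero, so the raw
    // bits fed into the hash differ even though the values compare equal.
    val posBits = java.lang.Double.doubleToLongBits(0.0)   // 0L
    val negBits = java.lang.Double.doubleToLongBits(-0.0)  // 0x8000000000000000L (sign bit set)
    println(f"+0.0 bits: $posBits%016x")  // 0000000000000000
    println(f"-0.0 bits: $negBits%016x")  // 8000000000000000
    println(0.0 == -0.0)                  // true: IEEE 754 comparison treats them as equal
    println(posBits == negBits)           // false: the hash inputs differ
    ```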

