Github user adoron commented on the issue:

    https://github.com/apache/spark/pull/23043

@cloud-fan that's what I thought at first as well, but the flow doesn't go through that code. Running `Seq(0.0d, 0.0d, -0.0d).toDF("i").groupBy("i").count().collect()` with a breakpoint confirms it. The reason -0.0 and 0.0 end up in different buckets of the group-by is in `UnsafeFixedWidthAggregationMap::getAggregationBufferFromUnsafeRow()`:

```
public UnsafeRow getAggregationBufferFromUnsafeRow(UnsafeRow key) {
  // The grouping key is hashed as raw bytes, with no per-column handling.
  return getAggregationBufferFromUnsafeRow(key, key.hashCode());
}
```

The hashing is done on the `UnsafeRow`, and by that point the whole row is hashed as a single unit of bytes, so it is hard to locate the double columns and normalize their values.
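For illustration, here is a minimal standalone sketch (not Spark code) of why any hash computed over raw bytes separates the two zeros: 0.0 and -0.0 compare equal as doubles, but their IEEE 754 bit patterns differ in the sign bit, so byte-level hashing sees two distinct keys.

```
public class NegativeZeroBits {
    public static void main(String[] args) {
        double pos = 0.0d;
        double neg = -0.0d;

        // IEEE 754 equality treats the two zeros as equal...
        System.out.println(pos == neg);  // true

        // ...but their raw bit patterns differ in the sign bit,
        // which is what a byte-level hash of the row would see.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(pos)));  // 0
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(neg)));  // 8000000000000000
    }
}
```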