cloud-fan commented on a change in pull request #23388: [SPARK-26448][SQL]
retain the difference between 0.0 and -0.0
URL: https://github.com/apache/spark/pull/23388#discussion_r245946612
##########
File path: docs/sql-migration-guide-upgrade.md
##########
@@ -25,7 +25,7 @@ displayTitle: Spark SQL Upgrading Guide
- In Spark version 2.4 and earlier, `Dataset.groupByKey` results in a
grouped dataset whose key attribute is wrongly named "value" if the key is of a
non-struct type, e.g. int, string, array, etc. This is counterintuitive and
makes the schema of aggregation queries confusing. For example, the schema of
`ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the
grouping attribute "key". The old behaviour is preserved under a newly added
configuration `spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue` with a
default value of `false`.
 - - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal
to 0.0, but users can still distinguish them via `Dataset.show`,
`Dataset.collect`, etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0
internally, and users can't distinguish them anymore.
 + - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal
to 0.0, but -0.0 and 0.0 are considered different values when used in
aggregate grouping keys, window partition keys, and join keys. Since Spark 3.0,
this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()`
returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and
earlier.
Review comment:
I think we only need to mention the difference between new and old versions.
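For reference, below is a minimal, self-contained sketch of the example quoted in the new migration note. The `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` call is taken from the diff above; the surrounding object name, app name, and `local[*]` master are illustrative assumptions, not part of the PR.

```scala
import org.apache.spark.sql.SparkSession

object NegativeZeroGroupingExample {
  def main(args: Array[String]): Unit = {
    // Illustrative local session; any SparkSession works.
    val spark = SparkSession.builder()
      .appName("negative-zero-grouping")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Spark 2.4 and earlier: -0.0 and 0.0 fall into different groups,
    // so this shows two rows, (0.0, 1) and (-0.0, 1).
    // Since Spark 3.0 (with this fix): both values group together,
    // producing a single row, (0.0, 2).
    Seq(-0.0, 0.0).toDF("d").groupBy("d").count().show()

    spark.stop()
  }
}
```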