GitHub user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/23054#discussion_r234475488
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide
- The `ADD JAR` command previously returned a result set with the single
value 0. It now returns an empty result set.
+ - In Spark version 2.4 and earlier, `Dataset.groupByKey` produces a
grouped dataset whose key attribute is wrongly named "value" if the key is of
an atomic type, e.g. int, string, etc. This is counterintuitive and makes the
schema of aggregation queries confusing. For example, the schema of
`ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, the
grouping attribute is named "key". The old behaviour is preserved under a newly added
configuration `spark.sql.legacy.atomicKeyAttributeGroupByKey` with a default
value of `false`.
--- End diff --
Ok. More accurate.
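
For reference, a minimal sketch of the case the note describes, so the schema can be checked directly. It assumes a local `SparkSession`; the object name, app name, and sample data are hypothetical and only for illustration. No output is shown because the name of the first column depends on the Spark version and on the legacy configuration quoted in the diff.

```scala
import org.apache.spark.sql.SparkSession

object GroupByKeySchemaSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, not a production setup.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupByKey-schema-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val words = Seq("spark", "sql", "guide").toDS()

    // The key is an Int, i.e. an atomic type, which is the case the
    // migration note covers. Per the note, the grouping column is named
    // "value" in Spark 2.4 and earlier and "key" since Spark 3.0.
    val counts = words.groupByKey(_.length).count()
    counts.printSchema()

    spark.stop()
  }
}
```

Running this against both a 2.4 and a 3.0 build would show whether the wording in the note matches the observed schema.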