[GitHub] spark pull request #23054: [SPARK-26085][SQL] Key attribute of primitive typ...

cloud-fan Sun, 18 Nov 2018 17:36:01 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23054#discussion_r234475156
  
    --- Diff: docs/sql-migration-guide-upgrade.md ---
    @@ -17,6 +17,9 @@ displayTitle: Spark SQL Upgrading Guide
     
       - The `ADD JAR` command previously returned a result set with the single 
value 0. It now returns an empty result set.
     
    +  - In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a 
grouped dataset with key attribute wrongly named as "value", if the key is 
atomic type, e.g. int, string, etc. This is counterintuitive and makes the 
schema of aggregation queries weird. For example, the schema of 
`ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the 
grouping attribute to "key". The old behaviour is preserved under a newly added 
configuration `spark.sql.legacy.atomicKeyAttributeGroupByKey` with a default 
value of `false`.
    --- End diff --
    
    I realized that only struct-type keys get the `key` alias. So here we 
should say: `if the key is non-struct type, e.g. int, string, array, etc.`
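
A minimal sketch for checking which keys are affected, assuming a local Spark session. The object name `GroupByKeySchema` and the sample data are made up for illustration and are not part of the PR. It prints the schema of `groupByKey(...).count()` for an int key, an array key, and a tuple (struct) key, so the name of the grouping column can be compared across Spark versions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object, used only for illustration.
object GroupByKeySchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("groupByKey-schema")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val ds = Seq("a", "bb", "cc").toDS()

    // Non-struct keys: an int and an array.
    ds.groupByKey(_.length).count().printSchema()
    ds.groupByKey(s => Seq(s)).count().printSchema()

    // Struct key: a tuple is encoded as a struct.
    ds.groupByKey(s => (s.length, s)).count().printSchema()

    spark.stop()
  }
}
```

The legacy flag in the diff, `spark.sql.legacy.atomicKeyAttributeGroupByKey`, is the name proposed in this revision of the PR; the name in a released Spark version may differ, so the sketch does not set it.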
