Github user MaxGekk commented on a diff in the pull request:
https://github.com/apache/spark/pull/22030#discussion_r208620098
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql](
*
* {{{
* // Compute the sum of earnings for each year by course with each course as a separate column
- * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
+ * df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings")
+ * }}}
+ *
+ * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
+ *
+ * {{{
+ * df
+ * .groupBy($"year")
+ * .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts"))))
+ * .agg(sum($"earnings"))
* }}}
*
* @param pivotColumn the column to pivot.
* @param values List of values that will be translated to columns in the output DataFrame.
* @since 2.4.0
*/
- def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = {
+ def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = {
--- End diff --
> My question was that it's from your speculation or actual feedback from users...
This is actual feedback from our users who want to pivot by multiple columns. For now they have to use external systems for this (even Microsoft Excel does it better), because Spark doesn't allow it. For example, you cannot express this on the latest release:
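```
import org.apache.spark.sql.functions.{lit, lower, struct, sum}
import spark.implicits._ // for the $"..." syntax; assumes a SparkSession named `spark`

trainingSales
  .groupBy($"sales.year")
  .pivot(
    struct(lower($"sales.course"), $"training"),
    Seq(
      struct(lit("dotnet"), lit("Experts")),
      struct(lit("java"), lit("Dummies"))))
  .agg(sum($"sales.earnings"))
```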
via `def pivot(pivotColumn: String, values: Seq[Any])`. I am not talking about the recently added method `def pivot(pivotColumn: Column, values: Seq[Any])`, which we are going to make more concise and free of the unnecessary generic type `Any`.