Github user maryannxue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22030#discussion_r208453178
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql](
 *
 * {{{
 * // Compute the sum of earnings for each year by course with each course as a separate column
- * df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
+ * df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings")
+ * }}}
+ *
+ * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
+ *
+ * {{{
+ * df
+ *   .groupBy($"year")
+ *   .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts"))))
+ *   .agg(sum($"earnings"))
 * }}}
 *
 * @param pivotColumn the column to pivot.
 * @param values List of values that will be translated to columns in the output DataFrame.
 * @since 2.4.0
 */
- def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = {
+ def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = {
--- End diff ---
The fundamental interface we should have is `pivot(Column, Seq[Column])`, which
allows any form and any type of pivot column, and likewise for the pivot values.
This is close to what we support in SQL (in fact, SQL pivot support will be a
subset of DataFrame pivot support once we have this interface), and verifying
that the pivot values are constant is taken care of in the Analyzer.
That said, we still need to keep the old `pivot(String, Seq[Any])` for simple
usage and for backward compatibility, but I don't think we need to expand its
capability. It is pretty clear that `pivot(String, ...)` takes a column name and
simple objects, while with `pivot(Column, ...)` you can make any sophisticated
use of pivot you would like.
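To make the distinction concrete, here is a sketch of how the two overloads might be called side by side. The column names (`year`, `course`, `training`, `earnings`) are taken from the doc comment in the diff; the sample data and the local `SparkSession` setup are illustrative assumptions, and the `pivot(Column, Seq[Column])` overload only exists once this PR lands:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, struct, sum}

object PivotOverloadsSketch {
  def main(args: Array[String]): Unit = {
    // Assumed local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data matching the columns in the doc comment.
    val df = Seq(
      (2012, "dotNET", "Experts", 10000),
      (2012, "Java",   "Experts", 20000),
      (2013, "Java",   "Novices", 30000)
    ).toDF("year", "course", "training", "earnings")

    // Old, simple overload: a column name plus plain values.
    df.groupBy($"year").pivot("course", Seq("dotNET", "Java")).sum("earnings")

    // New Column-based overload (added by this PR): any expression can be
    // the pivot column, so multi-column pivots work via struct().
    df.groupBy($"year")
      .pivot(struct($"course", $"training"), Seq(struct(lit("Java"), lit("Experts"))))
      .agg(sum($"earnings"))

    spark.stop()
  }
}
```

The second call shows why the `Seq[Column]` signature matters: `struct(lit("Java"), lit("Experts"))` has no natural encoding as a simple `Any` value, whereas as a `Column` it is just another constant expression the Analyzer can validate.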
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]