GitHub user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22030#discussion_r208421457
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql](
        *
        * {{{
        *   // Compute the sum of earnings for each year by course with each course as a separate column
    -   *   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
    +   *   df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings")
    +   * }}}
    +   *
    +   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
    +   *
    +   * {{{
    +   *   df
    +   *     .groupBy($"year")
    +   *     .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts"))))
    +   *     .agg(sum($"earnings"))
        * }}}
        *
        * @param pivotColumn the column to pivot.
        * @param values List of values that will be translated to columns in the output DataFrame.
        * @since 2.4.0
        */
    -  def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = {
    +  def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = {
    --- End diff --
    
    Hm, wouldn't it be better to keep `Seq[Any]` here, so that `Column` values are allowed in both `pivot(String ...)` and `pivot(Column ...)`, since `pivot(String ...)`'s signature already allows it?
    
    BTW, we should document this in the `@param` and describe the difference clearly in the documentation. Otherwise, the current API change seems likely to make the usage quite confusing.
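
    To make the `Seq[Any]` suggestion concrete, here is a rough sketch of how mixed values could be normalized before pivoting. `toPivotColumns` is a hypothetical helper, not something in Spark or in this PR; it only assumes `org.apache.spark.sql.functions.lit` for wrapping plain values.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.lit

// Hypothetical helper (not part of Spark): accepts plain Scala values,
// Columns, or a mix of both, and returns Columns only.
def toPivotColumns(values: Seq[Any]): Seq[Column] =
  values.map {
    case c: Column => c      // already a Column, e.g. lit(...) or struct(...)
    case v         => lit(v) // plain value such as "dotNET" becomes a literal Column
  }
```

    With something along these lines, `Seq("dotNET", "Java")` and `Seq(lit("dotNET"), lit("Java"))` would be treated the same way, so both `pivot` overloads could keep a `Seq[Any]` signature.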


