GitHub user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22030#discussion_r208620098
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql](
        *
        * {{{
        *   // Compute the sum of earnings for each year by course with each course as a separate column
    -   *   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
    +   *   df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings")
    +   * }}}
    +   *
    +   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
    +   *
    +   * {{{
    +   *   df
    +   *     .groupBy($"year")
    +   *     .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts"))))
    +   *     .agg(sum($"earnings"))
        * }}}
        *
        * @param pivotColumn the column to pivot.
        * @param values List of values that will be translated to columns in the output DataFrame.
        * @since 2.4.0
        */
    -  def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = {
    +  def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = {
    --- End diff --
    
    > My question was that it's from your speculation or actual feedback from 
users...
    
    This is actual feedback from our users who want to pivot by multiple
columns. For now they have to use external systems (even Microsoft Excel does
it better) to pivot by many columns, because Spark doesn't allow it. For
example, you cannot express this on the latest release:
    ```
    trainingSales
      .groupBy($"sales.year")
      .pivot(
        struct(lower($"sales.course"), $"training"),
        Seq(
          struct(lit("dotnet"), lit("Experts")),
          struct(lit("java"), lit("Dummies"))))
      .agg(sum($"sales.earnings"))
    ```
    
    via `def pivot(pivotColumn: String, values: Seq[Any])`. I am not talking
about the recently added method `def pivot(pivotColumn: Column, values:
Seq[Any])`, which we are going to make more concise by eliminating the
unnecessary generic type `Any`.
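    
    To make the request concrete, here is a rough sketch in plain Scala
collections (no Spark involved) of what pivoting by the pair
(lower(course), training) is expected to compute. The `Sale` case class, the
`MultiColumnPivotSketch` object and the sample rows are made up for
illustration; this is not how Spark implements `pivot`.
    
    ```scala
    // Plain Scala collections only; no Spark involved.
    // `Sale` and the sample rows below are hypothetical illustration data.
    final case class Sale(year: Int, course: String, training: String, earnings: Double)
    
    object MultiColumnPivotSketch {
      // Pivot `rows` by the composite key (lower(course), training).
      // Returns year -> (pivot value -> summed earnings). A pivot value with no
      // matching rows is simply absent from the inner map.
      def pivot(
          rows: Seq[Sale],
          pivotValues: Seq[(String, String)]): Map[Int, Map[(String, String), Double]] =
        rows.groupBy(_.year).map { case (year, group) =>
          val sums = pivotValues.flatMap { key =>
            val matching = group.filter(s => (s.course.toLowerCase, s.training) == key)
            if (matching.isEmpty) None else Some(key -> matching.map(_.earnings).sum)
          }.toMap
          year -> sums
        }
    
      def main(args: Array[String]): Unit = {
        val rows = Seq(
          Sale(2012, "dotNET", "Experts", 10000.0),
          Sale(2012, "dotNET", "Experts", 5000.0),
          Sale(2012, "Java", "Dummies", 20000.0),
          Sale(2013, "Java", "Dummies", 30000.0))
        // One output "column" per requested (course, training) pair.
        val result = pivot(rows, Seq(("dotnet", "Experts"), ("java", "Dummies")))
        result.toSeq.sortBy(_._1).foreach(println)
      }
    }
    ```
    
    Each requested pair plays the role of one `struct(lit(...), lit(...))`
value in the Spark snippet above, and each key of the inner map corresponds to
one output column.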

