Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22030#discussion_r208469262
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -403,20 +415,29 @@ class RelationalGroupedDataset protected[sql](
    *
    * {{{
    *   // Compute the sum of earnings for each year by course with each course as a separate column
-   *   df.groupBy($"year").pivot($"course", Seq("dotNET", "Java")).sum($"earnings")
+   *   df.groupBy($"year").pivot($"course", Seq(lit("dotNET"), lit("Java"))).sum($"earnings")
+   * }}}
+   *
+   * For pivoting by multiple columns, use the `struct` function to combine the columns and values:
+   *
+   * {{{
+   *   df
+   *     .groupBy($"year")
+   *     .pivot(struct($"course", $"training"), Seq(struct(lit("java"), lit("Experts"))))
+   *     .agg(sum($"earnings"))
    * }}}
    *
    * @param pivotColumn the column to pivot.
    * @param values List of values that will be translated to columns in the output DataFrame.
    * @since 2.4.0
    */
-  def pivot(pivotColumn: Column, values: Seq[Any]): RelationalGroupedDataset = {
+  def pivot(pivotColumn: Column, values: Seq[Column]): RelationalGroupedDataset = {
--- End diff ---
I think https://github.com/apache/spark/pull/22030#discussion_r208456164 makes perfect sense. We really don't need to make it complicated.

> having an explicit Seq[Column] type is less confusing and kind of tells people by itself that we now support complex types in pivot values.

My question was whether that's your speculation or actual feedback from users, since the original interface has existed for a few years and I haven't seen complaints about this so far, as far as I can tell.

It's okay if we clearly document this with some examples. It wouldn't necessarily make much of a difference between the same overloaded APIs.
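To sketch the point being argued, here is a minimal, self-contained stand-in (not Spark code; `Column`, `lit`, and `PivotSketch` are all hypothetical names) showing how a single `Seq[Any]` signature can already accept both plain literals and `Column` values, so long as the behavior is documented:

```scala
// Hypothetical stand-ins for illustration only; not the real Spark types.
case class Column(expr: String)
def lit(v: Any): Column = Column(s"lit($v)")

object PivotSketch {
  // One signature with Seq[Any]: literals and Columns are both accepted,
  // with Columns recognized by a runtime match (assumed logic, not Spark's).
  def pivot(pivotColumn: Column, values: Seq[Any]): Seq[String] =
    values.map {
      case c: Column => c.expr      // already a column expression
      case v         => lit(v).expr // wrap plain literals
    }
}

// Both call styles go through the same Seq[Any] signature:
val fromLiterals = PivotSketch.pivot(Column("course"), Seq("dotNET", "Java"))
val fromColumns  = PivotSketch.pivot(Column("course"), Seq(lit("dotNET"), lit("Java")))
```

Under this sketch the two calls produce identical results, which is the argument for keeping the `Seq[Any]` overload and clarifying it with documentation rather than adding a `Seq[Column]` variant.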
---