[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...

HyukjinKwon Wed, 08 Aug 2018 00:25:52 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21699#discussion_r208479881
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -371,25 +426,14 @@ class RelationalGroupedDataset protected[sql](
     
       /**
        * (Java-specific) Pivots a column of the current `DataFrame` and 
performs the specified
    -   * aggregation.
    -   *
    -   * There are two versions of pivot function: one that requires the 
caller to specify the list
    -   * of distinct values to pivot on, and one that does not. The latter is 
more concise but less
    -   * efficient, because Spark needs to first compute the list of distinct 
values internally.
    +   * aggregation. This is an overloaded version of the `pivot` method with 
`pivotColumn` of
    +   * the `String` type.
        *
    -   * {{{
    -   *   // Compute the sum of earnings for each year by course with each 
course as a separate column
    -   *   df.groupBy("year").pivot("course", Arrays.<Object>asList("dotNET", 
"Java")).sum("earnings");
    -   *
    -   *   // Or without specifying column values (less efficient)
    -   *   df.groupBy("year").pivot("course").sum("earnings");
    -   * }}}
    -   *
    -   * @param pivotColumn Name of the column to pivot.
    +   * @param pivotColumn the column to pivot.
        * @param values List of values that will be translated to columns in 
the output DataFrame.
    -   * @since 1.6.0
    +   * @since 2.4.0
        */
    -  def pivot(pivotColumn: String, values: java.util.List[Any]): 
RelationalGroupedDataset = {
    +  def pivot(pivotColumn: Column, values: java.util.List[Any]): 
RelationalGroupedDataset = {
    --- End diff --
    
    If there's a plan for auditing it in 3.0.0, I am okay with going ahead with 
`Column` but thing is, we should deprecate them first.
    
    For the current status, I think the problem here is, this is an overloaded 
version of pivot and wouldn't necessarily make the differences.
    
    I used `pivot` heavily in the previous company before and I am pretty sure 
`pivot(String, actual values)` has been a common pattern and usage so far.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21699: [SPARK-24722][SQL] pivot() with Column type argum...

Reply via email to