[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

icexelloss Tue, 10 Oct 2017 11:22:48 -0700

Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18732#discussion_r143812311
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
    @@ -435,6 +435,35 @@ class RelationalGroupedDataset protected[sql](
               df.logicalPlan.output,
               df.logicalPlan))
       }
    +
    +  /**
    +   * Applies a vectorized python user-defined function to each group of data.
    +   * The user-defined function defines a transformation: `Pandas.DataFrame` -> `Pandas.DataFrame`.
    +   * For each group, all elements in the group are passed as a `Pandas.DataFrame` and the results
    +   * for all groups are combined into a new `DataFrame`.
    +   *
    +   * This function does not support partial aggregation, and requires shuffling all the data in
    +   * the `DataFrame`.
    --- End diff --
    
    Fixed.
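
    For readers following the doc comment quoted above, here is a minimal
    sketch of the semantics it describes, written in plain pandas rather than
    Spark. It is not Spark's implementation. The helper names `grouped_apply`
    and `subtract_group_mean` and the sample columns `id` and `v` are
    hypothetical, chosen only for illustration.

```python
import pandas as pd


def subtract_group_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical user-defined function: pandas.DataFrame -> pandas.DataFrame.
    # It needs the whole group at once to compute the group mean.
    return pdf.assign(v=pdf["v"] - pdf["v"].mean())


def grouped_apply(df: pd.DataFrame, key: str, func) -> pd.DataFrame:
    # Hypothetical helper that mimics the described behaviour locally:
    # every row of a group is handed to func as one DataFrame, and the
    # per-group results are concatenated into a single new DataFrame.
    pieces = [func(group) for _, group in df.groupby(key, sort=True)]
    return pd.concat(pieces, ignore_index=True)


df = pd.DataFrame({"id": [1, 1, 2, 2, 2], "v": [1.0, 2.0, 3.0, 5.0, 10.0]})
result = grouped_apply(df, "id", subtract_group_mean)
print(result)
```

    Because the function sees each group as a whole, a distributed engine
    cannot combine partial per-partition results; all rows sharing a key have
    to be brought together first, which is the shuffle the doc comment
    mentions.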



