Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21887#discussion_r205644627
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1804,6 +1804,25 @@ The following example shows how to use `groupby().apply()` to subtract the mean
     For detailed usage, please see [`pyspark.sql.functions.pandas_udf`](api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf) and
     [`pyspark.sql.GroupedData.apply`](api/python/pyspark.sql.html#pyspark.sql.GroupedData.apply).
     
    +### Grouped Aggregate
    +
    +Grouped aggregate Pandas UDFs are similar to Spark aggregate functions. Grouped aggregate Pandas UDFs are used with groupBy and
    +window operations. It defines an aggregation from one or more `pandas.Series`
    +to a scalar value, where the `pandas.Series` represents values for a column within the same group or window.
    +
    +Note that this type of UDF doesn't not support partial aggregation and all data for a group or window will be loaded into memory. Also,
    --- End diff ---
    
    Seems a typo: `doesn't not` (BTW, I usually avoid contractions in documentation, though).
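    
    For readers of this thread, a minimal sketch of the kind of grouped aggregate Pandas UDF the quoted paragraph describes (assuming the Spark 2.4+ `PandasUDFType.GROUPED_AGG` API; the toy DataFrame and the name `mean_udf` are illustrative, not taken from the PR):
    
    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf, PandasUDFType
    
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)], ("id", "v"))
    
    # Illustrative grouped aggregate UDF: it receives all values of column `v`
    # for one group as a pandas.Series and returns a single scalar.
    @pandas_udf("double", PandasUDFType.GROUPED_AGG)
    def mean_udf(v):
        return v.mean()
    
    # No partial aggregation: each group's data is loaded into memory,
    # as the quoted note says.
    df.groupby("id").agg(mean_udf(df["v"])).show()
    ```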


---
