[GitHub] spark pull request #19872: [SPARK-22274][PYTHON][SQL] User-defined aggregati...

cloud-fan Wed, 17 Jan 2018 18:28:08 -0800

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r162235197
  
    --- Diff: python/pyspark/sql/group.py ---
    @@ -82,6 +91,13 @@ def agg(self, *exprs):
             >>> from pyspark.sql import functions as F
             >>> sorted(gdf.agg(F.min(df.age)).collect())
             [Row(name=u'Alice', min(age)=2), Row(name=u'Bob', min(age)=5)]
    +
    +        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +        >>> @pandas_udf('int', PandasUDFType.GROUP_AGG)
    +        ... def min_udf(v):
    +        ...     return v.min()
    +        >>> sorted(gdf.agg(min_udf(df.age)).collect())  # doctest: +SKIP
    --- End diff --
    
    I think in the future we should make pandas/arrow a requirement of pyspark,
so that we can always assume pandas/arrow is installed when running the doc tests.
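
Until pandas/arrow is a hard requirement, an alternative to an unconditional `# doctest: +SKIP` is to skip the examples only when the optional packages are missing. The following is a minimal sketch of that idea using only the standard library, not what PySpark's test runner does; `have_modules`, `min_of`, and `run_docstring` are hypothetical names introduced for illustration, and `min_of` is a plain-pandas stand-in for the `min_udf` example in the diff.

```python
import doctest
import importlib.util


def have_modules(names):
    """Return True only if every named top-level package can be located."""
    return all(importlib.util.find_spec(name) is not None for name in names)


def min_of(values):
    """Hypothetical stand-in for the pandas-based doctest in the diff.

    >>> import pandas as pd
    >>> int(min_of(pd.Series([2, 5])))
    2
    """
    return values.min()


def run_docstring(func, required=("pandas", "pyarrow")):
    """Run func's doctest, skipping every example if a required package is missing."""
    parser = doctest.DocTestParser()
    # Expose the function under its own name so the docstring can call it.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    if not have_modules(required):
        # Same effect as writing "# doctest: +SKIP" on each example.
        for example in test.examples:
            example.options[doctest.SKIP] = True
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(run_docstring(min_of))
```

With this approach the example still runs and is checked wherever pandas and pyarrow are installed, and is skipped elsewhere instead of failing.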



---

