GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19872#discussion_r162235197

    --- Diff: python/pyspark/sql/group.py ---
    @@ -82,6 +91,13 @@ def agg(self, *exprs):
             >>> from pyspark.sql import functions as F
             >>> sorted(gdf.agg(F.min(df.age)).collect())
             [Row(name=u'Alice', min(age)=2), Row(name=u'Bob', min(age)=5)]
    +
    +        >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
    +        >>> @pandas_udf('int', PandasUDFType.GROUP_AGG)
    +        ... def min_udf(v):
    +        ...     return v.min()
    +        >>> sorted(gdf.agg(min_udf(df.age)).collect())  # doctest: +SKIP
    --- End diff --

    I think in the future we should make pandas/arrow a requirement of pyspark, so that we can always assume pandas/arrow are installed when running doc tests.
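The `# doctest: +SKIP` directive in the diff sidesteps environments where pandas and pyarrow are missing. As a rough illustration of the conditional alternative that a hard dependency would make unnecessary, here is a minimal sketch built only on the standard-library `doctest` and `importlib` modules. The helper `optional_deps_available` and the function `agg_example` are hypothetical names for illustration, not part of PySpark or its test harness.

```python
import doctest
import importlib.util


def optional_deps_available(*modules):
    """Return True only if every named top-level module can be found for import."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


def agg_example():
    """Hypothetical stand-in for a doctest that needs pandas.

    A real pandas UDF doctest would additionally need a SparkSession.

    >>> import pandas as pd
    >>> print(pd.Series([5, 2]).min())
    2
    """


if __name__ == "__main__":
    # Hypothetical gate: run the pandas-dependent doctest only when both
    # pandas and pyarrow are installed; otherwise report that it was skipped.
    if optional_deps_available("pandas", "pyarrow"):
        doctest.run_docstring_examples(agg_example, globals(), verbose=True)
    else:
        print("pandas/pyarrow not installed; skipping pandas UDF doctests")
```

If pandas and pyarrow were required dependencies of pyspark, both this kind of gate and the `+SKIP` directive could be dropped, and the doctest would run in every environment.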