GitHub user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r143800072
--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None):
jgd = self._jgd.pivot(pivot_col)
else:
jgd = self._jgd.pivot(pivot_col, values)
- return GroupedData(jgd, self.sql_ctx)
+ return GroupedData(jgd, self._df)
+
+ @since(2.3)
+ def apply(self, udf):
+ """
+ Maps each group of the current :class:`DataFrame` using a pandas udf and returns the result
+ as a `DataFrame`.
+
+ The user-defined function should take a `pandas.DataFrame` and return another
+ `pandas.DataFrame`. For each group, all columns are passed together as a `pandas.DataFrame`
+ to the user-function and the returned `pandas.DataFrame` are combined as a `DataFrame`.
+ The returned `pandas.DataFrame` can be arbitrary length and its schema must match the
--- End diff --
The phrase "can be arbitrary length" should be "can have an arbitrary length".
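
For context on what "arbitrary length" means here, below is a rough plain-Python sketch of the semantics the docstring describes: each group is handed to the user function as a whole, and the returned rows (any number per group) are combined into one result. This is not the PySpark API. The helper `apply_per_group` and the sample functions `subtract_mean` and `keep_max` are hypothetical names used only for illustration, and lists of dicts stand in for `pandas.DataFrame`.

```python
from collections import defaultdict


def apply_per_group(rows, key, func):
    """Hypothetical stand-in for a grouped apply (not the PySpark API).

    rows: list of dicts, standing in for a DataFrame
    key:  name of the grouping column
    func: takes all rows of one group, returns a list of rows of any length
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    combined = []
    for group_rows in groups.values():
        # The whole group is passed at once; the result may be shorter,
        # longer, or the same length as the input group.
        combined.extend(func(group_rows))
    return combined


def subtract_mean(group_rows):
    # Returns one output row per input row.
    mean = sum(r["v"] for r in group_rows) / len(group_rows)
    return [{"id": r["id"], "v": r["v"] - mean} for r in group_rows]


def keep_max(group_rows):
    # Returns a single row per group, regardless of group size.
    return [max(group_rows, key=lambda r: r["v"])]


rows = [
    {"id": 1, "v": 1.0},
    {"id": 1, "v": 2.0},
    {"id": 2, "v": 3.0},
    {"id": 2, "v": 5.0},
    {"id": 2, "v": 10.0},
]

centered = apply_per_group(rows, "id", subtract_mean)
maxima = apply_per_group(rows, "id", keep_max)
```

Both functions are valid inputs even though one preserves the group length and the other collapses each group to a single row, which is the point the docstring sentence is trying to make.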
---