Github user icexelloss commented on a diff in the pull request:
https://github.com/apache/spark/pull/18732#discussion_r143263694
--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None):
             jgd = self._jgd.pivot(pivot_col)
         else:
             jgd = self._jgd.pivot(pivot_col, values)
-        return GroupedData(jgd, self.sql_ctx)
+        return GroupedData(jgd, self._df)
+
+    @since(2.3)
+    def apply(self, udf):
--- End diff ---
@rxin just to recap our discussion regarding naming:
You asked:
> What's the difference between this one and the transform function you also proposed? I'm trying to see if all the naming makes sense when considered together.
The answer is:
`transform` takes a function `pd.Series -> pd.Series` and applies it to each column (or a subset of columns). The input and output Series have the same length.
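For example (just a plain pandas sketch to illustrate the shape of such a function; `normalize` is a made-up name):

```python
import pandas as pd

# transform-style function: pd.Series -> pd.Series,
# the output has the same length (and index) as the input
def normalize(s: pd.Series) -> pd.Series:
    return (s - s.mean()) / s.std()

s = pd.Series([1.0, 2.0, 3.0])
normalize(s)  # still 3 elements, one per input element
```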
`apply` takes a function `pd.DataFrame -> pd.DataFrame` and applies it to each group. This is similar to `flatMapGroups`.
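An apply-style function operates on one whole group at a time, e.g. (again just a pandas sketch; the column name `v` and the function name are made up):

```python
import pandas as pd

# apply-style function: pd.DataFrame -> pd.DataFrame, called once per group;
# the output may have a different number of rows than the input group
def subtract_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(v=pdf.v - pdf.v.mean())

pdf = pd.DataFrame({"id": [1, 1, 2], "v": [1.0, 2.0, 10.0]})
pdf.groupby("id").apply(subtract_mean)  # plain pandas analogue of the proposed API
```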
Does this make sense to you?
---