Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18732
@rxin, `transform` takes a function `pd.Series -> pd.Series` and applies it to
every column:
```
df.show()
id v1 v2 v3
a 1.0 4.0 0.0
a 2.0 5.0 1.0
a 3.0 6.0 1.0
df.groupby('id').transform(
    pandas_udf(lambda v: v - v.mean(), DoubleType())).show()
id v1 v2 v3
a -1.0 -1.0 -0.666667
a 0.0 0.0 0.333333
a 1.0 1.0 0.333333
```
This mimics `pd.DataFrame.groupby().transform`.
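For comparison, here is a minimal sketch of the same demeaning in plain pandas. It is not the Spark API proposed in this PR; the DataFrame simply reproduces the example data above.

```python
import pandas as pd

# Same data as the Spark example above.
df = pd.DataFrame({
    "id": ["a", "a", "a"],
    "v1": [1.0, 2.0, 3.0],
    "v2": [4.0, 5.0, 6.0],
    "v3": [0.0, 1.0, 1.0],
})

# Conceptually, the function maps each column of each group (a pd.Series)
# to a pd.Series of the same length. The grouping column is not part of
# the result, and the original row index is preserved.
demeaned = df.groupby("id").transform(lambda v: v - v.mean())
print(demeaned)
```

Each value column is demeaned within its group, which matches the Spark output shown above.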
`apply` takes a function `pd.DataFrame -> pd.DataFrame` and is similar to
`flatMapGroups`.
The name `apply` originates from the R paper "The Split-Apply-Combine
Strategy for Data Analysis" and is used in both pandas and R to describe this
function, so the name `apply` should be pretty straightforward to
pandas/Python users.
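To illustrate the `pd.DataFrame -> pd.DataFrame` shape of `apply`, here is a minimal sketch in plain pandas (again not the Spark API proposed in this PR). The helper `top_two_by_v1` is a hypothetical name made up for this example; it returns fewer rows than it receives, which is the `flatMapGroups`-like behavior.

```python
import pandas as pd

# Same data as the transform example.
df = pd.DataFrame({
    "id": ["a", "a", "a"],
    "v1": [1.0, 2.0, 3.0],
    "v2": [4.0, 5.0, 6.0],
    "v3": [0.0, 1.0, 1.0],
})

def top_two_by_v1(pdf: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: receives all rows of one group as a DataFrame
    # and may return a DataFrame with any number of rows.
    return pdf.nlargest(2, "v1")

# Split by id, apply the function to each group, combine the results.
# Selecting the value columns explicitly keeps the grouping column out of
# the frames passed to the function.
result = (
    df.groupby("id", group_keys=False)[["v1", "v2", "v3"]]
      .apply(top_two_by_v1)
)
print(result)
```

Unlike `transform`, the function sees the whole group at once and the output does not need to have the same number of rows as the input.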