Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/18732
@cloud-fan, it's a good question. I thought quite a bit about it and
discussed it with @viirya:
https://github.com/apache/spark/pull/18732#pullrequestreview-66106082
Just to recap: from an API perspective, I think having just one decorator,
`pandas_udf`, makes it easier for users, since they don't need to think about
which decorator to use where. It does make the implementation a little more
complicated, because some code has to interpret the context in which a
`pandas_udf` is used: in `groupby().apply()` it is a
`pandas.DataFrame -> pandas.DataFrame` function, while in `withColumn` and
`select` it is `pandas.Series -> pandas.Series`.
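To make the two shapes concrete, here is a minimal sketch in plain pandas (no Spark session needed); the function names `plus_one` and `subtract_mean` are illustrative, not from the PR:

```python
import pandas as pd

# Scalar shape: pandas.Series -> pandas.Series,
# the shape a pandas_udf takes inside select() / withColumn().
def plus_one(s: pd.Series) -> pd.Series:
    return s + 1

# Grouped shape: pandas.DataFrame -> pandas.DataFrame,
# the shape the same decorator takes inside groupby().apply().
def subtract_mean(pdf: pd.DataFrame) -> pd.DataFrame:
    return pdf.assign(v=pdf.v - pdf.v.mean())

df = pd.DataFrame({"g": [1, 1, 2], "v": [1.0, 3.0, 5.0]})

# Series in, Series out.
scalar_result = plus_one(df.v)

# Each group's DataFrame in, a DataFrame out (selecting ["v"] keeps the
# group function independent of the grouping column).
grouped_result = df.groupby("g", group_keys=False)[["v"]].apply(subtract_mean)
```

The point of the sketch is that the same user-defined function concept is dispatched differently purely by where it appears, which is exactly the context-sensitivity discussed above.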
Another thought: even if we were to introduce something like
`pandas_df_udf`, we might still run into issues in the future where, say, we
want an aggregate pandas udf that maps `pandas.Series -> scalar`. So I don't
think we can define a decorator for every input/output shape, because there
can potentially be many.
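For illustration, the hypothetical aggregate shape mentioned above would look like this in plain pandas (`mean_udf` is an invented name, not an API in the PR):

```python
import pandas as pd

# Aggregate shape: pandas.Series -> scalar, i.e. one value per group.
# This is a third input/output shape beyond Series -> Series and
# DataFrame -> DataFrame, supporting the point that the shapes multiply.
def mean_udf(s: pd.Series) -> float:
    return float(s.mean())

df = pd.DataFrame({"g": [1, 1, 2], "v": [1.0, 3.0, 5.0]})

# One scalar per group comes back as a Series indexed by group key.
agg_result = df.groupby("g").v.agg(mean_udf)
```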