Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/21427
At first glance, I thought this issue was slightly different from
https://issues.apache.org/jira/browse/SPARK-23929, but yeah it seems to be the
same. After reading through that discussion, I guess we need to be careful
about any changes. I'm not used to creating DataFrames by position, but it is
possible to do so with a list of tuples like the example from the doctest:
```
>>> @pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP
... def mean_udf(key, pdf):
...     # key is a tuple of one numpy.int64, which is the value
...     # of 'id' for the current group
...     return pd.DataFrame([key + (pdf.v.mean(),)])
```
Changing how the result columns are assigned would then be a breaking change
for code like this, so maybe it would be best to add
better documentation for now like @HyukjinKwon mentioned in SPARK-23929, and
target a change for Spark 3.0?
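
To make the positional case concrete, here is a minimal pandas-only sketch (no Spark involved). The `key` and `pdf` values are made-up stand-ins for what a grouped map UDF would receive, and `key` holds a plain int rather than a numpy.int64. It only shows that a DataFrame built from a list of tuples carries integer column labels, so there are no names to match against the declared schema:

```python
import pandas as pd

# Hypothetical stand-ins for the UDF arguments of one group.
key = (1,)
pdf = pd.DataFrame({"id": [1, 1], "v": [1.0, 3.0]})

# Built by position: the columns get default integer labels.
result = pd.DataFrame([key + (pdf.v.mean(),)])
print(list(result.columns))  # [0, 1]

# Built with explicit names that mirror the "id long, v double" schema.
named = pd.DataFrame([key + (pdf.v.mean(),)], columns=["id", "v"])
print(list(named.columns))  # ['id', 'v']
```

A UDF that returns the first form depends entirely on column order, which is why it seems safer to document the current behavior first.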