viirya commented on a change in pull request #24997: [SPARK-28198][PYTHON] Add
mapPartitionsInPandas to allow an iterator of DataFrames
URL: https://github.com/apache/spark/pull/24997#discussion_r298947615
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -2192,6 +2193,51 @@ def toPandas(self):
_check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
return pdf
+ def mapPartitionsInPandas(self, udf):
+ """
+ Maps each partition of the current :class:`DataFrame` using a pandas udf and returns
+ the result as a `DataFrame`.
+
+ The user-defined function should take an iterator of `pandas.DataFrame`s and return another
Review comment:
The `pandas_udf` doc describes the various function types. Since this adds a
new usage of `SCALAR_ITER`, should we update the doc there too?
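
For readers of the `pandas_udf` doc, a minimal sketch of the function shape the docstring above describes (an iterator of `pandas.DataFrame`s in, an iterator of `pandas.DataFrame`s out) might help. This is plain pandas only: the name `keep_positive`, the column names, and the sample batches are hypothetical, and how the function is wrapped with `pandas_udf` and passed to `mapPartitionsInPandas` is deliberately left out because the quoted hunk does not show it.

```python
import pandas as pd


def keep_positive(batches):
    """Hypothetical user function: consume an iterator of pandas.DataFrames
    and yield one filtered pandas.DataFrame per input batch."""
    for pdf in batches:
        # Keep only the rows whose "value" column is strictly positive.
        yield pdf[pdf["value"] > 0]


# Simulate a single partition that arrives as two batches.
# Assumption: no Spark is involved here; this only illustrates the iterator contract.
partition = iter([
    pd.DataFrame({"id": [1, 2], "value": [-1.0, 3.0]}),
    pd.DataFrame({"id": [3, 4], "value": [5.0, -2.0]}),
])

# Materialize the lazily produced batches and stitch them back together.
result = pd.concat(list(keep_positive(partition)), ignore_index=True)
print(result)
```

Because the function is a generator, each batch can be processed and released before the next one is read, which is presumably the point of passing an iterator rather than a single DataFrame.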