[GitHub] [spark] viirya commented on a change in pull request #24997: [SPARK-28198][PYTHON] Add mapPartitionsInPandas to allow an iterator of DataFrames

GitBox Mon, 01 Jul 2019 03:14:17 -0700

viirya commented on a change in pull request #24997: [SPARK-28198][PYTHON] Add mapPartitionsInPandas to allow an iterator of DataFrames
URL: https://github.com/apache/spark/pull/24997#discussion_r298947615


 ##########
 File path: python/pyspark/sql/dataframe.py
 ##########
 @@ -2192,6 +2193,51 @@ def toPandas(self):
                         _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
             return pdf
 
+    def mapPartitionsInPandas(self, udf):
+        """
+        Maps each partition of the current :class:`DataFrame` using a pandas udf and returns
+        the result as a `DataFrame`.
+
+        The user-defined function should take an iterator of `pandas.DataFrame`s and return another
 
 Review comment:
   The various function types are introduced in `pandas_udf`'s doc. Since this adds a new usage of `SCALAR_ITER`, should we update the doc there too?
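
For readers of the doc, the function shape described in the docstring above (an iterator of `pandas.DataFrame`s in, an iterator of `pandas.DataFrame`s out) might be illustrated roughly as follows. This is only a sketch in plain pandas: `filter_adults`, the `id`/`age` columns, and the hand-built `partition` iterator are hypothetical stand-ins, and how the function is wrapped with `pandas_udf` and passed to `mapPartitionsInPandas` is defined by the PR and not shown here.

```python
import pandas as pd


def filter_adults(batches):
    """Hypothetical UDF body: consume an iterator of pandas.DataFrame
    batches and lazily yield one filtered DataFrame per input batch."""
    for pdf in batches:
        # Keep only the rows whose (assumed) "age" column is at least 18.
        yield pdf[pdf["age"] >= 18]


# Stand-in for the batches of a single partition; in Spark the engine
# would supply this iterator, here it is built by hand for illustration.
partition = iter([
    pd.DataFrame({"id": [1, 2], "age": [21, 15]}),
    pd.DataFrame({"id": [3], "age": [30]}),
])

# Collect the yielded batches into one DataFrame to inspect the result.
result = pd.concat(filter_adults(partition), ignore_index=True)
print(result)
```

Because the function is a generator, each output batch is produced as the corresponding input batch is consumed, so a whole partition never has to be held in memory at once.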
