cloud-fan commented on issue #24997: [SPARK-28198][PYTHON] Add mapPartitionsInPandas to allow an iterator of DataFrames
URL: https://github.com/apache/spark/pull/24997#issuecomment-507491024

> I am not sure if we need to add every version of iterator or if we should keep the current API shape (adding every `_ITER` type of pandas evaluation code).

Maybe we should let end-users specify, via a new parameter, whether they want to take in an `Iterator[DataFrame]`, e.g.

```
@pandas_udf(df.schema, PandasUDFType.SCALAR, df_iter=True)
def func(iterator):
    for pdf in iterator:
        assert isinstance(pdf, pd.DataFrame)
        assert [d.name for d in list(pdf.dtypes)] == ['int32', 'object']
        yield pdf
```
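To make the difference between the two UDF shapes concrete, here is a minimal sketch of how a runtime might dispatch on a flag like the proposed `df_iter`. Everything in it is hypothetical: `df_iter` was only proposed in the comment above, and `run_udf`, `expensive_setup`, `per_batch` and `per_iterator` are illustrative names, not Spark or PySpark APIs. Plain Python lists stand in for the pandas DataFrame batches so the sketch has no third-party dependencies. It also assumes (this is not stated in the comment) that one motivation for an iterator-shaped UDF is running costly setup once per iterator instead of once per batch.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

Batch = TypeVar("Batch")


def run_udf(batches: Iterable[Batch], func: Callable, df_iter: bool = False) -> List[Batch]:
    """Hypothetical dispatcher illustrating the proposed df_iter flag (not Spark code).

    df_iter=False: func maps one batch to one batch and is called once per batch.
    df_iter=True: func is called once with an iterator over all batches
    and must yield output batches.
    """
    if df_iter:
        return list(func(iter(batches)))
    return [func(batch) for batch in batches]


# Counts how often the stand-in "expensive" setup runs.
setup_calls = {"count": 0}


def expensive_setup() -> int:
    # Stand-in for costly work such as loading a model; returns an offset to add.
    setup_calls["count"] += 1
    return 10


def per_batch(batch: List[int]) -> List[int]:
    # Batch -> batch shape: setup runs once for every batch.
    offset = expensive_setup()
    return [x + offset for x in batch]


def per_iterator(batches: Iterator[List[int]]) -> Iterator[List[int]]:
    # Iterator -> iterator shape (like func in the comment): setup runs once.
    offset = expensive_setup()
    for batch in batches:
        yield [x + offset for x in batch]
```

With `batches = [[1, 2], [3], [4, 5, 6]]`, both `run_udf(batches, per_batch)` and `run_udf(batches, per_iterator, df_iter=True)` return `[[11, 12], [13], [14, 15, 16]]`, but the setup runs three times in the per-batch case and once in the iterator case.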
