cloud-fan commented on issue #24997: [SPARK-28198][PYTHON] Add 
mapPartitionsInPandas to allow an iterator of DataFrames
URL: https://github.com/apache/spark/pull/24997#issuecomment-507491024
 
 
   > I am not sure if we need to add every version of iterator or if we should keep the current API shape (adding every _ITER type of pandas evaluation code).
   
   Maybe we should let end users choose whether the function takes an 
Iterator[DataFrame] via a new parameter, e.g.
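   ```
   import pandas as pd
   from pyspark.sql.functions import pandas_udf, PandasUDFType

   # df_iter is the proposed new parameter; it does not exist yet.
   @pandas_udf(df.schema, PandasUDFType.SCALAR, df_iter=True)
   def func(iterator):
       # With df_iter=True, the function receives an iterator of DataFrames.
       for pdf in iterator:
           assert isinstance(pdf, pd.DataFrame)
           assert [d.name for d in pdf.dtypes] == ['int32', 'object']
           yield pdf
   ```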
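
To make the idea concrete outside Spark, here is a minimal sketch of how a single flag could switch between the per-batch and the iterator calling conventions. Everything in it is an assumption for illustration: `batch_udf` is a hypothetical decorator, not PySpark's `pandas_udf`, and plain lists of ints stand in for pandas DataFrames.

```python
from typing import Any, Callable, Iterable, Iterator


def batch_udf(df_iter: bool = False) -> Callable:
    """Hypothetical decorator (NOT PySpark's pandas_udf).

    df_iter=False: the wrapped function is called once per batch.
    df_iter=True:  the wrapped function receives the whole iterator of
                   batches and must yield output batches.
    """
    def decorator(func: Callable) -> Callable[[Iterable[Any]], Iterator[Any]]:
        def run(batches: Iterable[Any]) -> Iterator[Any]:
            if df_iter:
                # Hand over the iterator itself; func drives the iteration.
                yield from func(iter(batches))
            else:
                # Existing shape: one call per batch.
                for batch in batches:
                    yield func(batch)
        return run
    return decorator


@batch_udf(df_iter=True)
def add_offset(iterator):
    offset = 100  # stands in for one-time setup shared by all batches
    for batch in iterator:
        yield [x + offset for x in batch]


@batch_udf()
def double(batch):
    return [x * 2 for x in batch]


# Lists of ints stand in for the DataFrames of one partition.
partition = [[1, 2], [3]]
iter_result = list(add_offset(partition))
batch_result = list(double(partition))
```

The point of the flag is that the function signature stays the only thing the user changes, so no separate `_ITER` variant is needed for each evaluation type; whether that maps cleanly onto the existing pandas UDF types is a separate question.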

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


With regards,
Apache Git Services
