mengxr commented on a change in pull request #24897: [SPARK-28056] [PYTHON] add doc for SCALAR_ITER Pandas UDF
URL: https://github.com/apache/spark/pull/24897#discussion_r294596371
##########
File path: docs/sql-pyspark-pandas-with-arrow.md
##########
@@ -86,6 +86,22 @@ The following example shows how to create a scalar Pandas UDF that computes the
 </div>
 </div>
 
+### Scalar Iterator
+
+Scalar iterator (`SCALAR_ITER`) Pandas UDF is the same as scalar Pandas UDF above except that the
+underlying Python function takes an iterator of batches as input instead of a single batch and
+it yields output batches instead of returning a single output batch.
+It is useful when the UDF execution requires initializing some states, e.g., loading a machine
+learning model file to apply inference to every input batch.
+

Review comment:
Updated.
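
For readers of this thread, the calling convention described in the hunk can be sketched in plain Python. This is only an illustration under stated assumptions, not the example from the PR: plain lists stand in for the `pandas.Series` batches, and `load_offset` and `plus_offset` are hypothetical names invented here. In PySpark the function would additionally be registered as a `SCALAR_ITER` Pandas UDF, which this sketch leaves out so that it runs without Spark.

```python
from typing import Iterable, Iterator, List

LOAD_COUNT = 0  # counts how often the expensive setup runs


def load_offset() -> int:
    """Hypothetical stand-in for expensive setup, e.g. loading a model file."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return 1


def plus_offset(batch_iter: Iterable[List[int]]) -> Iterator[List[int]]:
    # State is initialized once, when the first batch is requested,
    # and then reused for every batch that follows.
    offset = load_offset()
    for batch in batch_iter:
        # Yield one output batch per input batch, with the same length.
        yield [value + offset for value in batch]


# Lists stand in for the pandas.Series batches Spark would supply (assumption).
batches = [[1, 2, 3], [4, 5], [6]]
result = list(plus_offset(iter(batches)))
```

The setup runs once for the whole iterator rather than once per batch, which is the benefit the doc text points to for cases such as model inference.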
