[GitHub] [spark] mengxr commented on a change in pull request #24897: [SPARK-28056] [PYTHON] add doc for SCALAR_ITER Pandas UDF

GitBox Mon, 17 Jun 2019 20:25:49 -0700

mengxr commented on a change in pull request #24897: [SPARK-28056] [PYTHON] add doc for SCALAR_ITER Pandas UDF
URL: https://github.com/apache/spark/pull/24897#discussion_r294596371


 ##########
 File path: docs/sql-pyspark-pandas-with-arrow.md
 ##########
 @@ -86,6 +86,22 @@ The following example shows how to create a scalar Pandas UDF that computes the
 </div>
 </div>
 
+### Scalar Iterator
+
+A scalar iterator (`SCALAR_ITER`) Pandas UDF is the same as a scalar Pandas UDF above, except that the
+underlying Python function takes an iterator of batches as input instead of a single batch, and
+it yields output batches instead of returning a single output batch.
+This is useful when the UDF execution requires initializing some state once, e.g., loading a machine
+learning model file and then applying inference to every input batch.
+
 
 Review comment:
   Updated.
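To illustrate the contract described in the hunk above, here is a minimal sketch of the underlying Python function for a scalar iterator Pandas UDF, run locally without Spark. The function name `predict_batches` and the constant `model_offset` are hypothetical: the constant stands in for state such as a loaded model file. The point is that setup happens once per call (once per input iterator), and the function then yields one output batch per input batch.

```python
from typing import Iterator

import pandas as pd


def predict_batches(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # One-time setup: runs once per call (i.e. once per input iterator),
    # not once per batch. A constant offset stands in for a hypothetical
    # machine learning model loaded from a file.
    model_offset = 1
    for batch in batches:
        # Yield exactly one output batch per input batch, with the same length.
        yield batch + model_offset


# Local check of the contract, without Spark: feed an iterator of two batches.
for out in predict_batches(iter([pd.Series([1, 2]), pd.Series([3])])):
    print(out.tolist())
```

Running it prints `[2, 3]` and then `[4]`. In Spark 3.0+, where `PandasUDFType.SCALAR_ITER` exists, the same function could be wrapped as `pandas_udf(predict_batches, "long", PandasUDFType.SCALAR_ITER)` and applied to a long column. Later releases prefer the type-hint style, in which the `Iterator[pd.Series]` annotations alone select the iterator variant.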
