[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40520: [SPARK-42896][SQL][PYSPARK] Make `mapInPandas` / `mapInArrow` support barrier mode execution

via GitHub Thu, 23 Mar 2023 07:08:38 -0700


WeichenXu123 commented on code in PR #40520:
URL: https://github.com/apache/spark/pull/40520#discussion_r1146257661



##########
python/pyspark/sql/pandas/map_ops.py:
##########
@@ -60,6 +62,7 @@ def mapInPandas(
         schema : :class:`pyspark.sql.types.DataType` or str
             the return type of the `func` in PySpark. The value can be either a
             :class:`pyspark.sql.types.DataType` object or a DDL-formatted type string.
+        is_barrier : Use barrier mode execution if True.
 
         Examples
         --------

Review Comment:
   Q: An example like this:
   ```
           >>> from pyspark.sql.functions import pandas_udf
           >>> df = spark.createDataFrame([(1, 21), (2, 30)], ("id", "age"))
           >>> def filter_func(iterator):
           ...     for pdf in iterator:
           ...         yield pdf[pdf.id == 1]
           >>> df.mapInPandas(filter_func, df.schema, is_barrier=True).show()  # doctest: +SKIP
   ```
   This example does not illustrate how barrier mode works. Do you have a better idea?
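
One property an example could show is that every task in a barrier stage can wait at a shared synchronization point until all tasks have reached it. The sketch below is only a plain-Python analogy built on `threading.Barrier`. It is not Spark code, and `run_barrier_stage` and `task` are hypothetical names introduced for illustration. In PySpark, the corresponding synchronization call inside a barrier task is `BarrierTaskContext.get().barrier()`.

```python
import threading


def run_barrier_stage(num_tasks):
    """Simulate a barrier stage (hypothetical helper, not a Spark API).

    Each "task" records a "before" event, waits at the barrier until all
    tasks have arrived, and then records an "after" event.
    """
    barrier = threading.Barrier(num_tasks)
    events = []
    lock = threading.Lock()

    def task(partition_id):
        with lock:
            events.append(("before", partition_id))
        # Blocks until all num_tasks threads have called wait().
        # Loosely analogous to BarrierTaskContext.get().barrier() in PySpark.
        barrier.wait()
        with lock:
            events.append(("after", partition_id))

    threads = [
        threading.Thread(target=task, args=(i,)) for i in range(num_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


events = run_barrier_stage(4)
phases = [phase for phase, _ in events]
# No task gets past the barrier until every task has reached it,
# so all "before" events are recorded ahead of any "after" event.
print(phases)
```

A docstring example for the new parameter could follow the same pattern: call the barrier inside the function passed to `mapInPandas` and show that no task proceeds until all of them have arrived.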



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
