Re: [PR] [SPARK-56757][PYTHON] Refactor SQL_SCALAR_PANDAS_ITER_UDF [spark]

via GitHub Thu, 11 Jun 2026 14:02:07 -0700


gaogaotiantian commented on code in PR #55756:
URL: https://github.com/apache/spark/pull/55756#discussion_r3399114266



##########
python/pyspark/worker.py:
##########
@@ -3312,58 +3248,84 @@ def func(split_index: int, data: 
Iterator[pa.RecordBatch]) -> Iterator[pa.Record
         return func, None, ser, ser
 
     if eval_type == PythonEvalType.SQL_SCALAR_PANDAS_ITER_UDF:
-        assert num_udfs == 1, "One SCALAR_ITER UDF expected here."
+        import pandas as pd
+        import pyarrow as pa
 
-        arg_offsets, udf = udfs[0]
+        assert num_udfs == 1, "One SCALAR_PANDAS_ITER UDF expected here."
+        udf_func, args_offsets, kwargs_offsets, return_type = udfs[0]
+
+        # Pre-compute target schema for output coercion
+        return_schema = StructType([StructField("_0", return_type)])
+        # mypy can't validate a dynamic class as a type parameter, so use Any 
here.
+        expected_iter_type: Any = (

Review Comment:
   This should be `: type`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56757][PYTHON] Refactor SQL_SCALAR_PANDAS_ITER_UDF [spark]

Reply via email to