Re: [PR] [WIP][SPARK-54316][CORE][PYTHON][SQL] Consolidate `GroupPandasIterUDFSerializer` with `GroupPandasUDFSerializer` [spark]

via GitHub Tue, 25 Nov 2025 22:18:40 -0800


Yicong-Huang commented on code in PR #53043:
URL: https://github.com/apache/spark/pull/53043#discussion_r2563437419



##########
python/pyspark/worker.py:
##########
@@ -3270,6 +3260,39 @@ def mapper(a):
             df2_vals = table_from_batches(a[1], parsed_offsets[1][1])
             return f(df1_keys, df1_vals, df2_keys, df2_vals)
 
+    elif eval_type in (
+        PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF,
+        PythonEvalType.SQL_WINDOW_AGG_PANDAS_UDF,

Review Comment:
   The return type of `GroupPandasUDFSerializer` is now an iterator. The mappers 
for `SQL_GROUPED_AGG_PANDAS_UDF` and `SQL_WINDOW_AGG_PANDAS_UDF` expect a list 
(so they can use `a[0]` to access a column). Each of these mappers therefore has 
to convert the iterator returned by `GroupPandasUDFSerializer` into a list.
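
A minimal sketch of the conversion, not the actual `worker.py` code: `make_mapper`, `f`, and `arg_offsets` are hypothetical names used only for illustration, and plain Python lists stand in for the columns. It assumes the mapper receives an iterator and needs positional access to its items.

```python
from typing import Any, Callable, Iterable, Sequence


def make_mapper(f: Callable[..., Any], arg_offsets: Sequence[int]) -> Callable[[Iterable[Any]], Any]:
    """Hypothetical helper: build a mapper that selects columns by offset."""

    def mapper(a: Iterable[Any]) -> Any:
        # An iterator does not support a[0], so materialize it into a list first.
        a = list(a)
        # Positional access now works as the aggregation mappers expect.
        return f(*[a[o] for o in arg_offsets])

    return mapper


# Stand-in for what the serializer yields: an iterator over two "columns".
columns = iter([[1, 2, 3], [10, 20, 30]])
mapper = make_mapper(lambda x, y: sum(x) + sum(y), [0, 1])
result = mapper(columns)
```

Calling `list()` consumes the whole iterator, so this suits the case described here, where the mapper needs every column of the group before invoking the UDF.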



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

