Yicong-Huang commented on code in PR #53043:
URL: https://github.com/apache/spark/pull/53043#discussion_r2563437419
##########
python/pyspark/worker.py:
##########
@@ -3270,6 +3260,39 @@ def mapper(a):
df2_vals = table_from_batches(a[1], parsed_offsets[1][1])
return f(df1_keys, df1_vals, df2_keys, df2_vals)
+ elif eval_type in (
+ PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF,
+ PythonEvalType.SQL_WINDOW_AGG_PANDAS_UDF,
Review Comment:
GroupPandasUDFSerializer now returns an iterator. The mappers for
SQL_GROUPED_AGG_PANDAS_UDF and SQL_WINDOW_AGG_PANDAS_UDF expect a list
(so that they can use a[0] to access the column), so the iterator returned
from GroupPandasUDFSerializer has to be converted to a list in those mappers.
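
To illustrate the point, here is a minimal standalone sketch, not the actual worker.py code. `fake_group_batches` is a hypothetical stand-in for what the serializer yields, and `mapper` is a simplified placeholder for the real mappers; plain Python lists are used instead of pandas Series so the snippet runs without Spark.

```python
from typing import Iterator, List


def fake_group_batches() -> Iterator[List[int]]:
    """Hypothetical stand-in for the serializer output: one item per column."""
    yield [1, 2, 3]
    yield [4, 5, 6]


def mapper(a):
    # Materialize the iterator first so that positional access (a[0]) works.
    columns = list(a)
    # Simplified placeholder for the aggregation applied to the first column.
    return sum(columns[0])


def indexing_fails_on_iterator() -> bool:
    """Return True if indexing the raw iterator raises TypeError."""
    try:
        fake_group_batches()[0]  # generators are not subscriptable
    except TypeError:
        return True
    return False
```

Calling `mapper(fake_group_batches())` works because the iterator is converted to a list before indexing, while indexing the generator directly raises `TypeError`.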
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]