Re: [PR] [SPARK-57645][PYTHON][TESTS] Add ASV microbenchmark for SQL_GROUPED_AGG_PANDAS_ITER_UDF [spark]

via GitHub Wed, 24 Jun 2026 03:59:02 -0700


uros-b commented on PR #56730:
URL: https://github.com/apache/spark/pull/56730#issuecomment-4788514192


   > The worker output of the new iterator bench was verified to be 
byte-identical to the non-iterator Pandas grouped-agg bench
   
   Minor note on the PR description, please confirm: in worker.py, the 
non-iterator SQL_GROUPED_AGG_PANDAS_UDF writes via 
ArrowStreamGroupSerializer(write_start_stream=True), while the ITER variant uses 
ArrowStreamAggPandasUDFSerializer. These are genuinely different output 
serializers/markers, so the byte streams are not identical. Please update the 
description to avoid misleading future readers.
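
   For reference, a minimal sketch of how a byte-identical claim could be 
checked once both worker outputs are captured as bytes. It uses only the 
standard library and does not call any Spark API; `first_mismatch` and the two 
sample buffers are hypothetical stand-ins, not real worker output.

```python
from typing import Optional


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One buffer is a strict prefix of the other: they differ at the shorter length.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical stand-ins for the captured outputs (assumption, not Spark data).
non_iter_out = b"\x00\x00\x00\x01payload"
iter_out = b"\x00\x00\x00\x02payload"

offset = first_mismatch(non_iter_out, iter_out)
print("identical" if offset is None else f"streams differ at byte {offset}")
```

   If the two serializers emit different markers, a check like this would 
report a mismatch even when the decoded batches carry the same data.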


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

