uros-b commented on PR #56730: URL: https://github.com/apache/spark/pull/56730#issuecomment-4788514192
> The worker output of the new iterator bench was verified to be byte-identical to the non-iterator Pandas grouped-agg bench

Minor note on the PR description, please confirm: in worker.py, the non-iterator SQL_GROUPED_AGG_PANDAS_UDF path writes via ArrowStreamGroupSerializer(write_start_stream=True), while the ITER variant uses ArrowStreamAggPandasUDFSerializer. These are genuinely different output serializers/markers, so the byte streams are not identical. Please update the description so that it does not mislead a future reader.
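For anyone re-checking the byte-identity claim, here is a minimal sketch of a byte-level comparison. It assumes the two worker outputs have already been captured as bytes objects; the helper name first_mismatch and the sample bytes are hypothetical and not part of Spark.

```python
from itertools import zip_longest
from typing import Optional


def first_mismatch(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if a == b."""
    # zip_longest pads the shorter input with None, so a length
    # difference is reported at the end of the shorter stream.
    for offset, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return offset
    return None


# Placeholder bytes for illustration only, not real worker output.
baseline = b"\x00\x01\x02\x03"
candidate = b"\x00\x01\xff\x03"
print(first_mismatch(baseline, candidate))  # 2
```

A result of None would support "byte-identical"; any integer offset points at the first place the two streams diverge, which may help show whether only the stream markers differ.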
