Yicong-Huang opened a new pull request, #56730: URL: https://github.com/apache/spark/pull/56730
### What changes were proposed in this pull request?

Add an ASV microbenchmark for `SQL_GROUPED_AGG_PANDAS_ITER_UDF` to `python/benchmarks/bench_eval_type.py`, parallel to the existing `GroupedAggArrowIterUDFTimeBench`.

New classes: `_GroupedAggPandasIterBenchMixin`, `GroupedAggPandasIterUDFTimeBench`, and `GroupedAggPandasIterUDFPeakmemBench`. The mixin reuses `_write_scenario`/`_build_scenario`/`_scenario_configs` from the non-iterator Pandas sibling and only overrides the eval type and the iterator-style UDFs (`sum_udf`, `mean_multi_udf`) that consume an `Iterator[pd.Series]`.

### Why are the changes needed?

`SQL_GROUPED_AGG_PANDAS_ITER_UDF` had no worker-level microbenchmark. This fills the coverage gap and provides a before/after baseline for an upcoming serializer refactor of this eval type.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests. Benchmark-only addition.

The worker output of the new iterator bench was verified to be byte-identical to the non-iterator Pandas grouped-agg bench across all scenario/UDF combinations (only the trailing timing telemetry differs).

ASV results (`COLUMNS=120 asv run --bench GroupedAggPandasIterUDFTimeBench -a repeat=3 --python=same`):

```text
<ASV_RESULTS_PLACEHOLDER>
```

### Was this patch authored or co-authored using generative AI tooling?

No.
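For reviewers less familiar with the iterator-style UDF shape described above, here is a minimal sketch in plain pandas of what an aggregation consuming an `Iterator[pd.Series]` can look like. It is illustrative only: the function names mirror `sum_udf` and `mean_multi_udf` from the description, but the bodies, the two-column tuple input, and the sample data are assumptions, not the code added to `bench_eval_type.py`.

```python
from typing import Iterator, Tuple

import pandas as pd


def sum_udf(batches: Iterator[pd.Series]) -> float:
    """Sum one group's column, delivered as a stream of Series batches.

    Hypothetical body; the benchmark's actual UDF may differ.
    """
    total = 0.0
    for batch in batches:
        total += batch.sum()
    return float(total)


def mean_multi_udf(batches: Iterator[Tuple[pd.Series, pd.Series]]) -> float:
    """Mean of (a + b) over one group, keeping a running sum and count.

    Assumes each batch is a tuple of two equally sized Series.
    """
    total = 0.0
    count = 0
    for a, b in batches:
        total += (a + b).sum()
        count += len(a)
    return float(total / count) if count else float("nan")


# One group split across two batches, as an iterator variant would deliver it.
group = [pd.Series([1.0, 2.0]), pd.Series([3.0])]
print(sum_udf(iter(group)))
```

The point of the shape is that the aggregation never needs the whole group in memory at once: it only keeps running totals, which is why a separate peak-memory bench alongside the timing bench is a natural pairing.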
