[PR] [SPARK-57645][PYTHON][TESTS] Add ASV microbenchmark for SQL_GROUPED_AGG_PANDAS_ITER_UDF [spark]

via GitHub Wed, 24 Jun 2026 00:11:55 -0700


Yicong-Huang opened a new pull request, #56730:
URL: https://github.com/apache/spark/pull/56730


   ### What changes were proposed in this pull request?
   
   Add an ASV microbenchmark for `SQL_GROUPED_AGG_PANDAS_ITER_UDF` to 
`python/benchmarks/bench_eval_type.py`, parallel to the existing 
`GroupedAggArrowIterUDFTimeBench`. New classes: 
`_GroupedAggPandasIterBenchMixin`, `GroupedAggPandasIterUDFTimeBench`, and 
`GroupedAggPandasIterUDFPeakmemBench`. The mixin reuses 
`_write_scenario`/`_build_scenario`/`_scenario_configs` from the non-iterator 
Pandas sibling and only overrides the eval type and the iterator-style UDFs 
(`sum_udf`, `mean_multi_udf`) that consume an `Iterator[pd.Series]`.
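
   For readers unfamiliar with the iterator style, the two UDFs can be sketched in plain pandas as below. This is only an illustration, not the code in `bench_eval_type.py`: the function bodies are hypothetical, and so is the assumption that `mean_multi_udf` receives a tuple of Series per batch. The real benchmark runs its UDFs through the worker under the `SQL_GROUPED_AGG_PANDAS_ITER_UDF` eval type rather than calling them directly.

```python
from typing import Iterator, Tuple

import pandas as pd


def sum_udf(batches: Iterator[pd.Series]) -> float:
    # Keep a running total so only one batch needs to be held at a time.
    total = 0.0
    for batch in batches:
        total += float(batch.sum())
    return total


def mean_multi_udf(batches: Iterator[Tuple[pd.Series, pd.Series]]) -> float:
    # Hypothetical multi-column variant: mean of (a + b) over all batches.
    total = 0.0
    count = 0
    for a, b in batches:
        total += float((a + b).sum())
        count += len(a)
    return total / count if count else float("nan")


# One group delivered as two batches instead of a single concatenated Series.
single = [pd.Series([1.0, 2.0]), pd.Series([3.0])]
pairs = [
    (pd.Series([1.0, 2.0]), pd.Series([3.0, 4.0])),
    (pd.Series([5.0]), pd.Series([6.0])),
]
print(sum_udf(iter(single)))
print(mean_multi_udf(iter(pairs)))
```

   Each function returns one scalar per group, as a non-iterator grouped-agg UDF would, but never needs the whole group in memory at once.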
   
   ### Why are the changes needed?
   
   `SQL_GROUPED_AGG_PANDAS_ITER_UDF` had no worker-level microbenchmark. This 
fills the coverage gap and provides a before/after baseline for an upcoming 
serializer refactor of this eval type.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests. Benchmark-only addition. The worker output of the new 
iterator bench was verified to be byte-identical to the non-iterator Pandas 
grouped-agg bench across all scenario/UDF combinations (only the trailing 
timing telemetry differs).
   
   ASV results (`COLUMNS=120 asv run --bench GroupedAggPandasIterUDFTimeBench 
-a repeat=3 --python=same`):
   
   ```text
   <ASV_RESULTS_PLACEHOLDER>
   ```
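
   As background for the command above, ASV collects methods by name prefix (`time_` for wall-clock timing, `peakmem_` for peak memory) and calls `setup` before each one, which is why a shared mixin with two thin subclasses is a convenient layout. The sketch below shows that shape only. Every class name, parameter, and workload in it is hypothetical and stands in for the Spark worker invocation that the real benchmark performs.

```python
class _IterBenchMixin:
    # Hypothetical parameter grid; ASV runs each benchmark once per value.
    params = [[10, 100]]
    param_names = ["num_batches"]

    def setup(self, num_batches):
        # Stand-in workload: batches of integers instead of serialized Arrow data.
        self.batches = [
            list(range(i * 1000, (i + 1) * 1000)) for i in range(num_batches)
        ]

    def _run(self):
        # Consume the batches one at a time, as an iterator-style UDF would.
        total = 0
        for batch in iter(self.batches):
            total += sum(batch)
        return total


class IterTimeBench(_IterBenchMixin):
    def time_worker(self, num_batches):
        # ASV reports the wall-clock time of this call.
        self._run()


class IterPeakmemBench(_IterBenchMixin):
    def peakmem_worker(self, num_batches):
        # ASV reports the peak resident memory of the process for this call.
        self._run()
```

   Keeping the workload in the mixin means the timing and memory benchmarks exercise exactly the same code path.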
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

