[PR] [SPARK-56658][PYTHON][TESTS] Add ASV microbenchmark for SQL_WINDOW_AGG_PANDAS_UDF [spark]

via GitHub Wed, 29 Apr 2026 00:17:58 -0700


Yicong-Huang opened a new pull request, #55603:
URL: https://github.com/apache/spark/pull/55603


   ### What changes were proposed in this pull request?
   
   Add ASV microbenchmarks for `SQL_WINDOW_AGG_PANDAS_UDF` to `python/benchmarks/bench_eval_type.py`, mirroring the existing `SQL_WINDOW_AGG_ARROW_UDF` benchmarks added in [SPARK-56120](https://issues.apache.org/jira/browse/SPARK-56120).
   
   The new mixin `_WindowAggPandasBenchMixin` reuses the same scenario shapes and UDF set as the Arrow variant (sum, mean_multi) and writes the worker payload with `runner_conf={"window_bound_types": "unbounded"}`.
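
For readers unfamiliar with the ASV layout, here is a minimal, dependency-free sketch of how a shared mixin can feed both a timing class and a peak-memory class. It is not the code in `bench_eval_type.py`: the class names, scenario sizes, and list-based stand-in UDFs are all hypothetical, and no Spark worker payload is written. Only the ASV conventions (`params`, `param_names`, `setup`, and the `time_`/`peakmem_` method prefixes) are real.

```python
import statistics

# Hypothetical scenario shapes: (number of groups, rows per group).
# The names match the result tables in this PR; the sizes are invented.
SCENARIOS = {
    "few_groups_sm": (4, 1_000),
    "many_groups_sm": (400, 10),
}

# Stand-ins for the UDF set. The real UDFs take pandas.Series; these take
# a tuple of plain Python lists so the sketch needs only the stdlib.
UDFS = {
    "sum_udf": lambda cols: sum(cols[0]),
    "mean_multi_udf": lambda cols: statistics.fmean(cols[0]) + statistics.fmean(cols[1]),
}


class _WindowAggSketchMixin:
    """Shared parameter grid and workload; ASV reads params/param_names."""

    params = [list(SCENARIOS), list(UDFS)]
    param_names = ["scenario", "udf"]

    def setup(self, scenario, udf):
        # ASV calls setup() with one parameter combination before measuring.
        n_groups, rows = SCENARIOS[scenario]
        self.groups = [
            ([float(g + i) for i in range(rows)], [float(i) for i in range(rows)])
            for g in range(n_groups)
        ]
        self.func = UDFS[udf]
        # Kept only to mirror the PR text; nothing in this sketch consumes it.
        self.runner_conf = {"window_bound_types": "unbounded"}

    def _run(self):
        # Unbounded window: one aggregate per group, repeated for every row.
        return [[self.func(cols)] * len(cols[0]) for cols in self.groups]


class WindowAggSketchTimeBench(_WindowAggSketchMixin):
    def time_worker(self, scenario, udf):
        self._run()


class WindowAggSketchPeakmemBench(_WindowAggSketchMixin):
    def peakmem_worker(self, scenario, udf):
        self._run()
```

Keeping the grid and workload in the mixin means the timing and peak-memory classes measure exactly the same work and differ only in the method prefix ASV dispatches on.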
   
   ### Why are the changes needed?
   
   Part of [SPARK-55724](https://issues.apache.org/jira/browse/SPARK-55724). These benchmarks establish performance baselines for `SQL_WINDOW_AGG_PANDAS_UDF` before that eval type is refactored.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   `./python/asv check --python=same` passes. A quick local run of `WindowAggPandasUDFTimeBench` and `WindowAggPandasUDFPeakmemBench` produced:
   
   ```text
   bench_eval_type.WindowAggPandasUDFTimeBench.time_worker
   ================ ========== ================
   --                           udf
   ---------------- ---------------------------
       scenario      sum_udf    mean_multi_udf
   ================ ========== ================
    few_groups_sm    74.8 ms       75.3 ms
    few_groups_lg    115 ms        125 ms
    many_groups_sm   1.92 s        2.13 s
    many_groups_lg   578 ms        618 ms
      wide_cols      509 ms        539 ms
   ================ ========== ================
   
   bench_eval_type.WindowAggPandasUDFPeakmemBench.peakmem_worker
   ================ ========= ================
   --                          udf
   ---------------- --------------------------
       scenario      sum_udf   mean_multi_udf
   ================ ========= ================
    few_groups_sm     471M         470M
    few_groups_lg     535M         533M
    many_groups_sm    501M         499M
    many_groups_lg    606M         604M
      wide_cols       587M         585M
   ================ ========= ================
   ```
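
As an aid to reading the timing table, the following sketch computes how much slower `mean_multi_udf` is than `sum_udf` in each scenario. The numbers are copied from the table above; the helper name `overhead_pct` is just illustrative.

```python
# Timings copied from the time_worker table above, converted to milliseconds
# as (sum_udf, mean_multi_udf).
TIMES_MS = {
    "few_groups_sm": (74.8, 75.3),
    "few_groups_lg": (115.0, 125.0),
    "many_groups_sm": (1920.0, 2130.0),
    "many_groups_lg": (578.0, 618.0),
    "wide_cols": (509.0, 539.0),
}


def overhead_pct(sum_ms, mean_multi_ms):
    """Relative extra cost of mean_multi_udf over sum_udf, in percent."""
    return (mean_multi_ms / sum_ms - 1.0) * 100.0


overheads = {name: overhead_pct(*pair) for name, pair in TIMES_MS.items()}
for name, pct in overheads.items():
    print(f"{name:<15} {pct:5.1f}%")
```

Since these figures come from a single quick run, the resulting percentages are best treated as rough indications rather than stable measurements.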
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.

