[PR] [SPARK-54631][PYTHON] Add profiler support for Arrow Grouped Iter Aggregate UDF [spark]

via GitHub Sat, 06 Dec 2025 23:04:46 -0800


Yicong-Huang opened a new pull request, #53374:
URL: https://github.com/apache/spark/pull/53374


   ### What changes were proposed in this pull request?
   
   - `spark.python.profile`: Add `SQL_GROUPED_AGG_ARROW_ITER_UDF` to the 
profiler warning list in `udf.py` so that when `spark.python.profile` is 
enabled, users will see appropriate warnings consistent with other 
iterator-based UDFs.
   - `spark.sql.pyspark.udf.profiler`: No changes needed. This UDF type already 
works correctly because it returns a scalar (not an iterator), so it takes the 
non-iterator profiler branch in `wrap_perf_profiler` and `wrap_memory_profiler`.
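
To illustrate the branch distinction described above, here is a minimal, self-contained sketch of a profiler wrapper that handles iterator-returning and scalar-returning functions differently. It is not Spark's actual `wrap_perf_profiler`; the names `wrap_profiler_sketch`, `returns_iterator`, and `collected` are hypothetical, and the standard-library `cProfile` stands in for whatever Spark uses internally.

```python
import cProfile
from typing import Any, Callable, List


def wrap_profiler_sketch(
    f: Callable[..., Any],
    returns_iterator: bool,
    collected: List[cProfile.Profile],
) -> Callable[..., Any]:
    """Hypothetical wrapper: profile `f`, choosing a branch by its return kind."""
    if returns_iterator:
        # Iterator branch: the work happens lazily, so each `next` call
        # is profiled separately as the output is consumed.
        def profiled_iter(*args: Any, **kwargs: Any) -> Any:
            it = iter(f(*args, **kwargs))
            while True:
                pr = cProfile.Profile()
                try:
                    item = pr.runcall(next, it)
                except StopIteration:
                    collected.append(pr)
                    return
                collected.append(pr)
                yield item

        return profiled_iter

    # Non-iterator branch: a single call does all the work, even when the
    # *input* is an iterator of batches, so one profile covers it.
    def profiled_call(*args: Any, **kwargs: Any) -> Any:
        pr = cProfile.Profile()
        try:
            return pr.runcall(f, *args, **kwargs)
        finally:
            collected.append(pr)

    return profiled_call


def mean_of_batches(batches):
    """Toy grouped aggregate: consumes an iterator of batches, returns a scalar."""
    total, count = 0.0, 0
    for batch in batches:
        total += sum(batch)
        count += len(batch)
    return total / count


def double_batches(batches):
    """Toy iterator-returning function, for contrast."""
    for batch in batches:
        yield [2 * x for x in batch]


collected_scalar: List[cProfile.Profile] = []
profiled_mean = wrap_profiler_sketch(mean_of_batches, False, collected_scalar)
scalar_result = profiled_mean(iter([[1.0, 2.0], [3.0]]))

collected_iter: List[cProfile.Profile] = []
profiled_double = wrap_profiler_sketch(double_batches, True, collected_iter)
iter_result = list(profiled_double(iter([[1], [2]])))
```

In this sketch the aggregate consumes its whole input inside one call, which is why a scalar-returning function fits the simpler branch even though its argument is an iterator.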
   
   ### Why are the changes needed?
   
   To make profiler support for `SQL_GROUPED_AGG_ARROW_ITER_UDF` consistent 
with other UDFs.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. When users enable `spark.python.profile` with 
`SQL_GROUPED_AGG_ARROW_ITER_UDF`, they will now see a warning message 
consistent with other iterator-based UDFs.
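
As a rough illustration of the warning-list check described above, the sketch below warns when profiling is enabled for a UDF type on the list. Everything here is an assumption for illustration: the enum `EvalTypeSketch`, its placeholder values, the set `PROFILER_WARNING_LIST`, the function `maybe_warn`, and the warning text are hypothetical and do not reproduce the code in `udf.py`.

```python
import warnings
from enum import Enum, auto


class EvalTypeSketch(Enum):
    """Hypothetical stand-in for Spark's eval types; values are placeholders."""

    SQL_BATCHED_UDF = auto()
    SQL_SCALAR_PANDAS_ITER_UDF = auto()
    SQL_GROUPED_AGG_ARROW_ITER_UDF = auto()


# Hypothetical warning list: UDF types that trigger the warning.
PROFILER_WARNING_LIST = frozenset(
    {
        EvalTypeSketch.SQL_SCALAR_PANDAS_ITER_UDF,
        EvalTypeSketch.SQL_GROUPED_AGG_ARROW_ITER_UDF,
    }
)


def maybe_warn(eval_type: EvalTypeSketch, profile_enabled: bool) -> bool:
    """Emit a UserWarning if profiling is on and the type is on the list."""
    if profile_enabled and eval_type in PROFILER_WARNING_LIST:
        warnings.warn(
            f"Profiling is not supported for UDF type {eval_type.name}.",
            UserWarning,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_iter = maybe_warn(EvalTypeSketch.SQL_GROUPED_AGG_ARROW_ITER_UDF, True)
    warned_batched = maybe_warn(EvalTypeSketch.SQL_BATCHED_UDF, True)
    warned_disabled = maybe_warn(EvalTypeSketch.SQL_GROUPED_AGG_ARROW_ITER_UDF, False)
    warning_count = len(caught)
```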
   
   ### How was this patch tested?
   
   Added a test case `test_perf_profiler_arrow_udf_grouped_agg_iter` to verify 
that `spark.sql.pyspark.udf.profiler` works correctly with this UDF type. Also 
verified that the `spark.python.profile` profiler warning is triggered 
correctly in `test_unsupported`.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


