Yicong-Huang opened a new pull request, #55704: URL: https://github.com/apache/spark/pull/55704
### What changes were proposed in this pull request?

Backport of #55691 to `branch-4.x`.

This PR adds ASV microbenchmarks for `SQL_ARROW_UDTF` (PyArrow-native Python UDTFs created via `arrow_udtf`) to `python/benchmarks/bench_eval_type.py`.

The new `_ArrowUDTFBenchMixin` produces `ArrowUDTFTimeBench` and `ArrowUDTFPeakmemBench`, parametrized by scenario (batch size, column count, type pool) and three handler variants:

- `identity_udtf` - yields the input batch as a `pa.Table`
- `filter_udtf` - keeps rows whose first column is non-null (vectorized)
- `count_udtf` - aggregates each batch into a single-row count table

To support this, two helpers are added to `MockProtocolWriter`:

- `write_worker_input` gains an optional `eval_conf` parameter (UDTF needs `table_arg_offsets` in EvalConf)
- `write_arrow_udtf_payload` mirrors `PythonUDTFRunner.writeUDTF` on the JVM side: argument offsets, partition-child indexes, optional pickled `AnalyzeResult`, the cloudpickled UDTF class, the result schema, and the UDTF name

The wire batch carries one struct column `_0` whose fields are the table's schema; `table_arg_offsets=[0]` tells `ArrowStreamArrowUDTFSerializer` to flatten that struct into a `pa.RecordBatch` for the UDTF's `eval(batch)` method.

### Why are the changes needed?

This is part of [SPARK-55724](https://issues.apache.org/jira/browse/SPARK-55724) (Micro-benchmark PySpark Eval Types). Establishing a stable baseline for `SQL_ARROW_UDTF` is a prerequisite for the upcoming serializer refactor so we can detect any regression objectively.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Clean cherry-pick from master (commit 257c99dc750). New ASV microbenchmarks; same as the master PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
