Yicong-Huang opened a new pull request, #55673: URL: https://github.com/apache/spark/pull/55673
### What changes were proposed in this pull request?

Add an ASV micro-benchmark for `SQL_ARROW_TABLE_UDF` (Python UDTF with `useArrow=True`) eval type to `bench_eval_type.py`.

The new benchmark drives the worker through the UDTF wire protocol (separate from `SQL_*_UDF`: no `num_udfs`/`num_chained`/`result_id`; instead `num_partition_child_indexes`, optional pickled `AnalyzeResult`, the handler class, return-type JSON, and udtf name). It also threads `input_type` through `EvalConf` so the non-legacy Arrow code path is exercised.

Supporting changes in `MockProtocolWriter`:
- `write_worker_input` accepts an optional `eval_conf` dict alongside `runner_conf`.
- New `write_udtf_payload` for the UDTF-specific command frame.

UDTFs covered: `identity_udtf` (1->1), `explode_udtf` (1->3), `filter_udtf` (1->0/1), `stringify_udtf` (1->1, type change). Scenarios mirror the row-by-row sizing of `SQL_ARROW_BATCHED_UDF`.

### Why are the changes needed?

Part of [SPARK-55724](https://issues.apache.org/jira/browse/SPARK-55724). Establishes a performance baseline before refactoring `SQL_ARROW_TABLE_UDF`.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

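
For context when reading the result columns, here is a rough plain-Python sketch of the four UDTF shapes listed above. It is not the code in this PR: the class names, the filter predicate, and the `run` driver are hypothetical, and only the cardinalities (1->1, 1->3, 1->0/1, 1->1 with a type change) come from the description. It assumes the usual Python UDTF handler shape, a class whose `eval` method yields output tuples.

```python
# Hypothetical handler classes illustrating the four output cardinalities.
# Names and bodies are assumptions, not the benchmark's actual UDTFs.


class IdentityUDTF:
    """1 -> 1: emit the input row unchanged."""

    def eval(self, *args):
        yield args


class ExplodeUDTF:
    """1 -> 3: emit each input row three times."""

    def eval(self, *args):
        for _ in range(3):
            yield args


class FilterUDTF:
    """1 -> 0/1: keep a row only if its first column is truthy (assumed predicate)."""

    def eval(self, *args):
        if args and args[0]:
            yield args


class StringifyUDTF:
    """1 -> 1 with a type change: cast every column to str."""

    def eval(self, *args):
        yield tuple(str(a) for a in args)


def run(handler_cls, rows):
    """Simplified row-by-row driver: call eval once per input row, collect outputs."""
    handler = handler_cls()
    out = []
    for row in rows:
        out.extend(handler.eval(*row))
    return out


rows = [(1, "a"), (0, "b")]
print(run(IdentityUDTF, rows))      # [(1, 'a'), (0, 'b')]
print(len(run(ExplodeUDTF, rows)))  # 6
print(run(FilterUDTF, rows))        # [(1, 'a')]
print(run(StringifyUDTF, rows))     # [('1', 'a'), ('0', 'b')]
```

The driver here skips Arrow serialization and the worker protocol entirely; it only shows how many rows each shape produces per input row.
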
`COLUMNS=120 ./python/asv run --python=same --bench "ArrowTableUDF" -a repeat=3` (one of two stable runs):

`ArrowTableUDFTimeBench`:

```text
================== ============== ============== ============== ===============
--                 udtf
------------------ ------------------------------------------------------------
scenario           identity_udtf  explode_udtf   filter_udtf    stringify_udtf
================== ============== ============== ============== ===============
sm_batch_few_col   1.66+/-0.4s    1.62+/-0.06s   1.29+/-0.07s   1.63+/-0.08s
sm_batch_many_col  712+/-200ms    719+/-200ms    537+/-100ms    522+/-9ms
lg_batch_few_col   3.91+/-0.01s   3.94+/-0.02s   3.84+/-0.8s    4.40+/-0.3s
lg_batch_many_col  2.09+/-0.02s   2.12+/-0.02s   1.72+/-0.01s   2.06+/-0.01s
pure_ints          3.98+/-0.02s   4.06+/-0.06s   3.14+/-0.03s   3.97+/-0.01s
pure_strings       4.10+/-0.02s   4.17+/-0s      3.27+/-0s      4.07+/-0.01s
================== ============== ============== ============== ===============
```

`ArrowTableUDFPeakmemBench`:

```text
================== ============== ============== ============= ===============
--                 udtf
------------------ -----------------------------------------------------------
scenario           identity_udtf  explode_udtf   filter_udtf   stringify_udtf
================== ============== ============== ============= ===============
sm_batch_few_col   470M           470M           469M          470M
sm_batch_many_col  470M           470M           470M          471M
lg_batch_few_col   479M           480M           478M          480M
lg_batch_many_col  511M           511M           511M          512M
pure_ints          479M           479M           478M          480M
pure_strings       484M           484M           484M          484M
================== ============== ============== ============= ===============
```

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
