Yicong-Huang opened a new pull request, #55673:
URL: https://github.com/apache/spark/pull/55673

   ### What changes were proposed in this pull request?
   
   Add an ASV micro-benchmark for the `SQL_ARROW_TABLE_UDF` eval type (Python 
UDTF with `useArrow=True`) to `bench_eval_type.py`.
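
For readers unfamiliar with ASV, here is a minimal sketch of how a parameterized benchmark class produces a scenario-by-UDTF grid like the one reported in the test section. The class name `ArrowTableUDFBenchSketch`, the method bodies, and the row data are hypothetical placeholders, not the code in this PR; only the `params`/`param_names`/`setup`/`time_*` conventions come from ASV itself.

```python
class ArrowTableUDFBenchSketch:
    # ASV runs every combination of these two parameter lists.
    params = (
        [
            "sm_batch_few_col",
            "sm_batch_many_col",
            "lg_batch_few_col",
            "lg_batch_many_col",
            "pure_ints",
            "pure_strings",
        ],
        ["identity_udtf", "explode_udtf", "filter_udtf", "stringify_udtf"],
    )
    param_names = ["scenario", "udtf"]

    def setup(self, scenario, udtf):
        # Placeholder input (assumption): the real benchmark would prepare
        # serialized worker input for the given scenario and UDTF here.
        self.rows = [(i,) for i in range(1000)]

    def time_worker(self, scenario, udtf):
        # Placeholder workload (assumption): stands in for driving the worker.
        self.out = [row for row in self.rows]
```

A `peakmem_*` method with the same signature would give the memory variant of the same grid.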
   
   The new benchmark drives the worker through the UDTF wire protocol, which 
differs from the `SQL_*_UDF` protocol: it has no `num_udfs`, `num_chained`, or 
`result_id` fields and instead sends `num_partition_child_indexes`, an optional 
pickled `AnalyzeResult`, the handler class, the return-type JSON, and the UDTF 
name. The benchmark also threads `input_type` through `EvalConf` so that the 
non-legacy Arrow code path is exercised.
   
   Supporting changes in `MockProtocolWriter`:
   - `write_worker_input` accepts an optional `eval_conf` dict alongside 
`runner_conf`.
   - New `write_udtf_payload` for the UDTF-specific command frame.
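
To make the shape of the UDTF command frame concrete, here is a rough sketch of a writer that emits the fields listed above. The function name `sketch_udtf_payload`, its signature, the field order, and the encoding (big-endian ints, length-prefixed bytes, a one-byte presence flag) are all assumptions for illustration; the actual `write_udtf_payload` in this PR and the real worker protocol may differ.

```python
import io
import pickle
import struct


def write_int(value, out):
    # 4-byte big-endian signed integer (assumed encoding).
    out.write(struct.pack("!i", value))


def write_bytes(data, out):
    # Length-prefixed byte string (assumed framing).
    write_int(len(data), out)
    out.write(data)


def sketch_udtf_payload(out, pickled_handler, return_type_json, udtf_name,
                        partition_child_indexes=(), analyze_result=None):
    """Hypothetical UDTF frame writer; field order is an assumption."""
    write_int(len(partition_child_indexes), out)
    for index in partition_child_indexes:
        write_int(index, out)
    # One-byte flag saying whether a pickled AnalyzeResult follows.
    out.write(struct.pack("!?", analyze_result is not None))
    if analyze_result is not None:
        write_bytes(pickle.dumps(analyze_result), out)
    write_bytes(pickled_handler, out)
    write_bytes(return_type_json.encode("utf-8"), out)
    write_bytes(udtf_name.encode("utf-8"), out)


buf = io.BytesIO()
sketch_udtf_payload(buf, b"<pickled handler>", '{"type":"struct","fields":[]}',
                    "identity_udtf")
```

Note that the frame carries none of the `num_udfs`/`num_chained`/`result_id` fields, which is why it needs its own writer rather than reusing the `SQL_*_UDF` one.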
   
   UDTFs covered (input rows -> output rows): `identity_udtf` (1->1), 
`explode_udtf` (1->3), `filter_udtf` (1->0/1), `stringify_udtf` (1->1, type 
change). Scenarios mirror the row-by-row sizing of the `SQL_ARROW_BATCHED_UDF` 
benchmark.
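
As a rough illustration of the four row-cardinality patterns, the handlers could look like the plain-Python classes below. The class names, the single-column signature, and the filter predicate (keep even values) are assumptions made for this sketch; the handlers in `bench_eval_type.py` may differ, and the sketch leaves out the Arrow conversion entirely.

```python
class IdentityUDTF:
    """1 -> 1: emit the input row unchanged."""

    def eval(self, x):
        yield (x,)


class ExplodeUDTF:
    """1 -> 3: emit three copies of the input row."""

    def eval(self, x):
        for _ in range(3):
            yield (x,)


class FilterUDTF:
    """1 -> 0/1: emit the row only if it passes a predicate.

    The predicate (even values only) is an assumption for illustration.
    """

    def eval(self, x):
        if x % 2 == 0:
            yield (x,)


class StringifyUDTF:
    """1 -> 1 with a type change: emit the input as a string."""

    def eval(self, x):
        yield (str(x),)


def run(handler_cls, inputs):
    # Apply a handler row by row and collect all emitted rows.
    handler = handler_cls()
    return [row for x in inputs for row in handler.eval(x)]
```

For example, `run(ExplodeUDTF, range(4))` returns 12 rows while `run(FilterUDTF, range(4))` returns 2, so the four handlers stress the output path at different ratios for the same input.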
   
   ### Why are the changes needed?
   
   Part of [SPARK-55724](https://issues.apache.org/jira/browse/SPARK-55724). 
Establishes a performance baseline before refactoring `SQL_ARROW_TABLE_UDF`.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   `COLUMNS=120 ./python/asv run --python=same --bench "ArrowTableUDF" -a repeat=3` (one of two stable runs):
   
   `ArrowTableUDFTimeBench`:
   ```text
   ================== ============== ============== ============== ===============
   --                                              udtf
   ------------------ -------------------------------------------------------------
        scenario      identity_udtf  explode_udtf   filter_udtf    stringify_udtf
   ================== ============== ============== ============== ===============
    sm_batch_few_col   1.66+/-0.4s    1.62+/-0.06s   1.29+/-0.07s   1.63+/-0.08s
    sm_batch_many_col  712+/-200ms    719+/-200ms    537+/-100ms    522+/-9ms
    lg_batch_few_col   3.91+/-0.01s   3.94+/-0.02s   3.84+/-0.8s    4.40+/-0.3s
    lg_batch_many_col  2.09+/-0.02s   2.12+/-0.02s   1.72+/-0.01s   2.06+/-0.01s
    pure_ints          3.98+/-0.02s   4.06+/-0.06s   3.14+/-0.03s   3.97+/-0.01s
    pure_strings       4.10+/-0.02s   4.17+/-0s      3.27+/-0s      4.07+/-0.01s
   ================== ============== ============== ============== ===============
   ```
   
   `ArrowTableUDFPeakmemBench`:
   ```text
   ================== ============== ============== ============= ===============
   --                                              udtf
   ------------------ ------------------------------------------------------------
        scenario      identity_udtf  explode_udtf   filter_udtf   stringify_udtf
   ================== ============== ============== ============= ===============
    sm_batch_few_col       470M           470M          469M           470M
    sm_batch_many_col      470M           470M          470M           471M
    lg_batch_few_col       479M           480M          478M           480M
    lg_batch_many_col      511M           511M          511M           512M
    pure_ints              479M           479M          478M           480M
    pure_strings           484M           484M          484M           484M
   ================== ============== ============== ============= ===============
   ```
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

