viirya opened a new pull request, #56470: URL: https://github.com/apache/spark/pull/56470
### What changes were proposed in this pull request? Follow-up to SPARK-57137/SPARK-57138. Apply the two conventions those PRs established to the remaining mixins in `python/benchmarks/bench_eval_type.py`: 1. Convert the seven remaining `@staticmethod` `_build_scenario` definitions to `@classmethod`, replacing the hardcoded `_XBenchMixin._scenario_configs[name]` lookups with `cls._scenario_configs[name]` (`_ArrowBatchedBenchMixin`, `_ArrowUDTFBenchMixin`, `_ArrowTableUDFBenchMixin`, `_GroupedMapArrowBenchMixin`, `_GroupedMapPandasBenchMixin`, `_MapArrowIterBenchMixin`, `_MapPandasIterBenchMixin`). 2. Add the `_eval_type` class attribute to the GroupedMap Arrow/Pandas bases and parameterize their `_write_scenario` on `self._eval_type`. This lets `_GroupedMapArrowIterBenchMixin` and `_GroupedMapPandasIterBenchMixin` drop their `_write_scenario` overrides, which were copies of the parent bodies differing only in the `PythonEvalType.SQL_...` constant. Net diff: +27 / -53 lines. ### Why are the changes needed? The hardcoded class-name lookup is a footgun: if a subclass overrides `_scenario_configs`, the inherited static `_build_scenario` silently reads the parent's configs instead. This is the exact failure mode the already-deduplicated pairs (Scalar, GroupedAgg, Window, Cogroup) avoid by resolving configs via `cls`. The two GroupedMap Iter subclasses also carried near-identical copies of `_write_scenario`, so any protocol-writing change had to be applied in lock-step across parent and child. After this change every `_build_scenario` in the file resolves configs via `cls` and no redundant `_write_scenario` copies remain. ### Does this PR introduce _any_ user-facing change? No. Test-only change in the benchmark module. ### How was this patch tested? - Verified the generated worker input is byte-identical to the pre-refactor output for all 20 `*TimeBench` classes across every (scenario, udf) cell, with the cloudpickled UDF command excluded (it embeds `co_firstlineno` and a random cloudpickle class-tracker id, both location/identity metadata unaffected by this change). - Compared the pickle opcode streams of all 60 pickled UDF/UDTF callables between old and new: the only differences are the metadata above; code, constants, and names are unchanged. - Ran `setup` + `time_worker` end-to-end for all 9 affected `*TimeBench` classes across every UDF, and `peakmem_worker` for the two Iter classes that dropped `_write_scenario`. ### Was this patch authored or co-authored using generative AI tooling? Yes. Generated-by: Claude Code (claude-opus-4-8) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
