iemejia opened a new pull request, #56479: URL: https://github.com/apache/spark/pull/56479

### What changes were proposed in this pull request?

Two improvements to the benchmark workflow:

1. **Skip TPC-DS data generation for non-TPCDS benchmarks.** Change `contains(inputs.class, '*')` to `inputs.class == '*'` so wildcard patterns like `*VectorizedDeltaReaderBenchmark` no longer trigger the expensive TPC-DS generation job (~5-10 min saved per run).
2. **Add early CPU model check step** that runs immediately after checkout, before compilation. It prints the CPU model as a `::notice::` annotation for live visibility in the Actions UI, and optionally fails fast if the runner CPU does not match the `expected-cpu` input parameter.

### Why are the changes needed?

The benchmark workflow currently generates TPC-DS data (~5-10 min) for every benchmark run, even when the benchmark class does not use TPC-DS data. This is because `contains(inputs.class, '*')` is a substring test, so it matches any class with a wildcard (e.g., `*VectorizedDeltaReaderBenchmark`), not just the literal `*` (all benchmarks).

Additionally, when benchmark results need to match a specific CPU (e.g., AMD EPYC 7763 for consistent comparisons against upstream baselines), there is no way to detect a CPU mismatch until the full benchmark completes (~20-30 min). The early CPU check allows the job to fail within seconds of starting if the runner does not match, saving significant time and compute.

### Does this PR introduce _any_ user-facing change?

No. This only affects the GHA benchmark workflow. Existing behavior is preserved when `expected-cpu` is not set (the default).

### How was this patch tested?

The workflow changes are self-contained in `.github/workflows/benchmark.yml`. Tested by inspection. The `expected-cpu` parameter is optional and defaults to empty (no check), preserving backward compatibility.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: OpenCode (Claude claude-opus-4.6)
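
For reviewers, the two proposed workflow changes might look roughly like the fragment below. This is a minimal sketch, not the PR's actual diff: it assumes a `workflow_dispatch` trigger with `class` and `expected-cpu` inputs, and the job names (`tpcds-1g-gen`, `benchmark`), the `TPCDSQueryBenchmark` clause in the condition, and the `/proc/cpuinfo`-based check script are illustrative assumptions.

```yaml
# Hypothetical excerpt of .github/workflows/benchmark.yml (illustrative only).
on:
  workflow_dispatch:
    inputs:
      class:
        description: 'Benchmark class (glob); * runs all benchmarks'
        required: true
        default: '*'
      expected-cpu:
        description: 'Optional substring the runner CPU model must contain (empty = no check)'
        required: false
        default: ''

jobs:
  tpcds-1g-gen:
    # Before: contains(inputs.class, '*') is a substring test, so it was also
    # true for patterns such as '*VectorizedDeltaReaderBenchmark'.
    # After: only the literal '*' (run everything) triggers generation.
    # Assumption: the TPC-DS benchmark class itself still triggers it by name.
    if: contains(inputs.class, 'TPCDSQueryBenchmark') || inputs.class == '*'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... TPC-DS data generation steps ...

  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Runs before any compilation so a mismatch fails within seconds.
      - name: Check CPU model
        env:
          EXPECTED_CPU: ${{ inputs.expected-cpu }}
        run: |
          # Read the first "model name" line, e.g. "AMD EPYC 7763 64-Core Processor".
          cpu_model=$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2 | sed 's/^ *//')
          # Surface the CPU as an annotation in the Actions UI.
          echo "::notice::Runner CPU: ${cpu_model}"
          # Fail fast only when an expected CPU was given and does not match.
          if [ -n "$EXPECTED_CPU" ] && [[ "$cpu_model" != *"$EXPECTED_CPU"* ]]; then
            echo "::error::CPU mismatch: expected '${EXPECTED_CPU}', got '${cpu_model}'"
            exit 1
          fi
      # ... build and run benchmarks ...
```

In this sketch `expected-cpu` is matched as a substring, so a value like `AMD EPYC 7763` matches the full model string the kernel reports; with the default empty value the step only prints the notice, which keeps the existing behavior.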
