Re: [PR] [SPARK-57420][INFRA] Only generate TPC-DS data when required and check CPU compatibility early in benchmark workflow [spark]

via GitHub Fri, 12 Jun 2026 11:59:02 -0700


iemejia commented on PR #56479:
URL: https://github.com/apache/spark/pull/56479#issuecomment-4694353019


   @LuciferYang Would you mind taking a look at this one when you get a chance? 
While working on the Parquet encoding benchmarks, I noticed that the workflow 
was spending ~5-10 min generating TPC-DS data on every run, even when the 
benchmark does not use it. The cause is that `contains(inputs.class, '*')` is a 
substring check, so it matches any class pattern that contains a `*` wildcard, 
not just the literal `*`.
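To illustrate the matching problem, here is a small Python model of the condition. It relies on GitHub Actions' documented behavior of `contains()` for string arguments (a case-insensitive substring test). The function names and the stricter `new_needs_tpcds` rule (generate only for the literal `*` or a pattern mentioning TPC-DS) are hypothetical and are not necessarily what this PR implements.

```python
# Hypothetical model of the workflow's TPC-DS condition; names are illustrative only.


def gha_contains(search: str, item: str) -> bool:
    """Model of GitHub Actions contains() for string arguments:
    a case-insensitive substring test."""
    return item.lower() in search.lower()


def old_needs_tpcds(class_pattern: str) -> bool:
    # Mirrors contains(inputs.class, '*'): true for ANY pattern containing '*'.
    return gha_contains(class_pattern, "*")


def new_needs_tpcds(class_pattern: str) -> bool:
    # Assumed stricter rule: the literal '*' (run every benchmark) or a pattern
    # that mentions TPC-DS. Not necessarily the rule used in the PR.
    return class_pattern == "*" or gha_contains(class_pattern, "tpcds")


if __name__ == "__main__":
    for pattern in ["*", "*Parquet*", "*TPCDS*", "DateTimeBenchmark"]:
        print(f"{pattern!r}: old={old_needs_tpcds(pattern)} new={new_needs_tpcds(pattern)}")
```

Under the old condition a pattern such as `*Parquet*` triggers data generation; under the sketched rule it does not, while `*` still does.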
   
   I also kept having to wait for full 20-30 min runs to complete, only to 
discover that the runner had landed on the wrong CPU. To catch this early, I 
added an optional `expected-cpu` input parameter: the workflow detects the 
runner CPU immediately after checkout and fails the job within seconds if it 
does not match, so you do not waste the entire compilation + benchmark time 
before finding out.
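A minimal Python sketch of such an early CPU check follows. It assumes a Linux x86 runner whose `/proc/cpuinfo` exposes a `model name` field, and it treats `expected-cpu` as a case-insensitive substring of that name, with an empty value meaning "no constraint". These matching rules and all function names are assumptions for illustration, not the workflow step from this PR.

```python
import sys
from pathlib import Path
from typing import Optional


def detect_cpu_model(cpuinfo_text: str) -> Optional[str]:
    """Return the first 'model name' value from /proc/cpuinfo-style text.
    Assumption: x86 Linux format; other platforms may lack this field (returns None)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def cpu_matches(detected: Optional[str], expected: str) -> bool:
    """Assumed semantics: empty expected value disables the check;
    otherwise require a case-insensitive substring match."""
    if not expected:
        return True
    return detected is not None and expected.lower() in detected.lower()


def main(expected: str) -> int:
    """Hypothetical entry point: returns a non-zero exit code on mismatch."""
    cpuinfo = Path("/proc/cpuinfo")
    detected = detect_cpu_model(cpuinfo.read_text()) if cpuinfo.exists() else None
    print(f"Detected CPU: {detected}")
    if not cpu_matches(detected, expected):
        print(f"Expected a CPU matching {expected!r}; failing early.", file=sys.stderr)
        return 1
    return 0
```

In a workflow, a step placed right after checkout could pass the input value to `main()` and exit with its return code (for example via `sys.exit(main(value))`), so a mismatch stops the job before compilation starts.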
   
   These two small fixes should save a lot of time for anyone using the 
benchmark workflow with specific class patterns and CPU-sensitive comparisons. 
Happy to adjust anything if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
