iemejia commented on code in PR #56479:
URL: https://github.com/apache/spark/pull/56479#discussion_r3430871736
##########
.github/workflows/benchmark.yml:
##########
@@ -73,7 +78,10 @@ jobs:
# Any TPC-DS related updates on this job need to be applied to tpcds-1g job of build_and_test.yml as well
tpcds-1g-gen:
name: "Generate an TPC-DS dataset with SF=1"
- if: contains(inputs.class, 'TPCDSQueryBenchmark') || contains(inputs.class, 'LZ4TPCDSDataBenchmark') || contains(inputs.class, 'ZStandardTPCDSDataBenchmark') || contains(inputs.class, '*')
+ # Only generate TPC-DS data when running TPC-DS benchmarks or all benchmarks (class == '*').
+ # Use exact equality instead of contains(inputs.class, '*') to avoid matching wildcard
+ # patterns like '*VectorizedDeltaReaderBenchmark' that don't need TPC-DS data.
+ if: contains(inputs.class, 'TPCDSQueryBenchmark') || contains(inputs.class, 'LZ4TPCDSDataBenchmark') || contains(inputs.class, 'ZStandardTPCDSDataBenchmark') || inputs.class == '*'
Review Comment:
Good catch! You're right: a glob like `*TPCDS*` would have been caught by
the old `contains(inputs.class, '*')` but not by the per-class checks.
I simplified the condition to `contains(inputs.class, 'TPCDS') || inputs.class
== '*'`, which catches all three TPC-DS benchmark class names (they all contain
"TPCDS") as well as any glob that mentions TPCDS.
Truly generic package-level globs like
`org.apache.spark.sql.execution.benchmark.*` that don't mention TPCDS are
effectively equivalent to `*`, so users should use the default instead.
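
To make the difference between the three conditions concrete, here is a rough Python model of them. It is only a sketch, not the workflow itself: the function names (`old_condition`, `diff_condition`, `simplified_condition`, `evaluate`) are hypothetical, and the `contains` helper assumes the GitHub Actions behavior of case-insensitive string matching.

```python
# Hypothetical model of the workflow `if:` expressions discussed above.
# Assumption: GitHub Actions `contains()` and `==` ignore case for strings.

TPCDS_CLASSES = (
    "TPCDSQueryBenchmark",
    "LZ4TPCDSDataBenchmark",
    "ZStandardTPCDSDataBenchmark",
)


def contains(haystack: str, needle: str) -> bool:
    """Case-insensitive substring check, modeling contains() on strings."""
    return needle.lower() in haystack.lower()


def old_condition(cls: str) -> bool:
    """Per-class checks plus contains(inputs.class, '*')."""
    return any(contains(cls, name) for name in TPCDS_CLASSES) or contains(cls, "*")


def diff_condition(cls: str) -> bool:
    """Per-class checks plus exact equality with '*' (the hunk above)."""
    return any(contains(cls, name) for name in TPCDS_CLASSES) or cls == "*"


def simplified_condition(cls: str) -> bool:
    """contains(inputs.class, 'TPCDS') || inputs.class == '*'."""
    return contains(cls, "TPCDS") or cls == "*"


def evaluate(cls: str) -> tuple[bool, bool, bool]:
    """Return (old, diff, simplified) results for one class pattern."""
    return old_condition(cls), diff_condition(cls), simplified_condition(cls)


if __name__ == "__main__":
    samples = [
        "*",
        "*TPCDS*",
        "TPCDSQueryBenchmark",
        "*VectorizedDeltaReaderBenchmark",
        "org.apache.spark.sql.execution.benchmark.*",
    ]
    for cls in samples:
        print(cls, evaluate(cls))
```

In this model, `*TPCDS*` is the only sample where the condition from the hunk and the simplified condition disagree, which is the gap pointed out in the review.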
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]