[PR] [SPARK-56948][SQL][TESTS] Make TPCDSQueryBenchmark heap/broadcast configurable [spark]

via GitHub Tue, 19 May 2026 09:36:57 -0700


yaooqinn opened a new pull request, #55988:
URL: https://github.com/apache/spark/pull/55988


   ### What changes were proposed in this pull request?
   
   Switch the hardcoded `.set(...)` calls to `.setIfMissing(...)` for three
   SparkConf keys in `TPCDSQueryBenchmark`:
   
   - `spark.driver.memory`
   - `spark.executor.memory`
   - `spark.sql.autoBroadcastJoinThreshold`
   
   Also switch `spark.sql.shuffle.partitions` to `setIfMissing` for
   consistency. This is functionally equivalent to the existing
   `System.getProperty` form.
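   The pattern above can be sketched as follows, assuming Spark's
   `org.apache.spark.SparkConf` is on the classpath. `BenchmarkConfSketch`
   and its methods are hypothetical names and this is not the literal diff;
   it only illustrates why `setIfMissing` and the earlier
   `System.getProperty` form yield the same shuffle-partitions value.

   ```scala
   import org.apache.spark.SparkConf

   // Hypothetical sketch of the setIfMissing pattern; not the literal diff.
   object BenchmarkConfSketch {
     // new SparkConf() (loadDefaults = true) first copies every -Dspark.* JVM
     // property; each setIfMissing then supplies a default only for absent keys.
     def benchmarkConf(): SparkConf =
       new SparkConf()
         .setIfMissing("spark.driver.memory", "3g")
         .setIfMissing("spark.executor.memory", "3g")
         .setIfMissing("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
         .setIfMissing("spark.sql.shuffle.partitions", "4")

     // The older shuffle-partitions form, kept here only for comparison.
     def oldShufflePartitions(): String =
       new SparkConf()
         .set("spark.sql.shuffle.partitions",
           System.getProperty("spark.sql.shuffle.partitions", "4"))
         .get("spark.sql.shuffle.partitions")

     def main(args: Array[String]): Unit = {
       val key = "spark.sql.shuffle.partitions"
       System.clearProperty(key) // no -D flag: both forms fall back to "4"
       println(s"${benchmarkConf().get(key)} / ${oldShufflePartitions()}")
       System.setProperty(key, "512") // like -Dspark.sql.shuffle.partitions=512
       println(s"${benchmarkConf().get(key)} / ${oldShufflePartitions()}")
     }
   }
   ```

   Running `main` prints `4 / 4` and then `512 / 512`: with or without the
   property, the two forms agree.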
   
   ### Why are the changes needed?
   
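   `new SparkConf()` loads `spark.*` JVM system properties first, so the
   subsequent `.set(...)` calls overwrite any `-Dspark.*` value the user
   passes. As a result, users can't tune the heap size or broadcast
   threshold without editing the source. At scale factors SF10 and SF100,
   the hardcoded 3g heap runs out of memory. `spark.sql.shuffle.partitions`
   already supported an override in the same file; this extends the same
   pattern to the remaining three keys.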
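   To make the precedence concrete, here is a minimal sketch, again assuming
   `SparkConf` is on the classpath (`SetVersusSetIfMissing` is a hypothetical
   name). It simulates `-Dspark.driver.memory=72g` with `System.setProperty`
   and compares the two calls.

   ```scala
   import org.apache.spark.SparkConf

   object SetVersusSetIfMissing {
     // Simulates launching the JVM with -Dspark.driver.memory=72g.
     def simulateUserFlag(): Unit = System.setProperty("spark.driver.memory", "72g")

     // Old behavior: .set unconditionally replaces the value loaded from -D.
     def withSet(): String =
       new SparkConf().set("spark.driver.memory", "3g").get("spark.driver.memory")

     // New behavior: .setIfMissing keeps the value loaded from -D.
     def withSetIfMissing(): String =
       new SparkConf().setIfMissing("spark.driver.memory", "3g").get("spark.driver.memory")

     def main(args: Array[String]): Unit = {
       simulateUserFlag()
       println(s".set          -> ${withSet()}")
       println(s".setIfMissing -> ${withSetIfMissing()}")
     }
   }
   ```

   Running `main` prints `3g` for `.set` and `72g` for `.setIfMissing`.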
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. Defaults unchanged.
   
   ### How was this patch tested?
   
   Verified locally that `-Dspark.*` overrides (e.g.
   `-Dspark.driver.memory=72g`) flow through to the SparkConf when launched
   via:
   
   ```
   build/sbt -Dspark.driver.memory=72g \
             -Dspark.executor.memory=72g \
             -Dspark.sql.autoBroadcastJoinThreshold=10485760 \
             -Dspark.sql.shuffle.partitions=512 \
             "sql/Test/runMain ...TPCDSQueryBenchmark --data-location ..."
   ```
   
   Without these flags, the defaults remain unchanged: `3g` driver memory,
   `3g` executor memory, `20MB` broadcast threshold, and `4` shuffle
   partitions.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.7
   

