yaooqinn opened a new pull request, #55988:
URL: https://github.com/apache/spark/pull/55988
### What changes were proposed in this pull request?
Switch hardcoded `.set(...)` to `.setIfMissing(...)` for three SparkConf
keys in `TPCDSQueryBenchmark`:
- `spark.driver.memory`
- `spark.executor.memory`
- `spark.sql.autoBroadcastJoinThreshold`
Also unify `spark.sql.shuffle.partitions` to use `setIfMissing` for
consistency (functionally equivalent to the existing
`System.getProperty` form).
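As a rough illustration of the resulting pattern (not the actual diff), the conf construction might look like the sketch below. The key names and default values are taken from this description; the object name `BenchmarkConfSketch`, the method name `benchmarkConf`, and writing the 20MB default as a byte count are assumptions made for the example.

```scala
import org.apache.spark.SparkConf

// Sketch only: object and method names are hypothetical, and the real
// TPCDSQueryBenchmark sets further options that are omitted here.
object BenchmarkConfSketch {
  def benchmarkConf(): SparkConf = {
    // new SparkConf() loads every JVM system property whose name starts
    // with "spark.", so -Dspark.* flags are already present at this point.
    new SparkConf()
      // setIfMissing only writes a value when the key is not already set,
      // so these act as defaults rather than overrides.
      .setIfMissing("spark.driver.memory", "3g")
      .setIfMissing("spark.executor.memory", "3g")
      // 20MB expressed in bytes (assumed representation of the default).
      .setIfMissing("spark.sql.autoBroadcastJoinThreshold", (20 * 1024 * 1024).toString)
      .setIfMissing("spark.sql.shuffle.partitions", "4")
  }
}
```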
### Why are the changes needed?
`.set(...)` overrides any `-Dspark.*` JVM property, so users can't
tune the heap size or broadcast threshold without editing the source.
At SF10 / SF100 the hardcoded 3g heap runs out of memory (OOM).
`spark.sql.shuffle.partitions` already supported an override in the
same file; this PR extends the same pattern to the remaining three keys.
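The precedence difference can be reproduced in isolation. The following is a minimal sketch, assuming Spark is on the classpath; the object `SetVsSetIfMissing` and its helpers are hypothetical names, and `System.setProperty` stands in for passing `-Dspark.driver.memory=72g` on the command line.

```scala
import org.apache.spark.SparkConf

// Hypothetical demo object, not part of the patch.
object SetVsSetIfMissing {
  private val Key = "spark.driver.memory"

  /** Runs `body` with a system property set, then restores the previous state. */
  def withSystemProperty[T](key: String, value: String)(body: => T): T = {
    val previous = Option(System.getProperty(key))
    System.setProperty(key, value)
    try {
      body
    } finally {
      previous match {
        case Some(v) => System.setProperty(key, v)
        case None => System.clearProperty(key)
      }
    }
  }

  // Old behaviour: set() replaces whatever was loaded from system properties.
  def hardcoded(): String = new SparkConf().set(Key, "3g").get(Key)

  // New behaviour: setIfMissing() keeps a value that is already present.
  def overridable(): String = new SparkConf().setIfMissing(Key, "3g").get(Key)

  def main(args: Array[String]): Unit = {
    withSystemProperty(Key, "72g") {
      println(hardcoded())   // prints 3g
      println(overridable()) // prints 72g
    }
  }
}
```

This relies on `new SparkConf()` loading `spark.*` system properties at construction time, which is why the value supplied via `-D` is already in the conf when `setIfMissing` runs.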
### Does this PR introduce _any_ user-facing change?
No. Defaults unchanged.
### How was this patch tested?
Verified locally that `-Dspark.driver.memory=72g` (etc.) flow through
to the SparkConf when launched via:
```
build/sbt -Dspark.driver.memory=72g \
-Dspark.executor.memory=72g \
-Dspark.sql.autoBroadcastJoinThreshold=10485760 \
-Dspark.sql.shuffle.partitions=512 \
"sql/Test/runMain ...TPCDSQueryBenchmark --data-location ..."
```
Without these flags, the defaults remain `3g / 3g / 20MB / 4` (driver
memory, executor memory, broadcast join threshold, shuffle partitions).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]