c21 commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r455969279
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2659,12 +2660,24 @@ object SQLConf {
     buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.maxBucketRatio")
       .doc("The ratio of the number of two buckets being coalesced should be less than or " +
         "equal to this value for bucket coalescing to be applied. This configuration only " +
-        s"has an effect when '${COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key}' is set to true.")
+        s"has an effect when '${COALESCE_BUCKETS_IN_JOIN_ENABLED.key}' is set to true.")
       .version("3.1.0")
       .intConf
       .checkValue(_ > 0, "The difference must be positive.")
       .createWithDefault(4)
+  val COALESCE_BUCKETS_IN_SHUFFLED_HASH_JOIN_MAX_BUCKET_RATIO =
+    buildConf("spark.sql.bucketing.coalesceBucketsInShuffledHashJoin.maxBucketRatio")
+      .doc("The ratio of the number of two buckets being coalesced should be less than or " +
+        "equal to this value for bucket coalescing to be applied. This configuration only " +
+        s"has an effect when '${COALESCE_BUCKETS_IN_JOIN_ENABLED.key}' is set to true. " +
+        "Note as coalescing reduces parallelism, there might be a higher risk for " +
+        "out of memory error at shuffled hash join build side.")
+      .version("3.1.0")
+      .intConf
+      .checkValue(_ > 0, "The difference must be positive.")
+      .createWithDefault(2)
Review comment:
@cloud-fan - I don't think a separate config is really necessary in most
cases, but I am fine either way. The only cases where I can see it being
useful are:
(1) If we want to enable coalescing of bucketed tables by default in the
future, we would probably want separate configs so that we can be more
cautious about shuffled hash join, since coalescing can cause more OOMs on
its build side.
(2) A user has one complicated query that involves both shuffled hash join
and sort merge join on bucketed tables, and wants to tune coalescing for
each join separately.
@maropu - what do you think? Should we keep them separate or not?
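
To make case (2) concrete, here is a minimal, self-contained sketch of how two separate ratio limits could behave for the same pair of bucketed tables. It is not the code in this PR: `JoinKind`, `CoalesceRatios`, `BucketCoalescing` and `canCoalesce` are hypothetical names used only for illustration, and the rule that the larger bucket count must be a multiple of the smaller one is an assumption on my side. Only the "ratio less than or equal to the configured value" check and the defaults (4 and 2) come from the diff above.

```scala
// Hypothetical model for illustration only; this is not Spark's implementation.
sealed trait JoinKind
case object SortMergeJoin extends JoinKind
case object ShuffledHashJoin extends JoinKind

// Per-join upper bounds on the bucket ratio. Defaults mirror the diff above (4 and 2).
final case class CoalesceRatios(sortMergeJoin: Int = 4, shuffledHashJoin: Int = 2) {
  require(sortMergeJoin > 0 && shuffledHashJoin > 0, "The ratio must be positive.")

  def maxRatioFor(join: JoinKind): Int = join match {
    case SortMergeJoin    => sortMergeJoin
    case ShuffledHashJoin => shuffledHashJoin
  }
}

object BucketCoalescing {
  // Assumption: coalescing is only considered when the bucket counts differ and
  // the larger count is a multiple of the smaller one. The ratio check itself
  // follows the config doc: ratio <= configured maximum for that join type.
  def canCoalesce(
      join: JoinKind,
      leftBuckets: Int,
      rightBuckets: Int,
      ratios: CoalesceRatios): Boolean = {
    val small = math.min(leftBuckets, rightBuckets)
    val large = math.max(leftBuckets, rightBuckets)
    small > 0 &&
      large != small &&
      large % small == 0 &&
      large / small <= ratios.maxRatioFor(join)
  }
}
```

With the defaults, tables bucketed into 8 and 32 buckets (ratio 4) would qualify under the sort merge join limit but not under the shuffled hash join limit, which is the more cautious behavior described in case (1). A user who accepts the extra memory pressure could raise only the shuffled hash join limit, for example `CoalesceRatios(shuffledHashJoin = 4)`, without touching the sort merge join setting.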