cloud-fan commented on a change in pull request #28123:
URL: https://github.com/apache/spark/pull/28123#discussion_r438811200
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2595,6 +2595,26 @@ object SQLConf {
.checkValue(_ > 0, "The timeout value must be positive")
.createWithDefault(10L)
+ val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED =
+ buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.enabled")
+ .doc("When true, if two bucketed tables with the different number of buckets are joined, " +
+ "the side with a bigger number of buckets will be coalesced to have the same number " +
+ "of buckets as the other side. Bucket coalescing is applied only to sort-merge joins " +
+ "and only when the bigger number of buckets is divisible by the smaller number of buckets.")
+ .version("3.1.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO =
+ buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.maxBucketRatio")
+ .doc("The ratio of the number of two buckets being coalesced should be less than or " +
+ "equal to this value for bucket coalescing to be applied. This configuration only " +
+ s"has an effect when '${COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED.key}' is set to true.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(_ > 0, "The difference must be positive.")
+ .createWithDefault(10)
Review comment:
I don't know what the best default value is, but it seems better to pick a
power of two (2^n).
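
To make the suggestion concrete, here is a minimal sketch of the eligibility rule as the doc strings above describe it: the bucket counts differ, the larger is divisible by the smaller, and their ratio is at most `maxBucketRatio`. The object `BucketCoalescingSketch` and the method `canCoalesce` are hypothetical names for illustration and are not taken from the PR. The sketch also assumes that bucket counts are usually powers of two, which the comment does not state; under that assumption a limit of 10 admits exactly the same ratios as a limit of 8.

```scala
// Hypothetical illustration only; this is not Spark's implementation.
object BucketCoalescingSketch {

  // Returns true when the side with more buckets could be coalesced down to
  // the smaller bucket count, following the rule in the config doc strings:
  // counts differ, larger is divisible by smaller, ratio <= maxBucketRatio.
  def canCoalesce(numBuckets1: Int, numBuckets2: Int, maxBucketRatio: Int): Boolean = {
    require(numBuckets1 > 0 && numBuckets2 > 0, "bucket counts must be positive")
    require(maxBucketRatio > 0, "maxBucketRatio must be positive")
    val large = math.max(numBuckets1, numBuckets2)
    val small = math.min(numBuckets1, numBuckets2)
    large != small && large % small == 0 && large / small <= maxBucketRatio
  }

  def main(args: Array[String]): Unit = {
    // Assumed scenario: power-of-two bucket counts, so ratios are 2, 4, 8, 16.
    val smallSide = 4
    for (maxRatio <- Seq(8, 10, 16); ratio <- Seq(2, 4, 8, 16)) {
      val eligible = canCoalesce(smallSide * ratio, smallSide, maxRatio)
      println(s"maxBucketRatio=$maxRatio ratio=$ratio eligible=$eligible")
    }
  }
}
```

With these assumed inputs, the rows for `maxBucketRatio=8` and `maxBucketRatio=10` are identical, which is one possible reading of why a power-of-two default would be the clearer choice.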