maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
URL: https://github.com/apache/spark/pull/28123#discussion_r407806862
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2574,6 +2574,27 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+ val COALESCE_BUCKET_IN_JOIN_ENABLED =
+ buildConf("spark.sql.bucketing.coalesceBucketInJoin.enabled")
+ .internal()
+ .doc("When true, if two bucketed tables with a different number of buckets are joined, " +
+ "the side with a bigger number of buckets will be coalesced to have the same number " +
+ "of buckets as the other side. This bucket coalescing can happen only when the bigger " +
+ "number of buckets is divisible by the smaller number of buckets.")
+ .version("3.1.0")
+ .booleanConf
+ .createWithDefault(false)
+
+ val COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF =
+ buildConf("spark.sql.bucketing.coalesceBucketInJoin.maxNumBucketsDiff")
+ .doc("The difference in count of two buckets being coalesced should be less than or " +
+ "equal to this value for bucket coalescing to be applied. This configuration only " +
+ s"has an effect when '${COALESCE_BUCKET_IN_JOIN_ENABLED.key}' is set to true.")
+ .version("3.1.0")
+ .intConf
+ .checkValue(_ > 0, "The minimum number of partitions must be positive.")
+ .createWithDefault(256)
Review comment:
Just a question: how did you decide on this default value, 256?
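For context on what this threshold gates, here is a minimal sketch of the applicability check as the two doc strings above describe it. It is not the code from the PR: the object `CoalesceBucketsSketch` and the method `coalescedNumBuckets` are hypothetical names, and the logic only restates the documented conditions (different bucket counts, the larger divisible by the smaller, and a difference no greater than `maxNumBucketsDiff`).

```scala
// Hypothetical helper, NOT taken from the PR: it only restates the two doc strings.
object CoalesceBucketsSketch {

  /**
   * Returns the bucket count both sides would share after coalescing,
   * or None when the documented conditions are not met.
   */
  def coalescedNumBuckets(
      numBuckets1: Int,
      numBuckets2: Int,
      maxNumBucketsDiff: Int): Option[Int] = {
    require(numBuckets1 > 0 && numBuckets2 > 0, "bucket counts must be positive")
    val small = math.min(numBuckets1, numBuckets2)
    val large = math.max(numBuckets1, numBuckets2)
    val applicable =
      small != large &&                   // nothing to coalesce when the counts match
      large % small == 0 &&               // larger count must be divisible by the smaller
      large - small <= maxNumBucketsDiff  // difference bounded by the new config
    if (applicable) Some(small) else None
  }
}
```

Under this reading, 8 and 4 buckets coalesce to 4 with the default of 256, while 1024 and 4 do not, because the difference of 1020 exceeds 256 even though 1024 is divisible by 4.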