[GitHub] [spark] viirya commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

GitBox Sat, 18 Jul 2020 16:52:05 -0700


viirya commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r456839808




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2651,12 +2651,13 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.enabled")
+  val COALESCE_BUCKETS_IN_JOIN_ENABLED =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.enabled")
       .doc("When true, if two bucketed tables with the different number of buckets are joined, " +
         "the side with a bigger number of buckets will be coalesced to have the same number " +
-        "of buckets as the other side. Bucket coalescing is applied only to sort-merge joins " +
-        "and only when the bigger number of buckets is divisible by the smaller number of buckets.")
+        "of buckets as the other side. Bigger number of buckets is divisible by the smaller " +
+        "number of buckets. Bucket coalescing is applied to sort-merge joins and " +
+        "shuffled hash join.")

Review comment:
       Can we add more doc, like "Coalescing bucketed tables can avoid unnecessary shuffling in joins, but it also reduces parallelism and could possibly cause OOM for shuffled hash join"?
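
       To illustrate the trade-off, here is a minimal sketch of the divisibility condition described in the doc string. `CoalesceBucketsSketch` and `coalescedNumBuckets` are hypothetical names used only for illustration; this is not Spark's actual rule, and it ignores any other conditions Spark may check before coalescing.

```scala
// Hypothetical helper, NOT Spark's implementation: it only models the
// "bigger bucket count is divisible by the smaller one" condition.
object CoalesceBucketsSketch {

  /** Bucket count both sides would share after coalescing, if the condition holds. */
  def coalescedNumBuckets(left: Int, right: Int): Option[Int] = {
    require(left > 0 && right > 0, "bucket counts must be positive")
    val small = math.min(left, right)
    val large = math.max(left, right)
    // Equal counts need no coalescing; non-divisible counts cannot be coalesced.
    if (small != large && large % small == 0) Some(small) else None
  }

  def main(args: Array[String]): Unit = {
    println(coalescedNumBuckets(8, 4)) // Some(4)
    println(coalescedNumBuckets(8, 3)) // None
  }
}
```

       In the 8-versus-4 case, the 8-bucket side would be read as 4 partitions, so each task handles two of the original buckets. That is the reduced parallelism the suggested doc sentence warns about, and the larger per-task input is why a shuffled hash join could run out of memory.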





