sunchao commented on code in PR #42306:
URL: https://github.com/apache/spark/pull/42306#discussion_r1320031556
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala:
##########
@@ -672,9 +708,17 @@ case class HashShuffleSpec(
override def numPartitions: Int = partitioning.numPartitions
}
+/**
+ * [[ShuffleSpec]] created by [[KeyGroupedPartitioning]].
+ * @param partitioning key grouped partitioning
Review Comment:
nit: leave an empty line above the first `@param`
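For illustration, a minimal sketch of how the Scaladoc might read with the blank line added above the first `@param` (only the comment is shown; the class definition beneath it in the PR is unchanged):

```scala
/**
 * [[ShuffleSpec]] created by [[KeyGroupedPartitioning]].
 *
 * @param partitioning key grouped partitioning
 */
```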
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala:
##########
@@ -1530,6 +1530,18 @@ object SQLConf {
.booleanConf
.createWithDefault(false)
+  val V2_BUCKETING_ALLOW_JOIN_KEYS_SUBSET_OF_PARTITION_KEYS =
+    buildConf("spark.sql.sources.v2.bucketing.allowJoinKeysSubsetOfPartitionKeys.enabled")
+      .doc("Whether to allow storage-partition join in the case where join keys are" +
+        "a subset of the partition keys of the source tables. At planning time, " +
+        "Spark will group the partitions by only those keys that are in the join keys." +
+        "This is currently enabled only if spark.sql.requireAllClusterKeysForDistribution " +
Review Comment:
nit: replace `spark.sql.requireAllClusterKeysForDistribution` with `${REQUIRE_ALL_CLUSTER_KEYS_FOR_DISTRIBUTION.key}` (and add `s` to the beginning of this line)
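For illustration, a self-contained sketch of the suggested s-interpolation. The nested object here is only a stand-in for the `REQUIRE_ALL_CLUSTER_KEYS_FOR_DISTRIBUTION` entry defined earlier in SQLConf, and the doc string is cut off where the quoted hunk ends:

```scala
object DocInterpolationSketch {
  // Stand-in for the real ConfigEntry in SQLConf; only its key is needed here.
  object REQUIRE_ALL_CLUSTER_KEYS_FOR_DISTRIBUTION {
    val key: String = "spark.sql.requireAllClusterKeysForDistribution"
  }

  // The last segment uses the s-interpolator so the doc text always matches the
  // other config's key instead of hard-coding its name.
  val doc: String =
    "Whether to allow storage-partition join in the case where join keys are " +
      "a subset of the partition keys of the source tables. At planning time, " +
      "Spark will group the partitions by only those keys that are in the join keys. " +
      s"This is currently enabled only if ${REQUIRE_ALL_CLUSTER_KEYS_FOR_DISTRIBUTION.key} "
      // (remainder of the doc string is truncated in the quoted hunk)

  def main(args: Array[String]): Unit = println(doc)
}
```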