c21 commented on a change in pull request #33182:
URL: https://github.com/apache/spark/pull/33182#discussion_r662854713



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -419,6 +419,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
+    .internal()
+    .doc("When true, force applying shuffled hash join even if the table sizes exceed the " +
+      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
+      "value spark.sql.join.perferSortMergejoin will be ignored.")

Review comment:
       nit: `PREFER_SORTMERGEJOIN.key` instead of `spark.sql.join.perferSortMergejoin`.
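
       A rough sketch of what this nit is asking for, written outside of Spark so it stands alone. `ConfEntry` and `ConfSketch` are hypothetical stand-ins for the real config builder in `SQLConf`; only the idea of interpolating `PREFER_SORTMERGEJOIN.key` into the doc string is taken from the comment. Referencing the key through the entry means the doc text cannot drift from (or misspell) the actual config name.

```scala
// Hypothetical stand-in for a Spark config entry; only `key` and `doc` matter here.
final case class ConfEntry(key: String, doc: String)

object ConfSketch {
  // The existing config that the new doc string refers to.
  val PREFER_SORTMERGEJOIN: ConfEntry =
    ConfEntry("spark.sql.join.preferSortMergeJoin", "(doc text omitted in this sketch)")

  // Interpolating the key instead of hard-coding it keeps the doc text in sync
  // with the real config name.
  val FORCE_APPLY_SHUFFLEDHASHJOIN: ConfEntry = ConfEntry(
    "spark.sql.join.forceApplyShuffledHashJoin",
    "When true, force applying shuffled hash join even if the table sizes exceed the " +
      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
      s"value ${PREFER_SORTMERGEJOIN.key} will be ignored.")
}
```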

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
##########
@@ -272,14 +272,14 @@ trait JoinSelectionHelper {
     val buildLeft = if (hintOnly) {
       hintToShuffleHashJoinLeft(hint)
     } else {
-      hintToPreferShuffleHashJoinLeft(hint) ||
+      hintToPreferShuffleHashJoinLeft(hint) || conf.forceApplyShuffledHashJoin ||

Review comment:
       I think we don't want users to use this config, and it should only take 
effect in testing, right? Should we add a condition, e.g. `Utils.isTesting`?
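
       One possible shape for that condition, as a standalone sketch rather than the actual change. `TestingGateSketch`, `isTesting`, and `preferShuffledHashJoin` are hypothetical names; the `SPARK_TESTING` environment variable and `spark.testing` property check is my assumption about how `Utils.isTesting` decides, and the real `JoinSelectionHelper` logic has more terms than shown here.

```scala
object TestingGateSketch {
  // Assumption: approximates Utils.isTesting by looking for the SPARK_TESTING
  // environment variable or the spark.testing system property.
  def isTesting(env: Map[String, String], props: Map[String, String]): Boolean =
    env.contains("SPARK_TESTING") || props.contains("spark.testing")

  // The force flag only contributes when running under test, so setting the
  // config in production has no effect on join selection.
  def preferShuffledHashJoin(
      hintPrefersShj: Boolean,
      forceApply: Boolean,
      testing: Boolean): Boolean =
    hintPrefersShj || (forceApply && testing)
}
```

       With this gating, `forceApply = true` outside of testing falls back to whatever the hint says.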

##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -419,6 +419,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
+    .internal()
+    .doc("When true, force applying shuffled hash join even if the table sizes exceed the " +
+      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
+      "value spark.sql.join.perferSortMergejoin will be ignored.")
+    .version("3.2.0")

Review comment:
       nit: we are on `3.3.0` now I think?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


