c21 commented on a change in pull request #33182:
URL: https://github.com/apache/spark/pull/33182#discussion_r662854713

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -419,6 +419,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
+    .internal()
+    .doc("When true, force applying shuffled hash join even if the table sizes exceed the " +
+      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
+      "value spark.sql.join.perferSortMergejoin will be ignored.")

Review comment:
nit: reference `PREFER_SORTMERGEJOIN.key` here instead of hardcoding `spark.sql.join.perferSortMergejoin`. The hardcoded string also misspells the key, which is `spark.sql.join.preferSortMergeJoin`.

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
##########
@@ -272,14 +272,14 @@ trait JoinSelectionHelper {
     val buildLeft = if (hintOnly) {
       hintToShuffleHashJoinLeft(hint)
     } else {
-      hintToPreferShuffleHashJoinLeft(hint) ||
+      hintToPreferShuffleHashJoinLeft(hint) || conf.forceApplyShuffledHashJoin ||

Review comment:
I don't think we want users to set this config; it should only take effect in testing, right? Should we add a condition, e.g. `Utils.isTesting`?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -419,6 +419,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)

+  val FORCE_APPLY_SHUFFLEDHASHJOIN = buildConf("spark.sql.join.forceApplyShuffledHashJoin")
+    .internal()
+    .doc("When true, force applying shuffled hash join even if the table sizes exceed the " +
+      "threshold. This is for testing/benchmarking only. If this config is set to true, the " +
+      "value spark.sql.join.perferSortMergejoin will be ignored.")
+    .version("3.2.0")

Review comment:
nit: I think we are on `3.3.0` now?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
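
On the `Utils.isTesting` question raised for joins.scala, here is a minimal sketch of what the gating could look like. It is a simplified model, not the actual `JoinSelectionHelper` code: the object name, the method, and all four parameters are hypothetical stand-ins, and `otherConditions` is a placeholder for the part of the expression that the diff hunk cuts off.

```scala
// Illustrative sketch only: a simplified model, not Spark's JoinSelectionHelper.
object ForcedShuffledHashJoinSketch {

  /**
   * Decides whether to build the hash table on the left side.
   *
   * @param hintPrefersLeft stands in for hintToPreferShuffleHashJoinLeft(hint)
   * @param forceFlag       stands in for conf.forceApplyShuffledHashJoin
   * @param testing         stands in for a testing check such as Utils.isTesting
   * @param otherConditions hypothetical placeholder for the conditions the diff cuts off
   */
  def buildLeft(
      hintPrefersLeft: Boolean,
      forceFlag: Boolean,
      testing: Boolean,
      otherConditions: Boolean): Boolean = {
    // The override only counts when the process is running under test.
    val forced = testing && forceFlag
    hintPrefersLeft || forced || otherConditions
  }
}
```

With this shape, a user who sets the internal config outside of a test run gets the normal selection logic. Whether benchmark runs should count as testing for this purpose is left open here, since the config doc mentions both.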