c21 commented on a change in pull request #30280:
URL: https://github.com/apache/spark/pull/30280#discussion_r518928747
##########
File path: sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-joins.sql
##########
@@ -6,8 +6,8 @@
-- 2. run with whole-stage-codegen, operator codegen or no codegen.
--CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=10485760
---CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true
---CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false
+--CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=10485760,spark.sql.join.preferSortMergeJoin=true
+--CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=10485760,spark.sql.join.preferSortMergeJoin=false
Review comment:
@warrenzhu25 - Shuffled hash join is only selected when two conditions hold.
First, `spark.sql.autoBroadcastJoinThreshold` and `spark.sql.shuffle.partitions`
must be set so that the build side's estimated size is below
`autoBroadcastJoinThreshold * shuffle.partitions`. Second, the build side must be
at least 3x smaller than the other side
([code](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L368-L381)).
I don't think the test cases here satisfy the second condition (one side at least
3x smaller than the other). Can you double-check the query plan? Thanks.
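To make the two conditions concrete, here is a minimal sketch of the size-based checks, assuming they mirror Spark 3.0's `canBuildLocalHashMapBySize` and `muchSmaller` helpers in the linked `joins.scala`. The Python function names are hypothetical stand-ins, not Spark APIs. The sketch ignores join hints and the broadcast hash join path, and only models the non-hinted shuffled hash join check together with `spark.sql.join.preferSortMergeJoin`.

```python
# Simplified sketch (assumption: approximates Spark 3.0's size-based
# shuffled hash join checks; all names here are hypothetical, not Spark APIs).


def can_build_local_hash_map(size_in_bytes: int,
                             auto_broadcast_threshold: int,
                             shuffle_partitions: int) -> bool:
    # Condition 1: the build side, spread over all shuffle partitions, must be
    # small enough that each partition fits in an in-memory hash map.
    # With a threshold of -1 the right-hand side is negative, so this is never true.
    return size_in_bytes < auto_broadcast_threshold * shuffle_partitions


def much_smaller(build_size: int, stream_size: int) -> bool:
    # Condition 2: the build side must be at least 3x smaller than the other side.
    return build_size * 3 <= stream_size


def picks_shuffled_hash_join(build_size: int,
                             stream_size: int,
                             auto_broadcast_threshold: int,
                             shuffle_partitions: int,
                             prefer_sort_merge_join: bool) -> bool:
    # Without hints, shuffled hash join is only considered when sort merge join
    # is not preferred and both size conditions hold.
    return (not prefer_sort_merge_join
            and can_build_local_hash_map(build_size, auto_broadcast_threshold,
                                         shuffle_partitions)
            and much_smaller(build_size, stream_size))


if __name__ == "__main__":
    threshold, partitions = 10485760, 200
    # Sides of similar size: fails the 3x condition.
    print(picks_shuffled_hash_join(1000, 2000, threshold, partitions, False))  # → False
    # Build side exactly 3x smaller: both conditions hold.
    print(picks_shuffled_hash_join(1000, 3000, threshold, partitions, False))  # → True
    # Threshold -1: condition 1 can never hold.
    print(picks_shuffled_hash_join(1000, 3000, -1, partitions, False))  # → False
```

This is why the `-1` threshold configs never exercise shuffled hash join. It is also why small test tables of comparable size may still fall back to sort merge join even with the new threshold, which `EXPLAIN` on the test queries would confirm.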