cloud-fan commented on pull request #32683:
URL: https://github.com/apache/spark/pull/32683#issuecomment-850199669
Another concern is the test coverage of shuffle hash join, since it is disabled by default. Can we add a testing config that forcibly applies shuffle hash join, and then test the different join algorithms in the golden file tests? For example, in `outer-join.sql` we have

```
--CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=10485760
--CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true
--CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false
```

But `preferSortMergeJoin=false` doesn't guarantee that shuffle hash join is used, because other conditions still apply, such as the stream side having to be 3 times larger than the build side.

The same applies to the TPCDS result checking, where by default we probably only test broadcast join.

cc @maropu
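
To illustrate why `preferSortMergeJoin=false` alone is not enough, here is a minimal Python sketch of the size condition mentioned above. It is a simplified model, not Spark's planner code: `JoinSide`, `passes_size_ratio` and `pick_join` are hypothetical names, the "3 times larger" rule is taken from the comment and modelled as `build * 3 <= stream`, broadcast join is assumed to be disabled already, and any other conditions the planner may check are ignored.

```python
from dataclasses import dataclass


@dataclass
class JoinSide:
    """Hypothetical stand-in for one side of a join and its estimated size."""
    name: str
    size_in_bytes: int


# Ratio taken from the comment ("stream side must be 3 times larger").
SIZE_RATIO = 3


def passes_size_ratio(build: JoinSide, stream: JoinSide, ratio: int = SIZE_RATIO) -> bool:
    """Return True when the stream side is at least `ratio` times the build side."""
    return build.size_in_bytes * ratio <= stream.size_in_bytes


def pick_join(build: JoinSide, stream: JoinSide, prefer_sort_merge_join: bool) -> str:
    """Simplified choice between the two shuffle-based joins.

    Assumes broadcast join is already ruled out; real planners may apply
    further conditions that this sketch leaves out.
    """
    if not prefer_sort_merge_join and passes_size_ratio(build, stream):
        return "shuffle_hash"
    return "sort_merge"


if __name__ == "__main__":
    small = JoinSide("small", 100)
    similar = JoinSide("similar", 250)
    large = JoinSide("large", 300)
    # Same config (prefer_sort_merge_join=False), different table sizes.
    print(pick_join(small, similar, prefer_sort_merge_join=False))
    print(pick_join(small, large, prefer_sort_merge_join=False))
```

Under this model, two test tables of similar size fall back to sort merge join even when `preferSortMergeJoin=false`, which is why a dedicated config that forces shuffle hash join would make the golden file coverage deterministic.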
