cloud-fan commented on pull request #32683:
URL: https://github.com/apache/spark/pull/32683#issuecomment-850199669


   Another concern is the test coverage of shuffle hash join, as it's disabled by default.
   
   Can we add a testing config to forcibly apply shuffle hash join, and then test different join algorithms in golden file tests? For example, in `outer-join.sql` we have:
   ```
   --CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=10485760
   --CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=true
   --CONFIG_DIM1 spark.sql.autoBroadcastJoinThreshold=-1,spark.sql.join.preferSortMergeJoin=false
   ```
   
   But `preferSortMergeJoin=false` doesn't guarantee that shuffle hash join is used, as other conditions still apply, e.g. the stream side must be at least 3 times larger than the build side.
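
To illustrate why `preferSortMergeJoin=false` alone is not enough, here is a minimal Python sketch that models only the size-ratio condition. The function and constant names are hypothetical and this is not Spark's planner code; the real join selection applies further checks that are not modeled here.

```python
# Simplified model of the size-ratio condition only.
# All names are hypothetical; this is not Spark's planner code.

BUILD_SIDE_RATIO = 3  # stream side must be at least 3x the build side


def stream_side_much_larger(build_bytes: int, stream_bytes: int) -> bool:
    """True when the stream side is at least BUILD_SIDE_RATIO times the build side."""
    return build_bytes * BUILD_SIDE_RATIO <= stream_bytes


def may_pick_shuffle_hash_join(prefer_sort_merge_join: bool,
                               left_bytes: int,
                               right_bytes: int) -> bool:
    """True if, under this model, either side qualifies as the build side."""
    if prefer_sort_merge_join:
        # Assumption: with the preference on, sort merge join wins.
        return False
    return (stream_side_much_larger(left_bytes, right_bytes)
            or stream_side_much_larger(right_bytes, left_bytes))
```

Under this model, two test tables of similar size (say 100 and 150 bytes) never qualify even with `preferSortMergeJoin=false`, which is why a dedicated testing config would help.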
   
   The same applies to the TPCDS result checking, and by default we probably only test the broadcast join. cc @maropu




