wangyum opened a new pull request #33288: URL: https://github.com/apache/spark/pull/33288
### What changes were proposed in this pull request?

Add a new optimizer rule (`BroadcastJoinOuterJoinStreamSide`) that adds a broadcast hash join (BHJ) ahead of the sort-merge join (SMJ) for left outer/semi/anti joins when the stream side is small enough to be broadcast and is much smaller than the other side.

A real case from our cluster:

```sql
SELECT *
FROM t1 c
LEFT JOIN t2 b
  ON substring(b.extrnl_rfrnc_key, (instr(b.extrnl_rfrnc_key, '!') + 1), char_length(b.extrnl_rfrnc_key)) = c.exec_rsrc_ref_id
WHERE c.prcsr_trxn_id = 3415882487483039;
```

Before this PR | After this PR
-- | --
 | 

How to disable this rule:

```sql
set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.BroadcastJoinOuterJoinStreamSide;
```

### Why are the changes needed?

Improve query performance for left outer/semi/anti joins whose left side is very small and whose right side is very large.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.
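The intuition behind joining against the small stream side first can be sketched outside Spark. The snippet below is a plain-Python illustration, not the rule's implementation: it assumes the extra BHJ acts as a semi-join style filter that keeps only the rows of the large side whose join key also occurs on the small side, so the later join has far less data to shuffle and sort. The helper names (`left_outer_join`, `prefilter_right`) and the sample rows are hypothetical; only the column names and the key expression come from the query above.

```python
from collections import defaultdict


def left_outer_join(left, right, left_key, right_key):
    """Reference left outer join: every left row appears at least once."""
    index = defaultdict(list)
    for r in right:
        k = right_key(r)
        if k is not None:  # SQL semantics: NULL keys never match
            index[k].append(r)
    out = []
    for l in left:
        k = left_key(l)
        matches = index.get(k, []) if k is not None else []
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


def prefilter_right(left, right, left_key, right_key):
    """Keep only right rows whose key occurs on the small left side.

    This stands in for the assumed broadcast step: the small side's keys
    fit in memory, so the large side can be filtered without a shuffle.
    """
    keys = {left_key(l) for l in left} - {None}
    return [r for r in right if right_key(r) in keys]


def lk(row):
    return row["exec_rsrc_ref_id"]


def rk(row):
    # Mirrors substring(key, instr(key, '!') + 1, char_length(key)):
    # everything after the first '!', or the whole string if there is none.
    s = row["extrnl_rfrnc_key"]
    return s[s.find("!") + 1:]


# Hypothetical sample data: t1 is the small (stream) side, t2 the large side.
t1 = [{"exec_rsrc_ref_id": "A1"}, {"exec_rsrc_ref_id": "Z9"}]
t2 = [
    {"extrnl_rfrnc_key": "x!A1"},
    {"extrnl_rfrnc_key": "y!A1"},
    {"extrnl_rfrnc_key": "x!B2"},
    {"extrnl_rfrnc_key": "C3"},
]

direct = left_outer_join(t1, t2, lk, rk)
filtered = prefilter_right(t1, t2, lk, rk)
rewritten = left_outer_join(t1, filtered, lk, rk)

print(len(t2), "right rows before filtering,", len(filtered), "after")
print("same result:", direct == rewritten)
```

Because a left outer join only ever emits right rows that match some left key, dropping the non-matching right rows up front does not change the result. The benefit depends on the left side being small and selective, which is the condition the rule checks for.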
