Cheng Su created SPARK-34729:
--------------------------------

             Summary: Faster execution for broadcast nested loop join (left semi/anti with no condition)
                 Key: SPARK-34729
                 URL: https://issues.apache.org/jira/browse/SPARK-34729
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Cheng Su

For `BroadcastNestedLoopJoinExec` left semi and left anti joins without a condition, when the left side is broadcast, we currently check whether each row from the broadcast side has a match by iterating over the broadcast side many times - [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala#L256-L275] . This is unnecessary: because there is no condition, we only need to check whether the stream side is empty. This Jira is created to add that optimization, which can substantially improve execution performance for the affected queries.
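
The reasoning can be illustrated with a minimal sketch in plain Python. It is not the Spark implementation: the function names `naive_semi_or_anti` and `fast_semi_or_anti` are hypothetical, and ordinary lists stand in for the broadcast (left) side and the stream side. It only shows that, with no join condition, per-row matching and a single emptiness check on the stream side give the same result.

```python
from typing import Iterable, List, TypeVar

L = TypeVar("L")
R = TypeVar("R")

_MISSING = object()  # sentinel used to detect an empty iterable


def naive_semi_or_anti(left: Iterable[L], stream: Iterable[R], anti: bool) -> List[L]:
    """Per-row matching: decide for each left row whether any stream row matches."""
    stream_rows = list(stream)
    result = []
    for row in left:
        # With no join condition, every stream row counts as a match,
        # so a left row is matched exactly when the stream side is non-empty.
        matched = any(True for _ in stream_rows)
        # Left semi keeps matched rows; left anti keeps unmatched rows.
        if matched != anti:
            result.append(row)
    return result


def fast_semi_or_anti(left: Iterable[L], stream: Iterable[R], anti: bool) -> List[L]:
    """Single emptiness check on the stream side decides the fate of all left rows."""
    stream_non_empty = next(iter(stream), _MISSING) is not _MISSING
    keep_all = stream_non_empty != anti
    return list(left) if keep_all else []


if __name__ == "__main__":
    left_rows = ["a", "b", "c"]
    for stream_rows in ([], [1], [1, 2, 3]):
        for anti in (False, True):
            assert naive_semi_or_anti(left_rows, stream_rows, anti) == \
                fast_semi_or_anti(left_rows, stream_rows, anti)
    print(fast_semi_or_anti(left_rows, [1], anti=False))
    print(fast_semi_or_anti(left_rows, [1], anti=True))
```

In this sketch, left semi returns every left row when the stream side has at least one row and nothing otherwise; left anti is the reverse. In Spark the stream side is distributed, so the actual emptiness check would have to cover all partitions, which this sketch does not model.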