c21 commented on a change in pull request #31630:
URL: https://github.com/apache/spark/pull/31630#discussion_r581774000
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -555,6 +559,8 @@ object LimitPushDown extends Rule[LogicalPlan] {
join.copy(
left = maybePushLocalLimit(exp, left),
right = maybePushLocalLimit(exp, right))
+ case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
+ join.copy(left = maybePushLocalLimit(exp, left))
Review comment:
@maropu - this actually reminds me that we can optimize further at runtime. I already did this for LEFT SEMI joins with AQE in
https://github.com/apache/spark/pull/29484 . Similarly, a LEFT ANTI join
without a condition can be converted to an empty relation during runtime
if the right (build) side is non-empty. Will submit a follow-up PR tomorrow.
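To see why that runtime rewrite is safe, here is a minimal sketch in plain Scala (a toy model with hypothetical names, not Spark's actual AQE rule or operators) of the semantics of a LEFT ANTI join with no condition:

```scala
// Toy model (assumed names, not Spark classes) of a LEFT ANTI join
// with no join condition.
object NoConditionLeftAnti {
  // With no condition, a left row survives iff the right side is empty,
  // so the result is either all of `left` or nothing.
  def leftAnti[L, R](left: Seq[L], right: Seq[R]): Seq[L] =
    if (right.isEmpty) left else Seq.empty

  // The rewrite AQE could apply: once runtime stats show `right` is
  // non-empty, the join collapses to an empty relation without
  // reading `left` at all.
  def canConvertToEmptyRelation[R](right: Seq[R]): Boolean = right.nonEmpty
}
```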
In addition, after taking a closer look at `BroadcastNestedLoopJoinExec`
(I had never looked at it closely because it is not popular in our environment),
I found several places we can optimize:
* populate `outputOrdering` and `outputPartitioning` when possible, to avoid
a shuffle/sort in a later stage.
* add a shortcut for `LEFT SEMI/ANTI` in `defaultJoin()`, since we don't need
to scan all rows when there is no join condition.
* code-gen the operator.
I will file an umbrella JIRA with minor priority and do it gradually.
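The `defaultJoin()` shortcut above can be sketched in plain Scala (hypothetical helper names, not `BroadcastNestedLoopJoinExec` itself): with no join condition, the per-row scan of the broadcast side collapses to a single emptiness check.

```scala
// Sketch of the LEFT SEMI shortcut (assumed names, not Spark code).
object SemiAntiShortcut {
  // Naive nested-loop LEFT SEMI: O(|stream| * |build|) condition checks.
  def leftSemiNaive[L, R](stream: Seq[L], build: Seq[R])
                         (cond: (L, R) => Boolean): Seq[L] =
    stream.filter(l => build.exists(r => cond(l, r)))

  // With no condition, every streamed row matches iff the build side is
  // non-empty, so no row-by-row scan is needed at all.
  def leftSemiNoCondition[L, R](stream: Seq[L], build: Seq[R]): Seq[L] =
    if (build.nonEmpty) stream else Seq.empty
}
```

The LEFT ANTI case is symmetric: keep the streamed rows only when the build side is empty.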
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]