cloud-fan commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r459907233



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##########
@@ -454,6 +491,48 @@ case class BroadcastHashJoinExec(
     val (matched, checkCondition, _) = getJoinCondition(ctx, input)
     val numOutput = metricTerm(ctx, "numOutputRows")
 
+    // fast stop if isOriginalInputEmpty = true, should accept all rows in streamedSide
+    if (broadcastRelation.value.isOriginalInputEmpty) {
+      return s"""
+                |// Anti Join isOriginalInputEmpty(true) accept all
+                |$numOutput.add(1);
+                |${consume(ctx, input)}
+          """.stripMargin
+    }
+
+    if (isNullAwareAntiJoin) {
+      if (broadcastRelation.value.allNullColumnKeyExistsInOriginalInput) {
+        return s"""
+                  |// NAAJ

Review comment:
It's hard to do an early stop with whole-stage codegen.
`CodegenSupport.limitNotReachedChecks` is an example of how to do it for the
limit operator, which needs an early stop as well.

I'm not sure it's worth amending the whole-stage-codegen framework for NAAJ.
One option is to optimize it in AQE: when we find that the build side of the
NAAJ is empty, replace the join node with the stream-side plan.
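
To make the AQE idea concrete, here is a rough sketch of the rewrite on a toy
plan tree. Everything in it (`Plan`, `Scan`, `BuildStage`, `NullAwareAntiJoin`,
`Project`, `EliminateEmptyBuildNaaj`) is a hypothetical stand-in invented for
illustration, not a Spark class or an actual AQE rule. It assumes the build
side's row count becomes known once its stage has materialized.

```scala
// Toy plan model. These types are hypothetical and are NOT Spark classes.
sealed trait Plan
case class Scan(name: String) extends Plan
// rowCount is Some(n) once the build-side stage has materialized, else None.
case class BuildStage(child: Plan, rowCount: Option[Long]) extends Plan
case class NullAwareAntiJoin(stream: Plan, build: BuildStage) extends Plan
case class Project(child: Plan) extends Plan

object EliminateEmptyBuildNaaj {
  // A NAAJ whose build side is known to be empty keeps every stream-side row,
  // so the join node can be replaced by its (recursively rewritten) stream side.
  def apply(plan: Plan): Plan = plan match {
    case NullAwareAntiJoin(stream, build) =>
      val newStream = apply(stream)
      if (build.rowCount.contains(0L)) newStream
      else NullAwareAntiJoin(newStream, build)
    case Project(child) => Project(apply(child))
    case other => other // leaves and build stages are left untouched
  }
}
```

With the join node gone from the plan, the generated code would not need an
early-stop path for the empty-build case at all. A build side whose size is
still unknown (`rowCount` is `None`) leaves the join in place.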






