leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r458150987



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##########
@@ -471,6 +518,8 @@ case class BroadcastHashJoinExec(
          |    }
          |  }
          |}
+         |// special case for NullAwareAntiJoin, if anyNull in streamedRow, row should be dropped.
+         |${ if (isNullAwareAntiJoin) s"else { $found = true; }" else ""}

Review comment:
       > why do we need the change? We return earlier if isNullAwareAntiJoin is true.
   
   Yes, we need this change. There are five code paths when isNullAwareAntiJoin = true:
   
   1. inputEmpty = true => return all rows
   2. hashedRelation anyNullKeyExists = true => return no rows
   3. streamedRow has a null key => drop that row
   4. streamedRow has no null key and a match is found in hashedRelation => drop that row
   5. streamedRow has no null key and no match is found in hashedRelation => keep that row
   
   So we still have to iterate over the streamed side and check every row against cases 3~5. The added else branch covers case 3: it sets found = true so that the row is dropped.
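
   To make the five paths concrete, here is a minimal sketch in plain Scala collections. It is not the generated code from this PR: `NullAwareAntiJoinSketch`, `antiJoin`, and the `Option[Int]` key type are hypothetical names for illustration, `None` stands in for a null key, and it assumes inputEmpty means the hashed (build) side is empty.

   ```scala
   // Hypothetical model of the five null-aware anti join paths; not Spark's generated code.
   object NullAwareAntiJoinSketch {
     // A join key; None stands in for a null key (an assumption of this sketch).
     type Key = Option[Int]

     def antiJoin(streamed: Seq[Key], build: Seq[Key]): Seq[Key] = {
       if (build.isEmpty) {
         // Path 1: the build side is empty, so every streamed row is returned.
         streamed
       } else if (build.exists(_.isEmpty)) {
         // Path 2: a null key exists on the build side, so no row is returned.
         Seq.empty
       } else {
         val buildKeys: Set[Int] = build.flatten.toSet
         // Paths 3~5 need a per-row check over the streamed side.
         streamed.filter {
           case None    => false                   // Path 3: null key, drop the row.
           case Some(k) => !buildKeys.contains(k)  // Path 4 drops a match, path 5 keeps a non-match.
         }
       }
     }
   }
   ```

   Paths 1 and 2 are decided once for the whole input, while paths 3~5 depend on each streamed row, which is why the loop over the streamed side cannot be skipped.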



