leanken commented on a change in pull request #29104:
URL: https://github.com/apache/spark/pull/29104#discussion_r459050176
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##########
@@ -454,6 +478,28 @@ case class BroadcastHashJoinExec(
val (matched, checkCondition, _) = getJoinCondition(ctx, input)
val numOutput = metricTerm(ctx, "numOutputRows")
+ if (isNullAwareAntiJoin) {
+ require(leftKeys.length == 1, "leftKeys length should be 1")
+ require(rightKeys.length == 1, "rightKeys length should be 1")
+ require(joinType == LeftAnti, "joinType must be LeftAnti.")
+ require(buildSide == BuildRight, "buildSide must be BuildRight.")
+ require(SQLConf.get.nullAwareAntiJoinOptimizeEnabled,
+ "nullAwareAntiJoinOptimizeEnabled must be on for null aware anti join
optimize.")
+ require(checkCondition == "", "null aware anti join optimize condition
should be empty.")
+
+ if (broadcastRelation.value.inputEmpty) {
+ return s"""
+ |// singleColumn NAAJ inputEmpty(true) accept all
Review comment:
> BTW, if `numKeys == 0`, it means there is no null key? because there
is no record at all. I'm wondering why you define `HashRelation.isEmpty` as
`numKeys == 0 && ! anyNullKeyExists`
if you go back to see the first commit of the PR, of using HashSet to store
buildSide data, maybe you would understand why.
HashedRelation is built for normal join, basically filter out null key row
is suitable for normal Join. But it's totally different story for null aware
join, because whether the buildSide(Or should i say rightSide) isEmpty or
nullKeyExist matters a lot.
so i ask you the history of hashedRelation about filter out null key rows.
i think the HashedRelation code change being so weird, and confused it is
because HashedRelation is built for normal join. if there are A
NullAwareHashedRelation which can store nullKey rows, it would be more clear.
I was going to implement such NullAwareHashedRelation when i try to support
multi-column, that's why i want to convince your putting NAAJ into NewExecNode,
because I think it's ideal that Normal Join deal with HashedRelation, and
NullAwareJoin to deal with NullAwareHashedRelation separately.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]