leanken commented on pull request #29104:
URL: https://github.com/apache/spark/pull/29104#issuecomment-659750039
Hi @agrawaldevesh,
I am afraid that moving the optimization into BroadcastHashJoinExec is not that
easy.
Right now I have
BroadcastNestedLoopJoinExec(LeftAnti with condition Or(EqualTo(a=b),
IsNull(EqualTo(a=b))))
If I want to translate this into BroadcastHashJoinExec, first of all I need a
join key, right?
BroadcastHashJoinExec(LeftAnti joinKey(a=b), with condition)
But extracting the equi-join keys already breaks the integrity of the original
condition Or(EqualTo(a=b), IsNull(EqualTo(a=b))).
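A rough sketch of why, in plain Scala rather than Spark code: SQL NULL is modeled as `None`, and the helper names `equalTo`, `notInCondition` and `hashKeyMatch` are made up for illustration. Under three-valued logic the condition is also true when `a = b` evaluates to NULL, but a hash lookup on the key can only find pairs where `a = b` is true, so the `IsNull` branch is lost once `a = b` becomes the join key.

```scala
// Model a nullable SQL integer as Option[Int]; None stands for NULL.
// SQL `a = b` is NULL when either side is NULL.
def equalTo(a: Option[Int], b: Option[Int]): Option[Boolean] =
  for (x <- a; y <- b) yield x == y

// Or(EqualTo(a, b), IsNull(EqualTo(a, b))).
// IsNull never returns NULL, so the whole condition is a plain Boolean.
def notInCondition(a: Option[Int], b: Option[Int]): Boolean = {
  val eq = equalTo(a, b)
  eq.contains(true) || eq.isEmpty
}

// What a lookup on the join key alone can detect: non-null, equal keys.
def hashKeyMatch(a: Option[Int], b: Option[Int]): Boolean =
  a.isDefined && a == b
```

The two functions agree whenever both keys are non-null and disagree as soon as either side is NULL, which is exactly the case NotInSubquery cares about.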
Let's see what codegenAnti looks like:
```
s"""
   |boolean $found = false;
   |// generate join key for stream side
   |${keyEv.code}
   |// Check if the key has nulls.
   |if (!($anyNull)) {
   |  // Check if the HashedRelation exists.
   |  UnsafeRow $matched = (UnsafeRow)$relationTerm.getValue(${keyEv.value});
   |  if ($matched != null) {
   |    // Evaluate the condition.
   |    $checkCondition {
   |      $found = true;
   |    }
   |  }
   |}
   |if (!$found) {
   |  $numOutput.add(1);
   |  ${consume(ctx, input)}
   |}
 """.stripMargin
```
An anti join with a key keeps the streamed-side row if the streamed-side key is
null, but NotInSubquery needs the opposite. I can certainly add some if-else
checks here, but that might mess up the whole BroadcastHashJoinExec code.
Besides the difference in how a null streamed-side key is handled, I would also
need to go through the entire build side to see whether a null key exists,
which is also kind of awkward.
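The difference can be sketched in plain Scala. Again this is not Spark code: `keptByKeyedAntiJoin` and `keptByNotIn` are hypothetical names, `None` models a NULL key, and the keyed anti join is shown without any extra condition.

```scala
// Keyed LeftAnti without an extra condition: a streamed row is dropped
// only when its key is non-null and present on the build side.
def keptByKeyedAntiJoin(key: Option[Int], build: Seq[Option[Int]]): Boolean =
  !key.exists(k => build.contains(Some(k)))

// Single-column NOT IN: the result also depends on whether the build
// side is empty and whether it contains any NULL key.
def keptByNotIn(key: Option[Int], build: Seq[Option[Int]]): Boolean =
  if (build.isEmpty) true               // NOT IN over an empty set keeps every row
  else if (key.isEmpty) false           // NULL NOT IN (non-empty set) is NULL: dropped
  else if (build.contains(None)) false  // a NULL on the build side drops every row
  else !build.contains(key)             // otherwise a plain membership test
```

For a build side of `Seq(Some(1), Some(2))` and a NULL streamed key, the first function keeps the row and the second drops it. The `build.contains(None)` check is the part that needs a scan (or precomputed flag) over the whole build side.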
BroadcastHashJoinExec assumes that it has a join key, but if I apply my
NotInSubquery check there, it would be like: hey, I found two keys that should
be joined, but wait a minute, there is a tiny corner case here, so back off.
If it were up to me, I would not break the integrity of
BroadcastHashJoinExec; I would rather treat NotInSubquerySingleColumn as a
runtime optimization.
So I am laying out the relevant information for you all and seeking advice
before I move on to the next step.
Option A:
Treat NotInSubquerySingleColumn as a runtime optimization.
Option B:
Move the code into BroadcastHashJoinExec, although the codegen looks tricky.
Looking forward to your reply. Many thanks.