c21 commented on a change in pull request #29484:
URL: https://github.com/apache/spark/pull/29484#discussion_r473740090
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
##########
@@ -442,7 +446,11 @@ trait HashJoin extends BaseJoinExec with CodegenSupport {
case BuildRight => input ++ buildVars
}
- if (keyIsUnique) {
+ if (isEmptyHashedRelation) {
+ s"""
+ |// If HashedRelation is empty, hash inner join simply returns
nothing.
Review comment:
@cloud-fan - yes, sorry about that. After checking the generated code for an
example query, I can confirm this. The change works for non-codegen (iterator
mode), but it does not work for codegen, because here we are processing in
`doConsume()`, so we are still executing the stream side.
So I think:
* for the non-codegen code path: keep the same change as in this PR now.
* for the codegen code path: make no change here in `HashJoin`, but add an
adaptive execution logical plan rule, e.g. called
`EliminateEmptyBroadcastHashJoin.scala`, which checks whether `stage:
BroadcastQueryStageExec` is empty
(`stage.broadcast.relationFuture.get().value == EmptyHashedRelation`) and, if
it is, changes the logical plan from `Join` to `LocalRelation(data =
Seq.empty, ...)`.
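
A rough sketch of the shape such a rule might take, to make the second bullet
concrete. It is a toy model only: every type below is a hypothetical stand-in
defined in the snippet, not the real Catalyst or adaptive execution classes,
and `Join` is assumed to be an inner join.

```scala
// Toy stand-ins for illustration; none of these are Spark's real classes.
sealed trait HashedRelation
case object EmptyHashedRelation extends HashedRelation
final case class NonEmptyHashedRelation(keys: Seq[Int]) extends HashedRelation

sealed trait LogicalPlan
final case class Scan(table: String) extends LogicalPlan
// Stands in for a plan node wrapping an already materialized broadcast stage.
final case class BroadcastStage(name: String, relation: HashedRelation)
  extends LogicalPlan
// Assumed to be an inner join in this sketch.
final case class Join(left: LogicalPlan, right: LogicalPlan) extends LogicalPlan
final case class LocalRelation(data: Seq[Int]) extends LogicalPlan

object EliminateEmptyBroadcastHashJoin {
  // True when the child is a broadcast stage whose relation turned out empty.
  private def isEmptyBroadcast(plan: LogicalPlan): Boolean = plan match {
    case BroadcastStage(_, EmptyHashedRelation) => true
    case _ => false
  }

  // Replace an inner join with an empty local relation if either side is an
  // empty broadcast; otherwise recurse into the join's children.
  def apply(plan: LogicalPlan): LogicalPlan = plan match {
    case Join(l, r) if isEmptyBroadcast(l) || isEmptyBroadcast(r) =>
      LocalRelation(Seq.empty)
    case Join(l, r) => Join(apply(l), apply(r))
    case other => other
  }
}
```

The point of doing this at the plan level is that the stream side is never
scheduled at all, which the `doConsume()` change cannot guarantee.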
Does that sound good as a plan? Thanks.