semi join with empty hashed relation

GitBox Thu, 20 Aug 2020 01:07:19 -0700


c21 commented on a change in pull request #29484:
URL: https://github.com/apache/spark/pull/29484#discussion_r473740090




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
##########
@@ -442,7 +446,11 @@ trait HashJoin extends BaseJoinExec with CodegenSupport {
       case BuildRight => input ++ buildVars
     }
 
-    if (keyIsUnique) {
+    if (isEmptyHashedRelation) {
+      s"""
         |// If HashedRelation is empty, hash inner join simply returns nothing.

Review comment:
   @cloud-fan - yes, sorry about that. After checking the generated code for an
   example query, I can confirm this. In non-codegen (iterator) mode it works,
   but in codegen mode it does not, because we are processing inside
   `doConsume()` here, so the stream side is still being executed.

   So I think:
   * for the non-codegen code path: keep the same change as in this PR.
   * for the codegen code path: make no change here in `HashJoin`; instead, add
     an adaptive execution logical plan rule, e.g. called
     `EliminateEmptyBroadcastHashJoin.scala`, which checks whether
     `stage: BroadcastQueryStageExec` is empty
     (`stage.broadcast.relationFuture.get().value == EmptyHashedRelation`) and,
     if it is, changes the logical plan from `Join` to
     `LocalRelation(data = Seq.empty, ...)`.

   Does this sound good as a plan? Thanks.
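
   The rewrite proposed for the codegen path can be sketched roughly as below. This is only a toy model under stated assumptions: `Plan`, `Scan`, `BroadcastStage`, `Join`, `LocalRelation` and `EliminateEmptyBuildSide` are hypothetical stand-ins defined here, not Spark's classes, and the build side is assumed to be the right child. It only illustrates the idea of replacing an inner or semi join whose materialized build side is empty with an empty relation, so the stream side is never run.

```scala
// Toy model only: these types are hypothetical stand-ins, not Spark's classes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftSemi extends JoinType
case object LeftOuter extends JoinType

sealed trait Plan
case class Scan(name: String) extends Plan
case class LocalRelation(data: Seq[Seq[Any]]) extends Plan
// Stands in for an already materialized broadcast stage with a known row count.
case class BroadcastStage(rowCount: Long) extends Plan
case class Join(left: Plan, right: Plan, joinType: JoinType) extends Plan

object EliminateEmptyBuildSide {
  // Assumption: the build side is the right child and its size is known.
  private def isEmptyBuild(p: Plan): Boolean = p match {
    case BroadcastStage(0L) => true
    case _                  => false
  }

  def apply(plan: Plan): Plan = plan match {
    // Inner and left semi joins produce no rows when the build side is empty.
    case Join(l, r, jt @ (Inner | LeftSemi)) =>
      val left = apply(l)
      val right = apply(r)
      if (isEmptyBuild(right)) LocalRelation(Seq.empty) else Join(left, right, jt)
    // Other join types (e.g. left outer) still need the stream side.
    case Join(l, r, jt) => Join(apply(l), apply(r), jt)
    case other          => other
  }
}
```

   For example, `EliminateEmptyBuildSide(Join(Scan("t"), BroadcastStage(0L), Inner))` yields `LocalRelation(Seq.empty)` in this model, while a `LeftOuter` join with the same children is left unchanged.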




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]




[GitHub] [spark] c21 commented on a change in pull request #29484: [SPARK-32649][SQL] Optimize BHJ/SHJ inner/semi join with empty hashed relation
