c21 commented on pull request #29181:
URL: https://github.com/apache/spark/pull/29181#issuecomment-662704525


   @bart-samwel - just to bring us in the same page.
   
   Current spark scala/java implementation for hash join (broadcast hash join 
and shuffled hash join) has following restriction:
   * [For left outer join, stream side can only be left 
side](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L288).
   
   * [Similarly, for right outer join, stream side can only be right 
side](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala#L281).
   
   * Full outer join is not supported in broadcast hash join and shuffled hash 
join (have to do a sort merge join, code reference same as above).
   
   Both of cases you mentioned are to do right outer join, with left stream 
side. This will not happen.
   
   A separate topic: I think it would be interesting to explore support full 
outer join in shuffled hash join and broadcast hash join where I discussed with 
@cloud-fan in [another 
PR](https://github.com/apache/spark/pull/29130#discussion_r456502678). I 
created a JIRA for this now - 
https://issues.apache.org/jira/browse/SPARK-32399. This should help save 
shuffle and sort as currently for full outer join, we always do a sort merge 
join no matter of table size. BTW, does delta engine support full outer join in 
hash join? Would like to understand more here. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to