c21 commented on pull request #29181: URL: https://github.com/apache/spark/pull/29181#issuecomment-663129725
> Delta engine will indeed support full outer join in SHJ.

@bart-samwel - sounds good. I will then work on supporting full outer join in SHJ on its current Java stack in https://issues.apache.org/jira/browse/SPARK-32399.

> BHJ is harder because you have to merge the "probedness" of all tasks before figuring out which rows you need to emit.

For BHJ, every task gets a copy of the whole build side. So I am thinking that each task, after exhausting the stream side, iterates over all build-side rows and emits only the rows in its own part (we can rely on a hash, e.g. task `i` only emits a build-side row if `hash(build_side_row_keys) % num_partitions_of_RDD == i`).
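
The hash-based ownership rule proposed for BHJ can be sketched roughly as follows. This is only an illustration in plain Python, not Spark code: `owner_task`, `emit_unmatched_build_rows`, and the `matched_keys` argument are hypothetical names, and `zlib.crc32` stands in for whatever hash the engine would use. The sketch assumes `matched_keys` already holds every build key that was probed; in a real broadcast join each task only sees matches from its own stream partition, which is the "probedness" merging concern quoted above.

```python
import zlib


def owner_task(key, num_partitions):
    # Deterministic hash, so every task computes the same owner for a key.
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def emit_unmatched_build_rows(task_id, build_rows, matched_keys, num_partitions):
    # Run after the task has exhausted its stream side.
    # Each row is (key, value); a task emits an unmatched build row
    # only if it "owns" that row's key.
    return [
        row
        for row in build_rows
        if row[0] not in matched_keys
        and owner_task(row[0], num_partitions) == task_id
    ]


# Every task holds the full (broadcast) build side.
build_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
matched_keys = {2}  # assumption: complete set of matched build keys
num_partitions = 3

emitted = []
for task_id in range(num_partitions):
    emitted.extend(
        emit_unmatched_build_rows(task_id, build_rows, matched_keys, num_partitions)
    )
```

Because each key maps to exactly one owner, every unmatched build row appears once in `emitted` across all tasks, even though every task scans the whole build side.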
