zhztheplayer commented on PR #52817:
URL: https://github.com/apache/spark/pull/52817#issuecomment-3540854361
Hi @cloud-fan, thanks for having a look.
> How is this implemented in this PR? I don't see any branching code
regarding different joins.
This is a bit subtle, due to how the task memory manager (tmm) is obtained in Spark code.
## For SHJ
https://github.com/apache/spark/blob/722bcc0f0d15245a39fae62c0c1c764e4b6a02f8/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoinExec.scala#L108-L116
As seen, `context.taskMemoryManager()` is passed in. Because we
[modified](https://github.com/apache/spark/pull/52817/files#diff-127291a0287f790755be5473765ea03eb65f8b58b9ec0760955f124e21e3452fR539)
`LongHashedRelation` to use the tmm's memory mode, after the PR SHJ
will use off-heap memory if `spark.memory.offHeap.enabled=true`.
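The SHJ path can be illustrated with a minimal, self-contained Scala model. This is a sketch, not Spark's API. `MemoryMode`, `TaskMemoryManagerModel`, `LongHashedRelationModel` and `buildForShj` are hypothetical names. The model simplifies Spark's actual configuration checks and assumes that the task's tmm reports off-heap mode exactly when `spark.memory.offHeap.enabled=true`. It also assumes that, after the PR, the relation adopts whatever mode its tmm reports.

```scala
object ShjMemoryModeSketch {
  // Hypothetical stand-in for Spark's memory mode (on-heap vs off-heap).
  sealed trait MemoryMode
  case object OnHeap extends MemoryMode
  case object OffHeap extends MemoryMode

  // Hypothetical model of a task memory manager: its mode is derived from the
  // (simplified) spark.memory.offHeap.enabled setting of the executor.
  final case class TaskMemoryManagerModel(offHeapEnabled: Boolean) {
    def memoryMode: MemoryMode = if (offHeapEnabled) OffHeap else OnHeap
  }

  // Hypothetical model of LongHashedRelation after the change: it does not pick
  // a mode on its own, it simply adopts the mode of the tmm it was given.
  final case class LongHashedRelationModel(tmm: TaskMemoryManagerModel) {
    def memoryMode: MemoryMode = tmm.memoryMode
  }

  // SHJ hands the running task's tmm (context.taskMemoryManager()) to the relation.
  def buildForShj(taskTmm: TaskMemoryManagerModel): LongHashedRelationModel =
    LongHashedRelationModel(taskTmm)
}
```

Under this model, `buildForShj(TaskMemoryManagerModel(offHeapEnabled = true)).memoryMode` is `OffHeap`, and it is `OnHeap` when the flag is off. No join-specific branching is needed, because the mode is decided by whoever supplies the tmm.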
## For BHJ (Driver)
The code goes through this path:
https://github.com/apache/spark/blob/722bcc0f0d15245a39fae62c0c1c764e4b6a02f8/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L197
then
https://github.com/apache/spark/blob/722bcc0f0d15245a39fae62c0c1c764e4b6a02f8/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L1157-L1166
, where no tmm is passed when creating the hashed relation. In this case, a
temporary on-heap tmm is created and used instead:
https://github.com/apache/spark/blob/722bcc0f0d15245a39fae62c0c1c764e4b6a02f8/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L142-L150.
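The driver-side fallback can be sketched in the same way. This is a hypothetical model, not Spark's real signatures. `TmmModel`, `temporaryOnHeapTmm` and `buildRelation` are illustrative names, and the missing tmm is represented as an `Option` for clarity. When no tmm is supplied, as on the BHJ path, the build falls back to a temporary manager that is always on-heap, so the executor-level off-heap setting never comes into play.

```scala
object BhjDriverSketch {
  // Hypothetical stand-in for Spark's memory mode.
  sealed trait MemoryMode
  case object OnHeap extends MemoryMode
  case object OffHeap extends MemoryMode

  // Hypothetical tmm model whose mode is fixed when it is constructed.
  final case class TmmModel(mode: MemoryMode)

  // Hypothetical temporary manager used when the caller passes no tmm:
  // it is on-heap regardless of spark.memory.offHeap.enabled.
  def temporaryOnHeapTmm(): TmmModel = TmmModel(OnHeap)

  // Use the caller's tmm if one was provided, otherwise the temporary on-heap one,
  // and report the memory mode the relation would end up with.
  def buildRelation(tmm: Option[TmmModel]): MemoryMode =
    tmm.getOrElse(temporaryOnHeapTmm()).mode
}
```

Here `buildRelation(None)` (the BHJ driver case) yields `OnHeap` even if off-heap is enabled, while passing a task's tmm (the SHJ case) yields that tmm's mode.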
## For BHJ (Executor)
Similar to the driver side, the deserialization code also uses a temporary
on-heap tmm:
https://github.com/apache/spark/blob/722bcc0f0d15245a39fae62c0c1c764e4b6a02f8/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L401-L407.
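On the executor, the broadcast relation is rebuilt from bytes. Below is a minimal sketch of that round trip. It assumes, as described above, that the rebuild allocates through a temporary on-heap tmm. `BhjExecutorSketch`, `RelationModel`, `serialize` and `deserialize` are hypothetical names, and the byte layout (a count followed by `Long` keys) is illustrative only, not Spark's format. The `executorOffHeapEnabled` parameter is accepted only to show that it does not affect the result.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object BhjExecutorSketch {
  // Hypothetical stand-in for Spark's memory mode.
  sealed trait MemoryMode
  case object OnHeap extends MemoryMode
  case object OffHeap extends MemoryMode

  // Hypothetical rebuilt relation: its keys plus the mode of the tmm that backed it.
  final case class RelationModel(keys: Seq[Long], mode: MemoryMode)

  // Illustrative layout: number of keys, then each key as a long.
  def serialize(keys: Seq[Long]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(keys.length)
    keys.foreach(k => out.writeLong(k))
    out.flush()
    bytes.toByteArray
  }

  // Rebuild the relation. The temporary tmm used here is on-heap, so the
  // executor's off-heap setting is ignored.
  def deserialize(data: Array[Byte], executorOffHeapEnabled: Boolean): RelationModel = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val n = in.readInt()
    val keys = Seq.fill(n)(in.readLong())
    val temporaryTmmMode: MemoryMode = OnHeap
    RelationModel(keys, temporaryTmmMode)
  }
}
```

For instance, round-tripping `Seq(1L, 2L, 3L)` through `serialize` and then `deserialize(..., executorOffHeapEnabled = true)` gives back the same keys with mode `OnHeap`.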
--
This is an automated message from the Apache Git Service.