Github user JoshRosen commented on the issue:
https://github.com/apache/spark/pull/16989
I think that the current use of `MemoryMode.OFF_HEAP` allocation will cause
problems in out-of-the-box deployments using the default configurations. In
Spark's current memory manager implementation the total amount of Spark-managed
off-heap memory that we will use is controlled by `spark.memory.offHeap.size`
and the default value is 0. In this PR, the comment on
`spark.reducer.maxReqSizeShuffleToMem` says that it should be smaller than
`spark.memory.offHeap.size`, yet its default is 200 megabytes, so the default
configuration is invalid.
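Concretely, the two defaults as they stand (values taken from the discussion above; the property-file syntax is just for illustration):

```properties
# Default proposed in this PR
spark.reducer.maxReqSizeShuffleToMem  200m
# Current default; violates the documented "must be smaller than" constraint
spark.memory.offHeap.size             0
```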
Because `preferDirectBufs()` is `true` by default, it looks like the code
here will always try to reserve memory using `MemoryMode.OFF_HEAP`, and these
reservations will always fail in the default configuration because the off-heap
size is zero. I think the net effect of this patch will therefore be to always
spill to disk.
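A minimal sketch of that failure mode (this is not Spark's actual `MemoryManager` API, just an illustration of the bookkeeping): a pool sized by `spark.memory.offHeap.size`, which defaults to 0, can never grant the shuffle's reservation, so the caller falls back to spilling.

```java
// Illustrative only: models an off-heap pool whose capacity comes from
// spark.memory.offHeap.size. With the default capacity of 0, every
// reservation is denied and the shuffle data spills to disk.
public class OffHeapPool {
    private final long poolSize; // bytes available, per spark.memory.offHeap.size
    private long used = 0;

    public OffHeapPool(long poolSize) {
        this.poolSize = poolSize;
    }

    // Grants the reservation only if it fits in the remaining pool.
    public boolean tryReserve(long bytes) {
        if (used + bytes > poolSize) {
            return false; // denied -> caller falls back to spilling to disk
        }
        used += bytes;
        return true;
    }

    public static void main(String[] args) {
        OffHeapPool pool = new OffHeapPool(0); // default off-heap size
        boolean granted = pool.tryReserve(200L * 1024 * 1024); // ~200 MB request
        System.out.println(granted); // prints "false"
    }
}
```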
One way to address this problem is to set the default value of
`spark.memory.offHeap.size` to match the JVM's internal limit on the amount of
direct-buffer memory that it can allocate, minus some percentage or fixed overhead.
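A hedged sketch of that approach (the helper name and the 10% headroom figure are my own illustration, not Spark's code; note that when `-XX:MaxDirectMemorySize` is unset, the JVM's direct-buffer limit defaults to the max heap size, which `Runtime.maxMemory()` approximates):

```java
// Illustrative helper: derive a candidate default for
// spark.memory.offHeap.size from the JVM's direct-buffer limit,
// leaving fractional headroom for allocations Spark does not track.
public class OffHeapDefaults {
    public static long defaultOffHeapSize(long maxDirectMemory, double headroom) {
        // Clamp at zero so a tiny limit never yields a negative size.
        return Math.max(0L, (long) (maxDirectMemory * (1.0 - headroom)));
    }

    public static void main(String[] args) {
        // With -XX:MaxDirectMemorySize unset, the direct limit defaults
        // to the max heap size, approximated here by maxMemory().
        long maxDirect = Runtime.getRuntime().maxMemory();
        System.out.println(defaultOffHeapSize(maxDirect, 0.10));
    }
}
```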
Basically, the problem is that Spark's off-heap memory manager was originally
designed to manage only the off-heap memory that Spark itself explicitly
allocates when creating its own buffers / pages or caching blocks; it does not
account for off-heap memory used by lower-level code or third-party libraries.
I'll see if I can think of a clean way to fix this, which I think will need to
be done before the defaults used here can work as intended.