Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/16989

I think the current use of `MemoryMode.OFF_HEAP` allocation will cause problems in out-of-the-box deployments that use the default configuration. In Spark's current memory manager implementation, the total amount of Spark-managed off-heap memory is controlled by `spark.memory.offHeap.size`, whose default value is 0. In this PR, the comment on `spark.reducer.maxReqSizeShuffleToMem` says that it should be smaller than `spark.memory.offHeap.size`, yet its default is 200 megabytes, so the default configuration is invalid.

Because `preferDirectBufs()` is `true` by default, the code here will always try to reserve memory using `MemoryMode.OFF_HEAP`, and those reservations will always fail in the default configuration because the off-heap size is zero. The net effect of this patch would therefore be to always spill to disk.

One way to address this would be to set the default value of `spark.memory.offHeap.size` to the JVM's internal limit on the amount of direct buffer memory it can allocate, minus some percentage or fixed overhead. The underlying problem is that Spark's off-heap memory manager was originally designed to manage only off-heap memory explicitly allocated by Spark itself when creating its own buffers / pages or caching blocks, not to account for off-heap memory used by lower-level code or third-party libraries. I'll see if I can think of a clean way to fix this; I think it will need to be done before the defaults used here can work as intended.
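To make the problem concrete, here is a minimal standalone sketch (not Spark code) that hard-codes the two default values quoted above and checks the documented constraint; the `suggestedOffHeapDefault` helper is a hypothetical illustration of the proposed fix, with the direct-memory limit and overhead passed in as assumed parameters rather than read from the JVM:

```java
public class OffHeapDefaultsCheck {
    // spark.memory.offHeap.size default, in bytes (0 by default).
    static final long OFF_HEAP_SIZE = 0L;
    // spark.reducer.maxReqSizeShuffleToMem default (200 megabytes).
    static final long MAX_REQ_SIZE_SHUFFLE_TO_MEM = 200L * 1024 * 1024;

    // The config comment requires maxReqSizeShuffleToMem to be smaller than
    // spark.memory.offHeap.size; with the defaults above this is false, so
    // every OFF_HEAP reservation would fail and the shuffle would spill to disk.
    static boolean defaultsAreValid() {
        return MAX_REQ_SIZE_SHUFFLE_TO_MEM < OFF_HEAP_SIZE;
    }

    // Hypothetical version of the suggested fix: derive the default off-heap
    // size from the JVM's direct-buffer limit minus a fixed safety overhead.
    static long suggestedOffHeapDefault(long maxDirectMemory, long overhead) {
        return Math.max(0L, maxDirectMemory - overhead);
    }

    public static void main(String[] args) {
        System.out.println("defaults valid: " + defaultsAreValid());
        // Example: a 512 MB direct-memory limit with a 64 MB overhead.
        System.out.println("suggested default: "
                + suggestedOffHeapDefault(512L * 1024 * 1024, 64L * 1024 * 1024));
    }
}
```

With these assumed values, `defaultsAreValid()` returns `false`, matching the observation that the out-of-the-box configuration violates the constraint stated in the PR's own config comment.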