Github user JoshRosen commented on the issue:

    https://github.com/apache/spark/pull/16989
  
    I think that the current use of `MemoryMode.OFF_HEAP` allocations will cause 
problems in out-of-the-box deployments that use the default configurations. In 
Spark's current memory manager implementation, the total amount of Spark-managed 
off-heap memory is controlled by `spark.memory.offHeap.size`, whose default 
value is 0. In this PR, the comment on `spark.reducer.maxReqSizeShuffleToMem` 
says that it should be smaller than `spark.memory.offHeap.size`, yet its 
default is 200 megabytes, so the default configuration is invalid.
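    To make the conflict concrete, here is a sketch of the defaults described 
above written as `spark-defaults.conf` entries (values taken from the 
discussion; this is an illustration of the inconsistency, not a recommended 
configuration):

```
# Defaults as described above. The documented constraint
#   spark.reducer.maxReqSizeShuffleToMem < spark.memory.offHeap.size
# is violated out of the box, since 200m is not smaller than 0.
spark.memory.offHeap.enabled          false
spark.memory.offHeap.size             0
spark.reducer.maxReqSizeShuffleToMem  200m
```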
    
    Because `preferDirectBufs()` is `true` by default, it looks like the code 
here will always try to reserve memory using `MemoryMode.OFF_HEAP`, and these 
reservations will always fail under the default configuration because the 
off-heap size is zero. The net effect of this patch would therefore be to 
always spill to disk.
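    The failure mode above can be sketched with a toy pool (hypothetical code, 
not Spark's actual `MemoryManager` classes): when the pool size defaults to 
zero, every reservation is granted zero bytes, so the fetch can never be held 
in memory.

```java
// Hypothetical sketch, not Spark's actual memory manager: illustrates why
// every off-heap reservation fails when the pool size defaults to 0 bytes.
final class OffHeapPool {
    private final long poolSize; // analogue of spark.memory.offHeap.size
    private long used = 0L;

    OffHeapPool(long poolSize) {
        this.poolSize = poolSize;
    }

    /** Grants as many of the requested bytes as the pool can still supply. */
    long acquire(long requested) {
        long granted = Math.min(requested, poolSize - used);
        used += granted;
        return granted;
    }
}

public class SpillDemo {
    public static void main(String[] args) {
        // Default configuration: zero off-heap memory available.
        OffHeapPool pool = new OffHeapPool(0L);
        long request = 200L * 1024 * 1024; // a 200 MB shuffle fetch
        long granted = pool.acquire(request);
        // granted is 0, so the fetch cannot be kept in memory and must spill.
        System.out.println("granted=" + granted + ", spill=" + (granted < request));
    }
}
```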
    
    One way to address this is to set the default value of 
`spark.memory.offHeap.size` to the JVM's internal limit on the amount of 
direct buffer memory it can allocate, minus some percentage or fixed overhead. 
The underlying problem is that Spark's off-heap memory manager was originally 
designed to manage only the off-heap memory that Spark itself explicitly 
allocates when creating its own buffers / pages or caching blocks, not to 
account for off-heap memory used by lower-level code or third-party libraries. 
I'll see if I can come up with a clean way to fix this, which I think will 
need to happen before the defaults used here can work as intended.
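    Until the defaults are reconciled, one hypothetical workaround along those 
lines is to give the JVM an explicit direct-memory cap and size Spark's 
off-heap pool below it, keeping the fetch threshold smaller still. The values 
below are purely illustrative, not recommendations:

```
# Hypothetical workaround sketch: cap direct memory explicitly and keep
#   spark.reducer.maxReqSizeShuffleToMem < spark.memory.offHeap.size
#     < MaxDirectMemorySize (minus some overhead for Netty etc.)
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:MaxDirectMemorySize=2g" \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=1500m \
  --conf spark.reducer.maxReqSizeShuffleToMem=200m \
  ...
```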

