GitHub user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/15722
  
    @jiexiong PR descriptions are used in git commit messages, so they should 
be clear and concise. The fix LGTM, but the description should be improved for 
future reference. How about we change it to the following (I only edited the 
comments in the PR):
    
    ## What changes were proposed in this pull request?
    `BytesToBytesMap` currently does not release the in-memory storage (the 
`longArray` variable) after it spills to disk. This is typically not a problem 
during aggregation because the `longArray` should be much smaller than the pages, 
and because we grow the `longArray` at a conservative rate.
    
    However, this can lead to an OOM when an already running task is allocated 
more than its fair share of memory, which can happen because of a scheduling 
delay. In this case the `longArray` can grow beyond the task's fair share of 
memory. This becomes problematic when the task spills and the `longArray` is 
not freed: subsequent memory allocation requests are then denied by the memory 
manager, resulting in an OOM.
    
    This PR fixes this issue by freeing the `longArray` when the 
`BytesToBytesMap` spills.
    
    ## How was this patch tested?
    Existing tests, and tested on real-world workloads.
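
    For anyone reading this later, the failure mode can be pictured with a toy 
model. This is only a sketch: `ToyMemoryManager`, `ToyMap`, the byte counts, and 
the `freeArrayOnSpill` flag are all hypothetical names made up for illustration 
and are not Spark's actual classes or API. The fixed limit stands in for the 
memory the task is allowed to hold once other tasks have claimed their share.

```java
// Toy model only: every class and number here is hypothetical, not Spark's API.
final class SpillSketch {

    /** Stand-in for the memory manager: grants requests up to a fixed limit. */
    static final class ToyMemoryManager {
        private final long limit;
        private long used;

        ToyMemoryManager(long limit) { this.limit = limit; }

        /** Grants the request in full or denies it; never grants partially. */
        boolean acquire(long bytes) {
            if (used + bytes > limit) return false;
            used += bytes;
            return true;
        }

        void release(long bytes) { used -= bytes; }
    }

    /** Stand-in for a map that holds a pointer array plus data pages. */
    static final class ToyMap {
        private final ToyMemoryManager mm;
        private final boolean freeArrayOnSpill;
        private long arrayBytes;
        private long pageBytes;

        ToyMap(ToyMemoryManager mm, boolean freeArrayOnSpill) {
            this.mm = mm;
            this.freeArrayOnSpill = freeArrayOnSpill;
        }

        boolean growArray(long bytes) {
            if (!mm.acquire(bytes)) return false;
            arrayBytes += bytes;
            return true;
        }

        boolean addPage(long bytes) {
            if (!mm.acquire(bytes)) return false;
            pageBytes += bytes;
            return true;
        }

        /** Spilling always releases the pages; the array only if the flag is set. */
        void spill() {
            mm.release(pageBytes);
            pageBytes = 0;
            if (freeArrayOnSpill) {
                mm.release(arrayBytes);
                arrayBytes = 0;
            }
        }
    }

    /** Returns whether a follow-up allocation succeeds after the map spills. */
    static boolean allocationSucceedsAfterSpill(boolean freeArrayOnSpill) {
        ToyMemoryManager mm = new ToyMemoryManager(100);
        ToyMap map = new ToyMap(mm, freeArrayOnSpill);
        map.growArray(80); // the array grew large while the task held extra memory
        map.addPage(20);
        map.spill();
        return mm.acquire(60); // a later request from the same task
    }

    public static void main(String[] args) {
        System.out.println("array kept:  " + allocationSucceedsAfterSpill(false));
        System.out.println("array freed: " + allocationSucceedsAfterSpill(true));
    }
}
```

In this toy setup the follow-up request is denied when the array is kept across 
the spill and granted when the array is released with the pages, which mirrors 
the behaviour change described above.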

