Github user advancedxy commented on the issue:

    https://github.com/apache/spark/pull/23083
  
    > For the task completion listener, I think it's an overkill to introduce a 
new API, do you know where exactly we leak the memory? and can we null it out 
when the ShuffleBlockFetcherIterator reaches to its end?
    
    If I understand correctly, the memory is leaked because the external sorter is 
referenced by a `TaskCompletionListener` and is only GCed when the task 
completes. However, for `coalesce` or similar APIs, multiple 
`BlockStoreShuffleReader`s are created since there are multiple input sources, so 
each internal sorter is not released until all shuffle readers are consumed and 
the task finishes.
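    
    To make the lifetime issue concrete, here is a minimal, self-contained sketch 
of the pattern. These `Sorter`, `TaskContext`, and `ShuffleReader` classes are 
simplified stand-ins, not the real Spark classes; the point is only that a 
completion listener keeps the sorter reachable for the whole task, so in a 
multi-reader task the buffers pile up unless something nulls them out earlier:

```scala
import scala.collection.mutable.ArrayBuffer

// Stand-in for an external sorter holding a large in-memory buffer.
final class Sorter {
  var buffer: Array[Byte] = new Array[Byte](16 * 1024 * 1024)
  def stop(): Unit = buffer = null // release the memory
}

// Stand-in for a task context: listeners only run at task completion.
final class TaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(_.apply())
}

// Stand-in for a shuffle reader: it registers cleanup on the task
// context, so the sorter stays strongly reachable until the task ends.
final class ShuffleReader(ctx: TaskContext) {
  private var sorter: Sorter = new Sorter
  ctx.addTaskCompletionListener(() => if (sorter != null) sorter.stop())

  // Consuming the reader's output does NOT drop the sorter reference...
  def readAll(): Int = sorter.buffer.length

  // ...unless we null it out eagerly once the output is fully consumed
  // (the kind of fix being discussed in this thread).
  def releaseEagerly(): Unit = { sorter.stop(); sorter = null }
}

object LeakDemo {
  def main(args: Array[String]): Unit = {
    val ctx = new TaskContext
    // A coalesce-like task creates several readers within one task.
    val readers = (1 to 4).map(_ => new ShuffleReader(ctx))
    readers.foreach(_.readAll())
    // Without eager release, all four 16 MB buffers are still live
    // here, because each completion listener references its sorter.
    readers.foreach(_.releaseEagerly())
    ctx.markTaskCompleted() // listeners are now no-ops
    println("done")
  }
}
```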
    
    Introducing a new public API may indeed be overkill. However, I think we can 
limit it to `private[spark]` scope.
    Like @szhem, I haven't figured out another way to null out the sorter 
reference yet.

