Github user advancedxy commented on the issue:
https://github.com/apache/spark/pull/23083
> For the task completion listener, I think it's an overkill to introduce a
> new API, do you know where exactly we leak the memory? and can we null it out
> when the ShuffleBlockFetcherIterator reaches to its end?
If I understand correctly, the memory is leaked because the external sorter is
referenced by a `TaskCompletionListener` and is only GC'ed when the task
completes. However, for `coalesce` or similar APIs, multiple
`BlockStoreShuffleReader`s are created since there are multiple input sources,
so each internal sorter is not released until all shuffle readers are consumed
and the task finishes.
Introducing a new public API may indeed be an overkill. However, I think we can
limit it to the `private[spark]` scope.
Like @szhem, I haven't figured out another way to null out the sorter
reference yet.
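The retention pattern described above can be sketched as follows. This is a
minimal, self-contained illustration, not Spark's real code: `Sorter`,
`ShuffleReader`, `taskCompletionListeners`, and `onIteratorExhausted` are all
hypothetical stand-ins. The point is that a completion listener closing over
the reader keeps its sorter reachable for the whole task, while a per-reader
hook (what the proposed API would expose) lets each sorter be dropped as soon
as its iterator is consumed:

```scala
object LeakSketch {
  // Stand-in for the external sorter holding task memory.
  class Sorter { var buffer: Array[Byte] = new Array[Byte](1024) }

  // Stand-in for TaskContext: these callbacks run only at task completion.
  val taskCompletionListeners =
    scala.collection.mutable.Buffer.empty[() => Unit]

  class ShuffleReader {
    var sorter: Sorter = new Sorter
    // The listener closes over `this`, so the sorter stays strongly
    // reachable until the task finishes, even if this reader's iterator
    // was exhausted long ago.
    taskCompletionListeners += (() => { sorter = null })

    // A per-reader cleanup hook would let the reader release the sorter
    // as soon as its own input is fully consumed.
    def onIteratorExhausted(): Unit = { sorter = null }
  }

  def main(args: Array[String]): Unit = {
    // With `coalesce`, one task creates several shuffle readers.
    val readers = Seq.fill(3)(new ShuffleReader)

    readers.head.onIteratorExhausted()          // first input fully consumed
    assert(readers.head.sorter == null)         // its memory can be GC'ed now
    assert(readers(1).sorter != null)           // others still held

    taskCompletionListeners.foreach(_.apply())  // task-completion cleanup
    assert(readers.forall(_.sorter == null))    // only now is everything freed
    println("ok")
  }
}
```

Without the per-reader hook, all three sorters survive until the final
`foreach` over the completion listeners, which is exactly the retention
window the PR is trying to shrink.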