GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21738
[SPARK-21743][SQL][followup] free aggregate map when task ends
## What changes were proposed in this pull request?
This is the first follow-up of https://github.com/apache/spark/pull/21573 ,
which was only merged to 2.3.
This PR fixes the memory leak in another way: free the `UnsafeExternalMap`
when the task ends. All the data buffers in Spark SQL use
`UnsafeExternalMap` and `UnsafeExternalSorter` under the hood, e.g. sort,
aggregate, window, SMJ, etc. `UnsafeExternalSorter` already registers a task
completion listener to free its resources; we should do the same for
`UnsafeExternalMap`.
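The core idea is the task-completion-listener pattern: register a callback that frees the map's buffers when the task finishes, so memory is reclaimed even if downstream operators stop consuming early. A minimal self-contained sketch of that pattern (the `TaskContext` and `AggregateMap` classes here are simplified stand-ins, not Spark's real implementations):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for Spark's TaskContext: runs registered
// listeners when the task finishes (success or failure).
class TaskContext {
    private final List<Runnable> onComplete = new ArrayList<>();
    void addTaskCompletionListener(Runnable listener) { onComplete.add(listener); }
    void markTaskCompleted() { onComplete.forEach(Runnable::run); }
}

// Hypothetical aggregate map holding per-key buffers.
class AggregateMap {
    private final Map<String, long[]> buffers = new HashMap<>();
    boolean freed = false;
    void put(String key, long[] value) { buffers.put(key, value); }
    void free() { buffers.clear(); freed = true; }
}

public class Demo {
    public static void main(String[] args) {
        TaskContext ctx = new TaskContext();
        AggregateMap map = new AggregateMap();
        // Mirror the PR's approach: tie the map's lifetime to the task,
        // instead of relying on the consumer to drain all output.
        ctx.addTaskCompletionListener(map::free);
        map.put("key", new long[]{1L});
        ctx.markTaskCompleted();
        System.out.println(map.freed); // prints true
    }
}
```

The benefit over freeing in the operator itself is that the listener fires even when the task ends early, e.g. because a `limit` short-circuits the iterator.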
TODO in the next PR:
do not consume all the inputs when there is a limit in whole-stage codegen.
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark limit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21738.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21738
----
commit 174c4e55b897beaf51e395376bbb3d651d394d94
Author: Wenchen Fan <wenchen@...>
Date: 2018-07-09T16:18:31Z
free aggregate map when task ends
----
---