GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/19378
[SPARK-22143][SQL][BRANCH-2.2] Fix memory leak in OffHeapColumnVector
This is a backport of
https://github.com/apache/spark/commit/02bb0682e68a2ce81f3b98d33649d368da7f2b3d.
## What changes were proposed in this pull request?
`WritableColumnVector` does not close its child column vectors. This can
create memory leaks for `OffHeapColumnVector`, where the memory allocated by a
vector's children is never freed. This can be especially bad for string
columns (which use a child byte column vector).
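
To illustrate the shape of the problem, here is a minimal sketch, not the actual Spark code. `SketchVector`, its `ALLOCATED` counter, and `ChildCloseSketch` are hypothetical names invented for this example; the counter stands in for off-heap memory so that a leak becomes observable. The point is only that a parent's `close()` has to propagate to its children before releasing its own buffer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a column vector that owns off-heap memory.
// This is NOT Spark's WritableColumnVector; it only models the ownership relation.
class SketchVector implements AutoCloseable {
    // Tracks bytes currently "allocated" so that a leak shows up as a non-zero value.
    static final AtomicLong ALLOCATED = new AtomicLong();

    private final long size;
    private SketchVector[] children;
    private boolean closed;

    SketchVector(long size, SketchVector... children) {
        this.size = size;
        this.children = children;
        ALLOCATED.addAndGet(size);
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice must not release the memory twice
        }
        closed = true;
        // Close the children first. Skipping this loop is the leak described
        // above: the parent's buffer is released but the children's are not.
        if (children != null) {
            for (SketchVector child : children) {
                if (child != null) {
                    child.close();
                }
            }
            children = null;
        }
        ALLOCATED.addAndGet(-size);
    }
}

public class ChildCloseSketch {
    public static void main(String[] args) {
        // A "string" vector backed by a child "byte" vector, as in the description.
        SketchVector bytes = new SketchVector(1024);
        SketchVector strings = new SketchVector(64, bytes);
        strings.close();
        System.out.println("outstanding bytes: " + SketchVector.ALLOCATED.get());
    }
}
```

With the child loop in place the program reports zero outstanding bytes; removing the loop would leave the 1024 bytes of the child vector unreleased.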
## How was this patch tested?
I have updated the existing tests to always use both on-heap and off-heap
vectors. Testing and diagnosis were done locally.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-22143-2.2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19378.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19378
----
commit c58a9e59a9bfb60ef85731b11213eba464c5076a
Author: Herman van Hovell <[email protected]>
Date: 2017-09-27T21:08:30Z
[SPARK-22143][SQL] Fix memory leak in OffHeapColumnVector
## What changes were proposed in this pull request?
`WritableColumnVector` does not close its child column vectors. This can
create memory leaks for `OffHeapColumnVector`, where the memory allocated by a
vector's children is never freed. This can be especially bad for string
columns (which use a child byte column vector).
## How was this patch tested?
I have updated the existing tests to always use both on-heap and off-heap
vectors. Testing and diagnosis were done locally.
Author: Herman van Hovell <[email protected]>
Closes #19367 from hvanhovell/SPARK-22143.
# Conflicts:
#
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
#
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
#
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
commit 20540ecab64e367f3e30166545dc8725fb268a46
Author: Herman van Hovell <[email protected]>
Date: 2017-09-28T08:47:13Z
Fix up ColumnarBatchSuite
----