GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/19378

    [SPARK-22143][SQL][BRANCH-2.2] Fix memory leak in OffHeapColumnVector

    This is a backport of 
https://github.com/apache/spark/commit/02bb0682e68a2ce81f3b98d33649d368da7f2b3d.
    
    ## What changes were proposed in this pull request?
    `WritableColumnVector` does not close its child column vectors. This can
create memory leaks for `OffHeapColumnVector`, where we do not clean up the
memory allocated by a vector's children. This can be especially bad for string
columns (which use a child byte column vector).
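
    As a rough illustration of the leak pattern (not the actual patch), the
sketch below models a vector that owns an allocation and may have children.
`SketchVector`, `addChild`, and `liveAllocations` are hypothetical names made
up for this example; the counter merely stands in for off-heap memory. The
point is only that `close()` on the parent must also close every child.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a column vector that owns native memory.
// None of these names come from Spark; they exist only for this sketch.
class SketchVector implements AutoCloseable {
    // Number of allocations not yet released (a stand-in for off-heap bytes).
    static int liveAllocations = 0;

    private final List<SketchVector> children = new ArrayList<>();
    private boolean closed = false;

    SketchVector() {
        liveAllocations++; // "allocate" memory for this vector
    }

    // A string column, for example, would hold its bytes in a child vector.
    SketchVector addChild() {
        SketchVector child = new SketchVector();
        children.add(child);
        return child;
    }

    @Override
    public void close() {
        if (closed) {
            return; // closing twice is a no-op
        }
        closed = true;
        // Without this loop, the children's allocations would never be released.
        for (SketchVector child : children) {
            child.close();
        }
        children.clear();
        liveAllocations--; // "free" this vector's own memory
    }
}
```

    If the loop over `children` is removed from this sketch, closing a
parent with one child leaves `liveAllocations` at 1, which is the kind of
leak described above.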
    
    ## How was this patch tested?
    I have updated the existing tests to always use both on-heap and off-heap 
vectors. Testing and diagnosis were done locally.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-22143-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19378.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19378
    
----
commit c58a9e59a9bfb60ef85731b11213eba464c5076a
Author: Herman van Hovell <[email protected]>
Date:   2017-09-27T21:08:30Z

    [SPARK-22143][SQL] Fix memory leak in OffHeapColumnVector
    
    ## What changes were proposed in this pull request?
    `WritableColumnVector` does not close its child column vectors. This can
create memory leaks for `OffHeapColumnVector`, where we do not clean up the
memory allocated by a vector's children. This can be especially bad for string
columns (which use a child byte column vector).
    
    ## How was this patch tested?
    I have updated the existing tests to always use both on-heap and off-heap 
vectors. Testing and diagnosis were done locally.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #19367 from hvanhovell/SPARK-22143.
    
    # Conflicts:
    #   sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java
    #   sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
    #   sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala

commit 20540ecab64e367f3e30166545dc8725fb268a46
Author: Herman van Hovell <[email protected]>
Date:   2017-09-28T08:47:13Z

    Fix up ColumnarBatchSuite

----

