GitHub user sachouche opened a pull request:

    https://github.com/apache/drill/pull/1087

    Attempt to fix memory leak in Parquet

    ** Problem Description **
    This is an extremely rare leak, which I was able to reproduce by adding a sleep in the AsyncPageReader right after it reads a page and before it enqueues the result in the result queue. This is how the issue could manifest itself in a real-life scenario:
    - The AsyncPageReader thread reads a page into a buffer but has not yet enqueued the result (the thread got preempted)
    - The Parquet scan thread is blocked waiting on the task (the Future object has already been dequeued)
    - A cancel is received and the scan thread is interrupted
    - Future.get() throws an InterruptedException (the Future object is lost)
    - The scan thread executes the release logic
    - The scan thread cannot interrupt the AsyncPageReader thread because the Future object is lost
    - The AsyncPageReader thread resumes and enqueues the DrillBuf in the result queue
    - This results in a leak because this buffer is never released
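
    The interleaving above can be replayed deterministically with plain java.util.concurrent primitives. The sketch below is not Drill code: the class LeakyScanDemo, the Buf type standing in for DrillBuf, the outstanding counter, and the two latches (which play the role of the injected sleep) are all hypothetical. It assumes the scan thread removes the Future from the task queue before calling get(), as described above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LeakyScanDemo {
    // Hypothetical stand-in for DrillBuf: counts buffers that were never released.
    static final AtomicInteger outstanding = new AtomicInteger();

    static final class Buf {
        Buf() { outstanding.incrementAndGet(); }
        void release() { outstanding.decrementAndGet(); }
    }

    /** Replays the leaky interleaving and returns the number of unreleased buffers. */
    static int run() throws Exception {
        ExecutorService readerPool = Executors.newSingleThreadExecutor();
        BlockingQueue<Future<Buf>> tasks = new LinkedBlockingQueue<>();
        CountDownLatch pageRead = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        // "AsyncPageReader": reads a page into a buffer, then stalls before handing it off.
        tasks.add(readerPool.submit(() -> {
            Buf buf = new Buf();
            pageRead.countDown();
            resume.await();              // plays the role of the injected sleep / preemption
            return buf;
        }));
        pageRead.await();

        // "Scan thread": dequeues the Future, then blocks on get() and gets interrupted.
        Future<Buf> head = tasks.take(); // the Future is no longer in the queue
        Thread.currentThread().interrupt(); // simulate the cancellation
        try {
            head.get();
        } catch (InterruptedException e) {
            // FutureTask.get() cleared the interrupt flag; the demo moves on to the release logic
        }

        // Release logic: it can only cancel Futures still in the queue, and there are none.
        Future<Buf> pending;
        while ((pending = tasks.poll()) != null) {
            pending.cancel(true);
        }

        resume.countDown();              // the reader resumes and completes with its buffer
        readerPool.shutdown();
        readerPool.awaitTermination(5, TimeUnit.SECONDS);
        return outstanding.get();        // the reader's buffer was never released
    }
}
```

    Here run() returns 1: the buffer allocated by the reader outlives the release logic because the only queue reference to its Future was dropped before get() was interrupted.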
    
    ** Fix Description **
    - The fix is straightforward: the scan thread peeks at the Future object (instead of dequeuing it) before the blocking get() call
    - This way, an exception (such as an interrupt) leaves the Future object in the task queue
    - The cleanup logic can then guarantee that the DrillBuf is released by either the AsyncPageReader thread or the Parquet scan thread
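
    A hypothetical harness, changed as the fix describes, illustrates why peeking is enough. This is an illustrative sketch, not the patch itself: the names PeekScanDemo and Buf, removing the head only after a successful get(), a reader that releases its own buffer when interrupted, and a cleanup loop that calls cancel(true) on each queued Future and releases the result of any Future that had already completed are all assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PeekScanDemo {
    // Hypothetical stand-in for DrillBuf: counts buffers that were never released.
    static final AtomicInteger outstanding = new AtomicInteger();

    static final class Buf {
        Buf() { outstanding.incrementAndGet(); }
        void release() { outstanding.decrementAndGet(); }
    }

    /** Replays the same interleaving with peek-based dequeuing; returns unreleased buffers. */
    static int run() throws Exception {
        ExecutorService readerPool = Executors.newSingleThreadExecutor();
        BlockingQueue<Future<Buf>> tasks = new LinkedBlockingQueue<>();
        CountDownLatch pageRead = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);

        // Reader: if interrupted before the hand-off, it releases its own buffer.
        tasks.add(readerPool.submit(() -> {
            Buf buf = new Buf();
            pageRead.countDown();
            try {
                resume.await();
            } catch (InterruptedException e) {
                buf.release();           // cancelled before hand-off: the reader frees the buffer
                throw e;
            }
            return buf;
        }));
        pageRead.await();

        // Scan thread: peek instead of dequeue, so an exception leaves the Future queued.
        Future<Buf> head = tasks.peek();
        Thread.currentThread().interrupt(); // simulate the cancellation
        try {
            Buf page = head.get();
            tasks.remove();              // dequeue only after get() succeeded
            page.release();              // a real consumer would use the page first
        } catch (InterruptedException e) {
            // the Future is still at the head of the queue
        }

        // Cleanup: every in-flight Future is still reachable from the queue.
        Future<Buf> pending;
        while ((pending = tasks.poll()) != null) {
            if (!pending.cancel(true)) { // already completed: the scan side owns the buffer
                try {
                    pending.get().release();
                } catch (ExecutionException | CancellationException ignored) {
                    // the reader failed or was cancelled, so no buffer was handed off
                }
            }
        }

        resume.countDown();
        readerPool.shutdown();
        readerPool.awaitTermination(5, TimeUnit.SECONDS);
        return outstanding.get();
    }
}
```

    Here run() returns 0: the interrupted get() leaves the Future queued, so the cleanup loop cancels it and the reader, interrupted while stalled, releases its own buffer. Had the reader finished first, cancel(true) would return false and the cleanup loop would release the buffer obtained from get() instead.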

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sachouche/drill DRILL-6079

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1087.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1087
    
----
commit 52030d1d9cc3b8992a10ade8c7126d66e785043a
Author: Salim Achouche <sachouche2@...>
Date:   2017-12-22T19:50:56Z

    Attempt to fix memory leak in Parquet

----

