GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/20590

    [SPARK-23399][SQL] Register a task completion listener first for OrcColumnarBatchReader

    ## What changes were proposed in this pull request?
    
    This PR aims to resolve an open file leakage issue reported at SPARK-23390 by registering the task completion listener earlier. Currently, the sequence is as follows.
    
    1. Create `batchReader`
    2. `batchReader.initialize` opens an ORC file.
    3. `batchReader.initBatch` may take a long time to allocate memory in some environments and can fail with errors.
    4. `Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))`
    
    This PR moves step 4 before steps 2 and 3. In short, the new sequence is 1 -> 4 -> 2 -> 3.
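    
    To illustrate why the ordering matters, here is a minimal, self-contained Scala sketch. `FakeTaskContext` and `FakeBatchReader` are hypothetical stand-ins, not Spark classes: they only model that completion listeners run when a task ends (even when it fails), and that `initBatch` can throw after `initialize` has already opened the file. The real step 4 closes `iter` rather than the reader directly; that detail is simplified here.
    
    ```scala
    import scala.collection.mutable.ArrayBuffer
    
    object ListenerOrder {
      // Hypothetical stand-in for TaskContext: records listeners, runs them at task end.
      final class FakeTaskContext {
        private val listeners = ArrayBuffer.empty[() => Unit]
        def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
        def markTaskCompleted(): Unit = listeners.foreach(f => f())
      }
    
      // Hypothetical reader: `initialize` opens a file, `initBatch` may throw.
      final class FakeBatchReader(failInitBatch: Boolean) {
        var fileOpen = false
        def initialize(): Unit = fileOpen = true // step 2: opens the file
        def initBatch(): Unit =                  // step 3: simulated allocation failure
          if (failInitBatch) throw new RuntimeException("simulated initBatch failure")
        def close(): Unit = fileOpen = false
      }
    
      // Old order 1 -> 2 -> 3 -> 4: if initBatch throws, the listener is never registered.
      def oldOrder(ctx: FakeTaskContext, reader: FakeBatchReader): Unit = {
        reader.initialize()
        reader.initBatch()
        ctx.addTaskCompletionListener(() => reader.close())
      }
    
      // New order 1 -> 4 -> 2 -> 3: the listener exists before the file is opened.
      def newOrder(ctx: FakeTaskContext, reader: FakeBatchReader): Unit = {
        ctx.addTaskCompletionListener(() => reader.close())
        reader.initialize()
        reader.initBatch()
      }
    
      // Runs one failing "task" and reports whether the file is still open afterwards.
      def leaks(run: (FakeTaskContext, FakeBatchReader) => Unit): Boolean = {
        val ctx = new FakeTaskContext
        val reader = new FakeBatchReader(failInitBatch = true)
        try run(ctx, reader)
        catch { case _: RuntimeException => () } // the task fails...
        finally ctx.markTaskCompleted()          // ...but completion listeners still fire
        reader.fileOpen
      }
    }
    ```
    
    With a failing `initBatch`, `ListenerOrder.leaks(ListenerOrder.oldOrder)` returns `true` (the file stays open), while `ListenerOrder.leaks(ListenerOrder.newOrder)` returns `false`, because the cleanup hook was registered before the resource was acquired.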
    
    ## How was this patch tested?
    
    Currently, I couldn't find a way to add a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-23399

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20590
    
----
commit 198f1861cfe4d2cd544cb3a09d3a271de1b656ab
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-02-12T21:46:49Z

    [SPARK-23399][SQL] Register a task completion listener first for OrcColumnarBatchReader

----