GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/20590
[SPARK-23399][SQL] Register a task completion listener first for OrcColumnarBatchReader

## What changes were proposed in this pull request?

This PR aims to resolve the open file leak reported in SPARK-23390 by moving the point at which the task completion listener is registered. Currently, the sequence is as follows.

1. Create `batchReader`.
2. `batchReader.initialize` opens an ORC file.
3. `batchReader.initBatch` may take a long time to allocate memory in some environments and may fail.
4. `Option(TaskContext.get()).foreach(_.addTaskCompletionListener(_ => iter.close()))`

If step 3 fails, step 4 never runs, so the file opened in step 2 is never closed. This PR moves step 4 before steps 2 and 3. To sum up, the new sequence is 1 -> 4 -> 2 -> 3.

## How was this patch tested?

Currently, I couldn't find a way to add a test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-23399

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20590.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20590

----
commit 198f1861cfe4d2cd544cb3a09d3a271de1b656ab
Author: Dongjoon Hyun <dongjoon@...>
Date:   2018-02-12T21:46:49Z

    [SPARK-23399][SQL] Register a task completion listner first for OrcColumnarBatchReader

----
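
The effect of the reordering (1 -> 4 -> 2 -> 3) can be illustrated with a small self-contained sketch. It does not use Spark: `ToyTaskContext`, `ToyBatchReader`, and `ListenerOrder` are hypothetical stand-ins invented for this illustration, the listener signature is simplified to `() => Unit`, and the failure in `initBatch` is simulated with a `RuntimeException`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a task context: it records completion listeners
// and runs them when the task finishes, whether it succeeded or failed.
final class ToyTaskContext {
  private val listeners = ArrayBuffer.empty[() => Unit]
  def addTaskCompletionListener(f: () => Unit): Unit = listeners += f
  def markTaskCompleted(): Unit = listeners.foreach(f => f())
}

// Hypothetical stand-in for the batch reader: initialize() "opens" a file,
// initBatch() simulates an allocation failure when asked to.
final class ToyBatchReader(failInInitBatch: Boolean) extends AutoCloseable {
  var fileOpen: Boolean = false
  def initialize(): Unit = { fileOpen = true }
  def initBatch(): Unit = {
    if (failInInitBatch) throw new RuntimeException("simulated allocation failure")
  }
  override def close(): Unit = { fileOpen = false }
}

object ListenerOrder {
  // Runs one simulated task and returns true if the file is still open
  // (that is, leaked) after the task has completed.
  def runTask(registerFirst: Boolean): Boolean = {
    val ctx = new ToyTaskContext
    val reader = new ToyBatchReader(failInInitBatch = true)
    try {
      // New order: register the cleanup before anything is opened.
      if (registerFirst) ctx.addTaskCompletionListener(() => reader.close())
      reader.initialize()
      reader.initBatch()
      // Old order: registration is only reached if initBatch() succeeded.
      if (!registerFirst) ctx.addTaskCompletionListener(() => reader.close())
    } catch {
      case _: RuntimeException => // the task fails; fall through to completion
    } finally {
      ctx.markTaskCompleted()
    }
    reader.fileOpen
  }
}
```

In this sketch, `ListenerOrder.runTask(registerFirst = false)` reports a leaked file because the listener was never registered, while `ListenerOrder.runTask(registerFirst = true)` reports that the file was closed on task completion.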