Ngone51 opened a new pull request #24467: [SPARK-27568][CORE] Fix readLock leak while calling take()/first() on a cached rdd
URL: https://github.com/apache/spark/pull/24467

## What changes were proposed in this pull request?

Currently, if we run the code below in Spark:

```
sc.parallelize(Range(0, 10), 1).cache().take(1)
```

we'll see the line below in the log:

**19/04/25 23:48:54 INFO Executor: 1 block locks were not released by TID = 0: [rdd_0_0]**

and, if we set `spark.storage.exceptionOnPinLeak=true`, the job will fail.

Normally, the readLock for a block is released once all of its elements have been consumed through a `CompletionIterator`. However, operations like take()/first() do not need to consume all elements, so the release logic is never triggered. This PR manually calls `completion()` on the `CompletionIterator` if the iterator still has a next element after the task finishes, so that the readLock is released within `completion()`.

## How was this patch tested?

Added.
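To illustrate the mechanism the PR relies on, here is a minimal, self-contained sketch of the completion-iterator pattern. `SimpleCompletionIterator` is a hypothetical stand-in for Spark's `CompletionIterator` (names and structure are assumptions for illustration, not Spark's actual implementation): a cleanup callback, standing in for the readLock release, fires automatically only when the wrapped iterator is exhausted, so a take()-style partial consumption must call `completion()` explicitly.

```scala
// Hypothetical sketch of the CompletionIterator pattern; not Spark's
// actual org.apache.spark.util.CompletionIterator.
class SimpleCompletionIterator[A](sub: Iterator[A])(onComplete: => Unit)
    extends Iterator[A] {
  private var completed = false

  // Run the cleanup callback exactly once, even if called again later.
  def completion(): Unit = {
    if (!completed) {
      completed = true
      onComplete
    }
  }

  override def hasNext: Boolean = {
    val r = sub.hasNext
    if (!r) completion() // auto-release once fully consumed
    r
  }

  override def next(): A = sub.next()
}

object Demo {
  // Returns (first element, callback fired before the fix, after the fix).
  def run(): (Int, Boolean, Boolean) = {
    var lockReleased = false // stands in for the block readLock
    val it =
      new SimpleCompletionIterator(Iterator(1, 2, 3))({ lockReleased = true })

    // take(1)-style partial consumption: the cleanup never fires on its own.
    val first = it.next()
    val releasedBefore = lockReleased

    // The fix: at task end, if elements remain, call completion() manually.
    if (it.hasNext) it.completion()
    (first, releasedBefore, lockReleased)
  }
}
```

Running `Demo.run()` shows the callback stays unfired after consuming only one element, and fires once `completion()` is invoked manually, mirroring how the leaked readLock in the PR is released at task end.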
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]

With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
