Ngone51 opened a new pull request #24467: [SPARK-27568][CORE] Fix readLock leak while calling take()/first() on a cached rdd
URL: https://github.com/apache/spark/pull/24467
 
 
   ## What changes were proposed in this pull request?
   
   Currently, if we run the code below in Spark:
   ```scala
   sc.parallelize(Range(0, 10), 1).cache().take(1)
   ```
    we'll see the following line in the log:
   
   **19/04/25 23:48:54 INFO Executor: 1 block locks were not released by TID = 0:
   [rdd_0_0]**
   
    And if we set `spark.storage.exceptionOnPinLeak=true`, the job will fail.
   
   Normally, we always release the readLock for a block once all of its elements
   have been consumed through a `CompletionIterator`.
   However, operations like take()/first() do not need to consume all elements,
   so the release is never triggered.
   
   This PR suggests manually calling `completion()` on the `CompletionIterator`
   if the iterator still has a next element after the task finishes, so that the
   readLock is released within `completion()`.
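   To illustrate the idea (a minimal sketch, not Spark's actual classes: the
   iterator wrapper, the `releaseLock`-style callback, and the task-end hook
   shown here are simplified stand-ins), a completion iterator runs its
   callback exactly once, either when it is exhausted or when `completion()`
   is invoked manually at task end:
   
   ```scala
   // Sketch of a completion iterator: the callback fires once, either on
   // exhaustion or via an explicit completion() call (e.g. from a
   // task-completion listener after take(1) stopped early).
   class SimpleCompletionIterator[A](sub: Iterator[A], completionFunction: => Unit)
       extends Iterator[A] {
     private var completed = false
     def next(): A = sub.next()
     def hasNext: Boolean = {
       val r = sub.hasNext
       if (!r) completion() // exhausted: normal release path
       r
     }
     def completion(): Unit = {
       if (!completed) {
         completed = true
         completionFunction // e.g. release the block's read lock
       }
     }
   }
   
   // Simulate take(1): only one element is consumed, so the iterator is not
   // exhausted; the task-end hook then calls completion() to release the lock.
   var lockReleased = false
   val it = new SimpleCompletionIterator(Iterator(1, 2, 3), { lockReleased = true })
   val first = it.next()
   if (it.hasNext) it.completion()
   ```
   
   The key point is the idempotency guard (`completed`), which makes the
   explicit task-end call safe even if the iterator was already exhausted.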
   
   ## How was this patch tested?
   
   Added.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
