JoshRosen commented on PR #36751:
URL: https://github.com/apache/spark/pull/36751#issuecomment-1145291954

   If I recall, I think the original motivation for this "release all locks at 
the end of the task" code was to prevent indefinite "pin leaks" if tasks fail 
to properly release locks (e.g. because an iterator isn't fully consumed or 
because the task crashes). 
   
   I see how this is a problem in the case of multi-threaded tasks.
   
   As you pointed out in your comment at 
https://github.com/apache/spark/pull/35991#discussion_r887524913, 
   
   > The good news is that the code that takes out the write locks is only in 
the BlockManager and that it properly cleans up after itself (release locks, 
remove data if needed). This means we could fix this issue by removing the 
release of write locks from `releaseAllLocksForTask`.
   
   It might be a good idea to mention this "write locks are only managed in the 
BlockManager" property in a code comment near where you're changing the the 
release all locks code,.
   
   ---
   
   In principle, aren't we also vulnerable to race conditions for read locks? 
Let's say that I've configured Spark to use off-heap memory and I have a task 
where a secondary task thread is reading from that block. If the main task 
thread exits and releases the read locks, then the block manager could decide 
to evict the off-heap block while it's still being read in the secondary 
thread. I think that would be a use-after-free bug which could trigger a 
segfault.
   
   Perhaps that read lock race condition is rarer and less severe from a 
correctness point of view: 
   
   - In this scenario, the reader is a task thread belonging to the finished 
task.
   - If the reader segfaults then the JVM will die and we'll reschedule the 
tasks. This isn't ideal, but compared to the writer scenario 
   - If the reader reads corrupt data because the memory region was re-used in 
a subsequent allocation then I think we're okay as long as the reading thread 
isn't performing external side effects based on the possibly-corrupt data (e.g. 
posting data to an external REST API).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to