cloud-fan opened a new pull request, #53914:
URL: https://github.com/apache/spark/pull/53914

   ### What changes were proposed in this pull request?
   
   This PR fixes a race condition in `IsolatedSessionState` lifecycle management that could cause flakiness when `spark.executor.isolatedSessionCache.size` is set to a small value.
   
   Key changes:
   - Introduced `IsolatedSessionState.sessions` as the authoritative store for all isolated sessions, ensuring that only one session exists per UUID at any time
   - Changed `refCount` and `evicted` from lock-free access to access synchronized on a shared lock object, preventing race conditions between `acquire()`, `release()`, and `markEvicted()`
   - Changed `acquire()` to return a boolean indicating whether the session was successfully acquired (`false` if it has already been evicted)
   - Added a `tryUnEvict()` method so that a session that was evicted while still in use (and whose cleanup is therefore deferred) can be reused
   - Updated the cache loader to check the authoritative sessions map first and reuse an existing session when possible
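
The lifecycle described in this list can be modeled as a small state machine. Below is a minimal Python sketch of that model, not the actual Scala code in Spark's executor. `acquire`, `release`, `mark_evicted`, `try_un_evict`, and the `sessions` map mirror the names above. Everything else is a hypothetical simplification: `SessionCache`, `acquire_session`, the `cleaned_up` flag, the LRU eviction policy, the single re-entrant lock, and running cleanup while holding that lock.

```python
import threading
from collections import OrderedDict

# Hypothetical simplification: one re-entrant lock guards the authoritative
# map and every session's ref_count / evicted pair.
_LOCK = threading.RLock()


class IsolatedSessionState:
    # Authoritative store: at most one state per session UUID, kept from
    # creation until cleanup completes.
    sessions = {}

    def __init__(self, session_uuid):
        self.session_uuid = session_uuid
        self.ref_count = 0
        self.evicted = False
        self.cleaned_up = False

    def acquire(self):
        """Pin the session for a task; returns False if it was already evicted."""
        with _LOCK:
            if self.evicted:
                return False
            self.ref_count += 1
            return True

    def release(self):
        with _LOCK:
            self.ref_count -= 1
            if self.evicted and self.ref_count == 0:
                self._cleanup()

    def mark_evicted(self):
        """Called on cache eviction; cleanup is deferred while tasks still use it."""
        with _LOCK:
            self.evicted = True
            if self.ref_count == 0:
                self._cleanup()

    def try_un_evict(self):
        """Return True if usable, reviving an evicted session still in use."""
        with _LOCK:
            if self.evicted and self.ref_count > 0:
                self.evicted = False
            return not self.evicted

    def _cleanup(self):
        # Called with _LOCK held, so no task can acquire the session meanwhile.
        # The real code would close the classloader and delete files here.
        self.cleaned_up = True
        del IsolatedSessionState.sessions[self.session_uuid]


class SessionCache:
    """Tiny LRU cache standing in for the executor's isolated-session cache."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()

    def get(self, session_uuid):
        with _LOCK:
            if session_uuid in self.entries:
                self.entries.move_to_end(session_uuid)
                return self.entries[session_uuid]
            state = self._load(session_uuid)
            self.entries[session_uuid] = state
            if len(self.entries) > self.max_size:
                _, victim = self.entries.popitem(last=False)
                victim.mark_evicted()
            return state

    def _load(self, session_uuid):
        # Check the authoritative map first: reuse an evicted-but-in-use
        # session rather than creating a second state for the same UUID.
        existing = IsolatedSessionState.sessions.get(session_uuid)
        if existing is not None and existing.try_un_evict():
            return existing
        state = IsolatedSessionState(session_uuid)
        IsolatedSessionState.sessions[session_uuid] = state
        return state


def acquire_session(cache, session_uuid):
    # acquire() fails only if the session was evicted after the lookup;
    # retrying reloads it (reusing it if still in use, else creating a new one).
    while True:
        state = cache.get(session_uuid)
        if state.acquire():
            return state
```

For example, with `max_size=1`, acquiring `"a"` and then `"b"` evicts `"a"` but defers its cleanup because a task still holds it. Acquiring `"a"` again un-evicts it and returns the same object instead of creating a duplicate. Because `evicted` and `ref_count` are only read and written under one lock in this sketch, a session is cleaned up only once it is both evicted and unused, and a cleaned-up session can never be acquired again.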
   
   ### Why are the changes needed?
   
   When the isolated session cache is full and sessions are evicted, there is a race condition between:
   1. A task acquiring a session from the cache
   2. Another task triggering eviction of that session
   3. The evicted session being cleaned up (classloader closed, files deleted)
   
   This could cause:
   - `RemoteClassLoaderError` when trying to load classes with a closed classloader
   - `NoSuchFileException` when session files are deleted while still in use
   
   The fix ensures that:
   - Sessions are tracked in an authoritative map from creation until cleanup completes
   - Evicted sessions can be reused if still in use by other tasks
   - A task cannot acquire a session that's being cleaned up
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.