JoshRosen commented on PR #37522:
URL: https://github.com/apache/spark/pull/37522#issuecomment-1218380759

   If the primary intent is to clean up cache entries associated with finished 
Spark applications, is there a way that we could do this more directly? 
   
   In `ExternalShuffleBlockResolver.applicationRemoved` it looks like we have 
logic that can start an asynchronous background task to clean up the shuffle 
files. Maybe we could remove the cache entries in a similar manner, e.g. either 
by invalidating them during file deletion (taking advantage of the fact that 
the cache keys are filenames), or by iterating over the cache's keys and 
removing entries whose files live in the deleted application's directory (we'd 
have to check the iteration semantics to make sure this is safe).
   
   Doing this cleanup at application-removal time would avoid the need for a 
time-based config (which could be hard to tune appropriately).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
