JoshRosen commented on PR #37522: URL: https://github.com/apache/spark/pull/37522#issuecomment-1218380759
If the primary intent is to clean up cache entries associated with finished Spark applications, is there a way that we could do this more directly? In `ExternalShuffleBlockResolver.applicationRemoved` it looks like we have some logic that can start an asynchronous background task to clean up the shuffle files. Maybe we could remove the cache entries in a similar manner: either by doing this during the file deletion (taking advantage of the fact that the cache keys are filenames), or by iterating over the cache's keys and removing the entries for files in the deleted application's directory (we'd have to check the iteration semantics to make sure this is safe). Doing this cleanup at application-removal time would avoid the need for a time-based config (which could be hard to tune appropriately).
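The second option above (iterating the cache's keys and dropping the entries under the removed application's directory) could look roughly like the following sketch. This is not the actual `ExternalShuffleBlockResolver` code: the real index cache there is a Guava `LoadingCache`, whereas this sketch stands in a `ConcurrentHashMap` (whose `keySet().removeIf` is safe with respect to concurrent iteration) purely to illustrate the key-prefix cleanup; the paths and the `removedAppDir` variable are made up for the example.

```java
import java.nio.file.Paths;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheCleanupSketch {
  public static void main(String[] args) {
    // Stand-in for the shuffle index cache, keyed by index-file path.
    Map<String, Long> cache = new ConcurrentHashMap<>();
    cache.put("/local/app-1/shuffle_0_0_0.index", 1L);
    cache.put("/local/app-2/shuffle_0_1_0.index", 2L);

    // Hypothetical local directory of the application being removed.
    String removedAppDir = "/local/app-1";

    // On applicationRemoved, drop every entry whose key lives under the
    // removed application's directory. ConcurrentHashMap's view iterators
    // are weakly consistent, so removal during traversal is safe here; for
    // a Guava cache one would need to confirm the equivalent guarantee on
    // its asMap() view before doing this.
    cache.keySet().removeIf(path -> Paths.get(path).startsWith(removedAppDir));

    System.out.println(cache.keySet());
  }
}
```

Hooking a step like this into the existing asynchronous deletion task would keep the cache bounded by application lifetime rather than by a tunable TTL.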
