dongjoon-hyun commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1480402673
@shrprasa

1. You seem to assume that a shutdown hook is magically reliable. However, shutdown hooks have a well-known limitation: the JVM can be destroyed abruptly, and a K8s Pod can likewise be deleted without giving the processes inside enough time to run their cleanup logic.
2. As I mentioned above, public cloud storage systems already have a better and more complete TTL-based solution for this issue. In that context, this PR only partially mitigates an HDFS issue, https://issues.apache.org/jira/browse/HDFS-6382.

> The change to clean up the upload directory is not specific to HDFS. The reason we should do cleanup is that if the Spark job creates new directories/files, it should clean them up too, just as is done in YARN and for other files such as shuffle spill. Also, can you please explain why the approach seems incomplete? How is it unable to prevent leftovers in the upload directory?
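The shutdown-hook limitation in point 1 can be illustrated with a minimal sketch. This is not Spark's actual cleanup code; the class and helper names are hypothetical, and the point is only that a JVM shutdown hook is best-effort:

```java
public class ShutdownHookDemo {
    // Hypothetical helper: registers a best-effort cleanup hook and
    // returns it so the caller can inspect or unregister it.
    static Thread registerCleanupHook() {
        Thread cleanup = new Thread(() -> {
            // This body runs on orderly JVM shutdown (normal exit, SIGTERM),
            // but is skipped entirely on SIGKILL -- e.g. a force-deleted or
            // OOM-killed K8s pod -- or a hard JVM crash. Any upload-directory
            // cleanup placed here is therefore not guaranteed to execute.
            System.out.println("cleaning up upload directory");
        });
        Runtime.getRuntime().addShutdownHook(cleanup);
        return cleanup;
    }

    public static void main(String[] args) {
        registerCleanupHook();
        System.out.println("job running");
        // On abrupt termination, the hook above never fires and the
        // upload directory is left behind.
    }
}
```

This is why hook-based cleanup alone cannot prevent leftovers: the failure modes that most often strand files (forced pod deletion, node loss, OOM kill) are exactly the ones where no hook runs.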
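As a sketch of the TTL-based approach mentioned in point 2 (assuming S3; the rule ID and prefix below are hypothetical, not anything Spark configures for you), an object-lifecycle rule expires leftovers server-side regardless of how the job died:

```json
{
  "Rules": [
    {
      "ID": "expire-spark-upload-dir",
      "Filter": { "Prefix": "spark-upload/" },
      "Status": "Enabled",
      "Expiration": { "Days": 1 }
    }
  ]
}
```

Because the storage system itself enforces the expiration, this covers the abrupt-termination cases that a shutdown hook cannot.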
