LantaoJin opened a new pull request #26711: [SPARK-30069][CORE][YARN] Clean up non-shuffle disk block manager files following executor exists on YARN

URL: https://github.com/apache/spark/pull/26711

### What changes were proposed in this pull request?

Currently we only clean up the local directories when an application is removed. However, when executors die and restart repeatedly, many temp files are left untouched in the local directories. This is undesirable and can gradually use up disk space. In particular, in a long-running service such as the Spark Thrift Server with dynamic resource allocation disabled, the local disk can easily fill up.

#21390 fixed the same problem in Standalone mode. On YARN, the issue still exists. According to https://github.com/apache/spark/pull/21390#issuecomment-391695376, YARN only cleans the container local dirs when the container (executor) exits, but these files are not in the container local dirs.

<img width="1527" alt="Screen Shot 2019-11-29 at 4 52 56 PM" src="https://user-images.githubusercontent.com/1853780/69856506-c66cce00-12c8-11ea-9e62-058aa2d3c12e.png">

So this patch is very straightforward: we create these "temp_xxx" files under the container dirs when the executor is running in a YARN container.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added a unit test and tested manually.
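The directory choice described under "What changes were proposed" can be illustrated with a minimal sketch. This is not the code in the patch (Spark's disk block manager is written in Scala); it is a Python illustration under stated assumptions. The function names, the `temp_local_` prefix, and the hash-based choice of root directory are all hypothetical, and the sketch assumes YARN exposes a `CONTAINER_ID` environment variable inside a container.

```python
import os
import uuid
import zlib
from typing import Mapping, Sequence


def in_yarn_container(env: Mapping[str, str]) -> bool:
    # Assumption: YARN exports CONTAINER_ID into the container's environment.
    return bool(env.get("CONTAINER_ID"))


def temp_block_path(
    local_dirs: Sequence[str],
    container_dirs: Sequence[str],
    env: Mapping[str, str],
) -> str:
    """Return a path for a new non-shuffle temp file (hypothetical helper).

    Inside a YARN container the file goes under a container dir, so YARN
    removes it when the container (executor) exits. Otherwise it goes under
    an application-level local dir, which is only cleaned up when the
    application is removed.
    """
    name = "temp_local_" + uuid.uuid4().hex
    use_container = in_yarn_container(env) and len(container_dirs) > 0
    roots = container_dirs if use_container else local_dirs
    # Spread files across the configured roots by hashing the file name.
    root = roots[zlib.crc32(name.encode("utf-8")) % len(roots)]
    return os.path.join(root, name)
```

With `CONTAINER_ID` set, the returned path falls under one of `container_dirs`; without it, the path falls under one of `local_dirs`, which matches the pre-patch behavior described above.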
