LantaoJin opened a new pull request #29378: URL: https://github.com/apache/spark/pull/29378
### What changes were proposed in this pull request?

This reopens PR #26711, which was closed as stale. #21390 fixed the same problem in Standalone mode, but on YARN the issue still exists. The patch is straightforward: when the executor runs in a YARN container, we create the "temp_xxx" files under the container dirs.

### Why are the changes needed?

Currently, we only clean up the local directories when an application is removed. However, when executors die and restart repeatedly, many temp files are left untouched in the local directories, which is undesirable and can gradually use up disk space. In a long-running service such as the Spark thrift-server with dynamic resource allocation disabled, it is especially easy to fill the local disk.

According to https://github.com/apache/spark/pull/21390#issuecomment-391695376, YARN only cleans the container local dirs when a container (executor) exits. But these files are not in the container local dirs.

<img width="1527" alt="Screen Shot 2019-11-29 at 4 52 56 PM" src="https://user-images.githubusercontent.com/1853780/69856506-c66cce00-12c8-11ea-9e62-058aa2d3c12e.png">

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test and tested manually.
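
To illustrate the idea of placing the "temp_xxx" files under a per-container directory, here is a minimal Python sketch. It is not the code in this patch. The function names, the per-container subdirectory layout, and the use of the `CONTAINER_ID` environment variable to detect a YARN container are all assumptions made for the example.

```python
import os
import shutil
import tempfile
import uuid
from typing import Mapping


def temp_file_dir(env: Mapping[str, str], app_local_dir: str) -> str:
    """Return the directory in which a "temp_xxx" file should be created."""
    # Assumption: a non-empty CONTAINER_ID means we run inside a YARN container.
    container_id = env.get("CONTAINER_ID")
    if container_id:
        # Hypothetical layout: one subdirectory per container, so that
        # removing the container's directory also removes its temp files.
        return os.path.join(app_local_dir, container_id)
    # Outside a container, fall back to the application-level local dir.
    return app_local_dir


def create_temp_file(env: Mapping[str, str], app_local_dir: str) -> str:
    """Create an empty "temp_<random>" file and return its path."""
    target = temp_file_dir(env, app_local_dir)
    os.makedirs(target, exist_ok=True)
    path = os.path.join(target, "temp_" + uuid.uuid4().hex)
    with open(path, "wb"):
        pass
    return path


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    path = create_temp_file({"CONTAINER_ID": "container_01"}, root)
    # Simulate the cleanup that happens when the container exits.
    shutil.rmtree(os.path.dirname(path))
    print(os.path.exists(path))
    shutil.rmtree(root)
```

With this layout, a temp file left behind by a dead executor disappears when its container directory is removed, instead of staying in the application-level directory until the application itself is removed.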
