LantaoJin opened a new pull request #29378:
URL: https://github.com/apache/spark/pull/29378


   ### What changes were proposed in this pull request?
   
   This reopens PR #26711, which was closed as stale.
   
   #21390 fixed the same problem in Standalone mode, but the issue still 
exists on YARN.
   The fix is straightforward: when the executor is running in a YARN 
container, we create these "temp_xxx" files under the container dirs, so that 
YARN cleans them up when the container exits.
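
As a rough illustration of the idea (not the code in this patch), the sketch below picks the parent directory for a temp file depending on whether the process appears to be running in a YARN container. It assumes that YARN exports `CONTAINER_ID` and a comma-separated `LOCAL_DIRS` into the container environment, and that each application-level dir in `LOCAL_DIRS` has a per-container subdirectory named after the container ID. The function names and the `temp_local_` prefix are hypothetical and chosen only for this example.

```python
import tempfile
import uuid
from pathlib import Path


def is_running_in_yarn_container(env):
    # Assumption: YARN sets CONTAINER_ID in the container's environment.
    return bool(env.get("CONTAINER_ID"))


def temp_file_parents(env, fallback_dirs):
    """Return candidate parent directories for temp files."""
    fallback = [Path(d) for d in fallback_dirs]
    if not is_running_in_yarn_container(env):
        return fallback
    # Assumption: LOCAL_DIRS lists application-level dirs, comma-separated.
    app_dirs = [d.strip() for d in env.get("LOCAL_DIRS", "").split(",") if d.strip()]
    if not app_dirs:
        return fallback
    # Assumption: the container dir is <application dir>/<container id>,
    # which is the directory YARN removes when the container exits.
    return [Path(d) / env["CONTAINER_ID"] for d in app_dirs]


def create_temp_file(env, fallback_dirs, prefix="temp_local_"):
    """Create an empty temp file under the first candidate parent directory."""
    parent = temp_file_parents(env, fallback_dirs)[0]
    parent.mkdir(parents=True, exist_ok=True)
    path = parent / f"{prefix}{uuid.uuid4()}"
    path.touch()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        app_dir = str(Path(root) / "application_0001")
        yarn_env = {"CONTAINER_ID": "container_0001_01", "LOCAL_DIRS": app_dir}
        print(create_temp_file(yarn_env, [root]))  # under the container dir
        print(create_temp_file({}, [root]))        # under the fallback dir
```

With this layout, a temp file left behind by a dead executor sits inside the directory that is removed with the container, rather than in the application-level dir that survives until the application ends.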
   
   
   ### Why are the changes needed?
   Currently, we only clean up the local directories when an application is 
removed. However, when executors die and restart repeatedly, many temp files 
are left behind in the local directories, which is undesirable and can 
gradually use up disk space. In a long-running service such as the Spark 
Thrift Server with dynamic resource allocation disabled, this can easily fill 
the local disk.
   
   According to https://github.com/apache/spark/pull/21390#issuecomment-391695376, 
YARN only cleans up container local dirs when a container (executor) exits, 
but these files are not in the container local dirs.
   <img width="1527" alt="Screen Shot 2019-11-29 at 4 52 56 PM" 
src="https://user-images.githubusercontent.com/1853780/69856506-c66cce00-12c8-11ea-9e62-058aa2d3c12e.png">
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a unit test and tested manually.
   




