shrprasa opened a new pull request, #40128:
URL: https://github.com/apache/spark/pull/40128

   ### What changes were proposed in this pull request?
   1. Instead of creating a separate subdirectory in the K8s upload directory 
(spark.kubernetes.file.upload.path) for each uploaded file, create a single 
subdirectory that is used for all file uploads of a specific application. 
This directory is named using the Spark application ID.
   2. Delete the subdirectory and its contents when the job terminates.
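
   The two steps above can be illustrated with a minimal sketch. This is not 
the code from this PR: it uses the local filesystem in place of the Hadoop 
filesystem, and the helper names (`app_upload_dir`, `upload_file`, 
`cleanup_upload_dir`, `register_cleanup`) are hypothetical. The 
`spark-upload-` prefix is an assumption taken from the log output further 
down in this description.

```python
import atexit
import shutil
from pathlib import Path

# Assumed prefix, based on the directory name seen in the verification logs.
UPLOAD_DIR_PREFIX = "spark-upload-"


def app_upload_dir(upload_root: Path, app_id: str) -> Path:
    """Return the single per-application subdirectory, creating it if needed."""
    target = upload_root / f"{UPLOAD_DIR_PREFIX}{app_id}"
    target.mkdir(parents=True, exist_ok=True)
    return target


def upload_file(src: Path, upload_root: Path, app_id: str) -> Path:
    """Copy one file into the application's shared upload subdirectory."""
    dest = app_upload_dir(upload_root, app_id) / src.name
    shutil.copy2(src, dest)
    return dest


def cleanup_upload_dir(upload_root: Path, app_id: str) -> bool:
    """Delete the application's upload subdirectory and everything in it.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    target = upload_root / f"{UPLOAD_DIR_PREFIX}{app_id}"
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


def register_cleanup(upload_root: Path, app_id: str) -> None:
    """Run the cleanup when the interpreter exits (a stand-in for a shutdown hook)."""
    atexit.register(cleanup_upload_dir, upload_root, app_id)
```

   In this sketch, every file of one application lands in the same 
directory, so a single recursive delete at shutdown is enough. Note that two 
files with the same base name would overwrite each other here; how the PR 
handles that case is not covered by this description.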
   
   ### Why are the changes needed?
   This change is needed to clean up the files and directories that are 
created under the K8s upload path, so that the storage does not fill up. 
Without this change, users have to clean up these files manually.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. Users submitting Spark on K8s jobs no longer need to manually clean up 
the files under the upload directory.
   
   ### How was this patch tested?
   Through GitHub Actions and manually by running Spark on K8s jobs.
   
   Some logs for verification:
   
   23/02/22 18:41:16 INFO SparkContext: Successfully stopped SparkContext
   23/02/22 18:41:16 INFO ShutdownHookManager: Shutdown hook called
   23/02/22 18:41:16 INFO ShutdownHookManager: Deleting directory /spark-local2/spark-f58ecb91-0bb6-4b10-a372-405f69780c15
   23/02/22 18:41:16 INFO ShutdownHookManager: Deleting directory /tmp/spark-c10b27ca-fe68-4382-913f-4a535833f64e
   23/02/22 18:41:16 INFO ShutdownHookManager: Deleting directory /spark-local1/spark-8c4ef9d8-7688-4f51-b587-16dee1d264c7
   23/02/22 18:41:16 INFO UploadDirManager: Shutdown hook called
   23/02/22 18:41:17 INFO UploadDirManager: Upload dir deleted successfully.: hdfs://****:8020/user/*****/k8s/spark-upload-b6609808729a49e9b6f53f20ed50d05b
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

