AngersZhuuuu commented on PR #36207: URL: https://github.com/apache/spark/pull/36207#issuecomment-1177772928
> Note you have to take into account all running modes: cluster, client managed and unmanaged. If you are just thinking of client here, what about cluster mode? If the application is in cluster mode and it fails to unregister and the Application Master process dies, I would expect YARN to assume something bad happened and to rerun it. If you remove the staging directory, that rerun is going to fail. That may not be ideal, but it is better than the job showing up as failed to the user.

In both cluster and client mode, a rerun will upload the staging dir again, even if the staging dir already exists:

<img width="1084" alt="Screenshot 2022-07-07 11.13.24 PM" src="https://user-images.githubusercontent.com/46485123/177808821-57a3ba3e-1876-456d-ad3a-c9376e29c0cf.png">

```
public static boolean copy(FileSystem srcFS, Path src,
                           FileSystem dstFS, Path dst,
                           boolean deleteSource,
                           Configuration conf) throws IOException {
  // The literal `true` is the `overwrite` argument of the 7-argument overload.
  return copy(srcFS, src, dstFS, dst, deleteSource, true, conf);
}
```

So I don't think it will cause any problem to delete the staging dir here...
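The overwrite argument above can be illustrated with a minimal sketch. It is not Spark's or Hadoop's code: it uses `java.nio.file` on the local filesystem as a stand-in for Hadoop's `FileSystem`, and the class name, the helpers `upload` and `deleteRecursively`, and the paths are all hypothetical. It only shows that an upload step which overwrites its target behaves the same whether the staging dir survived the previous attempt or was deleted.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StagingUploadSketch {

    // Hypothetical stand-in for the upload step: copy one resource into the
    // staging dir, replacing any copy left behind by an earlier attempt.
    static Path upload(Path resource, Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir); // no-op if the dir already exists
        Path target = stagingDir.resolve(resource.getFileName());
        return Files.copy(resource, target, StandardCopyOption.REPLACE_EXISTING);
    }

    // Hypothetical stand-in for staging-dir cleanup after a failed attempt.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> entries;
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so children are deleted before their parents.
            entries = paths.sorted(Comparator.reverseOrder())
                           .collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path work = Files.createTempDirectory("staging-sketch");
        Path resource = work.resolve("app.jar");
        // Illustrative path only; the application id is made up.
        Path stagingDir = work.resolve(".sparkStaging").resolve("application_0001");
        Files.write(resource, "v1".getBytes(StandardCharsets.UTF_8));

        upload(resource, stagingDir);   // first attempt
        upload(resource, stagingDir);   // rerun while the staging dir still exists
        deleteRecursively(stagingDir);  // cleanup after a failed attempt
        upload(resource, stagingDir);   // rerun after the staging dir was deleted

        System.out.println(Files.exists(stagingDir.resolve("app.jar")));
        deleteRecursively(work);
    }
}
```

Whether a real YARN rerun goes through the upload path again depends on the deploy mode and on which process is restarted, so the sketch covers only the overwrite semantics, not the retry behavior itself.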
