The clean-up of the staging directory is best effort. If the JobManager crashed or was killed externally, then it does not have any chance to do the staging directory clean-up. AFAIK, Flink does not have an option to guarantee the clean-up.
Best,
Yang

David Clutter <[email protected]> wrote on Tue, Jan 11, 2022 at 22:59:

> Ok, that makes sense. I did see some job failures. However, failures
> could happen occasionally. Is there any option to have the JobManager
> clean up these directories when the job has failed?
>
> On Mon, Jan 10, 2022 at 8:58 PM Yang Wang <[email protected]> wrote:
>
>> IIRC, the staging directory (/user/{name}/.flink/application_xxx) will be
>> deleted automatically if the Flink job reaches a global terminal state (e.g.
>> FINISHED, CANCELED, FAILED).
>> So I assume you have stopped the YARN application via "yarn application
>> -kill", not via "bin/flink cancel".
>> If that is the case, then the residual staging directory is expected
>> behavior, since the Flink JobManager does not have a chance to do the
>> clean-up.
>>
>> Best,
>> Yang
>>
>> David Clutter <[email protected]> wrote on Tue, Jan 11, 2022 at 10:08:
>>
>>> I'm seeing files orphaned in HDFS and wondering how to clean them up
>>> when the job is completed. The directory is /user/yarn/.flink so I am
>>> assuming this is created by Flink? The HDFS in my cluster eventually
>>> fills up.
>>>
>>> Here is my setup:
>>>
>>> - Flink 1.13.1 on AWS EMR
>>> - Executing Flink in per-job mode
>>> - Job is submitted every 5m
>>>
>>> In HDFS under /user/yarn/.flink I see a directory created for every
>>> Flink job submitted/YARN application. Each application directory contains
>>> my user jar file, the flink-dist jar, /lib with various Flink jars, and
>>> log4j.properties.
>>>
>>> Is there a property to tell Flink to clean up this directory when the
>>> job is completed?
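Since Flink offers no guarantee here, one external workaround (not a Flink feature) is a periodic clean-up job that lists the staging directories, checks each one against the YARN applications that are still running, and removes the rest. The sketch below shows only the selection logic; the directory layout (/user/yarn/.flink/application_xxx) is taken from the thread, and the HDFS listing and YARN query are assumed to be supplied by the caller, e.g. via the CLI commands noted in the comments:

```python
# Sketch: decide which Flink staging directories are safe to delete.
# Assumption (from the thread, not guaranteed by Flink): staging dirs
# live under /user/yarn/.flink and are named after the YARN application ID.

def stale_staging_dirs(staging_dirs, running_app_ids):
    """Return staging directories whose YARN application is no longer running.

    staging_dirs    -- paths like "/user/yarn/.flink/application_1641868096000_0001"
    running_app_ids -- set of application IDs still known to YARN as active
    """
    stale = []
    for path in staging_dirs:
        # The directory name is the YARN application ID.
        app_id = path.rstrip("/").rsplit("/", 1)[-1]
        if app_id.startswith("application_") and app_id not in running_app_ids:
            stale.append(path)
    return stale

# In a real clean-up job the inputs would come from the cluster, e.g.:
#   staging_dirs    <- `hdfs dfs -ls /user/yarn/.flink`
#   running_app_ids <- `yarn application -list -appStates RUNNING,ACCEPTED`
# and each stale path would then be removed with `hdfs dfs -rm -r <path>`.
```

Running such a script from cron only deletes directories for applications YARN no longer reports as active, so a killed application's leftovers are reclaimed without touching jobs that are still running.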
