The clean-up of the staging directory is best effort. If the JobManager crashed or was killed externally, then it does not have any chance to do the staging directory clean-up. AFAIK, Flink does not have an option to guarantee the clean-up.
Best,
Yang

David Clutter <[email protected]> wrote on Tue, Jan 11, 2022 at 22:59:

> Ok, that makes sense. I did see some job failures. However, failures
> could happen occasionally. Is there any option to have the JobManager
> clean up these directories when the job has failed?
>
> On Mon, Jan 10, 2022 at 8:58 PM Yang Wang <[email protected]> wrote:
>
>> IIRC, the staging directory (/user/{name}/.flink/application_xxx) will be
>> deleted automatically if the Flink job reaches a global terminal state (e.g.
>> FINISHED, CANCELED, FAILED).
>> So I assume you have stopped the YARN application via "yarn application
>> -kill", not via "bin/flink cancel".
>> If that is the case, then the residual staging directory is expected
>> behavior, since the Flink JobManager does not have a chance to do the
>> clean-up.
>>
>> Best,
>> Yang
>>
>> David Clutter <[email protected]> wrote on Tue, Jan 11, 2022 at 10:08:
>>
>>> I'm seeing files orphaned in HDFS and wondering how to clean them up
>>> when the job is completed. The directory is /user/yarn/.flink so I am
>>> assuming this is created by Flink? The HDFS in my cluster eventually
>>> fills up.
>>>
>>> Here is my setup:
>>>
>>> - Flink 1.13.1 on AWS EMR
>>> - Executing Flink in per-job mode
>>> - Job is submitted every 5m
>>>
>>> In HDFS under /user/yarn/.flink I see a directory created for every
>>> Flink job submitted/YARN application. Each application directory contains
>>> my user jar file, the flink-dist jar, /lib with various Flink jars, and
>>> log4j.properties.
>>>
>>> Is there a property to tell Flink to clean up this directory when the
>>> job is completed?
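Since Flink offers no guarantee here, one external workaround (not a Flink feature) is a periodic clean-up job that lists the staging directories, checks each one against the YARN applications that are still running, and removes the rest. The sketch below shows only the selection logic; the directory layout (/user/yarn/.flink/application_xxx) is taken from the thread, and the HDFS listing and YARN query are assumed to be supplied by the caller, e.g. via the CLI commands noted in the comments:

```python
# Sketch: decide which Flink staging directories are safe to delete.
# Assumption (from the thread, not guaranteed by Flink): staging dirs
# live under /user/yarn/.flink and are named after the YARN application ID.

def stale_staging_dirs(staging_dirs, running_app_ids):
    """Return staging directories whose YARN application is no longer running.

    staging_dirs    -- paths like "/user/yarn/.flink/application_1641868096000_0001"
    running_app_ids -- set of application IDs still known to YARN as active
    """
    stale = []
    for path in staging_dirs:
        # The directory name is the YARN application ID.
        app_id = path.rstrip("/").rsplit("/", 1)[-1]
        if app_id.startswith("application_") and app_id not in running_app_ids:
            stale.append(path)
    return stale

# In a real clean-up job the inputs would come from the cluster, e.g.:
#   staging_dirs    <- `hdfs dfs -ls /user/yarn/.flink`
#   running_app_ids <- `yarn application -list -appStates RUNNING,ACCEPTED`
# and each stale path would then be removed with `hdfs dfs -rm -r <path>`.
```

Running such a script from cron only deletes directories for applications YARN no longer reports as active, so a killed application's leftovers are reclaimed without touching jobs that are still running.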
