One concrete question: under the HA folder I also see sample entries like
these:

- job_name/blob/job_uuid/blob_...
- job_name/submittedJobGraphX
- job_name/submittedJobGraphY

Is it safe to clean these up when the job is in a healthy state?
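For reference, the kind of simple lifecycle rule I described below would
look roughly like this; the rule name, the "flink-ha/" prefix, and the
7-day window are only placeholders, and one would apply it with something
like "az storage account management-policy create --policy @policy.json":

    {
      "rules": [
        {
          "name": "expire-stale-flink-ha-files",
          "enabled": true,
          "type": "Lifecycle",
          "definition": {
            "filters": {
              "blobTypes": ["blockBlob"],
              "prefixMatch": ["flink-ha/"]
            },
            "actions": {
              "baseBlob": {
                "delete": { "daysAfterModificationGreaterThan": 7 }
              }
            }
          }
        }
      ]
    }

Of course, such a rule is only safe if every blob that is still relevant
gets modified more often than the cutoff, which is exactly the assumption
I'm not sure holds for the HA files.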
Regards,
Alexis.

On Mon, 5 Dec 2022 at 20:09, Alexis Sarda-Espinosa
<sarda.espin...@gmail.com> wrote:

> Hi Gyula,
>
> That certainly helps, but to set up automatic cleanup (in my case, of
> Azure blob storage), the ideal option would be a simple policy that
> deletes blobs that haven't been updated in some time. However, that
> would assume that everything relevant for the latest state is "touched"
> by the JM on every checkpoint, and since I also see blobs referencing
> "submitted job graphs", I imagine that might not be a safe assumption.
>
> I understand the life cycle of those blobs isn't directly managed by
> the operator, but in that regard it could make things more cumbersome.
>
> Ideally, Flink itself would guarantee this sort of allowable TTL for HA
> files, but I'm sure that's not trivial.
>
> Regards,
> Alexis.
>
> On Mon, 5 Dec 2022, 19:19 Gyula Fóra <gyula.f...@gmail.com> wrote:
>
>> Hi!
>>
>> There are some files in the HA dir that are not cleaned up over time
>> and need to be removed by the user:
>>
>> https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/concepts/overview/#jobresultstore-resource-leak
>>
>> Hope this helps,
>> Gyula
>>
>> On Mon, 5 Dec 2022 at 11:56, Alexis Sarda-Espinosa
>> <sarda.espin...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I see that the number of entries in the directory configured for HA
>>> increases over time, particularly in the context of job upgrades in a
>>> Kubernetes environment managed by the operator. Would it be safe to
>>> assume that any files that haven't been updated in a while can be
>>> deleted, provided the checkpointing interval is much smaller than the
>>> period used to decide whether files are too old?
>>>
>>> Regards,
>>> Alexis.