Re: Cleaning old incremental checkpoint files

Robert Metzger Tue, 03 Aug 2021 04:29:03 -0700

Hi Robin,

Let's say you have two checkpoints #1 and #2, where #1 has been created by
an old version of your job, and #2 has been created by the new version.
When can you delete #1?
The old job's checkpoint directory, which holds #1, has a "/shared"
subdirectory containing data that #2 also uses, because the checkpoints are
incremental.


You cannot delete the data in the /shared directory, as this data is
potentially still in use.
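
As a rough illustration of why that is, here is a toy model in Python. It is
only a sketch: the file names and the dictionary layout are invented for the
example, and real Flink checkpoint metadata is not stored this way. The idea
is simply that a shared file can only go once no retained checkpoint
references it anymore.

```python
# Toy model of incremental checkpoints (hypothetical names, not Flink's format).
# Each checkpoint is represented by the set of shared files it references.

def deletable_shared_files(all_shared_files, retained_checkpoints):
    """Return the shared files that no retained checkpoint references."""
    still_referenced = set()
    for files in retained_checkpoints.values():
        still_referenced |= set(files)
    return set(all_shared_files) - still_referenced


all_shared = {"sst-a", "sst-b", "sst-c"}
checkpoints = {
    1: {"sst-a", "sst-b"},  # written by the old version of the job
    2: {"sst-b", "sst-c"},  # written by the new version, reuses sst-b
}

# While both checkpoints are retained, every shared file is still in use.
print(deletable_shared_files(all_shared, checkpoints))

# Once #1 is dropped, only files that #2 does not reference are candidates.
print(deletable_shared_files(all_shared, {2: checkpoints[2]}))
```

In this model "sst-b" was written for #1 but is still needed by #2, which is
the situation a purely age-based cleanup of the old files would break.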

I know this is only a partial answer to your question. I'll try to find out
more details and extend my answer later.


On Thu, Jul 29, 2021 at 2:31 PM Robin Cassan <robin.cas...@contentsquare.com>
wrote:

> Hi all!
>
> We've happily been running a Flink job in production for a year now, with
> the RocksDB state backend and incremental retained checkpointing on S3. We
> often release new versions of our jobs, which means we cancel the running
> one and submit another while restoring the previous jobId's last retained
> checkpoint.
>
> This works fine, but we also need to clean old files from S3 which are
> starting to pile up. We are wondering two things:
> - once the newer job has restored the older job's checkpoint, is it safe
> to delete it? Or will the newer job's checkpoints reference files from the
> older job, in which case deleting the old checkpoints might cause errors
> during the next restore?
> - also, since all our state has a 7 days TTL, is it safe to set a 7 or 8
> days retention policy on S3 which would automatically clean old files, or
> could we still need to retain files older than 7 days even with the TTL?
>
> Don't hesitate to ask me if anything is not clear enough!
>
> Thanks,
> Robin
>
