By "the shared subfolder still grows" I mean the following: when upgrading the job, we cancel it with a savepoint. Since checkpoints are not retained, I expected Flink to clean up the checkpoints, including the shared directory. We then start the upgraded job from the savepoint. However, when I look into the shared folder, I still see older files from the previous version of the job. Each time this upgrade process is repeated, the shared subfolder grows further.

Thanks,
Alexey

________________________________
From: Alexey Trenikhun <yen...@msn.com>
Sent: Thursday, August 26, 2021 6:37:27 PM
To: Matthias Pohl <matth...@ververica.com>
Cc: Flink User Mail List <user@flink.apache.org>; sjwies...@gmail.com <sjwies...@gmail.com>
Subject: Re: checkpoints/.../shared cleanup

Hi Matthias,

I don't use externalized checkpoints (the Flink UI shows "Persist Checkpoints Externally: Disabled"), so why do you think the checkpoint(s) should be retained? That seems to contradict the documentation [1]: "Checkpoints are by default not retained and are only used to resume a job from failures."

[1] https://ci.apache.org/projects/flink/flink-docs-master/docs/ops/state/checkpoints/#retained-checkpoints

Thanks,
Alexey

________________________________
From: Matthias Pohl <matth...@ververica.com>
Sent: Thursday, August 26, 2021 5:42 AM
To: Alexey Trenikhun <yen...@msn.com>
Cc: Flink User Mail List <user@flink.apache.org>; sjwies...@gmail.com <sjwies...@gmail.com>
Subject: Re: checkpoints/.../shared cleanup

Hi Alexey,

thanks for reaching out to the community. I have a question: What do you mean by "the shared subfolder still grows"? As far as I understand, the shared folder contains the state of incremental checkpoints. If you cancel the corresponding job and start a new job from one of the retained incremental checkpoints, the shared folder of the previous job must still be around, since it contains the state.
The new job would then create its own shared subfolder. Any new incremental checkpoints will write their state into the new job's shared subfolder while still relying on the shared state of the previous job for older data. The RocksDB backend is in charge of consolidating the incremental state. Hence, you should be careful about removing the shared folder if you plan to restart the job later on.

I'm adding Seth to this thread. He might have more insights and/or correct my limited knowledge of the incremental checkpoint process.

Best,
Matthias

On Wed, Aug 25, 2021 at 1:39 AM Alexey Trenikhun <yen...@msn.com> wrote:

Hello,

I use incremental checkpoints, not externalized. Should the content of checkpoint/.../shared be removed when I cancel the job (or cancel with savepoint)? It looks like in our case shared continues to grow...

Thanks,
Alexey
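
To measure the growth described in this thread, the per-job shared folders can be inspected before and after each upgrade. The following is only a minimal sketch, assuming the checkpoint directory is reachable as a local or mounted filesystem path and follows the layout <checkpoint-dir>/<job-id>/shared; the function name shared_dir_usage is hypothetical and not part of Flink. For object stores such as S3, the listing would have to be done with the corresponding client instead.

```python
from pathlib import Path


def shared_dir_usage(checkpoint_root):
    """Return {job_id: (file_count, total_bytes)} for every <job-id>/shared folder.

    Assumption: checkpoint_root is a local or mounted path laid out as
    <checkpoint_root>/<job-id>/shared (hypothetical helper, not a Flink API).
    """
    usage = {}
    for job_dir in sorted(Path(checkpoint_root).iterdir()):
        shared = job_dir / "shared"
        if not shared.is_dir():
            # Not a job directory, or this job has no shared state.
            continue
        files = [p for p in shared.rglob("*") if p.is_file()]
        total_bytes = sum(p.stat().st_size for p in files)
        usage[job_dir.name] = (len(files), total_bytes)
    return usage
```

Calling shared_dir_usage on the checkpoint root once before cancelling with a savepoint and once after the upgraded job has taken a few checkpoints shows which job IDs still hold files under shared, and whether the old job's folder was cleaned up or left behind.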