Thanks, Shammon and Alex, for the pointers. The RocksDB state backend is
already in use, but without incremental checkpoints. I will enable
incremental checkpoints and see if that helps. Thanks.
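
For reference, a minimal sketch of the change, assuming the job is
configured through flink-conf.yaml rather than in code. The exact key names
may vary by Flink version, so please check them against the docs for the
release in use:

```yaml
# Assumption: the job picks up its state backend from flink-conf.yaml.
# Keep RocksDB as the state backend (already the case for this job).
state.backend: rocksdb
# Upload only the changes since the last completed checkpoint
# instead of a full snapshot each time.
state.backend.incremental: true
```

With this enabled, `Checkpointed Data Size` should drop below
`Full Checkpoint Data Size` in the web UI, unless the state itself keeps
growing.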


On Tue, Jun 20, 2023 at 5:25 PM Shammon FY <zjur...@gmail.com> wrote:

> Hi Prabhu,
>
> I noticed that `Full Checkpoint Data Size` is equal to
> `Checkpointed Data Size`. Which state backend are you using? I
> recommend using the RocksDB state backend for your job. If you do, you can
> turn on incremental checkpoints [1], which reduce the amount of data
> written for each checkpoint.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/large_state_tuning/#incremental-checkpoints
>
> Best,
> Shammon FY
>
> On Tue, Jun 20, 2023 at 4:50 PM Alex Nitavsky <alexnitav...@gmail.com>
> wrote:
>
>> Hello Prabhu,
>>
>> In your place, I would check:
>>
>> 1. That there is no "state leak" in your job. The state seems to only
>> accumulate and is never cleaned up, e.g. a timer that should clear the
>> state for a key may not be configured correctly.
>>
>> 2. Whether you accumulate the state in a large window. For example, with a
>> 2-hour tumbling window, the job reaches its maximum state size only after
>> two hours. In that case the job should be scaled up or optimized.
>>
>> Best
>> Alex
>>
>> On Tue, Jun 20, 2023 at 10:39 AM Prabhu Joseph <
>> prabhujose.ga...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Our Flink checkpoints time out, and the checkpointed data size doubles
>>> with every checkpoint. Any ideas on what could be wrong in the
>>> application, or how to debug this?
>>>
>>> [image: checkpoint_issue.png]
>>>
>>>
>>>
