Hi Kai,

I think unaligned checkpoints combined with an alignment timeout [1] might 
also help in this case. Unaligned checkpoints let the barriers overtake 
buffered records, which can reduce the checkpoint duration under backpressure.


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/deployment/config/#execution-checkpointing-alignment-timeout
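
For reference, a minimal flink-conf.yaml sketch of the two settings discussed 
above. The key names are the ones listed in the Flink 1.13 configuration 
reference [1]; the 30 s timeout is purely an illustrative assumption, not a 
recommendation, and would need tuning for the actual job.

```yaml
# Allow checkpoint barriers to overtake in-flight records.
execution.checkpointing.unaligned: true

# Start each checkpoint aligned and switch to unaligned only if the
# checkpoint start is delayed beyond this timeout (assumed value).
# A value of 0 makes checkpoints always start unaligned.
execution.checkpointing.alignment-timeout: 30 s
```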

Best
Yun Tang

________________________________
From: Senhong Liu <senhong...@gmail.com>
Sent: Monday, May 31, 2021 10:33
To: JING ZHANG <beyond1...@gmail.com>
Cc: Kai Fu <zzfu...@gmail.com>; user <user@flink.apache.org>
Subject: Re: Dynamic configuration of Flink checkpoint interval

Hi all,

In fact, a very similar JIRA issue already exists, 
https://issues.apache.org/jira/browse/FLINK-18578, and I am working on it. I 
plan to publish a FLIP and start a discussion about it in the near future. We 
look forward to your participation.

Best,
Senhong Liu

On Mon, May 31, 2021 at 10:21 AM JING ZHANG <beyond1...@gmail.com> wrote:
Hi Kai,

Happy to hear that.
Would you please paste the JIRA link in this thread after you create it? It 
could help other users who run into the same problem. Thanks very much.

Best regards,
JING ZHANG

On Sun, May 30, 2021 at 11:19 PM Kai Fu <zzfu...@gmail.com> wrote:
Hi Jing,

Yup, what you're describing is what I want. I also tried the approach you 
suggested and it works. I'm going to take that approach for the moment and 
create a Jira issue for this feature.

On Sun, May 30, 2021 at 8:57 PM JING ZHANG <beyond1...@gmail.com> wrote:
Hi Kai,

Are you trying to find a way to hot-update the checkpoint interval, or to 
disable/enable checkpointing, without stopping and restarting the job?
Unfortunately, that is not supported yet, AFAIK.
You're very welcome to create an issue and describe your needs in Flink's 
Jira (http://issues.apache.org/jira/browse/FLINK).
At present, you may want to use the following temporary workaround:
  1. Set a larger checkpoint interval and start your job.
  2. Take a savepoint after the cold start has completed.
  3. Set the normal checkpoint interval and restart the job from the savepoint.
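
One way to make steps 1 and 3 a matter of changing a launch argument, rather 
than rebuilding the job, is to read the interval from the program arguments. 
The sketch below is only an illustration in plain Java: the class name, the 
`--checkpoint-interval-ms` flag, and the one-minute default are all 
hypothetical, and the Flink call is shown only as a comment.

```java
import java.time.Duration;

// Hypothetical helper: picks the checkpoint interval from the job's arguments.
class CheckpointIntervalArg {
    // Hypothetical flag name and default value, chosen for this sketch only.
    static final String FLAG = "--checkpoint-interval-ms";
    static final Duration DEFAULT_INTERVAL = Duration.ofMinutes(1);

    // Returns the interval given after FLAG, or the default if FLAG is absent.
    static Duration parseInterval(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (FLAG.equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException(FLAG + " needs a value");
                }
                long ms = Long.parseLong(args[i + 1]);
                if (ms <= 0) {
                    throw new IllegalArgumentException("interval must be positive");
                }
                return Duration.ofMillis(ms);
            }
        }
        return DEFAULT_INTERVAL;
    }

    public static void main(String[] args) {
        Duration interval = parseInterval(args);
        // In the actual job, the value would be handed to Flink here, e.g.:
        //   env.enableCheckpointing(interval.toMillis());
        System.out.println("checkpoint interval (ms): " + interval.toMillis());
    }
}
```

With something like this in place, the cold start could be launched with a 
large value for the flag, and the restart in step 3 could omit it or pass the 
normal value. With the standard CLI, step 2 corresponds to 
`bin/flink savepoint <jobId> <targetDirectory>` and the restart to 
`bin/flink run -s <savepointPath> ...`.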

Best regards,
JING ZHANG

On Sun, May 30, 2021 at 7:13 PM Kai Fu <zzfu...@gmail.com> wrote:
Hi team,

We would like to know whether Flink supports dynamically configuring the 
checkpoint interval. Our use case has a cold start phase in which the entire 
dataset is replayed from the beginning up to the most recent records.

In the cold start phase, the resources are fully utilized and backpressure is 
high for all upstream operators, so checkpoints constantly time out. Real 
production traffic is far lower than that, and the currently provisioned 
resources can handle it.

We're wondering whether Flink could support dynamic checkpoint configuration, 
so that checkpointing can be skipped or made less frequent during the cold 
start phase to speed it up, and then returned to normal once the cold start 
is complete.

--
Best wishes,
- Kai


