Hi,

“Checkpoint Duration (Sync)” covers only the time taken for the synchronous
part of snapshotting your operator's state. Your 11m end-to-end time most
likely comes from the checkpoint barrier being stuck somewhere in your
pipeline for that long before the snapshot, waiting while some record or bunch
of records was being processed.

If you write a simple function that only performs
`Thread.sleep(new Random().nextInt(3600000))` and nothing else, your
checkpoints will take a random amount of time, since a snapshot cannot be
taken while your function is still executing: the barrier is handled only
after the records ahead of it have been processed. You can read about these
concepts in the documentation:

https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html
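
To make the difference between the two numbers concrete, here is a toy model in
plain Java. It is only a sketch and not Flink code: the class and method names
(`BarrierDelayDemo`, `Element`, `run`) are made up for illustration, time is a
simulated counter, and the 672000 ms record cost is just an assumed figure
picked to resemble your 11m 12s. It models a single task that handles records
and barriers strictly in order, so the barrier waits behind the slow record.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BarrierDelayDemo {

    /** One element in the task's input: a record or a checkpoint barrier. */
    static final class Element {
        final boolean isBarrier;
        final long costMillis; // processing time (record) or snapshot time (barrier)

        Element(boolean isBarrier, long costMillis) {
            this.isBarrier = isBarrier;
            this.costMillis = costMillis;
        }
    }

    /** Simulated timings of one checkpoint, in milliseconds. */
    static final class CheckpointTiming {
        final long syncMillis;     // time spent in the snapshot itself
        final long endToEndMillis; // time from trigger until the snapshot finished

        CheckpointTiming(long syncMillis, long endToEndMillis) {
            this.syncMillis = syncMillis;
            this.endToEndMillis = endToEndMillis;
        }
    }

    /**
     * Processes the input strictly in order on one simulated thread.
     * The checkpoint is assumed to be triggered at virtual time 0.
     */
    static CheckpointTiming run(Queue<Element> input) {
        long clock = 0;
        while (!input.isEmpty()) {
            Element e = input.poll();
            long start = clock;
            clock += e.costMillis; // nothing else can happen while this element is handled
            if (e.isBarrier) {
                return new CheckpointTiming(clock - start, clock);
            }
        }
        throw new IllegalStateException("no barrier in input");
    }

    public static void main(String[] args) {
        Queue<Element> input = new ArrayDeque<>();
        input.add(new Element(false, 672_000)); // slow window evaluation ahead of the barrier
        input.add(new Element(true, 5));        // the snapshot itself is fast
        CheckpointTiming t = run(input);
        // prints "sync=5 ms, endToEnd=672005 ms"
        System.out.println("sync=" + t.syncMillis + " ms, endToEnd=" + t.endToEndMillis + " ms");
    }
}
```

The sync duration stays tiny no matter how slow the record is; only the
end-to-end duration grows, which matches what you see in the web UI.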

Piotrek

Btw, Flink 1.2.1 is very old and no longer supported. One reason to upgrade is
the improved network stack in Flink 1.5.x, which was aimed in part at reducing
checkpoint duration.

> On 5 Nov 2018, at 21:33, PranjalChauhan <pranjalhchau...@gmail.com> wrote:
> 
> Hi,
> 
> I am a new Flink user, currently using Flink 1.2.1. I am trying to
> understand how checkpoints actually work when Window operator is processing
> events.
> 
> My pipeline has the following flow where each operator's parallelism is 1.
> source -> flatmap -> tumbling window -> sink
> In this pipeline, I had configured the window to be evaluated every 1 hour
> (3600 seconds) and the checkpoint interval was 5 mins. The checkpoint
> timeout was set to 1 hour as I wanted the checkpoints to complete.
> 
> In my window function, the job makes https call to another service so window
> function may take some time to evaluate/process all events.
> 
> Please refer the following image. In this case, the window was triggered at
> 23:00:00. Checkpoint 12 was triggered soon after that and I notice that
> checkpoint 12 takes long time to complete (compared to other checkpoints
> when window function is not processing events).
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1766/overall_checkpoint_duration_summary_when_waiting_for_window_operator.png>
>  
> 
> Following images shows checkpoint 12 details of window & sink operators.
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1766/window_operator_checkpoint_duration_after_window_interval.png>
>  
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1766/sink_operator_checkpoint_duration_after_window_interval.png>
>  
> 
> I see that the time spent for checkpoint was actually just 5 ms & 8 ms
> (checkpoint duration sync) for window & sink operators. However, End to End
> Duration for checkpoint was 11m 12s for both window & sink operator.
> 
> Is this expected behavior? If yes, do you have any suggestion to reduce the
> end to end checkpoint duration?
> 
> Please let me know if any more information is needed.
> 
> Thanks.
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
