Re: Flink fault tolerance guarantees

Fabian Paul Wed, 13 Oct 2021 05:13:20 -0700

Hi Yuval,

If the pipeline fails before the next checkpoint, all records in the buffer 
should be replayed, starting from the last completed checkpoint. The 
replay usually starts at the source, which reads the records again from the 
external system.
The assumption is always that, after a successful checkpoint, none of the 
records received up to that point need to be replayed.
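To make the replay behaviour concrete, here is a toy simulation. It is only a sketch and does not use any Flink API. It assumes the source state is a single read offset (similar to an offset in a log-based external system), that a checkpoint stores that offset atomically every `checkpoint_every` records, and that the job fails exactly once. The function name and parameters are hypothetical.

```python
def run_with_one_failure(records, checkpoint_every, fail_after):
    """Toy model: replay from the offset stored in the last completed checkpoint."""
    seen = []                # everything the downstream operators observed
    checkpointed_offset = 0  # source offset in the last completed checkpoint
    offset = 0               # current read position in the "external system"
    processed = 0
    failed = False
    while offset < len(records):
        seen.append(records[offset])
        offset += 1
        processed += 1
        if offset % checkpoint_every == 0:
            # Checkpoint completes: records before this offset are never replayed.
            checkpointed_offset = offset
        if not failed and processed == fail_after:
            # Failure before the next checkpoint: rewind the source.
            failed = True
            offset = checkpointed_offset
    return seen
```

With `run_with_one_failure(list("abcde"), 2, 3)` the result is `['a', 'b', 'c', 'c', 'd', 'e']`: only `c`, received after the last checkpoint, is replayed, while `a` and `b` are not.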


You are right that the overall guarantee of the pipeline is bounded by the 
weakest guarantee of any operator in it. If, for example, your custom 
stateful operator can lose records during recovery, then your pipeline as a 
whole cannot guarantee anything, not even at-least-once delivery.
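The "weakest link" argument can be written as taking a minimum over ordered guarantee levels. This is a simplified sketch: the three levels and their linear ordering are an assumption for illustration, and real end-to-end guarantees also depend on how sources and sinks interact with checkpoints.

```python
from enum import IntEnum


class Guarantee(IntEnum):
    # Hypothetical, simplified ordering of per-operator recovery guarantees.
    NONE = 0           # records may be lost on recovery
    AT_LEAST_ONCE = 1  # nothing lost, duplicates possible
    EXACTLY_ONCE = 2   # nothing lost, no duplicate effects


def pipeline_guarantee(operator_guarantees):
    # The pipeline is only as strong as its weakest operator.
    return min(operator_guarantees)
```

For example, `pipeline_guarantee([Guarantee.EXACTLY_ONCE, Guarantee.NONE])` yields `Guarantee.NONE`, matching the case of a custom operator that can lose records.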

I think the situation you are concerned about cannot happen if everything is 
implemented correctly, because on recovery the stateful operator will resend 
records based on the state restored from the last checkpoint.
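The following toy model sketches why a correctly implemented buffering operator does not lose records. Records it received before the checkpoint are not replayed by the source, so they must be part of the operator's checkpointed state. The class, its `snapshot_state`/`restore_state` methods and the `run` driver are all hypothetical; in Flink the equivalent hooks would come from its checkpointing interfaces (for example `CheckpointedFunction`), which this sketch does not use.

```python
class BufferingOperator:
    """Hypothetical operator that emits records in batches of `batch_size`."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []

    def process(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            out, self.buffer = self.buffer, []
            return out
        return []

    def snapshot_state(self):
        # Buffered records belong in the checkpoint: the source will not replay them.
        return list(self.buffer)

    def restore_state(self, state):
        self.buffer = list(state)


def run(records, checkpoint_at, crash_at, checkpoint_buffer=True, batch_size=4):
    emitted = []
    op = BufferingOperator(batch_size)
    checkpoint = (0, [])  # (source offset, operator state)
    offset = 0
    crashed = False
    while offset < len(records):
        emitted.extend(op.process(records[offset]))
        offset += 1
        if offset == checkpoint_at:
            # A buggy operator that forgets its buffer is modelled by storing [].
            state = op.snapshot_state() if checkpoint_buffer else []
            checkpoint = (offset, state)
        if not crashed and offset == crash_at:
            crashed = True
            op = BufferingOperator(batch_size)  # in-memory buffer is gone
            offset, state = checkpoint          # rewind source to checkpoint
            op.restore_state(state)             # recover operator state
    emitted.extend(op.buffer)  # end of input: flush whatever is still buffered
    return emitted
```

With records `1..8`, a checkpoint after record 2 and a crash after record 3, `run(...)` emits all eight records, because records 1 and 2 come back from the restored buffer and record 3 is replayed by the source. With `checkpoint_buffer=False`, records 1 and 2 are lost, which is exactly the weak-operator case described above.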

Best,
Fabian
