Hi Yuval,

If the pipeline fails before the next checkpoint, all the records in the buffer should be replayed, beginning from the last completed checkpoint. The replay usually starts at the source, which reads the records again from the external system. The assumption is always that after a successful checkpoint, none of the records received up to that point need to be replayed.
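To make the replay behaviour concrete, here is a toy simulation in plain Python. It is not Flink code: `ReplayableSource`, `run_with_failure`, and the parameters `checkpoint_every` and `fail_at` are made-up names for illustration, and the sketch assumes a source that can re-read the external system from a stored offset.

```python
class ReplayableSource:
    """Hypothetical source that can re-read an external log from any offset."""

    def __init__(self, records):
        self.records = list(records)
        self.offset = 0

    def poll(self):
        """Return the next record, or None when the log is exhausted."""
        if self.offset >= len(self.records):
            return None
        record = self.records[self.offset]
        self.offset += 1
        return record


def run_with_failure(records, checkpoint_every, fail_at):
    """Sum all records, crashing once after `fail_at` records were processed.

    Returns (final_sum, total_records_processed_including_replays).
    """
    source = ReplayableSource(records)
    running_sum = 0        # state of the (single) stateful operator
    checkpoint = (0, 0)    # (source offset, operator state) of last checkpoint
    failed = False
    processed = 0

    while True:
        if not failed and processed == fail_at:
            # Simulated crash: roll back the source offset and the operator
            # state together to the last completed checkpoint.
            failed = True
            source.offset, running_sum = checkpoint
            continue

        record = source.poll()
        if record is None:
            break

        running_sum += record
        processed += 1

        # Take a checkpoint after every `checkpoint_every` source records.
        if source.offset % checkpoint_every == 0:
            checkpoint = (source.offset, running_sum)

    return running_sum, processed


result = run_with_failure([1, 2, 3, 4, 5], checkpoint_every=2, fail_at=3)
print(result)
```

With a checkpoint every 2 records and a failure after the third record, record 3 is read a second time, yet the final sum matches a failure-free run because the operator state was rolled back together with the source offset.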
You are right about the overall guarantee of the pipeline: it is bounded by the weakest guarantee of any operator in it. If, for example, your custom stateful operator can lose records during recovery, then your pipeline cannot guarantee anything. I think the situation you are concerned about cannot happen if everything is implemented correctly, because on recovery the stateful operator will resend records based on the state restored from the last checkpoint.

Best,
Fabian