Thanks for the update, Piotr.

The reason it prevents us from using checkpoints is this:
We rely on checkpoints to trigger the commit of Kafka offsets for our
sources (Kafka consumers).
When there is no backpressure, this works fine. When there is backpressure,
checkpoints fail because they take too long, and our offsets are never
committed to the Kafka brokers (as we just learned the hard way).
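
To make the failure mode concrete, here is a rough toy model of it. This is
not Flink or Kafka code: the class ToySource, the function run, and the
60-second timeout are all made up for illustration, and the model simply
assumes that offsets are committed only when a checkpoint finishes within
its timeout.

```python
from dataclasses import dataclass


@dataclass
class ToySource:
    """Hypothetical stand-in for a source that commits offsets on checkpoint success."""

    read_offset: int = 0       # how far the consumer has read
    committed_offset: int = 0  # what has been committed back to the broker

    def poll(self, records: int) -> None:
        # Reading continues regardless of whether checkpoints succeed.
        self.read_offset += records

    def on_checkpoint(self, duration_s: float, timeout_s: float) -> bool:
        # Assumption: a checkpoint that exceeds its timeout fails,
        # and a failed checkpoint commits nothing.
        if duration_s > timeout_s:
            return False
        self.committed_offset = self.read_offset
        return True


def run(durations, timeout_s=60.0, records_per_interval=1000):
    """Simulate one checkpoint attempt per interval; return (read, committed) pairs."""
    source = ToySource()
    history = []
    for duration_s in durations:
        source.poll(records_per_interval)
        source.on_checkpoint(duration_s, timeout_s)
        history.append((source.read_offset, source.committed_offset))
    return history


# Checkpoints complete quickly: committed offsets keep up with reads.
healthy = run([5.0, 6.0, 5.5])

# Checkpoints slow down under backpressure: commits stall after the first one.
backpressured = run([5.0, 120.0, 300.0])
```

In the second run the job keeps reading, but the committed offset stays at
the value from the last successful checkpoint, which is the behaviour we
observed. A real job commits the offsets recorded in the checkpoint rather
than the current read position, so treat this only as a simplification.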

Normally there is no backpressure in our jobs, but after an outage the
jobs do experience backpressure while catching up. And when you're already
trying to recover from an incident, that is not the ideal time for Kafka
offset commits to stop working.




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/