Hi,

We are tuning some of our production Flink jobs and would like to know what values and considerations to use when choosing a checkpoint interval. Our default checkpoint interval is 30 seconds, and each checkpoint takes around 2 seconds to complete. We have also enabled incremental checkpoints. I understand there is a tradeoff between the time to recover from a failure and the performance cost of an aggressive checkpoint policy, but I would like to know what you consider a good compromise.
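To put our current numbers in context, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified model: a checkpoint is in progress for a fixed 2 seconds out of every 30-second interval, and a failure rolls the job back to the point where the last completed checkpoint was triggered. The function names are hypothetical and are not part of any Flink API.

```python
def checkpoint_duty_cycle(interval_s: float, duration_s: float) -> float:
    """Fraction of each interval during which a checkpoint is in progress."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return duration_s / interval_s


def worst_case_replay_s(interval_s: float, duration_s: float) -> float:
    """Upper bound (simplified model) on the span of input to reprocess.

    Assumes the failure happens just before the next checkpoint completes,
    so the job falls back to the trigger time of the previous checkpoint.
    """
    return interval_s + duration_s


if __name__ == "__main__":
    interval_s = 30.0  # our current checkpoint interval
    duration_s = 2.0   # observed checkpoint duration

    duty = checkpoint_duty_cycle(interval_s, duration_s)
    replay = worst_case_replay_s(interval_s, duration_s)
    print(f"checkpoint in progress: {duty:.1%} of the time")
    print(f"worst-case replay window: {replay:.0f} s")
```

With our settings this gives a checkpoint in progress about 6.7% of the time and up to roughly 32 seconds of input to reprocess after a failure, ignoring restart time and any backlog catch-up.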
I read this article as a reference: https://shopify.engineering/optimizing-apache-flink-applications-tips

What I am really looking for, though, is a formula or recipe for finding the best value for the checkpoint interval.

Regards,
Oscar