Stephan Ewen created FLINK-4818:
-----------------------------------

             Summary: RestartStrategy should track how many failed restore attempts the same checkpoint has and fall back to earlier checkpoints
                 Key: FLINK-4818
                 URL: https://issues.apache.org/jira/browse/FLINK-4818
             Project: Flink
          Issue Type: Sub-task
          Components: Distributed Coordination
            Reporter: Stephan Ewen

The restart strategies can use the exception information from FLINK-4816 to track how often restoring from a given checkpoint has failed. After a certain number of consecutive failed restore attempts, they should fall back to an earlier completed checkpoint as the recovery point.

It is open for discussion whether the restart strategies are the right place to implement this, or whether it is an orthogonal feature that belongs in the checkpoint coordinator (which knows how many checkpoints are available) or in a separate class altogether.
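
To make the proposal concrete, here is a minimal sketch of the "separate class" variant. It is only an illustration: CheckpointFallbackTracker, its methods, and the maxFailuresPerCheckpoint threshold are hypothetical names, not existing Flink APIs, and it assumes the caller reports each restore failure or success (for example, based on the exception information from FLINK-4816).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.OptionalLong;

/**
 * Hypothetical helper, not part of Flink: counts consecutive failed restore
 * attempts for the newest retained checkpoint and falls back to the next
 * older one once a threshold is reached.
 */
class CheckpointFallbackTracker {

    // Completed checkpoint IDs, newest first (assumed to be added in completion order).
    private final Deque<Long> completedCheckpoints = new ArrayDeque<>();
    private final int maxFailuresPerCheckpoint;
    private int consecutiveFailures = 0;

    CheckpointFallbackTracker(int maxFailuresPerCheckpoint) {
        if (maxFailuresPerCheckpoint < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.maxFailuresPerCheckpoint = maxFailuresPerCheckpoint;
    }

    /** Registers a newly completed checkpoint; it becomes the restore point. */
    void addCompletedCheckpoint(long checkpointId) {
        completedCheckpoints.push(checkpointId);
        consecutiveFailures = 0;
    }

    /** Checkpoint to restore from, or empty if none is left to try. */
    OptionalLong currentRestorePoint() {
        Long head = completedCheckpoints.peek();
        return head == null ? OptionalLong.empty() : OptionalLong.of(head);
    }

    /** Called when restoring from the current restore point failed. */
    void onRestoreFailure() {
        if (completedCheckpoints.isEmpty()) {
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxFailuresPerCheckpoint) {
            // Give up on this checkpoint and fall back to the next older one.
            completedCheckpoints.pop();
            consecutiveFailures = 0;
        }
    }

    /** Called when a restore succeeded; the failure streak ends. */
    void onRestoreSuccess() {
        consecutiveFailures = 0;
    }
}
```

With a threshold of 2 and completed checkpoints 1 and 2, two consecutive failed restores of checkpoint 2 would make checkpoint 1 the recovery point. Keeping this state outside the restart strategy would let either the restart strategy or the checkpoint coordinator own it, which is the open question raised above.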