Stephan Ewen created FLINK-4818:
-----------------------------------

             Summary: RestartStrategy should track how many failed restore 
attempts the same checkpoint has and fall back to earlier checkpoints
                 Key: FLINK-4818
                 URL: https://issues.apache.org/jira/browse/FLINK-4818
             Project: Flink
          Issue Type: Sub-task
          Components: Distributed Coordination
            Reporter: Stephan Ewen


The restart strategies can use the exception information from FLINK-4816 to 
keep track of how often a checkpoint restore has failed. After a certain number 
of consecutive failures, they should take earlier completed checkpoints as 
recovery points.
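
A minimal sketch of the bookkeeping described above, in Java. It is an
illustration only and not existing Flink code: the class name
RestoreFallbackTracker, its methods, and the maxFailuresPerCheckpoint
threshold are all hypothetical, and the sketch assumes the caller already
knows (for example from the exception information in FLINK-4816) that a
failure was a restore failure.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper (not part of Flink): picks the checkpoint to restore from. */
final class RestoreFallbackTracker {

    // Completed checkpoint IDs; the newest one sits at the head of the deque.
    private final Deque<Long> candidates = new ArrayDeque<>();

    // Assumed threshold: consecutive restore failures tolerated per checkpoint.
    private final int maxFailuresPerCheckpoint;

    // Consecutive failed restores of the checkpoint currently at the head.
    private int consecutiveFailures = 0;

    RestoreFallbackTracker(List<Long> completedIdsOldestFirst, int maxFailuresPerCheckpoint) {
        if (maxFailuresPerCheckpoint < 1) {
            throw new IllegalArgumentException("maxFailuresPerCheckpoint must be >= 1");
        }
        for (Long id : completedIdsOldestFirst) {
            candidates.push(id); // push adds at the head, so the newest ends up first
        }
        this.maxFailuresPerCheckpoint = maxFailuresPerCheckpoint;
    }

    /** Checkpoint to try next, or empty if every candidate has been given up on. */
    OptionalLong currentRestoreTarget() {
        Long head = candidates.peek();
        return head == null ? OptionalLong.empty() : OptionalLong.of(head);
    }

    /** Call after a restore from the current target failed. */
    void onRestoreFailure() {
        if (candidates.isEmpty()) {
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= maxFailuresPerCheckpoint) {
            candidates.pop();        // fall back to the next earlier checkpoint
            consecutiveFailures = 0; // the earlier checkpoint starts with a clean count
        }
    }

    /** Call after a successful restore; this ends the failure streak. */
    void onRestoreSuccess() {
        consecutiveFailures = 0;
    }
}
```

Whether such a counter would live in a restart strategy, in the checkpoint
coordinator, or in a class of its own is exactly the open question below; the
sketch does not take a position on that.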

It is up for discussion whether the restart strategies are the right place to 
implement this, or whether it is an orthogonal feature that should go into 
the checkpoint coordinator (which knows how many checkpoints are available) or 
into a separate class altogether.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)