[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910534#comment-16910534 ] Stephan Ewen commented on FLINK-4256: - @Thomas - I think the approach to manually split jobs with a pubsub is an option. It could be interesting to add some tooling for that. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.9.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16910168#comment-16910168 ] Thomas Weise commented on FLINK-4256: - Thanks for the clarification, this is excellent news. Perhaps clarify that on FLIP-1? Also, until the even finer grained recovery of streaming jobs becomes available, it may be possible for users to decompose a pipeline into smaller segments with intermediate pubsub topics if partial availability across a shuffle step is needed. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.9.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909980#comment-16909980 ] Stephan Ewen commented on FLINK-4256: - This is in fact working for streaming as well, not only for batch. It works for both on the granularity of "pipelined regions". However, with blocking "batch" shuffles, a batch job decomposes into many small pipelined regions, which can be individually recovered. Streaming programs only decompose into multiple pipelined regions when they do not have an all-to-all shuffle ({{keyBy()}} or {{rebalance()}}). Anything beyond that, like more fine grained recovery of streaming jobs is not in the scope here, because it would need a mechanism different from Flink's current checkpointing mechanism. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.9.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909157#comment-16909157 ] Thomas Weise commented on FLINK-4256: - I'm surprised to see this closed when there is still no support for streaming? Please see [https://lists.apache.org/thread.html/cdb315f4b71a915c4c598b580f71cad11bc3f5bd146b916378765f2a@%3Cdev.flink.apache.org%3E] > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: Runtime / Coordination >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen >Priority: Major > Fix For: 1.9.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16247272#comment-16247272 ] Sihua Zhou commented on FLINK-4256: --- Partial recovery seem to more useful for Batch job, for stream job is it also fine to place `local recovery` under this umbrella? [~StephanEwen] > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165537#comment-16165537 ] Eron Wright commented on FLINK-4256: - Is the scope of FLINK-4256 covering batch jobs only? There are various TODOs related to the interplay between local failover and checkpointing, for example. Wondering whether additional subtasks should be opened here or elsewhere. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925615#comment-15925615 ] ASF GitHub Bot commented on FLINK-4256: --- GitHub user shuai-xu opened a pull request: https://github.com/apache/flink/pull/3539 [FLINK-4256] Flip1: fine gained recovery This is an informal pr for the implementation of flip1 version 1. It enable that when a task fail, only restart the minimal pipelined connected executions instead of the whole execution graph. Main changes: 1. ExecutionGraph doesn't manage the failover any more, it only record the finished JobVertex number and turn to FINISHED when all vertexes finish(maybe later FailoverCoordinator will take over this). Its state can only be CREATED, RUNNING, FAILED, FINISHED or SUSPENDED now. 2. FailoverCoordinator will manage the failover now. It will generate several FailoverRegions when the EG is attached. It listens for the fail of executions. When an execution fail, it finds a FailoverRegion to finish the failover. 3. When JM need the EG to be canceled or failed, EG will also notice FailoverCoordinator, FailoverCoordinator will notice all FailoverRegions to cancel their executions and when all executions are canceled, FailoverCoordinator will notice EG to be CANCELED or FAILED. 4. FailoverCoordinator has server state, RUNNING, FAILING, CANCELLING, FAILED, CANCELED. 5. FailoverRegion contains the minimal pipelined connected executions and manager the failover of them. 6. FailoverRegion has CREATED, RUNNING, CANCELLING, CANCELLED. 7. One FailoverRegion may be the succeeding or preceding of others. When a preceding region failover, its all succeedings should failover too. And the succeedings should just reset its executions and wait for the preceding to start it when preceding finish. Preceding should wait for its succeedings to be CREATED and then schedule again. You can merge this pull request into a Git repository by running: $ git pull https://github.com/shuai-xu/flink jira-4256 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3539.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3539 commit 363f1536838064edbdd5f39e41f3f19f6c511fc4 Author: shuai.xus Date: 2017-03-15T03:36:11Z [FLINK-4256] Flip1: fine gained recovery > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures > The detail desgin for version1 is > https://docs.google.com/document/d/1_PqPLA1TJgjlqz8fqnVE3YSisYBDdFsrRX_URgRSj74/edit# -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413090#comment-15413090 ] wenlong.lyu commented on FLINK-4256: thanks for explaining, you are right about pre-computing. Still have another concern, I think it is quite a special case for a job to be ExecutionJobVertex level splittable, it may only happen in batch job graphs with blocking edges in practice. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.2.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407870#comment-15407870 ] Stephan Ewen commented on FLINK-4256: - [~zjwang] Preventing downstream restarts would be a followup optimization. In order to not make this issue here more complicated than it already is, I would first solve this, and then approach this as a separate followup. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.2.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407866#comment-15407866 ] Stephan Ewen commented on FLINK-4256: - [~wenlong.lwl] True, one has to find the entire "connected component" for restart. That one is, however, dynamic, so I would not pre-compute it: - We may introduce best-effort caching that means in some cases the program must backtrack further, in others less - Downstream canceling is only necessary if an input has already been supplied to the downstream task. Especially in batch, this is often not the case and can reduce the tasks to look at for canceling. We can make this quite a bit more efficient in my opinion by operating on {{ExecutionJobVertex}} level for many cases, rather than on each individual vertex. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.2.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405484#comment-15405484 ] Zhijiang Wang commented on FLINK-4256: -- In further improvement, if task c failed, the following downstream tasks like d and e should not be restarted. We already make some works related with it. I am interested in the issue of caching intermediate result, it can solve the problem of restarting upstream tasks of failed one. Is it the PIPELINED_PERSISTENT type in result partition? Wish further plan for it. > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.2.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4256) Fine-grained recovery
[ https://issues.apache.org/jira/browse/FLINK-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15391849#comment-15391849 ] Wenlong Lyu commented on FLINK-4256: hi, Stephan, we have implemented similar solution before, but simple back tracking and forward cannot work well in situation following: Assuming job graph has A/B/C job vertices, A is connected to C with forward strategy, and B is connected to C all-to-all strategy, when a task of A failed, only one C task will be added to restart node set. I suggesting divide the job graph in maximal connected sub-graphs treating the job graph as an undirected graph, when a job graph is submitted. Besides, when the job graph is in large scale because extracting related nodes according to a given node can be time costly and will be repeatedly used in long running, using sub-graphs can avoid the problem > Fine-grained recovery > - > > Key: FLINK-4256 > URL: https://issues.apache.org/jira/browse/FLINK-4256 > Project: Flink > Issue Type: Improvement > Components: JobManager >Affects Versions: 1.1.0 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 1.2.0 > > > When a task fails during execution, Flink currently resets the entire > execution graph and triggers complete re-execution from the last completed > checkpoint. This is more expensive than just re-executing the failed tasks. > In many cases, more fine-grained recovery is possible. > The full description and design is in the corresponding FLIP. > https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures -- This message was sent by Atlassian JIRA (v6.3.4#6332)