Re: Batch Processing Fault Tolerance (DataSet API)

Ovidiu-Cristian MARCU Mon, 22 Feb 2016 09:34:33 -0800

Thank you, Till!

Does the current (in-progress) implementation also consider the problem of 
losing the task slots of the failed node(s), something related to [2]?


[2] https://issues.apache.org/jira/browse/FLINK-3047

Best,
Ovidiu

> On 22 Feb 2016, at 18:13, Till Rohrmann <trohrm...@apache.org> wrote:
> 
> Hi Ovidiu,
> 
> at the moment, Flink's batch fault tolerance restarts the whole job in case of 
> a failure. However, parts of the logic needed for partial backtracking, such as 
> intermediate result partitions and the backtracking algorithm, are already 
> implemented or exist as a PR [1]. So we hope to complete partial backtracking 
> soon.
> 
> [1] https://github.com/apache/flink/pull/640 
> 
> Cheers,
> Till
> 
> On Mon, Feb 22, 2016 at 6:00 PM, Ovidiu-Cristian MARCU 
> <ovidiu-cristian.ma...@inria.fr> wrote:
> Hi
> 
> If a node fails, what does 'Fault tolerance for programs in the DataSet API 
> works by retrying failed executions' [1] mean?
> - Is the work already done by the other nodes kept, so that only the lost 
> node's work is recomputed and job execution continues?
> or
> - Is the entire job execution retried?
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/fault_tolerance.html
> 
> Best,
> Ovidiu 
>
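
To make the two options in the original question concrete, here is a minimal Python sketch of the partial-backtracking idea Till describes. It is not Flink's implementation (that is in the PR Till links). The job graph, the task names, and the `tasks_to_restart` helper are hypothetical. The sketch assumes blocking intermediate result partitions, so consumers of the failed task have not started yet. It restarts the failed task and then walks upstream, re-running only those producers whose result partitions were lost with the failed node.

```python
from typing import Dict, List, Set


def tasks_to_restart(
    upstream: Dict[str, List[str]],
    failed: str,
    available: Set[str],
) -> Set[str]:
    """Hypothetical backtracking: return the tasks that must re-run.

    upstream[t] lists the producers whose result partitions task t consumes.
    available holds producers whose result partitions survived the failure.
    """
    to_restart: Set[str] = set()
    stack = [failed]
    while stack:
        task = stack.pop()
        if task in to_restart:
            continue
        to_restart.add(task)
        for producer in upstream.get(task, []):
            # Re-run a producer only if its partition was lost; otherwise
            # the restarted consumer can re-read the surviving partition.
            if producer not in available:
                stack.append(producer)
    return to_restart


# Hypothetical job: two sources, two maps, one join. The node holding
# source_b's and map_b's partitions fails while the join is running.
upstream = {
    "source_a": [],
    "source_b": [],
    "map_a": ["source_a"],
    "map_b": ["source_b"],
    "join": ["map_a", "map_b"],
}
available = {"source_a", "map_a"}

partial = tasks_to_restart(upstream, "join", available)
full = set(upstream)  # restarting the whole job re-runs every task
print(sorted(partial))  # ['join', 'map_b', 'source_b']
print(len(partial), "of", len(full), "tasks re-run")  # 3 of 5 tasks re-run
```

A full restart (the current behaviour Till describes) re-runs all five tasks; backtracking re-runs only the failed join plus the producers whose partitions lived on the lost node, which corresponds to the first option in Ovidiu's question.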
