Hi Ovidiu,

at the moment, Flink's batch fault tolerance restarts the whole job in case of a failure. However, parts of the logic needed for partial backtracking, such as intermediate result partitions and the backtracking algorithm, are either already implemented or exist as a PR [1]. So we hope to complete partial backtracking soon.

[1] https://github.com/apache/flink/pull/640

Cheers,
Till

On Mon, Feb 22, 2016 at 6:00 PM, Ovidiu-Cristian MARCU <ovidiu-cristian.ma...@inria.fr> wrote:

> Hi
>
> In case of failure of a node what does it mean 'Fault tolerance for
> programs in the *DataSet API* works by retrying failed executions' [1] ?
> -work already done by the rest of the nodes is not lost, only work of the
> lost node is recomputed, job execution will continue
> or
> -entire job execution is retried
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/fault_tolerance.html
>
> Best,
> Ovidiu