Hi Hayden,

In YARN mode, Flink tolerates as many TM failures as you have configured
via `yarn.maximum-failed-containers`. By default, this is set to the initial
number of requested TMs. So in your case, with two TMs, Flink would restart
a failed TM twice and then fail the cluster once a TM fails for the third
time.
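
If you want to tolerate more failures than the default, the limit can be
raised in `flink-conf.yaml`. A minimal sketch, where the value 10 is only an
illustrative assumption and not a recommendation:

```yaml
# flink-conf.yaml
# Maximum number of failed TM containers Flink tolerates before the
# YARN session fails. The value below is an example, not a default.
yarn.maximum-failed-containers: 10
```

With a setting like this, the cluster would only fail once more containers
have failed than the configured limit allows.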

Cheers,
Till

On Mon, Dec 4, 2017 at 11:31 AM, Marchant, Hayden <hayden.march...@citi.com>
wrote:

> Hi,
>
> We are currently starting to test Flink running on YARN. Until now, we've
> been testing on a standalone cluster. One thing lacking in standalone mode
> is that we have to manually restart a Task Manager if it dies. I looked at
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#yarn-cluster-high-availability
> and see that YARN deals with HA for Job
> Manager. How does it deal with a Task Manager if it dies? I would like the
> Task Manager to be dealt with similarly to Job Manager on failure. For
> example, let's say I have a cluster with two Task Managers, and one task
> manager dies. Will YARN restart the dead Task Manager, or would that need
> to be a manual restart?
>
> What actually would happen in the above case?
>
> Thanks,
> Hayden
>
>
>
