This question has been asked on StackOverflow:
https://stackoverflow.com/questions/48262080/how-to-get-automatic-fail-over-working-in-flink

I am using Apache Flink 1.4 on a cluster of 3 machines, one of which runs
the JobManager while the other two run TaskManagers.

I start Flink in cluster mode and submit a Flink job. I have configured 24
task slots in the Flink config, and the job itself uses 6 task slots.
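
For reference, this is roughly what our setup looks like. I am assuming
here that the slot limit applies per TaskManager via
taskmanager.numberOfTaskSlots, and our-job.jar is just a placeholder name:

    # flink-conf.yaml (sketch)
    taskmanager.numberOfTaskSlots: 24

    # job submitted with a parallelism of 6
    bin/flink run -p 6 our-job.jar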

When I submit the job, I see that 3 tasks are assigned to worker machine 1
and 3 to worker machine 2. When I then kill the TaskManager on worker
machine 2, the entire job fails.

Is this the expected behaviour, or does Flink have automatic failover as
Spark does?

Do we need to use YARN/Mesos to achieve automatic failover?

We also tried configuring a restart strategy, but when the job restarts we
get an exception saying that no task slots are available, and the job then
fails for good. We would have thought that 24 slots is more than enough for
a 6-slot job to fail over. What could we be doing wrong here?
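
For context, this is roughly how we enabled the restart strategy. It is a
minimal fixed-delay sketch: the attempt count and delay are illustrative
rather than our exact values, and the class name FailoverJob is a
placeholder:

    import java.util.concurrent.TimeUnit;

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class FailoverJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // Restart the whole job up to 3 times, waiting 10 seconds
            // between attempts, before declaring it permanently failed.
            env.setRestartStrategy(RestartStrategies.fixedDelayRestart(
                    3, Time.of(10, TimeUnit.SECONDS)));
            // ... actual job topology here, then env.execute() ...
        }
    }

The same thing can also be set cluster-wide in flink-conf.yaml via the
restart-strategy keys.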

Regards,
James
