Hi James,
In this scenario, with a restart strategy configured, the job should restart
automatically (YARN/Mesos are not required) as long as enough slots are available.
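For reference, a fixed-delay restart strategy can be set cluster-wide in
flink-conf.yaml; the values below are just illustrative:

```yaml
# flink-conf.yaml -- example restart strategy (attempt count and delay are illustrative)
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s
```

The same can also be set per job on the ExecutionEnvironment.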

Can you check via the web interface at http://<jobmanager>:8081/ that
enough slots are still available after killing one TaskManager?
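If it is easier to script than to eyeball, the monitoring REST API exposes
the same numbers under /overview. A minimal sketch of reading them, assuming
the standard response shape (the sample payload and its values below are
illustrative, not from your cluster):

```python
import json

# Sample payload in the shape returned by GET http://<jobmanager>:8081/overview
# (against a live cluster you would fetch this with urllib or requests;
#  the numbers here are made up for illustration).
sample = '''
{"taskmanagers": 2, "slots-total": 24, "slots-available": 18,
 "jobs-running": 1, "jobs-finished": 0, "jobs-cancelled": 0, "jobs-failed": 0}
'''

overview = json.loads(sample)
print("TaskManagers:", overview["taskmanagers"])
print("Slots available:", overview["slots-available"], "of", overview["slots-total"])
```

If "slots-available" drops to fewer than the job's required slots after a
TaskManager dies, restarts will fail exactly as you describe.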

Can you provide JobManager and TaskManager logs and some more details on
the job you are running?


Nico

On 16/01/18 07:04, Data Engineer wrote:
> This question has been asked on StackOverflow:
> https://stackoverflow.com/questions/48262080/how-to-get-automatic-fail-over-working-in-flink
> 
> I am using Apache Flink 1.4 on a cluster of 3 machines, out of which one
> is the JobManager and the other 2 host TaskManagers.
> 
> I start flink in cluster mode and submit a flink job. I have configured
> 24 task slots in the flink config, and for the job I use 6 task slots.
> 
> When I submit the job, I see 3 tasks are assigned to Worker machine 1
> and 3 are assigned to Worker machine 2. Now, when I kill the TaskManager
> on WorkerMachine 2, I see that the entire job fails.
> 
> Is this the expected behaviour, or does Flink have automatic failover as
> Spark does?
> 
> Do we need to use YARN/Mesos to achieve automatic failover?
> 
> We tried the Restart Strategy, but when it restarts we get an exception
> saying that no task slots are available and then the job fails. We think
> that 24 slots is enough to take over. What could we be doing wrong here?
> 
> Regards,
> James
