Re: Issues testing Flink HA w/ ZooKeeper

Ufuk Celebi Mon, 15 Feb 2016 04:52:46 -0800

> On 15 Feb 2016, at 13:40, Stefano Baghino <stefano.bagh...@radicalbit.io> wrote:
> 
> Hi Ufuk, thanks for replying. 
> 
> Regarding the masters file: yes, I've specified all the masters and checked 
> that they were actually running after running start-cluster.sh. I'll gladly 
> share the logs as soon as I get to see them.
> 
> Regarding the state backend: how does having a non-distributed storage as the 
> state backend influence the HA features? I thought it would have meant that 
> the job state couldn't be restored but the job itself could've been started 
> after the backup job manager started. Does not having a reliable distributed 
> storage service as the state backend mean that the HA features don't work?


No, the submitted job itself is also stored in the state backend and is 
recovered from there. ZooKeeper only holds a pointer to the state handle in 
the configured backend. Since all of your job managers run on the same host, 
it should work as you expected. The only requirement is that every job 
manager can access the state backend.
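
As a rough illustration of that pointer arrangement, here is a toy Python 
sketch. It is not Flink's implementation: ToyZooKeeper, ToyJobManager, and 
the file naming are all made up for this example, and a local directory 
stands in for the state backend. It only shows why a standby job manager 
can recover a job when it sees the same storage, and fails when it does not.

```python
import os
import pickle
import tempfile


class ToyZooKeeper:
    """Hypothetical stand-in for ZooKeeper: holds small pointers only."""

    def __init__(self):
        self.nodes = {}  # job id -> file name inside the state backend


class ToyJobManager:
    """Hypothetical job manager that sees the state backend at storage_dir."""

    def __init__(self, zk, storage_dir):
        self.zk = zk
        self.storage_dir = storage_dir

    def submit(self, job_id, job):
        # The job itself goes into the state backend ...
        file_name = job_id + ".bin"
        with open(os.path.join(self.storage_dir, file_name), "wb") as f:
            pickle.dump(job, f)
        # ... and only a pointer to it goes into ZooKeeper.
        self.zk.nodes[job_id] = file_name

    def recover(self, job_id):
        # Follow the pointer; this raises FileNotFoundError if this job
        # manager cannot see the storage the job was written to.
        file_name = self.zk.nodes[job_id]
        with open(os.path.join(self.storage_dir, file_name), "rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    zk = ToyZooKeeper()
    shared_dir = tempfile.mkdtemp()

    leader = ToyJobManager(zk, shared_dir)
    leader.submit("job-1", {"name": "wordcount"})

    # A standby on the same host (same directory) can take over the job.
    standby = ToyJobManager(zk, shared_dir)
    print(standby.recover("job-1"))
```

A standby constructed with a different directory would find the pointer in 
ToyZooKeeper but fail to open the file, which is the situation you get with 
a non-shared state backend and job managers on different hosts.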

Recovery from a job manager failure is currently independent of the 
execution retries.

I think as soon as we have a look at the logs, we will figure it out. ;)

– Ufuk
