Re: Old job resurrected during HA failover

Elias Levy Wed, 01 Aug 2018 09:50:39 -0700

Vino,

Thanks for the reply. Looking in ZK, I see:

[zk: localhost:2181(CONNECTED) 5] ls /flink/cluster_1/jobgraphs
[d77948df92813a68ea6dfd6783f40e7e, 2a4eff355aef849c5ca37dbac04f2ff1]

Again, we see HA state for job 2a4eff355aef849c5ca37dbac04f2ff1, even though
that job is no longer running (it was canceled while it was stuck in a
restart loop, failing each attempt because of a lack of free cluster slots).

Any idea why that may be the case?
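One way to spot entries like this is to compare the children of the jobgraphs node with the IDs of the jobs the cluster actually reports as running. Below is a minimal Python sketch. It assumes you have already fetched both lists, for example with a ZooKeeper client such as kazoo's `get_children` and from the JobManager's web UI or REST API. The function name is hypothetical, and treating d77948df... as the only running job in the usage is an assumption for illustration:

```python
from typing import Iterable, List, Set


def find_stale_jobgraphs(zk_children: Iterable[str], running_job_ids: Set[str]) -> List[str]:
    """Hypothetical helper: return job IDs that still have a node under the
    jobgraphs path but are not among the jobs believed to be running."""
    return sorted(job_id for job_id in zk_children if job_id not in running_job_ids)


# Children of /flink/cluster_1/jobgraphs, as listed by the ZK CLI above.
zk_children = ["d77948df92813a68ea6dfd6783f40e7e", "2a4eff355aef849c5ca37dbac04f2ff1"]
# Assumption for illustration: only d77948df... is still running.
running = {"d77948df92813a68ea6dfd6783f40e7e"}

print(find_stale_jobgraphs(zk_children, running))
# → ['2a4eff355aef849c5ca37dbac04f2ff1']
```

Any ID this returns is a job graph that a newly elected JobManager leader could pick up and try to recover.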

On Wed, Aug 1, 2018 at 8:38 AM vino yang <yanghua1...@gmail.com> wrote:

> If a job is explicitly canceled, its jobgraph node in ZK will be deleted.
> However, note that Flink deletes the jobgraph node asynchronously in a
> background thread, so there may be cases where the deletion never
> completes.
> At the same time, the jobgraph nodes in ZK are the only basis the JM
> leader uses to restore jobs, so a node that is left behind can cause an
> unexpected recovery, i.e. an old job being resurrected.
>
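The failure mode described in the reply can be illustrated with a toy model. In this model, cancel only queues the node deletion for a background worker, and a new leader recovers every job that still has a node. The class, function, and job IDs below are hypothetical and do not correspond to Flink's actual classes. Losing the queued deletion when the leader fails over is a simplifying assumption:

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class ToyJobGraphStore:
    """Hypothetical stand-in for the ZK-backed job graph store (not Flink's class)."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}  # job ID -> serialized job graph
        self.pending: Deque[Callable[[], object]] = deque()  # queued background deletions

    def submit(self, job_id: str) -> None:
        self.nodes[job_id] = "jobgraph-" + job_id

    def cancel(self, job_id: str) -> None:
        # Cancel returns immediately; removing the node is left to a background worker.
        self.pending.append(lambda: self.nodes.pop(job_id, None))

    def run_background_deletions(self) -> None:
        # Simulate the background thread draining its queue.
        while self.pending:
            self.pending.popleft()()

    def recover_on_failover(self) -> List[str]:
        # Assumption: queued work dies with the old leader, and the new leader
        # recovers every job that still has a node in the store.
        self.pending.clear()
        return sorted(self.nodes)


def recovered_after_failover(deletion_finished: bool) -> List[str]:
    store = ToyJobGraphStore()
    store.submit("job-a")
    store.submit("job-b")
    store.cancel("job-b")
    if deletion_finished:
        store.run_background_deletions()
    return store.recover_on_failover()


print(recovered_after_failover(deletion_finished=True))   # → ['job-a']
print(recovered_after_failover(deletion_finished=False))  # → ['job-a', 'job-b']
```

In the second case, the canceled "job-b" comes back after failover, which matches the resurrection described above.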
