RE: Flink HA mode

Emmanuel Wed, 09 Sep 2015 16:12:59 -0700

I've been playing with HA mode, and I find the UIs confusing: on one dashboard
I see 0 slots and 0 TaskManagers but a running job, while on the other I see my
TaskManagers and slots but no jobs. I put the UI behind a proxy, so it's load
balanced across the JMs and I can't tell which is which (my bad). But how would
I be able to tell which one is the leader anyway? Do I need to query ZK to
know?
Looking at ZK, I actually don't see any keys for Flink, so I'm wondering if
this is working as expected. The config lists the options

    recovery.mode: zookeeper
    ha.zookeeper.quorum: zookeeper:2181

but I don't see any mention of ZooKeeper in the JM logs.

I start the JMs and TMs manually with

    ./jobmanager.sh start streaming
    ./taskmanager.sh start streaming

so I guess there is no need for the masters file.

What else am I missing here? Do I need to list the multiple JMs' IPs in
jobmanager.rpc.address?
Thanks
Date: Wed, 9 Sep 2015 10:19:36 +0200
Subject: Re: Flink HA mode
From: trohrm...@apache.org
To: user@flink.apache.org

The only necessary information for the JobManager and TaskManager is to know
where to find the ZooKeeper quorum to do leader election and retrieve the
leader address from. This will be configured via the config parameter
`ha.zookeeper.quorum`.
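
As a quick sanity check for the point above, here is a minimal Python sketch
that reads flat `key: value` lines in the style of flink-conf.yaml and reports
which HA-related keys are missing. The two key names and the sample values are
the ones quoted in this thread; the helper functions `parse_flat_config` and
`missing_ha_keys` are hypothetical names made up for this illustration and are
not part of Flink.

```python
# Keys as quoted in this thread; the checker itself is a hypothetical helper.
REQUIRED_HA_KEYS = ("recovery.mode", "ha.zookeeper.quorum")


def parse_flat_config(text):
    """Parse simple 'key: value' lines, skipping blank lines and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so 'host:port' values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def missing_ha_keys(config):
    """Return the required HA keys that are absent or empty."""
    return [key for key in REQUIRED_HA_KEYS if not config.get(key)]


sample = """
# excerpt in the style of flink-conf.yaml (values from this thread)
recovery.mode: zookeeper
ha.zookeeper.quorum: zookeeper:2181
"""

config = parse_flat_config(sample)
print("missing HA keys:", missing_ha_keys(config))
```

This only checks that the keys are present in the file; it says nothing about
whether the JobManager actually picked them up or reached ZooKeeper.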
On Wed, Sep 9, 2015 at 10:15 AM, Stephan Ewen <se...@apache.org> wrote:
TL;DR is that you are right, it is only the initial list. If a JobManager comes
back with a new IP address, it will be available.
On Wed, Sep 9, 2015 at 8:35 AM, Ufuk Celebi <u...@apache.org> wrote:

> On 09 Sep 2015, at 04:48, Emmanuel <ele...@msn.com> wrote:

> my question is: how critical is the bootstrap IP list in masters?

Hey Emmanuel,

good questions. I read over the docs for this again [1] and you are right that
we should make this clearer.

The “masters” file is only relevant for the start/stop cluster scripts (Flink
standalone mode).

If you specify hosts in the “masters” file, the start-cluster scripts will use
these hosts to start job managers. After that, all coordination happens through
ZooKeeper, using a leader election and retrieval service. All job managers
elect a single leader, and task managers and clients (submitting programs)
retrieve this leader via ZooKeeper. If a job manager fails and becomes
available again, it will publish itself via this mechanism (if it becomes
leader again at some point). A recent PR [2] introduced this. For now, you can
read the very good PR description for more info.
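
To make the election and retrieval idea concrete, here is a toy, in-memory
Python sketch. It is not Flink's implementation and does not use ZooKeeper's
API: `ToyCoordinator` is a hypothetical stand-in for the coordination service,
the addresses are made up, and the rule "lowest sequence number wins" is an
assumption loosely modeled on the common sequence-number election recipe. It
only illustrates why the initial host list stops mattering once job managers
register themselves.

```python
class ToyCoordinator:
    """In-memory stand-in for the coordination service (not ZooKeeper's API)."""

    def __init__(self):
        self._next_seq = 0
        self._candidates = {}  # sequence number -> job manager address

    def join(self, address):
        """Register a job manager as a candidate and return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._candidates[seq] = address
        return seq

    def leave(self, seq):
        """Remove a candidate, e.g. because its process failed."""
        self._candidates.pop(seq, None)

    def leader(self):
        """Leader retrieval: address of the candidate with the lowest sequence number."""
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


coordinator = ToyCoordinator()
jm_a = coordinator.join("10.0.0.1:6123")  # hypothetical addresses
jm_b = coordinator.join("10.0.0.2:6123")
print(coordinator.leader())               # first candidate leads

coordinator.leave(jm_a)                   # leader fails
print(coordinator.leader())               # standby takes over

jm_a2 = coordinator.join("10.0.0.9:6123") # failed JM returns with a new IP
print(coordinator.leader())               # it is a candidate, not yet leader

coordinator.leave(jm_b)
print(coordinator.leader())               # the returned JM now leads
```

Task managers and clients in this model only ever call `leader()`, so a job
manager that comes back under a new address is found without editing any
static host list.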

[1]
https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html

[2] https://github.com/apache/flink/pull/1016

> does this get updated or does it have to be updated by some other service?

If you start a new cluster on GCE with different hosts and use Flink’s
standalone mode you have to set this up again. This is the same for the
“slaves” file.

Does this answer your question? If anything is unclear, please post here. :)

– Ufuk
