Hi Emmanuel,

Yes, Master HA is currently under development and only available in the 0.10 snapshot. AFAIK, it is almost but not completely done yet.
Best, Fabian

On Sep 10, 2015 01:29, "Emmanuel" <ele...@msn.com> wrote:

> is this a 0.10 snapshot feature only? I'm using 0.9.1 right now
>
> ------------------------------
> From: ele...@msn.com
> To: user@flink.apache.org
> Subject: RE: Flink HA mode
> Date: Wed, 9 Sep 2015 16:11:38 -0700
>
> Been playing with the HA...
> I find the UIs confusing here:
> in the dashboard on one side I see 0 slots 0 taskmanagers, but a job
> running, while on the other side I see my taskmanagers and slots but no
> jobs...
> putting the UI behind a proxy, it's load balanced to the JM, so I can't
> tell which is which (my bad) but how would I be able to tell which is the
> leader anyway? Do I need to query ZK to know?
>
> Looking at ZK, I actually don't see any keys for Flink... so I'm wondering
> if this is working as expected.
> The config lists the options
>
> recovery.mode: zookeeper
> ha.zookeeper.quorum: zookeeper:2181
>
> but I don't see any mention of Zookeeper in the JM logs.
>
> I start the JMs and TMs manually with
> ./jobmanager.sh start streaming
> ./taskmanager.sh start streaming
>
> so I guess there is no need for the masters file.
>
> What else am I missing here?
> Do I need to list the multiple JMs IPs in the jobmanager.rpc.address?
>
> Thanks
>
> ------------------------------
> Date: Wed, 9 Sep 2015 10:19:36 +0200
> Subject: Re: Flink HA mode
> From: trohrm...@apache.org
> To: user@flink.apache.org
>
> The only necessary information for the JobManager and TaskManager is to
> know where to find the ZooKeeper quorum to do leader election and retrieve
> the leader address from. This will be configured via the config parameter
> `ha.zookeeper.quorum`.
>
> On Wed, Sep 9, 2015 at 10:15 AM, Stephan Ewen <se...@apache.org> wrote:
>
> TL;DR is that you are right, it is only the initial list. If a JobManager
> comes back with a new IP address, it will be available.
>
> On Wed, Sep 9, 2015 at 8:35 AM, Ufuk Celebi <u...@apache.org> wrote:
>
> On 09 Sep 2015, at 04:48, Emmanuel <ele...@msn.com> wrote:
>
> my question is: how critical is the bootstrap ip list in masters?
>
> Hey Emmanuel,
>
> good questions. I read over the docs for this again [1] and you are right
> that we should make this clearer.
>
> The “masters” file is only relevant for the start/stop cluster scripts
> (Flink standalone mode).
>
> If you specify hosts in the “masters” file the start-cluster scripts will
> use these hosts to start job managers. After that all coordination happens
> via ZooKeeper via a leader election and retrieval service. All job managers
> elect a single leader and task managers and clients (submitting programs)
> retrieve this leader via ZooKeeper. If a job manager fails and becomes
> available again, it will publish itself via this mechanism (if it becomes
> leader at some point again). There was a recent PR [2] which introduced
> this. You can read over the very good PR description for more info for now.
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html
>
> [2] https://github.com/apache/flink/pull/1016
>
> does this get updated or does it have to be updated by some other
> service?
>
> If you start a new cluster on GCE with different hosts and use Flink’s
> standalone mode you have to set this up again. This is the same for the
> “slaves” file.
>
> Does this answer your question? If anything is unclear, please post here.
> :)
>
> – Ufuk
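
For anyone who wants to sanity-check the two HA settings quoted in this thread before starting the JobManagers, here is a minimal Python sketch. It only inspects the text of a config file; it does not talk to Flink or ZooKeeper, and it cannot tell whether the running Flink version actually supports HA (0.9.1 does not, per the reply at the top). The key names `recovery.mode` and `ha.zookeeper.quorum` are taken from the thread and may differ in other versions. The helper names `parse_flat_config` and `ha_config_problems` are hypothetical, and the `host:port` shape of quorum entries is an assumption based on the `zookeeper:2181` value above.

```python
# Sketch only: checks that the two HA keys quoted in this thread are set.
# Key names come from the thread; helper names are hypothetical.

def parse_flat_config(text):
    """Parse flat 'key: value' lines, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so 'zookeeper:2181' stays intact.
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip()
    return settings


def ha_config_problems(settings):
    """Return a list of problems; an empty list means both keys look set."""
    problems = []
    if settings.get("recovery.mode") != "zookeeper":
        problems.append("recovery.mode is not set to 'zookeeper'")
    quorum = settings.get("ha.zookeeper.quorum", "")
    hosts = [h.strip() for h in quorum.split(",") if h.strip()]
    if not hosts:
        problems.append("ha.zookeeper.quorum is empty or missing")
    for host in hosts:
        # Assumption: each quorum entry is written as host:port.
        name, sep, port = host.rpartition(":")
        if not sep or not name or not port.isdigit():
            problems.append("quorum entry %r is not host:port" % host)
    return problems


example = """
recovery.mode: zookeeper
ha.zookeeper.quorum: zookeeper:2181
"""
print(ha_config_problems(parse_flat_config(example)))  # prints []
```

An empty list only means the file contains the expected keys; whether a leader was actually elected still has to be confirmed from the JobManager logs or from ZooKeeper itself.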