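TL;DR: you are right, the masters file is only the initial list. If a JobManager comes back with a new IP address, it will still be available, because job managers announce themselves through ZooKeeper rather than through that list.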
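
To make the point concrete, here is a rough toy model of the behaviour described in the quoted message below. It is not Flink's or ZooKeeper's API: `ToyCoordinator`, the IP addresses, and the "longest-standing candidate leads" rule are all assumptions made for illustration. It only shows that the bootstrap list decides where job managers start, while the current leader is always looked up from the coordination service.

```python
# Toy in-memory model; NOT Flink's or ZooKeeper's API. All names are hypothetical.

class ToyCoordinator:
    """Stands in for ZooKeeper: tracks live job managers and the leader."""

    def __init__(self):
        self.candidates = []  # addresses of live job managers, in join order

    def join(self, address):
        # A job manager publishes its own address when it (re)starts.
        self.candidates.append(address)

    def leave(self, address):
        # A job manager fails or is stopped.
        self.candidates.remove(address)

    def leader(self):
        # Simplification: the longest-standing live candidate leads.
        return self.candidates[0] if self.candidates else None


# The bootstrap list only decides where job managers are started initially.
masters_file = ["10.0.0.1", "10.0.0.2"]

zk = ToyCoordinator()
for host in masters_file:
    zk.join(host)

first_leader = zk.leader()

zk.leave("10.0.0.1")                # the leader fails
leader_after_failure = zk.leader()

zk.join("10.0.0.99")                # it returns with a new IP, absent from masters_file
zk.leave("10.0.0.2")                # the current leader fails too
leader_after_rejoin = zk.leader()   # clients now retrieve the new address
```

In this sketch the address `10.0.0.99` never appears in `masters_file`, yet task managers and clients would still find it, because they ask the coordinator and never read the list.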

On Wed, Sep 9, 2015 at 8:35 AM, Ufuk Celebi <u...@apache.org> wrote:

> > On 09 Sep 2015, at 04:48, Emmanuel <ele...@msn.com> wrote:
> >
> > my questions is: how critical is the bootstrap ip list in masters?
>
> Hey Emmanuel,
>
> good questions. I read over the docs for this again [1] and you are right
> that we should make this clearer.
>
> The “masters” file is only relevant for the start/stop cluster scripts
> (Flink standalone mode).
>
> If you specify hosts in the “masters” file the start-cluster scripts will
> use these hosts to start job managers. After that all coordination happens
> via ZooKeeper via a leader election and retrieval service. All job managers
> elect a single leader and task managers and clients (submitting programs)
> retrieve this leader via ZooKeeper. If a job manager fails and becomes
> available again, it will publish itself via this mechanism (if it becomes
> leader at some point again). There was a recent PR [2] which introduced
> this. You can read over the very good PR description for more info for now.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html
>
> [2] https://github.com/apache/flink/pull/1016
>
> > does this get updated or does it have to be updated by some other
> > service?
>
> If you start a new cluster on GCE with different hosts and use Flink’s
> standalone mode you have to set this up again. This is the same for the
> “slaves” file.
>
> Does this answer your question? If anything is unclear, please post here.
> :)
>
> – Ufuk