That seems weird then. A new agent coming up on a new IP and host shouldn't affect other agents running on different hosts. Can you share the master logs that surface the issue?

On Tue, Nov 14, 2017 at 12:51 PM, Dan Leary <[email protected]> wrote:
> Just one mesos-master (no zookeeper) with --ip=127.0.0.1
> --hostname=localhost.
> In /etc/hosts are
> 127.1.1.1 agent1
> 127.1.1.2 agent2
> etc. and mesos-agent gets passed --ip=127.1.1.1 --hostname=agent1 etc.
>
> On Tue, Nov 14, 2017 at 3:41 PM, Vinod Kone <[email protected]> wrote:
>
>> ```Experiments thus far are with a cluster all on a single host, master
>> on 127.0.0.1, agents have their own ip's and hostnames and ports.```
>>
>> What does this mean? How are all your masters and agents on the same host
>> but still get different ips and hostnames?
>>
>> On Tue, Nov 14, 2017 at 12:22 PM, Dan Leary <[email protected]> wrote:
>>
>>> So I have a bespoke framework that runs under 1.4.0 using the v1 HTTP
>>> API, custom executor, checkpointing disabled.
>>> When the framework is running happily and a new agent is added to the
>>> cluster all the existing executors immediately get terminated.
>>> The scheduler is told of the lost executors and tasks and then receives
>>> offers about agents old and new and carries on normally.
>>>
>>> I would expect however that the existing executors should keep running
>>> and the scheduler should just receive offers about the new agent.
>>> It's as if agent recovery is being performed when the new agent is
>>> launched even though no old agent has exited.
>>> Experiments thus far are with a cluster all on a single host, master on
>>> 127.0.0.1, agents have their own ip's and hostnames and ports.
>>>
>>> Am I missing a configuration parameter? Or is this correct behavior?
>>>
>>> -Dan
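
For anyone trying to reproduce this, the single-host layout Dan describes (master on 127.0.0.1, each agent on its own loopback address with a matching /etc/hosts entry) can be sketched as a small Python helper that emits the hosts line and the mesos-agent command line per agent. This is a hypothetical sketch, not Dan's actual setup: the master port (5050), the per-agent ports, and the --work_dir paths are assumptions that do not appear in the thread.

```python
"""Hypothetical sketch of the single-host test cluster described in the thread."""

MASTER = "127.0.0.1:5050"  # assumption: master listening on port 5050


def agent_config(n: int) -> tuple[str, list[str]]:
    """Return the /etc/hosts line and mesos-agent argv for agent number n (1-based)."""
    ip = f"127.1.1.{n}"              # matches the /etc/hosts entries in the thread
    hostname = f"agent{n}"
    port = 5050 + n                  # assumption: one distinct port per agent
    work_dir = f"/tmp/mesos/{hostname}"  # assumption: separate work dir per agent
    hosts_line = f"{ip} {hostname}"
    argv = [
        "mesos-agent",
        f"--master={MASTER}",
        f"--ip={ip}",
        f"--hostname={hostname}",
        f"--port={port}",
        f"--work_dir={work_dir}",
    ]
    return hosts_line, argv


if __name__ == "__main__":
    # Print the hosts entry and launch command for two agents.
    for n in (1, 2):
        hosts_line, argv = agent_config(n)
        print(hosts_line)
        print(" ".join(argv))
```

Each agent gets its own --work_dir here so that agents sharing the host do not also share on-disk state; whether that matters for the behavior reported above is not established in the thread.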

