Here is a hacky way to fix it in the current version: back up the boot_id file (it should exist at $work_dir/meta/boot_id) when the Mesos agent (or slave) starts, and restore it from the backup when the agent/slave restarts. The slave ID will then not change. It works fine for our cluster.
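A rough sketch of the two steps, in case it is useful. This is only an illustration, not our exact script: the helper names and the backup location are made up, and it assumes you call backup_boot_id() once the agent has started and restore_boot_id() just before the agent is started again.

```python
import shutil
from pathlib import Path


def boot_id_path(work_dir: str) -> Path:
    # Location mentioned above: $work_dir/meta/boot_id
    return Path(work_dir) / "meta" / "boot_id"


def backup_boot_id(work_dir: str, backup_file: str) -> bool:
    """Copy the agent's boot_id file to backup_file (hypothetical location).

    Returns False if the boot_id file does not exist yet.
    """
    src = boot_id_path(work_dir)
    if not src.is_file():
        return False
    Path(backup_file).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, backup_file)
    return True


def restore_boot_id(work_dir: str, backup_file: str) -> bool:
    """Copy the saved boot_id back into $work_dir/meta/boot_id.

    Returns False if no backup is available, leaving the work dir untouched.
    """
    backup = Path(backup_file)
    if not backup.is_file():
        return False
    dst = boot_id_path(work_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup, dst)
    return True
```

Both helpers return False instead of raising when the file is missing, so a wrapper script can decide whether to start the agent anyway.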
I hope this helps.

2016-11-15 23:37 GMT+08:00 Megha Sharma <[email protected]>:

> Hi All,
>
> We have been working on the design for Restartable tasks
> (MESOS-3545) and allowing agents to recover and re-register post reboot is a
> pre-requisite for that.
> Agent today doesn’t recover its state that includes its SlaveID post a
> host reboot, it short-circuits the recovery upon discovering the reboot and
> registers with the master as a new agent. With Partition Awareness, the
> mesos master even allows agents which have failed master’s health check
> pings (unreachable agents) to re-register with it and reconcile the
> tasks/executors. The executors on a rebooted host are anyway terminated so
> there is no harm in letting such an agent recover and re-register with the
> master using its old SlaveID.
> Would like to hear from the folks here if you see any operational concerns
> with letting the agents recover post a host reboot.
>
> MESOS JIRA: https://issues.apache.org/jira/browse/MESOS-6223
>
> Many Thanks
> Megha Sharma

