Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-12-12 Thread Joris Van Remoortere
> So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):
> - With the change: the agent could run into incompatible agent info due to resource change and flap

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-12-04 Thread haosdent
> we can have the agent remove `rm -f /meta/slaves/latest` automatically upon recovery failure but only after the host has rebooted.
This sounds dangerous. When the difference in AgentInfo is caused by an operator's typo, I think the operator would prefer to correct it and try to start the agent again. R

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-29 Thread tommy xiao
Agree with James's options.
2016-11-30 0:48 GMT+08:00 James Peach:
> > On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
> > So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-29 Thread James Peach
> On Nov 28, 2016, at 6:09 PM, Yan Xu wrote:
> So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):
> • Currently: the agent would skip the recovery (and the chance of running int

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-28 Thread Yan Xu
So one thing that was brought up during offline conversations was that if the host reboot is associated with hardware change (e.g., a new memory stick):
- Currently: the agent would skip the recovery (and the chance of running into incompatible agent info) and register as a new agent.
-

Re: MESOS-6233 Allow agents to re-register post a host reboot

2016-11-21 Thread X Brick
Here is a hacky way to fix it in the current version: back up the boot_id file (it should exist at $work_dir/meta/boot_id) when the Mesos agent (or slave) starts, and restore it from the backup when the agent/slave restarts. The slave ID will then not change. It works fine for our cluster. I hope it could
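A minimal sketch of the workaround described in this message, not a supported Mesos mechanism. The only detail taken from the thread is the `$work_dir/meta/boot_id` location; the function names and the backup location are hypothetical, and the sketch assumes the backup runs while the agent is up and the restore runs before the agent is started again after the reboot.

```python
import shutil
from pathlib import Path


def boot_id_path(work_dir):
    # Location taken from the message above: $work_dir/meta/boot_id.
    return Path(work_dir) / "meta" / "boot_id"


def backup_boot_id(work_dir, backup_file):
    """Copy the checkpointed boot_id to backup_file (hypothetical path).

    Returns True if a copy was made, False if no boot_id file exists yet.
    """
    src = boot_id_path(work_dir)
    if not src.is_file():
        return False
    Path(backup_file).parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, backup_file)
    return True


def restore_boot_id(work_dir, backup_file):
    """Put the saved boot_id back in place before the agent starts.

    Returns True if the file was restored, False if there is no backup.
    """
    backup = Path(backup_file)
    if not backup.is_file():
        return False
    dst = boot_id_path(work_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup, dst)
    return True
```

The backup file should live outside the agent's work directory so that it survives any cleanup of the work directory. Note that this only restores a file; whether the agent's other checkpointed state is still valid after the reboot is not addressed here.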

MESOS-6233 Allow agents to re-register post a host reboot

2016-11-15 Thread Megha Sharma
Hi All, We have been working on the design for restartable tasks (MESOS-3545), and allowing agents to recover and re-register after a reboot is a prerequisite for that. Today the agent doesn’t recover its state, which includes its SlaveID, after a host reboot; it short-circuits the recovery upon dis