Hello,

I hit a very similar problem recently; perhaps it is related. There is
special logic inside the Mesos agent that checks whether the machine has
rebooted; if it has, the agent short-circuits recovery and registers
with a new agent ID. This is especially problematic with the new
--agent_removal_rate_limit and --recovery_agent_removal_limit flags. We had a
power outage, which caused this to happen on every machine in our lab at
once. Since every agent had a new ID, 50% of the IDs were considered lost, and
these safeguards caused our master to kill itself every 15 minutes, even after
all of the agents were back up and running. Is there any advantage to throwing
out the agent ID when rebooting?
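
A rough Python sketch of the behavior described above may help frame the
question. It is not taken from the Mesos source: it assumes the reboot check
compares a checkpointed boot identifier with the current one, and the names
Checkpoint, recover_agent_id, and lost_fraction are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Checkpoint:
    """Hypothetical stand-in for the state an agent persists to disk."""
    agent_id: str
    boot_id: str


def recover_agent_id(checkpoint: Optional[Checkpoint], current_boot_id: str) -> str:
    """Return the agent ID to register with after a restart."""
    if checkpoint is not None and checkpoint.boot_id == current_boot_id:
        # Same boot: only the agent process restarted, so the old ID is reused.
        return checkpoint.agent_id
    # No checkpoint, or the machine rebooted: recovery is skipped and a
    # fresh ID is generated (uuid4 is an illustrative choice only).
    return str(uuid.uuid4())


def lost_fraction(known_ids: Set[str], reregistered_ids: Set[str]) -> float:
    """Fraction of previously known agent IDs that did not come back."""
    if not known_ids:
        return 0.0
    return len(known_ids - reregistered_ids) / len(known_ids)
```

Under these assumptions, a reboot of every machine gives every agent a new ID,
so none of the IDs the master knew about ever reregister. Comparing that
fraction against a removal limit is what would trip the safeguard, even though
every machine is healthy.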


Thanks,

Justin



________________________________
From: Hendrik Haddorp <[email protected]>
Sent: Tuesday, November 8, 2016 12:59 PM
To: user
Subject: Slave gets new ID

Hi,

when we take slaves down for maintenance, as described in
http://mesos.apache.org/documentation/latest/maintenance/, the slave
gets a new ID on startup. Why is that, and can it be changed? We are
using Mesos 0.28.2. So far I am only aware of the
slave_reregister_timeout. Our restart was within that time frame. When
we simply restart a slave, it keeps its ID. However, when we wait a few
minutes, less than the reregistration timeout, before restarting the slave,
the ID also changes.

regards,
Hendrik
