Currently, changing any --attributes or --resources requires draining the
agent and killing all running tasks.
See https://issues.apache.org/jira/browse/MESOS-1739
You could run `mesos-slave --recover=cleanup`, which essentially kills all
the tasks and clears the work_dir, and then restart with `mesos-slave
--attributes=new_attributes`.
Note that even adding a new attribute is the kind of change that could
cause a framework scheduler to no longer want its task on that node. For
example, if you add "public_ip=true", my scheduler may no longer want to
run private tasks there. As such, any attribute change requires notifying
all schedulers of the change.
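For the configuration-management setup described further down, one possible guard is to diff the attribute string you are about to deploy against the one the agent is currently running with, and to treat any difference, even a pure addition, as requiring the drain-and-cleanup path above. A minimal Python sketch, assuming the semicolon-separated `key:value` form of `--attributes` (e.g. `rack:r1;zone:z1`); the helpers `parse_attributes`, `attribute_changes` and `requires_drain` are hypothetical and not part of Mesos:

```python
from typing import Dict, List


def parse_attributes(spec: str) -> Dict[str, str]:
    """Parse an agent --attributes string such as "rack:r1;public_ip:true".

    Assumption: semicolon-separated key:value pairs. Values are kept as raw
    strings; scalars, ranges and sets are not interpreted.
    """
    attrs: Dict[str, str] = {}
    for item in spec.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled semicolons
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed attribute: {item!r}")
        attrs[key.strip()] = value.strip()
    return attrs


def attribute_changes(old_spec: str, new_spec: str) -> Dict[str, List[str]]:
    """Return the attribute names that were added, removed or changed."""
    old, new = parse_attributes(old_spec), parse_attributes(new_spec)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def requires_drain(old_spec: str, new_spec: str) -> bool:
    """Any difference at all, even a pure addition, means draining the agent.

    Hypothetical pre-restart check; the agent's own compatibility check may
    be stricter (e.g. about ordering), so False is not a guarantee.
    """
    return any(attribute_changes(old_spec, new_spec).values())


if __name__ == "__main__":
    print(requires_drain("rack:r1", "rack:r1;public_ip:true"))
    print(attribute_changes("rack:r1;zone:z1", "rack:r2;public_ip:true"))
```

The sketch deliberately treats attribute order as insignificant, so a `False` result should be read as "no change this check can see" rather than "safe to restart without cleanup".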


On Mon, Feb 22, 2016 at 2:01 PM, Marco Massenzio <[email protected]>
wrote:

> IIRC you can avoid the issue by either using a different work_dir for the
> agent, or removing (and, possibly, re-creating) it.
>
> I'm afraid I don't have a running instance of Mesos on this machine and
> can't test it out.
>
> Also (and this is strictly my opinion :) I would consider a change of
> attribute a "material" change for the Agent and I would avoid trying to
> recover state from previous runs; but, again, there may be perfectly
> legitimate cases in which this is desirable.
>
> --
> *Marco Massenzio*
> http://codetrips.com
>
> On Mon, Feb 22, 2016 at 12:11 PM, Zhitao Li <[email protected]> wrote:
>
>> Hi,
>>
>> We recently discovered that updating attributes on Mesos agents is a very
>> risky operation, and has the potential to send agent(s) into a crash loop
>> if not done properly, with errors like "Failed to perform recovery:
>> Incompatible slave info detected". This, combined with --recovery_timeout,
>> made the situation even worse.
>>
>> In our setup, some of the attributes are generated by an automated
>> configuration management system, so it is possible that a "bad"
>> configuration could be left on the machine and cause big trouble on the
>> next agent upgrade if the USR1 signal was not sent in time.
>>
>> Some questions:
>>
>> 1. Does anyone have a recommended practice for managing these
>> attributes safely?
>> 2. Has Mesos considered falling back to the old metadata if it detects
>> an incompatibility, so agents would keep running with the old attributes
>> instead of falling into a crash loop?
>>
>> Thanks.
>>
>> --
>> Cheers,
>>
>> Zhitao Li
>>
>
>
