Hello Benjamin,

Thank you for the reply. I also think that changing the default working
directory to something else, maybe /var/lib/mesos, would be better. In most
Linux distributions /tmp is mounted as tmpfs, which is a volatile
filesystem, so its contents do not survive a reboot.
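
In the meantime I am planning to just set the flag explicitly on my
agents. Roughly something like this (only a sketch, assuming the stock
mesos-slave binary; the directory and the ZooKeeper address are just
placeholders):

    # create a persistent work directory and point the agent at it
    sudo mkdir -p /var/lib/mesos
    mesos-slave --master=zk://<zk-hosts>:2181/mesos \
                --work_dir=/var/lib/mesos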

Sorry that I didn't frame my original question properly.

Some months back, when I was running an older Mesos version (probably
0.23.0), whenever I added or edited any attribute I had to clean up the
working directory before the slave would start again. I had not enabled
cgroup isolation. Sorry, I don't have logs from that time.
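
To make it concrete, the kind of change I mean is editing the
--attributes flag between restarts (the attribute values here are
hypothetical, just to illustrate):

    # first run
    mesos-slave --work_dir=/tmp/mesos --attributes='rack:r1'

    # after editing the attribute, the restarted slave sees a different
    # SlaveInfo and, as far as I understand, fails recovery of the old
    # checkpointed state
    mesos-slave --work_dir=/tmp/mesos --attributes='rack:r1;ssd:true'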

Currently I am running 0.28.0, and when I did the same activity I could
see the older tasks still running while Mesos spawned a fresh set of
tasks, so there are two copies of the tasks running now. I have not
enabled cgroup isolation. Is there any way I can recover the older tasks
by modifying some defaults?

Looking at the slave configuration options, I can see that there are two
relevant flags, --strict and --recover, but their defaults look correct
to me.
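
This is how I read those flags and their defaults from the 0.28.0 --help
output (please correct me if I have this wrong):

    # --recover=reconnect : reconnect with any old live executors (default)
    # --recover=cleanup   : kill any old live executors and exit
    # --strict=true       : treat recovery errors as fatal (default)
    mesos-slave --work_dir=/var/lib/mesos \
                --recover=reconnect \
                --strict=true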

On Fri, Apr 1, 2016 at 2:40 AM, Benjamin Mahler <[email protected]> wrote:

> I'd recommend not using /tmp to store the meta-information because if
> there is a tmpwatch it will remove things that we need for agent recovery.
> We probably should change the default --work_dir, or require that the user
> specify one.
>
> It's expected that wiping the work directory will cause the newly started
> agent to destroy any orphaned tasks, if cgroup isolation is enabled. Are
> you using cgroup isolation? Can you include logs?
>
> On Fri, Mar 25, 2016 at 6:17 AM, Pradeep Chhetri <
> [email protected]> wrote:
>
>>
>> Hello,
>>
>> I remember that when I was running some older Mesos version (maybe 0.23.0),
>> whenever a slave restart failed, either due to adding a new attribute or
>> announcing different resources than the default, I used to clean up
>> /tmp/mesos (the Mesos working dir), and this used to bring down the
>> existing executors/tasks.
>>
>> Yesterday, I noticed that cleaning up /tmp/mesos and restarting the
>> slaves (which registered with a different slave id) did not bring down
>> the existing executors/tasks. I am running 0.28.0.
>>
>> I would like to know what has improved in the slave recovery process,
>> because I was assuming that I had deleted all the information related to
>> checkpointing by cleaning up /tmp/mesos.
>>
>> --
>> Regards,
>> Pradeep Chhetri
>>
>
>


-- 
Regards,
Pradeep Chhetri
