Re: Question on slave recovery

Benjamin Mahler Thu, 31 Mar 2016 18:41:26 -0700

I'd recommend not using /tmp to store the agent's meta-information: if
tmpwatch (or a similar cleaner) runs on the host, it will remove files that we
need for agent recovery. We should probably change the default --work_dir, or
require that the user specify one.


It's expected that wiping the work directory will cause the newly started
agent to destroy any orphaned tasks, if cgroup isolation is enabled. Are
you using cgroup isolation? Can you include logs?
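A minimal sketch of how the two points above could be applied when launching an agent. It assumes a hypothetical wrapper script that builds the `mesos-slave` command line, rejects a `--work_dir` under /tmp or /var/tmp (the list of "volatile" roots is an assumption), and enables cgroup isolation via `--isolation=cgroups/cpu,cgroups/mem`. The helper names and the ZooKeeper URL are illustrative only.

```python
from pathlib import PurePosixPath
from typing import List

# Assumed set of directories that tmp cleaners such as tmpwatch may prune.
VOLATILE_ROOTS = (PurePosixPath("/tmp"), PurePosixPath("/var/tmp"))


def is_volatile(work_dir: str) -> bool:
    """Return True if work_dir is, or sits under, one of VOLATILE_ROOTS."""
    path = PurePosixPath(work_dir)
    return any(path == root or root in path.parents for root in VOLATILE_ROOTS)


def slave_command(master: str, work_dir: str, use_cgroups: bool = True) -> List[str]:
    """Build a mesos-slave argv (hypothetical wrapper), refusing a volatile work_dir."""
    if not PurePosixPath(work_dir).is_absolute():
        raise ValueError(f"work_dir must be absolute: {work_dir!r}")
    if is_volatile(work_dir):
        # Checkpointed recovery data would be exposed to tmp cleaners here.
        raise ValueError(f"refusing volatile work_dir {work_dir!r}")
    argv = ["mesos-slave", f"--master={master}", f"--work_dir={work_dir}"]
    if use_cgroups:
        # With cgroup isolation, a restarted agent can find and destroy
        # orphaned containers after its work directory has been wiped.
        argv.append("--isolation=cgroups/cpu,cgroups/mem")
    return argv


if __name__ == "__main__":
    print(" ".join(slave_command("zk://zk1:2181/mesos", "/var/lib/mesos")))
```

The check uses `PurePosixPath.parents` rather than a string prefix test so that a path such as `/tmpdata` is not mistaken for a directory under `/tmp`.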

On Fri, Mar 25, 2016 at 6:17 AM, Pradeep Chhetri <
[email protected]> wrote:

>
> Hello,
>
> I remember that when I was running an older Mesos version (maybe 0.23.0),
> whenever a slave restart failed, either because I had added a new attribute
> or because the slave announced different resources than the default, I used
> to clean up /tmp/mesos (the Mesos working dir), and this used to bring down
> the existing executors/tasks.
>
> Yesterday, I noticed that cleaning up /tmp/mesos and starting the slaves
> (which registered with a different slave ID) didn't bring down the existing
> executors/tasks. I am running 0.28.0.
>
> I would like to know what has improved in the slave recovery process,
> because I assumed that cleaning up /tmp/mesos deleted all the information
> related to checkpointing.
>
> --
> Regards,
> Pradeep Chhetri
>
