Hello Benjamin,

Thank you for the reply. I also think that changing the default working directory to something else, maybe /var/lib/mesos, would be better. In many Linux distributions /tmp is mounted as tmpfs, which is a volatile filesystem, so anything kept there can disappear across reboots.
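In the meantime the agent can simply be pointed at a persistent directory explicitly. Something along these lines is what I have in mind; the master address below is just a placeholder for whatever the cluster actually uses:

    # Start the agent with its meta/checkpoint data on a persistent
    # filesystem instead of the default /tmp/mesos.
    mesos-slave --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
                --work_dir=/var/lib/mesos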
Sorry that I didn't frame my original question properly. A few months back, when I was running an older Mesos version (probably 0.23.0), whenever I added or edited an attribute I had to clean up the working directory. I had not enabled cgroups isolation. Sorry, I don't have logs from that time. Currently I am running 0.28.0, and when I did the same thing I can see the older tasks still running while Mesos spawns a fresh set of tasks, so there are now two copies of the tasks running. I still haven't enabled cgroups isolation. Is there any way I can recover the older tasks by changing some of the defaults? Looking at the slave configuration options, I see two relevant flags, --strict and --recover, but their defaults look fine.

On Fri, Apr 1, 2016 at 2:40 AM, Benjamin Mahler <[email protected]> wrote:

> I'd recommend not using /tmp to store the meta-information because if
> there is a tmpwatch it will remove things that we need for agent recovery.
> We probably should change the default --work_dir, or require that the user
> specify one.
>
> It's expected that wiping the work directory will cause the newly started
> agent to destroy any orphaned tasks, if cgroup isolation is enabled. Are
> you using cgroup isolation? Can you include logs?
>
> On Fri, Mar 25, 2016 at 6:17 AM, Pradeep Chhetri <
> [email protected]> wrote:
>
>> Hello,
>>
>> I remember that when I was running some older Mesos version (maybe 0.23.0),
>> whenever a slave restart failed, either because I added a new attribute or
>> announced different resources than the default, I used to clean up
>> /tmp/mesos (the Mesos working dir), and this would bring down the existing
>> executors/tasks.
>>
>> Yesterday, I noticed that even after cleaning up /tmp/mesos, restarting the
>> slave (which registered with a different slave id) didn't bring down the
>> existing executors/tasks. I am running 0.28.0.
>>
>> I would like to know what has improved in the slave recovery process,
>> because I was assuming that I had deleted all the checkpoint information
>> by cleaning up /tmp/mesos.
>>
>> --
>> Regards,
>> Pradeep Chhetri
>>
>
>

--
Regards,
Pradeep Chhetri
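P.S. For reference, this is roughly how the agent is being started here at the moment; the master address is a placeholder, and the --recover/--strict values shown are just my understanding of the documented defaults spelled out explicitly:

    # Current agent invocation, with the recovery-related flags made explicit.
    # --recover=reconnect  -> reconnect with any live executors (the default)
    # --strict=true        -> treat recovery errors as fatal (the default)
    mesos-slave --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
                --work_dir=/tmp/mesos \
                --recover=reconnect \
                --strict=true
    # Note: cgroups isolation is not enabled, i.e. there is no
    # --isolation=cgroups/cpu,cgroups/mem on the command line.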

