Are there any special files (mounts, etc.) in your slave directory? The logic Mesos uses to delete a directory <https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp#L383> likely differs from what the shell utility 'rm' does, so one can fail where the other succeeds.
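
If it helps, here is a rough way to check for such entries. This is only a sketch: the function name `find_special_entries` and its labels are mine, not anything from Mesos, and it does not reproduce what Mesos does when deleting. It walks a directory without following symlinks and reports mount points, sockets, FIFOs and device nodes, which are the kinds of entries a recursive delete might treat differently from `rm`.

```python
import os
import stat


def find_special_entries(root):
    """Return (path, kind) pairs for mount points and special files under root.

    Hypothetical helper for illustration; regular files, plain
    directories and symlinks are not reported.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are inspected, not their targets.
                mode = os.lstat(path).st_mode
            except OSError as e:
                found.append((path, "unreadable: %s" % e.strerror))
                continue
            if stat.S_ISDIR(mode):
                if os.path.ismount(path):
                    found.append((path, "mount point"))
            elif stat.S_ISSOCK(mode):
                found.append((path, "socket"))
            elif stat.S_ISFIFO(mode):
                found.append((path, "fifo"))
            elif stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
                found.append((path, "device"))
    return found
```

Running it against one of the leftover directories, for example `find_special_entries('/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S72')`, should return an empty list if there is nothing unusual in there.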

On Wed, Jul 8, 2015 at 11:09 AM, Tom Arnfeld <[email protected]> wrote:

> In this instance there were three old slave directories, and there are
> three log lines in the mesos-slave.INFO file:
>
> I0708 11:24:52.023453 2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S46
> I0708 11:24:52.023923 2425 slave.cpp:3499] Garbage collecting old slave 20150217-184553-67375276-5050-18563-S74
> I0708 11:24:52.023921 2428 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S46' for gc 6.99999972599407days in the future
> I0708 11:24:52.054704 2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S22
> I0708 11:24:52.054723 2424 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S74' for gc 6.99999937182815days in the future
> I0708 11:24:52.067934 2425 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S22' for gc 6.99999922252444days in the future
>
> This happens right after the recovery process finishes after the slave
> boots up. I've looked at another slave that's currently at 99% disk
> capacity and the slave has been up since 27th May 2015, it also has the
> "Garbage collecting old slave" log lines just after boot for ~6 days.
> Looking a little deeper into this slave's logs, this looks like an
> interesting error:
>
> W0527 17:35:08.935755 1749 gc.cpp:139] Failed to delete '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S72': Directory not empty
>
> I think I actually discussed this with BenH a while back, we're running
> 0.21.0 on this cluster.
>
> Anyone else seen this before? Using the standard `rm` Unix tool clears out
> the directories fine currently, running as the same user as the slave
> (root).
>
> --
>
> Tom Arnfeld
> Senior Developer // DueDil
>
>
> On Wed, Jul 8, 2015 at 7:00 PM, Vinod Kone <[email protected]> wrote:
>
>> On Wed, Jul 8, 2015 at 10:54 AM, Tom Arnfeld <[email protected]> wrote:
>>
>>> When this happens the old slave directories appear not to be tracked by
>>> the Mesos GC process, and stay around indefinitely. Over time, if enough
>>> full slave restarts happen (say, due to reconfiguration), the disks can be
>>> completely filled and the Mesos slave won't do anything about it.
>>
>> This shouldn't happen. Old slave directories should be gc'ed by the slave
>> based on their last modification time
>> <https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L4059>.
>> Do you see any log lines with "Garbage collecting old slave"?
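
To make the "based on their last modification time" point concrete, here is a small sketch of how such a delay could be computed. It assumes the delay is a fixed GC delay minus the time since the directory was last modified, and it assumes a one-week delay, which is what the "6.99999972599407days in the future" log lines above are consistent with. The function name, the constant and the clamp at zero are illustrative choices, not taken from slave.cpp.

```python
import os
import time

# Assumed one-week GC delay; the real value depends on slave configuration.
GC_DELAY_SECS = 7 * 24 * 60 * 60


def remaining_gc_delay(path, gc_delay_secs=GC_DELAY_SECS, now=None):
    """Seconds until `path` would be due for removal under the assumed rule.

    Hypothetical helper: delay = gc_delay - (now - mtime), never below zero.
    """
    if now is None:
        now = time.time()
    age = now - os.stat(path).st_mtime
    return max(0.0, gc_delay_secs - age)
```

Under this rule, a directory whose modification time was refreshed moments ago would be scheduled almost a full week out, which matches the shape of the log lines in the thread.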

