Are there any special files (mounts, etc.) in your slave directory? The logic
<https://github.com/apache/mesos/blob/master/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp#L383>
that Mesos uses to delete a directory is likely different from that of the
shell utility 'rm'.
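To test that hypothesis, here is a minimal Python sketch (not Mesos code). `find_special_entries` lists anything under a directory that is not a regular file, directory, or symlink (FIFOs, sockets, device nodes), plus any mount points. `rmtree_regular_only` is a hypothetical bottom-up delete that, as an unverified assumption about how a delete like the linked one might behave, only unlinks regular files and symlinks. A single leftover FIFO is then enough to make rmdir() on its parent fail with "Directory not empty", whereas `rm -rf` (or Python's `shutil.rmtree`) would remove it.

```python
import os
import stat


def find_special_entries(root):
    """List paths under `root` that are neither regular files, directories,
    nor symlinks (e.g. FIFOs, sockets, device nodes), plus any mount points."""
    special = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISDIR(mode):
                if os.path.ismount(path):  # another filesystem is mounted here
                    special.append(path)
            elif not (stat.S_ISREG(mode) or stat.S_ISLNK(mode)):
                special.append(path)
    return special


def rmtree_regular_only(root):
    """HYPOTHETICAL bottom-up delete that unlinks only regular files and
    symlinks before calling rmdir() on each directory. Any other entry type
    is skipped, so rmdir() on its parent raises OSError (ENOTEMPTY)."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode) or stat.S_ISLNK(mode):
                os.unlink(path)
            # FIFOs, sockets and device nodes are silently left behind.
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):  # symlink to a directory: remove the link only
                os.unlink(path)
        os.rmdir(dirpath)  # fails if anything was left behind above
```

On an affected slave, something like `find_special_entries('/mnt/mesos/mesos-slave/slaves')` would show whether such entries are present. Whether the linked Mesos code actually skips these entry types is an assumption to check against its source.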

On Wed, Jul 8, 2015 at 11:09 AM, Tom Arnfeld <[email protected]> wrote:

> In this instance there were three old slave directories, and there are
> three log lines in the mesos-slave.INFO file;
>
> I0708 11:24:52.023453  2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S46
> I0708 11:24:52.023923  2425 slave.cpp:3499] Garbage collecting old slave 20150217-184553-67375276-5050-18563-S74
> I0708 11:24:52.023921  2428 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S46' for gc 6.99999972599407days in the future
> I0708 11:24:52.054704  2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S22
> I0708 11:24:52.054723  2424 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S74' for gc 6.99999937182815days in the future
> I0708 11:24:52.067934  2425 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S22' for gc 6.99999922252444days in the future
>
> This happens right after the recovery process finishes, when the slave
> boots up. I've looked at another slave that's currently at 99% disk
> capacity and has been up since 27th May 2015; it also has the
> "Garbage collecting old slave" log lines just after boot for ~6 days.
> Looking a little deeper into this slave's logs, this looks like an
> interesting error:
>
> W0527 17:35:08.935755  1749 gc.cpp:139] Failed to delete '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S72': Directory not empty
>
> I think I actually discussed this with BenH a while back; we're running
> 0.21.0 on this cluster.
>
> Has anyone else seen this before? Currently, the standard Unix `rm` tool
> clears out the directories fine when run as the same user as the slave
> (root).
>
> --
>
> Tom Arnfeld
> Senior Developer // DueDil
>
>
> On Wed, Jul 8, 2015 at 7:00 PM, Vinod Kone <[email protected]> wrote:
>
>>
>> On Wed, Jul 8, 2015 at 10:54 AM, Tom Arnfeld <[email protected]> wrote:
>>
>>> When this happens, the old slave directories appear not to be tracked by
>>> the Mesos GC process and stay around indefinitely. Over time, if enough
>>> full slave restarts happen (say, due to reconfiguration), the disks can be
>>> completely filled and the Mesos slave won't do anything about it.
>>>
>>
>> This shouldn't happen. Old slave directories should be GC'ed by the slave
>> based on their last modification time
>> <https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L4059>.
>> Do you see any log lines with "Garbage collecting old slave"?
>>
>>
>
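As a rough illustration of the mtime-based GC Vinod describes, here is a Python sketch that assumes the scheduled delay is `gc_delay` minus the directory's age (time since its last modification), with `gc_delay` set to one week (assumed here to be the `--gc_delay` default). Under that assumption, the delays in Tom's log (about 6.9999997 days) correspond to directories modified only a few tens of milliseconds before they were scheduled. The function names are hypothetical, not Mesos APIs.

```python
import os
from datetime import datetime, timedelta, timezone

# Assumption: one week, believed to be Mesos' default --gc_delay.
DEFAULT_GC_DELAY = timedelta(weeks=1)


def gc_delay_remaining(mtime, now, gc_delay=DEFAULT_GC_DELAY):
    """Time left before a path is due for GC: gc_delay minus its age.
    A negative result means the path is already overdue."""
    return gc_delay - (now - mtime)


def gc_delay_for_path(path, now=None, gc_delay=DEFAULT_GC_DELAY):
    """Apply gc_delay_remaining to a path's last modification time."""
    mtime = datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return gc_delay_remaining(mtime, now, gc_delay)


def in_days(delta):
    """Express a timedelta in fractional days, as the gc.cpp log line does."""
    return delta.total_seconds() / 86400
```

For example, a directory modified 24 ms before scheduling gives `in_days(...)` of roughly 6.9999997, while one last modified ten days earlier yields a negative (overdue) delay.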
