Re: Cleaning out old mesos-slave sandbox directories

Tom Arnfeld Wed, 08 Jul 2015 11:10:42 -0700

In this instance there were three old slave directories, and there are three
matching log lines in the mesos-slave.INFO file:

I0708 11:24:52.023453  2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S46

I0708 11:24:52.023923  2425 slave.cpp:3499] Garbage collecting old slave 20150217-184553-67375276-5050-18563-S74

I0708 11:24:52.023921  2428 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S46' for gc 6.99999972599407days in the future

I0708 11:24:52.054704  2425 slave.cpp:3499] Garbage collecting old slave 20150515-105200-84152492-5050-9915-S22

I0708 11:24:52.054723  2424 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S74' for gc 6.99999937182815days in the future

I0708 11:24:52.067934  2425 gc.cpp:56] Scheduling '/mnt/mesos/mesos-slave/slaves/20150515-105200-84152492-5050-9915-S22' for gc 6.99999922252444days in the future




This happens right after the recovery process finishes, once the slave has
booted up. I've looked at another slave that's currently at 99% disk capacity
and has been up since 27th May 2015. It also has the "Garbage collecting old
slave" log lines just after boot for ~6 days. Looking a little deeper into
this slave's logs, this looks like an interesting error:





W0527 17:35:08.935755  1749 gc.cpp:139] Failed to delete '/mnt/mesos/mesos-slave/slaves/20150217-184553-67375276-5050-18563-S72': Directory not empty
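
To check other slaves for the same pattern, the three kinds of log line quoted above can be cross-referenced. This is only a rough sketch: it assumes each log entry sits on one physical line in mesos-slave.INFO (not wrapped as in this email), and that the message wording matches what is pasted here. The function name `summarize_gc` is made up for illustration.

```python
import re

# Patterns mirror the message text quoted in this thread; the wording is
# assumed to match the Mesos version in use (0.21.0 here).
OLD_SLAVE = re.compile(r"Garbage collecting old slave (\S+)")
SCHEDULED = re.compile(r"Scheduling '([^']+)' for gc (\S+) in the future")
FAILED = re.compile(r"Failed to delete '([^']+)': (.+)")


def summarize_gc(lines):
    """Collect GC-related entries from an iterable of log lines.

    Returns (old_slave_ids, scheduled, failed) where `scheduled` maps a
    path to the delay string as logged and `failed` maps a path to the
    error text that followed it.
    """
    old_slaves, scheduled, failed = [], {}, {}
    for line in lines:
        match = OLD_SLAVE.search(line)
        if match:
            old_slaves.append(match.group(1))
            continue
        match = SCHEDULED.search(line)
        if match:
            scheduled[match.group(1)] = match.group(2)
            continue
        match = FAILED.search(line)
        if match:
            failed[match.group(1)] = match.group(2).strip()
    return old_slaves, scheduled, failed
```

Passing an open log file to `summarize_gc` and comparing the keys of `failed` against the directories still on disk should show which old slave directories were scheduled but never removed.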




I think I actually discussed this with BenH a while back. We're running 0.21.0
on this cluster.




Has anyone else seen this before? Using the standard `rm` unix tool currently
clears out the directories fine, running as the same user as the slave (root).
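
Before removing anything by hand, it may help to list which slave directories are left over and how old they are. The sketch below is only an illustration: `old_slave_dirs` and its `current_slave_id` parameter are hypothetical names, and using the directory's own mtime as "last modification time" is an assumption based on the quoted reply further down, not a statement of what the slave's GC actually checks.

```python
import os
import time


def old_slave_dirs(slaves_root, current_slave_id, now=None):
    """Yield (path, age_in_days) for each slave directory except the current one.

    Symlinks and plain files under `slaves_root` are skipped. Age is taken
    from the directory's own mtime, which is an assumption (see above).
    """
    now = time.time() if now is None else now
    for entry in sorted(os.listdir(slaves_root)):
        path = os.path.join(slaves_root, entry)
        if entry == current_slave_id:
            continue
        if os.path.islink(path) or not os.path.isdir(path):
            continue
        age_days = (now - os.path.getmtime(path)) / 86400.0
        yield path, age_days
```

On the cluster described here the root would be something like '/mnt/mesos/mesos-slave/slaves'. The function only reports directories and leaves deletion as a manual decision.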






--


Tom Arnfeld

Senior Developer // DueDil

On Wed, Jul 8, 2015 at 7:00 PM, Vinod Kone <vinodk...@gmail.com> wrote:

> On Wed, Jul 8, 2015 at 10:54 AM, Tom Arnfeld <t...@duedil.com> wrote:
>> When this happens the old slave directories appear not to be tracked by
>> the mesos GC process, and stay around indefinitely. Over time if enough
>> full slave restarts happen (say, due to reconfiguration) the disks can be
>> completely filled and the mesos slave won't do anything about it.
>>
> This shouldn't happen. Old slave directories should be gc'ed by the slave
> based on their last modification time
> <https://github.com/apache/mesos/blob/master/src/slave/slave.cpp#L4059>. Do
> you see any log lines with  "Garbage collecting old slave" ?
