Everything is scheduled for garbage collection one delay period (1 week by
default) after it was last modified, but as the disk fills up we'll start
pruning the older directories ahead of schedule.

This means that things should be removed in the same order that they were
scheduled.

You can think of it this way: everything gets scheduled for 1 week in
the future, but we'll "speed up" the existing schedule when we need to make
room. Make sense?
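
In rough pseudocode (just a Python sketch of the idea, not the actual slave
implementation; the pruning formula and names like `schedule_for_gc` and
`prune` are made up for illustration):

```python
import heapq
import time

GC_DELAY = 7 * 24 * 3600  # default --gc_delay: one week, in seconds

# Min-heap of (scheduled_removal_time, path); since GC_DELAY is constant,
# ordering by removal time is the same as ordering by last-modified time.
gc_schedule = []

def schedule_for_gc(path, last_modified):
    # Schedule a directory for removal one GC_DELAY after its last modification.
    heapq.heappush(gc_schedule, (last_modified + GC_DELAY, path))

def prune(disk_usage_fraction, now=None):
    # Remove anything past its deadline; when the disk is filling up, shrink
    # the allowed age so the oldest entries go ahead of schedule.
    now = now if now is not None else time.time()
    # Illustrative policy only: the fuller the disk, the less age we tolerate.
    max_allowed_age = GC_DELAY * max(0.0, 1.0 - disk_usage_fraction)
    while gc_schedule:
        deadline, path = gc_schedule[0]
        age = now - (deadline - GC_DELAY)  # time since last modification
        if deadline <= now or age > max_allowed_age:
            heapq.heappop(gc_schedule)
            print("removing", path)  # stand-in for actually deleting the directory
        else:
            break  # heap order == schedule order, so nothing later is due either
```

Since the heap is ordered by last-modified time, shrinking the allowed age
only ever removes the oldest entries first, which is why removal order
matches scheduling order.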


On Thu, Jul 31, 2014 at 4:18 PM, Tom Arnfeld <[email protected]> wrote:

> Yeah, specifically the docker issue was related to volumes not being
> removed with `docker rm`, but that's a separate issue.
>
> So right now Mesos won't remove older work directories to make room for
> new ones (old ones that have already been scheduled for removal in a few
> days' time)? This means when the disk gets quite full, newer work
> directories will be removed much faster than older ones. Is that correct?
>
>
>
> On 31 July 2014 23:56, Benjamin Mahler <[email protected]> wrote:
>
>> Apologies for the lack of documentation. In the default setup, the slave
>> will schedule the work directories for garbage collection when:
>>
>> (1) Executors terminate.
>> (2) The slave recovers and discovers work directories for terminal
>> executors.
>>
>> Sounds like the docker integration code you're using has a bug in this
>> respect, by not scheduling docker directories for garbage collection
>> during (1) and/or (2).
>>
>>
>> On Thu, Jul 31, 2014 at 3:40 PM, Tom Arnfeld <[email protected]> wrote:
>>
>>> I don't have them to hand now, but I recall it saying something in the
>>> high 90s and 0ns for the max allowed age. I actually found the root cause
>>> of the problem, docker related and out of Mesos's control... though I'm
>>> still curious about the expected behaviour of the GC process. It doesn't
>>> seem to be well documented anywhere.
>>>
>>> Tom.
>>>
>>>
>>> On 31 July 2014 23:33, Benjamin Mahler <[email protected]>
>>> wrote:
>>>
>>>> What do the slave logs say?
>>>>
>>>> E.g.
>>>>
>>>> I0731 22:22:17.851347 23525 slave.cpp:2879] Current usage 7.84%. Max
>>>> allowed age: 5.751197441470081days
>>>>
>>>>
>>>> On Wed, Jul 30, 2014 at 8:55 AM, Tom Arnfeld <[email protected]> wrote:
>>>>
>>>>> I'm not sure if this is something already supported by Mesos, and if
>>>>> so it'd be great if someone could point me in the right direction.
>>>>>
>>>>> Is there a way of asking a slave to garbage collect old executors
>>>>> manually?
>>>>>
>>>>> Maybe I'm misunderstanding things, but as each executor does (insert
>>>>> knowledge gap), Mesos works out how long it is able to keep the sandbox
>>>>> for and schedules it for garbage collection appropriately, also taking
>>>>> into account the command line flags.
>>>>>
>>>>> The disk on one of my slaves is getting quite full (98%) and I'm
>>>>> curious how Mesos is going to behave in this situation. Should it start
>>>>> clearing things up, given that a task could launch that needs a certain
>>>>> amount of disk space, but that disk is being eaten up by old executor
>>>>> sandboxes?
>>>>>
>>>>> It may be worth noting I'm not specifying --gc_delay on any slave
>>>>> right now; perhaps I should be?
>>>>>
>>>>> Any input would be much appreciated.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Tom.
>>>>>
>>>>
>>>>
>>>
>>
>
