>
> We have been running a bunch of long-running services and a few cron jobs
> on our 6 node (c1.xlarge) cluster. We added a large number of additional
> cron jobs on the 20th, which maxed out our available resources. I added 3
> more slaves and things seemed to be happy. Since the original 6 slaves were
> mostly allocated to the long-running services, the 3 new slaves ended up
> handling most of the cron tasks. Not sure what your definition of a very
> high rate is, but the new jobs were starting 20 new tasks/min max.
>
Does each task get its own executor? Also, are these cron jobs long
lived? What is their lifetime? In other words, in steady state, how many
cron jobs (or more precisely Mesos executors) are alive? Note that GC only
affects terminated executors. Also, what is the typical size of an executor
sandbox?



> The disk space errors happened to the three new slaves all within a few
> hours of each other early the next morning. Grepping the slave logs as you
> suggested showed that during the last 24 hours, each slave's disk usage
> steadily increased 1% every ~10 minutes until it hit ~76% disk usage. The
> disk space error occurred while starting a new task 10-20 seconds after
> the last "Current usage" report.
>
> The growing disk usage makes sense because most of the cron tasks had a
> large slug, and we were being lazy about cleaning up (under the assumption
> that the GC would do it for us), but all three slaves erroring out at 76%
> disk usage (on a 1.7TB mount) seems a little suspect.
>
>
Based on the rate you mentioned (1% every ~10 minutes), it is indeed
surprising that you get disk space full errors at 76%. The slave calls
fs::usage(), which in turn calls statvfs() to calculate disk utilization
(see stout/fs.hpp). IIRC, statvfs() uses cached filesystem information to
calculate the disk usage, but at that rate of growth I would be surprised
if statvfs() reported 76% while the actual disk utilization was close to
100%.
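
For reference, here is a minimal sketch of that calculation done directly
against statvfs() (simplified, not the exact stout/fs.hpp code; the work
directory path below is just an example):

#include <sys/statvfs.h>
#include <cstdio>

int main(int argc, char** argv)
{
  // Example path; point this at the mount holding the slave work directory.
  const char* path = argc > 1 ? argv[1] : "/tmp/mesos";

  struct statvfs buf;
  if (statvfs(path, &buf) != 0) {
    perror("statvfs");
    return 1;
  }

  // Fraction of the filesystem in use. Note that f_bavail excludes blocks
  // reserved for root, so this can read a few percent higher than
  // (f_blocks - f_bfree) / f_blocks.
  double usage = 1.0 - static_cast<double>(buf.f_bavail) / buf.f_blocks;

  printf("Disk usage for %s: %.1f%%\n", path, usage * 100.0);
  return 0;
}

Comparing its output against df for the same mount would tell you whether
statvfs() is reporting something different from what df sees.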

Is there something else on the filesystem besides the slave work directory?


> Have you (or anyone else on the list) seen anything like this before? Any
> advice on what to do to diagnose this further?
>
> Thanks!
> Tom
>
>
> On Thu, Dec 26, 2013 at 2:26 PM, Vinod Kone <vinodk...@gmail.com> wrote:
>
>> Hi Thomas,
>>
>> The GC in mesos slave works as follows:
>>
>> --> Whenever an executor terminates, the slave schedules its sandbox
>> directory for gc "--gc_delay" into the future.
>>
>> --> However the slave also periodically ("--disk_watch_interval")
>> monitors the disk utilization and expedites the gc based on the usage.
>>
>> For example, if gc_delay is 1 week and the current disk utilization is 80%,
>> then instead of waiting for a week to gc a terminated executor's sandbox,
>> the slave gc'es it after 16.8 hours (= (1 - GC_DISK_HEADROOM - 0.8) * 7 days).
>> GC_DISK_HEADROOM is currently set to 0.1.
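>>
>> For what it's worth, that works out like the following simplified sketch
>> (just the arithmetic, not the actual slave code; the numbers are the ones
>> from the example above):
>>
>> #include <algorithm>
>> #include <cstdio>
>>
>> int main()
>> {
>>   const double GC_DISK_HEADROOM = 0.1;  // headroom the slave tries to keep
>>   const double gcDelayDays = 7.0;       // --gc_delay of 1 week, in days
>>   const double diskUsage = 0.8;         // current disk utilization (80%)
>>
>>   // Scale the configured delay by the remaining headroom, never below zero.
>>   double delayDays =
>>       std::max(0.0, 1.0 - GC_DISK_HEADROOM - diskUsage) * gcDelayDays;
>>
>>   printf("Expedited GC delay: %.1f hours\n", delayDays * 24.0);  // 16.8
>>   return 0;
>> }
>>
>> So as the utilization approaches 1 - GC_DISK_HEADROOM (i.e. 90%), the
>> delay effectively drops to zero.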
>>
>> However it might happen that executors are getting launched (and
>> sandboxes created) at a very high rate. In this case the slave might not
>> be able to react quickly enough to gc sandboxes.
>>
>> You could grep for "Current usage" in the slave log to see how the disk
>> utilization varies over time.
>>
>> HTH,
>>
>>
>> On Thu, Dec 26, 2013 at 10:56 AM, Thomas Petr <tp...@hubspot.com> wrote:
>>
>>> Hi,
>>>
>>> We're running Mesos 0.14.0-rc4 on CentOS from the Mesosphere repository.
>>> Last week we had an issue where the mesos-slave process died due to
>>> running out of disk space. [1]
>>>
>>> The mesos-slave usage docs mention the "[GC] delay may be shorter
>>> depending on the available disk usage." Does anyone have any insight into
>>> how the GC logic works? Is there a configurable threshold percentage or
>>> amount that will force it to clean up more often?
>>>
>>> If the mesos-slave process is going to die due to lack of disk space,
>>> would it make sense for it to attempt one last GC run before giving up?
>>>
>>> Thanks,
>>> Tom
>>>
>>>
>>> [1]
>>> Could not create logging file: No space left on device
>>> COULD NOT CREATE A LOGGINGFILE 20131221-120618.20562!F1221
>>> 12:06:18.978813 20567 paths.hpp:333] CHECK_SOME(mkdir): Failed to create
>>> executor directory
>>> '/usr/share/hubspot/mesos/slaves/201311111611-3792629514-5050-11268-18/frameworks/Singularity11/executors/singularity-ContactsHadoopDynamicListSegJobs-contacts-wal-dynamic-list-seg-refresher-1387627577839-1-littleslash-us_east_1e/runs/457a8df0-baa7-4d22-a5ac-ba5935ea6032'No
>>> space left on device
>>> *** Check failure stack trace: ***
>>> I1221 12:06:19.008946 20564 cgroups_isolator.cpp:1275] Successfully
>>> destroyed cgroup
>>> mesos/framework_Singularity11_executor_singularity-ContactsTasks-parallel-machines:6988:list-intersection-count:1387565552709-1387627447707-1-littleslash-us_east_1e_tag_fc028903-d303-468d-902a-dade8c22e206
>>>     @     0x7f2c806bcb5d  google::LogMessage::Fail()
>>>     @     0x7f2c806c0b77  google::LogMessage::SendToLog()
>>>     @     0x7f2c806be9f9  google::LogMessage::Flush()
>>>     @     0x7f2c806becfd  google::LogMessageFatal::~LogMessageFatal()
>>>     @           0x40f6cf  _CheckSome::~_CheckSome()
>>>     @     0x7f2c804492e3
>>>  mesos::internal::slave::paths::createExecutorDirectory()
>>>     @     0x7f2c80418a6d
>>>  mesos::internal::slave::Framework::launchExecutor()
>>>     @     0x7f2c80419dd3  mesos::internal::slave::Slave::_runTask()
>>>     @     0x7f2c8042d5d1  std::tr1::_Function_handler<>::_M_invoke()
>>>     @     0x7f2c805d3ae8  process::ProcessManager::resume()
>>>     @     0x7f2c805d3e8c  process::schedule()
>>>     @     0x7f2c7fe41851  start_thread
>>>     @     0x7f2c7e78794d  clone
>>>
>>
>>
>
