Hey Lin. Mind filing a ticket for this issue? This is definitely a bug we would like to get fixed.
@vinodkone

On Tue, Jan 21, 2014 at 2:00 PM, Benjamin Mahler <[email protected]> wrote:

TLDR: Specify resources in your *executor*, rather than only in your *task*.

No OOM is occurring in the logs. The "triggered" log line is misleading; you can see that the notification was merely discarded:

I0121 19:44:07.180585  8577 cgroups_isolator.cpp:1183] OOM notifier is triggered for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
I0121 19:44:07.181037  8577 cgroups_isolator.cpp:1188] Discarded OOM notifier for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed

This looks like a bug in Mesos. What's happening is that you're launching an executor with no resources. Consequently, before we fork, we attempt to update the memory control, but we don't call the memory handler since the executor has no memory resources:

I0121 19:39:01.660071  8566 cgroups_isolator.cpp:516] Launching default (/home/lin/test-executor) in /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed with resources for framework 201401171812-2907575306-5050-19011-0020 in cgroup mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
I0121 19:39:01.663082  8566 cgroups_isolator.cpp:709] Changing cgroup controls for executor default of framework 201401171812-2907575306-5050-19011-0020 with resources
I0121 19:39:01.667129  8566 cgroups_isolator.cpp:1163] Started listening for OOM events for executor default of framework 201401171812-2907575306-5050-19011-0020
I0121 19:39:01.681857  8566 cgroups_isolator.cpp:568] Forked executor at = 27609

Then, later, when we are updating the resources for your 128MB task, we set the soft limit, but we don't set
the hard limit, because the following buggy check is not satisfied:

    // Determine whether to set the hard limit. If this is the first
    // time (info->pid.isNone()), or we're raising the existing limit,
    // then we can update the hard limit safely. Otherwise, if we need
    // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
    // much memory is in use. As a result, we only update the soft
    // limit when the memory reservation is being reduced. This is
    // probably okay if the machine has available resources.
    // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
    // discrepancy between usage and soft limit and introduces a
    // "manual oom" if necessary.
    if (info->pid.isNone() || limit > currentLimit.get()) {

The assumption here was that there would always be an initial call with info->pid.isNone(); however, since your executor has no resources, we did not update the control before forking the executor, and limit was left as the inherited value. I've cc'ed Ian Downes on this since he's re-working the Isolator; I'll leave it to him to determine whether this is a bug that should be filed or not.

On Tue, Jan 21, 2014 at 12:51 PM, Lin Zhao <[email protected]> wrote:

Vinod,

Correction to my message: while my job is sleeping, the values below are 500+ MB as expected. I was looking at the kmem values. The OOM notifier is triggered much later, when the executor is killed. Would appreciate it if you have an idea where to look.

cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes

On Tue, Jan 21, 2014 at 2:54 PM, Lin Zhao <[email protected]> wrote:

Interesting. Looking at the log, it seems that the OOM is fired when the executor is shut down (19:44:07.180585), which is 300 seconds after the job launch and memory use.
Within the 300 seconds, usage_in_bytes and max_usage_in_bytes are 0.

Attaching the log. Any idea of the slow OOM? As you can see at https://gist.github.com/lin-zhao/8544495#file-testexecutor-java-L80, 512M mem is used before the sleep.

On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone <[email protected]> wrote:

The way you set task resources looks correct.

Can you paste what the slave logs say regarding the task/executor, esp. the lines that are from the cgroups isolator? Also, what is the command line of the slave?

@vinodkone

On Tue, Jan 21, 2014 at 11:18 AM, Lin Zhao <[email protected]> wrote:

[lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.limit_in_bytes
9223372036854775807

[lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.usage_in_bytes
584146944

[lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.max_usage_in_bytes
585809920

Hmm, the limit is weird. Can you find anything wrong about the way my mem is defined?

    .addResources(Resource.newBuilder()
        .setName("mem")
        .setType(Value.Type.SCALAR)
        .setScalar(Value.Scalar.newBuilder().setValue(128)))

On Tue, Jan 21, 2014 at 2:02 PM, Vinod Kone <[email protected]> wrote:

Mesos uses cgroups (https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt) to limit cpu and memory.

It is indeed surprising that your executor is not OOMing when using more memory than requested.
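(An aside on the limit Lin pasted above: 9223372036854775807 is not a corrupted value but the memory cgroup's "no limit" sentinel, i.e. the largest signed 64-bit integer, which is exactly what the executor inherits when no hard limit is ever written. A trivial check, not from the thread:)

```java
public class LimitSentinel {
    public static void main(String[] args) {
        // The value seen in memory.limit_in_bytes is exactly 2^63 - 1,
        // the default the kernel reports when no hard limit has been set.
        long pasted = 9223372036854775807L;
        System.out.println(pasted == Long.MAX_VALUE); // true
        System.out.println(Long.MAX_VALUE);           // 9223372036854775807
    }
}
```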
Can you tell us what the following values look like in the executor's cgroup? These are the values the kernel uses to decide whether the cgroup is hitting its limit.

cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.limit_in_bytes
cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes

@vinodkone

On Tue, Jan 21, 2014 at 9:58 AM, Lin Zhao <[email protected]> wrote:

Hi,

I'm new to Mesos and have some questions about resource management. I want to understand how Mesos limits the resources used by each executor, given the resources defined in TaskInfo. I did some tests and have seen different behavior for different types of resources. It appears that Mesos caps CPU usage for the executors, but doesn't limit the memory accessible to each executor.

I created an example Java framework, which is largely taken from the Mesos example:

https://gist.github.com/lin-zhao/8544495

Basically:

1. The Scheduler launches tasks with *2* cpus and *128 mb* memory.
2. The executor launches java with *-Xms 1500m* and *-Xmx 1500m*.
3. The java executor creates a byte array that uses *512 MB* of memory.
4. The java executor starts 3 threads that loop forever, which potentially use *3* full cpus.

The framework was launched in a 3-slave Mesos (v0.14.2) cluster and finished without error.

CPU: on the slaves, the cpu usage for the TestExecutor process is capped at 199%, indicating that Mesos does cap CPU usage. When the executor is assigned 1 cpu instead of 2, the cpu usage is capped at 99%.
Memory: there is no error thrown. The executors used > 512 MB of memory and got away with it.

Can someone confirm this? I haven't tested the other resource types (ports, disk). Is the behavior documented somewhere?

--
Lin Zhao

https://wiki.groupondev.com/Message_Bus
3101 Park Blvd, Palo Alto, CA 94306

Temporarily based in NY
33 W 19th St.
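(Editor's note: Benjamin's diagnosis boils down to a small decision rule. The sketch below models it with invented names, as the real logic lives in cgroups_isolator.cpp: once the executor has been forked with the inherited "unlimited" value, a 128 MB task limit can never satisfy `limit > currentLimit`, so the hard limit is never written and no OOM can be induced.)

```java
import java.util.OptionalLong;

public class HardLimitModel {
    // Mirrors the check Benjamin quoted: set the hard limit only on the
    // first update (no pid yet) or when raising the existing limit.
    static boolean wouldSetHardLimit(OptionalLong pid, long newLimit, long currentLimit) {
        return !pid.isPresent() || newLimit > currentLimit;
    }

    public static void main(String[] args) {
        long unlimited = Long.MAX_VALUE;     // inherited memory.limit_in_bytes
        long taskMem = 128L * 1024 * 1024;   // the task's 128 MB

        // Executor declared WITH memory: the control is updated before the
        // fork (pid is none), so the hard limit gets set.
        System.out.println(wouldSetHardLimit(OptionalLong.empty(), taskMem, unlimited));   // true

        // Executor declared with NO resources: by the time the task's memory
        // arrives, the pid is known and 128 MB < Long.MAX_VALUE, so only the
        // soft limit is ever updated.
        System.out.println(wouldSetHardLimit(OptionalLong.of(27609), taskMem, unlimited)); // false
    }
}
```

This is why the TLDR advice works: declaring memory on the *executor* forces an update before the fork, while pid is still none, so the hard limit is written exactly once, safely.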

