TLDR: Specify resources in your *executor*, rather than only in your *task*.
No OOM is actually occurring in the logs. The "triggered" log line is
misleading; you can see that the notification was merely discarded:
I0121 19:44:07.180585 8577 cgroups_isolator.cpp:1183] OOM notifier is
triggered for executor default of framework
201401171812-2907575306-5050-19011-0020 with uuid
8bc2ab10-8988-4b22-afa2-3433bbedc3ed
I0121 19:44:07.181037 8577 cgroups_isolator.cpp:1188] Discarded OOM
notifier for executor default of framework
201401171812-2907575306-5050-19011-0020 with uuid
8bc2ab10-8988-4b22-afa2-3433bbedc3ed
This looks like a bug in Mesos. What's happening is that you're launching
an executor with no resources. Consequently, before we fork, we attempt to
update the memory control, but we don't call the memory handler since the
executor has no memory resources:
I0121 19:39:01.660071 8566 cgroups_isolator.cpp:516] Launching default
(/home/lin/test-executor) in
/tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed
with resources for framework 201401171812-2907575306-5050-19011-0020 in
cgroup
mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
I0121 19:39:01.663082 8566 cgroups_isolator.cpp:709] Changing cgroup
controls for executor default of framework
201401171812-2907575306-5050-19011-0020 with resources
I0121 19:39:01.667129 8566 cgroups_isolator.cpp:1163] Started listening
for OOM events for executor default of framework
201401171812-2907575306-5050-19011-0020
I0121 19:39:01.681857 8566 cgroups_isolator.cpp:568] Forked executor at 27609
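The skipped pre-fork update can be sketched in Python (a simplified model of the behavior just described, not actual Mesos code; the names are illustrative):

```python
# Illustrative model (names are mine, not Mesos source): before forking,
# the isolator walks the executor's declared resources and invokes a
# handler per resource kind. An executor that declares no "mem" resource
# never reaches the memory handler, so memory.limit_in_bytes keeps the
# value inherited from the parent cgroup (effectively unlimited).

INHERITED = 9223372036854775807  # the limit Lin observed

def prepare_cgroup(executor_resources):
    cgroup = {"memory.limit_in_bytes": INHERITED}
    # The memory handler only runs if the executor declared "mem":
    if "mem" in executor_resources:
        cgroup["memory.limit_in_bytes"] = executor_resources["mem"] * 1024 * 1024
    return cgroup

# No resources on the executor: the inherited (huge) limit survives.
assert prepare_cgroup({})["memory.limit_in_bytes"] == INHERITED
# With a declared 128MB mem resource, the limit would have been written.
assert prepare_cgroup({"mem": 128})["memory.limit_in_bytes"] == 128 * 1024 * 1024
```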
Then, later, when we are updating the resources for your 128MB task, we set
the soft limit, but we don't set the hard limit because the following buggy
check is not satisfied:
// Determine whether to set the hard limit. If this is the first
// time (info->pid.isNone()), or we're raising the existing limit,
// then we can update the hard limit safely. Otherwise, if we need
// to decrease 'memory.limit_in_bytes' we may induce an OOM if too
// much memory is in use. As a result, we only update the soft
// limit when the memory reservation is being reduced. This is
// probably okay if the machine has available resources.
// TODO(benh): Introduce a MemoryWatcherProcess which monitors the
// discrepancy between usage and soft limit and introduces a
// "manual oom" if necessary.
if (info->pid.isNone() || limit > currentLimit.get()) {
The assumption here was that there would always be an initial call with
info->pid.isNone(). However, since your executor has no resources, we did
not update the control before forking the executor, and the limit was left
at the inherited value. I've cc'ed Ian Downes on this since he's reworking
the Isolator; I'll leave it to him to determine whether this is a bug that
should be filed or not.
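Putting it together, here is a minimal Python simulation (a sketch of the logic described above, not the Mesos implementation; names are illustrative) of why the 128MB hard limit never lands, and why declaring resources on the executor avoids the problem:

```python
# Sketch of the buggy update path described above (not Mesos source).

INHERITED = 9223372036854775807  # hard limit inherited from the parent cgroup
MB = 1024 * 1024

class CgroupInfo:
    def __init__(self):
        self.pid = None            # executor not yet forked
        self.hard_limit = INHERITED
        self.soft_limit = INHERITED

def update_memory(info, limit_bytes):
    info.soft_limit = limit_bytes
    # Mirrors the check in cgroups_isolator.cpp: write the hard limit only
    # on the first update (pid is None) or when raising the existing limit.
    if info.pid is None or limit_bytes > info.hard_limit:
        info.hard_limit = limit_bytes

# Buggy path: executor declared no resources, so no update before the fork.
info = CgroupInfo()
info.pid = 27609                # forked with the inherited hard limit
update_memory(info, 128 * MB)   # the task's 128MB arrives too late
assert info.hard_limit == INHERITED  # never lowered, so no OOM enforcement

# Fix per the TLDR: give the executor memory resources, so the update
# happens before the fork, while pid is still None.
info = CgroupInfo()
update_memory(info, 128 * MB)   # initial call: pid is None, limit written
info.pid = 27609
assert info.hard_limit == 128 * MB
```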
On Tue, Jan 21, 2014 at 12:51 PM, Lin Zhao <[email protected]> wrote:
> Vinod,
>
> Correction to my message: when my job is sleeping, the values below are
> 500+ MB as expected. I was looking at the kmem values. The OOM notifier is
> triggered much later, when the executor is killed. Would appreciate it if
> you have an idea where to look.
>
> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>
>
> On Tue, Jan 21, 2014 at 2:54 PM, Lin Zhao <[email protected]> wrote:
>
>> Interesting. Looking at the log, it seems that OOM is fired when the
>> executor is shut down (19:44:07.180585), which is 300 seconds after the job
>> launch and memory use. Within the 300 seconds usage_in_bytes and
>> max_usage_in_bytes are 0.
>>
>> Attaching the log. Any idea of the slow OOM? As you can see at
>> https://gist.github.com/lin-zhao/8544495#file-testexecutor-java-L80,
>> 512M mem is used before the sleep.
>>
>>
>> On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone <[email protected]> wrote:
>>
>>> The way you set task resources looks correct.
>>>
>>> Can you paste what the slave logs say regarding the task/executor, esp.
>>> the lines that are from the cgroups isolator? Also, what is the command
>>> line of the slave?
>>>
>>>
>>> @vinodkone
>>>
>>>
>>> On Tue, Jan 21, 2014 at 11:18 AM, Lin Zhao <[email protected]> wrote:
>>>
>>>>
>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.limit_in_bytes
>>>> 9223372036854775807
>>>>
>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.usage_in_bytes
>>>> 584146944
>>>>
>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.max_usage_in_bytes
>>>> 585809920
>>>>
>>>> Hmm, the limit is weird. Can you find anything wrong with the way my
>>>> mem is defined?
>>>>
>>>>
>>>> .addResources(Resource.newBuilder()
>>>>     .setName("mem")
>>>>     .setType(Value.Type.SCALAR)
>>>>     .setScalar(Value.Scalar.newBuilder().setValue(128)))
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jan 21, 2014 at 2:02 PM, Vinod Kone <[email protected]> wrote:
>>>>
>>>>> Mesos uses cgroups
>>>>> (https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt)
>>>>> to limit cpu and memory.
>>>>>
>>>>> It is indeed surprising that your executor is not OOMing when using
>>>>> more memory than requested.
>>>>>
>>>>> Can you tell us what the following values look like in the executor's
>>>>> cgroup? These are the values the kernel uses to decide whether the cgroup
>>>>> is hitting its limit.
>>>>>
>>>>> cat
>>>>> /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.limit_in_bytes
>>>>>
>>>>> cat
>>>>> /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>>>
>>>>> cat
>>>>> /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>>
>>>>>
>>>>>
>>>>> @vinodkone
>>>>>
>>>>>
>>>>> On Tue, Jan 21, 2014 at 9:58 AM, Lin Zhao <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm new to Mesos and have some questions about resource management. I
>>>>>> want to understand how Mesos limits the resources used by each
>>>>>> executor, given the resources defined in TaskInfo. I did some tests and
>>>>>> have seen different behavior for different types of resources. It
>>>>>> appears that Mesos caps CPU usage for the executors, but doesn't limit
>>>>>> the memory accessible to each executor.
>>>>>>
>>>>>> I created an example java framework, which is largely taken from the
>>>>>> mesos example:
>>>>>>
>>>>>> https://gist.github.com/lin-zhao/8544495
>>>>>>
>>>>>> Basically,
>>>>>>
>>>>>> 1. the Scheduler launches tasks with *2* cpus, and *128 mb* memory.
>>>>>> 2. The executor launches java with *-Xms 1500m* and *-Xmx 1500m*.
>>>>>> 3. The java executor creates a byte array that uses *512 MB* memory.
>>>>>> 4. The java executor starts 3 threads that loop forever, which
>>>>>> potentially use *3* full cpus.
>>>>>>
>>>>>> The framework was launched in a 3-slave Mesos (v0.14.2) cluster and
>>>>>> finished without error.
>>>>>>
>>>>>> CPU: on the slaves, the cpu usage for the TestExecutor process is
>>>>>> capped at 199%, indicating that Mesos does cap CPU usage. When the
>>>>>> executor is assigned 1 cpu instead of 2, the cpu usage is capped at 99%.
>>>>>>
>>>>>> Memory: There is no error thrown. The executors used > 512 MB of memory
>>>>>> and got away with it.
>>>>>>
>>>>>> Can someone confirm this? I haven't tested the other resource types
>>>>>> (ports, disk). Is the behavior documented somewhere?
>>>>>>
>>>>>> --
>>>>>> Lin Zhao
>>>>>>
>>>>>> https://wiki.groupondev.com/Message_Bus
>>>>>> 3101 Park Blvd, Palo Alto, CA 94306
>>>>>>
>>>>>> Temporarily based in NY
>>>>>> 33 W 19th St.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
>
>