More resources question: how does Mesos control the "ports" and "disk" resources? I started a framework that claims port1 yet listens on port2, and it has no problem doing so. It also claims 10 units (MB, I assume) of disk, then writes 512 MB of data to the work directory, and that succeeds too. Is this expected? I can provide source/logs if requested.
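For anyone reproducing this, the cgroup memory values discussed further down the thread can be collected with a small helper. This is only a sketch: the function name is mine, the framework/executor ids in the example path are placeholders, and on many distros the memory hierarchy is mounted at /sys/fs/cgroup/memory rather than /cgroup.

```python
# Sketch: dump the kernel-enforced memory controls for an executor's cgroup.
# The path below is a placeholder; substitute the real framework/executor ids.
import os

def dump_memory_controls(cgroup):
    """Return 'memory.<file>: <value>' lines for the given cgroup directory."""
    lines = []
    for name in ("limit_in_bytes", "soft_limit_in_bytes",
                 "usage_in_bytes", "max_usage_in_bytes"):
        path = os.path.join(cgroup, "memory." + name)
        try:
            with open(path) as f:
                lines.append("memory.%s: %s" % (name, f.read().strip()))
        except OSError:
            lines.append("memory.%s: (not readable)" % name)
    return lines

if __name__ == "__main__":
    for line in dump_memory_controls(
            "/cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>"):
        print(line)
```

Each file is read directly from the cgroup filesystem, so the values reported are exactly what the kernel enforces, not what the slave thinks it set.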
On Thu, Jan 23, 2014 at 11:10 AM, Lin Zhao <[email protected]> wrote:

> Entered https://issues.apache.org/jira/browse/MESOS-941. Thanks everyone
> for the help!
>
> On Thu, Jan 23, 2014 at 2:03 AM, Vinod Kone <[email protected]> wrote:
>
>> Hey Lin. Mind filing a ticket for this issue? This is definitely a bug
>> we would like to get fixed.
>>
>> @vinodkone
>>
>> On Tue, Jan 21, 2014 at 2:00 PM, Benjamin Mahler
>> <[email protected]> wrote:
>>
>>> TL;DR: Specify resources in your *executor*, rather than only in your
>>> *task*.
>>>
>>> No OOM is occurring in the logs. The "triggered" log line is
>>> misleading; you can see that the notification was merely discarded:
>>>
>>> I0121 19:44:07.180585 8577 cgroups_isolator.cpp:1183] OOM notifier is triggered for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>> I0121 19:44:07.181037 8577 cgroups_isolator.cpp:1188] Discarded OOM notifier for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>>
>>> This looks like a bug in Mesos.
>>> What's happening is that you're launching an executor with no
>>> resources; consequently, before we fork, we attempt to update the
>>> memory control, but we don't call the memory handler since the
>>> executor has no memory resources:
>>>
>>> I0121 19:39:01.660071 8566 cgroups_isolator.cpp:516] Launching default (/home/lin/test-executor) in /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed with resources for framework 201401171812-2907575306-5050-19011-0020 in cgroup mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>> I0121 19:39:01.663082 8566 cgroups_isolator.cpp:709] Changing cgroup controls for executor default of framework 201401171812-2907575306-5050-19011-0020 with resources
>>> I0121 19:39:01.667129 8566 cgroups_isolator.cpp:1163] Started listening for OOM events for executor default of framework 201401171812-2907575306-5050-19011-0020
>>> I0121 19:39:01.681857 8566 cgroups_isolator.cpp:568] Forked executor at = 27609
>>>
>>> Then, later, when we are updating the resources for your 128 MB task,
>>> we set the soft limit, but we don't set the hard limit, because the
>>> following buggy check is not satisfied:
>>>
>>> // Determine whether to set the hard limit. If this is the first
>>> // time (info->pid.isNone()), or we're raising the existing limit,
>>> // then we can update the hard limit safely. Otherwise, if we need
>>> // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
>>> // much memory is in use. As a result, we only update the soft
>>> // limit when the memory reservation is being reduced. This is
>>> // probably okay if the machine has available resources.
>>> // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
>>> // discrepancy between usage and soft limit and introduces a
>>> // "manual oom" if necessary.
>>> if (info->pid.isNone() || limit > currentLimit.get()) {
>>>
>>> The assumption here was that there would always be an initial call
>>> with info->pid.isNone(); however, since your executor has no
>>> resources, we did not update the control before forking the executor,
>>> and limit was left as the inherited value. I've cc'ed Ian Downes on
>>> this since he's re-working the Isolator; I'll leave it to him to
>>> determine whether this is a bug that should be filed or not.
>>>
>>> On Tue, Jan 21, 2014 at 12:51 PM, Lin Zhao <[email protected]> wrote:
>>>
>>>> Vinod,
>>>>
>>>> Correction to my message: when my job is sleeping, the values below
>>>> are 500+ MB, as expected. I was looking at the kmem values. The OOM
>>>> notifier is triggered much later, when the executor is killed. Would
>>>> appreciate it if you have an idea where to look.
>>>>
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>
>>>> On Tue, Jan 21, 2014 at 2:54 PM, Lin Zhao <[email protected]> wrote:
>>>>
>>>>> Interesting. Looking at the log, it seems that the OOM is fired
>>>>> when the executor is shut down (19:44:07.180585), which is 300
>>>>> seconds after the job launch and memory use. Within those 300
>>>>> seconds, usage_in_bytes and max_usage_in_bytes are 0.
>>>>>
>>>>> Attaching the log. Any idea of the slow OOM? As you can see at
>>>>> https://gist.github.com/lin-zhao/8544495#file-testexecutor-java-L80,
>>>>> 512 MB of memory is used before the sleep.
>>>>>
>>>>> On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone <[email protected]> wrote:
>>>>>
>>>>>> The way you set task resources looks correct.
>>>>>>
>>>>>> Can you paste what the slave logs say regarding the task/executor, esp.
>>>>>> the lines that are from the cgroups isolator? Also, what is the
>>>>>> command line of the slave?
>>>>>>
>>>>>> @vinodkone
>>>>>>
>>>>>> On Tue, Jan 21, 2014 at 11:18 AM, Lin Zhao <[email protected]> wrote:
>>>>>>
>>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.limit_in_bytes
>>>>>>> 9223372036854775807
>>>>>>>
>>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.usage_in_bytes
>>>>>>> 584146944
>>>>>>>
>>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.max_usage_in_bytes
>>>>>>> 585809920
>>>>>>>
>>>>>>> Hmm, the limit is weird. Can you find anything wrong about the
>>>>>>> way my mem is defined?
>>>>>>>
>>>>>>> .addResources(Resource.newBuilder()
>>>>>>>     .setName("mem")
>>>>>>>     .setType(Value.Type.SCALAR)
>>>>>>>     .setScalar(Value.Scalar.newBuilder().setValue(128)))
>>>>>>>
>>>>>>> On Tue, Jan 21, 2014 at 2:02 PM, Vinod Kone <[email protected]> wrote:
>>>>>>>
>>>>>>>> Mesos uses cgroups
>>>>>>>> <https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt>
>>>>>>>> to limit cpu and memory.
>>>>>>>>
>>>>>>>> It is indeed surprising that your executor is not OOMing when
>>>>>>>> using more memory than requested.
>>>>>>>>
>>>>>>>> Can you tell us what the following values look like in the
>>>>>>>> executor's cgroup? These are the values the kernel uses to
>>>>>>>> decide whether the cgroup is hitting its limit.
>>>>>>>>
>>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.limit_in_bytes
>>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>>>>>
>>>>>>>> @vinodkone
>>>>>>>>
>>>>>>>> On Tue, Jan 21, 2014 at 9:58 AM, Lin Zhao <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm new to Mesos and have some questions about resource
>>>>>>>>> management. I want to understand how Mesos limits the resources
>>>>>>>>> used by each executor, given the resources defined in TaskInfo.
>>>>>>>>> I did some tests and have seen different behavior for different
>>>>>>>>> types of resources. It appears that Mesos caps CPU usage for
>>>>>>>>> the executors, but doesn't limit the memory accessible to each
>>>>>>>>> executor.
>>>>>>>>>
>>>>>>>>> I created an example Java framework, which is largely taken
>>>>>>>>> from the Mesos example:
>>>>>>>>>
>>>>>>>>> https://gist.github.com/lin-zhao/8544495
>>>>>>>>>
>>>>>>>>> Basically,
>>>>>>>>>
>>>>>>>>> 1. The scheduler launches tasks with 2 cpus and 128 MB memory.
>>>>>>>>> 2. The executor launches java with -Xms1500m and -Xmx1500m.
>>>>>>>>> 3. The Java executor creates a byte array that uses 512 MB of memory.
>>>>>>>>> 4. The Java executor starts 3 threads that loop forever, which potentially uses 3 full cpus.
>>>>>>>>>
>>>>>>>>> The framework is launched in a 3-slave Mesos (v0.14.2) cluster
>>>>>>>>> and finished without error.
>>>>>>>>>
>>>>>>>>> CPU: on the slaves, the cpu usage of the TestExecutor process
>>>>>>>>> is capped at 199%, indicating that Mesos does cap CPU usage.
>>>>>>>>> When the executors are assigned 1 cpu instead of 2, the cpu
>>>>>>>>> usage is capped at 99%.
>>>>>>>>>
>>>>>>>>> Memory: There is no error thrown.
>>>>>>>>> The executors used more than 512 MB of memory and got away
>>>>>>>>> with it.
>>>>>>>>>
>>>>>>>>> Can someone confirm this? I haven't tested the other resource
>>>>>>>>> types (ports, disk). Is the behavior documented somewhere?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lin Zhao
>>>>>>>>> https://wiki.groupondev.com/Message_Bus
>>>>>>>>> 3101 Park Blvd, Palo Alto, CA 94306
>>>>>>>>> Temporarily based in NY
>>>>>>>>> 33 W 19th St.

--
Lin Zhao
https://wiki.groupondev.com/Message_Bus
3101 Park Blvd, Palo Alto, CA 94306
Temporarily based in NY
33 W 19th St.
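A postscript for readers skimming the thread: the effect of the buggy check Benjamin quotes above can be reproduced in isolation. The sketch below is not Mesos source; the class and function names are mine, and it only models the decision of whether the hard limit (memory.limit_in_bytes) gets updated.

```python
# Sketch (not Mesos source): model the hard-limit update check to show why
# the 128 MB task never lowered memory.limit_in_bytes.

UNLIMITED = 9223372036854775807  # kernel default for memory.limit_in_bytes

class ExecutorSim:
    def __init__(self):
        self.pid = None            # executor not yet forked
        self.hard_limit = UNLIMITED
        self.soft_limit = UNLIMITED

def update_mem(info, limit):
    info.soft_limit = limit        # the soft limit is always updated
    # Buggy check: only set the hard limit before the fork (pid is None)
    # or when raising it, to avoid inducing an OOM by lowering it.
    if info.pid is None or limit > info.hard_limit:
        info.hard_limit = limit

info = ExecutorSim()
# The executor declared no memory resources, so update_mem was never
# called before forking; the cgroup keeps the inherited (unlimited) value.
info.pid = 27609
# Later, the 128 MB task's resources arrive:
update_mem(info, 128 * 1024 ** 2)
print(info.hard_limit == UNLIMITED)  # True: the hard limit was never lowered
```

This matches the cgroup values Lin pasted: usage around 584 MB against a limit of 9223372036854775807, so the kernel never saw a reason to OOM the executor.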

