Vinod,

Correction to my message: when my job is sleeping, the values below are 500+ MB
as expected. I was looking at the kmem values. The OOM notifier is triggered
much later, when the executor is killed. I'd appreciate it if you have an
idea where to look.

cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
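
In case it helps to watch these counters while the job runs, here is a minimal sketch for reading them (the cgroup path is the same placeholder as above, not a real path — substitute the actual framework/executor ids):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CgroupMem {
    // Read a cgroup memory counter (a single integer, in bytes) from a file
    // such as .../memory.usage_in_bytes or .../memory.max_usage_in_bytes.
    static long readBytes(Path file) throws Exception {
        return Long.parseLong(new String(Files.readAllBytes(file)).trim());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default path; pass the real one as the first argument.
        Path usage = Paths.get(args.length > 0 ? args[0]
            : "/cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes");
        System.out.printf("%.1f MB%n", readBytes(usage) / (1024.0 * 1024.0));
    }
}
```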


On Tue, Jan 21, 2014 at 2:54 PM, Lin Zhao <[email protected]> wrote:

> Interesting. Looking at the log, it seems that OOM is fired when the
> executor is shut down (19:44:07.180585), which is 300 seconds after the job
> launch and memory use. Within the 300 seconds usage_in_bytes and
> max_usage_in_bytes are 0.
>
> Attaching the log. Any idea why the OOM fires so late? As you can see at
> https://gist.github.com/lin-zhao/8544495#file-testexecutor-java-L80, 512M of
> memory is used before the sleep.
>
>
> On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone <[email protected]> wrote:
>
>> The way you set task resources looks correct.
>>
>> Can you paste what the slave logs say regarding the task/executor, esp.
>> the lines that are from the cgroups isolator? Also, what is the command
>> line of the slave?
>>
>>
>> @vinodkone
>>
>>
>> On Tue, Jan 21, 2014 at 11:18 AM, Lin Zhao <[email protected]> wrote:
>>
>>>
>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.limit_in_bytes
>>> 9223372036854775807
>>>
>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.usage_in_bytes
>>> 584146944
>>>
>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.max_usage_in_bytes
>>> 585809920
>>>
>>> Hmm, the limit is weird. Can you find anything wrong with the way my mem
>>> is defined?
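
One observation about that limit: it is exactly 2^63-1, i.e. Long.MAX_VALUE, which is what memory.limit_in_bytes reads when no limit has been set — so it looks like the isolator never wrote a limit for this cgroup (a hypothesis, not a confirmed diagnosis). A quick check:

```java
public class LimitCheck {
    public static void main(String[] args) {
        // The value read from memory.limit_in_bytes above.
        long limit = 9223372036854775807L;
        // Long.MAX_VALUE is the effectively "unlimited" default for a
        // cgroup v1 memory controller, suggesting no limit was applied.
        System.out.println(limit == Long.MAX_VALUE); // prints "true"
    }
}
```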
>>>
>>>
>>> .addResources(Resource.newBuilder()
>>>     .setName("mem")
>>>     .setType(Value.Type.SCALAR)
>>>     .setScalar(Value.Scalar.newBuilder().setValue(128)))
>>>
>>>
>>>
>>>
>>> On Tue, Jan 21, 2014 at 2:02 PM, Vinod Kone <[email protected]> wrote:
>>>
>>>> Mesos uses cgroups
>>>> (https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt) to limit
>>>> cpu and memory.
>>>>
>>>> It is indeed surprising that your executor is not OOMing when using
>>>> more memory than requested.
>>>>
>>>> Can you tell us what the following values look like in the executor's
>>>> cgroup? These are the values the kernel uses to decide whether the cgroup
>>>> is hitting its limit.
>>>>
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.limit_in_bytes
>>>>
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>>
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>
>>>>
>>>>
>>>> @vinodkone
>>>>
>>>>
>>>> On Tue, Jan 21, 2014 at 9:58 AM, Lin Zhao <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm new to Mesos and have some questions about resource management. I
>>>>> want to understand how Mesos limits the resources used by each executor,
>>>>> given the resources defined in TaskInfo. I ran some tests and saw
>>>>> different behavior for different resource types: Mesos appears to cap
>>>>> CPU usage for executors, but doesn't limit the memory accessible to
>>>>> each executor.
>>>>>
>>>>> I created an example java framework, which is largely taken from the
>>>>> mesos example:
>>>>>
>>>>> https://gist.github.com/lin-zhao/8544495
>>>>>
>>>>> Basically,
>>>>>
>>>>> 1. The Scheduler launches tasks with *2* cpus and *128 MB* memory.
>>>>> 2. The executor launches java with *-Xms1500m* and *-Xmx1500m*.
>>>>> 3. The java executor creates a byte array that uses *512 MB* of memory.
>>>>> 4. The java executor starts 3 threads that loop forever, which
>>>>> potentially uses *3* full cpus.
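
Steps 2-4 boil down to something like the following (a hypothetical simplification, not the actual TestExecutor code from the gist):

```java
public class TestExecutorSketch {
    static byte[] allocate(int mb) {
        // Allocate mb megabytes. With -Xms1500m/-Xmx1500m the heap can
        // comfortably hold a 512 MB block, even though the task only
        // requested 128 MB from Mesos.
        return new byte[mb * 1024 * 1024];
    }

    public static void main(String[] args) {
        byte[] block = allocate(512);  // ~512 MB kept resident
        for (int i = 0; i < 3; i++) {  // 3 busy loops, potentially 3 full cpus
            new Thread(() -> { while (true) { } }).start();
        }
        System.out.println(block.length);
    }
}
```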
>>>>>
>>>>> The framework was launched on a 3-slave Mesos (v0.14.2) cluster and
>>>>> finished without error.
>>>>>
>>>>> CPU: on the slaves, cpu usage for the TestExecutor process is capped
>>>>> at 199%, indicating that Mesos does cap CPU usage. When the executors
>>>>> are assigned 1 cpu instead of 2, cpu usage is capped at 99%.
>>>>>
>>>>> Memory: no error is thrown. The executors used more than 512 MB of
>>>>> memory and got away with it.
>>>>>
>>>>> Can someone confirm this? I haven't tested the other resource types
>>>>> (ports, disk). Is the behavior documented somewhere?
>>>>>
>>>>> --
>>>>> Lin Zhao
>>>>>
>>>>> https://wiki.groupondev.com/Message_Bus
>>>>> 3101 Park Blvd, Palo Alto, CA 94306
>>>>>
>>>>> Temporarily based in NY
>>>>> 33 W 19th St.
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>
>
>
>


