Entered https://issues.apache.org/jira/browse/MESOS-941. Thanks everyone for the help!
On Thu, Jan 23, 2014 at 2:03 AM, Vinod Kone <[email protected]> wrote:

> Hey Lin. Mind filing a ticket for this issue? This is definitely a bug we
> would like to get fixed.
>
> @vinodkone
>
> On Tue, Jan 21, 2014 at 2:00 PM, Benjamin Mahler <[email protected]> wrote:
>
>> TL;DR: Specify resources in your *executor*, rather than only in your
>> *task*.
>>
>> No OOM is occurring in the logs. The "triggered" log line is misleading;
>> you can see that the notification was merely discarded:
>>
>> I0121 19:44:07.180585 8577 cgroups_isolator.cpp:1183] OOM notifier is triggered for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>> I0121 19:44:07.181037 8577 cgroups_isolator.cpp:1188] Discarded OOM notifier for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>
>> This looks like a bug in Mesos. What's happening is that you're launching
>> an executor with no resources. Consequently, before we fork, we attempt
>> to update the memory control, but we don't call the memory handler since
>> the executor has no memory resources:
>>
>> I0121 19:39:01.660071 8566 cgroups_isolator.cpp:516] Launching default (/home/lin/test-executor) in /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed with resources for framework 201401171812-2907575306-5050-19011-0020 in cgroup mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>> I0121 19:39:01.663082 8566 cgroups_isolator.cpp:709] Changing cgroup controls for executor default of framework 201401171812-2907575306-5050-19011-0020 with resources
>> I0121 19:39:01.667129 8566 cgroups_isolator.cpp:1163] Started listening for OOM events for executor default of framework 201401171812-2907575306-5050-19011-0020
>> I0121 19:39:01.681857 8566 cgroups_isolator.cpp:568] Forked executor at = 27609
>>
>> Then, later, when we update the resources for your 128 MB task, we set
>> the soft limit, but we don't set the hard limit, because the following
>> buggy check is not satisfied:
>>
>>   // Determine whether to set the hard limit. If this is the first
>>   // time (info->pid.isNone()), or we're raising the existing limit,
>>   // then we can update the hard limit safely. Otherwise, if we need
>>   // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
>>   // much memory is in use. As a result, we only update the soft
>>   // limit when the memory reservation is being reduced. This is
>>   // probably okay if the machine has available resources.
>>   // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
>>   // discrepancy between usage and soft limit and introduces a
>>   // "manual oom" if necessary.
>>   if (info->pid.isNone() || limit > currentLimit.get()) {
>>
>> The assumption here was that there would always be an initial call with
>> info->pid.isNone(). However, since your executor has no resources, we did
>> not update the control before forking the executor, and limit was left at
>> the inherited value. I've cc'ed Ian Downes on this since he's re-working
>> the Isolator; I'll leave it to him to determine whether this is a bug
>> that should be filed or not.
>>
>> On Tue, Jan 21, 2014 at 12:51 PM, Lin Zhao <[email protected]> wrote:
>>
>>> Vinod,
>>>
>>> Correction to my message: when my job is sleeping, the values below are
>>> 500+ MB as expected. I was looking at the kmem values. The OOM notifier
>>> is triggered much later, when the executor is killed. Would appreciate
>>> it if you have an idea where to look.
>>>
>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>
>>> On Tue, Jan 21, 2014 at 2:54 PM, Lin Zhao <[email protected]> wrote:
>>>
>>>> Interesting. Looking at the log, it seems that the OOM is fired when
>>>> the executor is shut down (19:44:07.180585), which is 300 seconds after
>>>> the job launches and uses the memory. Within the 300 seconds,
>>>> usage_in_bytes and max_usage_in_bytes are 0.
>>>>
>>>> Attaching the log. Any idea why the OOM is so slow? As you can see at
>>>> https://gist.github.com/lin-zhao/8544495#file-testexecutor-java-L80,
>>>> 512 MB of memory is used before the sleep.
>>>>
>>>> On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone <[email protected]> wrote:
>>>>
>>>>> The way you set task resources looks correct.
>>>>>
>>>>> Can you paste what the slave logs say regarding the task/executor,
>>>>> esp. the lines that are from the cgroups isolator? Also, what is the
>>>>> command line of the slave?
>>>>>
>>>>> @vinodkone
>>>>>
>>>>> On Tue, Jan 21, 2014 at 11:18 AM, Lin Zhao <[email protected]> wrote:
>>>>>
>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.limit_in_bytes
>>>>>> 9223372036854775807
>>>>>>
>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.usage_in_bytes
>>>>>> 584146944
>>>>>>
>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.max_usage_in_bytes
>>>>>> 585809920
>>>>>>
>>>>>> Hmm, the limit is weird. Can you find anything wrong with the way my
>>>>>> mem is defined?
>>>>>>
>>>>>>   .addResources(Resource.newBuilder()
>>>>>>       .setName("mem")
>>>>>>       .setType(Value.Type.SCALAR)
>>>>>>       .setScalar(Value.Scalar.newBuilder().setValue(128)))
>>>>>>
>>>>>> On Tue, Jan 21, 2014 at 2:02 PM, Vinod Kone <[email protected]> wrote:
>>>>>>
>>>>>>> Mesos uses cgroups
>>>>>>> <https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt>
>>>>>>> to limit cpu and memory.
>>>>>>>
>>>>>>> It is indeed surprising that your executor is not OOMing when using
>>>>>>> more memory than requested.
>>>>>>>
>>>>>>> Can you tell us what the following values look like in the
>>>>>>> executor's cgroup? These are the values the kernel uses to decide
>>>>>>> whether the cgroup is hitting its limit.
>>>>>>>
>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.limit_in_bytes
>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>>>>
>>>>>>> @vinodkone
>>>>>>>
>>>>>>> On Tue, Jan 21, 2014 at 9:58 AM, Lin Zhao <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm new to Mesos and have some questions about resource management.
>>>>>>>> I want to understand how Mesos limits the resources used by each
>>>>>>>> executor, given the resources defined in TaskInfo. I did some tests
>>>>>>>> and saw different behavior for different types of resources. It
>>>>>>>> appears that Mesos caps CPU usage for the executors, but doesn't
>>>>>>>> limit the memory accessible to each executor.
>>>>>>>>
>>>>>>>> I created an example Java framework, largely taken from the Mesos
>>>>>>>> example:
>>>>>>>>
>>>>>>>> https://gist.github.com/lin-zhao/8544495
>>>>>>>>
>>>>>>>> Basically,
>>>>>>>>
>>>>>>>> 1. The Scheduler launches tasks with *2* cpus and *128 MB* memory.
>>>>>>>> 2. The executor launches java with *-Xms1500m* and *-Xmx1500m*.
>>>>>>>> 3. The Java executor creates a byte array that uses *512 MB* of
>>>>>>>> memory.
>>>>>>>> 4. The Java executor starts 3 threads that loop forever, which
>>>>>>>> potentially use *3* full cpus.
>>>>>>>>
>>>>>>>> The framework was launched on a 3-slave Mesos (v0.14.2) cluster and
>>>>>>>> finished without error.
>>>>>>>>
>>>>>>>> CPU: on the slaves, the CPU usage of the TestExecutor process is
>>>>>>>> capped at 199%, indicating that Mesos does cap CPU usage. When the
>>>>>>>> executor is assigned 1 cpu instead of 2, the usage is capped at 99%.
>>>>>>>>
>>>>>>>> Memory: there is no error thrown. The executors used more than
>>>>>>>> 512 MB of memory and got away with it.
>>>>>>>>
>>>>>>>> Can someone confirm this? I haven't tested the other resource types
>>>>>>>> (ports, disk). Is the behavior documented somewhere?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lin Zhao
>>>>>>>>
>>>>>>>> https://wiki.groupondev.com/Message_Bus
>>>>>>>> 3101 Park Blvd, Palo Alto, CA 94306
>>>>>>>>
>>>>>>>> Temporarily based in NY
>>>>>>>> 33 W 19th St.
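[Editor's note] The buggy check Benjamin quotes above can be modeled in a few lines of Java. This is an illustrative sketch, not Mesos code (the class and method names are hypothetical); it shows why, once the executor is forked without an initial resource update, the hard limit is never lowered from the kernel's inherited default of 9223372036854775807 (Long.MAX_VALUE), exactly the value Lin read from memory.limit_in_bytes:

```java
// Illustrative model of the quoted check from cgroups_isolator.cpp:
//   if (info->pid.isNone() || limit > currentLimit.get()) {
// All names here are hypothetical; this is not the Mesos implementation.
public class HardLimitCheck {
    // A fresh cgroup inherits an effectively unlimited memory.limit_in_bytes:
    // 9223372036854775807 bytes, i.e. Long.MAX_VALUE.
    static final long INHERITED_LIMIT = Long.MAX_VALUE;

    // The hard limit is only written when the executor has not been forked
    // yet (pid unknown) or when the limit is being *raised*. Lowering a live
    // cgroup's hard limit could itself induce an OOM, hence the guard.
    static boolean shouldSetHardLimit(boolean pidKnown, long newLimit, long currentLimit) {
        return !pidKnown || newLimit > currentLimit;
    }

    public static void main(String[] args) {
        long taskLimit = 128L * 1024 * 1024; // the 128 MB task in the thread

        // Expected path: the executor declares memory resources, so the
        // control is updated before fork (pid unknown) -> hard limit is set.
        System.out.println(shouldSetHardLimit(false, taskLimit, INHERITED_LIMIT));

        // Buggy path: the executor declared no resources, so the first update
        // happens after fork; 128 MB < Long.MAX_VALUE, so the hard limit is
        // never lowered and stays at the inherited value.
        System.out.println(shouldSetHardLimit(true, taskLimit, INHERITED_LIMIT));
    }
}
```

This is why Benjamin's TL;DR workaround (declare resources on the executor, not only the task) works: it forces the pre-fork update, so the first hard-limit write happens while the pid is still unknown.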

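[Editor's note] A quick sanity check on the CPU numbers in the original question. Assuming the slave enforces CPU with CFS bandwidth limits (quota = cpus × period; an assumption, since the thread never shows the slave's command line), a task with 2 cpus gets two cores' worth of runtime per period, which top reports as a cap of roughly 200% (observed as 199%):

```java
// Back-of-the-envelope model of a CFS bandwidth cap (assumed configuration,
// not taken from the thread): quota_us = cpus * period_us.
public class CfsCap {
    // Maximum CPU percentage top can show for a given cpus allocation.
    static double maxCpuPercent(double cpus, long periodUs) {
        long quotaUs = (long) (cpus * periodUs); // runtime allowed per period
        return 100.0 * quotaUs / periodUs;
    }

    public static void main(String[] args) {
        long periodUs = 100_000; // the kernel's default cpu.cfs_period_us
        System.out.println(maxCpuPercent(2.0, periodUs)); // ~200%, seen as 199% in top
        System.out.println(maxCpuPercent(1.0, periodUs)); // ~100%, seen as 99% in top
    }
}
```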
