More resources question: how does Mesos control the "ports" and "disk" resources? I started a framework that claims port1 yet listens on port2, and it has no problem doing so. It also claims 10 units (MB, I assume) of disk, then writes 512 MB of data to the work directory, and that succeeds too. Is this expected? I can provide source/logs if requested.
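For anyone reproducing this, the cgroup memory values discussed further down the thread can be collected with a small helper. This is only a sketch: the function name is mine, the framework/executor ids in the example path are placeholders, and on many distros the memory hierarchy is mounted at /sys/fs/cgroup/memory rather than /cgroup.

```python
# Sketch: dump the kernel-enforced memory controls for an executor's cgroup.
# The path below is a placeholder; substitute the real framework/executor ids.
import os

def dump_memory_controls(cgroup):
    """Return 'memory.<file>: <value>' lines for the given cgroup directory."""
    lines = []
    for name in ("limit_in_bytes", "soft_limit_in_bytes",
                 "usage_in_bytes", "max_usage_in_bytes"):
        path = os.path.join(cgroup, "memory." + name)
        try:
            with open(path) as f:
                lines.append("memory.%s: %s" % (name, f.read().strip()))
        except OSError:
            lines.append("memory.%s: (not readable)" % name)
    return lines

if __name__ == "__main__":
    for line in dump_memory_controls(
            "/cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>"):
        print(line)
```

Each file is read directly from the cgroup filesystem, so the values reported are exactly what the kernel enforces, not what the slave thinks it set.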
On Thu, Jan 23, 2014 at 11:10 AM, Lin Zhao <[email protected]> wrote:

> Entered https://issues.apache.org/jira/browse/MESOS-941. Thanks everyone
> for the help!
>
> On Thu, Jan 23, 2014 at 2:03 AM, Vinod Kone <[email protected]> wrote:
>
>> Hey Lin. Mind filing a ticket for this issue? This is definitely a bug
>> we would like to get fixed.
>>
>> @vinodkone
>>
>> On Tue, Jan 21, 2014 at 2:00 PM, Benjamin Mahler
>> <[email protected]> wrote:
>>
>>> TL;DR: Specify resources in your *executor*, rather than only in your
>>> *task*.
>>>
>>> No OOM is occurring in the logs. The "triggered" log line is
>>> misleading; you can see that the notification was merely discarded:
>>>
>>> I0121 19:44:07.180585 8577 cgroups_isolator.cpp:1183] OOM notifier is triggered for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>> I0121 19:44:07.181037 8577 cgroups_isolator.cpp:1188] Discarded OOM notifier for executor default of framework 201401171812-2907575306-5050-19011-0020 with uuid 8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>>
>>> This looks like a bug in Mesos.
>>> What's happening is that you're launching an executor with no
>>> resources; consequently, before we fork, we attempt to update the
>>> memory control, but we don't call the memory handler since the
>>> executor has no memory resources:
>>>
>>> I0121 19:39:01.660071 8566 cgroups_isolator.cpp:516] Launching default (/home/lin/test-executor) in /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed with resources for framework 201401171812-2907575306-5050-19011-0020 in cgroup mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
>>> I0121 19:39:01.663082 8566 cgroups_isolator.cpp:709] Changing cgroup controls for executor default of framework 201401171812-2907575306-5050-19011-0020 with resources
>>> I0121 19:39:01.667129 8566 cgroups_isolator.cpp:1163] Started listening for OOM events for executor default of framework 201401171812-2907575306-5050-19011-0020
>>> I0121 19:39:01.681857 8566 cgroups_isolator.cpp:568] Forked executor at = 27609
>>>
>>> Then, later, when we are updating the resources for your 128 MB task,
>>> we set the soft limit, but we don't set the hard limit, because the
>>> following buggy check is not satisfied:
>>>
>>> // Determine whether to set the hard limit. If this is the first
>>> // time (info->pid.isNone()), or we're raising the existing limit,
>>> // then we can update the hard limit safely. Otherwise, if we need
>>> // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
>>> // much memory is in use. As a result, we only update the soft
>>> // limit when the memory reservation is being reduced. This is
>>> // probably okay if the machine has available resources.
>>> // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
>>> // discrepancy between usage and soft limit and introduces a
>>> // "manual oom" if necessary.
>>> if (info->pid.isNone() || limit > currentLimit.get()) {
>>>
>>> The assumption here was that there would always be an initial call
>>> with info->pid.isNone(); however, since your executor has no
>>> resources, we did not update the control before forking the executor,
>>> and limit was left as the inherited value. I've cc'ed Ian Downes on
>>> this since he's re-working the Isolator; I'll leave it to him to
>>> determine whether this is a bug that should be filed or not.
>>>
>>> On Tue, Jan 21, 2014 at 12:51 PM, Lin Zhao <[email protected]> wrote:
>>>
>>>> Vinod,
>>>>
>>>> Correction to my message: when my job is sleeping, the values below
>>>> are 500+ MB, as expected. I was looking at the kmem values. The OOM
>>>> notifier is triggered much later, when the executor is killed. Would
>>>> appreciate it if you have an idea where to look.
>>>>
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>
>>>> On Tue, Jan 21, 2014 at 2:54 PM, Lin Zhao <[email protected]> wrote:
>>>>
>>>>> Interesting. Looking at the log, it seems that the OOM is fired
>>>>> when the executor is shut down (19:44:07.180585), which is 300
>>>>> seconds after the job launch and memory use. Within those 300
>>>>> seconds, usage_in_bytes and max_usage_in_bytes are 0.
>>>>>
>>>>> Attaching the log. Any idea of the slow OOM? As you can see at
>>>>> https://gist.github.com/lin-zhao/8544495#file-testexecutor-java-L80,
>>>>> 512 MB of memory is used before the sleep.
>>>>>
>>>>> On Tue, Jan 21, 2014 at 2:28 PM, Vinod Kone <[email protected]> wrote:
>>>>>
>>>>>> The way you set task resources looks correct.
>>>>>>
>>>>>> Can you paste what the slave logs say regarding the task/executor, esp.
>>>>>> the lines that are from the cgroups isolator? Also, what is the
>>>>>> command line of the slave?
>>>>>>
>>>>>> @vinodkone
>>>>>>
>>>>>> On Tue, Jan 21, 2014 at 11:18 AM, Lin Zhao <[email protected]> wrote:
>>>>>>
>>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.limit_in_bytes
>>>>>>> 9223372036854775807
>>>>>>>
>>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.usage_in_bytes
>>>>>>> 584146944
>>>>>>>
>>>>>>> [lin@mesos2 ~]$ cat /cgroup/mesos/framework_201401171812-2907575306-5050-19011-0019_executor_default_tag_72c003a3-f213-479e-a7e3-9b86930703a7/memory.max_usage_in_bytes
>>>>>>> 585809920
>>>>>>>
>>>>>>> Hmm, the limit is weird. Can you find anything wrong about the
>>>>>>> way my mem is defined?
>>>>>>>
>>>>>>> .addResources(Resource.newBuilder()
>>>>>>>     .setName("mem")
>>>>>>>     .setType(Value.Type.SCALAR)
>>>>>>>     .setScalar(Value.Scalar.newBuilder().setValue(128)))
>>>>>>>
>>>>>>> On Tue, Jan 21, 2014 at 2:02 PM, Vinod Kone <[email protected]> wrote:
>>>>>>>
>>>>>>>> Mesos uses cgroups
>>>>>>>> <https://www.kernel.org/doc/Documentation/cgroups/cgroups.txt>
>>>>>>>> to limit cpu and memory.
>>>>>>>>
>>>>>>>> It is indeed surprising that your executor is not OOMing when
>>>>>>>> using more memory than requested.
>>>>>>>>
>>>>>>>> Can you tell us what the following values look like in the
>>>>>>>> executor's cgroup? These are the values the kernel uses to
>>>>>>>> decide whether the cgroup is hitting its limit.
>>>>>>>>
>>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.limit_in_bytes
>>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.usage_in_bytes
>>>>>>>> cat /cgroup/mesos/framework_<foo>_executor_<bar>_<uuid>/memory.max_usage_in_bytes
>>>>>>>>
>>>>>>>> @vinodkone
>>>>>>>>
>>>>>>>> On Tue, Jan 21, 2014 at 9:58 AM, Lin Zhao <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm new to Mesos and have some questions about resource
>>>>>>>>> management. I want to understand how Mesos limits the resources
>>>>>>>>> used by each executor, given the resources defined in TaskInfo.
>>>>>>>>> I did some tests and have seen different behavior for different
>>>>>>>>> types of resources. It appears that Mesos caps CPU usage for
>>>>>>>>> the executors, but doesn't limit the memory accessible to each
>>>>>>>>> executor.
>>>>>>>>>
>>>>>>>>> I created an example Java framework, which is largely taken
>>>>>>>>> from the Mesos example:
>>>>>>>>>
>>>>>>>>> https://gist.github.com/lin-zhao/8544495
>>>>>>>>>
>>>>>>>>> Basically,
>>>>>>>>>
>>>>>>>>> 1. The scheduler launches tasks with 2 cpus and 128 MB memory.
>>>>>>>>> 2. The executor launches java with -Xms1500m and -Xmx1500m.
>>>>>>>>> 3. The Java executor creates a byte array that uses 512 MB of memory.
>>>>>>>>> 4. The Java executor starts 3 threads that loop forever, which potentially uses 3 full cpus.
>>>>>>>>>
>>>>>>>>> The framework is launched in a 3-slave Mesos (v0.14.2) cluster
>>>>>>>>> and finished without error.
>>>>>>>>>
>>>>>>>>> CPU: on the slaves, the cpu usage of the TestExecutor process
>>>>>>>>> is capped at 199%, indicating that Mesos does cap CPU usage.
>>>>>>>>> When the executors are assigned 1 cpu instead of 2, the cpu
>>>>>>>>> usage is capped at 99%.
>>>>>>>>>
>>>>>>>>> Memory: There is no error thrown.
>>>>>>>>> The executors used more than 512 MB of memory and got away
>>>>>>>>> with it.
>>>>>>>>>
>>>>>>>>> Can someone confirm this? I haven't tested the other resource
>>>>>>>>> types (ports, disk). Is the behavior documented somewhere?
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Lin Zhao
>>>>>>>>> https://wiki.groupondev.com/Message_Bus
>>>>>>>>> 3101 Park Blvd, Palo Alto, CA 94306
>>>>>>>>> Temporarily based in NY
>>>>>>>>> 33 W 19th St.

--
Lin Zhao
https://wiki.groupondev.com/Message_Bus
3101 Park Blvd, Palo Alto, CA 94306
Temporarily based in NY
33 W 19th St.
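A postscript for readers skimming the thread: the effect of the buggy check Benjamin quotes above can be reproduced in isolation. The sketch below is not Mesos source; the class and function names are mine, and it only models the decision of whether the hard limit (memory.limit_in_bytes) gets updated.

```python
# Sketch (not Mesos source): model the hard-limit update check to show why
# the 128 MB task never lowered memory.limit_in_bytes.

UNLIMITED = 9223372036854775807  # kernel default for memory.limit_in_bytes

class ExecutorSim:
    def __init__(self):
        self.pid = None            # executor not yet forked
        self.hard_limit = UNLIMITED
        self.soft_limit = UNLIMITED

def update_mem(info, limit):
    info.soft_limit = limit        # the soft limit is always updated
    # Buggy check: only set the hard limit before the fork (pid is None)
    # or when raising it, to avoid inducing an OOM by lowering it.
    if info.pid is None or limit > info.hard_limit:
        info.hard_limit = limit

info = ExecutorSim()
# The executor declared no memory resources, so update_mem was never
# called before forking; the cgroup keeps the inherited (unlimited) value.
info.pid = 27609
# Later, the 128 MB task's resources arrive:
update_mem(info, 128 * 1024 ** 2)
print(info.hard_limit == UNLIMITED)  # True: the hard limit was never lowered
```

This matches the cgroup values Lin pasted: usage around 584 MB against a limit of 9223372036854775807, so the kernel never saw a reason to OOM the executor.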

