Hi Li,

Why do you think the slave was OOM killed? Is there something that pointed you to that conclusion? All I see is that the slave launched an executor, and the executor was killed by the framework a few seconds after the task was launched.
Also, what version are you running?

Ben

On Thu, Aug 22, 2013 at 3:27 PM, Li Jin <ice.xell...@gmail.com> wrote:
> Hello guys,
>
> I am implementing a mesos executor and see this behavior when I enabled
> cgroups isolation. It seems the slave got oom killed. I didn't expect the
> slave to be oom killed in any circumstance, am I wrong?
>
> Here are the slave log:
>
> I0822 21:22:09.168122 15557 cgroups_isolation_module.cpp:440] Launching
> jobsystem (/net/
> hsljin.aoa.twosigma.com/userhome/ljin/cvs/JOBS-MESOS/ts/jobsystem/sbin/jobagent-c
> dev) in
> /tmp/mesos/slaves/201308201743-164210880-5050-16120-10/frameworks/201308201743-164210880-5050-16120-0046/executors/jobsystem/runs/9a3c2f9f-733b-4c68-a269-2f7f17d6ad06
> with resources for framework 201308201743-164210880-5050-16120-0046 in
> cgroup
> mesos/framework_201308201743-164210880-5050-16120-0046_executor_jobsystem_tag_96193e04-40d2-4911-9a68-6eb86c534f97
> I0822 21:22:09.169131 15557 cgroups_isolation_module.cpp:572] Changing
> cgroup controls for executor jobsystem of framework
> 201308201743-164210880-5050-16120-0046 with resources
> I0822 21:22:09.169831 15557 cgroups_isolation_module.cpp:801] Started
> listening for OOM events for executor jobsystem of framework
> 201308201743-164210880-5050-16120-0046
> I0822 21:22:09.170280 15557 cgroups_isolation_module.cpp:469] Forked
> executor at = 22708
> I0822 21:22:10.583222 15559 slave.cpp:487] Got assigned task 1 for
> framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:10.583271 15559 slave.cpp:523] Queuing task '1' for executor
> jobsystem of framework '201308201743-164210880-5050-16120-0046
> I0822 21:22:11.730157 15555 slave.cpp:762] Got registration for executor
> 'jobsystem' of framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:11.730293 15560 cgroups_isolation_module.cpp:572] Changing
> cgroup controls for executor jobsystem of framework
> 201308201743-164210880-5050-16120-0046 with resources cpus=2; mem=2048
> I0822 21:22:11.730443 15555 slave.cpp:820] Flushing queued tasks for
> framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:11.732163 15560 cgroups_isolation_module.cpp:775] Updated
> 'memory.soft_limit_in_bytes' to 2147483648 for executor jobsystem of
> framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:12.398077 15558 slave.cpp:1194] Killing executor 'jobsystem'
> of framework 201308201743-164210880-5050-16120-0045
> I0822 21:22:12.398149 15559 cgroups_isolation_module.cpp:535] Killing
> executor jobsystem of framework 201308201743-164210880-5050-16120-0045
> I0822 21:22:12.398381 15561 gc.cpp:97] Scheduling
> /tmp/mesos/slaves/201308201743-164210880-5050-16120-10/frameworks/201308201743-164210880-5050-16120-0045/executors/jobsystem/runs/5c9cddd0-7b4c-464b-9b2d-17e8197970a1
> for removal
> Killed
>
>
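
One way to see which framework each event in the log above belongs to is to group the lines by framework ID. This is only a rough sketch: the helper name `events_by_framework`, the regular expression, and the short event labels are all made up for illustration and are not part of Mesos. It assumes each log entry has been joined back onto a single line, since the mail client wrapped them.

```python
import re
from collections import defaultdict

# Hypothetical helper for illustration; not part of Mesos.
# Framework IDs in the log above look like 201308201743-164210880-5050-16120-0046.
FRAMEWORK_ID = re.compile(r"framework '?(\d+(?:-\d+){4})")

# Substrings taken from the log above, mapped to labels chosen for this sketch.
EVENTS = {
    "Got assigned task": "task",
    "Killing executor": "kill",
}

# Two entries copied from the log above, with the mail line wrapping undone.
SAMPLE = (
    "I0822 21:22:10.583222 15559 slave.cpp:487] Got assigned task 1 for "
    "framework 201308201743-164210880-5050-16120-0046\n"
    "I0822 21:22:12.398077 15558 slave.cpp:1194] Killing executor 'jobsystem' "
    "of framework 201308201743-164210880-5050-16120-0045\n"
)


def events_by_framework(log_text):
    """Return {framework_id: [event labels]} for lines that match EVENTS."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = FRAMEWORK_ID.search(line)
        if match is None:
            continue  # line does not mention a framework ID
        for needle, label in EVENTS.items():
            if needle in line:
                found[match.group(1)].append(label)
    return dict(found)


if __name__ == "__main__":
    for framework_id, labels in events_by_framework(SAMPLE).items():
        print(framework_id, labels)
```

On the two sample entries, the task assignment is recorded under the ID ending in 0046 and the kill under the ID ending in 0045, so it may be worth checking the full slave log to see which executor each kill actually refers to.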