Hi Li,

Why do you think the slave was OOM killed? Is there something that pointed
you to that conclusion? All I see is the slave launched an executor, and
the executor was killed by the framework a few seconds after the task was
launched.
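
One way to check this against the log is to pull out the slave.cpp
registration and kill lines and compare the framework IDs on each. A rough
Python sketch, assuming the log format shown in the quoted message below;
the function name and the regular expression are only illustrative and are
not part of Mesos:

```python
import re


def executor_events(log_text):
    """Return (action, executor, framework_id) tuples from slave log text.

    Hypothetical helper: it only understands the two slave.cpp message
    shapes that appear in the quoted log.
    """
    # Undo mail quoting and hard wraps: drop leading "> " markers and
    # join everything into a single space-separated string.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    pattern = re.compile(
        r"slave\.cpp:\d+\] (Got registration for|Killing) executor "
        r"'(\S+?)' of framework (\S+)"
    )
    return [(m.group(1), m.group(2), m.group(3)) for m in pattern.finditer(flat)]
```

Running it over the pasted log lists which framework ID appears on the
registration line and which on the kill line, so the two can be compared
directly.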

Also, what version are you running?

Ben


On Thu, Aug 22, 2013 at 3:27 PM, Li Jin <ice.xell...@gmail.com> wrote:

> Hello guys,
>
> I am implementing a Mesos executor and see this behavior when I enable
> cgroups isolation. It seems the slave got OOM killed. I didn't expect the
> slave to be OOM killed under any circumstances. Am I wrong?
>
> Here is the slave log:
>
> I0822 21:22:09.168122 15557 cgroups_isolation_module.cpp:440] Launching
> jobsystem (/net/
> hsljin.aoa.twosigma.com/userhome/ljin/cvs/JOBS-MESOS/ts/jobsystem/sbin/jobagent-c
>  dev) in
> /tmp/mesos/slaves/201308201743-164210880-5050-16120-10/frameworks/201308201743-164210880-5050-16120-0046/executors/jobsystem/runs/9a3c2f9f-733b-4c68-a269-2f7f17d6ad06
> with resources  for framework 201308201743-164210880-5050-16120-0046 in
> cgroup
> mesos/framework_201308201743-164210880-5050-16120-0046_executor_jobsystem_tag_96193e04-40d2-4911-9a68-6eb86c534f97
> I0822 21:22:09.169131 15557 cgroups_isolation_module.cpp:572] Changing
> cgroup controls for executor jobsystem of framework
> 201308201743-164210880-5050-16120-0046 with resources
> I0822 21:22:09.169831 15557 cgroups_isolation_module.cpp:801] Started
> listening for OOM events for executor jobsystem of framework
> 201308201743-164210880-5050-16120-0046
> I0822 21:22:09.170280 15557 cgroups_isolation_module.cpp:469] Forked
> executor at = 22708
> I0822 21:22:10.583222 15559 slave.cpp:487] Got assigned task 1 for
> framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:10.583271 15559 slave.cpp:523] Queuing task '1' for executor
> jobsystem of framework '201308201743-164210880-5050-16120-0046
> I0822 21:22:11.730157 15555 slave.cpp:762] Got registration for executor
> 'jobsystem' of framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:11.730293 15560 cgroups_isolation_module.cpp:572] Changing
> cgroup controls for executor jobsystem of framework
> 201308201743-164210880-5050-16120-0046 with resources cpus=2; mem=2048
> I0822 21:22:11.730443 15555 slave.cpp:820] Flushing queued tasks for
> framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:11.732163 15560 cgroups_isolation_module.cpp:775] Updated
> 'memory.soft_limit_in_bytes' to 2147483648 for executor jobsystem of
> framework 201308201743-164210880-5050-16120-0046
> I0822 21:22:12.398077 15558 slave.cpp:1194] Killing executor 'jobsystem'
> of framework 201308201743-164210880-5050-16120-0045
> I0822 21:22:12.398149 15559 cgroups_isolation_module.cpp:535] Killing
> executor jobsystem of framework 201308201743-164210880-5050-16120-0045
> I0822 21:22:12.398381 15561 gc.cpp:97] Scheduling
> /tmp/mesos/slaves/201308201743-164210880-5050-16120-10/frameworks/201308201743-164210880-5050-16120-0045/executors/jobsystem/runs/5c9cddd0-7b4c-464b-9b2d-17e8197970a1
> for removal
> Killed
>
>
