It's a kernel bug from the looks of it; here's a similar stack trace: https://bugs.centos.org/view.php?id=7538
But apparently they fixed it in the kernel you're running.
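
If you want to double-check that, one thing you could try (just a sketch; I'm assuming the relevant changelog entry actually mentions "cgroup", which I haven't verified) is grepping the installed kernel package's changelog:

    # show changelog entries for the running kernel that mention cgroup
    rpm -q --changelog kernel-$(uname -r) | grep -i cgroup | head -20

If nothing related shows up there, the fix may not actually be in 2.6.32-504.3.3.el6 after all.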

On Sat, Jan 24, 2015 at 8:26 AM, Gary Ogden <[email protected]> wrote:
> I'm using this:
> 3 Node Cluster, 8GB / 2 CPU each
> CentOS 6.5, 2.6.32-504.3.3.el6.x86_64
>
> On each node we run Cassandra and Mesos, where we run Java Spark jobs.
> This is a testing environment, so it's actually shared between test groups.
> So we are actually running 3 instances of mesos-slave on each node
> (integration, qa and preprod). We want to ensure these Spark jobs don't
> slow down Cassandra.
>
> If I don't use cgroups, we don't get a kernel panic. No matter how I try
> to configure cgroups, I still get the panic and reboot. Is there an issue
> with having multiple slaves on the same machine? Here's the kernel panic
> text:
>
> <4>Process mesos-slave (pid: 19593, threadinfo ffff88023a224000, task ffff8800aa1af540)
> <4>Stack:
> <4> ffff88023a225dd8 ffff8802395b6580 ffff88023a225e08 ffffffff810cdaa2
> <4><d> ffff88008f740440 ffff8800aa0df938 0000000000000000 ffff8800aa0df950
> <4><d> ffff88023a225e58 ffffffff810577e9 ffff88008f740440 0000000300000001
> <4>Call Trace:
> <4> [<ffffffff810cdaa2>] cgroup_event_wake+0x42/0x70
> <4> [<ffffffff810577e9>] __wake_up_common+0x59/0x90
> <4> [<ffffffff8105bd18>] __wake_up+0x48/0x70
> <4> [<ffffffff811dad2d>] eventfd_release+0x2d/0x40
> <4> [<ffffffff8118f8d5>] __fput+0xf5/0x210
> <4> [<ffffffff8118fa15>] fput+0x25/0x30
> <4> [<ffffffff8118ac6d>] filp_close+0x5d/0x90
> <4> [<ffffffff8118ad45>] sys_close+0xa5/0x100
> <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
> <4>Code: 01 01 01 01 01 48 0f af c2 48 c1 e8 38 c3 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 8b 47 08 <4c> 8b 00 4c 39 c7 75 39 48 8b 03 4c 8b 40 08 4c 39 c3 75 4c 48
> <1>RIP [<ffffffff8129e870>] list_del+0x10/0xa0
> <4> RSP <ffff88023a225dc8>
>
> Here's the cgconfig.conf:
>
> mount {
>     cpu = /cgroup/cpu;
>     cpuacct = /cgroup/cpuacct;
>     memory = /cgroup/memory;
> }
>
> group cassandra {
>     cpu {
>         cpu.shares="800";
>     }
>     cpuacct {
>         cpuacct.usage="0";
>     }
>     memory {
>         memory.limit_in_bytes="5G";
>         memory.memsw.limit_in_bytes="5G";
>     }
> }
>
> group mesos {
>     cpu {
>         cpu.shares="200";
>     }
>     cpuacct {
>         cpuacct.usage="0";
>     }
>     memory {
>         memory.limit_in_bytes="1G";
>         memory.memsw.limit_in_bytes="1G";
>     }
> }
>
> Here's the cgrules.conf:
>
> @mesos      cpu,cpuacct,memory    mesos
> @cassandra  cpu,cpuacct,memory    cassandra
>
> And here's how we start each slave:
>
> cgexec -g cpu,cpuacct,memory:mesos /usr/sbin/mesos-slave \
>     --isolation=cgroups/cpu,cgroups/mem --cgroups_limit_swap \
>     --cgroups_hierarchy=/cgroup \
>     --resources="mem(*):256;cpus(*):1;ports(*):[20000-25000];disk(*):5000" \
>     --gc_delay=2days --cgroups_root=mesos --log_dir=/var/log/mesos/int \
>     --master=zk://intMesosMaster01:2181,intMesosMaster02:2181,intMesosMaster03:2181/mesos \
>     --port=5150 --work_dir=/tmp/mesos/int
>
> I've tried lots of different settings in cgroups and in how the slaves are
> started, but nothing seems to matter. We've also disabled swap on these
> boxes since Cassandra doesn't like swap.
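
One more thing that might be worth checking (just a guess, not a known fix): cgconfig and the mesos-slave cgroups/cpu,cgroups/mem isolator are both working under the same /cgroup hierarchy, so it could help to confirm what actually ends up mounted and created there. Something like the following, using the libcgroup tools you already have for cgexec/cgrules, should show it:

    # where each cgroup subsystem is mounted
    lssubsys -am

    # which groups exist under the cpu and memory controllers
    lscgroup cpu:/ memory:/ | grep -E 'mesos|cassandra'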

