A hard crash under high I/O can be caused by bad memory modules; I would run a memory burn-in program to make sure your hardware is actually stable.
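For example (a sketch, assuming the Debian memtester package is installed; the size and loop count below are illustrative, and an offline memtest86+ run from a boot disk is more thorough since it can cover nearly all of RAM):

# apt-get install memtester
# memtester 8G 3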
Shank

On Tue, Aug 14, 2012 at 2:42 AM, Jurgen Weber <jurgen.we...@theiconic.com.au> wrote:
> Hi Guys
>
> I have a new KVM server running software RAID (mdadm); the VM disks are
> held in a RAID 5 of 5 disks (the system is on SSDs in a mirror).
>
> So far I have about 10 VMs set up, but they are all unable to function:
> after we have a few up and then start to deploy/resubmit the VMs which
> have never booted properly, the disk I/O will stop, the scp process will
> hang, and it all stops. You will then find the following error in dmesg:
>
> [ 1201.890311] INFO: task kworker/1:1:6185 blocked for more than 120 seconds.
> [ 1201.890430] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1201.890569] kworker/1:1     D ffff88203fc13740     0  6185      2 0x00000000
> [ 1201.890573]  ffff881ffe510140 0000000000000046 0000000000000000 ffff881039023590
> [ 1201.890580]  0000000000013740 ffff8820393fffd8 ffff8820393fffd8 ffff881ffe510140
> [ 1201.890586]  0000000000000000 0000000100000000 0000000000000001 7fffffffffffffff
> [ 1201.890593] Call Trace:
> [ 1201.890597]  [<ffffffff81349d2e>] ? schedule_timeout+0x2c/0xdb
> [ 1201.890605]  [<ffffffff810ebdbf>] ? kmem_cache_alloc+0x86/0xea
> [ 1201.890610]  [<ffffffff8134a58a>] ? __down_common+0x9b/0xee
> [ 1201.890631]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
> [ 1201.890635]  [<ffffffff81063111>] ? down+0x25/0x34
> [ 1201.890648]  [<ffffffffa041566f>] ? xfs_buf_lock+0x65/0x9d [xfs]
> [ 1201.890665]  [<ffffffffa0452c57>] ? xfs_getsb+0x28/0x3b [xfs]
> [ 1201.890685]  [<ffffffffa045b957>] ? xfs_trans_getsb+0x64/0xb4 [xfs]
> [ 1201.890704]  [<ffffffffa0452a40>] ? xfs_mod_sb+0x21/0x77 [xfs]
> [ 1201.890720]  [<ffffffffa0422736>] ? xfs_reclaim_inode+0x22d/0x22d [xfs]
> [ 1201.890734]  [<ffffffffa041a43e>] ? xfs_fs_log_dummy+0x61/0x75 [xfs]
> [ 1201.890754]  [<ffffffffa04573a7>] ? xfs_log_need_covered+0x4d/0x8d [xfs]
> [ 1201.890769]  [<ffffffffa0422770>] ? xfs_sync_worker+0x3a/0x6a [xfs]
> [ 1201.890773]  [<ffffffff8105aeaa>] ? process_one_work+0x163/0x284
> [ 1201.890778]  [<ffffffff8105be72>] ? worker_thread+0xc2/0x145
> [ 1201.890782]  [<ffffffff8105bdb0>] ? manage_workers.isra.23+0x15b/0x15b
> [ 1201.890787]  [<ffffffff8105efad>] ? kthread+0x76/0x7e
> [ 1201.890794]  [<ffffffff81351cf4>] ? kernel_thread_helper+0x4/0x10
> [ 1201.890799]  [<ffffffff8105ef37>] ? kthread_worker_fn+0x139/0x139
> [ 1201.890804]  [<ffffffff81351cf0>] ? gs_change+0x13/0x13
>
> and lots of them. With this stack trace the CPU load just keeps
> increasing, and I have to power cycle the machine to get the system
> back. I have added the following sysctls:
>
> fs.file-max = 262144
> kernel.pid_max = 262144
> net.ipv4.tcp_rmem = 4096 87380 8388608
> net.ipv4.tcp_wmem = 4096 87380 8388608
> net.core.rmem_max = 25165824
> net.core.rmem_default = 25165824
> net.core.wmem_max = 25165824
> net.core.wmem_default = 131072
> net.core.netdev_max_backlog = 8192
> net.ipv4.tcp_window_scaling = 1
> net.core.optmem_max = 25165824
> net.core.somaxconn = 65536
> net.ipv4.ip_local_port_range = 1024 65535
> kernel.shmmax = 4294967296
> vm.max_map_count = 262144
>
> but the important part I found was:
> # http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
> vm.dirty_ratio=10
>
> which does not seem to help, though.
>
> Now some info on the disks:
> # mount
> /dev/md2 on /data type xfs (rw,noatime,attr2,delaylog,sunit=1024,swidth=4096,noquota)
>
> # cat /proc/meminfo
> MemTotal:       132259720 kB
> MemFree:        122111692 kB
>
> # cat /proc/cpuinfo   (32 vcores)
> processor       : 31
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 45
> model name      : Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
>
> Some info on the host:
> # cat /etc/debian_version
> wheezy/sid
> # uname -a
> Linux chaos 3.2.0-3-amd64 #1 SMP Thu Jun 28 09:07:26 UTC 2012 x86_64 GNU/Linux
> ii  kvm                  1:1.1.0+dfsg-3  dummy transitional package from kvm to qemu-kvm
> ii  qemu-kvm             1.1.0+dfsg-3    Full virtualization on x86 hardware
> ii  libvirt-bin          0.9.12-4        programs for the libvirt library
> ii  libvirt0             0.9.12-4        library for interfacing with different virtualization systems
> ii  python-libvirt       0.9.12-4        libvirt Python bindings
> ii  opennebula           3.4.1-3+b1      controller which executes the OpenNebula cluster services
> ii  opennebula-common    3.4.1-3         empty package to create OpenNebula users and directories
> ii  opennebula-sunstone  3.4.1-3         web interface which executes the OpenNebula cluster services
> ii  opennebula-tools     3.4.1-3         Command-line tools for OpenNebula Cloud
> ii  ruby-opennebula      3.4.1-3         Ruby bindings for OpenNebula Cloud API (OCA)
>
> Any ideas on how to get this working? Right now the server is a lemon! :0
>
> --
> Jurgen Weber
>
> Systems Engineer
> IT Infrastructure Team Leader
>
> THE ICONIC | E jurgen.we...@theiconic.com.au | www.theiconic.com.au
>
> _______________________________________________
> Users mailing list
> Users@lists.opennebula.org
> http://lists.opennebula.org/listinfo.cgi/users-opennebula.org
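P.S. While a burn-in runs, it is also worth ruling out a failing member disk under that RAID 5. A quick sketch using the standard mdadm and smartmontools commands (substitute your real member devices for /dev/sdb):

# cat /proc/mdstat
# mdadm --detail /dev/md2
# smartctl -a /dev/sdb   (repeat for each disk in the array)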