On Thu, Mar 24, 2016 at 7:51 PM, Christophe TREFOIS
<christophe.tref...@uni.lu> wrote:
> Hi Nir,
>
> And the second one is down now too. See some comments below.
>
>> On 13 Mar 2016, at 12:51, Nir Soffer <nsof...@redhat.com> wrote:
>>
>> On Sun, Mar 13, 2016 at 9:46 AM, Christophe TREFOIS
>> <christophe.tref...@uni.lu> wrote:
>>> Dear all,
>>>
>>> For a couple of weeks now I have had a problem where, at random, 1 VM
>>> (not always the same one) becomes completely unresponsive.
>>> We find this out because our Icinga server complains that the host is
>>> down.
>>>
>>> Upon inspection, we find we can’t open a console to the VM, nor can we
>>> log in.
>>>
>>> In oVirt engine, the VM looks “up”. The only weird thing is that RAM
>>> usage shows 0% and CPU usage shows 100% or 75%, depending on the number
>>> of cores.
>>> The only way to recover is to force the VM off by issuing shutdown twice
>>> from the engine.
>>>
>>> Could you please help me start debugging this?
>>> I can provide any logs, but I’m not sure which ones, because I couldn’t
>>> see anything with ERROR in the vdsm logs on the host.
>>
>> I would inspect this vm on the host when it happens.
>>
>> What is the vdsm cpu usage? What is the qemu process (for this vm) cpu
>> usage?
>
> vdsm cpu usage is going up and down, peaking at about 15%.
>
> qemu process usage for the VM was 0, except for 1 of the threads, “stuck”
> at 100%; the rest were idle.
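A quick way to get this kind of per-thread view on the host is something
like the following (a rough sketch; <qemu-pid> is a placeholder for the
actual qemu process id):

    # one line per thread (-H), updated live, sorted by cpu usage
    top -H -p <qemu-pid>

    # or a one-shot listing: thread id, %cpu and state for every thread
    ps -L -p <qemu-pid> -o tid,pcpu,state,comm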
0% may be a deadlock, 100% a thread stuck in an endless loop, but this is
just a wild guess.

>> strace output of this qemu process (all threads) or a core dump can help
>> qemu developers to understand this issue.
>
> I attached an strace of the process for:
>
> qemu 15241 10.6 0.4 4742904 1934988 ? Sl Mar23 131:41
> /usr/libexec/qemu-kvm -name test-ubuntu-uni-lu -S -machine
> pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu SandyBridge -m
> size=4194304k,slots=16,maxmem=4294967296k -realtime mlock=off -smp
> 4,maxcpus=64,sockets=16,cores=4,threads=1 -numa
> node,nodeid=0,cpus=0-3,mem=4096 -uuid 754871ec-0339-4a65-b490-6a766aaea537
> -smbios type=1,manufacturer=oVirt,product=oVirt
> Node,version=7-2.1511.el7.centos.2.10,serial=4C4C4544-0048-4610-8052-B4C04F575831,uuid=754871ec-0339-4a65-b490-6a766aaea537
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-test-ubuntu-uni-lu/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc
> base=2016-03-23T22:06:01,driftfix=slew -global
> kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on
> -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -device
> virtio-serial-pci,id=virtio-serial0,max_ports=16,bus=pci.0,addr=0x5 -drive
> if=none,id=drive-ide0-1-0,readonly=on,format=raw,serial= -device
> ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -drive
> file=/rhev/data-center/00000002-0002-0002-0002-0000000003d5/8253a89b-651e-4ff4-865b-57adef05d383/images/9d60ae41-bf17-48b4-b0e6-29625b248718/47a6916c-c902-4ea3-8dfb-a3240d7d9515,if=none,id=drive-virtio-disk0,format=qcow2,serial=9d60ae41-bf17-48b4-b0e6-29625b248718,cache=none,werror=stop,rerror=stop,aio=threads
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
> -netdev tap,fd=108,id=hostnet0,vhost=on,vhostfd=109 -device
> virtio-net-pci,netdev=hostnet0,id=net0,mac=00:1a:4a:e5:12:0f,bus=pci.0,addr=0x3,bootindex=2
> -chardev
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channels/754871ec-0339-4a65-b490-6a766aaea537.com.redhat.rhevm.vdsm,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.rhevm.vdsm
> -chardev
> socket,id=charchannel1,path=/var/lib/libvirt/qemu/channels/754871ec-0339-4a65-b490-6a766aaea537.org.qemu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
> -device usb-tablet,id=input0 -vnc 10.79.2.2:76,password -device
> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>
> http://paste.fedoraproject.org/344756/84131214

You connected only to one thread. I would try to use -f to see all threads,
or connect with gdb and get a backtrace of all threads, for example:
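Something along these lines (untested, with <pid> standing in for the qemu
process id; note that attaching a tracer will pause or slow the guest while
it is attached):

    # attach strace to all threads of the process (-f) and log to a file
    strace -f -p <pid> -o /tmp/qemu-strace.log

    # or take a one-shot backtrace of every thread with gdb and detach
    gdb -p <pid> -batch -ex "thread apply all bt"

    # or write a core dump without killing the process (gcore ships with gdb)
    gcore -o /tmp/qemu-core <pid>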
Adding Kevin to suggest how to continue. I think we need a qemu bug for
this.

Nir

> This is CentOS 7.2, latest patches and latest 3.6.4 oVirt.
>
> Thank you for any help / pointers.
>
> Could it be memory ballooning?
>
> Best,
>
>>>
>>> The host is running:
>>>
>>> OS Version: RHEL - 7 - 1.1503.el7.centos.2.8
>>> Kernel Version: 3.10.0 - 229.14.1.el7.x86_64
>>> KVM Version: 2.1.2 - 23.el7_1.8.1
>>> LIBVIRT Version: libvirt-1.2.8-16.el7_1.4
>>> VDSM Version: vdsm-4.16.26-0.el7.centos
>>> SPICE Version: 0.12.4 - 9.el7_1.3
>>> GlusterFS Version: glusterfs-3.7.5-1.el7
>>
>> You are running old versions, missing a lot of fixes. Nothing specific to
>> your problem, but it lowers the chance of getting a working system.
>>
>> It would be nice if you could upgrade to ovirt-3.6 and report whether it
>> makes any difference, or at least to the latest ovirt-3.5.
>>
>>> We use a locally exported gluster volume as the storage domain (i.e.,
>>> the storage is on the same machine, exposed via gluster). No replica.
>>> We run around 50 VMs on that host.
>>
>> Why use gluster for this? Do you plan to add more gluster servers in the
>> future?
>>
>> Nir

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users