On Wed, Oct 30, 2013 at 08:41:43PM +0100, Alessandro Bianchi wrote: > Il 30/10/2013 18:04, Dan Kenigsberg ha scritto: > >On Wed, Oct 30, 2013 at 02:40:02PM +0100, Alessandro Bianchi wrote: > >> Il 30/10/2013 13:58, Dan Kenigsberg ha scritto: > >> > >> On Wed, Oct 30, 2013 at 11:34:21AM +0100, Alessandro Bianchi wrote: > >> > >> Hi everyone > >> > >> I've set up a gluster storage with two replicated bricks > >> > >> DC is up and I created a VM to test gluster storage > >> > >> If I start the VM WITHOUT any disk attached (only one virtual DVD) it > >> starts fine. > >> > >> If I attach a gluster domain disk thin provisioning 30 Gb the Vm > >> stucks in > >> "waiting for launch" state > >> > >> I see no special activity on the gluster servers (they serve several > >> other > >> shares with no troubles at all and even the ISO domain is a NFS on > >> locally mounted gluster and works fine) > >> > >> I've double checked all the pre requisites and they look fine (F 19 - > >> gluster setup insecure in both glusterd.vol and volume options - > >> uid/gid/insecure ) > >> > >> Am I doing something wrong? > >> > >> I'm even unable to stop the VM from the engine GUI > >> > >> Any advise? > >> > >> Which version of ovirt are you using? Hopefully ovirt-3.3.0.1. > >> For how long is the VM stuck in its "wait for launch" state? > >> What does `virsh -r list` has to say while startup stalls? > >> Would you provide more content of your vdsm.log and possibly > >> libvirtd.log so we can understand what blocks the VM start-up? Please > >> use attachement of pastebin, as your mail agents wreaks havoc to the log > >> lines. > >> > >> > >> Thank you for your answer. > >> > >> Here are the "facts" > >> > >> In the GUI I see > >> > >> "waiting for launch 3 h" > >> > >> virsh -r list > >> Id Nome Stato > >> ---------------------------------------------------- > >> 3 CentOS_30 terminato > >> > >> vdsClient -s 0 list table > >> 200dfb05-461e-49d9-95a2-c0a7c7ced669 0 CentOS_30 > >> WaitForLaunch > >> > >> Packages: > >> > >> ovirt-engine-userportal-3.3.0.1-1.fc19.noarch > >> ovirt-log-collector-3.3.1-1.fc19.noarch > >> ovirt-engine-restapi-3.3.0.1-1.fc19.noarch > >> ovirt-engine-setup-3.3.0.1-1.fc19.noarch > >> ovirt-engine-backend-3.3.0.1-1.fc19.noarch > >> ovirt-host-deploy-java-1.1.1-1.fc19.noarch > >> ovirt-release-fedora-8-1.noarch > >> ovirt-engine-setup-plugin-allinone-3.3.0.1-1.fc19.noarch > >> ovirt-engine-webadmin-portal-3.3.0.1-1.fc19.noarch > >> ovirt-engine-sdk-python-3.3.0.7-1.fc19.noarch > >> ovirt-iso-uploader-3.3.1-1.fc19.noarch > >> ovirt-engine-websocket-proxy-3.3.0.1-1.fc19.noarch > >> ovirt-engine-dbscripts-3.3.0.1-1.fc19.noarch > >> ovirt-host-deploy-offline-1.1.1-1.fc19.noarch > >> ovirt-engine-cli-3.3.0.5-1.fc19.noarch > >> ovirt-engine-tools-3.3.0.1-1.fc19.noarch > >> ovirt-engine-lib-3.3.0.1-1.fc19.noarch > >> ovirt-image-uploader-3.3.1-1.fc19.noarch > >> ovirt-engine-3.3.0.1-1.fc19.noarch > >> ovirt-host-deploy-1.1.1-1.fc19.noarch > >> > >> I attach the full vdsm log > >> > >> Look around 30-10 10:30 to see all what happens > >> > >> Despite the "terminated" label in output from virsh I still see the VM > >> "waiting for launch" in the GUI, so I suspect the answer to "how long" > >> may > >> be "forever" > >> > >> Since this is a test VM I can do whatever test you may need to track the > >> problem included destroy and rebuild > >> > >> It would be great to have gluster support stable in ovirt! > >> > >> Thank you for your efforts > >The log has an ominous failed attempt to start the VM, followed by an > >immediate vdsm crash. Is it reproducible? > > > >We have plenty of issues lurking here: > >1. Why has libvirt failed to create the VM? For this, please find clues > > in the complete non-line-broken CentOS_30.log and libvirtd.log. > attached to this messages > > > >2. Why was vdsm killed? Does /var/log/message has a clue from systemd? > result of cat /var/log/messages | grep vdsm attached
I do not see an explicit attempt to take vdsmd down. Do you see any other incriminating message correlated with Oct 30 08:51:15 hypervisor respawn: slave '/usr/share/vdsm/vdsm' died, respawning slave > > > >3. We may have a nasty race: if Vdsm crashes just before it has > > registered that the VM is down. Actually, this is not the issue: vdsm tries (and fails, due to qemu/libvirt bug) to destroy the VM. > > > >4. We used to force Vdsm to run with LC_ALL=C. It seems that the grand > > service rewrite by Zhou (http://gerrit.ovirt.org/15578) has changed > > that. This may have adverse effects, since AFAIR we sometimes parse > > application output, and assume that it's in C. Having a non-English > > log file is problematic on it's own for support personal, used to > > grep for keywords. ybronhei, was it intensional? Can it be reverted > > or at least scrutinized? > currentely it still says "waiting for launch 9h" > > I don't abort it so if you need any other info I can have them libvirtd fails to connect to qemu's monitor. This smells like a qemu bug that is beyond my over-the-mailing-list debugging abilities :-( You may want to strace or gdb the running qemu process, or to try to re-attach to it by restarting libvirtd. 2013-10-30 07:51:10.045+0000: 8304: debug : qemuProcessStart:3804 : Waiting for monitor to show up 2013-10-30 07:51:10.045+0000: 8304: debug : qemuProcessWaitForMonitor:1707 : Connect monitor to 0x7fc1640eab80 'CentOS_30' 2013-10-30 07:51:10.246+0000: 8304: debug : qemuMonitorOpenInternal:751 : QEMU_MONITOR_NEW: mon=0x7fc1640ef6b0 refs=2 fd=27 2013-10-30 07:51:10.246+0000: 8304: debug : qemuMonitorSetCapabilities:1145 : mon=0x7fc1640ef6b0 2013-10-30 07:51:10.246+0000: 8304: debug : qemuMonitorSend:887 : QEMU_MONITOR_SEND_MSG: mon=0x7fc1640ef6b0 msg={"execute":"qmp_capabilities","id":"libvirt-1"} fd=-1 2013-10-30 07:51:13.097+0000: 8296: error : qemuMonitorIORead:505 : Unable to read from monitor: Connessione interrotta dal corrispondente 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 : Error on monitor Unable to read from monitor: Connessione interrotta dal corrispondente 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:672 : Triggering error callback 2013-10-30 07:51:13.097+0000: 8296: debug : qemuProcessHandleMonitorError:351 : Received error on 0x7fc1640eab80 'CentOS_30' 2013-10-30 07:51:13.097+0000: 8304: debug : qemuMonitorSend:899 : Send command resulted in error Unable to read from monitor: Connessione interrotta dal corrispondente 2013-10-30 07:51:13.097+0000: 8304: debug : qemuProcessStop:3992 : Shutting down VM 'CentOS_30' pid=7655 flags=0 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 : Error on monitor Unable to read from monitor: Connessione interrotta dal corrispondente 2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:661 : Triggering EOF callback _______________________________________________ Users mailing list [email protected] http://lists.ovirt.org/mailman/listinfo/users

