Re: [Users] Gluster VM stuck in "waiting for launch" state

Sahina Bose Thu, 31 Oct 2013 00:03:11 -0700


On 10/31/2013 11:51 AM, Sahina Bose wrote:

On 10/31/2013 09:22 AM, Zhou Zheng Sheng wrote:
on 2013/10/31 06:24, Dan Kenigsberg wrote:
On Wed, Oct 30, 2013 at 08:41:43PM +0100, Alessandro Bianchi wrote:
Il 30/10/2013 18:04, Dan Kenigsberg ha scritto:
On Wed, Oct 30, 2013 at 02:40:02PM +0100, Alessandro Bianchi wrote:
    Il 30/10/2013 13:58, Dan Kenigsberg ha scritto:
On Wed, Oct 30, 2013 at 11:34:21AM +0100, Alessandro Bianchiwrote:
     Hi everyone

     I've set up a gluster storage with two replicated bricks

     DC is up and I created a VM to test gluster storage
If I start the VM WITHOUT any disk attached (only onevirtual DVD) it
     starts fine.
If I attach a gluster domain disk thin provisioning 30 Gbthe Vm stucks in
     "waiting for launch" state
I see no special activity on the gluster servers (they serveseveral othershares with no troubles at all and even the ISO domain is aNFS on
     locally mounted gluster and works fine)
I've double checked all the pre requisites and they lookfine (F 19 -gluster setup insecure in both glusterd.vol and volumeoptions -
     uid/gid/insecure )

     Am I doing something wrong?

     I'm even unable to stop the VM from the engine GUI

     Any advise?

  Which version of ovirt are you using? Hopefully ovirt-3.3.0.1.
  For how long is the VM stuck in its "wait for launch" state?
  What does `virsh -r list` has to say while startup stalls?
  Would you provide more content of your vdsm.log and possibly
libvirtd.log so we can understand what blocks the VM start-up?Pleaseuse attachement of pastebin, as your mail agents wreaks havocto the log
  lines.


    Thank you for your answer.

    Here are the "facts"

    In the GUI I see

    "waiting for launch 3 h"

     virsh -r list
     Id    Nome                           Stato
    ----------------------------------------------------
     3     CentOS_30                      terminato

    vdsClient -s 0 list table
    200dfb05-461e-49d9-95a2-c0a7c7ced669      0 CentOS_30
    WaitForLaunch

    Packages:

    ovirt-engine-userportal-3.3.0.1-1.fc19.noarch
    ovirt-log-collector-3.3.1-1.fc19.noarch
    ovirt-engine-restapi-3.3.0.1-1.fc19.noarch
    ovirt-engine-setup-3.3.0.1-1.fc19.noarch
    ovirt-engine-backend-3.3.0.1-1.fc19.noarch
    ovirt-host-deploy-java-1.1.1-1.fc19.noarch
    ovirt-release-fedora-8-1.noarch
ovirt-engine-setup-plugin-allinone-3.3.0.1-1.fc19.noarch
    ovirt-engine-webadmin-portal-3.3.0.1-1.fc19.noarch
    ovirt-engine-sdk-python-3.3.0.7-1.fc19.noarch
    ovirt-iso-uploader-3.3.1-1.fc19.noarch
    ovirt-engine-websocket-proxy-3.3.0.1-1.fc19.noarch
    ovirt-engine-dbscripts-3.3.0.1-1.fc19.noarch
    ovirt-host-deploy-offline-1.1.1-1.fc19.noarch
    ovirt-engine-cli-3.3.0.5-1.fc19.noarch
    ovirt-engine-tools-3.3.0.1-1.fc19.noarch
    ovirt-engine-lib-3.3.0.1-1.fc19.noarch
    ovirt-image-uploader-3.3.1-1.fc19.noarch
    ovirt-engine-3.3.0.1-1.fc19.noarch
    ovirt-host-deploy-1.1.1-1.fc19.noarch

    I attach the full vdsm log

    Look around 30-10 10:30 to see all what happens
Despite the "terminated" label in output from virsh I stillsee the VM"waiting for launch" in the GUI, so I suspect the answer to"how long" may
    be "forever"
Since this is a test VM I can do whatever test you may needto track the
    problem included destroy and rebuild

    It would be great to have gluster support stable in ovirt!

    Thank you for your efforts
The log has an ominous failed attempt to start the VM, followed by an
immediate vdsm crash. Is it reproducible?

We have plenty of issues lurking here:
1. Why has libvirt failed to create the VM? For this, please findclues
    in the complete non-line-broken CentOS_30.log and libvirtd.log.
attached to this messages
2. Why was vdsm killed? Does /var/log/message has a clue fromsystemd?
result of cat /var/log/messages | grep vdsm attached
I do not see an explicit attempt to take vdsmd down. Do you see anyother
incriminating message correlated with
Oct 30 08:51:15 hypervisor respawn: slave '/usr/share/vdsm/vdsm'died, respawning slave
3. We may have a nasty race: if Vdsm crashes just before it has
    registered that the VM is down.
Actually, this is not the issue: vdsm tries (and fails, due toqemu/libvirt
bug) to destroy the VM.
4. We used to force Vdsm to run with LC_ALL=C. It seems that thegrandservice rewrite by Zhou (http://gerrit.ovirt.org/15578) haschangedthat. This may have adverse effects, since AFAIR we sometimesparseapplication output, and assume that it's in C. Having anon-English
    log file is problematic on it's own for support personal, used to
grep for keywords. ybronhei, was it intensional? Can it bereverted
    or at least scrutinized?
currentely it still says "waiting for launch 9h"

I don't abort it so if you need any other info I can have them
libvirtd fails to connect to qemu's monitor. This smells like a qemubug that
is beyond my over-the-mailing-list debugging abilities :-(
You may want to strace or gdb the running qemu process, or to try tore-attach
to it by restarting libvirtd.
2013-10-30 07:51:10.045+0000: 8304: debug : qemuProcessStart:3804 :Waiting for monitor to show up2013-10-30 07:51:10.045+0000: 8304: debug :qemuProcessWaitForMonitor:1707 : Connect monitor to 0x7fc1640eab80'CentOS_30'2013-10-30 07:51:10.246+0000: 8304: debug :qemuMonitorOpenInternal:751 : QEMU_MONITOR_NEW: mon=0x7fc1640ef6b0refs=2 fd=272013-10-30 07:51:10.246+0000: 8304: debug :qemuMonitorSetCapabilities:1145 : mon=0x7fc1640ef6b02013-10-30 07:51:10.246+0000: 8304: debug : qemuMonitorSend:887 :QEMU_MONITOR_SEND_MSG: mon=0x7fc1640ef6b0msg={"execute":"qmp_capabilities","id":"libvirt-1"}
  fd=-1
2013-10-30 07:51:13.097+0000: 8296: error : qemuMonitorIORead:505 :Unable to read from monitor: Connessione interrotta dal corrispondente2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 :Error on monitor Unable to read from monitor: Connessione interrottadal corrispondente2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:672 :Triggering error callback2013-10-30 07:51:13.097+0000: 8296: debug :qemuProcessHandleMonitorError:351 : Received error on 0x7fc1640eab80'CentOS_30'2013-10-30 07:51:13.097+0000: 8304: debug : qemuMonitorSend:899 :Send command resulted in error Unable to read from monitor:Connessione interrotta dal corrispondente2013-10-30 07:51:13.097+0000: 8304: debug : qemuProcessStop:3992 :Shutting down VM 'CentOS_30' pid=7655 flags=02013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:638 :Error on monitor Unable to read from monitor: Connessione interrottadal corrispondente2013-10-30 07:51:13.097+0000: 8296: debug : qemuMonitorIO:661 :Triggering EOF callback
I noticed in the previous attached CentOS_30.log, qemu said "Gluster
connection failed for server=172.16.0.100" for each of the first few
attempts to start qemu. That's why libvirt said it could not connect to
qemu monitor, the real reason was that qemu process exited after it
checked the gluster storage. libvirt error message was not very friendly
in this case.

Then I see on "2013-10-30 08:28:51.047+0000" in CentOS_30.log, qemu
started successfully without reporting gluster error, this was fine. In
libvirtd.log on the corresponding time spot, it libvirt connected to the
monitor and reported it was about to issue a handshake.

Investigate the CentOS_30.log further, I find the attempts to start qemu
after "2013-10-30 08:28:51" were failed with the same error "Gluster
connection failed for server=172.16.0.100". Maybe there is a problem in
the gluster server or firewall? I'm not sure.
http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requisites- I assume these have been followed w.r.t setting permissions.


Sorry, I missed that you had set all the insecure options on the volume.

"Gluster connection failed for server" has a "Transport endpoint is notconnected" error message.


Are you able to manually mount the gluster volume?


_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users


_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

Re: [Users] Gluster VM stuck in "waiting for launch" state

Reply via email to