On 10/31/2013 09:22 AM, Zhou Zheng Sheng wrote:
on 2013/10/31 06:24, Dan Kenigsberg wrote:
On
Wed, Oct 30, 2013 at 08:41:43PM +0100, Alessandro Bianchi
wrote:
Il
30/10/2013 18:04, Dan Kenigsberg ha scritto:
On
Wed, Oct 30, 2013 at 02:40:02PM +0100, Alessandro
Bianchi wrote:
Il 30/10/2013 13:58, Dan Kenigsberg ha scritto:
On Wed, Oct 30, 2013 at 11:34:21AM +0100, Alessandro
Bianchi wrote:
Hi everyone
I've set up a gluster storage with two replicated
bricks
DC is up and I created a VM to test gluster
storage
If I start the VM WITHOUT any disk attached (only
one virtual DVD) it
starts fine.
If I attach a gluster domain disk thin
provisioning 30 Gb the Vm stucks in
"waiting for launch" state
I see no special activity on the gluster servers
(they serve several other
shares with no troubles at all and even the ISO
domain is a NFS on
locally mounted gluster and works fine)
I've double checked all the pre requisites and
they look fine (F 19 -
gluster setup insecure in both glusterd.vol and
volume options -
uid/gid/insecure )
Am I doing something wrong?
I'm even unable to stop the VM from the engine
GUI
Any advise?
Which version of ovirt are you using? Hopefully
ovirt-3.3.0.1.
For how long is the VM stuck in its "wait for
launch" state?
What does `virsh -r list` has to say while startup
stalls?
Would you provide more content of your vdsm.log and
possibly
libvirtd.log so we can understand what blocks the VM
start-up? Please
use attachement of pastebin, as your mail agents
wreaks havoc to the log
lines.
Thank you for your answer.
Here are the "facts"
In the GUI I see
"waiting for launch 3 h"
virsh -r list
Id Nome Stato
----------------------------------------------------
3 CentOS_30 terminato
vdsClient -s 0 list table
200dfb05-461e-49d9-95a2-c0a7c7ced669 0
CentOS_30
WaitForLaunch
Packages:
ovirt-engine-userportal-3.3.0.1-1.fc19.noarch
ovirt-log-collector-3.3.1-1.fc19.noarch
ovirt-engine-restapi-3.3.0.1-1.fc19.noarch
ovirt-engine-setup-3.3.0.1-1.fc19.noarch
ovirt-engine-backend-3.3.0.1-1.fc19.noarch
ovirt-host-deploy-java-1.1.1-1.fc19.noarch
ovirt-release-fedora-8-1.noarch
ovirt-engine-setup-plugin-allinone-3.3.0.1-1.fc19.noarch
ovirt-engine-webadmin-portal-3.3.0.1-1.fc19.noarch
ovirt-engine-sdk-python-3.3.0.7-1.fc19.noarch
ovirt-iso-uploader-3.3.1-1.fc19.noarch
ovirt-engine-websocket-proxy-3.3.0.1-1.fc19.noarch
ovirt-engine-dbscripts-3.3.0.1-1.fc19.noarch
ovirt-host-deploy-offline-1.1.1-1.fc19.noarch
ovirt-engine-cli-3.3.0.5-1.fc19.noarch
ovirt-engine-tools-3.3.0.1-1.fc19.noarch
ovirt-engine-lib-3.3.0.1-1.fc19.noarch
ovirt-image-uploader-3.3.1-1.fc19.noarch
ovirt-engine-3.3.0.1-1.fc19.noarch
ovirt-host-deploy-1.1.1-1.fc19.noarch
I attach the full vdsm log
Look around 30-10 10:30 to see all what happens
Despite the "terminated" label in output from
virsh I still see the VM
"waiting for launch" in the GUI, so I suspect the
answer to "how long" may
be "forever"
Since this is a test VM I can do whatever test you
may need to track the
problem included destroy and rebuild
It would be great to have gluster support stable
in ovirt!
Thank you for your efforts
The log has an ominous failed attempt to start the VM,
followed by an
immediate vdsm crash. Is it reproducible?
We have plenty of issues lurking here:
1. Why has libvirt failed to create the VM? For this,
please find clues
in the complete non-line-broken CentOS_30.log and
libvirtd.log.
attached to this messages
2.
Why was vdsm killed? Does /var/log/message has a clue
from systemd?
result of cat /var/log/messages | grep vdsm attached
I do not see an explicit attempt to take vdsmd down. Do you
see any other
incriminating message correlated with
Oct 30 08:51:15 hypervisor respawn: slave
'/usr/share/vdsm/vdsm' died, respawning slave
3.
We may have a nasty race: if Vdsm crashes just before it
has
registered that the VM is down.
Actually, this is not the issue: vdsm tries (and fails, due
to qemu/libvirt
bug) to destroy the VM.
4.
We used to force Vdsm to run with LC_ALL=C. It seems
that the grand
service rewrite by Zhou
(http://gerrit.ovirt.org/15578) has changed
that. This may have adverse effects, since AFAIR we
sometimes parse
application output, and assume that it's in C.
Having a non-English
log file is problematic on it's own for support
personal, used to
grep for keywords. ybronhei, was it intensional? Can
it be reverted
or at least scrutinized?
currentely it still says "waiting for launch 9h"
I don't abort it so if you need any other info I can have
them
libvirtd fails to connect to qemu's monitor. This smells
like a qemu bug that
is beyond my over-the-mailing-list debugging abilities :-(
You may want to strace or gdb the running qemu process, or
to try to re-attach
to it by restarting libvirtd.
2013-10-30 07:51:10.045+0000: 8304: debug :
qemuProcessStart:3804 : Waiting for monitor to show up
2013-10-30 07:51:10.045+0000: 8304: debug :
qemuProcessWaitForMonitor:1707 : Connect monitor to
0x7fc1640eab80 'CentOS_30'
2013-10-30 07:51:10.246+0000: 8304: debug :
qemuMonitorOpenInternal:751 : QEMU_MONITOR_NEW:
mon=0x7fc1640ef6b0 refs=2 fd=27
2013-10-30 07:51:10.246+0000: 8304: debug :
qemuMonitorSetCapabilities:1145 : mon=0x7fc1640ef6b0
2013-10-30 07:51:10.246+0000: 8304: debug :
qemuMonitorSend:887 : QEMU_MONITOR_SEND_MSG:
mon=0x7fc1640ef6b0
msg={"execute":"qmp_capabilities","id":"libvirt-1"}
fd=-1
2013-10-30 07:51:13.097+0000: 8296: error :
qemuMonitorIORead:505 : Unable to read from monitor:
Connessione interrotta dal corrispondente
2013-10-30 07:51:13.097+0000: 8296: debug :
qemuMonitorIO:638 : Error on monitor Unable to read from
monitor: Connessione interrotta dal corrispondente
2013-10-30 07:51:13.097+0000: 8296: debug :
qemuMonitorIO:672 : Triggering error callback
2013-10-30 07:51:13.097+0000: 8296: debug :
qemuProcessHandleMonitorError:351 : Received error on
0x7fc1640eab80 'CentOS_30'
2013-10-30 07:51:13.097+0000: 8304: debug :
qemuMonitorSend:899 : Send command resulted in error Unable
to read from monitor: Connessione interrotta dal
corrispondente
2013-10-30 07:51:13.097+0000: 8304: debug :
qemuProcessStop:3992 : Shutting down VM 'CentOS_30' pid=7655
flags=0
2013-10-30 07:51:13.097+0000: 8296: debug :
qemuMonitorIO:638 : Error on monitor Unable to read from
monitor: Connessione interrotta dal corrispondente
2013-10-30 07:51:13.097+0000: 8296: debug :
qemuMonitorIO:661 : Triggering EOF callback
I noticed in the previous attached CentOS_30.log, qemu said
"Gluster
connection failed for server=172.16.0.100" for each of the
first few
attempts to start qemu. That's why libvirt said it could not
connect to
qemu monitor, the real reason was that qemu process exited
after it
checked the gluster storage. libvirt error message was not
very friendly
in this case.
Then I see on "2013-10-30 08:28:51.047+0000" in CentOS_30.log,
qemu
started successfully without reporting gluster error, this was
fine. In
libvirtd.log on the corresponding time spot, it libvirt
connected to the
monitor and reported it was about to issue a handshake.
Investigate the CentOS_30.log further, I find the attempts to
start qemu
after "2013-10-30 08:28:51" were failed with the same error
"Gluster
connection failed for server=172.16.0.100". Maybe there is a
problem in
the gluster server or firewall? I'm not sure.
http://www.ovirt.org/Features/GlusterFS_Storage_Domain#Important_Pre-requisites
- I assume these have been followed w.r.t setting permissions.
Yes have been followed for both glusterd,vol AND volume
--
SkyNet SRL
Via Maggiate 67/a - 28021 Borgomanero (NO)
- tel.
+39 0322-836487/834765 - fax +39 0322-836608
http://www.skynet.it
Autorizzazione Ministeriale n.197
Le informazioni contenute in questo messaggio
sono
riservate e confidenziali ed è vietata la diffusione in
qualunque
modo eseguita.
Qualora Lei non fosse la persona a cui il presente
messaggio è destinato, La invitiamo ad eliminarlo ed a
distruggerlo
non divulgandolo, dandocene gentilmente comunicazione.
Per
qualsiasi informazione si prega di contattare [email protected]
(e-mail dell'azienda). Rif. D.L. 196/2003
|