Hi Nir,
On 29/04/16 at 22:34, Nir Soffer wrote:
On Fri, Apr 29, 2016 at 9:17 PM, <nico...@devels.es> wrote:
Hi,
We're running oVirt 3.6.5.3-1 and lately we're experiencing
some
issues
with
some VMs being paused because they're marked as non-responsive.
Mostly,
after a few seconds they recover, but we want to debug
precisely
this
problem so we can fix it consistently.
Our scenario is the following:

~495 VMs, of which ~120 are constantly up

3 datastores, all of them iSCSI-based:
 * ds1: 2T, currently has 276 disks
 * ds2: 2T, currently has 179 disks
 * ds3: 500G, currently has 65 disks

7 hosts: all have mostly the same hardware; CPU and memory usage is
currently very low (< 10%).

ds1 and ds2 are physically the same backend, which exports two 2TB volumes.
ds3 is a different storage backend to which we're currently migrating some
disks from ds1 and ds2.
What is the storage backend behind ds1 and ds2?
The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2.
Usually, when VMs become unresponsive, the whole host where they run becomes
unresponsive too, so that gives a hint about the problem: my bet is the
culprit is somewhere on the host side and not on the VM side.

Probably the VMs became unresponsive because the connection to the host was
lost.
I forgot to mention that, less commonly, we have situations where the host
doesn't become unresponsive but the VMs on it do, and they never become
responsive again, so we have to forcibly power them off and start them on a
different host. In this case, though, the connection with the host is never
lost (so basically the host is Up, but any VM running on it is unresponsive).
When that happens, the host itself becomes non-responsive and is only
recoverable after a reboot, since it's unable to reconnect.
Piotr, can you check the engine log and explain why the host is not
reconnected?
I must say this is not specific to this oVirt version: when we were using
v3.6.4 the same happened. It's also worth mentioning that we've not made any
configuration changes and everything had been working quite well for a long
time.
We were monitoring the performance of our ds1/ds2 physical backend and we
suspect we've run out of IOPS, since we're reaching the maximum specified by
the manufacturer; probably at certain times the host cannot complete a
storage operation within some time limit and it marks VMs as unresponsive.
That's why we've set up ds3 and we're migrating disks from ds1 and ds2 to it.
When we run out of space on ds3 we'll create more small volumes to keep
migrating.
On the host side, when this happens, we've run repoplot on the vdsm log and
I'm attaching the result. Clearly there's a *huge* LVM response time (~30
secs.).
Indeed, the log shows very slow vgck and vgs commands - these are called
every 5 minutes to check VG health and refresh vdsm's lvm cache.
1. starting vgck

Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
2. vgck ends after 55 seconds

Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
3. starting vgs

Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)
4. vgs finished after 37 seconds

Thread-96::DEBUG::2016-04-29 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0
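
For reference, a quick way to pull these start/finish pairs out of vdsm.log,
so the durations can be compared across hosts, is to filter the
Storage.Misc.excCmd lines (a rough sketch, assuming the log format shown
above and a copy of vdsm.log in the current directory):

# Rough sketch: print thread, timestamp and the beginning of the message
# for every lvm vgck/vgs invocation and for command result lines.
# SUCCESS/FAILED lines of other commands will show up too; matching the
# start and result line of the same thread gives the elapsed time.
grep 'Storage.Misc.excCmd' vdsm.log \
  | grep -E '/usr/sbin/lvm (vgck|vgs) |SUCCESS|FAILED' \
  | awk -F'::' '{ print $1, $3, substr($7, 1, 60) }'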
Zdenek, how do you suggest debugging these slow lvm commands?
Can you run the following commands on a host in trouble, and on some other
hosts in the same timeframe?

time vgck -vvvv --config ' devices { filter = ['\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413

time vgs -vvvv --config ' global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413

Note that I added -vvvv to both commands; this will generate a huge amount
of debugging info. Please share the output of these commands.

You may need to fix the commands. You can always copy and paste directly
from the vdsm log into the shell and add the -vvvv flag.
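
If it's easier, the same check can be collected from all hosts in one go; a
minimal sketch, assuming the hosts are reachable over ssh as root and are
named host1..host7 (adjust the names, and add -vvvv for the full trace):

# Illustrative only: run the vgck from above on every host and collect
# the timings in one file for comparison.
for h in host1 host2 host3 host4 host5 host6 host7; do
    echo "== $h =="
    ssh root@$h "time vgck --config ' devices { filter = [ \"a|/dev/mapper/36000eb3a4f1acbc20000000000000043|\", \"r|.*|\" ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413"
done 2>&1 | tee vgck-times.txt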
Indeed, there seems to be a big difference on hosts 5 and 6. I'm attaching
the results of the execution of both commands on all hosts. Both commands
produce a much larger output on hosts 5 and 6, and also take much longer to
execute. The times are also attached in a file called TIMES.
Our host storage network is correctly configured and runs on a 1G interface;
there are no errors on the host itself, the switches, etc.
1G serving about 20-70 VMs per host (495 VMs, 120 always up, 7 hosts)?
Do you have a separate network for management and storage, or do both use
this 1G interface?
Yes, we have separate networks for management, storage and motion. Storage
and motion have 1G each (plus, for storage we use a bond of 2 interfaces in
ALB mode (6)). Currently no host has more than 30 VMs at a time.
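
(For completeness, the bond mode and negotiated link speed can be
double-checked on each host with something like the following; the
bond/slave interface names below are placeholders, adjust to the actual
storage bond:)

cat /proc/net/bonding/bond1            # should report "adaptive load balancing" for mode 6
ethtool em1 | grep -E 'Speed|Duplex'   # negotiated speed/duplex of each slave
ethtool em2 | grep -E 'Speed|Duplex'
ip -s link show bond1                  # RX/TX error and drop counters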
We've also limited storage in QoS to use 10MB/s and 40 IOPS,
How did you configure this limit?
In the Data Center tab, I chose our DC and a QoS sub-tab appears, just like
described here: http://www.ovirt.org/documentation/sla/network-qos/
but this issue still happens, which leads me to wonder whether this is not
just an IOPS issue; each host handles about 600 LVs. Could this be an issue
too? Note that LVM response times are low under normal conditions (~1-2
seconds).
We recommend limiting the number of LVs per VG to 350. If you have 276 disks
on ds1, and the disks are using snapshots, you may have too many LVs, which
can cause slowdowns in lvm operations.
Can you share the output of:

vdsm-tool dump-volume-chains 5de4a000-a9c4-489c-8eee-10368647c413
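
(As a lighter-weight check, the per-VG LV count can be read directly with
standard LVM tooling, e.g.:)

# Read-only query: number of LVs in each VG (lv_count is the same field
# vdsm already requests in its vgs invocation shown above).
vgs --noheadings -o vg_name,lv_count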
Ok, right now no VG has more than 350 LVs, so I guess this is not currently
the problem. Unfortunately, I ran the vdsm-tool command but it didn't finish
nor produce any output within about 1 hour, so I guess it hung and I stopped
it. If you confirm this is a normal execution time I can leave it running
for as long as it takes.
host5:~# vgs
  VG                                   #PV #LV #SN Attr   VSize   VFree
  0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 499,62g 206,50g
  5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   2,00t 518,62g
  b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   2,00t 495,25g
  sys_vg                                 1   6   0 wz--n- 136,44g 122,94g
I'm attaching the vdsm.log, engine.log and repoplot PDF;
This is very useful, thanks. Can you also send the vdsm logs and repoplots
from the other hosts for the same timeframe?
if someone could point out any additional problems in them and shed some
light on the above thoughts, I'd be very grateful.
Do you have sar configured on the host? Having sar logs can reveal more info
about this timeframe.

Do you have information about the amount of I/O from VMs during this
timeframe? The amount of I/O on the storage backend during this timeframe?
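
For example, if sysstat is installed on the hosts, something like this would
cover the host side (the sa file name depends on the day of month):

# Historical per-device I/O for the day in question, if sysstat was
# already collecting data; -p prints readable device names.
sar -d -p -f /var/log/sa/sa29

# Live per-device I/O: 12 samples, 5 seconds apart.
sar -d -p 5 12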
Not currently, but I'll be alert for the next time this happens, run some
sar commands related to I/O, and provide feedback. Nevertheless, by the time
I ran the commands above no machine was unresponsive and I was still getting
such huge execution times. I tried running iotop now and I see there are 2
processes, vgck and vgs, each reading at a rate of ~500Kb/s.
root 24446 1.3 0.0 57008 6824 ? D< 09:15 0:00 /usr/sbin/lvm vgs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d
I'm also attaching the iostat output for host 5 in a file called iostat-host5.
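
(For the next occurrence, captures along these lines should give both the
per-process and per-device picture, assuming iotop and sysstat are installed;
the output file names are just examples:)

# Per-process I/O while the slow vgck/vgs run: batch mode, timestamps,
# only processes actually doing I/O, 60 samples every 5 seconds.
iotop -obt -d 5 -n 60 > iotop-host5.txt

# Extended per-device statistics (await, %util) every 5 seconds with
# timestamps, 120 samples; useful for comparing the multipath devices
# across hosts.
iostat -xmt 5 120 > iostat-hostN.txt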