On 2016-05-01 14:01, Nir Soffer wrote:
On Sun, May 1, 2016 at 3:31 PM, <nico...@devels.es> wrote:
On 2016-04-30 23:22, Nir Soffer wrote:
On Sun, May 1, 2016 at 12:48 AM, <nico...@devels.es> wrote:
On 2016-04-30 22:37, Nir Soffer wrote:
On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer <nsof...@redhat.com> wrote:
On Sat, Apr 30, 2016 at 7:16 PM, <nico...@devels.es> wrote:
On 2016-04-30 16:55, Nir Soffer wrote:
On Sat, Apr 30, 2016 at 11:33 AM, Nicolás <nico...@devels.es> wrote:

Hi Nir,

On 29/04/16 at 22:34, Nir Soffer wrote:
On Fri, Apr 29, 2016 at 9:17 PM, <nico...@devels.es> wrote:

Hi,

We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues with some VMs being paused because they're marked as non-responsive. Mostly, after a few seconds they recover, but we want to debug precisely this problem so we can fix it consistently.

Our scenario is the following:

~495 VMs, of which ~120 are constantly up.
3 datastores, all of them iSCSI-based:
  * ds1: 2T, currently has 276 disks
  * ds2: 2T, currently has 179 disks
  * ds3: 500G, currently has 65 disks
7 hosts: all have mostly the same hardware. CPU and memory are currently very lightly used (< 10%).

ds1 and ds2 are physically the same backend, which exports two 2TB volumes. ds3 is a different storage backend to which we're currently migrating some disks from ds1 and ds2.

What is the storage backend behind ds1 and ds2?

The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2.

Usually, when VMs become unresponsive, the whole host where they run gets unresponsive too, so that gives a hint about the problem; my bet is that the culprit is somewhere on the host side and not on the VM side.

Probably the vm became unresponsive because the connection to the host was lost.

I forgot to mention that less commonly we have situations where the host doesn't become unresponsive but the VMs on it do, and they never become responsive again, so we have to forcibly power them off and start them on a different host. But in this case the connection with the host never gets lost (so basically the host is Up, but any VM running on it is unresponsive).

When that happens, the host itself gets non-responsive and is only recoverable after a reboot, since it's unable to reconnect.

Piotr, can you check the engine log and explain why the host is not reconnected?

I must say this is not specific to this oVirt version; when we were using v3.6.4 the same happened. It's also worth mentioning that we've not done any configuration changes and everything had been working quite well for a long time.

We were monitoring our ds1 and ds2 physical backend to check performance, and we suspect we've run out of IOPS since we're reaching the maximum specified by the manufacturer; probably at certain times the host cannot perform a storage operation within some time limit and it marks VMs as unresponsive. That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When we run out of space on ds3 we'll create more smaller volumes to keep migrating.

On the host side, when this happens, we've run repoplot on the vdsm log and I'm attaching the result. Clearly there's a *huge* LVM response time (~30 secs.).

Indeed the log shows very slow vgck and vgs commands - these are called every 5 minutes for checking the vg health and refreshing the vdsm lvm cache.

1. starting vgck

Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)

2. vgck ends after 55 seconds

Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = 'WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

3. starting vgs

Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '|' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 5de4a000-a9c4-489c-8eee-10368647c413 (cwd None)

4. vgs finished after 37 seconds

Thread-96::DEBUG::2016-04-29 13:17:48,680::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: <err> = 'WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; <rc> = 0

Zdenek, how do you suggest debugging these slow lvm commands?

Can you run the following commands on a host in trouble, and on some other hosts in the same timeframe?

time vgck -vvvv --config ' devices { filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20000000000000043|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413

time vgs -vvvv --config ' global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-489c-8eee-10368647c413

Note that I added -vvvv to both commands; this will generate a huge amount of debugging info. Please share the output of these commands. You may need to fix the commands - you can always copy and paste directly from the vdsm log into the shell and add the -vvvv flag.

Indeed, there seems to be a big difference on hosts 5 and 6. I'm attaching the results of running both commands on all hosts. Both commands produce a much bigger output on hosts 5 and 6, and also take much longer to run. Times are also attached in a file called TIMES.

Our host storage network is correctly configured and on a 1G interface; no errors on the host itself, switches, etc.

1G serving about 20-70 vms per host? (495 vms, 120 always up, 7 hosts)? Do you have a separate network for management and storage, or do both use this 1G interface?

Yes, we have separate networks for management, storage and motion. Storage and motion have 1G each (plus, for storage we use a bond of 2 interfaces in ALB mode (6)).
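For reference, a quick way to sanity-check the storage bond from the host side is the bonding driver's status file (a rough sketch - it assumes the storage bond is named bond1 and the slaves are em1/em2; substitute the real device names):

  # Bonding mode should read "adaptive load balancing (balance-alb)" for mode 6,
  # and every slave should show "MII Status: up" at 1000 Mbps.
  grep -E 'Bonding Mode|Slave Interface|MII Status|Speed|Link Failure Count' /proc/net/bonding/bond1

  # Per-slave error counters (interface names are examples):
  ip -s link show dev em1
  ip -s link show dev em2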
Currently no host has more than 30 VMs at a time.

We've also limited storage in QoS to use 10MB/s and 40 IOPS,

How did you configure this limit?

In the Data Center tab, I chose our DC and a QoS sub-tab appears, just like described here: http://www.ovirt.org/documentation/sla/network-qos/

but this issue still happens, which leads me to wonder whether this is really just an IOPS issue; each host handles about 600 LVs. Could this be an issue too? I should note that the LVM response times are low under normal conditions (~1-2 seconds).

We recommend limiting the number of lvs per vg to 350. If you have 276 disks on ds1, and the disks are using snapshots, you may have too many lvs, which can cause slowdowns in lvm operations. Can you share the output of:

vdsm-tool dump-volume-chains 5de4a000-a9c4-489c-8eee-10368647c413

Ok, right now no VG has more than 350 LVs, so I guess this is not currently the problem. Unfortunately, I ran the vdsm-tool command but it didn't finish or produce any output in about 1 hour, so I guess it hung and I stopped it. If you confirm this is a normal execution time I can leave it running for whatever time it takes.

host5:~# vgs
  VG                                   #PV #LV #SN Attr   VSize   VFree
  0927c050-6fb6-463c-bb37-8b8da641dcd3   1  63   0 wz--n- 499,62g 206,50g
  5de4a000-a9c4-489c-8eee-10368647c413   1 335   0 wz--n-   2,00t 518,62g
  b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8   1 215   0 wz--n-   2,00t 495,25g
  sys_vg                                 1   6   0 wz--n- 136,44g 122,94g

I'm attaching the vdsm.log, engine.log and repoplot PDF;

This is very useful, thanks. Can you also send the vdsm logs and repoplots from the other hosts for the same timeframe?

if someone could give a hint on some additional problems in them and shed some light on the above thoughts I'd be very grateful.

Do you have sar configured on the host? Having sar logs can reveal more info about this timeframe. Do you have information about the amount of io from vms during this timeframe? The amount of io on the storage backend during this timeframe?

Not currently, but I'll be alert for the next time this happens, check some sar commands related to I/O, and provide feedback. Nevertheless, by the time I ran the commands above no machine was unresponsive and I'm still getting such huge execution times. I tried running iotop now and I see there are 2 vgck and vgs processes with a rate of ~500Kb/s each for reading.

root 24446 1.3 0.0 57008 6824 ? D< 09:15 0:00 /usr/sbin/lvm vgs --config devices { preferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ 'a|/dev/mapper/36000eb3a4f1acbc20000000000000043|/dev/mapper/36000eb3a4f1acbc200000000000000b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|', 'r|.*|' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } --noheadings --units b --nosuffix --separator | --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d

I'm also attaching the iostat output for host 5 in a file called iostat-host5.

Interesting: all hosts are checking the same storage, but on host5 and host6 the output of vgs and vgck is 10 times bigger - this explains why they take about 10 times longer to run.

$ ls -lh vgs-*
-rw-r--r--. 1 nsoffer nsoffer 221K Apr 30 08:55 vgs-host1
-rw-r--r--. 1 nsoffer nsoffer 239K Apr 30 08:55 vgs-host2
-rw-r--r--. 1 nsoffer nsoffer 228K Apr 30 08:55 vgs-host3
-rw-r--r--. 1 nsoffer nsoffer 288K Apr 30 08:55 vgs-host4
-rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host5
-rw-r--r--. 1 nsoffer nsoffer 2.2M Apr 30 08:55 vgs-host6
-rw-r--r--. 1 nsoffer nsoffer 232K Apr 30 08:55 vgs-host7

$ ls -lh vgck-*
-rw-r--r--. 1 nsoffer nsoffer 238K Apr 30 08:55 vgck-host1
-rw-r--r--. 1 nsoffer nsoffer 254K Apr 30 08:55 vgck-host2
-rw-r--r--. 1 nsoffer nsoffer 244K Apr 30 08:55 vgck-host3
-rw-r--r--. 1 nsoffer nsoffer 298K Apr 30 08:55 vgck-host4
-rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host5
-rw-r--r--. 1 nsoffer nsoffer 2.0M Apr 30 08:55 vgck-host6
-rw-r--r--. 1 nsoffer nsoffer 248K Apr 30 08:55 vgck-host7

Can you collect the output of "dmsetup table -v" on hosts 1, 5, and 6?

Nir

Sure, please find attached the results of the command. Once again, for hosts 5 and 6 the size is 10 times bigger than for host 1.

I think the issue causing the slow vgs and vgck commands is stale lvs on hosts 5 and 6. This may be the root cause of the paused vms, but I cannot explain how it is related yet.

Comparing vgck on host1 and host5 - on host1 vgck opens 213 /dev/mapper/* devices, actually 53 unique devices, most of them opened 4 times. On host5, vgck opens 2489 devices (622 unique). This explains why the operation takes about 10 times longer.

Checking the dmsetup table output, we can see that host1 has 53 devices, and host5 622 devices. Checking the device open count, host1 has 15 stale devices (Open count: 0), but host5 has 597 stale devices.

Leaving stale devices is a known issue, but we never had evidence that it causes trouble except warnings in lvm commands.

Please open an ovirt bug about this issue, and include all the files you sent so far in this thread, and all the vdsm logs from host5.

To remove the stale devices, you can do:

for name in `dmsetup info -c -o open,name | awk '/ 0 / {print $2}'`; do
    echo "removing stale device: $name"
    dmsetup remove $name
done

Note that this cleanup is safe only in maintenance mode, since it will also remove the mappings for the vdsm special lvs (ids, leases, inbox, outbox, master, metadata) that are accessed by vdsm regularly and are not stale. This is also not safe if you use hosted engine; the hosted engine agent should maintain its vg.

The safest way to remove stale devices is to restart vdsm - it deactivates all unused lvs (except the special lvs) during startup.

Hm, I ran it without even putting the host into maintenance, but I will next time. Fortunately it seems it didn't do any harm and everything is working. I'll probably write a script that puts the host into maintenance, cleans up or restarts vdsm, and activates it again (this is not a hosted engine) if the number of stale devices is bigger than a threshold.

I would not do anything automatic at this point except monitoring this issue.

For cleanup you have 2 options:
- host is up - restart vdsm
- host in maintenance - remove stale devices

Based on local testing, I think the issue is this:
- When connecting to storage, all lvs become active - this may be a change in lvm during el7 development that we did not notice.
- Each time vdsm is restarted, all active lvs that should not be active are deactivated during vdsm startup.
- When deactivating a storage domain, vdsm does not deactivate the lvs, leaving stale devices.

Can you confirm that it works like this on your system? To test this:

I did these tests on host5. Before switching the host to maintenance, I ran the vgck command and the execution time was: 0m1.347s. Current stale devices at that time were:

5de4a000--a9c4--489c--8eee--10368647c413-master
5de4a000--a9c4--489c--8eee--10368647c413-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
5de4a000--a9c4--489c--8eee--10368647c413-ids
0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
5de4a000--a9c4--489c--8eee--10368647c413-inbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-ids
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
5de4a000--a9c4--489c--8eee--10368647c413-leases
0927c050--6fb6--463c--bb37--8b8da641dcd3-master
5de4a000--a9c4--489c--8eee--10368647c413-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-ids

(It seems there have been no new stale devices since yesterday after running the loop.)

Expected:
- vdsm accesses master, metadata, inbox and outbox from time to time; they must be active when the host is up
- sanlock accesses ids when the host is up - this lv should have an open count of 1

1. Put one host into maintenance

You should see some stale devices - the special lvs (ids, leases, master, metadata, inbox, outbox), and probably 2 OVF_STORE lvs per vg (128m lvs used to store vm ovf).

Still had the same devices as above.

Expected bug - we don't clean up lvs properly when entering maintenance.

2. Clean up the stale devices

After cleaning the stale devices, these remained:

36000eb3a4f1acbc20000000000000043
360014056f0dc8930d744f83af8ddc709
36000eb3a4f1acbc200000000000000b9

Expected bug. These are the multipath mappings; running multipath -F should remove them. If we clean the lvs properly when deactivating the domain during maintenance, these mappings will be removed automatically.

3. Activate the host

You should see again about 600 stale devices, and lvm commands will probably run much slower, as you reported.

Current stale devices were these:

5de4a000--a9c4--489c--8eee--10368647c413-master
5de4a000--a9c4--489c--8eee--10368647c413-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-metadata
0927c050--6fb6--463c--bb37--8b8da641dcd3-leases
5de4a000--a9c4--489c--8eee--10368647c413-inbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-leases
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-inbox
5de4a000--a9c4--489c--8eee--10368647c413-leases
0927c050--6fb6--463c--bb37--8b8da641dcd3-master
5de4a000--a9c4--489c--8eee--10368647c413-metadata
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-master
0927c050--6fb6--463c--bb37--8b8da641dcd3-outbox
b13b9eac--1f2e--4a7e--bcd9--49f5f855c3d8-outbox
0927c050--6fb6--463c--bb37--8b8da641dcd3-inbox

Strange, on my test setup (RHEL 7.2) *all* the lvs in the vg get activated. Did you check the output of lvs?

pvscan --cache
lvs

I guess you will see about 600 lvs active (-wi-a-----). Maybe you did not wait until the host connected to all storage domains?
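A quick way to check that is to count the active lvs per vg after refreshing lvmetad (a rough sketch - the awk test looks at the activation flag, the 5th character of lv_attr):

  pvscan --cache
  # Print "<vg name> <number of active lvs>" for each vg.
  lvs --noheadings -o vg_name,lv_attr | awk 'substr($2,5,1) == "a" {n[$1]++} END {for (vg in n) print vg, n[vg]}'

Running it right after activating the host and again a few minutes later should show whether the ~600 lvs only become active once all storage domains are connected.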
Ok, that was the problem: I didn't run the 'pvscan --cache' command before checking the output. After activating the host again, I see 620 LVs. However, in my case all LVs except the special ones are inactive:
host5:~# lvscan | grep -v inactive
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/metadata' [512,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/outbox' [128,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/leases' [2,00 GiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/ids' [128,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/inbox' [128,00 MiB] inherit
  ACTIVE   '/dev/0927c050-6fb6-463c-bb37-8b8da641dcd3/master' [1,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/audit' [512,00 MiB] inherit
  ACTIVE   '/dev/sys_vg/tmp' [1,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/var' [4,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/home' [1,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/root' [6,00 GiB] inherit
  ACTIVE   '/dev/sys_vg/swap' [1,00 GiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/metadata' [512,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/outbox' [128,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/leases' [2,00 GiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/ids' [128,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/inbox' [128,00 MiB] inherit
  ACTIVE   '/dev/5de4a000-a9c4-489c-8eee-10368647c413/master' [1,00 GiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/metadata' [512,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/outbox' [128,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/leases' [2,00 GiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/ids' [128,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/inbox' [128,00 MiB] inherit
  ACTIVE   '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/master' [1,00 GiB] inherit

The others are all 'inactive':

  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/d256c965-25dd-40f2-aeae-4a3ae10befcf' [3,12 GiB] inherit
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/6cfafa3a-c335-47bf-805e-bddb850dc914' [3,12 GiB] inherit
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/0d60a745-6952-42e7-8bcf-a12d36b2c39e' [5,25 GiB] inherit
  inactive '/dev/b13b9eac-1f2e-4a7e-bcd9-49f5f855c3d8/38d94545-b4b0-4384-ae3a-fa26c40e40e3' [5,25 GiB] inherit
I tested this on both hosts (5 and 6) with the same result.
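A cross-check at the device-mapper level (a rough sketch based on the removal loop earlier in the thread; note that the special lvs and the multipath maps can also show an open count of 0 between accesses, so this is only an upper bound on truly stale mappings):

  # List every mapping nothing currently holds open, then count them.
  dmsetup info -c --noheadings -o open,name | awk '$1 == 0 {print $2}'
  dmsetup info -c --noheadings -o open,name | awk '$1 == 0' | wc -l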
4. Restart vdsm

All the regular lvs should be inactive now; no stale devices should exist, except the special lvs (expected).

Same result as above. The strange thing is that now, running the 'vgs' command, I don't see any VGs except the system's:

host5:~# vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  sys_vg   1   6   0 wz--n- 136,44g 122,94g

Vdsm bypasses lvmetad, so all lvm commands run from the shell show incorrect results. You must update lvmetad about changes made by vdsm using:

pvscan --cache
vgs

Or, the vdsm way:

vgs --config 'global {use_lvmetad=0}'
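A tiny convenience wrapper for that (my own sketch, not something vdsm ships): it forces any lvm command to bypass lvmetad the way vdsm does, so the output reflects the on-disk metadata instead of the stale daemon cache:

  # Usage: lvm_direct vgs
  #        lvm_direct lvs 5de4a000-a9c4-489c-8eee-10368647c413
  lvm_direct() {
      sudo lvm "$@" --config 'global { use_lvmetad=0 }'
  }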
(Same here, after using 'pvscan --cache' I can see the VGs).
However, machines running on that host have no issues and storage is working well. vgscan doesn't do the trick either. I tested this on host6 too and the same happens: after reactivating the host I cannot see the VGs with the vgs command - however, everything seems to work quite well. This is CentOS Linux release 7.2.1511, FWIW.

Nir
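For reference, the two cleanup options from earlier in the thread gathered in one place (a sketch to run by hand - per Nir's advice above, only monitor for now rather than automating this):

  # Option 1 - host is Up: restarting vdsm deactivates unused lvs during startup.
  systemctl restart vdsmd

  # Option 2 - host is in maintenance (and not a hosted-engine host):
  # remove the stale mappings, then flush the now-unused multipath maps.
  for name in $(dmsetup info -c --noheadings -o open,name | awk '$1 == 0 {print $2}'); do
      echo "removing stale device: $name"
      dmsetup remove "$name"
  done
  multipath -F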
_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users