Re: [ovirt-users] Import storage domain - disks not listed
Hi Sahina,

The disks with snapshots should be part of the VMs; once you register those
VMs you should see those disks in the Disks sub tab.
Regarding floating disks (without snapshots), you can register them through
REST. If you are working on the master branch there should be a sub tab
dedicated to those as well.

Regards,
Maor

On Tue, Apr 26, 2016 at 1:44 PM, Sahina Bose wrote:
> Hi all,
>
> I have a gluster volume used as data storage domain which is replicated to
> a slave gluster volume (say, slavevol) using gluster's geo-replication
> feature.
>
> Now, in a new oVirt instance, I use the import storage domain to import
> the slave gluster volume. The "VM Import" tab correctly lists the VMs that
> were present in my original gluster volume. However the "Disks" tab is
> empty.
>
> GET
> https://new-ovitt/api/storagedomains/5e1a37cf-933d-424c-8e3d-eb9e40b690a7/disks;unregistered
> -->
>
> In the code GetUnregisteredDiskQuery - if volumesList.size() != 1 - the
> image is skipped with a comment that we can't deal with snapshots.
>
> How do I recover the disks/images in this case?
>
> Further info:
>
> /rhev/data-center/mnt/glusterSD/10.70.40.112:_slavevol
> ├── 5e1a37cf-933d-424c-8e3d-eb9e40b690a7
> │   ├── dom_md
> │   │   ├── ids
> │   │   ├── inbox
> │   │   ├── leases
> │   │   ├── metadata
> │   │   └── outbox
> │   ├── images
> │   │   ├── 202efaa6-0d01-40f3-a541-10eee920d221
> │   │   │   ├── eb701046-6ee1-4c9d-b097-e51a8fd283e1
> │   │   │   ├── eb701046-6ee1-4c9d-b097-e51a8fd283e1.lease
> │   │   │   └── eb701046-6ee1-4c9d-b097-e51a8fd283e1.meta
> │   │   ├── c52e4e02-dc6c-4a77-a184-9fcab88106c2
> │   │   │   ├── 34e46104-8fad-4510-a5bf-0730b97a6659
> │   │   │   ├── 34e46104-8fad-4510-a5bf-0730b97a6659.lease
> │   │   │   ├── 34e46104-8fad-4510-a5bf-0730b97a6659.meta
> │   │   │   ├── 766a15b9-57db-417d-bfa0-beadbbb84ad2
> │   │   │   ├── 766a15b9-57db-417d-bfa0-beadbbb84ad2.lease
> │   │   │   ├── 766a15b9-57db-417d-bfa0-beadbbb84ad2.meta
> │   │   │   ├── 90f1e26a-00e9-4ea5-9e92-2e448b9b8bfa
> │   │   │   ├── 90f1e26a-00e9-4ea5-9e92-2e448b9b8bfa.lease
> │   │   │   └── 90f1e26a-00e9-4ea5-9e92-2e448b9b8bfa.meta
> │   │   ├── c75de5b7-aa88-48d7-ba1b-067181eac6ae
> │   │   │   ├── ff09e16a-e8a0-452b-b95c-e160e68d09a9
> │   │   │   ├── ff09e16a-e8a0-452b-b95c-e160e68d09a9.lease
> │   │   │   └── ff09e16a-e8a0-452b-b95c-e160e68d09a9.meta
> │   │   ├── efa94a0d-c08e-4ad9-983b-4d1d76bca865
> │   │   │   ├── 64e3913c-da91-447c-8b69-1cff1f34e4b7
> │   │   │   ├── 64e3913c-da91-447c-8b69-1cff1f34e4b7.lease
> │   │   │   ├── 64e3913c-da91-447c-8b69-1cff1f34e4b7.meta
> │   │   │   ├── 8174e8b4-3605-4db3-86a1-cb62c3a079f4
> │   │   │   ├── 8174e8b4-3605-4db3-86a1-cb62c3a079f4.lease
> │   │   │   ├── 8174e8b4-3605-4db3-86a1-cb62c3a079f4.meta
> │   │   │   ├── e79a8821-bb4a-436a-902d-3876f107dd99
> │   │   │   ├── e79a8821-bb4a-436a-902d-3876f107dd99.lease
> │   │   │   └── e79a8821-bb4a-436a-902d-3876f107dd99.meta
> │   │   └── f5eacc6e-4f16-4aa5-99ad-53ac1cda75b7
> │   │       ├── 476bbfe9-1805-4c43-bde6-e7de5f7bd75d
> │   │       ├── 476bbfe9-1805-4c43-bde6-e7de5f7bd75d.lease
> │   │       └── 476bbfe9-1805-4c43-bde6-e7de5f7bd75d.meta
> │   └── master
> │       ├── tasks
> │       └── vms
> └── __DIRECT_IO_TEST__
>
> engine.log:
> 2016-04-26 06:37:57,715 INFO
> [org.ovirt.engine.core.vdsbroker.irsbroker.GetImageInfoVDSCommand]
> (org.ovirt.thread.pool-6-thread-25) [5e6b7a53] FINISH, GetImageInfoVDSCommand,
> return: org.ovirt.engine.core.common.businessentities.storage.DiskImage@d4b3ac2f,
> log id: 7b693bad
> 2016-04-26 06:37:57,724 INFO
> [org.ovirt.engine.core.vdsbroker.irsbroker.GetVolumesListVDSCommand]
> (org.ovirt.thread.pool-6-thread-25) [5e6b7a53] START, GetVolumesListVDSCommand(
> StoragePoolDomainAndGroupIdBaseVDSCommandParameters:{runAsync='true',
> storagePoolId='ed338557-5995-4634-97e2-15454a9d8800', ignoreFailoverLimit='false',
> storageDomainId='5e1a37cf-933d-424c-8e3d-eb9e40b690a7',
> imageGroupId='c52e4e02-dc6c-4a77-a184-9fcab88106c2'}), log id: 741b9214
> 2016-04-26 06:37:58,748 INFO
> [org.ovirt.engine.core.vdsbroker.irsbroker.GetVolumesListVDSCommand]
> (org.ovirt.thread.pool-6-thread-25) [5e6b7a53] FINISH, GetVolumesListVDSCommand,
> return: [90f1e26a-00e9-4ea5-9e92-2e448b9b8bfa, 766a15b9-57db-417d-bfa0-beadbbb84ad2,
> 34e46104-8fad-4510-a5bf-0730b97a6659], log id: 741b9214

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
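As a rough sketch of the REST flow Maor describes for floating disks: the
unregistered-disks collection of the imported domain can be listed, and in
3.6-era APIs a disk id from that listing can be posted back to the same
collection to register it. Everything below is illustrative only (admin
credentials, the host name already used in the thread, and a disk id taken
from the image-group directory in the listing above); the exact endpoint and
body can vary between oVirt versions, so check the REST API reference for the
version in use.

    # List the floating (unregistered) disks on the imported domain
    curl -k -u admin@internal:PASSWORD -H 'Accept: application/xml' \
        'https://new-ovitt/api/storagedomains/5e1a37cf-933d-424c-8e3d-eb9e40b690a7/disks;unregistered'

    # Register one of them by posting its id back to the collection
    # (the id below is an image-group directory from the listing, used as an example)
    curl -k -u admin@internal:PASSWORD -H 'Content-Type: application/xml' \
        -d '<disk id="202efaa6-0d01-40f3-a541-10eee920d221"/>' \
        'https://new-ovitt/api/storagedomains/5e1a37cf-933d-424c-8e3d-eb9e40b690a7/disks;unregistered'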
Re: [ovirt-users] VMs becoming non-responsive sporadically
El 2016-04-30 23:22, Nir Soffer escribió: On Sun, May 1, 2016 at 12:48 AM,wrote: El 2016-04-30 22:37, Nir Soffer escribió: On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer wrote: On Sat, Apr 30, 2016 at 7:16 PM, wrote: El 2016-04-30 16:55, Nir Soffer escribió: On Sat, Apr 30, 2016 at 11:33 AM, Nicolás wrote: Hi Nir, El 29/04/16 a las 22:34, Nir Soffer escribió: On Fri, Apr 29, 2016 at 9:17 PM, wrote: Hi, We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues with some VMs being paused because they're marked as non-responsive. Mostly, after a few seconds they recover, but we want to debug precisely this problem so we can fix it consistently. Our scenario is the following: ~495 VMs, of which ~120 are constantly up 3 datastores, all of them iSCSI-based: * ds1: 2T, currently has 276 disks * ds2: 2T, currently has 179 disks * ds3: 500G, currently has 65 disks 7 hosts: All have mostly the same hardware. CPU and memory are currently very lowly used (< 10%). ds1 and ds2 are physically the same backend which exports two 2TB volumes. ds3 is a different storage backend where we're currently migrating some disks from ds1 and ds2. What the the storage backend behind ds1 and 2? The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2. Usually, when VMs become unresponsive, the whole host where they run gets unresponsive too, so that gives a hint about the problem, my bet is the culprit is somewhere on the host side and not on the VMs side. Probably the vm became unresponsive because connection to the host was lost. I forgot to mention that less commonly we have situations where the host doesn't get unresponsive but the VMs on it do and they don't become responsive ever again, so we have to forcibly power them off and start them on a different host. But in this case the connection with the host doesn't ever get lost (so basically the host is Up, but any VM run on them is unresponsive). When that happens, the host itself gets non-responsive and only recoverable after reboot, since it's unable to reconnect. Piotr, can you check engine log and explain why host is not reconnected? I must say this is not specific to this oVirt version, when we were using v.3.6.4 the same happened, and it's also worthy mentioning we've not done any configuration changes and everything had been working quite well for a long time. We were monitoring our ds1 and ds2 physical backend to see performance and we suspect we've run out of IOPS since we're reaching the maximum specified by the manufacturer, probably at certain times the host cannot perform a storage operation within some time limit and it marks VMs as unresponsive. That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When we run out of space on ds3 we'll create more smaller volumes to keep migrating. On the host side, when this happens, we've run repoplot on the vdsm log and I'm attaching the result. Clearly there's a *huge* LVM response time (~30 secs.). Indeed the log show very slow vgck and vgs commands - these are called every 5 minutes for checking the vg health and refreshing vdsm lvm cache. 1. 
starting vgck Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { pre ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20043|'\ '', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-48 9c-8eee-10368647c413 (cwd None) 2. vgck ends after 55 seconds Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; = 0 3. starting vgs Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { pref erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20043|/de v/mapper/36000eb3a4f1acbc200b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '| ' --ignoreskippedcluster -o
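A quick way to see how long these periodic scans take on a given host is to
pull the lvm command lines out of the vdsm log and compare the start and
completion timestamps. A minimal sketch, assuming the default vdsm log
location; it prints the vgs/vgck invocation lines plus every excCmd completion
line, which then have to be paired up by their Thread-NN prefix.

    # Show when vdsm launched vgs/vgck and when each command returned
    grep -E 'Storage\.Misc\.excCmd.*(lvm (vgs|vgck)|SUCCESS|FAILED)' /var/log/vdsm/vdsm.log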
Re: [ovirt-users] VMs becoming non-responsive sporadically
On Sun, May 1, 2016 at 12:48 AM,wrote: > El 2016-04-30 22:37, Nir Soffer escribió: >> >> On Sat, Apr 30, 2016 at 10:28 PM, Nir Soffer wrote: >>> >>> On Sat, Apr 30, 2016 at 7:16 PM, wrote: El 2016-04-30 16:55, Nir Soffer escribió: > > > On Sat, Apr 30, 2016 at 11:33 AM, Nicolás wrote: >> >> >> Hi Nir, >> >> El 29/04/16 a las 22:34, Nir Soffer escribió: >>> >>> >>> >>> On Fri, Apr 29, 2016 at 9:17 PM, wrote: Hi, We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues with some VMs being paused because they're marked as non-responsive. Mostly, after a few seconds they recover, but we want to debug precisely this problem so we can fix it consistently. Our scenario is the following: ~495 VMs, of which ~120 are constantly up 3 datastores, all of them iSCSI-based: * ds1: 2T, currently has 276 disks * ds2: 2T, currently has 179 disks * ds3: 500G, currently has 65 disks 7 hosts: All have mostly the same hardware. CPU and memory are currently very lowly used (< 10%). ds1 and ds2 are physically the same backend which exports two 2TB volumes. ds3 is a different storage backend where we're currently migrating some disks from ds1 and ds2. >>> >>> >>> >>> What the the storage backend behind ds1 and 2? >> >> >> >> >> The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand >> P4000 >> G2. >> Usually, when VMs become unresponsive, the whole host where they run gets unresponsive too, so that gives a hint about the problem, my bet is the culprit is somewhere on the host side and not on the VMs side. >>> >>> >>> >>> Probably the vm became unresponsive because connection to the host >>> was >>> lost. >> >> >> >> >> I forgot to mention that less commonly we have situations where the >> host >> doesn't get unresponsive but the VMs on it do and they don't become >> responsive ever again, so we have to forcibly power them off and start >> them >> on a different host. But in this case the connection with the host >> doesn't >> ever get lost (so basically the host is Up, but any VM run on them is >> unresponsive). >> >> When that happens, the host itself gets non-responsive and only recoverable after reboot, since it's unable to reconnect. >>> >>> >>> >>> Piotr, can you check engine log and explain why host is not >>> reconnected? >>> I must say this is not specific to this oVirt version, when we were using v.3.6.4 the same happened, and it's also worthy mentioning we've not done any configuration changes and everything had been working quite well for a long time. We were monitoring our ds1 and ds2 physical backend to see performance and we suspect we've run out of IOPS since we're reaching the maximum specified by the manufacturer, probably at certain times the host cannot perform a storage operation within some time limit and it marks VMs as unresponsive. That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When we run out of space on ds3 we'll create more smaller volumes to keep migrating. On the host side, when this happens, we've run repoplot on the vdsm log and I'm attaching the result. Clearly there's a *huge* LVM response time (~30 secs.). >>> >>> >>> >>> Indeed the log show very slow vgck and vgs commands - these are >>> called >>> every >>> 5 minutes for checking the vg health and refreshing vdsm lvm cache. >>> >>> 1. 
starting vgck >>> >>> Thread-96::DEBUG::2016-04-29 >>> 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset >>> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' >>> devices >>> { pre >>> ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 >>> write_cache_state=0 disable_after_error_count=3 filter = [ >>> '\''a|/dev/mapper/36000eb3a4f1acbc20043|'\ >>> '', '\''r|.*|'\'' ] } global { locking_type=1 >>> prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { >>> retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-48 >>> 9c-8eee-10368647c413 (cwd None) >>> >>> 2. vgck ends after 55 seconds
Re: [ovirt-users] VMs becoming non-responsive sporadically
El 2016-04-30 22:37, Nir Soffer escribió: On Sat, Apr 30, 2016 at 10:28 PM, Nir Sofferwrote: On Sat, Apr 30, 2016 at 7:16 PM, wrote: El 2016-04-30 16:55, Nir Soffer escribió: On Sat, Apr 30, 2016 at 11:33 AM, Nicolás wrote: Hi Nir, El 29/04/16 a las 22:34, Nir Soffer escribió: On Fri, Apr 29, 2016 at 9:17 PM, wrote: Hi, We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues with some VMs being paused because they're marked as non-responsive. Mostly, after a few seconds they recover, but we want to debug precisely this problem so we can fix it consistently. Our scenario is the following: ~495 VMs, of which ~120 are constantly up 3 datastores, all of them iSCSI-based: * ds1: 2T, currently has 276 disks * ds2: 2T, currently has 179 disks * ds3: 500G, currently has 65 disks 7 hosts: All have mostly the same hardware. CPU and memory are currently very lowly used (< 10%). ds1 and ds2 are physically the same backend which exports two 2TB volumes. ds3 is a different storage backend where we're currently migrating some disks from ds1 and ds2. What the the storage backend behind ds1 and 2? The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2. Usually, when VMs become unresponsive, the whole host where they run gets unresponsive too, so that gives a hint about the problem, my bet is the culprit is somewhere on the host side and not on the VMs side. Probably the vm became unresponsive because connection to the host was lost. I forgot to mention that less commonly we have situations where the host doesn't get unresponsive but the VMs on it do and they don't become responsive ever again, so we have to forcibly power them off and start them on a different host. But in this case the connection with the host doesn't ever get lost (so basically the host is Up, but any VM run on them is unresponsive). When that happens, the host itself gets non-responsive and only recoverable after reboot, since it's unable to reconnect. Piotr, can you check engine log and explain why host is not reconnected? I must say this is not specific to this oVirt version, when we were using v.3.6.4 the same happened, and it's also worthy mentioning we've not done any configuration changes and everything had been working quite well for a long time. We were monitoring our ds1 and ds2 physical backend to see performance and we suspect we've run out of IOPS since we're reaching the maximum specified by the manufacturer, probably at certain times the host cannot perform a storage operation within some time limit and it marks VMs as unresponsive. That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When we run out of space on ds3 we'll create more smaller volumes to keep migrating. On the host side, when this happens, we've run repoplot on the vdsm log and I'm attaching the result. Clearly there's a *huge* LVM response time (~30 secs.). Indeed the log show very slow vgck and vgs commands - these are called every 5 minutes for checking the vg health and refreshing vdsm lvm cache. 1. 
starting vgck Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { pre ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20043|'\ '', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-48 9c-8eee-10368647c413 (cwd None) 2. vgck ends after 55 seconds Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; = 0 3. starting vgs Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { pref erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20043|/de v/mapper/36000eb3a4f1acbc200b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' --noheadings --units b --nosuffix --separator '| ' --ignoreskippedcluster -o uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name 5de4a000-a9c4-489c-8eee-10368 647c413 (cwd None) 4. vgs finished after 37 seconds Thread-96::DEBUG::2016-04-29
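To separate storage latency from anything vdsm-specific, the same scan can be
run by hand and timed while the hosts are under their normal load. This is a
minimal sketch, assuming the VG name (the domain uuid) from the log above; the
device filter is abbreviated here and must list the real multipath devices
shown in the vdsm command line.

    # Time the same vg scan vdsm performs every few minutes
    # (<WWID> is a placeholder for the domain's multipath device)
    time sudo /usr/sbin/lvm vgs \
        --config 'devices { preferred_names = ["^/dev/mapper/"] filter = [ "a|/dev/mapper/<WWID>|", "r|.*|" ] } global { use_lvmetad=0 }' \
        --noheadings --units b --nosuffix 5de4a000-a9c4-489c-8eee-10368647c413

If this also takes tens of seconds at busy times but returns quickly when the
backend is idle, that points at the IOPS saturation suspected above rather
than at vdsm itself.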
Re: [ovirt-users] VMs becoming non-responsive sporadically
On Sat, Apr 30, 2016 at 10:28 PM, Nir Sofferwrote: > On Sat, Apr 30, 2016 at 7:16 PM, wrote: >> El 2016-04-30 16:55, Nir Soffer escribió: >>> >>> On Sat, Apr 30, 2016 at 11:33 AM, Nicolás wrote: Hi Nir, El 29/04/16 a las 22:34, Nir Soffer escribió: > > > On Fri, Apr 29, 2016 at 9:17 PM, wrote: >> >> >> Hi, >> >> We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues >> with >> some VMs being paused because they're marked as non-responsive. Mostly, >> after a few seconds they recover, but we want to debug precisely this >> problem so we can fix it consistently. >> >> Our scenario is the following: >> >> ~495 VMs, of which ~120 are constantly up >> 3 datastores, all of them iSCSI-based: >>* ds1: 2T, currently has 276 disks >>* ds2: 2T, currently has 179 disks >>* ds3: 500G, currently has 65 disks >> 7 hosts: All have mostly the same hardware. CPU and memory are >> currently >> very lowly used (< 10%). >> >>ds1 and ds2 are physically the same backend which exports two 2TB >> volumes. >> ds3 is a different storage backend where we're currently migrating some >> disks from ds1 and ds2. > > > What the the storage backend behind ds1 and 2? The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2. >> Usually, when VMs become unresponsive, the whole host where they run >> gets >> unresponsive too, so that gives a hint about the problem, my bet is the >> culprit is somewhere on the host side and not on the VMs side. > > > Probably the vm became unresponsive because connection to the host was > lost. I forgot to mention that less commonly we have situations where the host doesn't get unresponsive but the VMs on it do and they don't become responsive ever again, so we have to forcibly power them off and start them on a different host. But in this case the connection with the host doesn't ever get lost (so basically the host is Up, but any VM run on them is unresponsive). >> When that >> happens, the host itself gets non-responsive and only recoverable after >> reboot, since it's unable to reconnect. > > > Piotr, can you check engine log and explain why host is not reconnected? > >> I must say this is not specific to >> this oVirt version, when we were using v.3.6.4 the same happened, and >> it's >> also worthy mentioning we've not done any configuration changes and >> everything had been working quite well for a long time. >> >> We were monitoring our ds1 and ds2 physical backend to see performance >> and >> we suspect we've run out of IOPS since we're reaching the maximum >> specified >> by the manufacturer, probably at certain times the host cannot perform >> a >> storage operation within some time limit and it marks VMs as >> unresponsive. >> That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. >> When >> we >> run out of space on ds3 we'll create more smaller volumes to keep >> migrating. >> >> On the host side, when this happens, we've run repoplot on the vdsm log >> and >> I'm attaching the result. Clearly there's a *huge* LVM response time >> (~30 >> secs.). > > > Indeed the log show very slow vgck and vgs commands - these are called > every > 5 minutes for checking the vg health and refreshing vdsm lvm cache. > > 1. 
starting vgck > > Thread-96::DEBUG::2016-04-29 > 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset > --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices > { pre > ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 > write_cache_state=0 disable_after_error_count=3 filter = [ > '\''a|/dev/mapper/36000eb3a4f1acbc20043|'\ > '', '\''r|.*|'\'' ] } global { locking_type=1 > prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { > retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-48 > 9c-8eee-10368647c413 (cwd None) > > 2. vgck ends after 55 seconds > > Thread-96::DEBUG::2016-04-29 > 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: = ' > WARNING: lvmetad is running but disabled. Restart lvmetad before > enabling it!\n'; = 0 > > 3. starting vgs > > Thread-96::DEBUG::2016-04-29 > 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset > --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices > { pref > erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 > write_cache_state=0 disable_after_error_count=3 filter =
[ovirt-users] hosted engine setup failed for 10 minutes delay.. engine seems alive
Hello,
I'm trying to deploy a self-hosted engine on an Intel NUC6i5SYB with CentOS 7.2,
using oVirt 3.6.5 and the appliance (the picked-up rpm is
ovirt-engine-appliance-3.6-20160420.1.el7.centos.noarch).

Near the end of the command hosted-engine --deploy I get:
...
|- [ INFO  ] Initializing PostgreSQL
|- [ INFO  ] Creating PostgreSQL 'engine' database
|- [ INFO  ] Configuring PostgreSQL
|- [ INFO  ] Creating/refreshing Engine database schema
|- [ INFO  ] Creating/refreshing Engine 'internal' domain database schema
[ ERROR ] Engine setup got stuck on the appliance
[ ERROR ] Failed to execute stage 'Closing up': Engine setup is stalled on the
appliance since 600 seconds ago. Please check its log on the appliance.
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file
'/var/lib/ovirt-hosted-engine-setup/answers/answers-20160430200654.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: this system is not reliable, please
check the issue, fix and redeploy

In the host log I indeed see the 10-minute timeout:

2016-04-30 19:56:52 DEBUG otopi.plugins.otopi.dialog.human
dialog.__logString:219 DIALOG:SEND |- [ INFO ] Creating/refreshing Engine
'internal' domain database schema
2016-04-30 20:06:53 ERROR otopi.plugins.ovirt_hosted_engine_setup.engine.health
health._closeup:140 Engine setup got stuck on the appliance

On the engine I don't see any particular problem, only a ten-minute gap in its log:

2016-04-30 17:56:57 DEBUG otopi.context context.dumpEnvironment:514
ENVIRONMENT DUMP - END
2016-04-30 17:56:57 DEBUG otopi.context context._executeMethod:142 Stage misc
METHOD otopi.plugins.ovirt_engine_setup.ovirt_engine.config.aaajdbc.Plugin._setupAdminPassword
2016-04-30 17:56:57 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.aaajdbc
plugin.executeRaw:828 execute: ('/usr/bin/ovirt-aaa-jdbc-tool',
'--db-config=/etc/ovirt-engine/aaa/internal.properties', 'user',
'password-reset', 'admin', '--password=env:pass', '--force',
'--password-valid-to=2216-03-13 17:56:57Z'), executable='None', cwd='None',
env={'LANG': 'en_US.UTF-8', 'SHLVL': '1', 'PYTHONPATH':
'/usr/share/ovirt-engine/setup/bin/..::', 'pass': '**FILTERED**',
'OVIRT_ENGINE_JAVA_HOME_FORCE': '1', 'PWD': '/', 'OVIRT_ENGINE_JAVA_HOME':
u'/usr/lib/jvm/jre', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin',
'OTOPI_LOGFILE': '/var/log/ovirt-engine/setup/ovirt-engine-setup-20160430175551-dttt2p.log',
'OVIRT_JBOSS_HOME': '/usr/share/ovirt-engine-wildfly', 'OTOPI_EXECDIR': '/'}
2016-04-30 18:07:06 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.aaajdbc
plugin.executeRaw:878 execute-result: ('/usr/bin/ovirt-aaa-jdbc-tool',
'--db-config=/etc/ovirt-engine/aaa/internal.properties', 'user',
'password-reset', 'admin', '--password=env:pass', '--force',
'--password-valid-to=2216-03-13 17:56:57Z'), rc=0

and its last lines are:

2016-04-30 18:07:06 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.aaajdbc
plugin.execute:936 execute-output: ('/usr/bin/ovirt-aaa-jdbc-tool',
'--db-config=/etc/ovirt-engine/aaa/internal.properties', 'user',
'password-reset', 'admin', '--password=env:pass', '--force',
'--password-valid-to=2216-03-13 17:56:57Z') stdout:
updating user admin...
user updated successfully
2016-04-30 18:07:06 DEBUG otopi.plugins.ovirt_engine_setup.ovirt_engine.config.aaajdbc
plugin.execute:941 execute-output: ('/usr/bin/ovirt-aaa-jdbc-tool',
'--db-config=/etc/ovirt-engine/aaa/internal.properties', 'user',
'password-reset', 'admin', '--password=env:pass', '--force',
'--password-valid-to=2216-03-13 17:56:57Z') stderr:
2016-04-30 18:07:06 DEBUG otopi.context context._executeMethod:142 Stage misc
METHOD otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca.Plugin._miscUpgrade
2016-04-30 18:07:06 INFO otopi.plugins.ovirt_engine_setup.ovirt_engine.pki.ca
ca._miscUpgrade:510 Upgrading CA

Full logs of host and engine here:
https://drive.google.com/file/d/0BwoPbcrMv8mvQm9jeDhpZEdRUjg/view?usp=sharing

I can connect via VNC to the engine and see 277 tables in the engine database
(277 rows in the output of the "\d" command).

Can anyone tell me whether I can continue from here without starting from
scratch, and if so, how? I'd also like to understand the reason for this
delay, as the NUC is a physical host with 32 GB of RAM and SSD disks and
should be quite fast... faster than a VM on my laptop where I had no problems
with a similar setup...

As a last question: how do I clean things up in case I have to start from
scratch? I can leave the situation as it is for the moment, so I can work on
the live environment before powering off.

Thanks in advance,
Gianluca

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
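Since the deploy error explicitly says to check the engine-setup log on the
appliance, one low-effort next step (a sketch only; console access and the
exact log file name depend on the deployment) is to watch that log from the
appliance console while the host side waits, and afterwards to locate the
stalled step around the ovirt-aaa-jdbc-tool call seen above.

    # On the appliance (e.g. via the VNC console already used to reach it),
    # watch the setup log while hosted-engine --deploy waits on the host
    tail -f /var/log/ovirt-engine/setup/ovirt-engine-setup-*.log

    # After the fact, find the step surrounding the ten-minute gap
    grep -n 'ovirt-aaa-jdbc-tool' /var/log/ovirt-engine/setup/ovirt-engine-setup-20160430175551-dttt2p.log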
Re: [ovirt-users] VMs becoming non-responsive sporadically
On Sat, Apr 30, 2016 at 11:33 AM, Nicoláswrote: > Hi Nir, > > El 29/04/16 a las 22:34, Nir Soffer escribió: >> >> On Fri, Apr 29, 2016 at 9:17 PM, wrote: >>> >>> Hi, >>> >>> We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues >>> with >>> some VMs being paused because they're marked as non-responsive. Mostly, >>> after a few seconds they recover, but we want to debug precisely this >>> problem so we can fix it consistently. >>> >>> Our scenario is the following: >>> >>> ~495 VMs, of which ~120 are constantly up >>> 3 datastores, all of them iSCSI-based: >>>* ds1: 2T, currently has 276 disks >>>* ds2: 2T, currently has 179 disks >>>* ds3: 500G, currently has 65 disks >>> 7 hosts: All have mostly the same hardware. CPU and memory are currently >>> very lowly used (< 10%). >>> >>>ds1 and ds2 are physically the same backend which exports two 2TB >>> volumes. >>> ds3 is a different storage backend where we're currently migrating some >>> disks from ds1 and ds2. >> >> What the the storage backend behind ds1 and 2? > > > The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 G2. > >>> Usually, when VMs become unresponsive, the whole host where they run gets >>> unresponsive too, so that gives a hint about the problem, my bet is the >>> culprit is somewhere on the host side and not on the VMs side. >> >> Probably the vm became unresponsive because connection to the host was >> lost. > > > I forgot to mention that less commonly we have situations where the host > doesn't get unresponsive but the VMs on it do and they don't become > responsive ever again, so we have to forcibly power them off and start them > on a different host. But in this case the connection with the host doesn't > ever get lost (so basically the host is Up, but any VM run on them is > unresponsive). > > >>> When that >>> happens, the host itself gets non-responsive and only recoverable after >>> reboot, since it's unable to reconnect. >> >> Piotr, can you check engine log and explain why host is not reconnected? >> >>> I must say this is not specific to >>> this oVirt version, when we were using v.3.6.4 the same happened, and >>> it's >>> also worthy mentioning we've not done any configuration changes and >>> everything had been working quite well for a long time. >>> >>> We were monitoring our ds1 and ds2 physical backend to see performance >>> and >>> we suspect we've run out of IOPS since we're reaching the maximum >>> specified >>> by the manufacturer, probably at certain times the host cannot perform a >>> storage operation within some time limit and it marks VMs as >>> unresponsive. >>> That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. When >>> we >>> run out of space on ds3 we'll create more smaller volumes to keep >>> migrating. >>> >>> On the host side, when this happens, we've run repoplot on the vdsm log >>> and >>> I'm attaching the result. Clearly there's a *huge* LVM response time (~30 >>> secs.). >> >> Indeed the log show very slow vgck and vgs commands - these are called >> every >> 5 minutes for checking the vg health and refreshing vdsm lvm cache. >> >> 1. 
starting vgck >> >> Thread-96::DEBUG::2016-04-29 >> 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset >> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices >> { pre >> ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 >> write_cache_state=0 disable_after_error_count=3 filter = [ >> '\''a|/dev/mapper/36000eb3a4f1acbc20043|'\ >> '', '\''r|.*|'\'' ] } global { locking_type=1 >> prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { >> retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-48 >> 9c-8eee-10368647c413 (cwd None) >> >> 2. vgck ends after 55 seconds >> >> Thread-96::DEBUG::2016-04-29 >> 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: = ' >> WARNING: lvmetad is running but disabled. Restart lvmetad before >> enabling it!\n'; = 0 >> >> 3. starting vgs >> >> Thread-96::DEBUG::2016-04-29 >> 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset >> --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices >> { pref >> erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 >> write_cache_state=0 disable_after_error_count=3 filter = [ >> '\''a|/dev/mapper/36000eb3a4f1acbc20043|/de >> >> v/mapper/36000eb3a4f1acbc200b9|/dev/mapper/360014056f0dc8930d744f83af8ddc709|/dev/mapper/WDC_WD5003ABYZ-011FA0_WD-WMAYP0J73DU6|'\'', >> '\''r|.*|'\'' ] } global { >> locking_type=1 prioritise_write_locks=1 wait_for_locks=1 >> use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' >> --noheadings --units b --nosuffix --separator '| >> ' --ignoreskippedcluster -o >> >> uuid,name,attr,size,free,extent_size,extent_count,free_count,tags,vg_mda_size,vg_mda_free,lv_count,pv_count,pv_name >> 5de4a000-a9c4-489c-8eee-10368 >> 647c413
Re: [ovirt-users] VMs becoming non-responsive sporadically
On Sat, Apr 30, 2016 at 7:16 PM,wrote: > El 2016-04-30 16:55, Nir Soffer escribió: >> >> On Sat, Apr 30, 2016 at 11:33 AM, Nicolás wrote: >>> >>> Hi Nir, >>> >>> El 29/04/16 a las 22:34, Nir Soffer escribió: On Fri, Apr 29, 2016 at 9:17 PM, wrote: > > > Hi, > > We're running oVirt 3.6.5.3-1 and lately we're experiencing some issues > with > some VMs being paused because they're marked as non-responsive. Mostly, > after a few seconds they recover, but we want to debug precisely this > problem so we can fix it consistently. > > Our scenario is the following: > > ~495 VMs, of which ~120 are constantly up > 3 datastores, all of them iSCSI-based: >* ds1: 2T, currently has 276 disks >* ds2: 2T, currently has 179 disks >* ds3: 500G, currently has 65 disks > 7 hosts: All have mostly the same hardware. CPU and memory are > currently > very lowly used (< 10%). > >ds1 and ds2 are physically the same backend which exports two 2TB > volumes. > ds3 is a different storage backend where we're currently migrating some > disks from ds1 and ds2. What the the storage backend behind ds1 and 2? >>> >>> >>> >>> The storage backend for ds1 and ds2 is the iSCSI-based HP LeftHand P4000 >>> G2. >>> > Usually, when VMs become unresponsive, the whole host where they run > gets > unresponsive too, so that gives a hint about the problem, my bet is the > culprit is somewhere on the host side and not on the VMs side. Probably the vm became unresponsive because connection to the host was lost. >>> >>> >>> >>> I forgot to mention that less commonly we have situations where the host >>> doesn't get unresponsive but the VMs on it do and they don't become >>> responsive ever again, so we have to forcibly power them off and start >>> them >>> on a different host. But in this case the connection with the host >>> doesn't >>> ever get lost (so basically the host is Up, but any VM run on them is >>> unresponsive). >>> >>> > When that > happens, the host itself gets non-responsive and only recoverable after > reboot, since it's unable to reconnect. Piotr, can you check engine log and explain why host is not reconnected? > I must say this is not specific to > this oVirt version, when we were using v.3.6.4 the same happened, and > it's > also worthy mentioning we've not done any configuration changes and > everything had been working quite well for a long time. > > We were monitoring our ds1 and ds2 physical backend to see performance > and > we suspect we've run out of IOPS since we're reaching the maximum > specified > by the manufacturer, probably at certain times the host cannot perform > a > storage operation within some time limit and it marks VMs as > unresponsive. > That's why we've set up ds3 and we're migrating ds1 and ds2 to ds3. > When > we > run out of space on ds3 we'll create more smaller volumes to keep > migrating. > > On the host side, when this happens, we've run repoplot on the vdsm log > and > I'm attaching the result. Clearly there's a *huge* LVM response time > (~30 > secs.). Indeed the log show very slow vgck and vgs commands - these are called every 5 minutes for checking the vg health and refreshing vdsm lvm cache. 1. 
starting vgck Thread-96::DEBUG::2016-04-29 13:17:48,682::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgck --config ' devices { pre ferred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20043|'\ '', '\''r|.*|'\'' ] } global { locking_type=1 prioritise_write_locks=1 wait_for_locks=1 use_lvmetad=0 } backup { retain_min = 50 retain_days = 0 } ' 5de4a000-a9c4-48 9c-8eee-10368647c413 (cwd None) 2. vgck ends after 55 seconds Thread-96::DEBUG::2016-04-29 13:18:43,173::lvm::290::Storage.Misc.excCmd::(cmd) SUCCESS: = ' WARNING: lvmetad is running but disabled. Restart lvmetad before enabling it!\n'; = 0 3. starting vgs Thread-96::DEBUG::2016-04-29 13:17:11,963::lvm::290::Storage.Misc.excCmd::(cmd) /usr/bin/taskset --cpu-list 0-23 /usr/bin/sudo -n /usr/sbin/lvm vgs --config ' devices { pref erred_names = ["^/dev/mapper/"] ignore_suspended_devices=1 write_cache_state=0 disable_after_error_count=3 filter = [ '\''a|/dev/mapper/36000eb3a4f1acbc20043|/de
[ovirt-users] hosted-engine setup Gluster fails to execute
I'm attempting to host the engine within a GlusterFS Replica 3 storage volume.
During setup, after entering the server and volume, I'm receiving the message
that '/sbin/gluster' failed to execute. Reviewing the gluster cmd log, it looks
as though /sbin/gluster does execute. I can successfully mount the volume on
the host outside of the hosted-engine setup. Any assistance would be
appreciated.

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users
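One thing worth checking by hand is whether the same kind of gluster CLI query
the setup relies on succeeds from the host. The sketch below uses placeholder
server and volume names, and the --remote-host option may behave differently
across GlusterFS releases; comparing its output and exit status against the
gluster cmd log entries mentioned above could narrow down where the failure is
reported.

    # Ask the remote gluster server for the volume info the setup expects
    /sbin/gluster --remote-host=<gluster-server> volume info <volume-name>
    echo "exit status: $?"   # a non-zero status or odd output here is worth
                             # comparing against the gluster cmd log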
Re: [ovirt-users] CINLUG: Virtualization Management, The oVirt Way
On 29/Apr/2016 21:13, "Brian Proffitt" wrote:
>
> The world of virtualization seems to be getting passed by with all of the
> advances in containers and container management technology. But don't count
> virtual machines out just yet. Large-scale, centralized management for server
> and desktop virtual machines is available now, with the free and open source
> software platform oVirt. This KVM-based management tool provides
> production-ready VM management to organizations large and small, and is used
> by universities, businesses, and even major airports. Join Red Hat's Brian
> Proffitt on a tour of oVirt plus a fun look at how VM management and cloud
> computing *do* work together.
>
> http://www.meetup.com/CINLUG/events/230746101/

Interesting... is it possible to record the event?

Gianluca

___
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users