[ovirt-users] Re: Node upgrade to 4.4
Vincent, this document will be useful: https://www.ovirt.org/documentation/upgrade_guide/#Upgrading_the_Manager_to_4-4_4-3_SHE

On Wed, Sep 23, 2020, 3:55 AM Vincent Royer wrote:
> I have 3 nodes running node ng 4.3.9 with a gluster/hci cluster. How do I
> upgrade to 4.4? Is there a guide?

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ONMS74F4DSDLNLM2PSGIBARYOBOUCQOZ/
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
The email client integration with this forum is a bit limited. I was told that through this web interface I could post images, since embedded images in email get scraped out, but I'm not seeing how that is done; it seems to be text-only.

1) "I would give the engine a 'Windows'-style fix (a.k.a. reboot)" — how does one restart just the ovirt-engine?

2) In the shell I now see 3 nodes, each with one brick for data, vmstore, and engine (plus an ISO volume I am trying to create), all online and replicating. But the GUI shows thor (the first server, running the engine) offline and needing to be reloaded. Volumes now show two bricks: one online, one offline, and no option to start / force restart.

3) I have tried a graceful reboot several times to see whether the startup sequence was the issue. I tore down the VLANs and bridges to make the network flat: 1 x 1 Gb management, 1 x 10 Gb storage. SSH between nodes is fine and a copy test was great. I don't think it is the nodes.

4) To the question "did I add the third node later": I would attach the deployment guide I am building, but I can't do that in this forum. It is as simple as I can make it: 3 generic Intel servers, each with 1 x boot drive, 1 x 512 GB SSD, 2 x 1 TB SSD. Wipe all data and configuration, fresh CentOS 8 minimal install, set up SSH, set up basic networking, install Cockpit, run the HCI wizard for all three nodes. That is all.

I am trying to learn and support the concept of oVirt as a viable platform, but I'm still working through learning how to root-cause, kick tires, and debug / recover when things go down, as they will. Help is appreciated. My main concern is the gap between what the engine sees and what the CLI shows. Can someone show me where to get logs? The GUI log when I try to "activate" the thor server — "Status of host thor was set to NonOperational." / "Gluster command [] failed on server ." — is very unhelpful.
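Since the thread never answers question (1) or the log question directly: a minimal sketch of the usual answers, assuming the stock oVirt service name and log locations (verify them on your version). The snippet only assigns and echoes paths, so it is safe to run anywhere; the real restart command is shown in a comment:

```shell
# Restart only the engine service, inside the engine VM (for HCI, that is
# the HostedEngine VM, not the host):
#   systemctl restart ovirt-engine
#
# Logs worth tailing when the GUI and CLI disagree (standard locations,
# stated as an assumption to check on your install):
engine_log="/var/log/ovirt-engine/engine.log"   # on the engine VM
vdsm_log="/var/log/vdsm/vdsm.log"               # on each host (e.g. thor)
gluster_dir="/var/log/glusterfs"                # glusterd and brick logs
echo "check: $engine_log $vdsm_log $gluster_dir"
```

The vdsm log on the NonOperational host is usually where the real text behind "Gluster command [] failed on server" shows up.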
[ovirt-users] Node upgrade to 4.4
I have 3 nodes running node ng 4.3.9 with a gluster/hci cluster. How do I upgrade to 4.4? Is there a guide?
[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter
On Tue, Sep 22, 2020 at 11:23 PM Strahil Nikolov wrote:
>
> In my setup, I got no filter at all (yet, I'm on 4.3.10):
> [root@ovirt ~]# lvmconfig | grep -i filter

We create the lvm filter automatically since 4.4.1. If you don't use block storage (FC, iSCSI), you don't need an lvm filter. If you do, you can create it manually using vdsm-tool.

> [root@ovirt ~]#
>
> P.S.: Don't forget to 'dracut -f' due to the fact that the initramfs has a local copy of the lvm.conf

Good point

> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 23:05:29 Гринуич+3, Jeremey Wise написа:
>
> Correct.. on wwid
>
> I do want to make clear here that to get around the error you must ADD (not remove) drives to /etc/lvm/lvm.conf so oVirt Gluster can complete setup of drives.
>
> [root@thor log]# cat /etc/lvm/lvm.conf | grep filter
> # Broken for gluster in oVirt
> #filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "r|.*|"]
> # working for gluster wizard in oVirt
> filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]
>
> On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov wrote:
>
> Obtaining the wwid is not exactly correct.
[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter
In my setup, I got no filter at all (yet, I'm on 4.3.10):
[root@ovirt ~]# lvmconfig | grep -i filter
[root@ovirt ~]#

P.S.: Don't forget to 'dracut -f' due to the fact that the initramfs has a local copy of the lvm.conf

Best Regards,
Strahil Nikolov

В вторник, 22 септември 2020 г., 23:05:29 Гринуич+3, Jeremey Wise написа:

Correct.. on wwid

I do want to make clear here that to get around the error you must ADD (not remove) drives to /etc/lvm/lvm.conf so oVirt Gluster can complete setup of drives.

[root@thor log]# cat /etc/lvm/lvm.conf | grep filter
# Broken for gluster in oVirt
#filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "r|.*|"]
# working for gluster wizard in oVirt
filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]

On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov wrote:
> Obtaining the wwid is not exactly correct.
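The "working" filter quoted in this thread can be assembled mechanically: keep the existing accept entries for the PV uuids, append an accept entry for the new disk's `/dev/disk/by-id/wwn-*` path, and keep the final reject last. A sketch that just builds the line as a string — the uuid/wwn values are the examples from this thread, not something to copy verbatim:

```shell
# Accept entries already present in lvm.conf (example values from the thread):
accepts='"a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|"'
# The disk the gluster wizard should be allowed to touch (example wwn):
new_disk='"a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|"'
# Accepts first, catch-all reject last -- order matters for LVM filters.
filter="filter = [ $accepts, $new_disk, \"r|.*|\" ]"
echo "$filter"
# After editing the real /etc/lvm/lvm.conf, run `dracut -f`, since the
# initramfs carries its own copy of lvm.conf (per Strahil's note above).
```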
[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter
On Tue, Sep 22, 2020 at 11:05 PM Jeremey Wise wrote:
>
> Correct.. on wwid
>
> I do want to make clear here that to get around the error you must ADD (not remove) drives to /etc/lvm/lvm.conf so oVirt Gluster can complete setup of drives.
>
> [root@thor log]# cat /etc/lvm/lvm.conf | grep filter
> # Broken for gluster in oVirt
> #filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "r|.*|"]
> # working for gluster wizard in oVirt
> filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]

Yes, you need to add the devices gluster is going to use to the filter. The easiest way is to remove the filter before you install gluster, and then create the filter using:

    vdsm-tool config-lvm-filter

It should add all the devices needed for the mounted logical volumes automatically. Please file a bug if it does not do this.

On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov wrote:
>
> Obtaining the wwid is not exactly correct.
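Nir's recommendation above, written out as a dry-run checklist (it folds in Strahil's earlier `dracut -f` reminder; the `config-lvm-filter` verb is the one named in this thread — run the real commands as root on the host, this snippet only prints them):

```shell
steps=$(cat <<'EOF'
1. Remove/comment any hand-written filter= line in /etc/lvm/lvm.conf
2. vdsm-tool config-lvm-filter   # proposes a filter covering mounted LVs
3. dracut -f                     # initramfs keeps its own copy of lvm.conf
EOF
)
printf '%s\n' "$steps"
```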
[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter
On Tue, Sep 22, 2020 at 10:57 PM Strahil Nikolov wrote:
>
> Obtaining the wwid is not exactly correct.

It is correct - for nvme devices, see:
https://github.com/oVirt/vdsm/blob/353e7b1e322aa02d4767b6617ed094be0643b094/lib/vdsm/storage/lvmfilter.py#L300

This matches the way that multipath looks up device wwids.

> You can identify them via:
>
> multipath -v4 | grep 'got wwid of'
>
> Short example:
> [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
> Sep 22 22:55:58 | nvme0n1: got wwid of 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
> Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
> Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
> Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
> Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'

There are 2 issues with this:
- It detects and sets up maps for all devices in the system, which is unwanted when you want to blacklist devices
- It depends on debug output that may change, not on a public documented API

You can use these commands instead.

Show devices that multipath does not use yet, without setting up maps:

    $ sudo multipath -d

Show devices that multipath is already using:

    $ sudo multipath -ll

But I'm not sure if these commands work if the dm_multipath kernel module is not loaded or multipathd is not running. Getting the device wwid using udevadm works regardless of multipathd/the dm_multipath module.

> Of course if you are planning to use only gluster it could be far easier to set:
>
> [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
> blacklist {
>     devnode "*"
> }
>
> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer написа:
>
> On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise wrote:
> >
> > Agree about an NVMe Card being put under mpath control.
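Putting the udevadm approach from this exchange together: a sketch that stages a per-device multipath blacklist. It writes to a temp file rather than `/etc/multipath/conf.d/local.conf`, and the wwid is the example value from the thread — on a real host you would read it with the commented-out `udevadm` line:

```shell
# On a real host, read the wwid from udev:
#   wwid=$(udevadm info -q property /dev/nvme0n1 | sed -n 's/^ID_WWN=//p')
wwid="eui.5cd2e42a81a11f69"   # example value from the thread
conf=$(mktemp)                # stands in for /etc/multipath/conf.d/local.conf
printf 'blacklist {\n    wwid "%s"\n}\n' "$wwid" > "$conf"
cat "$conf"
# Apply without restarting the daemon:
#   multipathd reconfigure
```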
[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter
Correct.. on wwid

I do want to make clear here that to get around the error you must ADD (not remove) drives to /etc/lvm/lvm.conf so oVirt Gluster can complete setup of drives.

[root@thor log]# cat /etc/lvm/lvm.conf | grep filter
# Broken for gluster in oVirt
#filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "r|.*|"]
# working for gluster wizard in oVirt
filter = ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", "a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]

On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov wrote:
> Obtaining the wwid is not exactly correct.
> You can identify them via:
>
> multipath -v4 | grep 'got wwid of'
>
> Short example:
> [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
> Sep 22 22:55:58 | nvme0n1: got wwid of 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
> Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
> Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
> Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
> Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'
>
> Of course if you are planning to use only gluster it could be far easier to set:
>
> [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
> blacklist {
>     devnode "*"
> }
>
> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer < nsof...@redhat.com> написа:
>
> On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise wrote:
> >
> > Agree about an NVMe Card being put under mpath control.
>
> NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
> https://bugzilla.redhat.com/1498546
>
> Of course when the NVMe device is local there is no point to use it
> via multipath.
> To avoid this, you need to blacklist the devices like this: > > 1. Find the device wwid > > For NVMe, you need the device ID_WWN: > > $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN > ID_WWN=eui.5cd2e42a81a11f69 > > 2. Add local blacklist file: > > $ mkdir /etc/multipath/conf.d > $ cat /etc/multipath/conf.d/local.conf > blacklist { > wwid "eui.5cd2e42a81a11f69" > } > > 3. Reconfigure multipath > > $ multipathd reconfigure > > Gluster should do this for you automatically during installation, but > it does not > you can do this manually. > > > I have not even gotten to that volume / issue. My guess is something > weird in CentOS / 4.18.0-193.19.1.el8_2.x86_64 kernel with NVMe block > devices. > > > > I will post once I cross bridge of getting standard SSD volumes working > > > > On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov > wrote: > >> > >> Why is your NVME under multipath ? That doesn't make sense at all . > >> I have modified my multipath.conf to block all local disks . Also > ,don't forget the '# VDSM PRIVATE' line somewhere in the top of the file. > >> > >> Best Regards, > >> Strahil Nikolov > >> > >> > >> > >> > >> > >> > >> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise < > jeremey.w...@gmail.com> написа: > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> > >> vdo: ERROR - Device /dev/sdc excluded by a filter > >> > >> > >> > >> > >> Other server > >> vdo: ERROR - Device > /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1 > excluded by a filter. > >> > >> > >> All systems when I go to create VDO volume on blank drives.. I get this > filter error. All disk outside of the HCI wizard setup are now blocked > from creating new Gluster volume group. 
> >> > >> Here is what I see in /dev/lvm/lvm.conf |grep filter > >> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter > >> filter = > ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|", > "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|", > "r|.*|"] > >> > >> [root@odin ~]# ls -al /dev/disk/by-id/ > >> total 0 > >> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 . > >> drwxr-xr-x. 6 root root 120 Sep 18 14:32 .. > >> lrwxrwxrwx. 1 root root9 Sep 18 22:40 > ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda > >> lrwxrwxrwx. 1 root root 10 Sep 18 22:40 > ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1 > >> lrwxrwxrwx. 1 root root 10 Sep 18 22:40 > ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2 > >> lrwxrwxrwx. 1 root root9 Sep 18 14:32 > ata-Micron_1100_MTFDDAV512TBN_17401F699137 -> ../../sdb > >> lrwxrwxrwx. 1 root root9 Sep 18 22:40 > ata-WDC_WDS100T2B0B-00YS70_183533804564 -> ../../sdc > >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 dm-name-cl-home -> ../../dm-2 > >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 dm-name-cl-root -> ../../dm-0 > >> lrwxrwxrwx. 1 root root 10
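Strahil's gluster-only alternative quoted above, staged the same way (a temp file stands in for `/etc/multipath/conf.d/blacklist.conf`; this blacklists every local devnode, so only do it on hosts that will never use FC/iSCSI multipath storage):

```shell
conf=$(mktemp)   # stands in for /etc/multipath/conf.d/blacklist.conf
cat > "$conf" <<'EOF'
blacklist {
    devnode "*"
}
EOF
cat "$conf"
# Then: multipathd reconfigure. If you edit multipath.conf itself instead,
# keep the '# VDSM PRIVATE' marker near the top (per the thread) so vdsm
# does not overwrite your changes.
```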
[ovirt-users] Re: console breaks with signed SSL certs
Hmm. That seems to be half the battle. I updated the files in /etc/pki/vdsm/libvirt-spice, and the debug output from remote-viewer changes.. but it's not entirely happy.

(remote-viewer.exe:15808): Spice-WARNING **: 12:55:01.188: ../subprojects/spice-common/common/ssl_verify.c:444:openssl_verify: Error in certificate chain verification: unable to get issuer certificate (num=2:depth1:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2)
(remote-viewer.exe:15808): GSpice-WARNING **: 12:55:01.189: main-1:0: SSL_connect: error:0001:lib(0):func(0):reason(1)
(remote-viewer.exe:15808): virt-viewer-DEBUG: 12:55:01.192: Destroy SPICE channel SpiceMainChannel 0
(remote-viewer.exe:15808): virt-viewer-DEBUG: 12:55:01.192: zap main channel

I put the cert itself in server-cert.pem, the key in server-key.pem, and the bundle file from GoDaddy (which they call "gd_bundle-g2-g1") in ca-cert.pem, but it's still complaining about an error in the chain?

I've been updating a whole bunch of SSL-requiring systems this month, and notice that one or two systems like a different order to the multiple-cert CA stack. Does libvirt-spice require yet another, different stacking? Can you tell me what needs to be in each, and in what order, please? :-/
[ovirt-users] Re: console breaks with signed SSL certs
Most probably there is an option to tell it (I mean oVirt) the exact keys to be used. Yet, give the engine a gentle push and reboot it - just to be sure you are not chasing a ghost. I'm using self-signed certs and I can't help much in this case.

Best Regards,
Strahil Nikolov

В вторник, 22 септември 2020 г., 22:54:28 Гринуич+3, Philip Brown написа:

Thanks for the initial start, Strahil. My desktop is Windows, but I took apart the console.vv file, and these are my findings:

In the console.vv file there is a valid CA cert, which is for the signing CA of our valid wildcard SSL cert. However, when I connected to the target host on the tls-port, I noted that it is still using the original self-signed CA generated by ovirt-engine for the host. Digging with lsof says that the process is qemu-kvm. Looking at the command line, that has x509-dir=/etc/pki/vdsm/libvirt-spice

So... I guess I need to update server.key, server.cert and ca-cert in there? Except there's a whole lot of '*key.pem' files under the /etc/pki directory tree. Suggestions on which is best to update? For example, there is also /etc/pki/vdsm/keys/vdsmkey.pem

- Original Message -
From: "Strahil Nikolov"
To: "users" , "Philip Brown"
Sent: Tuesday, September 22, 2020 12:09:55 PM
Subject: Re: [ovirt-users] Re: console breaks with signed SSL certs

I assume you are working on Linux (for Windows you will need to ssh to a Linux box or even one of the Hosts). When you download the 'console.vv' file for a Spice connection, you will have to note several things:
- host
- tls-port (not the plain 'port=' !!!)
- ca

Process the CA and replace the '\n' with new lines. Then you can run:

openssl s_client -connect <host>:<tls-port> -CAfile <ca.pem> -showcerts

Then you can inspect the certificate chain. I would then grep for the strings from openssl in the engine.
In my case I find these containing the line with the 'issuer': /etc/pki/ovirt-engine/certs/websocket-proxy.cer /etc/pki/ovirt-engine/certs/apache.cer /etc/pki/ovirt-engine/certs/reports.cer /etc/pki/ovirt-engine/certs/imageio-proxy.cer /etc/pki/ovirt-engine/certs/ovn-ndb.cer /etc/pki/ovirt-engine/certs/ovn-sdb.cer /etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer Happy Hunting! Best Regards, Strahil Nikolov В вторник, 22 септември 2020 г., 21:52:10 Гринуич+3, Philip Brown написа: More detail on the problem. after starting remote-viewer --debug, I get (remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: New spice channel 0608B240 SpiceMainChannel 0 (remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: notebook show status 03479130 (remote-viewer.exe:18308): Spice-WARNING **: 11:45:30.691: ../subprojects/spice-common/common/ssl_verify.c:444:openssl_verify: Error in certificate chain verification: self signed certificate in certificate chain (num=19:depth1:/C=US/O=xx.65101) (remote-viewer.exe:18308): GSpice-WARNING **: 11:45:30.692: main-1:0: SSL_connect: error:0001:lib(0):func(0):reason(1) (remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.693: Destroy SPICE channel SpiceMainChannel 0 So it seems like there's some additional thing that needs telling to use the official signed cert. Any clues for me please? 
[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter
Obtaining the wwid is not exactly correct. You can identify them via: multipath -v4 | grep 'got wwid of'

Short example:
[root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
Sep 22 22:55:58 | nvme0n1: got wwid of 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'

Of course, if you are planning to use only gluster, it could be far easier to set:
[root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
blacklist {
    devnode "*"
}

Best Regards,
Strahil Nikolov

В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer написа:

On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise wrote:
>
> Agree about an NVMe Card being put under mpath control.

NVMe can be used via multipath, this is a new feature added in RHEL 8.1: https://bugzilla.redhat.com/1498546

Of course when the NVMe device is local there is no point to use it via multipath. To avoid this, you need to blacklist the devices like this:

1. Find the device wwid

For NVMe, you need the device ID_WWN:

$ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
ID_WWN=eui.5cd2e42a81a11f69

2. Add a local blacklist file:

$ mkdir /etc/multipath/conf.d
$ cat /etc/multipath/conf.d/local.conf
blacklist {
    wwid "eui.5cd2e42a81a11f69"
}

3. Reconfigure multipath

$ multipathd reconfigure

Gluster should do this for you automatically during installation, but if it does not, you can do this manually.

> I have not even gotten to that volume / issue. My guess is something weird
> in CentOS / 4.18.0-193.19.1.el8_2.x86_64 kernel with NVMe block devices.
>
> I will post once I cross bridge of getting standard SSD volumes working
>
> On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov wrote:
>>
>> Why is your NVME under multipath ? That doesn't make sense at all .
>> I have modified my multipath.conf to block all local disks . Also ,don't >> forget the '# VDSM PRIVATE' line somewhere in the top of the file. >> >> Best Regards, >> Strahil Nikolov >> >> >> >> >> >> >> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise >> написа: >> >> >> >> >> >> >> >> >> >> >> vdo: ERROR - Device /dev/sdc excluded by a filter >> >> >> >> >> Other server >> vdo: ERROR - Device >> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1 >> excluded by a filter. >> >> >> All systems when I go to create VDO volume on blank drives.. I get this >> filter error. All disk outside of the HCI wizard setup are now blocked from >> creating new Gluster volume group. >> >> Here is what I see in /dev/lvm/lvm.conf |grep filter >> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter >> filter = >> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|", >> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|", >> "r|.*|"] >> >> [root@odin ~]# ls -al /dev/disk/by-id/ >> total 0 >> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 . >> drwxr-xr-x. 6 root root 120 Sep 18 14:32 .. >> lrwxrwxrwx. 1 root root 9 Sep 18 22:40 >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda >> lrwxrwxrwx. 1 root root 10 Sep 18 22:40 >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1 >> lrwxrwxrwx. 1 root root 10 Sep 18 22:40 >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2 >> lrwxrwxrwx. 1 root root 9 Sep 18 14:32 >> ata-Micron_1100_MTFDDAV512TBN_17401F699137 -> ../../sdb >> lrwxrwxrwx. 1 root root 9 Sep 18 22:40 >> ata-WDC_WDS100T2B0B-00YS70_183533804564 -> ../../sdc >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 dm-name-cl-home -> ../../dm-2 >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 dm-name-cl-root -> ../../dm-0 >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 dm-name-cl-swap -> ../../dm-1 >> lrwxrwxrwx. 
1 root root 11 Sep 18 16:40 >> dm-name-gluster_vg_sdb-gluster_lv_data -> ../../dm-11 >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 >> dm-name-gluster_vg_sdb-gluster_lv_engine -> ../../dm-6 >> lrwxrwxrwx. 1 root root 11 Sep 18 16:40 >> dm-name-gluster_vg_sdb-gluster_lv_vmstore -> ../../dm-12 >> lrwxrwxrwx. 1 root root 10 Sep 18 23:35 >> dm-name-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001 >> -> ../../dm-3 >> lrwxrwxrwx. 1 root root 10 Sep 18 23:49 >> dm-name-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1 >> -> ../../dm-4 >> lrwxrwxrwx. 1 root root 10 Sep 18 14:32 dm-name-vdo_sdb -> ../../dm-5 >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 >> dm-uuid-LVM-GpvYIuypEfrR7nEDn5uHPenKwjrsn4ADc49gc6PWLRBCoJ2B3JC9tDJejyx5eDPT >> -> ../../dm-1 >> lrwxrwxrwx. 1 root root 10 Sep 18 16:40 >>
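[Nir's three steps, collapsed into a copy-pasteable sketch. It is written against a scratch directory so it can be tried safely; on a real host CONF_D would be /etc/multipath/conf.d, and the wwid would come from the udevadm query shown above rather than the example value used here.]

```shell
# Sketch of the local multipath blacklist setup; CONF_D and the wwid are
# illustrative. On a real host, get the wwid first:
#   udevadm info -q property /dev/nvme0n1 | grep ID_WWN
CONF_D=${CONF_D:-/tmp/multipath-conf.d}   # real hosts: /etc/multipath/conf.d
WWID=${WWID:-eui.5cd2e42a81a11f69}        # example wwid from this thread

mkdir -p "$CONF_D"
cat > "$CONF_D/local.conf" <<EOF
blacklist {
    wwid "$WWID"
}
EOF
cat "$CONF_D/local.conf"

# On a real host, apply it without rebooting:
#   multipathd reconfigure
```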
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
oVirt uses the "/rhev/mnt... mountpoints. Do you have those (for each storage domain)? Here is an example from one of my nodes:

[root@ovirt1 ~]# df -hT | grep rhev
gluster1:/engine fuse.glusterfs 100G  19G  82G 19% /rhev/data-center/mnt/glusterSD/gluster1:_engine
gluster1:/fast4  fuse.glusterfs 100G  53G  48G 53% /rhev/data-center/mnt/glusterSD/gluster1:_fast4
gluster1:/fast1  fuse.glusterfs 100G  56G  45G 56% /rhev/data-center/mnt/glusterSD/gluster1:_fast1
gluster1:/fast2  fuse.glusterfs 100G  56G  45G 56% /rhev/data-center/mnt/glusterSD/gluster1:_fast2
gluster1:/fast3  fuse.glusterfs 100G  55G  46G 55% /rhev/data-center/mnt/glusterSD/gluster1:_fast3
gluster1:/data   fuse.glusterfs 2.4T 535G 1.9T 23% /rhev/data-center/mnt/glusterSD/gluster1:_data

Best Regards,
Strahil Nikolov

В вторник, 22 септември 2020 г., 19:44:54 Гринуич+3, Jeremey Wise написа:

Yes. And at one time it was fine. I did a graceful shutdown.. and after booting it always seems to now have an issue with the one server... of course the one hosting the ovirt-engine :P

# Three nodes in cluster
# Error when you hover over node
# When I select node and choose "activate"
# Gluster is working fine... it is oVirt that is confused.

[root@medusa vmstore]# mount | grep media/vmstore
medusast.penguinpages.local:/vmstore on /media/vmstore type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
[root@medusa vmstore]# echo > /media/vmstore/test.out
[root@medusa vmstore]# ssh -f thor 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f odin 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f medusa 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# cat /media/vmstore/test.out
thor.penguinpages.local
odin.penguinpages.local
medusa.penguinpages.local

Ideas to fix oVirt?

On Tue, Sep 22, 2020 at 10:42 AM Strahil Nikolov wrote:
> By the way, did you add the third host in the oVirt ?
> > If not, maybe that is the real problem :)
>
> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 17:23:28 Гринуич+3, Jeremey Wise написа:
>
> It's like oVirt thinks there are only two nodes in gluster replication
>
> # Yet it is clear the CLI shows three bricks.
> [root@medusa vms]# gluster volume status vmstore
> Status of volume: vmstore
> Gluster process TCP Port RDMA Port Online Pid
> --
> Brick thorst.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154 0 Y 9444
> Brick odinst.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154 0 Y 3269
> Brick medusast.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154 0 Y 7841
> Self-heal Daemon on localhost N/A N/A Y 80152
> Self-heal Daemon on odinst.penguinpages.local N/A N/A Y 141750
> Self-heal Daemon on thorst.penguinpages.local N/A N/A Y 245870
>
> Task Status of Volume vmstore
> --
> There are no active volume tasks
>
> How do I get oVirt to re-establish reality to what Gluster sees?
>
> On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov wrote:
>> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3
>> bricks up, but usually it was a UI issue and you go to the UI and mark a
>> "force start" which will try to start any bricks that were down (won't
>> affect gluster) and will wake up the UI task to verify brick status again.
>>
>> https://github.com/gluster/gstatus is a good one to verify your cluster
>> health, yet a human's touch is priceless in any kind of technology.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise написа:
>>
>> When I posted last.. in the thread I pasted a rolling restart. And... now
>> it is replicating.
>>
>> oVirt still showing wrong. BUT.. I did my normal test from each of the
>> three nodes.
>> >> 1) Mount Gluster file system with localhost as primary and other two as >> tertiary to local mount (like a client would do) >> 2) run test file create Ex: echo $HOSTNAME >> /media/glustervolume/test.out >> 3) repeat from each node then read back that all are in sync. >> >> I REALLY hate reboot (restart) as a fix. I need to get better with root >>
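[The three-step check above can be wrapped into a small script. This is a hedged sketch: the node names and mountpoint are the ones from this thread, and it defaults to a scratch directory so the logic can be tried without a gluster mount. On a real cluster, set MOUNT to the gluster mountpoint and use the ssh form shown in the comment.]

```shell
# Replication smoke test, following steps 1-3 above.
# MOUNT defaults to a scratch dir so this runs anywhere; on a real
# cluster use the gluster mountpoint, e.g. MOUNT=/media/vmstore.
MOUNT=${MOUNT:-/tmp/vmstore-demo}
NODES="thor odin medusa"              # example node names from the thread

mkdir -p "$MOUNT"
: > "$MOUNT/test.out"
for n in $NODES; do
    # On a real cluster this write would run remotely:
    #   ssh "$n" "echo \$HOSTNAME >> $MOUNT/test.out"
    echo "$n" >> "$MOUNT/test.out"
done

# Every node listed once => writes from all nodes landed on the volume
sort "$MOUNT/test.out"
```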
[ovirt-users] Re: oVirt - KVM QCow2 Import
On Tue, Sep 22, 2020 at 4:18 AM Jeremey Wise wrote: > > > Well.. to know how to do it with Curl is helpful.. but I think I did > > [root@odin ~]# curl -s -k --user admin@internal:blahblah > https://ovirte01.penguinpages.local/ovirt-engine/api/storagedomains/ |grep > '' > data > hosted_storage > ovirt-image-repository > > What I guess I did is translated that field --sd-name my-storage-domain \ > to " volume" name... My question is .. where do those fields come from? And > which would you typically place all your VMs into? > > > > > I just took a guess.. and figured "data" sounded like a good place to stick > raw images to build into VM... > > [root@medusa thorst.penguinpages.local:_vmstore]# python3 > /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py --engine-url > https://ovirte01.penguinpages.local/ --username admin@internal > --password-file > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirt.password > --cafile > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirte01_pki-resource.cer > --sd-name data --disk-sparse > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/ns02.qcow2 > Checking image... > Image format: qcow2 > Disk format: cow > Disk content type: data > Disk provisioned size: 21474836480 > Disk initial size: 11574706176 > Disk name: ns02.qcow2 > Disk backup: False > Connecting... > Creating disk... > Disk ID: 9ccb26cf-dd4a-4c9a-830c-ee084074d7a1 > Creating image transfer... > Transfer ID: 3a382f0b-1e7d-4397-ab16-4def0e9fe890 > Transfer host name: medusa > Uploading image... > [ 100.00% ] 20.00 GiB, 249.86 seconds, 81.97 MiB/s > Finalizing image transfer... 
> Upload completed successfully > [root@medusa thorst.penguinpages.local:_vmstore]# python3 > /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py --engine-url > https://ovirte01.penguinpages.local/ --username admin@internal > --password-file > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirt.password > --cafile > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirte01_pki-resource.cer > --sd-name data --disk-sparse > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/ns02_v^C > [root@medusa thorst.penguinpages.local:_vmstore]# ls > example.log f118dcae-6162-4e9a-89e4-f30ffcfb9ccf ns02_20200910.tgz > ns02.qcow2 ns02_var.qcow2 > [root@medusa thorst.penguinpages.local:_vmstore]# python3 > /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py --engine-url > https://ovirte01.penguinpages.local/ --username admin@internal > --password-file > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirt.password > --cafile > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirte01_pki-resource.cer > --sd-name data --disk-sparse > /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/ns02_var.qcow2 > Checking image... > Image format: qcow2 > Disk format: cow > Disk content type: data > Disk provisioned size: 107374182400 > Disk initial size: 107390828544 > Disk name: ns02_var.qcow2 > Disk backup: False > Connecting... > Creating disk... > Disk ID: 26def4e7-1153-417c-88c1-fd3dfe2b0fb9 > Creating image transfer... > Transfer ID: 41518eac-8881-453e-acc0-45391fd23bc7 > Transfer host name: medusa > Uploading image... > [ 16.50% ] 16.50 GiB, 556.42 seconds, 30.37 MiB/s > > Now with those ID numbers and that it kept its name (very helpful)... I am > able to re-constitute the VM > > > VM boots fine. Fixing VLANs and manual macs on vNICs.. but this process > worked fine. > > Thanks for input. 
Would be nice to have a GUI "upload" via http into the system > :) We have upload via the GUI, but from your mail I understood the images are on the hypervisor, so copying them to the machine running the browser would be a waste of time. Go to Storage > Disks and click "Upload" or "Download". But this is less efficient, less correct, and does not support all the features like converting image format and controlling sparseness. For uploading and downloading qcow2 images it should be fine, but if you have a qcow2 and want to upload to raw format, this can be done only using the API, for example with upload_disk.py. > On Mon, Sep 21, 2020 at 2:19 PM Nir Soffer wrote: >> >> On Mon, Sep 21, 2020 at 8:37 PM penguin pages wrote: >> > >> > >> > I pasted old / file path not right example above.. But here is a cleaner >> > version with error I am trying to root cause >> > >> > [root@odin vmstore]# python3 >> > /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py >> > --engine-url https://ovirte01.penguinpages.local/ --username >> > admin@internal --password-file >> > /gluster_bricks/vmstore/vmstore/.ovirt.password --cafile >> > /gluster_bricks/vmstore/vmstore/.ovirte01_pki-resource.cer --sd-name >> > vmstore --disk-sparse /gluster_bricks/vmstore/vmstore/ns01.qcow2 >> > Checking image... >>
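[On the earlier "where do those names come from" question: --sd-name is the name of a storage domain as returned by the engine's REST API. "data" (the default data domain) is where VM disks normally go; hosted_storage is reserved for the hosted-engine VM, and ovirt-image-repository is not an upload target. A hedged offline sketch of pulling the names out of the API response, using canned XML here; on a live engine the curl shown earlier in the thread produces the real document:]

```shell
# Canned copy of (roughly) what the storagedomains API returns; on a real
# engine you would generate it with something like:
#   curl -s -k --user admin@internal:PASSWORD \
#       https://ovirte01.penguinpages.local/ovirt-engine/api/storagedomains
cat > /tmp/storagedomains.xml <<'EOF'
<storage_domains>
  <storage_domain><name>data</name></storage_domain>
  <storage_domain><name>hosted_storage</name></storage_domain>
  <storage_domain><name>ovirt-image-repository</name></storage_domain>
</storage_domains>
EOF

# Pull out the <name> values; any data domain here is a legal --sd-name
sed -n 's|.*<name>\(.*\)</name>.*|\1|p' /tmp/storagedomains.xml
```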
[ovirt-users] console breaks with signed SSL certs
Chrome didn't want to talk AT ALL to oVirt with self-signed certs (because HSTS is enabled). So I installed signed wildcard certs on the engine, and the nodes, following http://187.1.81.65/ovirt-engine/docs/manual/en-US/html/Administration_Guide/appe-Red_Hat_Enterprise_Virtualization_and_SSL.html and https://cockpit-project.org/guide/172/https.html and Chrome is happy now... except that suddenly, consoles refuse to work. And there are no useful errors that I see, other than "Unable to connect to the graphic server" from the remote viewer app. I see someone not too long ago had the exact same problem, in https://www.mail-archive.com/users@ovirt.org/msg58814.html but.. no answer was given to him? Help please
--
Philip Brown | Sr. Linux System Administrator | Medata, Inc.
5 Peters Canyon Rd Suite 250 Irvine CA 92606
Office 714.918.1310 | Fax 714.918.1325
pbr...@medata.com | www.medata.com
[ovirt-users] Re: Upgrade Ovirt from 4.2 to 4.4 on CentOS7.4
oVirt 4.4 requires EL8.2, so no, you cannot go to 4.4 without upgrading the OS to EL8. Yet, you can still bump the version to 4.3.10, which is still EL7-based and works quite well.

Best Regards,
Strahil Nikolov

В вторник, 22 септември 2020 г., 17:39:52 Гринуич+3, написа:

Hi everyone, I am writing for support regarding the oVirt upgrade. I am using oVirt version 4.2 on the CentOS 7.4 operating system. The latest release of the oVirt engine is 4.4, which is available for CentOS 8. Can I upgrade without upgrading the operating system to CentOS 8? If I am not mistaken, it is not possible to upgrade in place from CentOS 7 to CentOS 8. Can anyone give me some advice? Thank you all!!!
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
By the way, did you add the third host in oVirt?

If not, maybe that is the real problem :)

Best Regards,
Strahil Nikolov

В вторник, 22 септември 2020 г., 17:23:28 Гринуич+3, Jeremey Wise написа:

It's like oVirt thinks there are only two nodes in gluster replication

# Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                                                   TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_bricks/vmstore/vmstore   49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_bricks/vmstore/vmstore   49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_bricks/vmstore/vmstore 49154     0          Y       7841
Self-heal Daemon on localhost                                     N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.local                     N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.local                     N/A       N/A        Y       245870

Task Status of Volume vmstore
--------------------------------------------------------------------------------------------------
There are no active volume tasks

How do I get oVirt to re-establish reality to what Gluster sees?

On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov wrote:
> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3
> bricks up, but usually it was a UI issue and you go to the UI and mark a "force
> start" which will try to start any bricks that were down (won't affect
> gluster) and will wake up the UI task to verify brick status again.
>
> https://github.com/gluster/gstatus is a good one to verify your cluster
> health, yet a human's touch is priceless in any kind of technology.
>
> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise написа:
>
> When I posted last.. in the thread I pasted a rolling restart. And... now
> it is replicating.
>
> oVirt still showing wrong. BUT.. I did my normal test from each of the
> three nodes.
>
> 1) Mount Gluster file system with localhost as primary and other two as
> tertiary to local mount (like a client would do)
> 2) run a test file create, e.g.: echo $HOSTNAME >> /media/glustervolume/test.out
> 3) repeat from each node, then read back that all are in sync.
>
> I REALLY hate reboot (restart) as a fix. I need to get better with root
> cause of gluster issues if I am going to trust it. Before, when I manually
> made the volumes and it was simply (vdo + gluster), the worst case was that
> gluster would break... but I could always go into the "brick" path and copy
> data out.
>
> Now with oVirt.. and LVM and thin provisioning etc.. I am abstracted from
> simple file recovery.. Without GLUSTER AND the oVirt Engine up... all my
> environment and data is lost. This means nodes moved more toward "pets" than
> cattle.
>
> And with three nodes.. I can't afford to lose any pets.
>
> I will post more when I get the cluster settled and work on those weird notes
> about quorum volumes noted on two nodes when glusterd is restarted.
>
> Thanks,
>
> On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov wrote:
>> A replication issue could mean that one of the clients (FUSE mounts) is not
>> attached to all bricks.
>>
>> You can check the number of clients via:
>> gluster volume status all client-list
>>
>> As a prevention, just do a rolling restart:
>> - set a host in maintenance and mark it to stop the glusterd service (I'm
>> referring to the UI)
>> - Activate the host, once it was moved to maintenance
>>
>> Wait for the host's HE score to recover (silver/gold crown in UI) and then
>> proceed with the next one.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise написа:
>>
>> I did.
>>
>> Here are all three nodes with restart. I find it odd ... there has been a
>> set of messages at the end (see below) which I don't know enough about what
>> oVirt laid out to know if it is bad.
>> >> ### >> [root@thor vmstore]# systemctl status glusterd >> ● glusterd.service - GlusterFS, a clustered file-system server >> Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor >> preset: disabled) >> Drop-In: /etc/systemd/system/glusterd.service.d >> └─99-cpu.conf >> Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago >> Docs: man:glusterd(8) >> Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid >> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) >> Main PID: 2113 (glusterd) >> Tasks: 151 (limit: 1235410) >> Memory:
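The three-step sync test described above (mount, write hostname, read back from every node) can be sketched as a small script. This is a sketch only: the `/media/glustervolume` mount point is the example path from the thread, and the default below falls back to a scratch directory so the script can be tried anywhere; on a real node, point it at a client-side mount of the volume (e.g. `mount -t glusterfs localhost:/vmstore /media/glustervolume`) and run it on each node in turn:

```shell
#!/bin/sh
# Sketch of the per-node replication smoke test from the thread.
# MOUNT defaults to a scratch directory so the sketch runs anywhere;
# on a real node pass your Gluster client mount as the first argument.
MOUNT="${1:-$(mktemp -d)}"

# Step 2: append this node's hostname to a shared test file.
hostname >> "$MOUNT/test.out"

# Step 3: read back; after running this on every node, each node's
# hostname should appear in the file exactly once.
cat "$MOUNT/test.out"
```

If a hostname is missing or duplicated after a full pass over the nodes, a client (FUSE mount) is likely not attached to all bricks, which is exactly the condition Strahil's rolling restart is meant to clear.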
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
That's really weird. I would give the engine a 'Windows'-style fix (a.k.a. reboot). I guess some of the engine's internal processes crashed/looped and it doesn't see the reality. Best Regards, Strahil Nikolov On Tuesday, 22 September 2020, 16:27:25 GMT+3, Jeremey Wise wrote: It's like oVirt thinks there are only two nodes in gluster replication # Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                                                     TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick thorst.penguinpages.local:/gluster_bricks/vmstore/vmstore     49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_bricks/vmstore/vmstore     49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_bricks/vmstore/vmstore   49154     0          Y       7841
Self-heal Daemon on localhost                                       N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.local                       N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.local                       N/A       N/A        Y       245870
Task Status of Volume vmstore
------------------------------------------------------------------------------
There are no active volume tasks
How do I get oVirt to re-establish reality to what Gluster sees? On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov wrote: > Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 > bricks up, but usually it was a UI issue; you go to the UI and mark a "force > start", which will try to start any bricks that were down (won't affect > gluster) and will wake up the UI task to verify brick status again. > > > https://github.com/gluster/gstatus is a good one to verify your cluster > health, yet a human's touch is priceless in any kind of technology. > > Best Regards, > Strahil Nikolov > > On Tuesday, 22 September 2020, 15:50:35 GMT+3, Jeremey Wise > wrote: > > When I posted last, in the thread I pasted a rolling restart. And... now > it is replicating. > > oVirt is still showing it wrong. BUT... I did my normal test from each of the > three nodes. 
> > 1) Mount Gluster file system with localhost as primary and other two as > tertiary to local mount (like a client would do) > 2) run test file create Ex: echo $HOSTNAME >> /media/glustervolume/test.out > 3) repeat from each node then read back that all are in sync. > > I REALLY hate reboot (restart) as a fix. I need to get better with root > cause of gluster issues if I am going to trust it. Before when I manually > made the volumes and it was simply (vdo + gluster) then worst case was that > gluster would break... but I could always go into "brick" path and copy data > out. > > Now with oVirt.. .and LVM and thin provisioning etc.. I am abstracted from > simple file recovery.. Without GLUSTER AND oVirt Engine up... all my > environment and data is lost. This means nodes moved more to "pets" then > cattle. > > And with three nodes.. I can't afford to loose any pets. > > I will post more when I get cluster settled and work on those wierd notes > about quorum volumes noted on two nodes when glusterd is restarted. > > Thanks, > > On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov wrote: >> Replication issue could mean that one of the client (FUSE mounts) is not >> attached to all bricks. >> >> You can check the amount of clients via: >> gluster volume status all client-list >> >> >> As a prevention , just do a rolling restart: >> - set a host in maintenance and mark it to stop glusterd service (I'm >> reffering to the UI) >> - Activate the host , once it was moved to maintenance >> >> Wait for the host's HE score to recover (silver/gold crown in UI) and then >> proceed with the next one. >> >> Best Regards, >> Strahil Nikolov >> >> >> >> >> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise >> написа: >> >> >> >> >> >> >> I did. >> >> Here are all three nodes with restart. I find it odd ... their has been a >> set of messages at end (see below) which I don't know enough about what >> oVirt laid out to know if it is bad. 
>> >> ### >> [root@thor vmstore]# systemctl status glusterd >> ● glusterd.service - GlusterFS, a clustered file-system server >> Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor >> preset: disabled) >> Drop-In: /etc/systemd/system/glusterd.service.d >> └─99-cpu.conf >> Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago >> Docs: man:glusterd(8) >> Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid >> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited,
[ovirt-users] Re: VM stuck in "reboot in progress" ("virtual machine XXX should be running in a host but it isn't.").
Arik / Strahil, Many thanks! Just in-case anyone else is hitting the same issue (*NOTE* Host and VM ID _will_ be different!) 0. Ran a backup: 1. Connect to the hosted-engine and DB: $ ssh root@vmengine $ su - postgres $ psql engine 2. Execute a select query to verify that the VM's run_on_vds is NULL: # select * from vm_dynamic where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5'; 3. Execute Arik's update query: # update vm_dynamic set run_on_vds='82f92946-9130-4dbd-8663-1ac0b50668a1' where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5'; 4. Re-started the engine: $ systemctl restart ovirt-engine 5. Everything seems fine now. Profit! Thanks again, Gilboa On Mon, Sep 21, 2020 at 4:28 PM Arik Hadas wrote: > > > > On Sun, Sep 20, 2020 at 11:21 AM Gilboa Davara wrote: >> >> On Sat, Sep 19, 2020 at 7:44 PM Arik Hadas wrote: >> > >> > >> > >> > On Fri, Sep 18, 2020 at 8:27 AM Gilboa Davara wrote: >> >> >> >> Hello all (and happy new year), >> >> >> >> (Note: Also reported as >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1880251) >> >> >> >> Self hosted engine, single node, NFS. >> >> Attempted to install CentOS over an existing Fedora VM with one host >> >> device (USB printer). >> >> Reboot failed, trying to boot from a non-existent CDROM. >> >> Tried shutting the VM down, failed. >> >> Tried powering off the VM, failed. >> >> Dropped cluster to global maintenance, reboot host + engine (was >> >> planning to upgrade it anyhow...), VM still stuck. >> >> >> >> When trying to power off the VM, the following message can be found >> >> the in engine.log: >> >> 2020-09-18 07:58:51,439+03 INFO >> >> [org.ovirt.engine.core.bll.StopVmCommand] >> >> (EE-ManagedThreadFactory-engine-Thread-42) >> >> [7bc4ac71-f0b2-4af7-b081-100dc99b6123] Running command: StopVmCommand >> >> internal: false. 
Entities affected : ID: >> >> b411e573-bcda-4689-b61f-1811c6f03ad5 Type: VMAction group STOP_VM with >> >> role type USER >> >> 2020-09-18 07:58:51,441+03 WARN >> >> [org.ovirt.engine.core.bll.StopVmCommand] >> >> (EE-ManagedThreadFactory-engine-Thread-42) >> >> [7bc4ac71-f0b2-4af7-b081-100dc99b6123] Strange, according to the >> >> status 'RebootInProgress' virtual machine >> >> 'b411e573-bcda-4689-b61f-1811c6f03ad5' should be running in a host but >> >> it isn't. >> >> 2020-09-18 07:58:51,594+03 ERROR >> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] >> >> (EE-ManagedThreadFactory-engine-Thread-42) >> >> [7bc4ac71-f0b2-4af7-b081-100dc99b6123] EVENT_ID: >> >> USER_FAILED_STOP_VM(56), Failed to power off VM kids-home-srv (Host: >> >> , User: gilboa@internal-authz). >> >> >> >> My question is simple: Pending a solution to the bug, can I somehow >> >> drop the state of the VM? It's currently holding a sizable disk image >> >> and a USB device I need (printer). >> > >> > >> > It would be best to modify the VM as if it should still be running on the >> > host and let the system discover that it's not running there and update >> > the VM accordingly. >> > >> > You can do it by changing the database with: >> > update vm_dynamic set run_on_vds='82f92946-9130-4dbd-8663-1ac0b50668a1' >> > where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5'; >> > >> > >> >> >> >> >> >> As it's my private VM cluster, I have no problem dropping the site >> >> completely for maintenance. >> >> >> >> Thanks, >> >> >> >> Gilboa >> >> >> Hello, >> >> Thanks for the prompt answer. >> >> Edward, >> >> Full reboot of both engine and host didn't help. >> Most likely there's a consistency problem in the oVirt DB. >> >> Arik, >> >> To which DB I should connect and as which user? >> E.g. psql -U user db_name > > > To the 'engine' database. > I usually connect to it by switching to the 'postgres' user as Strahil > described. 
> >> >> >> Thanks again, >> - Gilboa >> ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/UVUOGUI7N3AW2L4J2WCQBQUW4BTTCOA6/
[ovirt-users] Upgrade Ovirt from 4.2 to 4.4 on CentOS7.4
Hi everyone, I am writing for support regarding the oVirt upgrade. I am running oVirt 4.2 on CentOS 7.4. The latest release of the oVirt engine is 4.4, which is only available for CentOS 8. Can I upgrade without upgrading the operating system to CentOS 8? If I am not wrong, it is not possible to switch in place from CentOS 7 to CentOS 8. Can anyone give me some advice? Thank you all! List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/IWFDBQVPDIX5JHZVIELIU7VIAOSRVROX/
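As the writer suspects, there is no supported in-place CentOS 7 to CentOS 8 upgrade, so the documented 4.4 path is: upgrade the engine to 4.3 first (still on EL7), take a full engine backup, install a fresh 4.4 engine on an EL8 machine, and restore the backup there. A minimal sketch of the backup step, under the assumption that the standard `engine-backup` tool is present on the old engine host (file and log paths below are arbitrary examples):

```shell
#!/bin/sh
# Sketch: full engine backup before migrating to a fresh 4.4/EL8 engine.
# BACKUP and the log path are example names, not required values.
BACKUP="/root/engine-$(date +%Y%m%d).bck"

if command -v engine-backup >/dev/null 2>&1; then
    # On the old engine host: capture DB + configuration in one archive.
    engine-backup --scope=all --mode=backup \
        --file="$BACKUP" --log=/root/engine-backup.log
    # On the new 4.4 engine the counterpart is roughly:
    #   engine-backup --mode=restore --file=<backup> --provision-all-databases
else
    echo "engine-backup not found; run this on the engine host" >&2
fi
```

Check the oVirt upgrade guide for the exact restore flags for your target version; the sketch only shows the overall backup/restore shape.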
[ovirt-users] Re: Question on "Memory" column/field in Virtual Machines list/table in ovirt GUI
>Ok, May I know why you think it's only a bug in SLES? I never claimed it is a bug in SLES, but a bug in oVirt's detection of memory usage on SLES. The behaviour you observe was normal for RHEL6/CentOS6/SLES11/openSUSE and below, so it is normal for some OSes. In my oVirt 4.3.10, I see that the entry there is "SLES11+", but I believe that it is checking the memory on SLES15 just as if it were SLES11. >As I said before, ovirt is behaving the same way even for CentOS7 VMs. I am attaching the details again here below. Most probably oVirt is checking memory the RHEL6 style, which is not the correct one. >My question is why ovirt is treating buff/cache memory as used memory and why is not reporting memory usage just based on actual used memory? Most probably it is a bug :D , every software has some. I would recommend you to open a bug in bugzilla.redhat.com for each OS type (for example, one for SLES/openSUSE and one for EL7/EL8-based). Best Regards, Strahil Nikolov List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/TQT22I3GTVLAZZPHJ6UAMPIW6Y2XEKEA/
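The buff/cache distinction being discussed can be seen directly in /proc/meminfo: the old RHEL6-style accounting ("used = MemTotal - MemFree") counts page cache as used memory, while "MemTotal - MemAvailable" does not. A quick illustrative check (Linux only; this just demonstrates the two accountings, it does not show what oVirt itself queries):

```shell
#!/bin/sh
# Compare naive used% (which counts buff/cache as used) against
# used% based on the kernel's MemAvailable estimate.
SUMMARY=$(awk '/^MemTotal:/{t=$2} /^MemFree:/{f=$2} /^MemAvailable:/{a=$2}
    END{printf "naive used: %.0f%%, available-based used: %.0f%%", \
        (t-f)*100/t, (t-a)*100/t}' /proc/meminfo)
echo "$SUMMARY"
```

On a box with a warm page cache the two percentages diverge sharply, which matches the gap the reporter sees between the oVirt GUI and `free` inside the guest.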
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 bricks up , but usually it was an UI issue and you go to UI and mark a "force start" which will try to start any bricks that were down (won't affect gluster) and will wake up the UI task to verify again brick status. https://github.com/gluster/gstatus is a good one to verify your cluster health , yet human's touch is priceless in any kind of technology. Best Regards, Strahil Nikolov В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise написа: when I posted last.. in the tread I paste a roling restart. And... now it is replicating. oVirt still showing wrong. BUT.. I did my normal test from each of the three nodes. 1) Mount Gluster file system with localhost as primary and other two as tertiary to local mount (like a client would do) 2) run test file create Ex: echo $HOSTNAME >> /media/glustervolume/test.out 3) repeat from each node then read back that all are in sync. I REALLY hate reboot (restart) as a fix. I need to get better with root cause of gluster issues if I am going to trust it. Before when I manually made the volumes and it was simply (vdo + gluster) then worst case was that gluster would break... but I could always go into "brick" path and copy data out. Now with oVirt.. .and LVM and thin provisioning etc.. I am abstracted from simple file recovery.. Without GLUSTER AND oVirt Engine up... all my environment and data is lost. This means nodes moved more to "pets" then cattle. And with three nodes.. I can't afford to loose any pets. I will post more when I get cluster settled and work on those wierd notes about quorum volumes noted on two nodes when glusterd is restarted. Thanks, On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov wrote: > Replication issue could mean that one of the client (FUSE mounts) is not > attached to all bricks. 
> > You can check the amount of clients via: > gluster volume status all client-list > > > As a prevention , just do a rolling restart: > - set a host in maintenance and mark it to stop glusterd service (I'm > reffering to the UI) > - Activate the host , once it was moved to maintenance > > Wait for the host's HE score to recover (silver/gold crown in UI) and then > proceed with the next one. > > Best Regards, > Strahil Nikolov > > > > > В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise > написа: > > > > > > > I did. > > Here are all three nodes with restart. I find it odd ... their has been a set > of messages at end (see below) which I don't know enough about what oVirt > laid out to know if it is bad. > > ### > [root@thor vmstore]# systemctl status glusterd > ● glusterd.service - GlusterFS, a clustered file-system server > Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor > preset: disabled) > Drop-In: /etc/systemd/system/glusterd.service.d > └─99-cpu.conf > Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago > Docs: man:glusterd(8) > Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid > --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) > Main PID: 2113 (glusterd) > Tasks: 151 (limit: 1235410) > Memory: 3.8G > CPU: 6min 46.050s > CGroup: /glusterfs.slice/glusterd.service > ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level > INFO > ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p > /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log > -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option > *replicate*.node-uu> > ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p > /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid > -S /var/r> > ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id 
engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p > /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p> > ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore > -p > /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms> > └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p > /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid > -S /var/run/glu> > > Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a > clustered file-system server... > Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a > clustered file-system server. > Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 >
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
Usually I first start with: 'gluster volume heal info summary' Anything that is not 'Connected' is bad. Yeah, the abstraction is not so nice, but the good thing is that you can always extract the data from a single node left (it will require to play a little bit with the quorum of the volume). Usually I have seen that the FUSE fails to reconnect to a "gone bad and recovered" brick and then you got that endless healing (as FUSE will write the data to only 2 out of 3 bricks and then a heal is pending :D ). I would go with the gluster logs and the brick logs and then you can dig deeper if you suspect network issue. Best Regards, Strahil Nikolov В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise написа: when I posted last.. in the tread I paste a roling restart. And... now it is replicating. oVirt still showing wrong. BUT.. I did my normal test from each of the three nodes. 1) Mount Gluster file system with localhost as primary and other two as tertiary to local mount (like a client would do) 2) run test file create Ex: echo $HOSTNAME >> /media/glustervolume/test.out 3) repeat from each node then read back that all are in sync. I REALLY hate reboot (restart) as a fix. I need to get better with root cause of gluster issues if I am going to trust it. Before when I manually made the volumes and it was simply (vdo + gluster) then worst case was that gluster would break... but I could always go into "brick" path and copy data out. Now with oVirt.. .and LVM and thin provisioning etc.. I am abstracted from simple file recovery.. Without GLUSTER AND oVirt Engine up... all my environment and data is lost. This means nodes moved more to "pets" then cattle. And with three nodes.. I can't afford to loose any pets. I will post more when I get cluster settled and work on those wierd notes about quorum volumes noted on two nodes when glusterd is restarted. 
Thanks, On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov wrote: > Replication issue could mean that one of the client (FUSE mounts) is not > attached to all bricks. > > You can check the amount of clients via: > gluster volume status all client-list > > > As a prevention , just do a rolling restart: > - set a host in maintenance and mark it to stop glusterd service (I'm > reffering to the UI) > - Activate the host , once it was moved to maintenance > > Wait for the host's HE score to recover (silver/gold crown in UI) and then > proceed with the next one. > > Best Regards, > Strahil Nikolov > > > > > В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise > написа: > > > > > > > I did. > > Here are all three nodes with restart. I find it odd ... their has been a set > of messages at end (see below) which I don't know enough about what oVirt > laid out to know if it is bad. > > ### > [root@thor vmstore]# systemctl status glusterd > ● glusterd.service - GlusterFS, a clustered file-system server > Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor > preset: disabled) > Drop-In: /etc/systemd/system/glusterd.service.d > └─99-cpu.conf > Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago > Docs: man:glusterd(8) > Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid > --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) > Main PID: 2113 (glusterd) > Tasks: 151 (limit: 1235410) > Memory: 3.8G > CPU: 6min 46.050s > CGroup: /glusterfs.slice/glusterd.service > ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level > INFO > ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p > /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log > -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option > *replicate*.node-uu> > ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p > 
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid > -S /var/r> > ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p > /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p> > ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore > -p > /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms> > └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p > /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid > -S /var/run/glu> > > Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a > clustered file-system server...
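Strahil's first check above ("anything that is not 'Connected' is bad") can be automated by filtering the `Status:` lines of the heal summary. The sample output in the script is canned and illustrative; on a real node, populate `OUT` from `gluster volume heal vmstore info summary` instead:

```shell
#!/bin/sh
# Flag unhealthy bricks in 'gluster volume heal <vol> info summary' output.
# OUT below is canned sample text; on a node replace it with:
#   OUT=$(gluster volume heal vmstore info summary)
OUT="Brick thorst.penguinpages.local:/gluster_bricks/vmstore/vmstore
Status: Connected
Brick odinst.penguinpages.local:/gluster_bricks/vmstore/vmstore
Status: Transport endpoint is not connected"

# Anything whose status is not exactly 'Connected' needs attention.
BAD=$(printf '%s\n' "$OUT" | grep '^Status:' | grep -v '^Status: Connected$')
if [ -n "$BAD" ]; then
    echo "unhealthy: $BAD"
else
    echo "all bricks connected"
fi
```

A brick stuck in "Transport endpoint is not connected" here is the FUSE-reconnect failure Strahil describes, the one that leads to endless pending heals.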
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
When I posted last, in the thread I pasted a rolling restart. And... now it is replicating. oVirt is still showing it wrong. BUT... I did my normal test from each of the three nodes. 1) Mount the Gluster file system with localhost as primary and the other two as tertiary to a local mount (like a client would do) 2) Run a test file create, e.g.: echo $HOSTNAME >> /media/glustervolume/test.out 3) Repeat from each node, then read back to confirm all are in sync. I REALLY hate reboot (restart) as a fix. I need to get better at root cause of gluster issues if I am going to trust it. Before, when I manually made the volumes and it was simply (vdo + gluster), the worst case was that gluster would break... but I could always go into the "brick" path and copy data out. Now with oVirt... and LVM and thin provisioning etc., I am abstracted from simple file recovery. Without GLUSTER AND the oVirt Engine up... all my environment and data is lost. This means nodes have moved more to "pets" than cattle. And with three nodes... I can't afford to lose any pets. I will post more when I get the cluster settled and work on those weird notes about quorum volumes noted on two nodes when glusterd is restarted. Thanks, On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov wrote: > A replication issue could mean that one of the clients (FUSE mounts) is not > attached to all bricks. > > You can check the number of clients via: > gluster volume status all client-list > > > As a prevention, just do a rolling restart: > - set a host in maintenance and mark it to stop the glusterd service (I'm > referring to the UI) > - Activate the host, once it has been moved to maintenance > > Wait for the host's HE score to recover (silver/gold crown in the UI) and then > proceed with the next one. > > Best Regards, > Strahil Nikolov > > On Tuesday, 22 September 2020, 14:55:35 GMT+3, Jeremey Wise < > jeremey.w...@gmail.com> wrote: > > > I did. > > Here are all three nodes with restart. I find it odd ... 
their has been a > set of messages at end (see below) which I don't know enough about what > oVirt laid out to know if it is bad. > > ### > [root@thor vmstore]# systemctl status glusterd > ● glusterd.service - GlusterFS, a clustered file-system server >Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; > vendor preset: disabled) > Drop-In: /etc/systemd/system/glusterd.service.d >└─99-cpu.conf >Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago > Docs: man:glusterd(8) > Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid > --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) > Main PID: 2113 (glusterd) > Tasks: 151 (limit: 1235410) >Memory: 3.8G > CPU: 6min 46.050s >CGroup: /glusterfs.slice/glusterd.service >├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level > INFO >├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data > -p /var/run/gluster/shd/data/data-shd.pid -l > /var/log/glusterfs/glustershd.log -S > /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option > *replicate*.node-uu> >├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p > /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid > -S /var/r> >├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine > -p > /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p> >├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id > vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p > /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms> >└─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local > --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p > /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid > -S 
/var/run/glu> > > Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a > clustered file-system server... > Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a > clustered file-system server. > Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 > 00:32:28.605674] C [MSGID: 106003] > [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] > 0-management: Server quorum regained for volume data. Starting lo> > Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 > 00:32:28.639490] C [MSGID: 106003] > [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] > 0-management: Server quorum regained for volume engine. Starting > > Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 > 00:32:28.680665] C [MSGID: 106003] > [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] >
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
At around Sep 21 20:33 local time , you got a loss of quorum - that's not good. Could it be a network 'hicup' ? Best Regards, Strahil Nikolov В вторник, 22 септември 2020 г., 15:05:16 Гринуич+3, Jeremey Wise написа: I did. Here are all three nodes with restart. I find it odd ... their has been a set of messages at end (see below) which I don't know enough about what oVirt laid out to know if it is bad. ### [root@thor vmstore]# systemctl status glusterd ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/glusterd.service.d └─99-cpu.conf Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago Docs: man:glusterd(8) Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 2113 (glusterd) Tasks: 151 (limit: 1235410) Memory: 3.8G CPU: 6min 46.050s CGroup: /glusterfs.slice/glusterd.service ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu> ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid -S /var/r> ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p> ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms> 
└─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/glu> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server... Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server. Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.605674] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting lo> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.639490] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting > Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.680665] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. 
Starting> Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, discon> Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, disc> Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, dis> Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 seconds, disconn> [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# [root@thor vmstore]# systemctl restart glusterd [root@thor vmstore]# systemctl status glusterd ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/glusterd.service.d └─99-cpu.conf Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago Docs: man:glusterd(8) Process: 245831 ExecStart=/usr/sbin/glusterd -p
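The ping-timer expiries in the log above all point at one peer (172.16.101.101:24007, glusterd's management port) going silent for 30+ seconds, which supports the network-hiccup theory. A quick per-peer reachability probe can narrow it down; the hostnames below are the ones from this thread (substitute your own), and the probe uses bash's /dev/tcp redirection:

```shell
#!/bin/bash
# Probe TCP 24007 (glusterd) on each peer. A peer that times out here
# matches the 'has not responded in the last 30 seconds' log lines.
RESULTS=$(for h in thorst.penguinpages.local odinst.penguinpages.local \
                   medusast.penguinpages.local; do
    if timeout 3 bash -c "exec 3<>/dev/tcp/$h/24007" 2>/dev/null; then
        echo "$h: 24007 reachable"
    else
        echo "$h: 24007 NOT reachable"
    fi
done)
echo "$RESULTS"
```

Running this from each node during a suspected hiccup shows whether the loss is one-directional or affects a single host's storage NIC.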
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
Replication issue could mean that one of the client (FUSE mounts) is not attached to all bricks. You can check the amount of clients via: gluster volume status all client-list As a prevention , just do a rolling restart: - set a host in maintenance and mark it to stop glusterd service (I'm reffering to the UI) - Activate the host , once it was moved to maintenance Wait for the host's HE score to recover (silver/gold crown in UI) and then proceed with the next one. Best Regards, Strahil Nikolov В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise написа: I did. Here are all three nodes with restart. I find it odd ... their has been a set of messages at end (see below) which I don't know enough about what oVirt laid out to know if it is bad. ### [root@thor vmstore]# systemctl status glusterd ● glusterd.service - GlusterFS, a clustered file-system server Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/glusterd.service.d └─99-cpu.conf Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago Docs: man:glusterd(8) Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 2113 (glusterd) Tasks: 151 (limit: 1235410) Memory: 3.8G CPU: 6min 46.050s CGroup: /glusterfs.slice/glusterd.service ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu> ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid -S /var/r> ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id 
engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
           ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
           └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.605674] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.639490] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.680665] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore.
Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a
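For anyone hitting the same symptom, Strahil's `gluster volume status all client-list` check can be scripted. The sample output below is hypothetical (the exact format varies between Gluster releases); on a live node you would pipe the real command instead:

```shell
# On a live node you would run:
#   gluster volume status all client-list
# Here we parse a hypothetical, abbreviated sample of that output.
sample='Client connections for volume data
Name                          count
-----                         ------
fuse                          3
glustershd                    3

total clients for volume data : 6'

# For a replica-3 volume every FUSE client should hold a connection per
# brick; a lower-than-expected count hints that a client lost a brick.
fuse_clients=$(printf '%s\n' "$sample" | awk '$1 == "fuse" {print $2}')
echo "fuse clients: $fuse_clients"
```

If the count is lower than expected on one host, that host is the one to cycle through maintenance first.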
[ovirt-users] Re: Cannot import VM disks from previously detached storage domain
I will have a look. Thank you for your support in oVirt!

On Tue, 22 Sep 2020 at 15:30, Strahil Nikolov wrote:
> Hi Eyal,
>
> thanks for the reply - all the proposed options make sense.
> I have opened an RFE -> https://bugzilla.redhat.com/show_bug.cgi?id=1881457 , but can you verify that the product/team is the correct one?
>
> Best Regards,
> Strahil Nikolov
>
> On Tuesday, 22 September 2020 at 12:55:56 GMT+3, Eyal Shenitzky <eshen...@redhat.com> wrote:
>
> On Mon, 21 Sep 2020 at 23:19, Strahil Nikolov wrote:
> > Hey Eyal,
> >
> > it's really irritating that only ISOs can be imported as disks.
> >
> > I had to:
> > 1. Delete the snapshot (but I really wanted to keep it)
> > 2. Detach all disks from the existing VM
> > 3. Delete the VM
> > 4. Import the VM from the data domain
> > 5. Delete the snapshot, so disks from the data domain are "in sync" with the non-data disks
> > 6. Attach the non-data disks to the VM
> >
> > If all disks for a VM were on the same storage domain - I didn't have to wipe my snapshots.
> >
> > Should I file an RFE in order to allow disk import for non-ISO disks?
> > If I wanted to rebuild the engine and import the storage domains I would have to import the VM the first time, just to delete it and import it again - so I can get my VM disks from the storage...
>
> From what I understand you want to file an RFE that requests the option to split 'unregistered' entities in a data domain, but unfortunately this is not possible.
> But we may add different options:
> * merge/squash into identical partial VMs
> * Override an existing VM
> * Force import the VM with a different ID
> You can file an RFE with those suggested options.
> Also, please add a description of why you think it is needed.
> > Best Regards,
> > Strahil Nikolov
> >
> > On Monday, 21 September 2020 at 11:47:04 GMT+3, Eyal Shenitzky <eshen...@redhat.com> wrote:
> >
> > Hi Strahil,
> >
> > Maybe those VMs have more disks on different data storage domains?
> > If so, those VMs will remain in the environment with the disks that are not based on the detached storage domain.
> >
> > You can try to import the VM as partial; another option is to remove the VM that remained in the environment but keep the disks, so you will be able to import the VM and attach the disks to it.
> >
> > On Sat, 19 Sep 2020 at 15:49, Strahil Nikolov via Users wrote:
> >> Hello All,
> >>
> >> I would like to ask how to proceed further.
> >>
> >> Here is what I have done so far on my oVirt 4.3.10:
> >> 1. Set in maintenance and detached my Gluster-based storage domain
> >> 2. Did some maintenance on the gluster
> >> 3. Reattached and activated my Gluster-based storage domain
> >> 4. I have imported my ISOs via the Disk Import tab in the UI
> >>
> >> Next I tried to import the VM disks, but they are unavailable in the disk tab.
> >> So I tried to import the VM:
> >> 1. First try - import with partial -> failed due to MAC conflict
> >> 2. Second try - import with partial, allow MAC reassignment -> failed as the VM id exists -> recommends to remove the original VM
> >> 3. I tried to detach the VM's disks, so I can delete it - but this is not possible as the VM already has a snapshot.
> >>
> >>
> >> What is the proper way to import my non-OS disks (data domain is slower but has more space, which is more suitable for "data")?
> >> > >> > >> Best Regards, > >> Strahil Nikolov > >> ___ > >> Users mailing list -- users@ovirt.org > >> To unsubscribe send an email to users-le...@ovirt.org > >> Privacy Statement: https://www.ovirt.org/privacy-policy.html > >> oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > >> List Archives: > https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTJXOIVDWU6DGVZQQ243VKGWJLPKHR4L/ > > > >> > > > > > > -- > > Regards, > > Eyal Shenitzky > > > > > > > -- > Regards, > Eyal Shenitzky > > -- Regards, Eyal Shenitzky ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/SL2I3WEQ3MS6TIVBL5SC42B4FAZNTSWX/
[ovirt-users] Re: Gluster Domain Storage full
Any option to extend the Gluster volume? Other approaches are quite destructive.

I guess you can obtain the VM's XML via virsh and then copy the disks to another pure-KVM host. Then you can start the VM while you are recovering from the situation:

virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf dumpxml <VM_NAME> > /some/path/<VM_NAME>.xml

(The VM name placeholder above was stripped by the archive; substitute the actual VM name.)

Once you have the VM running on a pure-KVM host, you can go to oVirt and try to wipe the VM from the UI.

Usually that 10% reserve is there just in case something like this happens, but Gluster doesn't check it every second (or the overhead would be crazy). Maybe you can extend the Gluster volume temporarily, until you manage to move the VM away to bigger storage. Then you can reduce the volume back to its original size.

Best Regards,
Strahil Nikolov

On Tuesday, 22 September 2020 at 14:53:53 GMT+3, supo...@logicworks.pt wrote:

Hello Strahil,

I just set cluster.min-free-disk to 1%:

# gluster volume info data

Volume Name: data
Type: Distribute
Volume ID: 2d3ea533-aca3-41c4-8cb6-239fe4f82bc3
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node2.domain.com:/home/brick1
Options Reconfigured:
cluster.min-free-disk: 1%
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on

But I still get the same error:

Error while executing action: Cannot move Virtual Disk. Low disk space on Storage Domain

I restarted the glusterfs volume, but I can not do anything with the VM disk.

I know that filling the bricks is very bad; we lost access to the VM. I think there should be a mechanism to prevent stopping the VM; we should continue to have access to the VM to free some space. If you have a VM with a thin-provisioned disk and the VM fills the entire disk, we get the same problem.

Any idea?
Thanks

José

From: "Strahil Nikolov"
To: "users" , supo...@logicworks.pt
Sent: Monday, 21 September 2020 21:28:10
Subject: Re: [ovirt-users] Gluster Domain Storage full

Usually gluster has a 10% reserve defined in the 'cluster.min-free-disk' volume option.
You can power off the VM, then set cluster.min-free-disk to 1% and immediately move any of the VM's disks to another storage domain. Keep in mind that filling your bricks is bad, and if you eat into that reserve, the only option would be to export the VM as OVA, wipe it from the current storage, and import it into a bigger storage domain.
Of course, it would be more sensible to just expand the gluster volume (either scale up the bricks -> add more disks, or scale out -> add more servers with disks on them), but I guess that is not an option - right?
Thanks -- Jose Ferradeira http://www.logicworks.pt ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WFN2VOQZPPVCGXAIFEYVIDEVJEUCSWY7/ ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AIJUP2HZIWRSQHN4XU3BGGT2ZDKEVJZ3/ ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/GBAJWBN3QSKWEPWVP4DIL7OGNTASVZLP/
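The workaround Strahil describes above (lower the reserve, move the disk off, restore the reserve) boils down to a handful of commands. The sketch below only prints them; `data` is the volume from this thread and `myvm` is a hypothetical VM name to substitute:

```shell
VM_NAME="myvm"   # hypothetical name; substitute your actual VM

# The commands are printed rather than executed, since they require a
# live gluster/oVirt deployment.
cmds=$(cat <<EOF
# temporarily lower Gluster's free-space reserve (default 10%)
gluster volume set data cluster.min-free-disk 1%
# dump the VM definition so it can be started on a plain KVM host
virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf \
      dumpxml $VM_NAME > /some/path/$VM_NAME.xml
# once the disks are moved off, restore the reserve
gluster volume set data cluster.min-free-disk 10%
EOF
)
echo "$cmds"
```

Restoring the reserve afterwards matters: running bricks at 100% is what caused the VM outage in the first place.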
[ovirt-users] Re: Cannot import VM disks from previously detached storage domain
Hi Eyal,

thanks for the reply - all the proposed options make sense.
I have opened an RFE -> https://bugzilla.redhat.com/show_bug.cgi?id=1881457 , but can you verify that the product/team is the correct one?

Best Regards,
Strahil Nikolov

On Tuesday, 22 September 2020 at 12:55:56 GMT+3, Eyal Shenitzky wrote:

On Mon, 21 Sep 2020 at 23:19, Strahil Nikolov wrote:
> Hey Eyal,
>
> it's really irritating that only ISOs can be imported as disks.
>
> I had to:
> 1. Delete the snapshot (but I really wanted to keep it)
> 2. Detach all disks from the existing VM
> 3. Delete the VM
> 4. Import the VM from the data domain
> 5. Delete the snapshot, so disks from the data domain are "in sync" with the non-data disks
> 6. Attach the non-data disks to the VM
>
> If all disks for a VM were on the same storage domain - I didn't have to wipe my snapshots.
>
> Should I file an RFE in order to allow disk import for non-ISO disks?
> If I wanted to rebuild the engine and import the storage domains I would have to import the VM the first time, just to delete it and import it again - so I can get my VM disks from the storage...

From what I understand you want to file an RFE that requests the option to split 'unregistered' entities in a data domain, but unfortunately this is not possible.
But we may add different options:
* merge/squash into identical partial VMs
* Override an existing VM
* Force import the VM with a different ID
You can file an RFE with those suggested options.
Also, please add a description of why you think it is needed.

> Best Regards,
> Strahil Nikolov
>
> On Monday, 21 September 2020 at 11:47:04 GMT+3, Eyal Shenitzky wrote:
>
> Hi Strahil,
>
> Maybe those VMs have more disks on different data storage domains?
> If so, those VMs will remain in the environment with the disks that are not based on the detached storage domain.
>
> You can try to import the VM as partial; another option is to remove the VM that remained in the environment but keep the disks, so you will be able to import the VM and attach the disks to it.
>
> On Sat, 19 Sep 2020 at 15:49, Strahil Nikolov via Users wrote:
>> Hello All,
>>
>> I would like to ask how to proceed further.
>>
>> Here is what I have done so far on my oVirt 4.3.10:
>> 1. Set in maintenance and detached my Gluster-based storage domain
>> 2. Did some maintenance on the gluster
>> 3. Reattached and activated my Gluster-based storage domain
>> 4. I have imported my ISOs via the Disk Import tab in the UI
>>
>> Next I tried to import the VM disks, but they are unavailable in the disk tab.
>> So I tried to import the VM:
>> 1. First try - import with partial -> failed due to MAC conflict
>> 2. Second try - import with partial, allow MAC reassignment -> failed as the VM id exists -> recommends to remove the original VM
>> 3. I tried to detach the VM's disks, so I can delete it - but this is not possible as the VM already has a snapshot.
>>
>>
>> What is the proper way to import my non-OS disks (data domain is slower but has more space, which is more suitable for "data")?
>> >> >> Best Regards, >> Strahil Nikolov >> ___ >> Users mailing list -- users@ovirt.org >> To unsubscribe send an email to users-le...@ovirt.org >> Privacy Statement: https://www.ovirt.org/privacy-policy.html >> oVirt Code of Conduct: >> https://www.ovirt.org/community/about/community-guidelines/ >> List Archives: >> https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTJXOIVDWU6DGVZQQ243VKGWJLPKHR4L/ > >> > > > -- > Regards, > Eyal Shenitzky > > -- Regards, Eyal Shenitzky ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/FEU3KIA76YUA6EDI6SIOY43MHI2Z2ZNB/
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
I did. Here are all three nodes after a restart. I find it odd ... there has been a set of messages at the end (see below) which I don't know enough about oVirt's layout to tell whether it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
     Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
    Tasks: 151 (limit: 1235410)
   Memory: 3.8G
      CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
           ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu>
           ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid -S /var/r>
           ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
           ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
           └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid -S /var/run/glu>
Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.605674] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.639490] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 00:32:28.680665] C [MSGID: 106003] [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago
     Docs: man:glusterd(8)
  Process: 245831 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 245832 (glusterd)
    Tasks: 151 (limit: 1235410)
   Memory: 3.8G
      CPU: 132ms
   CGroup: /glusterfs.slice/glusterd.service
           ├─ 2914 /usr/sbin/glusterfs -s localhost
[ovirt-users] Re: Gluster Domain Storage full
Hello Strahil,

I just set cluster.min-free-disk to 1%:

# gluster volume info data

Volume Name: data
Type: Distribute
Volume ID: 2d3ea533-aca3-41c4-8cb6-239fe4f82bc3
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node2.domain.com:/home/brick1
Options Reconfigured:
cluster.min-free-disk: 1%
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on

But I still get the same error:

Error while executing action: Cannot move Virtual Disk. Low disk space on Storage Domain

I restarted the glusterfs volume, but I can not do anything with the VM disk.

I know that filling the bricks is very bad; we lost access to the VM. I think there should be a mechanism to prevent stopping the VM; we should continue to have access to the VM to free some space. If you have a VM with a thin-provisioned disk and the VM fills the entire disk, we get the same problem.

Any idea?

Thanks

José

From: "Strahil Nikolov"
To: "users" , supo...@logicworks.pt
Sent: Monday, 21 September 2020 21:28:10
Subject: Re: [ovirt-users] Gluster Domain Storage full

Usually gluster has a 10% reserve defined in the 'cluster.min-free-disk' volume option.
You can power off the VM, then set cluster.min-free-disk to 1% and immediately move any of the VM's disks to another storage domain. Keep in mind that filling your bricks is bad, and if you eat into that reserve, the only option would be to export the VM as OVA, wipe it from the current storage, and import it into a bigger storage domain.
Of course, it would be more sensible to just expand the gluster volume (either scale up the bricks -> add more disks, or scale out -> add more servers with disks on them), but I guess that is not an option - right?
Best Regards,
Strahil Nikolov

On Monday, 21 September 2020 at 15:58:01 GMT+3, supo...@logicworks.pt wrote:

Hello,

I'm running oVirt Version 4.3.4.3-1.el7.

I have a small GlusterFS domain storage brick on a dedicated filesystem, serving only one VM. The VM filled all the domain storage. The Linux filesystem shows 4.1G available and 100% used; the mounted brick shows 0 GB available and 100% used.

I can not do anything with this disk; for example, if I try to move it to another Gluster domain storage I get the message:

Error while executing action: Cannot move Virtual Disk. Low disk space on Storage Domain

Any idea?

Thanks

--
Jose Ferradeira
http://www.logicworks.pt

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/WFN2VOQZPPVCGXAIFEYVIDEVJEUCSWY7/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/AIJUP2HZIWRSQHN4XU3BGGT2ZDKEVJZ3/
[ovirt-users] Re: Cannot import VM disks from previously detached storage domain
On Mon, 21 Sep 2020 at 23:19, Strahil Nikolov wrote:
> Hey Eyal,
>
> it's really irritating that only ISOs can be imported as disks.
>
> I had to:
> 1. Delete the snapshot (but I really wanted to keep it)
> 2. Detach all disks from the existing VM
> 3. Delete the VM
> 4. Import the VM from the data domain
> 5. Delete the snapshot, so disks from the data domain are "in sync" with the non-data disks
> 6. Attach the non-data disks to the VM
>
> If all disks for a VM were on the same storage domain - I didn't have to wipe my snapshots.
>
> Should I file an RFE in order to allow disk import for non-ISO disks?
> If I wanted to rebuild the engine and import the storage domains I would have to import the VM the first time, just to delete it and import it again - so I can get my VM disks from the storage...

From what I understand you want to file an RFE that requests the option to split 'unregistered' entities in a data domain, but unfortunately this is not possible.
But we may add different options:
- merge/squash into identical partial VMs
- Override an existing VM
- Force import the VM with a different ID
You can file an RFE with those suggested options.
Also, please add a description of why you think it is needed.

> Best Regards,
> Strahil Nikolov
>
> On Monday, 21 September 2020 at 11:47:04 GMT+3, Eyal Shenitzky wrote:
>
> Hi Strahil,
>
> Maybe those VMs have more disks on different data storage domains?
> If so, those VMs will remain in the environment with the disks that are not based on the detached storage domain.
>
> You can try to import the VM as partial; another option is to remove the VM that remained in the environment but keep the disks, so you will be able to import the VM and attach the disks to it.
>
> On Sat, 19 Sep 2020 at 15:49, Strahil Nikolov via Users wrote:
> > Hello All,
> >
> > I would like to ask how to proceed further.
> > > > Here is what I have done so far on my ovirt 4.3.10: > > 1. Set in maintenance and detached my Gluster-based storage domain > > 2. Did some maintenance on the gluster > > 3. Reattached and activated my Gluster-based storage domain > > 4. I have imported my ISOs via the Disk Import tab in UI > > > > Next I tried to import the VM Disks , but they are unavailable in the > disk tab > > So I tried to import the VM: > > 1. First try - import with partial -> failed due to MAC conflict > > 2. Second try - import with partial , allow MAC reassignment -> failed > as VM id exists -> recommends to remove the original VM > > 3. I tried to detach the VMs disks , so I can delete it - but this is > not possible as the Vm already got a snapshot. > > > > > > What is the proper way to import my non-OS disks (data domain is slower > but has more space which is more suitable for "data") ? > > > > > > Best Regards, > > Strahil Nikolov > > ___ > > Users mailing list -- users@ovirt.org > > To unsubscribe send an email to users-le...@ovirt.org > > Privacy Statement: https://www.ovirt.org/privacy-policy.html > > oVirt Code of Conduct: > https://www.ovirt.org/community/about/community-guidelines/ > > List Archives: > https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTJXOIVDWU6DGVZQQ243VKGWJLPKHR4L/ > > > > > -- > Regards, > Eyal Shenitzky > > -- Regards, Eyal Shenitzky ___ Users mailing list -- users@ovirt.org To unsubscribe send an email to users-le...@ovirt.org Privacy Statement: https://www.ovirt.org/privacy-policy.html oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/ List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5A7IOHPR6VOOMBXQIJT5FAN2O6FTKVHQ/
[ovirt-users] Re: Question on "Memory" column/field in Virtual Machines list/table in ovirt GUI
Ok, may I know why you think it's only a bug in SLES? As I said before, oVirt is behaving the same way even for CentOS 7 VMs. I am attaching the details again below.

The memory details of one running CentOS VM are as follows:

[centos@centos-vm1 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7816        1257         176         386        6383        5874
Swap:             0           0           0

Here, out of the total allocated memory of 7816 MB, we can see that the total actually available memory is 5874 MB and the actual used memory is just 1257 MB, excluding buff/cache.

But in the oVirt GUI, the memory usage field/column for the above VM (Compute -> Virtual Machines, then select the VM and check the Memory field/column) shows usage as 98%. That means it says only 2% of memory is free (considering the 176 MB free) and 98% is used (considering used + buff/cache, i.e. 1257 MB + 6383 MB).

My question is: why is oVirt treating buff/cache memory as used memory, and why is it not reporting memory usage based just on the actual used memory?

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/G7GTDFRI36RGFL3OKXRL35MP5N4LHUQ7/
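A quick calculation reproduces the 98% figure and makes the gap concrete. The formulas are assumptions about how the two numbers are derived, not oVirt source code:

```python
# Values in MB, taken from the `free -m` output in the post.
total, used, free_mem, buff_cache = 7816, 1257, 176, 6383

# What the UI appears to report: anything that is not "free" counts as
# used, which lumps buff/cache in with application memory.
ui_style = round(100 * (total - free_mem) / total)

# Usage based only on application-allocated memory, excluding
# reclaimable cache.
app_style = round(100 * used / total)

print(ui_style, app_style)  # 98 16
```

The two conventions differ by the buff/cache share, which is exactly the 98% vs ~16% discrepancy the poster observes.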
[ovirt-users] Re: Fail install SHE ovirt-engine from backupfile (4.3 -> 4.4)
Ok, solved. Simply, the server node2 could not mount the data domain of node1 via NFS. I added node1 to the node2 firewall and to /etc/exports, tested, and everything went fine.

Regards,
Francesco

On 21/09/2020 17:44, francesco--- via Users wrote:

Hi Everyone,

In a test environment I'm trying to deploy a single-node self-hosted engine 4.4 on CentOS 8 from a 4.3 backup. The actual setup is:
- node1 with CentOS 7, oVirt 4.3 with a working SH engine. The data domain is a local NFS;
- node2 with CentOS 8, where we are trying to deploy the engine starting from the node1 engine backup;
- host1, with CentOS 7, running a couple of VMs (4.3).

I'm following the guide: https://www.ovirt.org/documentation/upgrade_guide/#Upgrading_the_Manager_to_4-4_4-3_SHE

Everything seems to be working fine: the engine on node1 is in maintenance:global mode and the ovirt-engine service is stopped. The deploy on node2 gets stuck with the following error:

TASK [ovirt.hosted_engine_setup : Wait for OVF_STORE disk content]
[ ERROR ] {'msg': 'non-zero return code', 'cmd': "vdsm-client Image prepare storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 imageID=e48a66dd-74c9-43eb-890e-778e9c4ee8db volumeID=06bb5f34-112d-4214-91d2-53d0bdb84321 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - 6023764f-5547-4b23-92ca-422eafdf3f87.ovf", 'stdout': '', 'stderr': "vdsm-client: Command Image.prepare with args {'storagepoolID': '06c58622-f99b-11ea-9122-00163e1bbc93', 'storagedomainID': '2a4a3cce-f2f6-4ddd-b337-df5ef562f520', 'imageID': 'e48a66dd-74c9-43eb-890e-778e9c4ee8db', 'volumeID': '06bb5f34-112d-4214-91d2-53d0bdb84321'} failed:\n(code=309, message=Unknown pool id, pool not connected: ('06c58622-f99b-11ea-9122-00163e1bbc93',))\ntar: This does not look like a tar archive\ntar: 6023764f-5547-4b23-92ca-422eafdf3f87.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors", 'rc': 2, 'start': '2020-09-21
17:14:17.293090', 'end': '2020-09-21 17:14:17.644253', 'delta': '0:00:00.351163', 'changed': True, 'failed': True, 'invocation': {'module_args': {'warn': False, '_raw_params': "vdsm-client Image prepare storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 imageID=e48a66dd-74c9-43eb-890e-778e9c4ee8db volumeID=06bb5f34-112d-4214-91d2-53d0bdb84321 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - 6023764f-5547-4b23-92ca-422eafdf3f87.ovf", '_uses_shell': True, 'stdin_add_newline': True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable ': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': [], 'stderr_lines': ["vdsm-client: Command Image.prepare with args {'storagepoolID': '06c58622-f99b-11ea-9122-00163e1bbc93', 'storagedomainID': '2a4a3cce-f2f6-4ddd-b337-df5ef562f520', 'imageID': 'e48a66dd-74c9-43eb-890e-778e9c4ee8db', 'volumeID': '06bb5f34-112d-4214-91d2-53d0bdb84321'} failed:", "(code=309, message=Unknown pool id, pool not connected: ('06c58622-f99b-11ea-9122-00163e1bbc93',))", 'tar: This does not look like a tar archive', 'tar: 6023764f-5547-4b23-92ca-422eafdf3f87.ovf: Not found in archive', 'tar: Exiting with failure status due to previous errors'], '_ansible_no_log': False, 'attempts': 12, 'item': {'name': 'OVF_STORE', 'image_id': '06bb5f34-112d-4214-91d2-53d0bdb84321', 'id': 'e48a66dd-74c9-43eb-890e-778e9c4ee8db'}, 'ansible_loop_var': 'item', '_ansible_item_label': {'name': 'OVF_STORE', 'image_id': '06bb5f34-112d-4214-91d2-53d0bdb84321', 'id': 'e48a66dd-74c9-43eb-890e-778e9c4ee8db'}} [ ERROR ] {'msg': 'non-zero return code', 'cmd': "vdsm-client Image prepare storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 imageID=750428bd-1273-467f-9b27-7f6fe58a446c volumeID=1c89c678-f883-4e61-945c-5f7321add343 | grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - 
6023764f-5547-4b23-92ca-422eafdf3f87.ovf", 'stdout': '', 'stderr': "vdsm-client: Command Image.prepare with args {'storagepoolID': '06c58622-f99b-11ea-9122-00163e1bbc93', 'storagedomainID': '2a4a3cce-f2f6-4ddd-b337-df5ef562f520', 'imageID': '750428bd-1273-467f-9b27-7f6fe58a446c', 'volumeID': '1c89c678-f883-4e61-945c-5f7321add343'} failed:\n(code=309, message=Unknown pool id, pool not connected: ('06c58622-f99b-11ea-9122-00163e1bbc93',))\ntar: This does not look like a tar archive\ntar: 6023764f-5547-4b23-92ca-422eafdf3f87.ovf: Not found in archive\ntar: Exiting with failure status due to previous errors", 'rc': 2, 'start': '2020-09-21 17:16:26.030343', 'end': '2020-09-21 17:16:26.381862', 'delta': '0:00:00.351519', 'changed': True, 'failed': True, 'invocation': {'module_args': {'warn': False, '_raw_params': "vdsm-client Image prepare storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520
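For anyone hitting the same wall: the fix described at the top of this message amounts to letting node2 reach node1's NFS export. A hedged sketch with placeholder hostnames and paths (the 36:36 anonuid/anongid mapping reflects the vdsm:kvm ownership oVirt expects on NFS storage; adjust everything to your environment):

```shell
# On the NFS server (node1; hostname and path are placeholders):
# /etc/exports
#   /exports/data  node2.example.com(rw,all_squash,anonuid=36,anongid=36)
exportfs -ra                                # re-read /etc/exports
firewall-cmd --permanent --add-service=nfs  # allow NFS through firewalld
firewall-cmd --reload
# From node2: confirm the export is visible before re-running the deploy
showmount -e node1.example.com
```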
[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?
Hi again Strahil, It’s oVirt 4.3.10. Same CPU across the entire cluster: three machines with Xeon E5-2620v2 (Ivy Bridge), all identical in model and specs. I’ve changed the VM CPU Model to: Nehalem,+spec-ctrl,+ssbd Let’s see how it behaves. If it crashes again I’ll definitely look at rolling back the OS updates. Thank you all. PS: I can try upgrading to 4.4. > On 22 Sep 2020, at 04:28, Strahil Nikolov wrote: > > This looks much like my OpenBSD 6.6 under the latest AMD CPUs. KVM did not accept > a pretty valid instruction and it was a bug in KVM. > > Maybe you can try to: > - power off the VM > - pick an older CPU type for that VM only > - power on and monitor in the next days > > Do you have a cluster with a different CPU vendor (if currently on AMD -> Intel > and if currently Intel -> AMD)? Maybe you can move it to another cluster and > identify if the issue happens there too. > > Another option is to try to roll back the Windows updates, to identify if any > of them has caused the problem. Yet, that's a workaround and not a fix. > > Are you using oVirt 4.3 or 4.4? > > Best Regards, > Strahil Nikolov > > On Tuesday, 22 September 2020, 10:08:44 GMT+3, Vinícius Ferrão > wrote: > > Hi Strahil, yes I can’t find anything recent either. You dug way further > than me; I found some regressions on the kernel but I don’t know if they're > related or not: > > https://patchwork.kernel.org/patch/5526561/ > > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027 > > Regarding the OS, nothing new was installed, just regular Windows Updates. > > And finally, about nested virtualisation: it’s disabled on the hypervisor. 
> > > > > One thing that caught my attention on the link you’ve sent is regarding a > rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443 > > > > > But come on, it’s from 2006… > > > > > Well, I’m up to other ideas, VM just crashed once again: > > > > > EAX= EBX=075c5180 ECX=75432002 EDX=000400b6 > ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770 > EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0 > ES = 00809300 > CS =9900 7ff99000 00809300 > SS = 00809300 > DS = 00809300 > FS = 00809300 > GS = 00809300 > LDT= 000f > TR =0040 075da000 0067 8b00 > GDT= 075dbfb0 0057 > IDT= > CR0=00050032 CR2=242cb25a CR3=001ad002 CR4= > DR0= DR1= DR2= > DR3= > DR6=4ff0 DR7=0400 > EFER= > Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff > ff > > > > > [519192.536247] *** Guest State *** > [519192.536275] CR0: actual=0x00050032, shadow=0x00050032, > gh_mask=fff7 > [519192.536324] CR4: actual=0x2050, shadow=0x, > gh_mask=f871 > [519192.537322] CR3 = 0x001ad002 > [519192.538166] RSP = 0xfb047db5d770 RIP = 0x8000 > [519192.539017] RFLAGS=0x0002 DR7 = 0x0400 > [519192.539861] Sysenter RSP= CS:RIP=: > [519192.540690] CS: sel=0x9900, attr=0x08093, limit=0x, > base=0x7ff99000 > [519192.541523] DS: sel=0x, attr=0x08093, limit=0x, > base=0x > [519192.542356] SS: sel=0x, attr=0x08093, limit=0x, > base=0x > [519192.543167] ES: sel=0x, attr=0x08093, limit=0x, > base=0x > [519192.543961] FS: sel=0x, attr=0x08093, limit=0x, > base=0x > [519192.544747] GS: sel=0x, attr=0x08093, limit=0x, > base=0x > [519192.545511] GDTR: limit=0x0057, > base=0xad01075dbfb0 > [519192.546275] LDTR: sel=0x, attr=0x1, limit=0x000f, > base=0x > [519192.547052] IDTR: limit=0x, > base=0x > [519192.547841] TR: sel=0x0040, attr=0x0008b, limit=0x0067, > base=0xad01075da000 > [519192.548639] EFER = 0x PAT = 0x0007010600070106 > [519192.549460] DebugCtl = 0x DebugExceptions = > 0x > [519192.550302] 
Interruptibility = 0009 ActivityState = > [519192.551137] *** Host State *** > [519192.551963] RIP = 0xc150a034 RSP = 0x88cd9cafbc90 > [519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040 > [519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c > TRBase=88d45f2c4000 > [519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000 > [519192.555347]
[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?
Hi Gianluca. On 22 Sep 2020, at 04:24, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote: On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users <users@ovirt.org> wrote: Hi Strahil, yes I can’t find anything recent either. You dug way further than me; I found some regressions on the kernel but I don’t know if they're related or not: https://patchwork.kernel.org/patch/5526561/ https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027 Regarding the OS, nothing new was installed, just regular Windows Updates. And finally, about nested virtualisation: it’s disabled on the hypervisor. In your original post you wrote about the VM going suspended. So I think there could be something useful in engine.log on the engine and/or vdsm.log on the hypervisor. Could you check those? Yes, it goes to suspended. I think this is just the engine not knowing what really happened and guessing it was suspended. In engine.log I only have these two lines: # grep "2020-09-22 01:51" /var/log/ovirt-engine/engine.log 2020-09-22 01:51:52,604-03 INFO [org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] VM '351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused' 2020-09-22 01:51:52,699-03 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] EVENT_ID: VM_PAUSED(1,025), VM cerulean has been paused. Note that I’ve grepped by time. There are only these two lines; it crashed about 2h30m ago. In vdsm.log, around that time, searching for the name of the VM I only found a huge JSON with the characteristics of the VM. Is there something I should check specifically? I tried some combinations of grep but nothing really useful. Also, do you see anything in the event viewer of the Windows VM and/or in FreeNAS logs? FreeNAS is just cool, nothing wrong there. 
No errors in dmesg, no resource starvation on ZFS. No overload on the disks, nothing… the storage is running easy. About Windows Event Viewer, it’s my Achilles’ heel; nothing relevant there either as far as I’m concerned. There are of course some mentions of an improper shutdown due to the crash, but nothing else. I’m looking further here, will report back if I find something useful. Thanks, Gianluca
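Since engine.log only records the 'Up' --> 'Paused' transition, the hypervisor's vdsm.log is usually where the actual reason lands: vdsm logs an "abnormal vm stop" event with an error code when libvirt pauses a guest. A minimal sketch of pulling that out; the sample line below is fabricated for illustration, and on the host you would grep /var/log/vdsm/vdsm.log directly:

```shell
# Fabricated example of a vdsm pause event (real usage:
#   grep -i 'abnormal vm stop' /var/log/vdsm/vdsm.log ):
sample="2020-09-22 01:51:52,400-0300 INFO (libvirt/events) [virt.vm] (vmId='351db98a-5f74-439f-99a4-31f611b2d250') abnormal vm stop device scsi0-0-0-0 error eother"
printf '%s\n' "$sample" | grep -o 'abnormal vm stop.*'
# -> abnormal vm stop device scsi0-0-0-0 error eother
```

The trailing error token (eio, enospc, eother, ...) is what distinguishes a storage problem from everything else.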
[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?
This looks much like my OpenBSD 6.6 under the latest AMD CPUs. KVM did not accept a pretty valid instruction and it was a bug in KVM. Maybe you can try to: - power off the VM - pick an older CPU type for that VM only - power on and monitor in the next days Do you have a cluster with a different CPU vendor (if currently on AMD -> Intel and if currently Intel -> AMD)? Maybe you can move it to another cluster and identify if the issue happens there too. Another option is to try to roll back the Windows updates, to identify if any of them has caused the problem. Yet, that's a workaround and not a fix. Are you using oVirt 4.3 or 4.4? Best Regards, Strahil Nikolov On Tuesday, 22 September 2020, 10:08:44 GMT+3, Vinícius Ferrão wrote: Hi Strahil, yes I can’t find anything recent either. You dug way further than me; I found some regressions on the kernel but I don’t know if they're related or not: https://patchwork.kernel.org/patch/5526561/ https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027 Regarding the OS, nothing new was installed, just regular Windows Updates. And finally, about nested virtualisation: it’s disabled on the hypervisor. 
One thing that caught my attention on the link you’ve sent is regarding a rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443 But come on, it’s from 2006… Well, I’m up to other ideas, VM just crashed once again: EAX= EBX=075c5180 ECX=75432002 EDX=000400b6 ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770 EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0 ES = 00809300 CS =9900 7ff99000 00809300 SS = 00809300 DS = 00809300 FS = 00809300 GS = 00809300 LDT= 000f TR =0040 075da000 0067 8b00 GDT= 075dbfb0 0057 IDT= CR0=00050032 CR2=242cb25a CR3=001ad002 CR4= DR0= DR1= DR2= DR3= DR6=4ff0 DR7=0400 EFER= Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [519192.536247] *** Guest State *** [519192.536275] CR0: actual=0x00050032, shadow=0x00050032, gh_mask=fff7 [519192.536324] CR4: actual=0x2050, shadow=0x, gh_mask=f871 [519192.537322] CR3 = 0x001ad002 [519192.538166] RSP = 0xfb047db5d770 RIP = 0x8000 [519192.539017] RFLAGS=0x0002 DR7 = 0x0400 [519192.539861] Sysenter RSP= CS:RIP=: [519192.540690] CS: sel=0x9900, attr=0x08093, limit=0x, base=0x7ff99000 [519192.541523] DS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.542356] SS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.543167] ES: sel=0x, attr=0x08093, limit=0x, base=0x [519192.543961] FS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.544747] GS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.545511] GDTR: limit=0x0057, base=0xad01075dbfb0 [519192.546275] LDTR: sel=0x, attr=0x1, limit=0x000f, base=0x [519192.547052] IDTR: limit=0x, base=0x [519192.547841] TR: sel=0x0040, attr=0x0008b, limit=0x0067, base=0xad01075da000 [519192.548639] EFER = 0x PAT = 0x0007010600070106 [519192.549460] DebugCtl = 0x DebugExceptions = 0x [519192.550302] Interruptibility = 0009 ActivityState = [519192.551137] *** Host State *** [519192.551963] RIP = 0xc150a034 RSP = 0x88cd9cafbc90 [519192.552805] CS=0010 
SS=0018 DS= ES= FS= GS= TR=0040 [519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c TRBase=88d45f2c4000 [519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000 [519192.555347] CR0=80050033 CR3=00033dc82000 CR4=001627e0 [519192.556202] Sysenter RSP= CS:RIP=0010:91596cc0 [519192.557058] EFER = 0x0d01 PAT = 0x0007050600070106 [519192.557913] *** Control State *** [519192.558757] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb [519192.559605] EntryControls=d1ff ExitControls=002fefff [519192.560453] ExceptionBitmap=00060042 PFECmask= PFECmatch= [519192.561306] VMEntry: intr_info= errcode=0006 ilen= [519192.562158] VMExit: intr_info= errcode= ilen=0001 [519192.563006] reason=8021 qualification= [519192.563860] IDTVectoring:
[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?
On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users wrote: > Hi Strahil, yes I can’t find anything recent either. You dug way > further than me; I found some regressions on the kernel but I don’t know if > they're related or not: > > https://patchwork.kernel.org/patch/5526561/ > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027 > > Regarding the OS, nothing new was installed, just regular Windows Updates. > And finally, about nested virtualisation: it’s disabled on the hypervisor. > > In your original post you wrote about the VM going suspended. So I think there could be something useful in engine.log on the engine and/or vdsm.log on the hypervisor. Could you check those? Also, do you see anything in the event viewer of the Windows VM and/or in FreeNAS logs? Gianluca
[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?
Hi Strahil, yes I can’t find anything recently either. You digged way further then me, I found some regressions on the kernel but I don’t know if it’s related or not: https://patchwork.kernel.org/patch/5526561/ https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027 Regarding the OS, nothing new was installed, just regular Windows Updates. And finally about nested virtualisation, it’s disabled on hypervisor. One thing that caught my attention on the link you’ve sent is regarding a rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443 But come on, it’s from 2006… Well, I’m up to other ideas, VM just crashed once again: EAX= EBX=075c5180 ECX=75432002 EDX=000400b6 ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770 EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0 ES = 00809300 CS =9900 7ff99000 00809300 SS = 00809300 DS = 00809300 FS = 00809300 GS = 00809300 LDT= 000f TR =0040 075da000 0067 8b00 GDT= 075dbfb0 0057 IDT= CR0=00050032 CR2=242cb25a CR3=001ad002 CR4= DR0= DR1= DR2= DR3= DR6=4ff0 DR7=0400 EFER= Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [519192.536247] *** Guest State *** [519192.536275] CR0: actual=0x00050032, shadow=0x00050032, gh_mask=fff7 [519192.536324] CR4: actual=0x2050, shadow=0x, gh_mask=f871 [519192.537322] CR3 = 0x001ad002 [519192.538166] RSP = 0xfb047db5d770 RIP = 0x8000 [519192.539017] RFLAGS=0x0002 DR7 = 0x0400 [519192.539861] Sysenter RSP= CS:RIP=: [519192.540690] CS: sel=0x9900, attr=0x08093, limit=0x, base=0x7ff99000 [519192.541523] DS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.542356] SS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.543167] ES: sel=0x, attr=0x08093, limit=0x, base=0x [519192.543961] FS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.544747] GS: sel=0x, attr=0x08093, limit=0x, base=0x [519192.545511] GDTR: limit=0x0057, base=0xad01075dbfb0 [519192.546275] LDTR: sel=0x, attr=0x1, 
limit=0x000f, base=0x [519192.547052] IDTR: limit=0x, base=0x [519192.547841] TR: sel=0x0040, attr=0x0008b, limit=0x0067, base=0xad01075da000 [519192.548639] EFER = 0x PAT = 0x0007010600070106 [519192.549460] DebugCtl = 0x DebugExceptions = 0x [519192.550302] Interruptibility = 0009 ActivityState = [519192.551137] *** Host State *** [519192.551963] RIP = 0xc150a034 RSP = 0x88cd9cafbc90 [519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040 [519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c TRBase=88d45f2c4000 [519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000 [519192.555347] CR0=80050033 CR3=00033dc82000 CR4=001627e0 [519192.556202] Sysenter RSP= CS:RIP=0010:91596cc0 [519192.557058] EFER = 0x0d01 PAT = 0x0007050600070106 [519192.557913] *** Control State *** [519192.558757] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb [519192.559605] EntryControls=d1ff ExitControls=002fefff [519192.560453] ExceptionBitmap=00060042 PFECmask= PFECmatch= [519192.561306] VMEntry: intr_info= errcode=0006 ilen= [519192.562158] VMExit: intr_info= errcode= ilen=0001 [519192.563006] reason=8021 qualification= [519192.563860] IDTVectoring: info= errcode= [519192.564695] TSC Offset = 0xfffcc6c7d53f16d7 [519192.565526] TPR Threshold = 0x00 [519192.566345] EPT pointer = 0x000b9397901e [519192.567162] PLE Gap=0080 Window=1000 [519192.567984] Virtual processor ID = 0x0005 Thank you! On 22 Sep 2020, at 02:30, Strahil Nikolov mailto:hunter86...@yahoo.com>> wrote: Interesting is that I don't find anything recent , but this one: https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653 Can you check if anything in the OS was updated/changed recently ? Also check if the VM is with nested virtualization enabled. Best Regards, Strahil Nikolov В понеделник, 21 септември 2020 г., 23:56:26 Гринуич+3, Vinícius Ferrão написа: Strahil, thank you man. We finally got some output:
[ovirt-users] Re: hosted engine migration
So, let's summarize: - Cannot migrate the HE due to "CPU policy". - HE's CPU is Westmere - just like the hosts. - You have enough resources on the second HE host (both CPU + MEMORY). What is the Cluster's CPU type (you can check in the UI)? Maybe you should enable debugging in various locations to identify the issue. Anything interesting in libvirt's log for the HostedEngine.xml on the destination host? Best Regards, Strahil Nikolov On Tuesday, 22 September 2020, 05:37:18 GMT+3, ddqlo wrote: Yes, I can. The host which does not host the HE could be reinstalled successfully in the web UI. After this is done nothing has changed. On 2020-09-22 03:08:18, "Strahil Nikolov" wrote: >Can you put 1 host in maintenance and use "Installation" -> "Reinstall" >and enable the HE deployment from one of the tabs? > >Best Regards, >Strahil Nikolov > >On Monday, 21 September 2020, 06:38:06 GMT+3, ddqlo >wrote: > >So strange! After I set global maintenance, powered off and started HE, the CPU >of HE became 'Westmere' (I did not change anything). But HE still could not be >migrated. > >HE xml: > > Westmere > > >host capabilities: >Westmere > >cluster cpu type (UI): > > >host cpu type (UI): > > >HE cpu type (UI): > > >On 2020-09-19 13:27:35, "Strahil Nikolov" wrote: >>Hm... interesting. >> >>The VM is using 'Haswell-noTSX' while the host is 'Westmere'. >> >>In my case I got no difference: >> >>[root@ovirt1 ~]# virsh dumpxml HostedEngine | grep Opteron >> Opteron_G5 >>[root@ovirt1 ~]# virsh capabilities | grep Opteron >> Opteron_G5 >> >>Did you update the cluster holding the Hosted Engine? >> >>I guess you can try to: >> >>- Set global maintenance >>- Power off the HostedEngine VM >>- virsh dumpxml HostedEngine > /root/HE.xml >>- use virsh edit to change the cpu of the HE (non-permanent change) >>- try to power on the modified HE >> >>If it powers on, you can try to migrate it and if it succeeds - then you >>should make it permanent. 
>> >> >> >> >> >>Best Regards, >>Strahil Nikolov >> >>В петък, 18 септември 2020 г., 04:40:39 Гринуич+3, ddqlo >>написа: >> >> >> >> >> >>HE: >> >> >> HostedEngine >> b4e805ff-556d-42bd-a6df-02f5902fd01c >> http://ovirt.org/vm/tune/1.0; >>xmlns:ovirt-vm="http://ovirt.org/vm/1.0;> >> >> http://ovirt.org/vm/1.0;> >> 4.3 >> False >> false >> 1024 >> >type="int">1024 >> auto_resume >> 1600307555.19 >> >> external >> >> 4 >> >> >> >> ovirtmgmt >> >> 4 >> >> >> >> >>c17c1934-332f-464c-8f89-ad72463c00b3 >> /dev/vda2 >> >>8eca143a-4535-4421-bd35-9f5764d67d70 >> >>---- >> exclusive >> >>ae961104-c3b3-4a43-9f46-7fa6bdc2ac33 >> >> 1 >> >> >> >> >>c17c1934-332f-464c-8f89-ad72463c00b3 >> >>8eca143a-4535-4421-bd35-9f5764d67d70 >> >type="int">108003328 >> >>/dev/c17c1934-332f-464c-8f89-ad72463c00b3/leases >> >>/rhev/data-center/mnt/blockSD/c17c1934-332f-464c-8f89-ad72463c00b3/images/8eca143a-4535-4421-bd35-9f5764d67d70/ae961104-c3b3-4a43-9f46-7fa6bdc2ac33 >> >>ae961104-c3b3-4a43-9f46-7fa6bdc2ac33 >> >> >> >> >> >> >> 67108864 >> 16777216 >> 16777216 >> 64 >> 1 >> >> /machine >> >> >> >> oVirt >> oVirt Node >> 7-5.1804.el7.centos >> ----0CC47A6B3160 >> b4e805ff-556d-42bd-a6df-02f5902fd01c >> >> >> >> hvm >> >> >> >> >> >> >> >> >> Haswell-noTSX >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> destroy >> destroy >> destroy >> >> >> >> >> >> /usr/libexec/qemu-kvm >> >> >> >> >> >> >> >> >> >> >io='native' iothread='1'/> >> >dev='/var/run/vdsm/storage/c17c1934-332f-464c-8f89-ad72463c00b3/8eca143a-4535-4421-bd35-9f5764d67d70/ae961104-c3b3-4a43-9f46-7fa6bdc2ac33'> >> >> >> >> >> 8eca143a-4535-4421-bd35-9f5764d67d70 >> >> >function='0x0'/> >> >> >> >> >> >function='0x0'/> >> >> >> >> >function='0x1'/> >> >> >> >> >function='0x0'/> >> >> >> >> >function='0x2'/> >> >> >> >> >> >> c17c1934-332f-464c-8f89-ad72463c00b3 >> ae961104-c3b3-4a43-9f46-7fa6bdc2ac33 >> >offset='108003328'/> >> >> >> >> >> >> >> >> >> >> >> >> >function='0x0'/> >> >> >> >> >> >> >> >> >> >> >> >>
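The check Strahil describes reduces to extracting the <model> element from two XML documents and comparing them. A self-contained sketch of that extraction using a sample fragment shaped like the HostedEngine definition (on a real host you would feed the pipeline from `virsh -r dumpxml HostedEngine` and `virsh -r capabilities` instead; `-r` opens a read-only connection):

```shell
# Sample <cpu> fragment standing in for 'virsh -r dumpxml HostedEngine' output:
xml='<cpu match="exact"><model fallback="allow">Westmere</model></cpu>'
# Pull out the model name; the same pipeline works on 'virsh -r capabilities':
printf '%s\n' "$xml" | grep -o '<model[^>]*>[^<]*' | sed 's/.*>//'
# -> Westmere
```

If the two names differ (as with 'Haswell-noTSX' vs 'Westmere' above), migration will be refused by CPU policy.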
[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active
Have you restarted glusterd.service on the affected node? glusterd is just the management layer and it won't affect the brick processes. Best Regards, Strahil Nikolov On Tuesday, 22 September 2020, 01:43:36 GMT+3, Jeremey Wise wrote: Start is not an option. It notes two bricks, but the command line shows three bricks, all present: [root@odin thorst.penguinpages.local:_vmstore]# gluster volume status data Status of volume: data Gluster process TCP Port RDMA Port Online Pid -- Brick thorst.penguinpages.local:/gluster_br icks/data/data 49152 0 Y 33123 Brick odinst.penguinpages.local:/gluster_br icks/data/data 49152 0 Y 2970 Brick medusast.penguinpages.local:/gluster_ bricks/data/data 49152 0 Y 2646 Self-heal Daemon on localhost N/A N/A Y 3004 Self-heal Daemon on thorst.penguinpages.loc al N/A N/A Y 33230 Self-heal Daemon on medusast.penguinpages.l ocal N/A N/A Y 2475 Task Status of Volume data -- There are no active volume tasks [root@odin thorst.penguinpages.local:_vmstore]# gluster peer status Number of Peers: 2 Hostname: thorst.penguinpages.local Uuid: 7726b514-e7c3-4705-bbc9-5a90c8a966c9 State: Peer in Cluster (Connected) Hostname: medusast.penguinpages.local Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031 State: Peer in Cluster (Connected) [root@odin thorst.penguinpages.local:_vmstore]# On Mon, Sep 21, 2020 at 4:32 PM Strahil Nikolov wrote: > Just select the volume and press "start". It will automatically mark "force > start" and will fix itself. > > Best Regards, > Strahil Nikolov > > On Monday, 21 September 2020, 20:53:15 GMT+3, Jeremey Wise > wrote: > > oVirt engine shows one of the gluster servers having an issue. I did a > graceful shutdown of all three nodes over the weekend as I have to move around > some power connections in prep for UPS. > > Came back up.. but > > And this is reflected in 2 bricks online (should be three for each volume) > > Command line shows gluster should be happy. 
> > [root@thor engine]# gluster peer status > Number of Peers: 2 > > Hostname: odinst.penguinpages.local > Uuid: 83c772aa-33cd-430f-9614-30a99534d10e > State: Peer in Cluster (Connected) > > Hostname: medusast.penguinpages.local > Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031 > State: Peer in Cluster (Connected) > [root@thor engine]# > > # All bricks showing online > [root@thor engine]# gluster volume status > Status of volume: data > Gluster process TCP Port RDMA Port Online Pid > -- > Brick thorst.penguinpages.local:/gluster_br > icks/data/data 49152 0 Y 11001 > Brick odinst.penguinpages.local:/gluster_br > icks/data/data 49152 0 Y 2970 > Brick medusast.penguinpages.local:/gluster_ > bricks/data/data 49152 0 Y 2646 > Self-heal Daemon on localhost N/A N/A Y 50560 > Self-heal Daemon on odinst.penguinpages.loc > al N/A N/A Y 3004 > Self-heal Daemon on medusast.penguinpages.l > ocal N/A N/A Y 2475 > > Task Status of Volume data > -- > There are no active volume tasks > > Status of volume: engine > Gluster process TCP Port RDMA Port Online Pid > -- > Brick thorst.penguinpages.local:/gluster_br > icks/engine/engine 49153 0 Y 11012 > Brick odinst.penguinpages.local:/gluster_br > icks/engine/engine 49153 0 Y 2982 > Brick medusast.penguinpages.local:/gluster_ > bricks/engine/engine 49153 0 Y 2657 > Self-heal Daemon on localhost N/A N/A Y 50560 > Self-heal Daemon on odinst.penguinpages.loc > al N/A N/A Y 3004 > Self-heal Daemon on medusast.penguinpages.l > ocal N/A N/A Y 2475 > > Task Status of Volume engine > -- > There are no active
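One way to narrow the engine/CLI gap is to reduce `gluster volume status` to a count of bricks whose Online column is not 'Y'. A minimal sketch against a captured sample shaped like the wrapped paste above (live usage would pipe `gluster volume status data` into the same awk; the 'N' brick in the sample is fabricated to demonstrate a failure):

```shell
# Count bricks whose Online flag is not Y; the sample mimics the wrapped
# output above, with the second brick deliberately marked N.
sample='Brick thorst.penguinpages.local:/gluster_br
icks/data/data 49152 0 Y 11001
Brick odinst.penguinpages.local:/gluster_br
icks/data/data 49152 0 N 2970'
printf '%s\n' "$sample" | awk '/^Brick/ {getline; if ($(NF-1) != "Y") bad++} END {print bad+0}'
# -> 1
```

A result of 0 from the CLI while the engine still shows the host NonOperational points at the engine's view (vdsm/gluster monitoring), not at the bricks themselves.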