[ovirt-users] Re: Node upgrade to 4.4

2020-09-22 Thread Ritesh Chikatwar
Vincent,


This document will be useful
https://www.ovirt.org/documentation/upgrade_guide/#Upgrading_the_Manager_to_4-4_4-3_SHE

On Wed, Sep 23, 2020, 3:55 AM Vincent Royer  wrote:

> I have 3 nodes running node ng 4.3.9 with a gluster/hci cluster.  How do I
> upgrade to 4.4?  Is there a guide?


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread penguin pages


The email integration with this forum is a bit limited. I was told that through this
web interface I could post images, since embedded images in email get stripped out,
but I am not seeing how that is done. It seems to be text only.



1) ..."I would give the engine a 'Windows'-style fix (a.k.a. reboot)"  how 
does one restart just the oVirt-engine?
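
My working assumption (a sketch based on what I have read, not something I have
verified on this cluster) is something like:

    # on any HCI host: keep the HA agents from fighting the restart
    hosted-engine --set-maintenance --mode=global
    # inside the engine VM: restart only the engine service
    systemctl restart ovirt-engine
    # or restart the whole engine VM from a host
    hosted-engine --vm-shutdown
    hosted-engine --vm-start
    # when done
    hosted-engine --set-maintenance --mode=none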

2) In the shell I now see 3 nodes, each with one brick for data, vmstore, and engine
(plus an ISO volume I am trying to create), all online and replicating. But the GUI
shows thor (the first server, which runs the engine) as offline and needing to be
reloaded. The volumes now show two bricks: one online, one offline. And there is no
option to start / force restart.

3) I have tried a graceful reboot several times to see if the startup sequence was
the issue. I tore down the VLANs and bridges to make the networking flat: 1 x 1Gb
mgmt, 1 x 10Gb storage. SSH between nodes is fine and a copy test was great. I don't
think the problem is the nodes.

4) To the question "did I add the third node later": I would attach the deployment
guide I am building, but I can't do that in this forum. This is as simple as I can
make it: 3 generic Intel servers, each with 1 x boot drive, 1 x 512GB SSD, and
2 x 1TB SSD. Wipe all data and configuration, do a fresh CentOS 8 minimal install,
set up SSH and basic networking, install Cockpit, and run the HCI wizard across all
three nodes. That is all.

I am trying to learn and support the concept of oVirt as a viable platform, but I am
still working through how to root-cause, kick the tires, and debug / recover when
things go down, as they will.

Help is appreciated. The main concern I have is the gap between what the engine sees
and what the CLI shows. Can someone show me where to get logs? What the GUI log shows
when I try to "activate" the thor server, "Status of host thor was set to
NonOperational." and "Gluster command [] failed on server .", is very unhelpful.



[ovirt-users] Node upgrade to 4.4

2020-09-22 Thread Vincent Royer
I have 3 nodes running node ng 4.3.9 with a gluster/hci cluster.  How do I
upgrade to 4.4?  Is there a guide?


[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Nir Soffer
On Tue, Sep 22, 2020 at 11:23 PM Strahil Nikolov  wrote:
>
> In my setup , I got no filter at all (yet, I'm on 4.3.10):
> [root@ovirt ~]# lvmconfig | grep -i filter

We create the LVM filter automatically since 4.4.1. If you don't use block storage
(FC, iSCSI) you don't need an LVM filter. If you do, you can create it manually
using vdsm-tool.
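
For example (a sketch, run as root on a 4.4 host; it should analyze the mounted
logical volumes and offer to write the matching filter):

    # vdsm-tool config-lvm-filter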

> [root@ovirt ~]#
>
> P.S.: Don't forget to 'dracut -f' due to the fact that the initramfs has a 
> local copy of the lvm.conf

Good point

>
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
> В вторник, 22 септември 2020 г., 23:05:29 Гринуич+3, Jeremey Wise 
>  написа:
>
>
>
>
>
>
>
> Correct..  on wwid
>
>
> I do want to make clear here.  that to geta around the error you must ADD  
> (not remove ) drives to /etc/lvm/lvm.conf  so oVirt Gluster can complete 
> setup of drives.
>
> [root@thor log]# cat /etc/lvm/lvm.conf |grep filter
> # Broken for gluster in oVirt
> #filter = 
> ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", 
> "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", 
> "r|.*|"]
> # working for gluster wizard in oVirt
> filter = 
> ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", 
> "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", 
> "a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]
>
>
>
> On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov  wrote:
> > Obtaining the wwid is not exactly correct.
> > You can identify them via:
> >
> > multipath -v4 | grep 'got wwid of'
> >
> > Short example:
> > [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
> > Sep 22 22:55:58 | nvme0n1: got wwid of 
> > 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
> > Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
> > Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
> > Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
> > Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'
> >
> > Of course if you are planing to use only gluster it could be far easier to 
> > set:
> >
> > [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
> > blacklist {
> > devnode "*"
> > }
> >
> >
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer 
> >  написа:
> >
> >
> >
> >
> >
> > On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise  wrote:
> >>
> >>
> >> Agree about an NVMe Card being put under mpath control.
> >
> > NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
> > https://bugzilla.redhat.com/1498546
> >
> > Of course when the NVMe device is local there is no point to use it
> > via multipath.
> > To avoid this, you need to blacklist the devices like this:
> >
> > 1. Find the device wwid
> >
> > For NVMe, you need the device ID_WWN:
> >
> > $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
> > ID_WWN=eui.5cd2e42a81a11f69
> >
> > 2. Add local blacklist file:
> >
> > $ mkdir /etc/multipath/conf.d
> > $ cat /etc/multipath/conf.d/local.conf
> > blacklist {
> > wwid "eui.5cd2e42a81a11f69"
> > }
> >
> > 3. Reconfigure multipath
> >
> > $ multipathd reconfigure
> >
> > Gluster should do this for you automatically during installation, but
> > it does not
> > you can do this manually.
> >
> >> I have not even gotten to that volume / issue.  My guess is something 
> >> weird in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block 
> >> devices.
> >>
> >> I will post once I cross bridge of getting standard SSD volumes working
> >>
> >> On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov  
> >> wrote:
> >>>
> >>> Why is your NVME under multipath ? That doesn't make sense at all .
> >>> I have modified my multipath.conf to block all local disks . Also ,don't 
> >>> forget the '# VDSM PRIVATE' line somewhere in the top of the file.
> >>>
> >>> Best Regards,
> >>> Strahil Nikolov
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise 
> >>>  написа:
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> vdo: ERROR - Device /dev/sdc excluded by a filter
> >>>
> >>>
> >>>
> >>>
> >>> Other server
> >>> vdo: ERROR - Device 
> >>> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
> >>>  excluded by a filter.
> >>>
> >>>
> >>> All systems when I go to create VDO volume on blank drives.. I get this 
> >>> filter error.  All disk outside of the HCI wizard setup are now blocked 
> >>> from creating new Gluster volume group.
> >>>
> >>> Here is what I see in /dev/lvm/lvm.conf |grep filter
> >>> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
> >>> filter = 
> >>> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|",
> >>>  
> >>> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|",
> >>>  "r|.*|"]
> >>>
> >>> 

[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Strahil Nikolov via Users
In my setup I have no filter at all (but I'm still on 4.3.10):
[root@ovirt ~]# lvmconfig | grep -i filter
[root@ovirt ~]#

P.S.: Don't forget to run 'dracut -f', because the initramfs keeps a local copy of
lvm.conf.


Best Regards,
Strahil Nikolov




On Tuesday, September 22, 2020 at 23:05:29 GMT+3, Jeremey Wise wrote:







Correct..  on wwid  

 
I do want to make clear here.  that to geta around the error you must ADD  (not 
remove ) drives to /etc/lvm/lvm.conf  so oVirt Gluster can complete setup of 
drives.

[root@thor log]# cat /etc/lvm/lvm.conf |grep filter
# Broken for gluster in oVirt
#filter = 
["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", 
"a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", 
"r|.*|"]
# working for gluster wizard in oVirt
filter = 
["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", 
"a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", 
"a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]



On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov  wrote:
> Obtaining the wwid is not exactly correct.
> You can identify them via:
> 
> multipath -v4 | grep 'got wwid of'
> 
> Short example: 
> [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
> Sep 22 22:55:58 | nvme0n1: got wwid of 
> 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
> Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
> Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
> Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
> Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'
> 
> Of course if you are planing to use only gluster it could be far easier to 
> set:
> 
> [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf 
> blacklist {
>         devnode "*"
> }
> 
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer 
>  написа: 
> 
> 
> 
> 
> 
> On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise  wrote:
>>
>>
>> Agree about an NVMe Card being put under mpath control.
> 
> NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
> https://bugzilla.redhat.com/1498546
> 
> Of course when the NVMe device is local there is no point to use it
> via multipath.
> To avoid this, you need to blacklist the devices like this:
> 
> 1. Find the device wwid
> 
> For NVMe, you need the device ID_WWN:
> 
>     $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
>     ID_WWN=eui.5cd2e42a81a11f69
> 
> 2. Add local blacklist file:
> 
>     $ mkdir /etc/multipath/conf.d
>     $ cat /etc/multipath/conf.d/local.conf
>     blacklist {
>         wwid "eui.5cd2e42a81a11f69"
>     }
> 
> 3. Reconfigure multipath
> 
>     $ multipathd reconfigure
> 
> Gluster should do this for you automatically during installation, but
> it does not
> you can do this manually.
> 
>> I have not even gotten to that volume / issue.  My guess is something weird 
>> in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block devices.
>>
>> I will post once I cross bridge of getting standard SSD volumes working
>>
>> On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov  
>> wrote:
>>>
>>> Why is your NVME under multipath ? That doesn't make sense at all .
>>> I have modified my multipath.conf to block all local disks . Also ,don't 
>>> forget the '# VDSM PRIVATE' line somewhere in the top of the file.
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>>
>>>
>>>
>>>
>>>
>>> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise 
>>>  написа:
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> vdo: ERROR - Device /dev/sdc excluded by a filter
>>>
>>>
>>>
>>>
>>> Other server
>>> vdo: ERROR - Device 
>>> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
>>>  excluded by a filter.
>>>
>>>
>>> All systems when I go to create VDO volume on blank drives.. I get this 
>>> filter error.  All disk outside of the HCI wizard setup are now blocked 
>>> from creating new Gluster volume group.
>>>
>>> Here is what I see in /dev/lvm/lvm.conf |grep filter
>>> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
>>> filter = 
>>> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|", 
>>> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|", 
>>> "r|.*|"]
>>>
>>> [root@odin ~]# ls -al /dev/disk/by-id/
>>> total 0
>>> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 .
>>> drwxr-xr-x. 6 root root  120 Sep 18 14:32 ..
>>> lrwxrwxrwx. 1 root root    9 Sep 18 22:40 
>>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda
>>> lrwxrwxrwx. 1 root root  10 Sep 18 22:40 
>>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1
>>> lrwxrwxrwx. 1 root root  10 Sep 18 22:40 
>>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2
>>> lrwxrwxrwx. 1 root root    9 Sep 18 14:32 
>>> 

[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Nir Soffer
On Tue, Sep 22, 2020 at 11:05 PM Jeremey Wise  wrote:
>
>
>
> Correct..  on wwid
>
>
> I do want to make clear here.  that to geta around the error you must ADD  
> (not remove ) drives to /etc/lvm/lvm.conf  so oVirt Gluster can complete 
> setup of drives.
>
> [root@thor log]# cat /etc/lvm/lvm.conf |grep filter
> # Broken for gluster in oVirt
> #filter = 
> ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", 
> "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", 
> "r|.*|"]
> # working for gluster wizard in oVirt
> filter = 
> ["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|", 
> "a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|", 
> "a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]

Yes, you need to add the devices gluster is going to use to the filter. The
easiest way is to remove the filter before you install gluster, and then create
the filter using

    vdsm-tool config-lvm-filter

It should add all the devices needed for the mounted logical volumes
automatically. Please file a bug if it does not do this.
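
To confirm what ended up in the configuration afterwards, the same check used
earlier in this thread works:

    $ grep filter /etc/lvm/lvm.conf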

> On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov  wrote:
>>
>> Obtaining the wwid is not exactly correct.
>> You can identify them via:
>>
>> multipath -v4 | grep 'got wwid of'
>>
>> Short example:
>> [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
>> Sep 22 22:55:58 | nvme0n1: got wwid of 
>> 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
>> Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
>> Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
>> Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
>> Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'
>>
>> Of course if you are planing to use only gluster it could be far easier to 
>> set:
>>
>> [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
>> blacklist {
>> devnode "*"
>> }
>>
>>
>>
>> Best Regards,
>> Strahil Nikolov
>>
>> В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer 
>>  написа:
>>
>>
>>
>>
>>
>> On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise  wrote:
>> >
>> >
>> > Agree about an NVMe Card being put under mpath control.
>>
>> NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
>> https://bugzilla.redhat.com/1498546
>>
>> Of course when the NVMe device is local there is no point to use it
>> via multipath.
>> To avoid this, you need to blacklist the devices like this:
>>
>> 1. Find the device wwid
>>
>> For NVMe, you need the device ID_WWN:
>>
>> $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
>> ID_WWN=eui.5cd2e42a81a11f69
>>
>> 2. Add local blacklist file:
>>
>> $ mkdir /etc/multipath/conf.d
>> $ cat /etc/multipath/conf.d/local.conf
>> blacklist {
>> wwid "eui.5cd2e42a81a11f69"
>> }
>>
>> 3. Reconfigure multipath
>>
>> $ multipathd reconfigure
>>
>> Gluster should do this for you automatically during installation, but
>> it does not
>> you can do this manually.
>>
>> > I have not even gotten to that volume / issue.  My guess is something 
>> > weird in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block 
>> > devices.
>> >
>> > I will post once I cross bridge of getting standard SSD volumes working
>> >
>> > On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov  
>> > wrote:
>> >>
>> >> Why is your NVME under multipath ? That doesn't make sense at all .
>> >> I have modified my multipath.conf to block all local disks . Also ,don't 
>> >> forget the '# VDSM PRIVATE' line somewhere in the top of the file.
>> >>
>> >> Best Regards,
>> >> Strahil Nikolov
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise 
>> >>  написа:
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> vdo: ERROR - Device /dev/sdc excluded by a filter
>> >>
>> >>
>> >>
>> >>
>> >> Other server
>> >> vdo: ERROR - Device 
>> >> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
>> >>  excluded by a filter.
>> >>
>> >>
>> >> All systems when I go to create VDO volume on blank drives.. I get this 
>> >> filter error.  All disk outside of the HCI wizard setup are now blocked 
>> >> from creating new Gluster volume group.
>> >>
>> >> Here is what I see in /dev/lvm/lvm.conf |grep filter
>> >> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
>> >> filter = 
>> >> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|",
>> >>  
>> >> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|",
>> >>  "r|.*|"]
>> >>
>> >> [root@odin ~]# ls -al /dev/disk/by-id/
>> >> total 0
>> >> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 .
>> >> drwxr-xr-x. 6 root root  120 Sep 18 14:32 ..
>> >> lrwxrwxrwx. 1 root root9 Sep 18 22:40 
>> >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda
>> >> lrwxrwxrwx. 1 root root  10 

[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Nir Soffer
On Tue, Sep 22, 2020 at 10:57 PM Strahil Nikolov  wrote:
>
> Obtaining the wwid is not exactly correct.

It is correct - for NVMe devices, see:
https://github.com/oVirt/vdsm/blob/353e7b1e322aa02d4767b6617ed094be0643b094/lib/vdsm/storage/lvmfilter.py#L300

This matches the way that multipath looks up device wwids.

> You can identify them via:
>
> multipath -v4 | grep 'got wwid of'
>
> Short example:
> [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
> Sep 22 22:55:58 | nvme0n1: got wwid of 
> 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
> Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
> Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
> Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
> Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'

There are 2 issues with this:
- It detects and sets up maps for all devices in the system, which is unwanted
  when you want to blacklist devices
- It depends on debug output that may change, not on a public, documented API

You can use these commands:

Show devices that multipath does not use yet without setting up maps:

$ sudo multipath -d

Show devices that multipath is already using:

$ sudo multipath -ll

But I'm not sure if these commands work if the dm_multipath kernel module
is not loaded or multipathd is not running.

Getting the device wwid using udevadm works regardless of the
multipathd/dm_multipath state.
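
For a plain SATA/SAS disk the property to use is ID_SERIAL instead of ID_WWN.
A sketch (the value shown is what I would expect for the TOSHIBA disk in the
listing above):

    $ udevadm info -q property /dev/sda | grep ID_SERIAL=
    ID_SERIAL=TOSHIBA-TR200_Z7KB600SK46S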

> Of course if you are planing to use only gluster it could be far easier to 
> set:
>
> [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
> blacklist {
> devnode "*"
> }
>
>
>
> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer 
>  написа:
>
>
>
>
>
> On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise  wrote:
> >
> >
> > Agree about an NVMe Card being put under mpath control.
>
> NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
> https://bugzilla.redhat.com/1498546
>
> Of course when the NVMe device is local there is no point to use it
> via multipath.
> To avoid this, you need to blacklist the devices like this:
>
> 1. Find the device wwid
>
> For NVMe, you need the device ID_WWN:
>
> $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
> ID_WWN=eui.5cd2e42a81a11f69
>
> 2. Add local blacklist file:
>
> $ mkdir /etc/multipath/conf.d
> $ cat /etc/multipath/conf.d/local.conf
> blacklist {
> wwid "eui.5cd2e42a81a11f69"
> }
>
> 3. Reconfigure multipath
>
> $ multipathd reconfigure
>
> Gluster should do this for you automatically during installation, but
> it does not
> you can do this manually.
>
> > I have not even gotten to that volume / issue.  My guess is something weird 
> > in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block devices.
> >
> > I will post once I cross bridge of getting standard SSD volumes working
> >
> > On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov  
> > wrote:
> >>
> >> Why is your NVME under multipath ? That doesn't make sense at all .
> >> I have modified my multipath.conf to block all local disks . Also ,don't 
> >> forget the '# VDSM PRIVATE' line somewhere in the top of the file.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >>
> >>
> >>
> >>
> >>
> >> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise 
> >>  написа:
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> vdo: ERROR - Device /dev/sdc excluded by a filter
> >>
> >>
> >>
> >>
> >> Other server
> >> vdo: ERROR - Device 
> >> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
> >>  excluded by a filter.
> >>
> >>
> >> All systems when I go to create VDO volume on blank drives.. I get this 
> >> filter error.  All disk outside of the HCI wizard setup are now blocked 
> >> from creating new Gluster volume group.
> >>
> >> Here is what I see in /dev/lvm/lvm.conf |grep filter
> >> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
> >> filter = 
> >> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|",
> >>  
> >> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|", 
> >> "r|.*|"]
> >>
> >> [root@odin ~]# ls -al /dev/disk/by-id/
> >> total 0
> >> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 .
> >> drwxr-xr-x. 6 root root  120 Sep 18 14:32 ..
> >> lrwxrwxrwx. 1 root root9 Sep 18 22:40 
> >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda
> >> lrwxrwxrwx. 1 root root  10 Sep 18 22:40 
> >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1
> >> lrwxrwxrwx. 1 root root  10 Sep 18 22:40 
> >> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2
> >> lrwxrwxrwx. 1 root root9 Sep 18 14:32 
> >> ata-Micron_1100_MTFDDAV512TBN_17401F699137 -> ../../sdb
> >> lrwxrwxrwx. 1 root root9 Sep 18 22:40 
> >> ata-WDC_WDS100T2B0B-00YS70_183533804564 -> ../../sdc
> >> lrwxrwxrwx. 1 root root  10 

[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Jeremey Wise
Correct, on wwid.


I do want to make clear here that to get around the error you must ADD
(not remove) drives to the filter in /etc/lvm/lvm.conf, so the oVirt Gluster
wizard can complete setup of the drives.

[root@thor log]# cat /etc/lvm/lvm.conf |grep filter
# Broken for gluster in oVirt
#filter =
["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|",
"a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|",
"r|.*|"]
# working for gluster wizard in oVirt
filter =
["a|^/dev/disk/by-id/lvm-pv-uuid-AAHPao-R62q-8aac-410x-ZdA7-UL4i-Bh2bwJ$|",
"a|^/dev/disk/by-id/lvm-pv-uuid-bSnFU3-jtUj-AGds-07sw-zdYC-52fM-mujuvC$|",
"a|^/dev/disk/by-id/wwn-0x5001b448b847be41$|", "r|.*|"]



On Tue, Sep 22, 2020 at 3:57 PM Strahil Nikolov 
wrote:

> Obtaining the wwid is not exactly correct.
> You can identify them via:
>
> multipath -v4 | grep 'got wwid of'
>
> Short example:
> [root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
> Sep 22 22:55:58 | nvme0n1: got wwid of
> 'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
> Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
> Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
> Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
> Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'
>
> Of course if you are planing to use only gluster it could be far easier to
> set:
>
> [root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf
> blacklist {
> devnode "*"
> }
>
>
>
> Best Regards,
> Strahil Nikolov
>
> В вторник, 22 септември 2020 г., 22:12:21 Гринуич+3, Nir Soffer <
> nsof...@redhat.com> написа:
>
>
>
>
>
> On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise 
> wrote:
> >
> >
> > Agree about an NVMe Card being put under mpath control.
>
> NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
> https://bugzilla.redhat.com/1498546
>
> Of course when the NVMe device is local there is no point to use it
> via multipath.
> To avoid this, you need to blacklist the devices like this:
>
> 1. Find the device wwid
>
> For NVMe, you need the device ID_WWN:
>
> $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
> ID_WWN=eui.5cd2e42a81a11f69
>
> 2. Add local blacklist file:
>
> $ mkdir /etc/multipath/conf.d
> $ cat /etc/multipath/conf.d/local.conf
> blacklist {
> wwid "eui.5cd2e42a81a11f69"
> }
>
> 3. Reconfigure multipath
>
> $ multipathd reconfigure
>
> Gluster should do this for you automatically during installation, but
> it does not
> you can do this manually.
>
> > I have not even gotten to that volume / issue.  My guess is something
> weird in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block
> devices.
> >
> > I will post once I cross bridge of getting standard SSD volumes working
> >
> > On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov 
> wrote:
> >>
> >> Why is your NVME under multipath ? That doesn't make sense at all .
> >> I have modified my multipath.conf to block all local disks . Also
> ,don't forget the '# VDSM PRIVATE' line somewhere in the top of the file.
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >>
> >>
> >>
> >>
> >>
> >>
> >> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise <
> jeremey.w...@gmail.com> написа:
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> vdo: ERROR - Device /dev/sdc excluded by a filter
> >>
> >>
> >>
> >>
> >> Other server
> >> vdo: ERROR - Device
> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
> excluded by a filter.
> >>
> >>
> >> All systems when I go to create VDO volume on blank drives.. I get this
> filter error.  All disk outside of the HCI wizard setup are now blocked
> from creating new Gluster volume group.
> >>
> >> Here is what I see in /dev/lvm/lvm.conf |grep filter
> >> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
> >> filter =
> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|",
> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|",
> "r|.*|"]
> >>
> >> [root@odin ~]# ls -al /dev/disk/by-id/
> >> total 0
> >> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 .
> >> drwxr-xr-x. 6 root root  120 Sep 18 14:32 ..
> >> lrwxrwxrwx. 1 root root9 Sep 18 22:40
> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda
> >> lrwxrwxrwx. 1 root root  10 Sep 18 22:40
> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1
> >> lrwxrwxrwx. 1 root root  10 Sep 18 22:40
> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2
> >> lrwxrwxrwx. 1 root root9 Sep 18 14:32
> ata-Micron_1100_MTFDDAV512TBN_17401F699137 -> ../../sdb
> >> lrwxrwxrwx. 1 root root9 Sep 18 22:40
> ata-WDC_WDS100T2B0B-00YS70_183533804564 -> ../../sdc
> >> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 dm-name-cl-home -> ../../dm-2
> >> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 dm-name-cl-root -> ../../dm-0
> >> lrwxrwxrwx. 1 root root  10 

[ovirt-users] Re: console breaks with signed SSL certs

2020-09-22 Thread Philip Brown
Hmm.
That seems to be half the battle.
I updated the files in /etc/pki/vdsm/libvirt-spice, and the debug output from
remote-viewer changes, but it's not entirely happy.

(remote-viewer.exe:15808): Spice-WARNING **: 12:55:01.188: 
../subprojects/spice-common/common/ssl_verify.c:444:openssl_verify: Error in 
certificate chain verification: unable to get issuer certificate 
(num=2:depth1:/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, 
Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate 
Authority - G2)

(remote-viewer.exe:15808): GSpice-WARNING **: 12:55:01.189: main-1:0: 
SSL_connect: error:0001:lib(0):func(0):reason(1)
(remote-viewer.exe:15808): virt-viewer-DEBUG: 12:55:01.192: Destroy SPICE 
channel SpiceMainChannel 0
(remote-viewer.exe:15808): virt-viewer-DEBUG: 12:55:01.192: zap main channel


I put the cert itself in server-cert.pem.
I put the key in server-key.pem.
I put the bundle file from GoDaddy, which they call "gd_bundle-g2-g1", in
ca-cert.pem.

But it's still complaining about an error in the chain?
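
A sanity check of the chain on the host, assuming those are the files qemu actually
loads, would be something like:

    $ cd /etc/pki/vdsm/libvirt-spice
    $ openssl verify -CAfile ca-cert.pem server-cert.pem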

I've been updating a whole bunch of SSL-requiring systems this month, and notice
that one or two systems like a different order for the multiple-cert CA stack.
Does libvirt-spice require yet another, different stacking?

Can you tell me what needs to be in each file, and in what order, please? :-/


[ovirt-users] Re: console breaks with signed SSL certs

2020-09-22 Thread Strahil Nikolov via Users
Most probably there is an option to tell it (I mean oVirt) the exact keys to be
used.

Yet, give the engine a gentle push and reboot it - just to be sure you are not
chasing a ghost.

I'm using self-signed certs, so I can't help much in this case.


Best Regards,
Strahil Nikolov






On Tuesday, September 22, 2020 at 22:54:28 GMT+3, Philip Brown wrote:





Thanks for the initial start, Strahil,

my desktop is windows. but I took apart the console.vv file, and these are my 
findings:

in the console.vv file, there is a valid CA cert, which is for the signing CA 
for our valid wildcard SSL cert.

However, when I connected to the target host, on the tls-port, i noted that it 
is still using the original self-signed CA, generated by ovirt-engine for the 
host.
Digging with lsof says that the process is qemu-kvm
Looking at command line, that has
  x509-dir=/etc/pki/vdsm/libvirt-spice

So...


I guess I need to update server.key server.cert and ca-cert in there?

except there's a whoole lot of '*key.pem' files under  the /etc/pki directory 
tree.
Suggestions on which is best to update?
For example, there is also

/etc/pki/vdsm/keys/vdsmkey.pem




- Original Message -
From: "Strahil Nikolov" 
To: "users" , "Philip Brown" 
Sent: Tuesday, September 22, 2020 12:09:55 PM
Subject: Re: [ovirt-users] Re: console breaks with signed SSL certs

I assume you are working on Linux (for Windows you will need to ssh to a Linux
box or even one of the hosts).

When you download the 'console.vv' file for a Spice connection, you will have to
note several things:

- host
- tls-port (not the plain 'port=' !!! )
- ca

Process the CA and replace the '\n' with new lines .

Then you can run:
openssl s_client -connect : -CAfile  
-showcerts

Then you can inspect the certificate chain.
I would then grep for the strings from openssl in the engine.

In my case I find these containing the line with the 'issuer':

/etc/pki/ovirt-engine/certs/websocket-proxy.cer
/etc/pki/ovirt-engine/certs/apache.cer
/etc/pki/ovirt-engine/certs/reports.cer
/etc/pki/ovirt-engine/certs/imageio-proxy.cer
/etc/pki/ovirt-engine/certs/ovn-ndb.cer
/etc/pki/ovirt-engine/certs/ovn-sdb.cer
/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer


Happy Hunting!

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 21:52:10 Гринуич+3, Philip Brown 
 написа: 





More detail on the problem.
after starting remote-viewer  --debug, I get



(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: New spice channel 
0608B240 SpiceMainChannel 0
(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: notebook show 
status 03479130

(remote-viewer.exe:18308): Spice-WARNING **: 11:45:30.691: 
../subprojects/spice-common/common/ssl_verify.c:444:openssl_verify: Error in 
certificate chain verification: self signed certificate in certificate chain 
(num=19:depth1:/C=US/O=xx.65101)

(remote-viewer.exe:18308): GSpice-WARNING **: 11:45:30.692: main-1:0: 
SSL_connect: error:0001:lib(0):func(0):reason(1)
(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.693: Destroy SPICE 
channel SpiceMainChannel 0


So it seems like there's some additional thing that needs telling to use the 
official signed cert.
Any clues for me please?




[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Strahil Nikolov via Users
Obtaining the wwid is not exactly correct.
You can identify them via:

multipath -v4 | grep 'got wwid of'

Short example: 
[root@ovirt conf.d]# multipath -v4 | grep 'got wwid of'
Sep 22 22:55:58 | nvme0n1: got wwid of 
'nvme.1cc1-324a31313230303131343036-414441544120535838323030504e50-0001'
Sep 22 22:55:58 | sda: got wwid of 'TOSHIBA-TR200_Z7KB600SK46S'
Sep 22 22:55:58 | sdb: got wwid of 'ST500NM0011_Z1M00LM7'
Sep 22 22:55:58 | sdc: got wwid of 'WDC_WD5003ABYX-01WERA0_WD-WMAYP2303189'
Sep 22 22:55:58 | sdd: got wwid of 'WDC_WD15EADS-00P8B0_WD-WMAVU0115133'

Of course if you are planning to use only gluster it could be far easier to set:

[root@ovirt conf.d]# cat /etc/multipath/conf.d/blacklist.conf 
blacklist {
        devnode "*"
}
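
After changing the blacklist, reload the multipath configuration (same command as
in Nir's steps below):

    # multipathd reconfigure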



Best Regards,
Strahil Nikolov

On Tuesday, September 22, 2020 at 22:12:21 GMT+3, Nir Soffer wrote:





On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise  wrote:
>
>
> Agree about an NVMe Card being put under mpath control.

NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
https://bugzilla.redhat.com/1498546

Of course when the NVMe device is local there is no point to use it
via multipath.
To avoid this, you need to blacklist the devices like this:

1. Find the device wwid

For NVMe, you need the device ID_WWN:

    $ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
    ID_WWN=eui.5cd2e42a81a11f69

2. Add local blacklist file:

    $ mkdir /etc/multipath/conf.d
    $ cat /etc/multipath/conf.d/local.conf
    blacklist {
        wwid "eui.5cd2e42a81a11f69"
    }

3. Reconfigure multipath

    $ multipathd reconfigure

Gluster should do this for you automatically during installation, but
it does not
you can do this manually.

> I have not even gotten to that volume / issue.  My guess is something weird 
> in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block devices.
>
> I will post once I cross bridge of getting standard SSD volumes working
>
> On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov  wrote:
>>
>> Why is your NVME under multipath ? That doesn't make sense at all .
>> I have modified my multipath.conf to block all local disks . Also ,don't 
>> forget the '# VDSM PRIVATE' line somewhere in the top of the file.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>>
>>
>>
>>
>> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise 
>>  написа:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> vdo: ERROR - Device /dev/sdc excluded by a filter
>>
>>
>>
>>
>> Other server
>> vdo: ERROR - Device 
>> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
>>  excluded by a filter.
>>
>>
>> All systems when I go to create VDO volume on blank drives.. I get this 
>> filter error.  All disk outside of the HCI wizard setup are now blocked from 
>> creating new Gluster volume group.
>>
>> Here is what I see in /dev/lvm/lvm.conf |grep filter
>> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
>> filter = 
>> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|", 
>> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|", 
>> "r|.*|"]
>>
>> [root@odin ~]# ls -al /dev/disk/by-id/
>> total 0
>> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 .
>> drwxr-xr-x. 6 root root  120 Sep 18 14:32 ..
>> lrwxrwxrwx. 1 root root    9 Sep 18 22:40 
>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda
>> lrwxrwxrwx. 1 root root  10 Sep 18 22:40 
>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1
>> lrwxrwxrwx. 1 root root  10 Sep 18 22:40 
>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2
>> lrwxrwxrwx. 1 root root    9 Sep 18 14:32 
>> ata-Micron_1100_MTFDDAV512TBN_17401F699137 -> ../../sdb
>> lrwxrwxrwx. 1 root root    9 Sep 18 22:40 
>> ata-WDC_WDS100T2B0B-00YS70_183533804564 -> ../../sdc
>> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 dm-name-cl-home -> ../../dm-2
>> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 dm-name-cl-root -> ../../dm-0
>> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 dm-name-cl-swap -> ../../dm-1
>> lrwxrwxrwx. 1 root root  11 Sep 18 16:40 
>> dm-name-gluster_vg_sdb-gluster_lv_data -> ../../dm-11
>> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 
>> dm-name-gluster_vg_sdb-gluster_lv_engine -> ../../dm-6
>> lrwxrwxrwx. 1 root root  11 Sep 18 16:40 
>> dm-name-gluster_vg_sdb-gluster_lv_vmstore -> ../../dm-12
>> lrwxrwxrwx. 1 root root  10 Sep 18 23:35 
>> dm-name-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001
>>  -> ../../dm-3
>> lrwxrwxrwx. 1 root root  10 Sep 18 23:49 
>> dm-name-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
>>  -> ../../dm-4
>> lrwxrwxrwx. 1 root root  10 Sep 18 14:32 dm-name-vdo_sdb -> ../../dm-5
>> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 
>> dm-uuid-LVM-GpvYIuypEfrR7nEDn5uHPenKwjrsn4ADc49gc6PWLRBCoJ2B3JC9tDJejyx5eDPT 
>> -> ../../dm-1
>> lrwxrwxrwx. 1 root root  10 Sep 18 16:40 
>> 

[ovirt-users] Re: console breaks with signed SSL certs

2020-09-22 Thread Philip Brown
Thanks for the initial start, Strahil,

My desktop is Windows, but I took apart the console.vv file, and these are my
findings:

in the console.vv file, there is a valid CA cert, which is for the signing CA 
for our valid wildcard SSL cert.

However, when I connected to the target host on the tls-port, I noted that it
is still using the original self-signed CA generated by ovirt-engine for the
host.
Digging with lsof says that the process is qemu-kvm.
Looking at its command line, it has
  x509-dir=/etc/pki/vdsm/libvirt-spice

So...


I guess I need to update the server key, server cert, and ca-cert in there?

Except there's a whole lot of '*key.pem' files under the /etc/pki directory
tree.
Suggestions on which is best to update?
For example, there is also

/etc/pki/vdsm/keys/vdsmkey.pem




- Original Message -
From: "Strahil Nikolov" 
To: "users" , "Philip Brown" 
Sent: Tuesday, September 22, 2020 12:09:55 PM
Subject: Re: [ovirt-users] Re: console breaks with signed SSL certs

I assume you are working on Linux (for Windows you will need to ssh to a Linux
box or even one of the hosts).

When you download the 'console.vv' file for a Spice connection, you will have to
note several things:

- host
- tls-port (not the plain 'port=' !!! )
- ca

Process the CA and replace the '\n' with new lines .

Then you can run:
openssl s_client -connect : -CAfile  
-showcerts

Then you can inspect the certificate chain.
I would then grep for the strings from openssl in the engine.

In my case I find these containing the line with the 'issuer':

/etc/pki/ovirt-engine/certs/websocket-proxy.cer
/etc/pki/ovirt-engine/certs/apache.cer
/etc/pki/ovirt-engine/certs/reports.cer
/etc/pki/ovirt-engine/certs/imageio-proxy.cer
/etc/pki/ovirt-engine/certs/ovn-ndb.cer
/etc/pki/ovirt-engine/certs/ovn-sdb.cer
/etc/pki/ovirt-engine/certs/ovirt-provider-ovn.cer


Happy Hunting!

Best Regards,
Strahil Nikolov






On Tuesday, September 22, 2020 at 21:52:10 GMT+3, Philip Brown wrote:





More detail on the problem.
after starting remote-viewer  --debug, I get



(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: New spice channel 
0608B240 SpiceMainChannel 0
(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: notebook show 
status 03479130

(remote-viewer.exe:18308): Spice-WARNING **: 11:45:30.691: 
../subprojects/spice-common/common/ssl_verify.c:444:openssl_verify: Error in 
certificate chain verification: self signed certificate in certificate chain 
(num=19:depth1:/C=US/O=xx.65101)

(remote-viewer.exe:18308): GSpice-WARNING **: 11:45:30.692: main-1:0: 
SSL_connect: error:0001:lib(0):func(0):reason(1)
(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.693: Destroy SPICE 
channel SpiceMainChannel 0


So it seems like there's some additional thing that needs telling to use the 
official signed cert.
Any clues for me please?




[ovirt-users] Re: oVirt - vdo: ERROR - Device /dev/sd excluded by a filter

2020-09-22 Thread Nir Soffer
On Tue, Sep 22, 2020 at 1:50 AM Jeremey Wise  wrote:
>
>
> Agree about an NVMe Card being put under mpath control.

NVMe can be used via multipath, this is a new feature added in RHEL 8.1:
https://bugzilla.redhat.com/1498546

Of course when the NVMe device is local there is no point in using it
via multipath.
To avoid this, you need to blacklist the devices like this:

1. Find the device wwid

For NVMe, you need the device ID_WWN:

$ udevadm info -q property /dev/nvme0n1 | grep ID_WWN
ID_WWN=eui.5cd2e42a81a11f69

2. Add local blacklist file:

$ mkdir /etc/multipath/conf.d
$ cat /etc/multipath/conf.d/local.conf
blacklist {
wwid "eui.5cd2e42a81a11f69"
}

3. Reconfigure multipath

$ multipathd reconfigure

Gluster should do this for you automatically during installation, but since it
does not, you can do this manually.

> I have not even gotten to that volume / issue.   My guess is something weird 
> in CentOS / 4.18.0-193.19.1.el8_2.x86_64  kernel with NVMe block devices.
>
> I will post once I cross bridge of getting standard SSD volumes working
>
> On Mon, Sep 21, 2020 at 4:12 PM Strahil Nikolov  wrote:
>>
>> Why is your NVME under multipath ? That doesn't make sense at all .
>> I have modified my multipath.conf to block all local disks . Also ,don't 
>> forget the '# VDSM PRIVATE' line somewhere in the top of the file.
>>
>> Best Regards,
>> Strahil Nikolov
>>
>>
>>
>>
>>
>>
>> В понеделник, 21 септември 2020 г., 09:04:28 Гринуич+3, Jeremey Wise 
>>  написа:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> vdo: ERROR - Device /dev/sdc excluded by a filter
>>
>>
>>
>>
>> Other server
>> vdo: ERROR - Device 
>> /dev/mapper/nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
>>  excluded by a filter.
>>
>>
>> All systems when I go to create VDO volume on blank drives.. I get this 
>> filter error.  All disk outside of the HCI wizard setup are now blocked from 
>> creating new Gluster volume group.
>>
>> Here is what I see in /dev/lvm/lvm.conf |grep filter
>> [root@odin ~]# cat /etc/lvm/lvm.conf |grep filter
>> filter = 
>> ["a|^/dev/disk/by-id/lvm-pv-uuid-e1fvwo-kEfX-v3lT-SKBp-cgze-TwsO-PtyvmC$|", 
>> "a|^/dev/disk/by-id/lvm-pv-uuid-mr9awW-oQH5-F4IX-CbEO-RgJZ-x4jK-e4YZS1$|", 
>> "r|.*|"]
>>
>> [root@odin ~]# ls -al /dev/disk/by-id/
>> total 0
>> drwxr-xr-x. 2 root root 1220 Sep 18 14:32 .
>> drwxr-xr-x. 6 root root  120 Sep 18 14:32 ..
>> lrwxrwxrwx. 1 root root9 Sep 18 22:40 
>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN -> ../../sda
>> lrwxrwxrwx. 1 root root   10 Sep 18 22:40 
>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part1 -> ../../sda1
>> lrwxrwxrwx. 1 root root   10 Sep 18 22:40 
>> ata-INTEL_SSDSC2BB080G4_BTWL40350DXP080KGN-part2 -> ../../sda2
>> lrwxrwxrwx. 1 root root9 Sep 18 14:32 
>> ata-Micron_1100_MTFDDAV512TBN_17401F699137 -> ../../sdb
>> lrwxrwxrwx. 1 root root9 Sep 18 22:40 
>> ata-WDC_WDS100T2B0B-00YS70_183533804564 -> ../../sdc
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 dm-name-cl-home -> ../../dm-2
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 dm-name-cl-root -> ../../dm-0
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 dm-name-cl-swap -> ../../dm-1
>> lrwxrwxrwx. 1 root root   11 Sep 18 16:40 
>> dm-name-gluster_vg_sdb-gluster_lv_data -> ../../dm-11
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 
>> dm-name-gluster_vg_sdb-gluster_lv_engine -> ../../dm-6
>> lrwxrwxrwx. 1 root root   11 Sep 18 16:40 
>> dm-name-gluster_vg_sdb-gluster_lv_vmstore -> ../../dm-12
>> lrwxrwxrwx. 1 root root   10 Sep 18 23:35 
>> dm-name-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001
>>  -> ../../dm-3
>> lrwxrwxrwx. 1 root root   10 Sep 18 23:49 
>> dm-name-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001p1
>>  -> ../../dm-4
>> lrwxrwxrwx. 1 root root   10 Sep 18 14:32 dm-name-vdo_sdb -> ../../dm-5
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 
>> dm-uuid-LVM-GpvYIuypEfrR7nEDn5uHPenKwjrsn4ADc49gc6PWLRBCoJ2B3JC9tDJejyx5eDPT 
>> -> ../../dm-1
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 
>> dm-uuid-LVM-GpvYIuypEfrR7nEDn5uHPenKwjrsn4ADOMNJfgcat9ZLOpcNO7FyG8ixcl5s93TU 
>> -> ../../dm-2
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 
>> dm-uuid-LVM-GpvYIuypEfrR7nEDn5uHPenKwjrsn4ADzqPGk0yTQ19FIqgoAfsCxWg7cDMtl71r 
>> -> ../../dm-0
>> lrwxrwxrwx. 1 root root   10 Sep 18 16:40 
>> dm-uuid-LVM-ikNfztYY7KGT1SI2WYXPz4DhM2cyTelOq6Om5comvRFWJDbtVZAKtE5YGl4jciP9 
>> -> ../../dm-6
>> lrwxrwxrwx. 1 root root   11 Sep 18 16:40 
>> dm-uuid-LVM-ikNfztYY7KGT1SI2WYXPz4DhM2cyTelOqVheASEgerWSEIkjM1BR3us3D9ekHt0L 
>> -> ../../dm-11
>> lrwxrwxrwx. 1 root root   11 Sep 18 16:40 
>> dm-uuid-LVM-ikNfztYY7KGT1SI2WYXPz4DhM2cyTelOQz6vXuivIfup6cquKAjPof8wIGOSe4Vz 
>> -> ../../dm-12
>> lrwxrwxrwx. 1 root root   10 Sep 18 23:35 
>> dm-uuid-mpath-nvme.126f-4141303030303030303030303030303032343538-53504343204d2e32205043496520535344-0001
>>  -> ../../dm-3
>> lrwxrwxrwx. 1 root root   10 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
oVirt uses the "/rhev/data-center/mnt/..." mountpoints.

Do you have those (one for each storage domain)?

Here is an example from one of my nodes:
[root@ovirt1 ~]# df -hT | grep rhev
gluster1:/engine  fuse.glusterfs  100G   19G   82G  19%  /rhev/data-center/mnt/glusterSD/gluster1:_engine
gluster1:/fast4   fuse.glusterfs  100G   53G   48G  53%  /rhev/data-center/mnt/glusterSD/gluster1:_fast4
gluster1:/fast1   fuse.glusterfs  100G   56G   45G  56%  /rhev/data-center/mnt/glusterSD/gluster1:_fast1
gluster1:/fast2   fuse.glusterfs  100G   56G   45G  56%  /rhev/data-center/mnt/glusterSD/gluster1:_fast2
gluster1:/fast3   fuse.glusterfs  100G   55G   46G  55%  /rhev/data-center/mnt/glusterSD/gluster1:_fast3
gluster1:/data    fuse.glusterfs  2.4T  535G  1.9T  23%  /rhev/data-center/mnt/glusterSD/gluster1:_data



Best Regards,
Strahil Nikolov


On Tuesday, September 22, 2020 at 19:44:54 GMT+3, Jeremey Wise wrote:






Yes.

And at one time it was fine.   I did a graceful shutdown.. and after booting it 
always seems to now have issue with the one server... of course the one hosting 
the ovirt-engine :P

# Three nodes in cluster

# Error when you hover over node


# when i select node and choose "activate"



#Gluster is working fine... this is oVirt who is confused.
[root@medusa vmstore]# mount |grep media/vmstore
medusast.penguinpages.local:/vmstore on /media/vmstore type fuse.glusterfs 
(rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072,_netdev)
[root@medusa vmstore]# echo > /media/vmstore/test.out
[root@medusa vmstore]# ssh -f thor 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f odin 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# ssh -f medusa 'echo $HOSTNAME >> /media/vmstore/test.out'
[root@medusa vmstore]# cat /media/vmstore/test.out

thor.penguinpages.local
odin.penguinpages.local
medusa.penguinpages.local


Ideas to fix oVirt?



On Tue, Sep 22, 2020 at 10:42 AM Strahil Nikolov  wrote:
> By the way, did you add the third host in the oVirt ?
> 
> If not , maybe that is the real problem :)
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В вторник, 22 септември 2020 г., 17:23:28 Гринуич+3, Jeremey Wise 
>  написа: 
> 
> 
> 
> 
> 
> Its like oVirt thinks there are only two nodes in gluster replication
> 
> 
> 
> 
> 
> # Yet it is clear the CLI shows three bricks.
> [root@medusa vms]# gluster volume status vmstore
> Status of volume: vmstore
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/vmstore/vmstore                        49154     0          Y       9444
> Brick odinst.penguinpages.local:/gluster_br
> icks/vmstore/vmstore                        49154     0          Y       3269
> Brick medusast.penguinpages.local:/gluster_
> bricks/vmstore/vmstore                      49154     0          Y       7841
> Self-heal Daemon on localhost               N/A       N/A        Y       80152
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       
> 141750
> Self-heal Daemon on thorst.penguinpages.loc
> al                                          N/A       N/A        Y       
> 245870
> 
> Task Status of Volume vmstore
> --
> There are no active volume tasks
> 
> 
> 
> How do I get oVirt to re-establish reality to what Gluster sees?
> 
> 
> 
> On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov  wrote:
>> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 
>> bricks up , but usually it was an UI issue and you go to UI and mark a 
>> "force start" which will try to start any bricks that were down (won't 
>> affect gluster) and will wake up the UI task to verify again brick status.
>> 
>> 
>> https://github.com/gluster/gstatus is a good one to verify your cluster 
>> health , yet human's touch is priceless in any kind of technology.
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> 
>> 
>> В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise 
>>  написа: 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> when I posted last..  in the tread I paste a roling restart.    And...  now 
>> it is replicating.
>> 
>> oVirt still showing wrong.  BUT..   I did my normal test from each of the 
>> three nodes.
>> 
>> 1) Mount Gluster file system with localhost as primary and other two as 
>> tertiary to local mount (like a client would do)
>> 2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
>> 3) repeat from each node then read back that all are in sync.
>> 
>> I REALLY hate reboot (restart) as a fix.  I need to get better with root 
>> 

[ovirt-users] Re: console breaks with signed SSL certs

2020-09-22 Thread Philip Brown
More detail on the problem.
after starting remote-viewer  --debug, I get



(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: New spice channel 
0608B240 SpiceMainChannel 0
(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.594: notebook show 
status 03479130

(remote-viewer.exe:18308): Spice-WARNING **: 11:45:30.691: 
../subprojects/spice-common/common/ssl_verify.c:444:openssl_verify: Error in 
certificate chain verification: self signed certificate in certificate chain 
(num=19:depth1:/C=US/O=xx.65101)

(remote-viewer.exe:18308): GSpice-WARNING **: 11:45:30.692: main-1:0: 
SSL_connect: error:0001:lib(0):func(0):reason(1)
(remote-viewer.exe:18308): virt-viewer-DEBUG: 11:45:30.693: Destroy SPICE 
channel SpiceMainChannel 0


So it seems like there's some additional component that needs to be told to use
the officially signed cert.
Any clues for me, please?



[ovirt-users] Re: oVirt - KVM QCow2 Import

2020-09-22 Thread Nir Soffer
On Tue, Sep 22, 2020 at 4:18 AM Jeremey Wise  wrote:
>
>
> Well.. to know how to do it with Curl is helpful.. but I think I did
>
> [root@odin ~]#  curl -s -k --user admin@internal:blahblah 
> https://ovirte01.penguinpages.local/ovirt-engine/api/storagedomains/ |grep 
> ''
> data
> hosted_storage
> ovirt-image-repository
>
> What I guess I did is translated that field --sd-name my-storage-domain \
> to " volume" name... My question is .. where do those fields come from?  And 
> which would you typically place all your VMs into?
>
>
>
>
> I just took a guess..  and figured "data" sounded like a good place to stick 
> raw images to build into VM...
>
> [root@medusa thorst.penguinpages.local:_vmstore]# python3 
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py --engine-url 
> https://ovirte01.penguinpages.local/ --username admin@internal 
> --password-file 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirt.password
>  --cafile 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirte01_pki-resource.cer
>  --sd-name data --disk-sparse 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/ns02.qcow2
> Checking image...
> Image format: qcow2
> Disk format: cow
> Disk content type: data
> Disk provisioned size: 21474836480
> Disk initial size: 11574706176
> Disk name: ns02.qcow2
> Disk backup: False
> Connecting...
> Creating disk...
> Disk ID: 9ccb26cf-dd4a-4c9a-830c-ee084074d7a1
> Creating image transfer...
> Transfer ID: 3a382f0b-1e7d-4397-ab16-4def0e9fe890
> Transfer host name: medusa
> Uploading image...
> [ 100.00% ] 20.00 GiB, 249.86 seconds, 81.97 MiB/s
> Finalizing image transfer...
> Upload completed successfully
> [root@medusa thorst.penguinpages.local:_vmstore]# python3 
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py --engine-url 
> https://ovirte01.penguinpages.local/ --username admin@internal 
> --password-file 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirt.password
>  --cafile 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirte01_pki-resource.cer
>  --sd-name data --disk-sparse 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/ns02_v^C
> [root@medusa thorst.penguinpages.local:_vmstore]# ls
> example.log  f118dcae-6162-4e9a-89e4-f30ffcfb9ccf  ns02_20200910.tgz  
> ns02.qcow2  ns02_var.qcow2
> [root@medusa thorst.penguinpages.local:_vmstore]# python3 
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py --engine-url 
> https://ovirte01.penguinpages.local/ --username admin@internal 
> --password-file 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirt.password
>  --cafile 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/.ovirte01_pki-resource.cer
>  --sd-name data --disk-sparse 
> /rhev/data-center/mnt/glusterSD/thorst.penguinpages.local:_vmstore/ns02_var.qcow2
> Checking image...
> Image format: qcow2
> Disk format: cow
> Disk content type: data
> Disk provisioned size: 107374182400
> Disk initial size: 107390828544
> Disk name: ns02_var.qcow2
> Disk backup: False
> Connecting...
> Creating disk...
> Disk ID: 26def4e7-1153-417c-88c1-fd3dfe2b0fb9
> Creating image transfer...
> Transfer ID: 41518eac-8881-453e-acc0-45391fd23bc7
> Transfer host name: medusa
> Uploading image...
> [  16.50% ] 16.50 GiB, 556.42 seconds, 30.37 MiB/s
>
> Now with those ID numbers, and since it kept its name (very helpful)... I am 
> able to reconstitute the VM.
>
>
> VM boots fine.  Fixing VLANs and manual MACs on vNICs... but this process 
> worked fine.
>
> Thanks for the input.  It would be nice to have a GUI "upload" via HTTP into the system 
> :)

We have upload via the GUI, but from your mail I understood the images are on
the hypervisor, so copying them to the machine running the browser would be
a waste of time.

Go to Storage > Disks and click "Upload" or "Download".

But this is less efficient, less correct, and does not support all the
features, like converting the image format and controlling sparseness.

For uploading and downloading qcow2 images it should be fine, but if you have
a qcow2 image and want to upload it in raw format, that can be done only using
the API, for example with upload_disk.py.
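
For example, with a recent enough SDK the same example script can convert on the
fly; a hedged sketch (the --disk-format flag name and the paths below are
assumptions - check upload_disk.py --help for the exact options in your SDK
version):

  python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py \
      --engine-url https://engine.example.local/ \
      --username admin@internal \
      --password-file /path/to/.ovirt.password \
      --cafile /path/to/ca.pem \
      --sd-name data \
      --disk-format raw \
      --disk-sparse \
      /path/to/ns02.qcow2

This would upload the qcow2 source as a raw sparse disk on the "data" storage
domain instead of keeping the qcow2 format.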

> On Mon, Sep 21, 2020 at 2:19 PM Nir Soffer  wrote:
>>
>> On Mon, Sep 21, 2020 at 8:37 PM penguin pages  wrote:
>> >
>> >
>> > I pasted an old / incorrect file path in the example above... But here is a cleaner 
>> > version, with the error I am trying to root cause
>> >
>> > [root@odin vmstore]# python3 
>> > /usr/share/doc/python3-ovirt-engine-sdk4/examples/upload_disk.py 
>> > --engine-url https://ovirte01.penguinpages.local/ --username 
>> > admin@internal --password-file 
>> > /gluster_bricks/vmstore/vmstore/.ovirt.password --cafile 
>> > /gluster_bricks/vmstore/vmstore/.ovirte01_pki-resource.cer --sd-name 
>> > vmstore --disk-sparse /gluster_bricks/vmstore/vmstore/ns01.qcow2
>> > Checking image...
>> 

[ovirt-users] console breaks with signed SSL certs

2020-09-22 Thread Philip Brown
Chrome didn't want to talk AT ALL to oVirt with self-signed certs (because HSTS 
is enabled).

So I installed signed wildcard certs on the engine and the nodes, following

http://187.1.81.65/ovirt-engine/docs/manual/en-US/html/Administration_Guide/appe-Red_Hat_Enterprise_Virtualization_and_SSL.html
and
https://cockpit-project.org/guide/172/https.html

and Chrome is happy now... except that suddenly, consoles refuse to work, and 
there are no useful errors that I can see, other than

"Unable to connect to the graphic server"

from the remote viewer app.

I see someone not too long ago had the exact same problem, in
https://www.mail-archive.com/users@ovirt.org/msg58814.html

but.. no answer was given to him?

Help please



--
Philip Brown| Sr. Linux System Administrator | Medata, Inc. 
5 Peters Canyon Rd Suite 250 
Irvine CA 92606 
Office 714.918.1310| Fax 714.918.1325 
pbr...@medata.com| www.medata.com
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KNJGW2Z6XPK4CD5LSEPB3ILXQ5KLPQ6B/


[ovirt-users] Re: Upgrade Ovirt from 4.2 to 4.4 on CentOS7.4

2020-09-22 Thread Strahil Nikolov via Users
oVirt 4.4 requires EL8.2, so no, you cannot go to 4.4 without upgrading the OS 
to EL8.

Yet, you can still bump the version to 4.3.10, which is still EL7-based and 
works quite well.
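
For reference, the usual 4.2 -> 4.3 flow on the engine machine looks roughly like
this (a sketch of the documented procedure - take a backup first and follow the
4.3 upgrade guide for the details):

  engine-backup --scope=all --mode=backup --file=engine-42.bck --log=engine-42.log
  yum install https://resources.ovirt.org/pub/yum-repo/ovirt-release43.rpm
  yum update ovirt\*setup\*
  engine-setup
  yum update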

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 17:39:52 Гринуич+3, 
 написа: 





Hi everyone,
I am writing for support regarding an oVirt upgrade.
I am using oVirt version 4.2 on the CentOS 7.4 operating system.
The latest release of the oVirt engine is 4.4, which is available for CentOS 8. 
Can I upgrade without upgrading the operating system to CentOS 8? 
If I am not wrong, it is not possible to switch from CentOS 7 to CentOS 8. 
Can anyone give me some advice? Thank you all!!!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IWFDBQVPDIX5JHZVIELIU7VIAOSRVROX/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KJYX6PDK6K2ZZROVACDHMSSRZ5PBRWUS/


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
By the way, did you add the third host to oVirt?

If not, maybe that is the real problem :)


Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 17:23:28 Гринуич+3, Jeremey Wise 
 написа: 





It's like oVirt thinks there are only two nodes in the gluster replication





# Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_
bricks/vmstore/vmstore                      49154     0          Y       7841
Self-heal Daemon on localhost               N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       245870

Task Status of Volume vmstore
--
There are no active volume tasks



How do I get oVirt to re-establish reality to what Gluster sees?



On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov  wrote:
> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 
> bricks up , but usually it was an UI issue and you go to UI and mark a "force 
> start" which will try to start any bricks that were down (won't affect 
> gluster) and will wake up the UI task to verify again brick status.
> 
> 
> https://github.com/gluster/gstatus is a good one to verify your cluster 
> health , yet human's touch is priceless in any kind of technology.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise 
>  написа: 
> 
> 
> 
> 
> 
> 
> 
> when I posted last..  in the tread I paste a roling restart.    And...  now 
> it is replicating.
> 
> oVirt still showing wrong.  BUT..   I did my normal test from each of the 
> three nodes.
> 
> 1) Mount Gluster file system with localhost as primary and other two as 
> tertiary to local mount (like a client would do)
> 2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
> 3) repeat from each node then read back that all are in sync.
> 
> I REALLY hate reboot (restart) as a fix.  I need to get better with root 
> cause of gluster issues if I am going to trust it.  Before when I manually 
> made the volumes and it was simply (vdo + gluster) then worst case was that 
> gluster would break... but I could always go into "brick" path and copy data 
> out.
> 
> Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
> simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
> environment  and data is lost.  This means nodes moved more to "pets" then 
> cattle.
> 
> And with three nodes.. I can't afford to loose any pets. 
> 
> I will post more when I get cluster settled and work on those wierd notes 
> about quorum volumes noted on two nodes when glusterd is restarted.
> 
> Thanks,
> 
> On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
>> Replication issue could mean that one of the client (FUSE mounts) is not 
>> attached to all bricks.
>> 
>> You can check the amount of clients via:
>> gluster volume status all client-list
>> 
>> 
>> As a prevention , just do a rolling restart:
>> - set a host in maintenance and mark it to stop glusterd service (I'm 
>> reffering to the UI)
>> - Activate the host , once it was moved to maintenance
>> 
>> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
>> proceed with the next one.
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise 
>>  написа: 
>> 
>> 
>> 
>> 
>> 
>> 
>> I did.
>> 
>> Here are all three nodes with restart. I find it odd ... their has been a 
>> set of messages at end (see below) which I don't know enough about what 
>> oVirt laid out to know if it is bad.
>> 
>> ###
>> [root@thor vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
>> preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>>      Docs: man:glusterd(8)
>>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
>> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>>  Main PID: 2113 (glusterd)
>>     Tasks: 151 (limit: 1235410)
>>    Memory: 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
That's really weird.
I would give the engine a 'Windows'-style fix (a.k.a. reboot).

I guess some of the engine's internal processes crashed/looped and it doesn't 
see the reality.

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 16:27:25 Гринуич+3, Jeremey Wise 
 написа: 





It's like oVirt thinks there are only two nodes in the gluster replication





# Yet it is clear the CLI shows three bricks.
[root@medusa vms]# gluster volume status vmstore
Status of volume: vmstore
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       9444
Brick odinst.penguinpages.local:/gluster_br
icks/vmstore/vmstore                        49154     0          Y       3269
Brick medusast.penguinpages.local:/gluster_
bricks/vmstore/vmstore                      49154     0          Y       7841
Self-heal Daemon on localhost               N/A       N/A        Y       80152
Self-heal Daemon on odinst.penguinpages.loc
al                                          N/A       N/A        Y       141750
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       245870

Task Status of Volume vmstore
--
There are no active volume tasks



How do I get oVirt to re-establish reality to what Gluster sees?



On Tue, Sep 22, 2020 at 8:59 AM Strahil Nikolov  wrote:
> Also in some rare cases, I have seen oVirt showing gluster as 2 out of 3 
> bricks up , but usually it was an UI issue and you go to UI and mark a "force 
> start" which will try to start any bricks that were down (won't affect 
> gluster) and will wake up the UI task to verify again brick status.
> 
> 
> https://github.com/gluster/gstatus is a good one to verify your cluster 
> health , yet human's touch is priceless in any kind of technology.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise 
>  написа: 
> 
> 
> 
> 
> 
> 
> 
> when I posted last..  in the tread I paste a roling restart.    And...  now 
> it is replicating.
> 
> oVirt still showing wrong.  BUT..   I did my normal test from each of the 
> three nodes.
> 
> 1) Mount Gluster file system with localhost as primary and other two as 
> tertiary to local mount (like a client would do)
> 2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
> 3) repeat from each node then read back that all are in sync.
> 
> I REALLY hate reboot (restart) as a fix.  I need to get better with root 
> cause of gluster issues if I am going to trust it.  Before when I manually 
> made the volumes and it was simply (vdo + gluster) then worst case was that 
> gluster would break... but I could always go into "brick" path and copy data 
> out.
> 
> Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
> simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
> environment  and data is lost.  This means nodes moved more to "pets" then 
> cattle.
> 
> And with three nodes.. I can't afford to loose any pets. 
> 
> I will post more when I get cluster settled and work on those wierd notes 
> about quorum volumes noted on two nodes when glusterd is restarted.
> 
> Thanks,
> 
> On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
>> Replication issue could mean that one of the client (FUSE mounts) is not 
>> attached to all bricks.
>> 
>> You can check the amount of clients via:
>> gluster volume status all client-list
>> 
>> 
>> As a prevention , just do a rolling restart:
>> - set a host in maintenance and mark it to stop glusterd service (I'm 
>> reffering to the UI)
>> - Activate the host , once it was moved to maintenance
>> 
>> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
>> proceed with the next one.
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise 
>>  написа: 
>> 
>> 
>> 
>> 
>> 
>> 
>> I did.
>> 
>> Here are all three nodes with restart. I find it odd ... their has been a 
>> set of messages at end (see below) which I don't know enough about what 
>> oVirt laid out to know if it is bad.
>> 
>> ###
>> [root@thor vmstore]# systemctl status glusterd
>> ● glusterd.service - GlusterFS, a clustered file-system server
>>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
>> preset: disabled)
>>   Drop-In: /etc/systemd/system/glusterd.service.d
>>            └─99-cpu.conf
>>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>>      Docs: man:glusterd(8)
>>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
>> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, 

[ovirt-users] Re: VM stuck in "reboot in progress" ("virtual machine XXX should be running in a host but it isn't.").

2020-09-22 Thread Gilboa Davara
Arik / Strahil,

Many thanks!

Just in case anyone else is hitting the same issue (*NOTE* the host and VM
IDs _will_ be different!):
0. Ran a backup:
1. Connect to the hosted-engine and DB:
$ ssh root@vmengine
$ su - postgres
$ psql engine
2. Execute a select query to verify that the VM's run_on_vds is NULL:
# select * from vm_dynamic where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5';
3. Execute Arik's update query:
# update vm_dynamic set
run_on_vds='82f92946-9130-4dbd-8663-1ac0b50668a1' where
vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5';
4. Re-started the engine:
$ systemctl restart ovirt-engine
5. Everything seems fine now. Profit!
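
For step 0 above, the backup was taken with engine-backup; a typical invocation
(file names here are just examples) is:

  engine-backup --scope=all --mode=backup \
      --file=/root/engine-backup.bck --log=/root/engine-backup.log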

Thanks again,
Gilboa

On Mon, Sep 21, 2020 at 4:28 PM Arik Hadas  wrote:
>
>
>
> On Sun, Sep 20, 2020 at 11:21 AM Gilboa Davara  wrote:
>>
>> On Sat, Sep 19, 2020 at 7:44 PM Arik Hadas  wrote:
>> >
>> >
>> >
>> > On Fri, Sep 18, 2020 at 8:27 AM Gilboa Davara  wrote:
>> >>
>> >> Hello all (and happy new year),
>> >>
>> >> (Note: Also reported as 
>> >> https://bugzilla.redhat.com/show_bug.cgi?id=1880251)
>> >>
>> >> Self hosted engine, single node, NFS.
>> >> Attempted to install CentOS over an existing Fedora VM with one host
>> >> device (USB printer).
>> >> Reboot failed, trying to boot from a non-existent CDROM.
>> >> Tried shutting the VM down, failed.
>> >> Tried powering off the VM, failed.
>> >> Dropped cluster to global maintenance, reboot host + engine (was
>> >> planning to upgrade it anyhow...), VM still stuck.
>> >>
>> >> When trying to power off the VM, the following message can be found
>> >> the in engine.log:
>> >> 2020-09-18 07:58:51,439+03 INFO
>> >> [org.ovirt.engine.core.bll.StopVmCommand]
>> >> (EE-ManagedThreadFactory-engine-Thread-42)
>> >> [7bc4ac71-f0b2-4af7-b081-100dc99b6123] Running command: StopVmCommand
>> >> internal: false. Entities affected :  ID:
>> >> b411e573-bcda-4689-b61f-1811c6f03ad5 Type: VMAction group STOP_VM with
>> >> role type USER
>> >> 2020-09-18 07:58:51,441+03 WARN
>> >> [org.ovirt.engine.core.bll.StopVmCommand]
>> >> (EE-ManagedThreadFactory-engine-Thread-42)
>> >> [7bc4ac71-f0b2-4af7-b081-100dc99b6123] Strange, according to the
>> >> status 'RebootInProgress' virtual machine
>> >> 'b411e573-bcda-4689-b61f-1811c6f03ad5' should be running in a host but
>> >> it isn't.
>> >> 2020-09-18 07:58:51,594+03 ERROR
>> >> [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
>> >> (EE-ManagedThreadFactory-engine-Thread-42)
>> >> [7bc4ac71-f0b2-4af7-b081-100dc99b6123] EVENT_ID:
>> >> USER_FAILED_STOP_VM(56), Failed to power off VM kids-home-srv (Host:
>> >> , User: gilboa@internal-authz).
>> >>
>> >> My question is simple: Pending a solution to the bug, can I somehow
>> >> drop the state of the VM? It's currently holding a sizable disk image
>> >> and a USB device I need (printer).
>> >
>> >
>> > It would be best to modify the VM as if it should still be running on the 
>> > host and let the system discover that it's not running there and update 
>> > the VM accordingly.
>> >
>> > You can do it by changing the database with:
>> > update vm_dynamic set run_on_vds='82f92946-9130-4dbd-8663-1ac0b50668a1' 
>> > where vm_guid='b411e573-bcda-4689-b61f-1811c6f03ad5';
>> >
>> >
>> >>
>> >>
>> >> As it's my private VM cluster, I have no problem dropping the site
>> >> completely for maintenance.
>> >>
>> >> Thanks,
>> >>
>> >> Gilboa
>>
>>
>> Hello,
>>
>> Thanks for the prompt answer.
>>
>> Edward,
>>
>> Full reboot of both engine and host didn't help.
>> Most likely there's a consistency problem in the oVirt DB.
>>
>> Arik,
>>
>> To which DB I should connect and as which user?
>> E.g. psql -U user db_name
>
>
> To the 'engine' database.
> I usually connect to it by switching to the 'postgres' user as Strahil 
> described.
>
>>
>>
>> Thanks again,
>> - Gilboa
>>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UVUOGUI7N3AW2L4J2WCQBQUW4BTTCOA6/


[ovirt-users] Upgrade Ovirt from 4.2 to 4.4 on CentOS7.4

2020-09-22 Thread tiziano . pacioni
Hi everyone,
I am writing for support regarding an oVirt upgrade.
I am using oVirt version 4.2 on the CentOS 7.4 operating system.
The latest release of the oVirt engine is 4.4, which is available for CentOS 8. 
Can I upgrade without upgrading the operating system to CentOS 8? 
If I am not wrong, it is not possible to switch from CentOS 7 to CentOS 8. 
Can anyone give me some advice? Thank you all!!!
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IWFDBQVPDIX5JHZVIELIU7VIAOSRVROX/


[ovirt-users] Re: Question on "Memory" column/field in Virtual Machines list/table in ovirt GUI

2020-09-22 Thread Strahil Nikolov via Users
>Ok, May I know why you think it's only a bug in SLES?.
I never claimed it is a bug in SLES, but a bug in oVirt's detection of memory 
usage on SLES.
The behaviour you observe was normal for RHEL6/CentOS6/SLES11/openSUSE and 
below, so it is normal for some OSes. In my oVirt 4.3.10, I see that the 
entry there is "SLES11+", but I believe that it is checking the memory on 
SLES15 just as if it were SLES11.


>As I said before, ovirt is behaving the same way even for CentOS7 VMs. I am 
>attaching the details again here below.
Most probably oVirt is checking memory in the RHEL6 style, which is not the 
correct one.
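
A quick way to see the difference from inside a guest (assuming a kernel recent
enough to expose MemAvailable; the numbers below are only illustrative):

  free -m
  #               total   used   free   shared  buff/cache   available
  # Mem:           7821   1234    512       42         6074        6203
  grep MemAvailable /proc/meminfo

An agent that reports "total minus free" as used would show ~93% usage here,
while the "available" column shows that most of that is reclaimable page cache.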

>My question is why ovirt is treating buff/cache memory as used memory and why 
>is not reporting memory usage just based on actual used memory?
Most probably it is a bug :D , every piece of software has some. I would recommend 
opening a bug at bugzilla.redhat.com for each OS type (for example, one for 
SLES/openSUSE and one for EL7/EL8-based).

Best Regards,
Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TQT22I3GTVLAZZPHJ6UAMPIW6Y2XEKEA/


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
Also, in some rare cases I have seen oVirt showing gluster as 2 out of 3 bricks 
up, but usually it was a UI issue: you go to the UI and mark a "force start", 
which will try to start any bricks that were down (it won't affect gluster) and 
will wake up the UI task that verifies brick status again.
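
The CLI equivalent of that UI "force start" (it only starts brick processes that
are down and does not touch running ones) is, for example:

  gluster volume start vmstore force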


https://github.com/gluster/gstatus is a good tool to verify your cluster health, 
yet a human's touch is priceless in any kind of technology.

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise 
 написа: 







when I posted last..  in the tread I paste a roling restart.    And...  now it 
is replicating.

oVirt still showing wrong.  BUT..   I did my normal test from each of the three 
nodes.

1) Mount Gluster file system with localhost as primary and other two as 
tertiary to local mount (like a client would do)
2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
3) repeat from each node then read back that all are in sync.

I REALLY hate reboot (restart) as a fix.  I need to get better with root cause 
of gluster issues if I am going to trust it.  Before when I manually made the 
volumes and it was simply (vdo + gluster) then worst case was that gluster 
would break... but I could always go into "brick" path and copy data out.

Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
environment  and data is lost.  This means nodes moved more to "pets" then 
cattle.

And with three nodes.. I can't afford to loose any pets. 

I will post more when I get cluster settled and work on those wierd notes about 
quorum volumes noted on two nodes when glusterd is restarted.

Thanks,

On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
> Replication issue could mean that one of the client (FUSE mounts) is not 
> attached to all bricks.
> 
> You can check the amount of clients via:
> gluster volume status all client-list
> 
> 
> As a prevention , just do a rolling restart:
> - set a host in maintenance and mark it to stop glusterd service (I'm 
> reffering to the UI)
> - Activate the host , once it was moved to maintenance
> 
> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
> proceed with the next one.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise 
>  написа: 
> 
> 
> 
> 
> 
> 
> I did.
> 
> Here are all three nodes with restart. I find it odd ... their has been a set 
> of messages at end (see below) which I don't know enough about what oVirt 
> laid out to know if it is bad.
> 
> ###
> [root@thor vmstore]# systemctl status glusterd
> ● glusterd.service - GlusterFS, a clustered file-system server
>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
> preset: disabled)
>   Drop-In: /etc/systemd/system/glusterd.service.d
>            └─99-cpu.conf
>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>      Docs: man:glusterd(8)
>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>  Main PID: 2113 (glusterd)
>     Tasks: 151 (limit: 1235410)
>    Memory: 3.8G
>       CPU: 6min 46.050s
>    CGroup: /glusterfs.slice/glusterd.service
>            ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level 
> INFO
>            ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
> /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log 
> -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option 
> *replicate*.node-uu>
>            ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
> /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
>  -S /var/r>
>            ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
> /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>            ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
> -p 
> /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>            └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
> /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
>  -S /var/run/glu>
> 
> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
> clustered file-system server...
> Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a 
> clustered file-system server.
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
> 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
Usually I first start with:
'gluster volume heal <VOLNAME> info summary'

Anything that is not 'Connected' is bad.

Yeah, the abstraction is not so nice, but the good thing is that you can always 
extract the data from a single remaining node (it will require playing a little 
bit with the quorum of the volume).
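
If it ever comes to pulling data off a single surviving brick, the quorum knobs
are plain volume options; a last-resort sketch (volume name taken from this
thread, and only sensible when the other replicas are really gone):

  # allow I/O with a single brick
  gluster volume set vmstore cluster.quorum-type none
  gluster volume set vmstore cluster.server-quorum-type none
  # mount the volume locally and copy the images out
  mount -t glusterfs localhost:/vmstore /mnt/recovery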

Usually I have seen that the FUSE client fails to reconnect to a "gone bad and 
recovered" brick and then you get that endless healing (as FUSE will write the 
data to only 2 out of 3 bricks and then a heal is pending :D ).

I would go with the gluster logs and the brick logs, and then you can dig deeper 
if you suspect a network issue.


Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 15:50:35 Гринуич+3, Jeremey Wise 
 написа: 







when I posted last..  in the tread I paste a roling restart.    And...  now it 
is replicating.

oVirt still showing wrong.  BUT..   I did my normal test from each of the three 
nodes.

1) Mount Gluster file system with localhost as primary and other two as 
tertiary to local mount (like a client would do)
2) run test file create Ex:   echo $HOSTNAME >> /media/glustervolume/test.out
3) repeat from each node then read back that all are in sync.

I REALLY hate reboot (restart) as a fix.  I need to get better with root cause 
of gluster issues if I am going to trust it.  Before when I manually made the 
volumes and it was simply (vdo + gluster) then worst case was that gluster 
would break... but I could always go into "brick" path and copy data out.

Now with oVirt.. .and LVM and thin provisioning etc..   I am abstracted from 
simple file recovery..  Without GLUSTER AND oVirt Engine up... all my 
environment  and data is lost.  This means nodes moved more to "pets" then 
cattle.

And with three nodes.. I can't afford to loose any pets. 

I will post more when I get cluster settled and work on those wierd notes about 
quorum volumes noted on two nodes when glusterd is restarted.

Thanks,

On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov  wrote:
> Replication issue could mean that one of the client (FUSE mounts) is not 
> attached to all bricks.
> 
> You can check the amount of clients via:
> gluster volume status all client-list
> 
> 
> As a prevention , just do a rolling restart:
> - set a host in maintenance and mark it to stop glusterd service (I'm 
> reffering to the UI)
> - Activate the host , once it was moved to maintenance
> 
> Wait for the host's HE score to recover (silver/gold crown in UI) and then 
> proceed with the next one.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise 
>  написа: 
> 
> 
> 
> 
> 
> 
> I did.
> 
> Here are all three nodes with restart. I find it odd ... their has been a set 
> of messages at end (see below) which I don't know enough about what oVirt 
> laid out to know if it is bad.
> 
> ###
> [root@thor vmstore]# systemctl status glusterd
> ● glusterd.service - GlusterFS, a clustered file-system server
>    Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
> preset: disabled)
>   Drop-In: /etc/systemd/system/glusterd.service.d
>            └─99-cpu.conf
>    Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>      Docs: man:glusterd(8)
>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>  Main PID: 2113 (glusterd)
>     Tasks: 151 (limit: 1235410)
>    Memory: 3.8G
>       CPU: 6min 46.050s
>    CGroup: /glusterfs.slice/glusterd.service
>            ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level 
> INFO
>            ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
> /var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log 
> -S /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option 
> *replicate*.node-uu>
>            ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
> /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
>  -S /var/r>
>            ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
> /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>            ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
> -p 
> /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>            └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
> --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
> /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
>  -S /var/run/glu>
> 
> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
> clustered file-system server...

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Jeremey Wise
When I posted last... in the thread I pasted a rolling restart.  And... now
it is replicating.

oVirt is still showing it wrong.  BUT...  I did my normal test from each of the
three nodes.

1) Mount Gluster file system with localhost as primary and other two as
tertiary to local mount (like a client would do)
2) run test file create Ex:   echo $HOSTNAME >>
/media/glustervolume/test.out
3) repeat from each node then read back that all are in sync.
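
A minimal sketch of steps 1) and 2) as run on one node (hostnames and the volume
name are the ones used elsewhere in this thread; the other two nodes are passed
as backup volfile servers):

  mkdir -p /media/glustervolume
  mount -t glusterfs \
      -o backup-volfile-servers=odinst.penguinpages.local:medusast.penguinpages.local \
      localhost:/vmstore /media/glustervolume
  echo $HOSTNAME >> /media/glustervolume/test.out
  cat /media/glustervolume/test.out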

I REALLY hate reboot (restart) as a fix.  I need to get better at root-causing
gluster issues if I am going to trust it.  Before, when I manually made the
volumes and it was simply (vdo + gluster), the worst case was that gluster
would break... but I could always go into the "brick" path and copy data out.

Now with oVirt... and LVM and thin provisioning etc.,  I am abstracted
from simple file recovery.  Without GLUSTER AND the oVirt Engine up... all my
environment and data is lost.  This means the nodes moved more towards "pets" than
cattle.

And with three nodes... I can't afford to lose any pets.

I will post more when I get the cluster settled and work on those weird notes
about quorum volumes noted on two nodes when glusterd is restarted.

Thanks,

On Tue, Sep 22, 2020 at 8:44 AM Strahil Nikolov 
wrote:

> Replication issue could mean that one of the client (FUSE mounts) is not
> attached to all bricks.
>
> You can check the amount of clients via:
> gluster volume status all client-list
>
>
> As a prevention , just do a rolling restart:
> - set a host in maintenance and mark it to stop glusterd service (I'm
> reffering to the UI)
> - Activate the host , once it was moved to maintenance
>
> Wait for the host's HE score to recover (silver/gold crown in UI) and then
> proceed with the next one.
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
> В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise <
> jeremey.w...@gmail.com> написа:
>
>
>
>
>
>
> I did.
>
> Here are all three nodes with restart. I find it odd ... their has been a
> set of messages at end (see below) which I don't know enough about what
> oVirt laid out to know if it is bad.
>
> ###
> [root@thor vmstore]# systemctl status glusterd
> ● glusterd.service - GlusterFS, a clustered file-system server
>Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
> vendor preset: disabled)
>   Drop-In: /etc/systemd/system/glusterd.service.d
>└─99-cpu.conf
>Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
>  Docs: man:glusterd(8)
>   Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
> --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
>  Main PID: 2113 (glusterd)
> Tasks: 151 (limit: 1235410)
>Memory: 3.8G
>   CPU: 6min 46.050s
>CGroup: /glusterfs.slice/glusterd.service
>├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level
> INFO
>├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data
> -p /var/run/gluster/shd/data/data-shd.pid -l
> /var/log/glusterfs/glustershd.log -S
> /var/run/gluster/2f41374c2e36bf4d.socket --xlator-option
> *replicate*.node-uu>
>├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p
> /var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
> -S /var/r>
>├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine
> -p
> /var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
>├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id
> vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p
> /var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
>└─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local
> --volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p
> /var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
> -S /var/run/glu>
>
> Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a
> clustered file-system server...
> Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a
> clustered file-system server.
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
> 00:32:28.605674] C [MSGID: 106003]
> [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume data. Starting lo>
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
> 00:32:28.639490] C [MSGID: 106003]
> [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
> 0-management: Server quorum regained for volume engine. Starting >
> Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
> 00:32:28.680665] C [MSGID: 106003]
> [glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
> 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
At around Sep 21 20:33 local time, you got a loss of quorum - that's not good.

Could it be a network 'hiccup'?

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 15:05:16 Гринуич+3, Jeremey Wise 
 написа: 






I did.

Here are all three nodes with a restart. I find it odd ... there has been a set 
of messages at the end (see below) which I don't know enough about, in terms of 
what oVirt laid out, to know if it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
     Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
    Tasks: 151 (limit: 1235410)
   Memory: 3.8G
      CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
           ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
/var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S 
/var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu>
           ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
 -S /var/r>
           ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
/var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
           ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
-p 
/var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
           └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
/var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid 
-S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a 
clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.605674] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.639490] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.680665] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 
30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 
seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago
     Docs: man:glusterd(8)
  Process: 245831 ExecStart=/usr/sbin/glusterd -p 

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
A replication issue could mean that one of the clients (FUSE mounts) is not 
attached to all bricks.

You can check the number of clients via:
gluster volume status all client-list


As a prevention, just do a rolling restart:
- set a host in maintenance and mark it to stop the glusterd service (I'm referring 
to the UI)
- activate the host once it has been moved to maintenance

Wait for the host's HE score to recover (silver/gold crown in UI) and then 
proceed with the next one.

Best Regards,
Strahil Nikolov




В вторник, 22 септември 2020 г., 14:55:35 Гринуич+3, Jeremey Wise 
 написа: 






I did.

Here are all three nodes with a restart. I find it odd ... there has been a set 
of messages at the end (see below) which I don't know enough about, in terms of 
what oVirt laid out, to know if it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled; vendor 
preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
           └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
     Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid 
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
    Tasks: 151 (limit: 1235410)
   Memory: 3.8G
      CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
           ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO
           ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data -p 
/var/run/gluster/shd/data/data-shd.pid -l /var/log/glusterfs/glustershd.log -S 
/var/run/gluster/2f41374c2e36bf4d.socket --xlator-option *replicate*.node-uu>
           ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p 
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
 -S /var/r>
           ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine -p 
/var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
           ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore 
-p 
/var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
           └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local 
--volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p 
/var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid 
-S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a 
clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a 
clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.605674] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.639490] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22 
00:32:28.680665] C [MSGID: 106003] 
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action] 0-management: 
Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
0-data-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
2-engine-client-0: server 172.16.101.101:24007 has not responded in the last 30 
seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the last 
30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22 
00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired] 
3-iso-client-0: server 172.16.101.101:24007 has not responded in the last 42 
seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a 

[ovirt-users] Re: Cannot import VM disks from previously detached storage domain

2020-09-22 Thread Eyal Shenitzky
I will have a look.
Thank you for your support in oVirt!

On Tue, 22 Sep 2020 at 15:30, Strahil Nikolov  wrote:

> Hi Eyal,
>
> thanks for the reply - all the proposed options make sense.
> I have opened a RFE -> https://bugzilla.redhat.com/show_bug.cgi?id=1881457
> , but can you verify that the product/team are the correct one ?
>
> Best Regards,
> Strahil Nikolov
>
>
>
>
>
>
> В вторник, 22 септември 2020 г., 12:55:56 Гринуич+3, Eyal Shenitzky <
> eshen...@redhat.com> написа:
>
>
>
>
>
>
>
> On Mon, 21 Sep 2020 at 23:19, Strahil Nikolov 
> wrote:
> > Hey Eyal,
> >
> > it's really irritating that only ISOs can be imported as disks.
> >
> > I had to:
> > 1. Delete snapshot (but I really wanted to keep it)
> > 2. Detach all disks from existing VM
> > 3. Delete the VM
> > 4. Import the Vm from the data domain
> > 5. Delete the snapshot , so disks from data domain are "in sync" with
> the non-data disks
> > 6. Attach the non-data disks to the VM
> >
> > If all disks for a VM were on the same storage domain - I didn't have to
> wipe my snapshots.
> >
> > Should I file a RFE in order to allow disk import for non-ISO disks ?
> > If I wanted to rebuild the engine and import the sotrage domains I would
> have to import the VM the first time , just to delete it and import it
> again - so I can get my VM disks from the storage...
> >
>
> From what I understand you want to file an RFE that requests the option to
> split 'unregistered' entities in a data domain, but unfortunately this is
> not possible.
>
> But we may add different options:
> * merge/squash to identical partial VMs
> * Override an existing VM
> * Force import the VM with a different ID
> You can file an RFE with those suggest options.
>
> Also, please add the description of why do you think it is needed.
>
>
> >  Best Regards,
> > Strahil Nikolov
> >
> >
> >
> >
> >
> > В понеделник, 21 септември 2020 г., 11:47:04 Гринуич+3, Eyal Shenitzky <
> eshen...@redhat.com> написа:
> >
> >
> >
> >
> >
> > Hi Stranhil,
> >
> > Maybe those VMs has more disks on different data storage domains?
> > If so, those VMs will remain on the environment with the disks that are
> not based on the detached storage-domain.
> >
> > You can try to import the VM as partial, another option is to remove the
> VM that remained in the environment but
> > keep the disks so you will be able to import the VM and attach the disks
> to it.
> >
> > On Sat, 19 Sep 2020 at 15:49, Strahil Nikolov via Users 
> wrote:
> >> Hello All,
> >>
> >> I would like to ask how to proceed further.
> >>
> >> Here is what I have done so far on my ovirt 4.3.10:
> >> 1. Set in maintenance and detached my Gluster-based storage domain
> >> 2. Did some maintenance on the gluster
> >> 3. Reattached and activated my Gluster-based storage domain
> >> 4. I have imported my ISOs via the Disk Import tab in UI
> >>
> >> Next I tried to import the VM Disks , but they are unavailable in the
> disk tab
> >> So I tried to import the VM:
> >> 1. First try - import with partial -> failed due to MAC conflict
> >> 2. Second try - import with partial , allow MAC reassignment -> failed
> as VM id exists -> recommends to remove the original VM
> >> 3. I tried to detach the VMs disks , so I can delete it - but this is
> not possible as the Vm already got a snapshot.
> >>
> >>
> >> What is the proper way to import my non-OS disks (data domain is slower
> but has more space which is more suitable for "data") ?
> >>
> >>
> >> Best Regards,
> >> Strahil Nikolov
> >> ___
> >> Users mailing list -- users@ovirt.org
> >> To unsubscribe send an email to users-le...@ovirt.org
> >> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> >> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> >> List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTJXOIVDWU6DGVZQQ243VKGWJLPKHR4L/
> >
> >>
> >
> >
> > --
> > Regards,
> > Eyal Shenitzky
> >
> >
>
>
> --
> Regards,
> Eyal Shenitzky
>
>

-- 
Regards,
Eyal Shenitzky
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SL2I3WEQ3MS6TIVBL5SC42B4FAZNTSWX/


[ovirt-users] Re: Gluster Domain Storage full

2020-09-22 Thread Strahil Nikolov via Users
Any option to extend the Gluster volume?

Other approaches are quite destructive. I guess you can obtain the VM's XML 
via virsh and then copy the disks to another pure-KVM host.
Then you can start the VM there while you are recovering from the situation.

virsh -c qemu:///system?authfile=/etc/ovirt-hosted-engine/virsh_auth.conf 
dumpxml <VM_NAME> > /some/path/<VM_NAME>.xml

Once you have the VM running on a pure-KVM host, you can go to oVirt and try to 
wipe the VM from the UI.
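
On the plain-KVM side that would look roughly like this (a sketch; it assumes the
dumped XML and the copied disk images are already on the other host and that the
disk paths inside the XML have been edited to point at the copies):

  virsh define /some/path/VM_NAME.xml
  virsh start VM_NAME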


Usually that 10% reserve is there just in case something like this happens, 
but Gluster doesn't check it every second (or the overhead would be crazy).

Maybe you can extend the Gluster volume temporarily, until you manage to move 
the VM away to a bigger storage domain. Then you can reduce the volume back to its 
original size.
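
The knobs involved are ordinary gluster commands; a sketch using the volume from
this thread (the extra brick path is a placeholder, and shrinking a distribute
volume moves data, so treat it carefully):

  # relax the reserve so the disk move can start
  gluster volume set data cluster.min-free-disk 1%
  # temporarily grow the volume with an extra brick
  gluster volume add-brick data node2.domain.com:/home/brick2
  # ... move the VM disk away in oVirt, then shrink back
  gluster volume remove-brick data node2.domain.com:/home/brick2 start
  gluster volume remove-brick data node2.domain.com:/home/brick2 status
  gluster volume remove-brick data node2.domain.com:/home/brick2 commit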

Best Regards,
Strahil Nikolov



В вторник, 22 септември 2020 г., 14:53:53 Гринуич+3, supo...@logicworks.pt 
 написа: 





Hello Strahil,

I just set cluster.min-free-disk to 1%:
# gluster volume info data

Volume Name: data
Type: Distribute
Volume ID: 2d3ea533-aca3-41c4-8cb6-239fe4f82bc3
Status: Started
Snapshot Count: 0
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: node2.domain.com:/home/brick1
Options Reconfigured:
cluster.min-free-disk: 1%
cluster.data-self-heal-algorithm: full
performance.low-prio-threads: 32
features.shard-block-size: 512MB
features.shard: on
storage.owner-gid: 36
storage.owner-uid: 36
transport.address-family: inet
nfs.disable: on

But still get the same error: Error while executing action: Cannot move Virtual 
Disk. Low disk space on Storage Domain
I restarted the glusterfs volume.
But I can not do anything with the VM disk.


I know that filling the bricks is very bad, we lost access to the VM. I think 
there should be a mechanism to prevent stopping the VM.
we should continue to have access to the VM to free some space.

If you have a VM with a Thin Provision disk, if the VM fills the entire disk, 
we got the same problem.

Any idea?

Thanks

José




De: "Strahil Nikolov" 
Para: "users" , supo...@logicworks.pt
Enviadas: Segunda-feira, 21 De Setembro de 2020 21:28:10
Assunto: Re: [ovirt-users] Gluster Domain Storage full

Usually gluster has a 10% reserver defined in 'cluster.min-free-disk' volume 
option.
You can power off the VM , then set cluster.min-free-disk
to 1% and immediately move any of the VM's disks to another storage domain.

Keep in mind that filling your bricks is bad and if you eat that reserve , the 
only option would be to try to export the VM as OVA and then wipe from current 
storage and import in a bigger storage domain.

Of course it would be more sensible to just expand the gluster volume (either 
scale-up the bricks -> add more disks, or scale-out -> adding more servers with 
disks on them), but I guess that is not an option - right ?

Best Regards,
Strahil Nikolov








В понеделник, 21 септември 2020 г., 15:58:01 Гринуич+3, supo...@logicworks.pt 
 написа: 





Hello,

I'm running oVirt Version 4.3.4.3-1.el7.
I have a small GlusterFS Domain storage brick on a dedicated filesystem serving 
only one VM.
The VM filled all the Domain storage.
The Linux filesystem has 4.1G available and 100% used, the mounted brick has 
0GB available and 100% used

I can not do anything with this disk, for example, if I try to move it to 
another Gluster Domain Storage get the message:

Error while executing action: Cannot move Virtual Disk. Low disk space on 
Storage Domain

Any idea?

Thanks

-- 

Jose Ferradeira
http://www.logicworks.pt
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WFN2VOQZPPVCGXAIFEYVIDEVJEUCSWY7/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AIJUP2HZIWRSQHN4XU3BGGT2ZDKEVJZ3/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GBAJWBN3QSKWEPWVP4DIL7OGNTASVZLP/


[ovirt-users] Re: Cannot import VM disks from previously detached storage domain

2020-09-22 Thread Strahil Nikolov via Users
Hi Eyal,

thanks for the reply - all the proposed options make sense.
I have opened an RFE -> https://bugzilla.redhat.com/show_bug.cgi?id=1881457 ,
but can you verify that the product/team are the correct ones?

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 12:55:56 Гринуич+3, Eyal Shenitzky 
 написа: 







On Mon, 21 Sep 2020 at 23:19, Strahil Nikolov  wrote:
> Hey Eyal,
> 
> it's really irritating that only ISOs can be imported as disks.
> 
> I had to:
> 1. Delete snapshot (but I really wanted to keep it)
> 2. Detach all disks from existing VM
> 3. Delete the VM
> 4. Import the Vm from the data domain
> 5. Delete the snapshot , so disks from data domain are "in sync" with the 
> non-data disks
> 6. Attach the non-data disks to the VM
> 
> If all disks for a VM were on the same storage domain - I didn't have to wipe 
> my snapshots.
> 
> Should I file a RFE in order to allow disk import for non-ISO disks ?
> If I wanted to rebuild the engine and import the storage domains I would have 
> to import the VM the first time, just to delete it and import it again - so 
> I can get my VM disks from the storage...
> 

From what I understand you want to file an RFE that requests the option to 
split 'unregistered' entities in a data domain, but unfortunately this is not 
possible.

But we may add different options:
* merge/squash to identical partial VMs
* Override an existing VM
* Force import the VM with a different ID
You can file an RFE with those suggested options.

Also, please add a description of why you think it is needed.

 
>  Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> В понеделник, 21 септември 2020 г., 11:47:04 Гринуич+3, Eyal Shenitzky 
>  написа: 
> 
> 
> 
> 
> 
> Hi Strahil, 
> 
> Maybe those VMs have more disks on different data storage domains?
> If so, those VMs will remain in the environment with the disks that are not 
> based on the detached storage domain.
> 
> You can try to import the VM as partial, another option is to remove the VM 
> that remained in the environment but 
> keep the disks so you will be able to import the VM and attach the disks to 
> it.
> 
> On Sat, 19 Sep 2020 at 15:49, Strahil Nikolov via Users  
> wrote:
>> Hello All,
>> 
>> I would like to ask how to proceed further.
>> 
>> Here is what I have done so far on my ovirt 4.3.10:
>> 1. Set in maintenance and detached my Gluster-based storage domain
>> 2. Did some maintenance on the gluster
>> 3. Reattached and activated my Gluster-based storage domain
>> 4. I have imported my ISOs via the Disk Import tab in UI
>> 
>> Next I tried to import the VM Disks , but they are unavailable in the disk 
>> tab
>> So I tried to import the VM:
>> 1. First try - import with partial -> failed due to MAC conflict
>> 2. Second try - import with partial , allow MAC reassignment -> failed as VM 
>> id exists -> recommends to remove the original VM
>> 3. I tried to detach the VMs disks , so I can delete it - but this is not 
>> possible as the Vm already got a snapshot.
>> 
>> 
>> What is the proper way to import my non-OS disks (data domain is slower but 
>> has more space which is more suitable for "data") ?
>> 
>> 
>> Best Regards,
>> Strahil Nikolov
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTJXOIVDWU6DGVZQQ243VKGWJLPKHR4L/
> 
>> 
> 
> 
> -- 
> Regards,
> Eyal Shenitzky
> 
> 


-- 
Regards,
Eyal Shenitzky
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FEU3KIA76YUA6EDI6SIOY43MHI2Z2ZNB/


[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Jeremey Wise
I did.

Here are all three nodes with a restart. I find it odd ... there has been a
set of messages at the end (see below), and I don't know enough about what
oVirt laid out to know if it is bad.

###
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
   └─99-cpu.conf
   Active: active (running) since Mon 2020-09-21 20:32:26 EDT; 10h ago
 Docs: man:glusterd(8)
  Process: 2001 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 2113 (glusterd)
Tasks: 151 (limit: 1235410)
   Memory: 3.8G
  CPU: 6min 46.050s
   CGroup: /glusterfs.slice/glusterd.service
   ├─ 2113 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level
INFO
   ├─ 2914 /usr/sbin/glusterfs -s localhost --volfile-id shd/data
-p /var/run/gluster/shd/data/data-shd.pid -l
/var/log/glusterfs/glustershd.log -S
/var/run/gluster/2f41374c2e36bf4d.socket --xlator-option
*replicate*.node-uu>
   ├─ 9342 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id data.thorst.penguinpages.local.gluster_bricks-data-data -p
/var/run/gluster/vols/data/thorst.penguinpages.local-gluster_bricks-data-data.pid
-S /var/r>
   ├─ 9433 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id engine.thorst.penguinpages.local.gluster_bricks-engine-engine
-p
/var/run/gluster/vols/engine/thorst.penguinpages.local-gluster_bricks-engine-engine.p>
   ├─ 9444 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id
vmstore.thorst.penguinpages.local.gluster_bricks-vmstore-vmstore -p
/var/run/gluster/vols/vmstore/thorst.penguinpages.local-gluster_bricks-vmstore-vms>
   └─35639 /usr/sbin/glusterfsd -s thorst.penguinpages.local
--volfile-id iso.thorst.penguinpages.local.gluster_bricks-iso-iso -p
/var/run/gluster/vols/iso/thorst.penguinpages.local-gluster_bricks-iso-iso.pid
-S /var/run/glu>

Sep 21 20:32:24 thor.penguinpages.local systemd[1]: Starting GlusterFS, a
clustered file-system server...
Sep 21 20:32:26 thor.penguinpages.local systemd[1]: Started GlusterFS, a
clustered file-system server.
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
00:32:28.605674] C [MSGID: 106003]
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume data. Starting lo>
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
00:32:28.639490] C [MSGID: 106003]
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume engine. Starting >
Sep 21 20:32:28 thor.penguinpages.local glusterd[2113]: [2020-09-22
00:32:28.680665] C [MSGID: 106003]
[glusterd-server-quorum.c:348:glusterd_do_volume_quorum_action]
0-management: Server quorum regained for volume vmstore. Starting>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:24.813409] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
0-data-client-0: server 172.16.101.101:24007 has not responded in the last
30 seconds, discon>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:24.815147] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
2-engine-client-0: server 172.16.101.101:24007 has not responded in the
last 30 seconds, disc>
Sep 21 20:33:24 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:24.818735] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
4-vmstore-client-0: server 172.16.101.101:24007 has not responded in the
last 30 seconds, dis>
Sep 21 20:33:36 thor.penguinpages.local glustershd[2914]: [2020-09-22
00:33:36.816978] C [rpc-clnt-ping.c:155:rpc_clnt_ping_timer_expired]
3-iso-client-0: server 172.16.101.101:24007 has not responded in the last
42 seconds, disconn>
[root@thor vmstore]#
[root@thor vmstore]# systemctl restart glusterd
[root@thor vmstore]# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled;
vendor preset: disabled)
  Drop-In: /etc/systemd/system/glusterd.service.d
   └─99-cpu.conf
   Active: active (running) since Tue 2020-09-22 07:24:34 EDT; 2s ago
 Docs: man:glusterd(8)
  Process: 245831 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid
--log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 245832 (glusterd)
Tasks: 151 (limit: 1235410)
   Memory: 3.8G
  CPU: 132ms
   CGroup: /glusterfs.slice/glusterd.service
   ├─  2914 /usr/sbin/glusterfs -s localhost 

[ovirt-users] Re: Gluster Domain Storage full

2020-09-22 Thread suporte
Hello Strahil, 

I just set cluster.min-free-disk to 1%: 
# gluster volume info data 

Volume Name: data 
Type: Distribute 
Volume ID: 2d3ea533-aca3-41c4-8cb6-239fe4f82bc3 
Status: Started 
Snapshot Count: 0 
Number of Bricks: 1 
Transport-type: tcp 
Bricks: 
Brick1: node2.domain.com:/home/brick1 
Options Reconfigured: 
cluster.min-free-disk: 1% 
cluster.data-self-heal-algorithm: full 
performance.low-prio-threads: 32 
features.shard-block-size: 512MB 
features.shard: on 
storage.owner-gid: 36 
storage.owner-uid: 36 
transport.address-family: inet 
nfs.disable: on 

But I still get the same error: Error while executing action: Cannot move Virtual 
Disk. Low disk space on Storage Domain 
I restarted the glusterfs volume, 
but I still cannot do anything with the VM disk. 


I know that filling the bricks is very bad; we lost access to the VM. I think 
there should be a mechanism that prevents the VM from stopping, 
so we can keep access to the VM and free some space. 

If you have a VM with a thin-provisioned disk and the VM fills the entire disk, 
we get the same problem. 

Any idea? 

Thanks 

José 




De: "Strahil Nikolov"  
Para: "users" , supo...@logicworks.pt 
Enviadas: Segunda-feira, 21 De Setembro de 2020 21:28:10 
Assunto: Re: [ovirt-users] Gluster Domain Storage full 

Usually gluster has a 10% reserve defined in the 'cluster.min-free-disk' volume 
option. 
You can power off the VM, then set cluster.min-free-disk 
to 1% and immediately move any of the VM's disks to another storage domain. 

Keep in mind that filling your bricks is bad, and if you eat into that reserve, the 
only option would be to try to export the VM as OVA, then wipe it from the current 
storage and import it into a bigger storage domain. 

Of course it would be more sensible to just expand the gluster volume (either 
scale-up the bricks -> add more disks, or scale-out -> adding more servers with 
disks on them), but I guess that is not an option - right ? 

Best Regards, 
Strahil Nikolov 








В понеделник, 21 септември 2020 г., 15:58:01 Гринуич+3, supo...@logicworks.pt 
 написа: 





Hello, 

I'm running oVirt Version 4.3.4.3-1.el7. 
I have a small GlusterFS Domain storage brick on a dedicated filesystem serving 
only one VM. 
The VM filled all the Domain storage. 
The Linux filesystem has 4.1G available and 100% used, the mounted brick has 
0GB available and 100% used 

I can not do anything with this disk, for example, if I try to move it to 
another Gluster Domain Storage get the message: 

Error while executing action: Cannot move Virtual Disk. Low disk space on 
Storage Domain 

Any idea? 

Thanks 

-- 
 
Jose Ferradeira 
http://www.logicworks.pt 
___ 
Users mailing list -- users@ovirt.org 
To unsubscribe send an email to users-le...@ovirt.org 
Privacy Statement: https://www.ovirt.org/privacy-policy.html 
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/ 
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WFN2VOQZPPVCGXAIFEYVIDEVJEUCSWY7/
 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/AIJUP2HZIWRSQHN4XU3BGGT2ZDKEVJZ3/


[ovirt-users] Re: Cannot import VM disks from previously detached storage domain

2020-09-22 Thread Eyal Shenitzky
On Mon, 21 Sep 2020 at 23:19, Strahil Nikolov  wrote:

> Hey Eyal,
>
> it's really irritating that only ISOs can be imported as disks.
>
> I had to:
> 1. Delete snapshot (but I really wanted to keep it)
> 2. Detach all disks from existing VM
> 3. Delete the VM
> 4. Import the Vm from the data domain
> 5. Delete the snapshot , so disks from data domain are "in sync" with the
> non-data disks
> 6. Attach the non-data disks to the VM
>
> If all disks for a VM were on the same storage domain - I didn't have to
> wipe my snapshots.
>
> Should I file a RFE in order to allow disk import for non-ISO disks ?
> If I wanted to rebuild the engine and import the storage domains I would
> have to import the VM the first time, just to delete it and import it
> again - so I can get my VM disks from the storage...
>
>
From what I understand you want to file an RFE that requests the option to
split 'unregistered' entities in a data domain, but unfortunately this is
not possible.

But we may add different options:

   - merge/squash to identical partial VMs
   - Override an existing VM
   - Force import the VM with a different ID

You can file an RFE with those suggested options.

Also, please add a description of why you think it is needed.



> Best Regards,
> Strahil Nikolov
>
>
>
>
>
> В понеделник, 21 септември 2020 г., 11:47:04 Гринуич+3, Eyal Shenitzky <
> eshen...@redhat.com> написа:
>
>
>
>
>
> Hi Strahil,
>
> Maybe those VMs have more disks on different data storage domains?
> If so, those VMs will remain in the environment with the disks that are
> not based on the detached storage domain.
>
> You can try to import the VM as partial, another option is to remove the
> VM that remained in the environment but
> keep the disks so you will be able to import the VM and attach the disks
> to it.
>
> On Sat, 19 Sep 2020 at 15:49, Strahil Nikolov via Users 
> wrote:
> > Hello All,
> >
> > I would like to ask how to proceed further.
> >
> > Here is what I have done so far on my ovirt 4.3.10:
> > 1. Set in maintenance and detached my Gluster-based storage domain
> > 2. Did some maintenance on the gluster
> > 3. Reattached and activated my Gluster-based storage domain
> > 4. I have imported my ISOs via the Disk Import tab in UI
> >
> > Next I tried to import the VM Disks , but they are unavailable in the
> disk tab
> > So I tried to import the VM:
> > 1. First try - import with partial -> failed due to MAC conflict
> > 2. Second try - import with partial , allow MAC reassignment -> failed
> as VM id exists -> recommends to remove the original VM
> > 3. I tried to detach the VMs disks , so I can delete it - but this is
> not possible as the Vm already got a snapshot.
> >
> >
> > What is the proper way to import my non-OS disks (data domain is slower
> but has more space which is more suitable for "data") ?
> >
> >
> > Best Regards,
> > Strahil Nikolov
> > ___
> > Users mailing list -- users@ovirt.org
> > To unsubscribe send an email to users-le...@ovirt.org
> > Privacy Statement: https://www.ovirt.org/privacy-policy.html
> > oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> > List Archives:
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/WTJXOIVDWU6DGVZQQ243VKGWJLPKHR4L/
> >
>
>
> --
> Regards,
> Eyal Shenitzky
>
>

-- 
Regards,
Eyal Shenitzky
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5A7IOHPR6VOOMBXQIJT5FAN2O6FTKVHQ/


[ovirt-users] Re: Question on "Memory" column/field in Virtual Machines list/table in ovirt GUI

2020-09-22 Thread KISHOR K
Ok, may I know why you think it's only a bug in SLES?
As I said before, ovirt is behaving the same way even for CentOS7 VMs. I am 
attaching the details again here below.

The memory details of one of the running CentOS VMs are below. 

[centos@centos-vm1 ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7816        1257         176         386        6383        5874
Swap:             0           0           0

Here, out of total allocated memory of 7816 MB, we can see that total actual 
available memory is 5874 MB and the actual used memory is just 1257 MB, 
excluding buff/cache.

But in the oVirt GUI, the memory usage field/column for the above VM (Compute -> Virtual 
Machines, then select the VM and check the Memory field/column) shows usage as 98%. 
In other words, it says only 2% of memory (the 176 MB free) is left and 98% is 
used (counting used + buff/cache, i.e. 1257 MB + 6383 MB). 
My question is: why is oVirt treating buff/cache memory as used memory, and why is it 
not reporting memory usage based on the actual used memory alone?
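
To make the comparison concrete, here is a small sketch (run inside the guest) that 
derives both percentages from the standard /proc/meminfo fields; the ~98% figure matches 
what the GUI seems to show, while the ~25% figure is what I would expect based on MemAvailable:

awk '/^MemTotal/     {t=$2}
     /^MemFree/      {f=$2}
     /^MemAvailable/ {a=$2}
     END {
       printf "used incl. buff/cache: %.0f%%\n", (t-f)*100/t   # ~98% for the VM above
       printf "used excl. buff/cache: %.0f%%\n", (t-a)*100/t   # ~25% for the VM above
     }' /proc/meminfo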
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/G7GTDFRI36RGFL3OKXRL35MP5N4LHUQ7/


[ovirt-users] Re: Fail install SHE ovirt-engine from backupfile (4.3 -> 4.4)

2020-09-22 Thread Francesco via Users

Ok, solved.

Simply put, the node2 server could not mount the data domain of 
node1 via NFS. I added node1 to the node2 firewall and to /etc/exports, tested, 
and everything went fine.
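
For anyone hitting the same thing, this is roughly the shape of the change on the NFS 
server side (hostnames, paths and export options below are placeholders, not my exact setup):

# on the host that exports the data domain
echo '/exports/data  node2.example.local(rw,sync,no_subtree_check,anonuid=36,anongid=36)' >> /etc/exports
exportfs -ra                          # re-read /etc/exports without restarting NFS
firewall-cmd --permanent --add-service=nfs --add-service=rpc-bind --add-service=mountd
firewall-cmd --reload

# quick sanity check from the client that will mount it
showmount -e node1.example.local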


Regards,
Francesco

Il 21/09/2020 17:44, francesco--- via Users ha scritto:

Hi Everyone,

In a test environment I'm trying to deploy a single node self hosted engine 4.4 
on CentOS 8 from a 4.3 backup. The actual setup is:
- node1 with CentOS7, oVirt 4.3 with a working SH engine. The data domain is a 
local NFS;
- node2 with CentOS8, where we are trying to deploy the engine starting from 
the node1 engine backup
- host1, with CentOS78, running a couple of VMs (4.3)

I'm following the guide: 
https://www.ovirt.org/documentation/upgrade_guide/#Upgrading_the_Manager_to_4-4_4-3_SHE
Everything seems to be working fine, the engine on node1 is in maintenance:global 
mode and the ovirt-engine service is stopped. The deploy on node2 gets stuck on 
the following error:

TASK [ovirt.hosted_engine_setup : Wait for OVF_STORE disk content]

[ ERROR ] {'msg': 'non-zero return code', 'cmd': "vdsm-client Image prepare 
storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 
storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 
imageID=e48a66dd-74c9-43eb-890e-778e9c4ee8db volumeID=06bb5f34-112d-4214-91d2-53d0bdb84321 | 
grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - 
6023764f-5547-4b23-92ca-422eafdf3f87.ovf", 'stdout': '', 'stderr': "vdsm-client: 
Command Image.prepare with args {'storagepoolID': '06c58622-f99b-11ea-9122-00163e1bbc93', 
'storagedomainID': '2a4a3cce-f2f6-4ddd-b337-df5ef562f520', 'imageID': 
'e48a66dd-74c9-43eb-890e-778e9c4ee8db',
'volumeID': '06bb5f34-112d-4214-91d2-53d0bdb84321'} failed:\n(code=309, message=Unknown pool 
id, pool not connected: ('06c58622-f99b-11ea-9122-00163e1bbc93',))\ntar: This does not look 
like a tar archive\ntar: 6023764f-5547-4b23-92ca-422eafdf3f87.ovf: Not found in archive\ntar: 
Exiting with failure status due to previous errors", 'rc': 2, 'start': '2020-09-21 
17:14:17.293090', 'end': '2020-09-21 17:14:17.644253', 'delta': '0:00:00.351163', 'changed': 
True, 'failed': True, 'invocation': {'module_args': {'warn': False, '_raw_params': 
"vdsm-client Image prepare storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 
storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 
imageID=e48a66dd-74c9-43eb-890e-778e9c4ee8db volumeID=06bb5f34-112d-4214-91d2-53d0bdb84321 | 
grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - 
6023764f-5547-4b23-92ca-422eafdf3f87.ovf", '_uses_shell': True, 'stdin_add_newline': 
True, 'strip_empty_ends': True, 'argv': None, 'chdir': None, 'executable
  ': None, 'creates': None, 'removes': None, 'stdin': None}}, 'stdout_lines': [], 'stderr_lines': 
["vdsm-client: Command Image.prepare with args {'storagepoolID': 
'06c58622-f99b-11ea-9122-00163e1bbc93', 'storagedomainID': '2a4a3cce-f2f6-4ddd-b337-df5ef562f520', 
'imageID': 'e48a66dd-74c9-43eb-890e-778e9c4ee8db', 'volumeID': 
'06bb5f34-112d-4214-91d2-53d0bdb84321'} failed:", "(code=309, message=Unknown pool id, 
pool not connected: ('06c58622-f99b-11ea-9122-00163e1bbc93',))", 'tar: This does not look like 
a tar archive', 'tar: 6023764f-5547-4b23-92ca-422eafdf3f87.ovf: Not found in archive', 'tar: 
Exiting with failure status due to previous errors'], '_ansible_no_log': False, 'attempts':
12, 'item': {'name': 'OVF_STORE', 'image_id': 
'06bb5f34-112d-4214-91d2-53d0bdb84321', 'id': 
'e48a66dd-74c9-43eb-890e-778e9c4ee8db'}, 'ansible_loop_var': 'item', 
'_ansible_item_label': {'name': 'OVF_STORE', 'image_id': 
'06bb5f34-112d-4214-91d2-53d0bdb84321', 'id': 
'e48a66dd-74c9-43eb-890e-778e9c4ee8db'}}
[ ERROR ] {'msg': 'non-zero return code', 'cmd': "vdsm-client Image prepare 
storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 
storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 
imageID=750428bd-1273-467f-9b27-7f6fe58a446c volumeID=1c89c678-f883-4e61-945c-5f7321add343 | 
grep path | awk '{ print $2 }' | xargs -I{} sudo -u vdsm dd if={} | tar -tvf - 
6023764f-5547-4b23-92ca-422eafdf3f87.ovf", 'stdout': '', 'stderr': "vdsm-client: 
Command Image.prepare with args {'storagepoolID': '06c58622-f99b-11ea-9122-00163e1bbc93', 
'storagedomainID': '2a4a3cce-f2f6-4ddd-b337-df5ef562f520', 'imageID': 
'750428bd-1273-467f-9b27-7f6fe58a446c',
'volumeID': '1c89c678-f883-4e61-945c-5f7321add343'} failed:\n(code=309, message=Unknown pool 
id, pool not connected: ('06c58622-f99b-11ea-9122-00163e1bbc93',))\ntar: This does not look 
like a tar archive\ntar: 6023764f-5547-4b23-92ca-422eafdf3f87.ovf: Not found in archive\ntar: 
Exiting with failure status due to previous errors", 'rc': 2, 'start': '2020-09-21 
17:16:26.030343', 'end': '2020-09-21 17:16:26.381862', 'delta': '0:00:00.351519', 'changed': 
True, 'failed': True, 'invocation': {'module_args': {'warn': False, '_raw_params': 
"vdsm-client Image prepare storagepoolID=06c58622-f99b-11ea-9122-00163e1bbc93 
storagedomainID=2a4a3cce-f2f6-4ddd-b337-df5ef562f520 

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi again Strahil,

It’s oVirt 4.3.10. Same CPU on the entire cluster, it’s three machines with 
Xeon E5-2620v2 (Ivy Bridge), all the machines are identical in model and specs.

I’ve changed the VM CPU Model to:
Nehalem,+spec-ctrl,+ssbd
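
For what it's worth, a quick way to confirm the change actually landed on the host, as a 
sketch: read-only virsh needs no vdsm credentials, and 'cerulean' here is just this VM's name.

virsh -r dumpxml cerulean | grep -A4 '<cpu'     # should now list Nehalem plus the spec-ctrl/ssbd flags
virsh -r domstate cerulean --reason             # if it pauses again, this shows libvirt's reason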

Let’s see how it behaves. If it crashes again I’ll definitely look at rolling 
back the OS updates.

Thank you all.

PS: I can try upgrading to 4.4.

> On 22 Sep 2020, at 04:28, Strahil Nikolov  wrote:
> 
> This looks much like my openBSD 6.6 under Latest AMD CPUs. KVM did not accept 
> a pretty valid instruction and it was a bug in KVM.
> 
> Maybe you can try to :
> - power off the VM
> - pick an older CPU type for that VM only
> - power on and monitor in the next days 
> 
> Do you have a cluster with different cpu vendor (if currently on AMD -> Intel 
> and if currently Intel -> AMD)? Maybe you can move it to another cluster and 
> identify if the issue happens there too.
> 
> Another option is to try to roll back the Windows updates, to identify if any 
> of them has caused the problem. Yet, that's a workaround and not a fix.
> 
> 
> Are you using oVirt 4.3 or 4.4 ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В вторник, 22 септември 2020 г., 10:08:44 Гринуич+3, Vinícius Ferrão 
>  написа: 
> 
> 
> 
> 
> 
> Hi Strahil, yes I can’t find anything recently either. You dug way further 
> than me; I found some regressions on the kernel but I don’t know if it’s 
> related or not: 
> 
> 
> 
> https://patchwork.kernel.org/patch/5526561/
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
> 
> 
> 
> 
> Regarding the OS, nothing new was installed, just regular Windows Updates.
> 
> And finally about nested virtualisation, it’s disabled on hypervisor.
> 
> 
> 
> 
> One thing that caught my attention on the link you’ve sent is regarding a 
> rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443
> 
> 
> 
> 
> But come on, it’s from 2006…
> 
> 
> 
> 
> Well, I’m up to other ideas, VM just crashed once again:
> 
> 
> 
> 
> EAX= EBX=075c5180 ECX=75432002 EDX=000400b6
> ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770
> EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
> ES =   00809300
> CS =9900 7ff99000  00809300
> SS =   00809300
> DS =   00809300
> FS =   00809300
> GS =   00809300
> LDT=  000f 
> TR =0040 075da000 0067 8b00
> GDT= 075dbfb0 0057
> IDT=  
> CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=
> DR0= DR1= DR2= 
> DR3= 
> DR6=4ff0 DR7=0400
> EFER=
> Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff 
> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
> ff
> 
> 
> 
> 
> [519192.536247] *** Guest State ***
> [519192.536275] CR0: actual=0x00050032, shadow=0x00050032, 
> gh_mask=fff7
> [519192.536324] CR4: actual=0x2050, shadow=0x, 
> gh_mask=f871
> [519192.537322] CR3 = 0x001ad002
> [519192.538166] RSP = 0xfb047db5d770  RIP = 0x8000
> [519192.539017] RFLAGS=0x0002 DR7 = 0x0400
> [519192.539861] Sysenter RSP= CS:RIP=:
> [519192.540690] CS:   sel=0x9900, attr=0x08093, limit=0x, 
> base=0x7ff99000
> [519192.541523] DS:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [519192.542356] SS:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [519192.543167] ES:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [519192.543961] FS:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [519192.544747] GS:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [519192.545511] GDTR:   limit=0x0057, 
> base=0xad01075dbfb0
> [519192.546275] LDTR: sel=0x, attr=0x1, limit=0x000f, 
> base=0x
> [519192.547052] IDTR:   limit=0x, 
> base=0x
> [519192.547841] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, 
> base=0xad01075da000
> [519192.548639] EFER = 0x  PAT = 0x0007010600070106
> [519192.549460] DebugCtl = 0x  DebugExceptions = 
> 0x
> [519192.550302] Interruptibility = 0009  ActivityState = 
> [519192.551137] *** Host State ***
> [519192.551963] RIP = 0xc150a034  RSP = 0x88cd9cafbc90
> [519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
> [519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c 
> TRBase=88d45f2c4000
> [519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000
> [519192.555347] 

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi Gianluca.

On 22 Sep 2020, at 04:24, Gianluca Cecchi <gianluca.cec...@gmail.com> wrote:



On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users <users@ovirt.org> wrote:
> Hi Strahil, yes I can’t find anything recently either. You dug way
> further than me; I found some regressions on the kernel but I don’t know if
related or not:

https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027

Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally about nested virtualisation, it’s disabled on hypervisor.



In your original post you wrote about the VM going suspended.
So I think there could be something useful in engine.log on the engine and/or 
vdsm.log on the hypervisor.
Could you check those?

Yes, it goes to suspended. I think this is just the engine not knowing what 
really happened and guessing it was suspended. In engine.log I only have these 
two lines:

# grep "2020-09-22 01:51" /var/log/ovirt-engine/engine.log
2020-09-22 01:51:52,604-03 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] VM 
'351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused'
2020-09-22 01:51:52,699-03 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] 
EVENT_ID: VM_PAUSED(1,025), VM cerulean has been paused.

Note that I've “grepped” by time. There are only these two lines from when it crashed, 
about 2h30m ago.

In vdsm.log, around that time and searching for the name of the VM, I only found a huge 
JSON with the characteristics of the VM. Is there something that I should check 
specifically? I tried some combinations of “grep” but found nothing really useful.
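
For concreteness, the kind of thing I tried looks like this (just a sketch, using the VM 
name and ID from the engine.log lines above, and the stock log locations):

grep -iE 'cerulean|351db98a' /var/log/vdsm/vdsm.log | grep -iE 'pause|abnormal|error'
tail -n 100 /var/log/libvirt/qemu/cerulean.log    # on the host that was running the VM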

Also, do you see anything in event viewer of the WIndows VM and/or in Freenas 
logs?

FreeNAS is just cool, nothing wrong there. No errors on dmesg, nor resource 
starvation on ZFS. No overload on the disks, nothing… the storage is running 
easy.

As for the Windows Event Viewer, it's my Achilles’ heel; nothing relevant there either as 
far as I'm concerned. There are of course some mentions of an improper shutdown 
due to the crash, but nothing else. I'm looking further here and will report back 
if I find something useful.

Thanks,


Gianluca

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XTTUYAGYB6EE5I3XNNLBZEBWY363XTIQ/


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Strahil Nikolov via Users
This looks much like my openBSD 6.6 under Latest AMD CPUs. KVM did not accept a 
pretty valid instruction and it was a bug in KVM.

Maybe you can try to :
- power off the VM
- pick an older CPU type for that VM only
- power on and monitor in the next days 

Do you have a cluster with different cpu vendor (if currently on AMD -> Intel 
and if currently Intel -> AMD)? Maybe you can move it to another cluster and 
identify if the issue happens there too.

Another option is to try to roll back the Windows updates, to identify if any 
of them has caused the problem. Yet, that's a workaround and not a fix.


Are you using oVirt 4.3 or 4.4 ?

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 10:08:44 Гринуич+3, Vinícius Ferrão 
 написа: 





Hi Strahil, yes I can’t find anything recently either. You dug way further 
than me; I found some regressions on the kernel but I don’t know if it’s 
related or not: 



https://patchwork.kernel.org/patch/5526561/

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027




Regarding the OS, nothing new was installed, just regular Windows Updates.

And finally about nested virtualisation, it’s disabled on hypervisor.




One thing that caught my attention on the link you’ve sent is regarding a 
rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443




But come on, it’s from 2006…




Well, I’m up to other ideas, VM just crashed once again:




EAX= EBX=075c5180 ECX=75432002 EDX=000400b6
ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770
EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =   00809300
CS =9900 7ff99000  00809300
SS =   00809300
DS =   00809300
FS =   00809300
GS =   00809300
LDT=  000f 
TR =0040 075da000 0067 8b00
GDT=     075dbfb0 0057
IDT=      
CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=
DR0= DR1= DR2= 
DR3= 
DR6=4ff0 DR7=0400
EFER=
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff




[519192.536247] *** Guest State ***
[519192.536275] CR0: actual=0x00050032, shadow=0x00050032, 
gh_mask=fff7
[519192.536324] CR4: actual=0x2050, shadow=0x, 
gh_mask=f871
[519192.537322] CR3 = 0x001ad002
[519192.538166] RSP = 0xfb047db5d770  RIP = 0x8000
[519192.539017] RFLAGS=0x0002         DR7 = 0x0400
[519192.539861] Sysenter RSP= CS:RIP=:
[519192.540690] CS:   sel=0x9900, attr=0x08093, limit=0x, 
base=0x7ff99000
[519192.541523] DS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.542356] SS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.543167] ES:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.543961] FS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.544747] GS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.545511] GDTR:                           limit=0x0057, 
base=0xad01075dbfb0
[519192.546275] LDTR: sel=0x, attr=0x1, limit=0x000f, 
base=0x
[519192.547052] IDTR:                           limit=0x, 
base=0x
[519192.547841] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, 
base=0xad01075da000
[519192.548639] EFER =     0x  PAT = 0x0007010600070106
[519192.549460] DebugCtl = 0x  DebugExceptions = 
0x
[519192.550302] Interruptibility = 0009  ActivityState = 
[519192.551137] *** Host State ***
[519192.551963] RIP = 0xc150a034  RSP = 0x88cd9cafbc90
[519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c 
TRBase=88d45f2c4000
[519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000
[519192.555347] CR0=80050033 CR3=00033dc82000 CR4=001627e0
[519192.556202] Sysenter RSP= CS:RIP=0010:91596cc0
[519192.557058] EFER = 0x0d01  PAT = 0x0007050600070106
[519192.557913] *** Control State ***
[519192.558757] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[519192.559605] EntryControls=d1ff ExitControls=002fefff
[519192.560453] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[519192.561306] VMEntry: intr_info= errcode=0006 ilen=
[519192.562158] VMExit: intr_info= errcode= ilen=0001
[519192.563006]         reason=8021 qualification=
[519192.563860] IDTVectoring: 

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Gianluca Cecchi
On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users 
wrote:

> Hi Strahil, yes I can’t find anything recently either. You dug way
> further than me; I found some regressions on the kernel but I don’t know if
> it’s related or not:
>
> https://patchwork.kernel.org/patch/5526561/
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027
>
> Regarding the OS, nothing new was installed, just regular Windows Updates.
> And finally about nested virtualisation, it’s disabled on hypervisor.
>
>
>
In your original post you wrote about the VM going suspended.
So I think there could be something useful in engine.log on the engine
and/or vdsm.log on the hypervisor.
Could you check those?
Also, do you see anything in event viewer of the WIndows VM and/or in
Freenas logs?

Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/X52ZUYHMIVBVFYWOSQDTTV75YYCHDC5L/


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi Strahil, yes I can’t find anything recently either. You dug way further 
than me; I found some regressions on the kernel but I don’t know if it’s 
related or not:

https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027

Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally about nested virtualisation, it’s disabled on hypervisor.

One thing that caught my attention on the link you’ve sent is regarding a 
rootkit: https://devblogs.microsoft.com/oldnewthing/20060421-12/?p=31443

But come on, it’s from 2006…

Well, I’m up to other ideas, VM just crashed once again:

EAX= EBX=075c5180 ECX=75432002 EDX=000400b6
ESI=c8ddc080 EDI=075d6800 EBP=a19bbdfe ESP=7db5d770
EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =   00809300
CS =9900 7ff99000  00809300
SS =   00809300
DS =   00809300
FS =   00809300
GS =   00809300
LDT=  000f 
TR =0040 075da000 0067 8b00
GDT= 075dbfb0 0057
IDT=  
CR0=00050032 CR2=242cb25a CR3=001ad002 CR4=
DR0= DR1= DR2= 
DR3=
DR6=4ff0 DR7=0400
EFER=
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

[519192.536247] *** Guest State ***
[519192.536275] CR0: actual=0x00050032, shadow=0x00050032, 
gh_mask=fff7
[519192.536324] CR4: actual=0x2050, shadow=0x, 
gh_mask=f871
[519192.537322] CR3 = 0x001ad002
[519192.538166] RSP = 0xfb047db5d770  RIP = 0x8000
[519192.539017] RFLAGS=0x0002 DR7 = 0x0400
[519192.539861] Sysenter RSP= CS:RIP=:
[519192.540690] CS:   sel=0x9900, attr=0x08093, limit=0x, 
base=0x7ff99000
[519192.541523] DS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.542356] SS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.543167] ES:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.543961] FS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.544747] GS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[519192.545511] GDTR:   limit=0x0057, 
base=0xad01075dbfb0
[519192.546275] LDTR: sel=0x, attr=0x1, limit=0x000f, 
base=0x
[519192.547052] IDTR:   limit=0x, 
base=0x
[519192.547841] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, 
base=0xad01075da000
[519192.548639] EFER = 0x  PAT = 0x0007010600070106
[519192.549460] DebugCtl = 0x  DebugExceptions = 
0x
[519192.550302] Interruptibility = 0009  ActivityState = 
[519192.551137] *** Host State ***
[519192.551963] RIP = 0xc150a034  RSP = 0x88cd9cafbc90
[519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c 
TRBase=88d45f2c4000
[519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000
[519192.555347] CR0=80050033 CR3=00033dc82000 CR4=001627e0
[519192.556202] Sysenter RSP= CS:RIP=0010:91596cc0
[519192.557058] EFER = 0x0d01  PAT = 0x0007050600070106
[519192.557913] *** Control State ***
[519192.558757] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[519192.559605] EntryControls=d1ff ExitControls=002fefff
[519192.560453] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[519192.561306] VMEntry: intr_info= errcode=0006 ilen=
[519192.562158] VMExit: intr_info= errcode= ilen=0001
[519192.563006] reason=8021 qualification=
[519192.563860] IDTVectoring: info= errcode=
[519192.564695] TSC Offset = 0xfffcc6c7d53f16d7
[519192.565526] TPR Threshold = 0x00
[519192.566345] EPT pointer = 0x000b9397901e
[519192.567162] PLE Gap=0080 Window=1000
[519192.567984] Virtual processor ID = 0x0005


Thank you!


On 22 Sep 2020, at 02:30, Strahil Nikolov <hunter86...@yahoo.com> wrote:

Interesting is that I don't find anything recent , but this one:
https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653

Can you check if anything in the OS was updated/changed recently ?

Also check if the VM is with nested virtualization enabled.

Best Regards,
Strahil Nikolov






В понеделник, 21 септември 2020 г., 23:56:26 Гринуич+3, Vinícius Ferrão 
 написа:





Strahil, thank you man. We finally got some output:


[ovirt-users] Re: hosted engine migration

2020-09-22 Thread Strahil Nikolov via Users
So, let's summarize:

- Cannot migrate the HE due to "CPU policy".
- HE's CPU is westmere - just like hosts
- You have enough resources on the second HE host (both CPU + MEMORY)

What is the Cluster's CPU type (you can check in UI) ?

Maybe you should enable debugging on various locations to identify the issue.

Anything interesting in libvirt's log for the HostedEngine VM on the 
destination host?
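
For example (a sketch, using the default log locations):

# on the destination host
tail -n 100 /var/log/libvirt/qemu/HostedEngine.log          # libvirt/qemu side, if the migration actually started
journalctl -u libvirtd --since "1 hour ago" | grep -iE 'error|cpu|migrat'

# on the engine VM
grep -iE 'migrat|cpu' /var/log/ovirt-engine/engine.log | tail -n 50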


Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 05:37:18 Гринуич+3, ddqlo  
написа: 





Yes, I can. The host which does not host the HE could be reinstalled 
successfully in the web UI. After this was done, nothing changed.






在 2020-09-22 03:08:18,"Strahil Nikolov"  写道:
>Can you put 1 host in maintenance and use the "Installation" -> "Reinstall" 
>and enable the HE deployment from one of the tabs ?
>
>Best Regards,
>Strahil Nikolov
>
>
>
>
>
>
>В понеделник, 21 септември 2020 г., 06:38:06 Гринуич+3, ddqlo  
>написа: 
>
>
>
>
>
>so strange! After I set global maintenance, powered off and started HE, the CPU 
>of HE became 'Westmere' (I did not change anything). But HE still could not be 
>migrated.
>
>HE xml:
>  
>    Westmere
>    
>    
>    
>    
>    
>    
>    
>      
>    
>  
>
>host capabilities: 
>Westmere
>
>cluster cpu type (UI): 
>
>
>host cpu type (UI):
>
>
>HE cpu type (UI):
>
>
>
>
>
>
>
>在 2020-09-19 13:27:35,"Strahil Nikolov"  写道:
>>Hm... interesting.
>>
>>The VM is using 'Haswell-noTSX'  while the host is 'Westmere'.
>>
>>In my case I got no difference:
>>
>>[root@ovirt1 ~]# virsh  dumpxml HostedEngine | grep Opteron
>>   Opteron_G5
>>[root@ovirt1 ~]# virsh capabilities | grep Opteron
>> Opteron_G5
>>
>>Did you update the cluster holding the Hosted Engine ?
>>
>>
>>I guess you can try to:
>>
>>- Set global maintenance
>>- Power off the HostedEngine VM
>>- virsh dumpxml HostedEngine > /root/HE.xml
>>- use virsh edit to change the cpu of the HE (non-permanent) change
>>- try to power on the modified HE
>>
>>If it powers on , you can try to migrate it and if it succeeds - then you 
>>should make it permanent.
>>
>>
>>
>>
>>
>>Best Regards,
>>Strahil Nikolov
>>
>>В петък, 18 септември 2020 г., 04:40:39 Гринуич+3, ddqlo  
>>написа: 
>>
>>
>>
>>
>>
>>HE:
>>
>>
>>  HostedEngine
>>  b4e805ff-556d-42bd-a6df-02f5902fd01c
>>  http://ovirt.org/vm/tune/1.0; 
>>xmlns:ovirt-vm="http://ovirt.org/vm/1.0;>
>>    
>>    http://ovirt.org/vm/1.0;>
>>    4.3
>>    False
>>    false
>>    1024
>>    >type="int">1024
>>    auto_resume
>>    1600307555.19
>>    
>>        external
>>        
>>            4
>>        
>>    
>>    
>>        ovirtmgmt
>>        
>>            4
>>        
>>    
>>    
>>        
>>c17c1934-332f-464c-8f89-ad72463c00b3
>>        /dev/vda2
>>        
>>8eca143a-4535-4421-bd35-9f5764d67d70
>>        
>>----
>>        exclusive
>>        
>>ae961104-c3b3-4a43-9f46-7fa6bdc2ac33
>>        
>>            1
>>        
>>        
>>            
>>                
>>c17c1934-332f-464c-8f89-ad72463c00b3
>>                
>>8eca143a-4535-4421-bd35-9f5764d67d70
>>                >type="int">108003328
>>                
>>/dev/c17c1934-332f-464c-8f89-ad72463c00b3/leases
>>                
>>/rhev/data-center/mnt/blockSD/c17c1934-332f-464c-8f89-ad72463c00b3/images/8eca143a-4535-4421-bd35-9f5764d67d70/ae961104-c3b3-4a43-9f46-7fa6bdc2ac33
>>                
>>ae961104-c3b3-4a43-9f46-7fa6bdc2ac33
>>            
>>        
>>    
>>    
>>
>>  
>>  67108864
>>  16777216
>>  16777216
>>  64
>>  1
>>  
>>    /machine
>>  
>>  
>>    
>>      oVirt
>>      oVirt Node
>>      7-5.1804.el7.centos
>>      ----0CC47A6B3160
>>      b4e805ff-556d-42bd-a6df-02f5902fd01c
>>    
>>  
>>  
>>    hvm
>>    
>>    
>>    
>>  
>>  
>>    
>>  
>>  
>>    Haswell-noTSX
>>    
>>    
>>    
>>    
>>    
>>    
>>    
>>    
>>    
>>      
>>    
>>  
>>  
>>    
>>    
>>    
>>  
>>  destroy
>>  destroy
>>  destroy
>>  
>>    
>>    
>>  
>>  
>>    /usr/libexec/qemu-kvm
>>    
>>      
>>      
>>      
>>      
>>      
>>      
>>    
>>    
>>      >io='native' iothread='1'/>
>>      >dev='/var/run/vdsm/storage/c17c1934-332f-464c-8f89-ad72463c00b3/8eca143a-4535-4421-bd35-9f5764d67d70/ae961104-c3b3-4a43-9f46-7fa6bdc2ac33'>
>>        
>>      
>>      
>>      
>>      8eca143a-4535-4421-bd35-9f5764d67d70
>>      
>>      >function='0x0'/>
>>    
>>    
>>      
>>      
>>      >function='0x0'/>
>>    
>>    
>>      
>>      >function='0x1'/>
>>    
>>    
>>      
>>      >function='0x0'/>
>>    
>>    
>>      
>>      >function='0x2'/>
>>    
>>    
>>      
>>    
>>    
>>      c17c1934-332f-464c-8f89-ad72463c00b3
>>      ae961104-c3b3-4a43-9f46-7fa6bdc2ac33
>>      >offset='108003328'/>
>>    
>>    
>>      
>>      
>>      
>>      
>>      
>>      
>>      
>>      
>>      
>>      >function='0x0'/>
>>    
>>    
>>      
>>      
>>      
>>      
>>      
>>      
>>      
>>      
>>      
>>      

[ovirt-users] Re: oVirt - Gluster Node Offline but Bricks Active

2020-09-22 Thread Strahil Nikolov via Users
Have you restarted glusterd.service on the affected node?
glusterd is just the management layer, so restarting it won't affect the brick processes.
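
Something like this (a sketch) is enough to check it, since only the management daemon is 
bounced:

systemctl restart glusterd        # management daemon only, the glusterfsd brick processes keep running
gluster peer status               # every peer should be 'Peer in Cluster (Connected)'
gluster volume status engine      # each brick should show Online 'Y'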

Best Regards,
Strahil Nikolov






В вторник, 22 септември 2020 г., 01:43:36 Гринуич+3, Jeremey Wise 
 написа: 






Start is not an option.

It notes two bricks, but the command line shows three bricks, all present.

[root@odin thorst.penguinpages.local:_vmstore]# gluster volume status data
Status of volume: data
Gluster process                             TCP Port  RDMA Port  Online  Pid
--
Brick thorst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       33123
Brick odinst.penguinpages.local:/gluster_br
icks/data/data                              49152     0          Y       2970
Brick medusast.penguinpages.local:/gluster_
bricks/data/data                            49152     0          Y       2646
Self-heal Daemon on localhost               N/A       N/A        Y       3004
Self-heal Daemon on thorst.penguinpages.loc
al                                          N/A       N/A        Y       33230
Self-heal Daemon on medusast.penguinpages.l
ocal                                        N/A       N/A        Y       2475

Task Status of Volume data
--
There are no active volume tasks

[root@odin thorst.penguinpages.local:_vmstore]# gluster peer status
Number of Peers: 2

Hostname: thorst.penguinpages.local
Uuid: 7726b514-e7c3-4705-bbc9-5a90c8a966c9
State: Peer in Cluster (Connected)

Hostname: medusast.penguinpages.local
Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
State: Peer in Cluster (Connected)
[root@odin thorst.penguinpages.local:_vmstore]#




On Mon, Sep 21, 2020 at 4:32 PM Strahil Nikolov  wrote:
> Just select the volume and press "start". It will automatically mark "force 
> start" and will fix itself.
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В понеделник, 21 септември 2020 г., 20:53:15 Гринуич+3, Jeremey Wise 
>  написа: 
> 
> 
> 
> 
> 
> 
> oVirt engine shows  one of the gluster servers having an issue.  I did a 
> graceful shutdown of all three nodes over weekend as I have to move around 
> some power connections in prep for UPS.
> 
> Came back up.. but
> 
> 
> 
> And this is reflected in 2 bricks online (should be three for each volume)
> 
> 
> Command line shows gluster should be happy.
> 
> [root@thor engine]# gluster peer status
> Number of Peers: 2
> 
> Hostname: odinst.penguinpages.local
> Uuid: 83c772aa-33cd-430f-9614-30a99534d10e
> State: Peer in Cluster (Connected)
> 
> Hostname: medusast.penguinpages.local
> Uuid: 977b2c1d-36a8-4852-b953-f75850ac5031
> State: Peer in Cluster (Connected)
> [root@thor engine]#
> 
> # All bricks showing online
> [root@thor engine]# gluster volume status
> Status of volume: data
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       11001
> Brick odinst.penguinpages.local:/gluster_br
> icks/data/data                              49152     0          Y       2970
> Brick medusast.penguinpages.local:/gluster_
> bricks/data/data                            49152     0          Y       2646
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
> 
> Task Status of Volume data
> --
> There are no active volume tasks
> 
> Status of volume: engine
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> --
> Brick thorst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       11012
> Brick odinst.penguinpages.local:/gluster_br
> icks/engine/engine                          49153     0          Y       2982
> Brick medusast.penguinpages.local:/gluster_
> bricks/engine/engine                        49153     0          Y       2657
> Self-heal Daemon on localhost               N/A       N/A        Y       50560
> Self-heal Daemon on odinst.penguinpages.loc
> al                                          N/A       N/A        Y       3004
> Self-heal Daemon on medusast.penguinpages.l
> ocal                                        N/A       N/A        Y       2475
> 
> Task Status of Volume engine
> --
> There are no active