[ovirt-users] Re: Random reboots

2022-02-17 Thread Strahil Nikolov via Users
As the rest of the cluster didn't have issues (check dmesg on the hypervisors),
in 99% of the cases it's the network. Check the server NICs, enclosure network
devices and switches, and whether a backup was running at the same time.
I would start with any firmware upgrades for the server (if there are any).
Best Regards,
Strahil Nikolov
 
  On Thu, Feb 17, 2022 at 18:22, Pablo Olivera wrote:
Hi Nir,

Thank you very much for all the help and information.
We will continue to investigate the NFS server side, to find what may be 
causing one of the hosts to lose access to storage.
The strange thing is that it happens only on one of the NFS client hosts 
and not on all of them at the same time.

Pablo.


El 17/02/2022 a las 11:02, Nir Soffer escribió:
> On Thu, Feb 17, 2022 at 11:58 AM Nir Soffer  wrote:
>> On Thu, Feb 17, 2022 at 11:20 AM Pablo Olivera  wrote:
>>> Hi Nir,
>>>
>>>
>>> Thank you very much for your detailed explanations.
>>>
>>> The pid 6398 looks like it's HostedEngine:
>>>
>>> audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): pid=3629 
>>> uid=0 auid=4294967295 ses=4294967295 
>>> subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start 
>>> reason=booted vm="HostedEngine" uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc 
>>> vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
>>> res=success'UID="root" AUID="unset"
>>>
>>> So, I understand that sanlock has problems with the storage (it loses 
>>> connection with the NFS storage). The watchdog begins to check connectivity 
>>> with the VM and, after the established time, issues the order to
>>> reboot the machine.
>>>
>>> I don't know if I can somehow increase these timeouts, or try to make 
>>> sanlock force the reconnection or renewal with the storage and in this way 
>>> try to avoid host reboots for this reason.
>> You can do one of these:
>> 1. Use lower timeouts on the NFS server mount, so the NFS mount times out
>>    at the same time the sanlock lease times out.
>> 2. Use a larger sanlock timeout, so the sanlock lease times out when the NFS
>> server times out.
>> 3. Both 1 and 2
>>
>> The problem is that NFS timeouts are not predictable. In the past we used:
>> "timeo=600,retrans=6" which can lead to 21 minutes timeout, but practically
>> we saw up to a 30 minutes timeout.
>>
>> In 
>> https://github.com/oVirt/vdsm/commit/672a98bbf3e55d1077669f06c37305185fbdc289
>> we changed this to the recommended setting:
>> "timeo=100,retrans=3"
>>
>> Which according to the docs, should fail in 60 seconds if all retries
>> fail. But practically we
>> saw up to 270 seconds timeout with this setting, which does not play
>> well with sanlock.
>>
>> We assumed that the timeout value should not be less than sanlock io timeout
>> (10 seconds) but I'm not sure this assumption is correct.
>>
>> You can set a smaller timeout value in the engine storage domain
>> "custom connection parameters":
>> - Retransmissions - mapped to "retrans" mount option
>> - Timeout (deciseconds) - mapped to "timeo" mount option
>>
>> For example:
>> Retransmissions: 3
>> Timeout: 5
> Correction:
>
>      Timeout: 50 (5 seconds, 50 deciseconds)
>
>> Theoretically this will behave like this:
>>
>>      00:00  retry 1 (5 seconds timeout)
>>      00:10  retry 2 (10 seconds timeout)
>>      00:30  retry 3 (15 seconds timeout)
>>      00:45  request fail
>>
>> But based on what we see with the defaults, this is likely to take more time.
>> If it fails before 140 seconds, the VM will be killed and the host
>> will not reboot.
>>
>> The other way is to increase sanlock timeout, in vdsm configuration.
>> note that changing sanlock timeout requires also changing other
>> settings (e.g. spm:watchdog_interval).
>>
>> Add this file on all hosts:
>>
>> $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
>> [spm]
>>
>> # If enabled, monitor the SPM lease status and panic if the lease
>> # status is not expected. The SPM host will lose the SPM role, and
>> # engine will select a new SPM host. (default true)
>> # watchdog_enable = true
>>
>> # Watchdog check interval in seconds. The recommended value is
>> # sanlock:io_timeout * 2. (default 20)
>> watchdog_interval = 40
>>
>> [sanlock]
>>
>> # I/O timeout in seconds. All sanlock timeouts are computed based on
>> # this value. Using larger timeout will make VMs more resilient to
>> # short storage outage, but increase VM failover time and the time to
>> # acquire a host id. For more info on sanlock timeouts please check
>> # sanlock source:
>> # https://pagure.io/sanlock/raw/master/f/src/timeouts.h. If your
>> # storage requires larger timeouts, you can increase the value to 15
>> # or 20 seconds. If you change this you need to update also multipath
>> # no_path_retry. For more info on configuring multipath please check
>> # /etc/multipath.conf. oVirt is tested only with the default value (10
>> # seconds)
>> io_timeout = 20
>>
>>
>> You can check https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md
>> to learn more about sanlock timeouts.
>>
>> Alternatively, you can 

[ovirt-users] Re: Random reboots

2022-02-17 Thread Pablo Olivera

Hi Nir,

Thank you very much for all the help and information.
We will continue to investigate the NFS server side, to find what may be 
causing one of the hosts to lose access to storage.
The strange thing is that it happens only on one of the NFS client hosts 
and not on all of them at the same time.
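
For reference, a few host-side checks that can help narrow this down (a rough
sketch; the interface name below is a placeholder):

# Look for NFS client timeouts around the incident in the kernel log
journalctl -k | grep -i "nfs: server"
dmesg -T | grep -i "not responding"

# Check the NIC for link flaps and error counters
ip -s link show eno1
ethtool -S eno1 | grep -iE "err|drop|crc"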


Pablo.


El 17/02/2022 a las 11:02, Nir Soffer escribió:

On Thu, Feb 17, 2022 at 11:58 AM Nir Soffer  wrote:

On Thu, Feb 17, 2022 at 11:20 AM Pablo Olivera  wrote:

Hi Nir,


Thank you very much for your detailed explanations.

The pid 6398 looks like it's HostedEngine:

audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): pid=3629 uid=0 auid=4294967295 ses=4294967295 
subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start reason=booted vm="HostedEngine" 
uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
res=success'UID="root" AUID="unset"

So, I understand that sanlock has problems with the storage (it loses 
connection with the NFS storage). The watchdog begins to check connectivity with 
the VM and, after the established time, issues the order to
reboot the machine.

I don't know if I can somehow increase these timeouts, or try to make sanlock 
force the reconnection or renewal with the storage and in this way try to avoid 
host reboots for this reason.

You can do one of these:
1. Use lower timeouts on the NFS server mount, so the NFS mount times out
at the same time the sanlock lease times out.
2. Use a larger sanlock timeout, so the sanlock lease times out when the NFS
server times out.
3. Both 1 and 2

The problem is that NFS timeouts are not predictable. In the past we used:
"timeo=600,retrans=6" which can lead to 21 minutes timeout, but practically
we saw up to a 30 minutes timeout.

In https://github.com/oVirt/vdsm/commit/672a98bbf3e55d1077669f06c37305185fbdc289
we changed this to the recommended setting:
"timeo=100,retrans=3"

Which according to the docs, should fail in 60 seconds if all retries
fail. But practically we
saw up to 270 seconds timeout with this setting, which does not play
well with sanlock.

We assumed that the timeout value should not be less than sanlock io timeout
(10 seconds) but I'm not sure this assumption is correct.

You can set a smaller timeout value in the engine storage domain
"custom connection parameters":
- Retransmissions - mapped to "retrans" mount option
- Timeout (deciseconds) - mapped to "timeo" mount option

For example:
Retransmissions: 3
Timeout: 5

Correction:

 Timeout: 50 (5 seconds, 50 deciseconds)


Theoretically this will behave like this:

 00:00   retry 1 (5 seconds timeout)
 00:10   retry 2 (10 seconds timeout)
 00:30   retry 3 (15 seconds timeout)
 00:45   request fail

But based on what we see with the defaults, this is likely to take more time.
If it fails before 140 seconds, the VM will be killed and the host
will not reboot.

The other way is to increase sanlock timeout, in vdsm configuration.
note that changing sanlock timeout requires also changing other
settings (e.g. spm:watchdog_interval).

Add this file on all hosts:

$ cat /etc/vdsm/vdsm.conf.d/99-local.conf
[spm]

# If enabled, monitor the SPM lease status and panic if the lease
# status is not expected. The SPM host will lose the SPM role, and
# engine will select a new SPM host. (default true)
# watchdog_enable = true

# Watchdog check interval in seconds. The recommended value is
# sanlock:io_timeout * 2. (default 20)
watchdog_interval = 40

[sanlock]

# I/O timeout in seconds. All sanlock timeouts are computed based on
# this value. Using larger timeout will make VMs more resilient to
# short storage outage, but increase VM failover time and the time to
# acquire a host id. For more info on sanlock timeouts please check
# sanlock source:
# https://pagure.io/sanlock/raw/master/f/src/timeouts.h. If your
# storage requires larger timeouts, you can increase the value to 15
# or 20 seconds. If you change this you need to update also multipath
# no_path_retry. For more info on configuring multipath please check
# /etc/multipath.conf. oVirt is tested only with the default value (10
# seconds)
io_timeout = 20


You can check https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md
to learn more about sanlock timeouts.

Alternatively, you can make a small change in NFS timeout and small change in
sanlock timeout to make them work better together.

All this is of course to handle the case when the NFS server is not accessible,
but this is something that should not happen in a healthy cluster. You need
to check why the server was not accessible and fix this problem.

Nir

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 

[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread emiliano . pozzessere
Thanks.
Best regards
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KYLCWJIFYGMDAB6HRU42CTOMZSN54VK2/


[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread Angus Clarke
oVirt's ilo4 fence agent is not dependent on kdump - I think this answers your 
question.

Regards
Angus


From: emiliano.pozzess...@satservizi.eu 
Sent: 17 February 2022 14:19
To: users@ovirt.org 
Subject: [ovirt-users] Re: Enable Power Management Ovirt 4.3

Hi Angus,
is it recommended to enable kdump integration?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MAJKG4V53GRUOW4XJGWS3KOAGBUB3ZVO/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4LL35BX3SVSEFKEWLX53R7OPAFICYBEV/


[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread emiliano . pozzessere
Hi Angus,
is it recommended to enable kdump integration?
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MAJKG4V53GRUOW4XJGWS3KOAGBUB3ZVO/


[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread emiliano . pozzessere
Ok thanks for support :-)
Best regards
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TIARLE5I7L3LGL2YFNQ47BGCWI7WZ7JM/


[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread Angus Clarke
There is a fence agent in oVirt 4.3 called "ilo4" - use this agent for HPE-ilo4 
devices. You should not need any Options.


Regards
Angus


From: emiliano.pozzess...@satservizi.eu 
Sent: 17 February 2022 13:25
To: users@ovirt.org 
Subject: [ovirt-users] Re: Enable Power Management Ovirt 4.3

Thanks for the reply, Angus.
IPMI/DCMI over LAN is enabled.
On iLO4 I created an account for oVirt with full credentials and now it works
with ilo4.
The ports and protocol are enabled, but oVirt only works with ilo_ssh and ilo4, not
ipmilan.
Do you know which keys I have to set in the Options section of the fence agent
in oVirt?
Thanks
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FI6XRHU2QISRMZMGFM3B7VFXX2AQCY4O/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5IEWLRBBFBJPX7HE3MPZHNYNG6OSVTQR/


[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread emiliano . pozzessere
Thanks for the reply, Angus.
IPMI/DCMI over LAN is enabled.
On iLO4 I created an account for oVirt with full credentials and now it works
with ilo4.
The ports and protocol are enabled, but oVirt only works with ilo_ssh and ilo4, not
ipmilan.
Do you know which keys I have to set in the Options section of the fence agent
in oVirt?
Thanks
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FI6XRHU2QISRMZMGFM3B7VFXX2AQCY4O/


[ovirt-users] Re: Cannot add Virtual Disk. Disk configuration (RAW Sparse backup-None) is incompatible with the storage domain type.

2022-02-17 Thread Benny Zlotnik
The referenced bug [1] was fixed in 4.4.9; the workaround mentioned there
is to use the web admin UI or the API to create the disks.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1957830
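
For example, a thin-provisioned disk can be created through the REST API with
something like the following (a rough sketch; the engine address, credentials,
VM id, disk name and storage domain name are placeholders, and the size is in
bytes):

# Attach a new thin-provisioned (cow/sparse) 10 GiB disk to a VM via the REST API
curl -k -u 'admin@internal:PASSWORD' \
  -H 'Content-Type: application/xml' \
  -X POST "https://engine.example.com/ovirt-engine/api/vms/VM_ID/diskattachments" \
  -d '<disk_attachment>
        <interface>virtio_scsi</interface>
        <bootable>true</bootable>
        <active>true</active>
        <disk>
          <name>mydisk_1</name>
          <format>cow</format>
          <sparse>true</sparse>
          <provisioned_size>10737418240</provisioned_size>
          <storage_domains>
            <storage_domain><name>data1</name></storage_domain>
          </storage_domains>
        </disk>
      </disk_attachment>'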

On Thu, Feb 17, 2022 at 1:06 PM  wrote:
>
> Hi,
>
> We're using oVirt 4.4.8.6. We make intensive use of the VM portal
> because we have hundreds of students creating their own VMs. Recently, one
> of the professors reported that they are encountering an error when
> adding a disk to a newly created VM.
>
> They are creating ISO based VMs (CentOS-8-Stream in this case),
> everything goes smoothly but when adding a thin-provisioned disk, this
> error shows up:
>
>2022-02-17 09:56:28,073Z INFO
> [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default
> task-39078) [0332e7b6-80b1-48e9-b849-80698f2ce7ab] Lock Acquired to
> object
> 'EngineLock:{exclusiveLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM_DISK_BOOT]',
> sharedLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM]'}'
>2022-02-17 09:56:28,446Z WARN
> [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default
> task-39078) [0332e7b6-80b1-48e9-b849-80698f2ce7ab] Validation of action
> 'AddDisk' failed for user aluX@domain-authz. Reasons:
> VAR__ACTION__ADD,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_CONFIGURATION_NOT_SUPPORTED,$volumeFormat
> RAW,$volumeType Sparse,$backup None
>2022-02-17 09:56:28,446Z INFO
> [org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default
> task-39078) [0332e7b6-80b1-48e9-b849-80698f2ce7ab] Lock freed to object
> 'EngineLock:{exclusiveLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM_DISK_BOOT]',
> sharedLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM]'}'
>2022-02-17 09:56:28,504Z ERROR
> [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default
> task-39078) [] Operation Failed: [Cannot add Virtual Disk. Disk
> configuration (RAW Sparse backup-None) is incompatible with the storage
> domain type.]
>
> However, changing the provisioning to thick does work and the disk can
> be added.
>
> I found [1] which talks about this, but I'm not sure if it's the same
> issue, nor whether it has a solution yet.
>
> Is this a known bug? Does it have any workaround beyond creating
> thick-provisioned disks?
>
> Thanks.
>
> Nicolás
>
>[1]: https://access.redhat.com/solutions/6022811
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OPGUDWMDPHWTTTECVEYH57XL5RPXJ7CY/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z7MJCPIQKKTF6VZWG2RU35RWVUSGVSVM/


[ovirt-users] Cannot add Virtual Disk. Disk configuration (RAW Sparse backup-None) is incompatible with the storage domain type.

2022-02-17 Thread nicolas

Hi,

We're using oVirt 4.4.8.6. We make intensive use of the VM portal 
because we have hundreds of students creating their own VMs. Recently, one 
of the professors reported that they are encountering an error when 
adding a disk to a newly created VM.


They are creating ISO based VMs (CentOS-8-Stream in this case), 
everything goes smoothly but when adding a thin-provisioned disk, this 
error shows up:


  2022-02-17 09:56:28,073Z INFO  
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default 
task-39078) [0332e7b6-80b1-48e9-b849-80698f2ce7ab] Lock Acquired to 
object 
'EngineLock:{exclusiveLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM_DISK_BOOT]', 
sharedLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM]'}'
  2022-02-17 09:56:28,446Z WARN  
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default 
task-39078) [0332e7b6-80b1-48e9-b849-80698f2ce7ab] Validation of action 
'AddDisk' failed for user aluX@domain-authz. Reasons: 
VAR__ACTION__ADD,VAR__TYPE__DISK,ACTION_TYPE_FAILED_DISK_CONFIGURATION_NOT_SUPPORTED,$volumeFormat 
RAW,$volumeType Sparse,$backup None
  2022-02-17 09:56:28,446Z INFO  
[org.ovirt.engine.core.bll.storage.disk.AddDiskCommand] (default 
task-39078) [0332e7b6-80b1-48e9-b849-80698f2ce7ab] Lock freed to object 
'EngineLock:{exclusiveLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM_DISK_BOOT]', 
sharedLocks='[e4a02ab9-31e4-4e8c-8999-91700263ff08=VM]'}'
  2022-02-17 09:56:28,504Z ERROR 
[org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default 
task-39078) [] Operation Failed: [Cannot add Virtual Disk. Disk 
configuration (RAW Sparse backup-None) is incompatible with the storage 
domain type.]


However, changing the provisioning to thick does work and the disk can 
be added.


I found [1] which talks about this, but I'm not sure if it's the same 
issue, nor whether it has a solution yet.


Is this a known bug? Does it have any workaround beyond creating 
thick-provisioned disks?


Thanks.

Nicolás

  [1]: https://access.redhat.com/solutions/6022811
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OPGUDWMDPHWTTTECVEYH57XL5RPXJ7CY/


[ovirt-users] Re: Random reboots

2022-02-17 Thread Nir Soffer
On Thu, Feb 17, 2022 at 11:58 AM Nir Soffer  wrote:
>
> On Thu, Feb 17, 2022 at 11:20 AM Pablo Olivera  wrote:
> >
> > Hi Nir,
> >
> >
> > Thank you very much for your detailed explanations.
> >
> > The pid 6398 looks like it's HostedEngine:
> >
> > audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): pid=3629 
> > uid=0 auid=4294967295 ses=4294967295 
> > subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start 
> > reason=booted vm="HostedEngine" uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc 
> > vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
> > res=success'UID="root" AUID="unset"
> >
> > So, I understand that sanlock has problems with the storage (it loses 
> > connection with the NFS storage). The watchdog begins to check connectivity 
> > with the VM and, after the established time, issues the order to
> > reboot the machine.
> >
> > I don't know if I can somehow increase these timeouts, or try to make 
> > sanlock force the reconnection or renewal with the storage and in this way 
> > try to avoid host reboots for this reason.
>
> You can do one of these:
> 1. Use lower timeouts on the NFS server mount, so the NFS mount times out
>    at the same time the sanlock lease times out.
> 2. Use a larger sanlock timeout, so the sanlock lease times out when the NFS
> server times out.
> 3. Both 1 and 2
>
> The problem is that NFS timeouts are not predictable. In the past we used:
> "timeo=600,retrans=6" which can lead to 21 minutes timeout, but practically
> we saw up to a 30 minutes timeout.
>
> In 
> https://github.com/oVirt/vdsm/commit/672a98bbf3e55d1077669f06c37305185fbdc289
> we changed this to the recommended setting:
> "timeo=100,retrans=3"
>
> Which according to the docs, should fail in 60 seconds if all retries
> fail. But practically we
> saw up to 270 seconds timeout with this setting, which does not play
> well with sanlock.
>
> We assumed that the timeout value should not be less than sanlock io timeout
> (10 seconds) but I'm not sure this assumption is correct.
>
> You can set a smaller timeout value in the engine storage domain
> "custom connection parameters":
> - Retransmissions - mapped to "retrans" mount option
> - Timeout (deciseconds) - mapped to "timeo" mount option
>
> For example:
> Retransmissions: 3
> Timeout: 5

Correction:

Timeout: 50 (5 seconds, 50 deciseconds)

>
> Theoretically this will behave like this:
>
> 00:00   retry 1 (5 seconds timeout)
> 00:10   retry 2 (10 seconds timeout)
> 00:30   retry 3 (15 seconds timeout)
> 00:45   request fail
>
> But based on what we see with the defaults, this is likely to take more time.
> If it fails before 140 seconds, the VM will be killed and the host
> will not reboot.
>
> The other way is to increase sanlock timeout, in vdsm configuration.
> note that changing sanlock timeout requires also changing other
> settings (e.g. spm:watchdog_interval).
>
> Add this file on all hosts:
>
> $ cat /etc/vdsm/vdsm.conf.d/99-local.conf
> [spm]
>
> # If enabled, monitor the SPM lease status and panic if the lease
> # status is not expected. The SPM host will lose the SPM role, and
> # engine will select a new SPM host. (default true)
> # watchdog_enable = true
>
> # Watchdog check interval in seconds. The recommended value is
> # sanlock:io_timeout * 2. (default 20)
> watchdog_interval = 40
>
> [sanlock]
>
> # I/O timeout in seconds. All sanlock timeouts are computed based on
> # this value. Using larger timeout will make VMs more resilient to
> # short storage outage, but increase VM failover time and the time to
> # acquire a host id. For more info on sanlock timeouts please check
> # sanlock source:
> # https://pagure.io/sanlock/raw/master/f/src/timeouts.h. If your
> # storage requires larger timeouts, you can increase the value to 15
> # or 20 seconds. If you change this you need to update also multipath
> # no_path_retry. For more info on configuring multipath please check
> # /etc/multipath.conf. oVirt is tested only with the default value (10
> # seconds)
> io_timeout = 20
>
>
> You can check https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md
> to learn more about sanlock timeouts.
>
> Alternatively, you can make a small change in NFS timeout and small change in
> sanlock timeout to make them work better together.
>
> All this is of course to handle the case when the NFS server is not 
> accessible,
> but this is something that should not happen in a healthy cluster. You need
> to check why the server was not accessible and fix this problem.
>
> Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5MSXZ6PCKQFTMCC3KIFJJWZJXAKCPIAP/


[ovirt-users] Re: Random reboots

2022-02-17 Thread Nir Soffer
On Thu, Feb 17, 2022 at 11:20 AM Pablo Olivera  wrote:
>
> Hi Nir,
>
>
> Thank you very much for your detailed explanations.
>
> The pid 6398 looks like it's HostedEngine:
>
> audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): pid=3629 
> uid=0 auid=4294967295 ses=4294967295 
> subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start 
> reason=booted vm="HostedEngine" uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc 
> vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
> res=success'UID="root" AUID="unset"
>
> So, I understand that sanlock has problems with the storage (it loses 
> connection with the NFS storage). The watchdog begins to check connectivity with 
> the VM and, after the established time, issues the order to
> reboot the machine.
>
> I don't know if I can somehow increase these timeouts, or try to make sanlock 
> force the reconnection or renewal with the storage and in this way try to 
> avoid host reboots for this reason.

You can do one of these:
1. Use lower timeouts on the NFS server mount, so the NFS mount times out
   at the same time the sanlock lease times out.
2. Use a larger sanlock timeout, so the sanlock lease times out when the NFS
server times out.
3. Both 1 and 2

The problem is that NFS timeouts are not predictable. In the past we used:
"timeo=600,retrans=6" which can lead to 21 minutes timeout, but practically
we saw up to a 30 minutes timeout.

In https://github.com/oVirt/vdsm/commit/672a98bbf3e55d1077669f06c37305185fbdc289
we changed this to the recommended setting:
"timeo=100,retrans=3"

Which according to the docs, should fail in 60 seconds if all retries
fail. But practically we
saw up to 270 seconds timeout with this setting, which does not play
well with sanlock.

We assumed that the timeout value should not be less than sanlock io timeout
(10 seconds) but I'm not sure this assumption is correct.

You can set a smaller timeout value in the engine storage domain
"custom connection parameters":
- Retransmissions - mapped to "retrans" mount option
- Timeout (deciseconds) - mapped to "timeo" mount option

For example:
Retransmissions: 3
Timeout: 5

Theoretically this will behave like this:

00:00   retry 1 (5 seconds timeout)
00:10   retry 2 (10 seconds timeout)
00:30   retry 3 (15 seconds timeout)
00:45   request fail

But based on what we see with the defaults, this is likely to take more time.
If it fails before 140 seconds, the VM will be killed and the host
will not reboot.
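
For illustration, with Retransmissions=3 and Timeout=50 in the storage domain's
custom connection parameters, the data domain would end up mounted with options
along these lines (a sketch; the server, export path and the remaining options
are placeholders, and the exact list is generated by vdsm):

# Example of what you might see in the mount table after the change:
#   nfsserver.example.com:/export/data on /rhev/data-center/mnt/... type nfs4
#   (rw,relatime,soft,timeo=50,retrans=3,...)
# The effective options on a host can be confirmed with:
nfsstat -m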

The other way is to increase sanlock timeout, in vdsm configuration.
note that changing sanlock timeout requires also changing other
settings (e.g. spm:watchdog_interval).

Add this file on all hosts:

$ cat /etc/vdsm/vdsm.conf.d/99-local.conf
[spm]

# If enabled, monitor the SPM lease status and panic if the lease
# status is not expected. The SPM host will lose the SPM role, and
# engine will select a new SPM host. (default true)
# watchdog_enable = true

# Watchdog check interval in seconds. The recommended value is
# sanlock:io_timeout * 2. (default 20)
watchdog_interval = 40

[sanlock]

# I/O timeout in seconds. All sanlock timeouts are computed based on
# this value. Using larger timeout will make VMs more resilient to
# short storage outage, but increase VM failover time and the time to
# acquire a host id. For more info on sanlock timeouts please check
# sanlock source:
# https://pagure.io/sanlock/raw/master/f/src/timeouts.h. If your
# storage requires larger timeouts, you can increase the value to 15
# or 20 seconds. If you change this you need to update also multipath
# no_path_retry. For more info on configuring multipath please check
# /etc/multipath.conf. oVirt is tested only with the default value (10
# seconds)
io_timeout = 20


You can check https://github.com/oVirt/vdsm/blob/master/doc/io-timeouts.md
to learn more about sanlock timeouts.
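
As a rough rule of thumb, based on that document and on the log lines earlier
in this thread (treat the numbers as approximate):

# Approximate sanlock timeline derived from io_timeout:
#   io_timeout = 10 (default):
#     lease renewal declared failed after 8 * 10 = 80s
#     (matches "s1 check_our_lease failed 80" in the host log)
#     watchdog fires about 60s later, i.e. about 140s after the last renewal
#   io_timeout = 20:
#     lease renewal declared failed after 8 * 20 = 160s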

Alternatively, you can make a small change in NFS timeout and small change in
sanlock timeout to make them work better together.

All this is of course to handle the case when the NFS server is not accessible,
but this is something that should not happen in a healthy cluster. You need
to check why the server was not accessible and fix this problem.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XRJXOF3CSDKBKN3ZH3BWAKCWCZ3XETC2/


[ovirt-users] Re: Random reboots

2022-02-17 Thread Jiří Sléžka

On 2/16/22 23:37, Nir Soffer wrote:

On Wed, Feb 16, 2022 at 9:18 PM Nir Soffer  wrote:


On Wed, Feb 16, 2022 at 5:12 PM Nir Soffer  wrote:


On Wed, Feb 16, 2022 at 10:10 AM Pablo Olivera  wrote:


Hi community,

We're dealing with an issue as we occasionally have random reboots on
any of our hosts.
We're using oVirt 4.4.3 in production with about 60 VMs distributed over
5 hosts. We have a virtualized engine and DRBD storage mounted via NFS.
The infrastructure is interconnected by a Cisco 9000 switch.
The last random reboot was yesterday February 14th at 03:03 PM (in the
log it appears as: 15:03 due to our time configuration) of the host:
'nodo1'.
At the moment of the reboot we detected in the log of the switch a
link-down in the port where the host is connected.
I attach log of the engine and host 'nodo1' in case you can help us to
find the cause of these random reboots.



According to messages:

1. Sanlock could not renew the lease for 80 seconds:

Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
[2017]: s1 check_our_lease failed 80


2. In this case sanlock must terminate the processes holding a lease
on that storage - I guess that pid 6398 is vdsm.

Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
[2017]: s1 kill 6398 sig 15 count 1
Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655258
[2017]: s1 kill 6398 sig 15 count 2


pid 6398 is not vdsm:

Feb 14 15:02:51 nodo1 vdsm[4338]

The fact that we see "sig 15" means sanlock is trying to send SIGTERM.
If pid 6398 is a VM (hosted engine vm?) we would expect to see:


[2017]: s1 kill 6398 sig 100 count 1


Exactly once - which means run the killpath program registered by libvirt,
which will terminate the vm.


I reproduce this issue locally - we never use killpath program, because we
don't configure libvirt on_lockfailure in the domain xml.

So we get the default behavior, which is sanlock terminating the vm.



So my guess is that this is not a VM, so the only other option is hosted
engine broker, using a lease on the whiteboard.


...
Feb 14 15:03:36 nodo1 sanlock[2017]: 2022-02-14 15:03:36 1655288
[2017]: s1 kill 6398 sig 15 count 32

3. Terminating pid 6398 stopped here, and we see:

Feb 14 15:03:36 nodo1 wdmd[2033]: test failed rem 19 now 1655288 ping
1655237 close 1655247 renewal 1655177 expire 1655257 client 2017
sanlock_a5c35d19-4c34-4571-ac77-1b10de484426:1


According to David, this means we have 19 more attempts to kill the process
holding the lease.



4. So it looks like wdmd rebooted the host.

Feb 14 15:08:09 nodo1 kernel: Linux version
4.18.0-193.28.1.el8_2.x86_64 (mockbu...@kbuilder.bsys.centos.org) (gcc
version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Oct 22
00:20:22 UTC 2020


This is strange, since sanlock should try to kill pid 6398 40 times,
and then switch
to SIGKILL. The watchdog should not reboot the host before sanlock
finish the attempt to kill the processes.

David, do you think this is expected? do we have any issue in sanlock?


I discussed it with David (sanlock author). What we see here may be truncated
logs when a host is rebooted by the watchdog. The last time logs were synced
to storage was probably Feb 14 15:03:36. Any message written after that was
lost in the host page cache.



It is possible that sanlock will not be able to terminate a process if
the process is blocked on inaccessible storage. This seems to be the
case here.

In vdsm log we see that storage is indeed inaccessible:

2022-02-14 15:03:03,149+0100 WARN  (check/loop) [storage.check]
Checker 
'/rhev/data-center/mnt/newstoragedrbd.andromeda.com:_var_nfsshare_data/a5c35d19-4c34-4571-ac77-1b10de484426/dom_md/metadata'
is blocked for 60.00 seconds (check:282)

But we don't see any termination request - so this host is not the SPM.

I guess this host was running the hosted engine vm, which uses a storage lease.
If you lose access to storage, sanlock will kill the hosted engine vm,
so the system
can start it elsewhere. If the hosted engine vm is stuck on storage, sanlock
cannot kill it and it will reboot the host.


Pablo, can you locate the process with pid 6398?

Looking in hosted engine logs and other logs on the system may reveal what
was this process. When we find the process, we can check the source to
understand
why it was not terminating - likely blocked on the inaccessible NFS server.
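
For reference, a few commands that can help identify the process and see where
it is blocked (a sketch; 6398 is the pid from the log above):

sanlock client status            # lockspaces/resources and the pids holding them
ps -o pid,stat,wchan:32,cmd -p 6398
cat /proc/6398/stack             # a task stuck on NFS usually shows nfs/rpc frames here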


The process is most likely a VM - I reproduced the exact scenario locally.

You can file a vdsm bug for this. The system behave as designed, but the design
is problematic; one VM with a lease stuck on NFS server can cause the
entire host
to be rebooted.

>

With block storage we don't have this issue, since we have exact
control over multipath
timeouts. Multipath will fail I/O in 80 seconds, after sanlock failed
to renew the lease.
When I/O fails, the process blocked on storage will be unblocked and will be
terminated
by the kernel.


I observe this or similar behavior also in my glusterfs HCI cluster (but 
not on 

[ovirt-users] Re: Random reboots

2022-02-17 Thread Pablo Olivera

Hi Nir,


Thank you very much for your detailed explanations.

The pid 6398 looks like it's HostedEngine:

audit/audit.log:type=VIRT_CONTROL msg=audit(1644587639.935:7895): 
pid=3629 uid=0 auid=4294967295 ses=4294967295 
subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg='virt=kvm op=start 
reason=booted vm="HostedEngine" uuid=37a75c8e-50a2-4abd-a887-8a62a75814cc 
vm-pid=6398 exe="/usr/sbin/libvirtd" hostname=? addr=? terminal=? 
res=success'UID="root" AUID="unset"


So, I understand that sanlock has problems with the storage (it loses 
connection with the NFS storage). The watchdog begins to check connectivity 
with the VM and, after the established time, issues the order to
reboot the machine.

I don't know if I can somehow increase these timeouts, or try to make 
sanlock force the reconnection or renewal with the storage and in this 
way try to avoid host reboots for this reason.


Pablo.


El 16/02/2022 a las 23:37, Nir Soffer escribió:

On Wed, Feb 16, 2022 at 9:18 PM Nir Soffer  wrote:

On Wed, Feb 16, 2022 at 5:12 PM Nir Soffer  wrote:

On Wed, Feb 16, 2022 at 10:10 AM Pablo Olivera  wrote:

Hi community,

We're dealing with an issue as we occasionally have random reboots on
any of our hosts.
We're using oVirt 4.4.3 in production with about 60 VMs distributed over
5 hosts. We have a virtualized engine and DRBD storage mounted via NFS.
The infrastructure is interconnected by a Cisco 9000 switch.
The last random reboot was yesterday February 14th at 03:03 PM (in the
log it appears as: 15:03 due to our time configuration) of the host:
'nodo1'.
At the moment of the reboot we detected in the log of the switch a
link-down in the port where the host is connected.
I attach log of the engine and host 'nodo1' in case you can help us to
find the cause of these random reboots.


According to messages:

1. Sanlock could not renew the lease for 80 seconds:

Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
[2017]: s1 check_our_lease failed 80


2. In this case sanlock must terminate the processes holding a lease
on that storage - I guess that pid 6398 is vdsm.

Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655257
[2017]: s1 kill 6398 sig 15 count 1
Feb 14 15:03:06 nodo1 sanlock[2017]: 2022-02-14 15:03:06 1655258
[2017]: s1 kill 6398 sig 15 count 2

pid 6398 is not vdsm:

Feb 14 15:02:51 nodo1 vdsm[4338]

The fact that we see "sig 15" means sanlock is trying to send SIGTERM.
If pid 6398 is a VM (hosted engine vm?) we would expect to see:


[2017]: s1 kill 6398 sig 100 count 1

Exactly once - which means run the killpath program registered by libvirt,
which will terminate the vm.

I reproduce this issue locally - we never use killpath program, because we
don't configure libvirt on_lockfailure in the domain xml.

So we get the default behavior, which is sanlock terminating the vm.


So my guess is that this is not a VM, so the only other option is hosted
engine broker, using a lease on the whiteboard.


...
Feb 14 15:03:36 nodo1 sanlock[2017]: 2022-02-14 15:03:36 1655288
[2017]: s1 kill 6398 sig 15 count 32

3. Terminating pid 6398 stopped here, and we see:

Feb 14 15:03:36 nodo1 wdmd[2033]: test failed rem 19 now 1655288 ping
1655237 close 1655247 renewal 1655177 expire 1655257 client 2017
sanlock_a5c35d19-4c34-4571-ac77-1b10de484426:1

According to David, this means we have 19 more attempts to kill the process
holding the lease.


4. So it looks like wdmd rebooted the host.

Feb 14 15:08:09 nodo1 kernel: Linux version
4.18.0-193.28.1.el8_2.x86_64 (mockbu...@kbuilder.bsys.centos.org) (gcc
version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)) #1 SMP Thu Oct 22
00:20:22 UTC 2020


This is strange, since sanlock should try to kill pid 6398 40 times,
and then switch
to SIGKILL. The watchdog should not reboot the host before sanlock
finish the attempt to kill the processes.

David, do you think this is expected? do we have any issue in sanlock?

I discussed it with David (sanlock author). What we see here may be truncated
logs when a host is rebooted by the watchdog. The last time logs were synced
to storage was probably Feb 14 15:03:36. Any message written after that was
lost in the host page cache.



It is possible that sanlock will not be able to terminate a process if
the process is blocked on inaccessible storage. This seems to be the
case here.

In vdsm log we see that storage is indeed inaccessible:

2022-02-14 15:03:03,149+0100 WARN  (check/loop) [storage.check]
Checker 
'/rhev/data-center/mnt/newstoragedrbd.andromeda.com:_var_nfsshare_data/a5c35d19-4c34-4571-ac77-1b10de484426/dom_md/metadata'
is blocked for 60.00 seconds (check:282)

But we don't see any termination request - so this host is not the SPM.

I guess this host was running the hosted engine vm, which uses a storage lease.
If you lose access to storage, sanlock will kill the hosted engine vm,
so the system
can start it elsewhere. If the hosted engine vm is stuck on storage, sanlock
cannot kill it and it will reboot the 

[ovirt-users] Re: Unable to update self hosted Engine due to missing mirrors

2022-02-17 Thread Yedidyah Bar David
On Wed, Feb 16, 2022 at 5:08 PM Gianluca Merlo  wrote:
>
> On Wed, Feb 16, 2022 at 13:33 Yedidyah Bar David 
> wrote:
>>
>> On Wed, Feb 16, 2022 at 2:09 PM Gianluca Merlo  
>> wrote:
>> >
>> > On Wed, Feb 16, 2022 at 10:22 Yedidyah Bar David 
>> > wrote:
>> >>
>> >> On Tue, Feb 15, 2022 at 6:48 PM Gianluca Merlo  
>> >> wrote:
>> >> >
>> >> > Hi Jonas,
>> >> >
>> >> > I would like to say that I also have a 4.4 HE (4.4.5.11-1) that I am 
>> >> > trying to upgrade to the latest one, and have this issue. I read through
>> >> >
>> >> > 
>> >> > https://lists.ovirt.org/archives/list/users@ovirt.org/thread/D3FEXQ5KOLN3SST3MPGIHSCEF52IBTKY/
>> >> > 
>> >> > https://lists.ovirt.org/archives/list/users@ovirt.org/thread/4EBNIFJ3ANERON47E2XX75T3IOQ52AYD/
>> >> >
>> >> > but did not manage to get out of the issue. I tried:
>> >> >
>> >> > - yum remove ovirt-release44
>> >> > - Download and install manually (rpm -i) 
>> >> > https://resources.ovirt.org/pub/yum-repo/ovirt-release44.rpm
>> >> >
>> >> > After this I have regained a functional package manager on the engine. 
>> >> > However trying to go further in the update procedure 
>> >> > (https://ovirt.org/documentation/upgrade_guide/#Updating_a_self-hosted_engine_minor_updates)
>> >> >  running
>> >> >
>> >> > engine-upgrade-check
>> >> >
>> >> > fails with
>> >> >
>> >> > > OK:   Downloaded CentOS Linux 8 - AppStream
>> >> > > FAIL: Failed to download metadata for repo 'appstream': Cannot 
>> >> > > prepare internal mirrorlist: No URLs in mirrorlist
>> >> > > Error: Failed to download metadata for repo 'appstream': Cannot 
>> >> > > prepare internal mirrorlist: No URLs in mirrorlist
>> >> >
>> >> > So I am stuck in wait of a comment from someone more knowledgeable too!
>> >>
>> >> Did you upgrade the machine to CentOS Stream? CentOS Linux archives are 
>> >> gone.
>> >
>> >
>> > No, I did not do anything apart from trying to fix sources set up from 
>> > ovirt-release44. May I ask if you could share some guidance on the process?
>>
>> Sorry, I only did this on very few development machines, so do not
>> have any significant "real life experience" to share.
>>
>> > I can see that various third parties provide their "guide" on how to 
>> > perform the migration, the most "neutral" one I could find is
>> >
>> > 
>> > https://unix.stackexchange.com/questions/552873/how-to-switch-from-centos-8-to-centos-stream
>> >
>> > but I am having a hard time finding any "official" documentation from 
>> > CentOS
>>
>> I think the "official" doc is:
>>
>> https://www.centos.org/centos-stream/ -> Press "8" -> check
>> "Converting from CentOS Linux 8 to CentOS Stream 8"
>
>
> I completely missed that and it was in plain view! It is brief, but gives 
> weight to other procedures online! Thank you.

I admit I missed it too on first glance. Not sure why... But I was
certain it should be there so looked harder :-).
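
For reference, the documented conversion boils down to something like this (a
sketch based on the CentOS pages linked above; take a full engine backup first
and verify against the current instructions before running it):

# On the engine VM (CentOS Linux 8 -> CentOS Stream 8):
dnf install centos-release-stream
dnf swap centos-linux-repos centos-stream-repos
dnf distro-sync
reboot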

>
>>
>> This is very similar to, and slightly more specific/safe, than:
>>
>> https://centos.org/distro-faq/#q7-how-do-i-migrate-my-centos-linux-8-installation-to-centos-stream
>>
>> The stackexchange answer above is also slightly more detailed than
>> both of these, based on a specific user's experience.
>>
>> I definitely recommend trying first on test machines, as similar as
>> possible to your production ones.
>
>
> Fortunately it is not strictly a production environment, I am not in a hurry 
> and can afford some downtime, but I cannot afford total loss. I will consider 
> if it is possible to spin up a 4.4.5 setup, but I had some bad luck in the 
> past with a similar scenario (testing an upgrade of things that were no 
> longer in the mirrors)

Perhaps you remember more details?

Rollback, inside engine-setup, in case it ran into some problem,
indeed relies on having the currently-installed version available for
reinstall.
Generally speaking, when we release new versions, we keep old ones
in-place. This might not be enough, though, depending on external
dependencies availability etc.

> that makes me skeptical about the feasibility of the test.

Understood.

>
>> > or oVirt.
>>
>> You are right - there are no oVirt-specific migration procedures I am aware 
>> of.
>>
>> > May I also ask if this is the way to do it on Hosted Engine (not 
>> > standalone) deployments, or may it be that there are easier ways to 
>> > accomplish this in this scenario (e.g. redeployment from newer image)?
>>
>> There is a way, which is to follow the general hosted-engine
>> backup/restore procedure. This isn't an in-place upgrade - requires a
>> new storage space for the hosted-engine domain, etc.
>>
>> The appliance that the oVirt project releases is based on Stream since
>> 4.4.6 IIRC. Same for ovirt-node - so if you use node, the normal
>> upgrade process for it will get you Stream.
>>
>> > It would be nice to know if there is any documentation I missed on how to 
>> > best handle the Centos 8 -> Centos 8 Stream conversion in my case 

[ovirt-users] Re: Enable Power Management Ovirt 4.3

2022-02-17 Thread Angus Clarke
Hi Emiliano

Please check that you have enabled IPMI/DCMI on the iLO device in order to use
the fence type "ilo4", and that network access is in place (port 623; I'm not
sure whether TCP is required or not).

The iLO account also needs various privileges on the device. To start with, give
the user full administration access, then keep revoking roles and testing
the account to establish the baseline requirement.
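
If it helps, the fence agent can also be tested directly from a host's shell,
which takes oVirt out of the picture (a sketch; the address and credentials are
placeholders):

# Query power status through the same agent oVirt uses
fence_ilo4 --ip=ILO_ADDRESS --username=USER --password=PASS --action=status

# Roughly equivalent low-level check with ipmitool
ipmitool -I lanplus -H ILO_ADDRESS -U USER -P PASS chassis power status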

Regards
Angus



From: emiliano.pozzess...@satservizi.eu 
Sent: 16 February 2022 15:26
To: users@ovirt.org 
Subject: [ovirt-users] Enable Power Management Ovirt 4.3

Hi guys,
I would like to enable Power Management on oVirt 4.3.
My hosts are HP with iLO4, and I configured an account for oVirt on iLO4.
On the host I enabled Power Management, flagged Kdump Integration and added the
fence agent. I entered the user and password and set the type to ilo4, but the
test fails; if I use ilo_ssh it works.
But I don't know which keys I have to enter and with which values (if
necessary).
Can anyone explain this to me?
Thanks
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LDBXZEYSP26CRCJALW7LO7OVS74KVAV7/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5UASGV7DRX2XC7GVFLFY7PMIUUYN4POJ/


[ovirt-users] Enable Power Management Ovirt 4.3

2022-02-17 Thread emiliano . pozzessere
Good morning, I need some clarification on enabling Power Management. I have an
oVirt 4.3 setup and some hosts with iLO4. If I enable power management with ilo4
the test fails; if instead I enable it with ilo_ssh it works, but after I have
enabled it I don't know which keys I have to set. Can you help me?
Thanks
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OTHLMINO5KARNHZTLH3CITYXM6FQBUOB/


[ovirt-users] Enable Power Management Ovirt 4.3

2022-02-17 Thread emiliano . pozzessere
Hi guys,
I would like to enable Power Management on oVirt 4.3.
My hosts are HP with iLO4, and I configured an account for oVirt on iLO4.
On the host I enabled Power Management, flagged Kdump Integration and added the
fence agent. I entered the user and password and set the type to ilo4, but the
test fails; if I use ilo_ssh it works.
But I don't know which keys I have to enter and with which values (if
necessary).
Can anyone explain this to me?
Thanks
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LDBXZEYSP26CRCJALW7LO7OVS74KVAV7/


[ovirt-users] Re: ovirt-engine manager, certificate issue

2022-02-17 Thread Yedidyah Bar David
On Thu, Feb 17, 2022 at 9:37 AM david  wrote:
>
> Hello,
> I have a problem logging in to the ovirt-engine manager in my browser.
> The warning message in the browser displays this text:
> PKIX path validation failed: java.security.cert.CertPathValidatorException: 
> validity check failed
>
> To solve this problem, I am advised to run engine-setup.

Where?

>
> And here is a question: will engine-setup have any impact on the
> hosts (hypervisors)?
>
> ovirt version 4.4.4.7-1.el8

You can run 'engine-setup --offline', to prevent it from trying to
upgrade (which is what it's supposed to do normally).

If it's a hosted-engine, you should first set global maintenance.

Other than that, it should not affect your hosts.
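
For a hosted engine, the sequence would be roughly (a sketch; run the
hosted-engine commands on one of the hosts and engine-setup on the engine VM):

# On a hosted-engine host: enter global maintenance
hosted-engine --set-maintenance --mode=global

# On the engine VM: re-run setup without attempting an upgrade
engine-setup --offline

# On the host again: leave global maintenance
hosted-engine --set-maintenance --mode=none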

Best regards,
-- 
Didi
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DUCXSJZ4ZYUTQUVECQMG65SR4QEWRYNC/