[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2021-01-14 Thread Vinícius Ferrão via Users
Hi all.

DISCLAIMER: read this if you are running TrueNAS 12.0-RELEASE or 12.0-U1.

After struggling for almost 2 months I’ve finally nailed down the issue to the 
storage subsystem.

Everything we’ve tried to solve the issue was only a mitigation. In fact 
there’s nothing wrong with oVirt in the first place, nor with the NFSv4 storage 
backend. Changing to NFSv3, as recommended by Abhishek, greatly mitigated the 
issue, but the issue still existed.

The issue was due to a bug (not fixed yet, but already identified) in a new 
feature of TrueNAS 12.0-RELEASE and 12.0-U1: Asynchronous Copy-on-Write. After 
numerous days of testing, and constantly losing data even on other hypervisor 
solutions, the storage was identified as the only common denominator in 
everything I’ve tested.

So that’s it. I’ll leave the links here for the iXsystems Jira issue with all 
the data for those who want to check it out.
Jira Issue: https://jira.ixsystems.com/browse/NAS-108627
TrueNAS Forums: 
https://www.truenas.com/community/threads/freenas-now-truenas-is-no-longer-stable.89445

I would like to especially thank Strahil and Abhishek for giving ideas and 
suggestions to figure out what may be happening. And as a final disclaimer: if 
you’re running FreeNAS up to 11.3-U5, do not upgrade to 12.0 yet. Wait for 
12.0-U1.1 or 12.0-U2, since I think those versions will either have the feature 
disabled or fixed.
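If you are unsure which build you are on before deciding about the upgrade, the 
version string can be read straight from the TrueNAS/FreeNAS shell; a minimal 
sketch (the exact output below is just illustrative):

# cat /etc/version
TrueNAS-12.0-U1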

Thank you all,
Vinícius.

On 13 Dec 2020, at 00:34, Vinícius Ferrão via Users 
<users@ovirt.org> wrote:

Hi Abhishek,

I haven’t found any critical corruption after the change. But I’m not sure if 
this was the issue; right now I’m suspecting the storage subsystem. I’ll give 
it some more days to see how things end up.

There’s definitely an improvement but, again, I’m not sure yet if it’s solved.

Thanks,

On 2 Dec 2020, at 09:21, Abhishek Sahni 
<abhishek.sahni1...@gmail.com> wrote:

I have been through a similar type of weird situation and ended up finding that 
it was because of the NFS mounting.

ENV:
STORAGE: Dell EMC VNX 5200 - NFS shares.

a) Below is the mount of the storage (NFSv4 share) on the nodes when creating 
new VMs failed during installation; existing VMs kept running fine.  [nfsv4]

# mount

A.B.C.D:/VIRT_CC on /rhev/data-center/mnt/A.B.C.D:_VIRT__CC type nfs4 
(rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=A.B.C.D,local_lock=none,addr=A.B.C.D)


b) Switched back to NFSv3 and everything came back to normal.

# mount

A.B.C.D:/VIRT_CC on /rhev/data-center/mnt/A.B.C.D:_VIRT__CC type nfs 
(rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=A.B.C.D,mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=A.B.C.D)

Conclusion: I checked logs everywhere (nodes and storage) but didn't find 
anything that could lead to the error.

WORKAROUND:
The NFS storage domain was configured with the "AUTO" negotiated version option.

1) I put the storage domain in maintenance mode.
2) Changed it to NFS v3 and removed it from maintenance mode.

And boom, everything came back to normal.

You can check if that workaround will work for you.
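If it helps, a quick way to confirm which NFS version actually got negotiated on 
each node (before and after the change) is to read the effective mount flags; a 
minimal sketch, the grep pattern is just an example:

# nfsstat -m                                   # each NFS mount with its effective options
# mount -t nfs,nfs4 | grep -o 'vers=[0-9.]*'   # just the negotiated protocol version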

On Wed, Dec 2, 2020 at 10:42 AM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Can this be related to the case?
https://bugzilla.redhat.com/show_bug.cgi?id=810082

On 1 Dec 2020, at 10:25, Vinícius Ferrão 
<fer...@versatushpc.com.br> wrote:

ECC RAM everywhere: hosts and storage.

I even ran Memtest86 on both hypervisor hosts just to be sure. No errors. I 
haven’t had the opportunity to run it on the storage yet.

After I sent that message yesterday, the engine VM crashed again and the 
filesystem went offline. There were some discards (again) on the switch, 
probably due to the “boot storm” of other VMs. But this time a simple reboot 
fixed the filesystem and the hosted engine VM was back.

Since it was an extremely small amount of time, I checked everything again, and 
only the discards issue came up; there are ~90k discards on Po2 (which is the 
LACP interface of the hypervisor). Since then, I have enabled hardware flow 
control on the ports of the switch, but discards are still happening:

Port    Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Po1     0          0        0         0        0          0
Po2     0          0        0         0        0          3650
Po3     0          0        0         0        0          0
Po4     0          0        0         0        0          0
Po5     0          0        0         0        0          0
Po6     0          0        0         0        0          0
Po7     0          0        0         0        0          0
Po20    0          0        0         0        0          13788

[ovirt-users] Re: Shrink iSCSI Domain

2020-12-29 Thread Vinícius Ferrão via Users
It provides but isn’t enabled.

I run on TrueNAS, so in the past FreeNAS didn’t recommend deduplication due to 
some messy requirement that the deduplication table must fit in RAM or else the 
pool will be unable to mount. So I’ve avoided using it. Not sure how it is today…

Thanks.

> On 28 Dec 2020, at 13:43, Strahil Nikolov  wrote:
> 
> Vinícius,
> 
> does your storage provide deduplication? If yes, then you can provide a new 
> thin-provisioned LUN and migrate the data from the old LUN to the new one.
> 
> Best Regards,
> Strahil Nikolov
> 
> В понеделник, 28 декември 2020 г., 18:27:38 Гринуич+2, Vinícius Ferrão via 
> Users  написа: 
> 
> Hi Shani, thank you! 
> 
> It’s only one LUN :(
> 
> So it may be a best practice to split an SD in multiple LUNs?
> 
> Thank you.
> 
>> On 28 Dec 2020, at 09:08, Shani Leviim  wrote:
>> 
>> Hi,
>> 
>> You can reduce LUNs from an iSCSI storage domain once it's in maintenance. [1]
>> 
>> On the UI, after putting the storage domain in maintenance > Manage Domain 
>> > select the LUNs to be removed from the storage domain.
>> 
>> Note that reducing LUNs is applicable in case the storage domain has more 
>> than 1 LUN.
>> (Otherwise, removing the single LUN means removing the whole storage 
>> domain).
>> 
>> [1] 
>> https://www.ovirt.org/develop/release-management/features/storage/reduce-luns-from-sd.html
>> 
>> Regards,
>> Shani Leviim
>> 
>> On Sun, Dec 27, 2020 at 8:16 PM Vinícius Ferrão via Users  
>> wrote:
>> 
>>> Hello,
>>> 
>>> Is there any way to reduce the size of an iSCSI Storage Domain? I can’t 
>>> seem to figure this out myself. It’s probably unsupported, and the path 
>>> would be to create a new iSCSI Storage Domain with the reduced size, move 
>>> the virtual disks there and then delete the old one.
>>> 
>>> But I would like to confirm if this is the only way to do this…
>>> 
>>> In the past I had a requirement, so I’ve created the VM Domains with 10TB; 
>>> now it’s just too much, and I need to use the space on the storage for 
>>> other activities.
>>> 
>>> Thanks all and happy new year.
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to  users-le...@ovirt.org
>>> Privacy Statement:  https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:  
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:  
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4B26ZBZUMRXZ6MLJ6YQTK26SZNZOYQLF/
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/OWQ2WQZ35U3XEU67MWKPB7CJK7YMNTTG/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MYAM2QM7D2RZJZ632IRHLTIZ6XGMPC4Y/


[ovirt-users] Re: Shrink iSCSI Domain

2020-12-28 Thread Vinícius Ferrão via Users
Hi Shani, thank you!

It’s only one LUN :(

So it may be a best practice to split an SD in multiple LUNs?

Thank you.

On 28 Dec 2020, at 09:08, Shani Leviim 
<slev...@redhat.com> wrote:

Hi,
You can reduce LUNs from an iSCSI storage domain once it's in maintenance. [1]
On the UI, after putting the storage domain in maintenance > Manage Domain > 
select the LUNs to be removed from the storage domain.

Note that reducing LUNs is applicable in case the storage domain has more than 
1 LUN.
(Otherwise, removing the single LUN means removing the whole storage domain).

[1] 
https://www.ovirt.org/develop/release-management/features/storage/reduce-luns-from-sd.html

Regards,
Shani Leviim


On Sun, Dec 27, 2020 at 8:16 PM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Hello,

Is there any way to reduce the size of an iSCSI Storage Domain? I can’t seem to 
figure this out myself. It’s probably unsupported, and the path would be to 
create a new iSCSI Storage Domain with the reduced size, move the virtual disks 
there and then delete the old one.

But I would like to confirm if this is the only way to do this…

In the past I had a requirement, so I’ve created the VM Domains with 10TB; now 
it’s just too much, and I need to use the space on the storage for other 
activities.

Thanks all and happy new year.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4B26ZBZUMRXZ6MLJ6YQTK26SZNZOYQLF/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OWQ2WQZ35U3XEU67MWKPB7CJK7YMNTTG/


[ovirt-users] Shrink iSCSI Domain

2020-12-27 Thread Vinícius Ferrão via Users
Hello,

Is there any way to reduce the size of an iSCSI Storage Domain? I can’t seem to 
figure this out myself. It’s probably unsupported, and the path would be to 
create a new iSCSI Storage Domain with the reduced size, move the virtual disks 
there and then delete the old one.

But I would like to confirm if this is the only way to do this…

In the past I had a requirement, so I’ve created the VM Domains with 10TB; now 
it’s just too much, and I need to use the space on the storage for other 
activities.

Thanks all and happy new year.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4B26ZBZUMRXZ6MLJ6YQTK26SZNZOYQLF/


[ovirt-users] Re: CentOS 8 is dead

2020-12-25 Thread Vinícius Ferrão via Users
Oracle took that college meme — just change the variable names — too seriously.

> On 25 Dec 2020, at 16:35, James Loker-Steele via Users  
> wrote:
> 
> Yes.
> We use OEL and have setup oracles branded ovirt as well as test ovirt on 
> oracle and it works a treat.
> 
> 
> Sent from my iPhone
> 
>> On 25 Dec 2020, at 18:23, Diggy Mc  wrote:
>> 
>> Is Oracle Linux a viable alternative for the oVirt project?  It is, after 
>> all, a rebuild of RHEL like CentOS.  If not viable, why not?  I need to make 
>> some decisions posthaste about my pending oVirt 4.4 deployments.
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct: 
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives: 
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/PXCL7XVD7BLKKLPWIZJPNUMAFP3A3B5D/
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/M2TMEQZH6LB65RJUPFAFTSBWYPAXSCZ3/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6IZXVHFR6JV6CQCPQFJALFBI5ZORBB7M/


[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-12-12 Thread Vinícius Ferrão via Users
Hi Abhishek,

I haven’t found any critical corruption after the change. But I’m not sure if 
this was the issue; right now I’m suspecting the storage subsystem. I’ll give 
it some more days to see how things end up.

There’s definitely an improvement but, again, I’m not sure yet if it’s solved.

Thanks,

On 2 Dec 2020, at 09:21, Abhishek Sahni 
<abhishek.sahni1...@gmail.com> wrote:

I have been through a similar type of weird situation and ended up finding that 
it was because of the NFS mounting.

ENV:
STORAGE: Dell EMC VNX 5200 - NFS shares.

a) Below is the mount of the storage (NFSv4 share) on the nodes when creating 
new VMs failed during installation; existing VMs kept running fine.  [nfsv4]

# mount

A.B.C.D:/VIRT_CC on /rhev/data-center/mnt/A.B.C.D:_VIRT__CC type nfs4 
(rw,relatime,vers=4.1,rsize=65536,wsize=65536,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=A.B.C.D,local_lock=none,addr=A.B.C.D)


b) Switched back to NFSv3 and everything came back to normal.

# mount

A.B.C.D:/VIRT_CC on /rhev/data-center/mnt/A.B.C.D:_VIRT__CC type nfs 
(rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,mountaddr=A.B.C.D,mountvers=3,mountport=1234,mountproto=udp,local_lock=all,addr=A.B.C.D)

Conclusion: I checked logs everywhere (nodes and storage) but didn't find 
anything that could lead to the error.

WORKAROUND:
The NFS storage domain was configured with the "AUTO" negotiated version option.

1) I put the storage domain in maintenance mode.
2) Changed it to NFS v3 and removed it from maintenance mode.

And boom, everything came back to normal.

You can check if that workaround will work for you.

On Wed, Dec 2, 2020 at 10:42 AM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Can this be related to the case?
https://bugzilla.redhat.com/show_bug.cgi?id=810082

On 1 Dec 2020, at 10:25, Vinícius Ferrão 
<fer...@versatushpc.com.br> wrote:

ECC RAM everywhere: hosts and storage.

I even ran Memtest86 on both hypervisor hosts just to be sure. No errors. I 
haven’t had the opportunity to run it on the storage yet.

After I sent that message yesterday, the engine VM crashed again and the 
filesystem went offline. There were some discards (again) on the switch, 
probably due to the “boot storm” of other VMs. But this time a simple reboot 
fixed the filesystem and the hosted engine VM was back.

Since it was an extremely small amount of time, I checked everything again, and 
only the discards issue came up; there are ~90k discards on Po2 (which is the 
LACP interface of the hypervisor). Since then, I have enabled hardware flow 
control on the ports of the switch, but discards are still happening:

Port    Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Po1     0          0        0         0        0          0
Po2     0          0        0         0        0          3650
Po3     0          0        0         0        0          0
Po4     0          0        0         0        0          0
Po5     0          0        0         0        0          0
Po6     0          0        0         0        0          0
Po7     0          0        0         0        0          0
Po20    0          0        0         0        0          13788

I think this may be related… but it’s just a guess.

Thanks,


On 1 Dec 2020, at 05:06, Strahil Nikolov 
<hunter86...@yahoo.com> wrote:

Could it be faulty ram ?
Do you use ECC ram ?

Best Regards,
Strahil Nikolov






В вторник, 1 декември 2020 г., 06:17:10 Гринуич+2, Vinícius Ferrão via Users 
<users@ovirt.org> написа:






Hi again,



I had to shut down everything because of a power outage in the office. When 
trying to get the infra up again, even the Engine got corrupted:



[  772.466982] XFS (dm-4): Invalid superblock magic number
mount: /var: wrong fs type, bad option, bad superblock on 
/dev/mapper/ovirt-var, missing codepage or helper program, or other error.
[  772.472885] XFS (dm-3): Mounting V5 Filesystem
[  773.629700] XFS (dm-3): Starting recovery (logdev: internal)
[  773.731104] XFS (dm-3): Metadata CRC error detected at 
xfs_agfl_read_verify+0xa1/0xf0 [xfs], xfs_agfl block 0xf3
[  773.734352] XFS (dm-3): Unmount and run xfs_repair
[  773.736216] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[  773.738458] : 23 31 31 35 36 35 35 34 29 00 2d 20 52 65 62 75  
#1156554).- Rebu
[  773.741044] 0010: 69 6c 74 20 66 6f 72 20 68 74 74 70 73 3a 2f 2f  ilt 
for https://
[  773.743636] 0020: 66 65 64 6f 72 61 70 72 6f 6a 65 63 74 2e 6f 72  
fedoraproject.or
[  773.746191] 0030: 67 2f 77 69 6b 69 2f 46 65 64 6f 72 61 5f 32 33  
g/wiki/Fedora_23
[  773.748818] 0040: 5f 4d 6

[ovirt-users] Re: CentOS 8 is dead

2020-12-08 Thread Vinícius Ferrão via Users
CentOS Stream is unstable at best.

I’ve used it recently and it was just a mess. There’s no binary compatibility 
with the current point release and there’s no version pinning. So it will be 
really difficult to keep track of things.

I’m really curious how oVirt will handle this.

From: Wesley Stewart 
Sent: Tuesday, December 8, 2020 4:56 PM
To: Strahil Nikolov 
Cc: users 
Subject: [ovirt-users] Re: CentOS 8 is dead

This is a little concerning.

But it seems pretty easy to convert:
https://www.centos.org/centos-stream/

However I would be curious to see if someone tests this with having an active 
ovirt node!

On Tue, Dec 8, 2020 at 2:39 PM Strahil Nikolov via Users 
<users@ovirt.org> wrote:
Hello All,

I'm really worried about the following news:
https://blog.centos.org/2020/12/future-is-centos-stream/

Did anyone try to port oVirt to SLES/openSUSE or any Debian-based
distro?

Best Regards,
Strahil Nikolov
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to 
users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HZC4D4OSYL64DX5VYXDJCHDNRZDRGIT6/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZLTWP255MVLDGSBYEG266FDMGZKOE4J5/


[ovirt-users] Re: difference between CPU server and client family

2020-12-08 Thread Vinícius Ferrão via Users
AFAIK Client is for the i3/i5/i7/i9 families and the other one is for Xeon 
platforms.

But you have a pretty unusual Xeon, so it may be missing some flags that would 
properly classify the CPU.

You can run this on the host to check what’s detected:


[root]# vdsm-client Host getCapabilities
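The output is a fairly large JSON document; a minimal sketch for pulling out 
just the CPU-related fields (assuming the cpuModel/cpuFlags keys exposed by 
recent vdsm versions):

[root]# vdsm-client Host getCapabilities | grep -E '"cpu(Model|Flags|Cores|Sockets)"'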

Sent from my iPhone

On 8 Dec 2020, at 10:52, jb  wrote:

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RM3AE7FLYVNDIESMXCGUAABHWIEK5AG2/


[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-12-01 Thread Vinícius Ferrão via Users
Can this be related to the case?
https://bugzilla.redhat.com/show_bug.cgi?id=810082

On 1 Dec 2020, at 10:25, Vinícius Ferrão 
<fer...@versatushpc.com.br> wrote:

ECC RAM everywhere: hosts and storage.

I even ran Memtest86 on both hypervisor hosts just to be sure. No errors. I 
haven’t had the opportunity to run it on the storage yet.

After I sent that message yesterday, the engine VM crashed again and the 
filesystem went offline. There were some discards (again) on the switch, 
probably due to the “boot storm” of other VMs. But this time a simple reboot 
fixed the filesystem and the hosted engine VM was back.

Since it was an extremely small amount of time, I checked everything again, and 
only the discards issue came up; there are ~90k discards on Po2 (which is the 
LACP interface of the hypervisor). Since then, I have enabled hardware flow 
control on the ports of the switch, but discards are still happening:

Port    Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Po1     0          0        0         0        0          0
Po2     0          0        0         0        0          3650
Po3     0          0        0         0        0          0
Po4     0          0        0         0        0          0
Po5     0          0        0         0        0          0
Po6     0          0        0         0        0          0
Po7     0          0        0         0        0          0
Po20    0          0        0         0        0          13788

I think this may be related… but it’s just a guess.

Thanks,


On 1 Dec 2020, at 05:06, Strahil Nikolov 
<hunter86...@yahoo.com> wrote:

Could it be faulty ram ?
Do you use ECC ram ?

Best Regards,
Strahil Nikolov






В вторник, 1 декември 2020 г., 06:17:10 Гринуич+2, Vinícius Ferrão via Users 
<users@ovirt.org> написа:






Hi again,



I had to shut down everything because of a power outage in the office. When 
trying to get the infra up again, even the Engine got corrupted:



[  772.466982] XFS (dm-4): Invalid superblock magic number
mount: /var: wrong fs type, bad option, bad superblock on 
/dev/mapper/ovirt-var, missing codepage or helper program, or other error.
[  772.472885] XFS (dm-3): Mounting V5 Filesystem
[  773.629700] XFS (dm-3): Starting recovery (logdev: internal)
[  773.731104] XFS (dm-3): Metadata CRC error detected at 
xfs_agfl_read_verify+0xa1/0xf0 [xfs], xfs_agfl block 0xf3
[  773.734352] XFS (dm-3): Unmount and run xfs_repair
[  773.736216] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
[  773.738458] : 23 31 31 35 36 35 35 34 29 00 2d 20 52 65 62 75  
#1156554).- Rebu
[  773.741044] 0010: 69 6c 74 20 66 6f 72 20 68 74 74 70 73 3a 2f 2f  ilt 
for https://
[  773.743636] 0020: 66 65 64 6f 72 61 70 72 6f 6a 65 63 74 2e 6f 72  
fedoraproject.or
[  773.746191] 0030: 67 2f 77 69 6b 69 2f 46 65 64 6f 72 61 5f 32 33  
g/wiki/Fedora_23
[  773.748818] 0040: 5f 4d 61 73 73 5f 52 65 62 75 69 6c 64 00 2d 20  
_Mass_Rebuild.-
[  773.751399] 0050: 44 72 6f 70 20 6f 62 73 6f 6c 65 74 65 20 64 65  Drop 
obsolete de
[  773.753933] 0060: 66 61 74 74 72 20 73 74 61 6e 7a 61 73 20 28 23  fattr 
stanzas (#
[  773.756428] 0070: 31 30 34 37 30 33 31 29 00 2d 20 49 6e 73 74 61  
1047031).- Insta
[  773.758873] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at 
daddr 0xf3 len 1 error 74
[  773.763756] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 446 of 
file fs/xfs/libxfs/xfs_defer.c. Return address = 962bd5ee
[  773.769363] XFS (dm-3): Corruption of in-memory data detected.  Shutting 
down filesystem
[  773.772643] XFS (dm-3): Please unmount the filesystem and rectify the 
problem(s)
[  773.776079] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned error 
-5.
[  773.779113] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 
3. Continuing.
[  773.783039] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned error 
-5.
[  773.785698] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 
3. Continuing.
[  773.790023] XFS (dm-3): Ending recovery (logdev: internal)
[  773.792489] XFS (dm-3): Error -5 recovering leftover CoW allocations.
mount: /var/log: can't read superblock on /dev/mapper/ovirt-log.
mount: /var/log/audit: mount point does not exist.




/var seems to be completely trashed.




The only time that I’ve seen something like this was faulty hardware. But 
nothing shows up in the logs, as far as I know.




After forcing repairs with -L I’ve got other issues:




mount -a
[  326.170941] XFS (dm-4): Mounting V5 Filesystem
[  326.404788] XFS (dm-4): Ending clean mount
[  326.415291] XFS (dm-3): Mounting V5 Filesystem
[  326.611673] XFS (dm-3): Ending clean mount
[  326.621705] XFS (dm-2): Mounting V5 F

[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-12-01 Thread Vinícius Ferrão via Users
ECC RAM everywhere: hosts and storage.

I even ran Memtest86 on both hypervisor hosts just to be sure. No errors. I 
haven’t had the opportunity to run it on the storage yet.

After I sent that message yesterday, the engine VM crashed again and the 
filesystem went offline. There were some discards (again) on the switch, 
probably due to the “boot storm” of other VMs. But this time a simple reboot 
fixed the filesystem and the hosted engine VM was back.

Since it was an extremely small amount of time, I checked everything again, and 
only the discards issue came up; there are ~90k discards on Po2 (which is the 
LACP interface of the hypervisor). Since then, I have enabled hardware flow 
control on the ports of the switch, but discards are still happening:

Port    Align-Err  FCS-Err  Xmit-Err  Rcv-Err  UnderSize  OutDiscards
Po1     0          0        0         0        0          0
Po2     0          0        0         0        0          3650
Po3     0          0        0         0        0          0
Po4     0          0        0         0        0          0
Po5     0          0        0         0        0          0
Po6     0          0        0         0        0          0
Po7     0          0        0         0        0          0
Po20    0          0        0         0        0          13788

I think this may be related… but it’s just a guess.
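In case anyone wants to compare numbers, the same kind of counters can be read 
on the hypervisor side of the LACP bond as well; a minimal sketch, where bond0 
and em1 are placeholder interface names (not necessarily the ones used here):

# ip -s link show bond0                            # RX/TX errors and dropped counters for the bond
# ethtool -S em1 | grep -iE 'drop|discard|pause'   # per-NIC counters, including flow control pauses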

Thanks,


> On 1 Dec 2020, at 05:06, Strahil Nikolov  wrote:
> 
> Could it be faulty ram ?
> Do you use ECC ram ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В вторник, 1 декември 2020 г., 06:17:10 Гринуич+2, Vinícius Ferrão via Users 
>  написа: 
> 
> 
> 
> 
> 
> 
> Hi again,
> 
> 
> 
> I had to shut down everything because of a power outage in the office. When 
> trying to get the infra up again, even the Engine got corrupted: 
> 
> 
> 
> [  772.466982] XFS (dm-4): Invalid superblock magic number
> mount: /var: wrong fs type, bad option, bad superblock on 
> /dev/mapper/ovirt-var, missing codepage or helper program, or other error.
> [  772.472885] XFS (dm-3): Mounting V5 Filesystem
> [  773.629700] XFS (dm-3): Starting recovery (logdev: internal)
> [  773.731104] XFS (dm-3): Metadata CRC error detected at 
> xfs_agfl_read_verify+0xa1/0xf0 [xfs], xfs_agfl block 0xf3 
> [  773.734352] XFS (dm-3): Unmount and run xfs_repair
> [  773.736216] XFS (dm-3): First 128 bytes of corrupted metadata buffer:
> [  773.738458] : 23 31 31 35 36 35 35 34 29 00 2d 20 52 65 62 75  
> #1156554).- Rebu
> [  773.741044] 0010: 69 6c 74 20 66 6f 72 20 68 74 74 70 73 3a 2f 2f  ilt 
> for https://
> [  773.743636] 0020: 66 65 64 6f 72 61 70 72 6f 6a 65 63 74 2e 6f 72  
> fedoraproject.or
> [  773.746191] 0030: 67 2f 77 69 6b 69 2f 46 65 64 6f 72 61 5f 32 33  
> g/wiki/Fedora_23
> [  773.748818] 0040: 5f 4d 61 73 73 5f 52 65 62 75 69 6c 64 00 2d 20  
> _Mass_Rebuild.- 
> [  773.751399] 0050: 44 72 6f 70 20 6f 62 73 6f 6c 65 74 65 20 64 65  
> Drop obsolete de
> [  773.753933] 0060: 66 61 74 74 72 20 73 74 61 6e 7a 61 73 20 28 23  
> fattr stanzas (#
> [  773.756428] 0070: 31 30 34 37 30 33 31 29 00 2d 20 49 6e 73 74 61  
> 1047031).- Insta
> [  773.758873] XFS (dm-3): metadata I/O error in "xfs_trans_read_buf_map" at 
> daddr 0xf3 len 1 error 74
> [  773.763756] XFS (dm-3): xfs_do_force_shutdown(0x8) called from line 446 of 
> file fs/xfs/libxfs/xfs_defer.c. Return address = 962bd5ee
> [  773.769363] XFS (dm-3): Corruption of in-memory data detected.  Shutting 
> down filesystem
> [  773.772643] XFS (dm-3): Please unmount the filesystem and rectify the 
> problem(s)
> [  773.776079] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned 
> error -5.
> [  773.779113] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 
> 3. Continuing.
> [  773.783039] XFS (dm-3): xfs_imap_to_bp: xfs_trans_read_buf() returned 
> error -5.
> [  773.785698] XFS (dm-3): xlog_recover_clear_agi_bucket: failed to clear agi 
> 3. Continuing.
> [  773.790023] XFS (dm-3): Ending recovery (logdev: internal)
> [  773.792489] XFS (dm-3): Error -5 recovering leftover CoW allocations.
> mount: /var/log: can't read superblock on /dev/mapper/ovirt-log.
> mount: /var/log/audit: mount point does not exist.
> 
> 
> 
> 
> /var seems to be completely trashed.
> 
> 
> 
> 
> The only time that I’ve seen something like this was faulty hardware. But 
> nothing shows up in the logs, as far as I know.
> 
> 
> 
> 
> After forcing repairs with -L I’ve got other issues:
> 
> 

[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-11-30 Thread Vinícius Ferrão via Users
a"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
Red Hat Enterprise Linux release 8.3 (Ootpa)
Red Hat Enterprise Linux release 8.3 (Ootpa)

[root@kontainerscomk ~]# sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 30
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

[root@kontainerscomk ~]# xfs_db -r /dev/dm-0
xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 
0xa82a)
Use -F to force a read attempt.
[root@kontainerscomk ~]# xfs_db -r /dev/dm-0 -F
xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 
0xa82a)
xfs_db: size check failed
xfs_db: V1 inodes unsupported. Please try an older xfsprogs.

[root@kontainerscomk ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Nov 19 22:40:39 2020
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rhel-root                      /           xfs   defaults                    0 0
UUID=ad84d1ea-c9cc-4b22-8338-d1a6b2c7d27e  /boot       xfs   defaults                    0 0
UUID=4642-2FF6                             /boot/efi   vfat  umask=0077,shortname=winnt  0 2
/dev/mapper/rhel-swap                      none        swap  defaults                    0 0

Thanks,


-Original Message-
From: Strahil Nikolov <hunter86...@yahoo.com>
Sent: Sunday, November 29, 2020 2:33 PM
To: Vinícius Ferrão <fer...@versatushpc.com.br>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] Re: Constantly XFS in memory corruption inside VMs

Can you check the output on the VM that was affected:
# cat /etc/*release
# sysctl -a | grep dirty


Best Regards,
Strahil Nikolov





В неделя, 29 ноември 2020 г., 19:07:48 Гринуич+2, Vinícius Ferrão via Users 
<users@ovirt.org> написа:





Hi Strahil.

I’m not using barrier options on mount. It’s the default settings from CentOS 
install.

I have some additional findings, there’s a big number of discarded packets on 
the switch on the hypervisor interfaces.

Discards are OK as far as I know, I hope TCP handles this and do the proper 
retransmissions, but I ask if this may be related or not. Our storage is over 
NFS. My general expertise is with iSCSI and I’ve never seen this kind of issue 
with iSCSI, not that I’m aware of.

In other clusters, I’ve seen a high number of discards with iSCSI on XenServer 
7.2 but there’s no corruption on the VMs there...

Thanks,

Sent from my iPhone

On 29 Nov 2020, at 04:00, Strahil Nikolov 
<hunter86...@yahoo.com> wrote:

Are you using "nobarrier" mount options in the VM ?

If yes, can you try to remove the "nobarrrier" option.


Best Regards,
Strahil Nikolov






В събота, 28 ноември 2020 г., 19:25:48 Гринуич+2, Vinícius Ferrão 
<fer...@versatushpc.com.br> написа:





Hi Strahil,

I moved a running VM to other host, rebooted and no corruption was found. If 
there's any corruption it may be silent corruption... I've cases where the VM 
was new, just installed, run dnf -y update to get the updated packages, 
rebooted, and boom XFS corruption. So perhaps the motion process isn't the one 
to blame.

But, in fact, I remember when moving a VM that it went down during the process 
and when I rebooted it was corrupted. But this may not seems related. It 
perhaps was already in a inconsistent state.

Anyway, here's the mount options:

Host1:
192.168.10.14:/mnt/pool0/ovirt/vm on
/rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,noshar
ecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,l
ocal_lock=none,addr=192.168.10.14)

Host2:
192.168.10.14:/mnt/pool0/ovirt/vm on
/rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,noshar
ecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,l
ocal_lock=none,addr=192.168.10.14)

The options are the default ones. I haven't changed anything when configuring 
this cluster.

Thanks.



-Original Message-
From: Strahil Nikolov <hunter86...@yahoo.com>
Sent: Saturday, November 28, 2020 1:54 PM
To: users <users@ovirt.org>; Vinícius Ferrão 
<fer...@versatushpc.com.br>

[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-11-29 Thread Vinícius Ferrão via Users
Hi Strahil,

The majority of the VMs are UEFI. But I do have some Legacy BIOS VMs and they 
are corrupting too. I have a mix of RHEL/CentOS 7 and 8.

All of them are corrupting. XFS on everything with default values from 
installation.

There’s one VM with Ubuntu 18.04 LTS and ext4, and no corruption is found 
there. And the three NTFS VMs that I have are good too.

So the common denominator is XFS on Enterprise Linux (7 or 8).

Any other ideas?

Thanks.

PS: That VM that will die after the reboot is almost new. Installed on November 
19th, and oVirt still shows the Run Once flag because it has never rebooted 
since installation.


Sent from my iPhone

> On 29 Nov 2020, at 17:03, Strahil Nikolov  wrote:
> 
> Damn...
> 
> You are using EFI boot. Does this happen only to EFI machines ?
> Did you notice if only EL 8 is affected ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В неделя, 29 ноември 2020 г., 19:36:09 Гринуич+2, Vinícius Ferrão 
>  написа: 
> 
> 
> 
> 
> 
> Yes!
> 
> I have a live VM right now that will be dead on a reboot:
> 
> [root@kontainerscomk ~]# cat /etc/*release
> NAME="Red Hat Enterprise Linux"
> VERSION="8.3 (Ootpa)"
> ID="rhel"
> ID_LIKE="fedora"
> VERSION_ID="8.3"
> PLATFORM_ID="platform:el8"
> PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
> ANSI_COLOR="0;31"
> CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
> HOME_URL="https://www.redhat.com/"
> BUG_REPORT_URL="https://bugzilla.redhat.com/"
> 
> REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
> REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
> REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
> REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
> Red Hat Enterprise Linux release 8.3 (Ootpa)
> Red Hat Enterprise Linux release 8.3 (Ootpa)
> 
> [root@kontainerscomk ~]# sysctl -a | grep dirty
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 30
> vm.dirty_writeback_centisecs = 500
> vm.dirtytime_expire_seconds = 43200
> 
> [root@kontainerscomk ~]# xfs_db -r /dev/dm-0
> xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 
> 0xa82a)
> Use -F to force a read attempt.
> [root@kontainerscomk ~]# xfs_db -r /dev/dm-0 -F
> xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 
> 0xa82a)
> xfs_db: size check failed
> xfs_db: V1 inodes unsupported. Please try an older xfsprogs.
> 
> [root@kontainerscomk ~]# cat /etc/fstab
> #
> # /etc/fstab
> # Created by anaconda on Thu Nov 19 22:40:39 2020
> #
> # Accessible filesystems, by reference, are maintained under '/dev/disk/'.
> # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
> #
> # After editing this file, run 'systemctl daemon-reload' to update systemd
> # units generated from this file.
> #
> /dev/mapper/rhel-root                      /           xfs   defaults                    0 0
> UUID=ad84d1ea-c9cc-4b22-8338-d1a6b2c7d27e  /boot       xfs   defaults                    0 0
> UUID=4642-2FF6                             /boot/efi   vfat  umask=0077,shortname=winnt  0 2
> /dev/mapper/rhel-swap                      none        swap  defaults                    0 0
> 
> Thanks,
> 
> 
> -Original Message-
> From: Strahil Nikolov  
> Sent: Sunday, November 29, 2020 2:33 PM
> To: Vinícius Ferrão 
> Cc: users 
> Subject: Re: [ovirt-users] Re: Constantly XFS in memory corruption inside VMs
> 
> Can you check the output on the VM that was affected:
> # cat /etc/*release
> # sysctl -a | grep dirty
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> В неделя, 29 ноември 2020 г., 19:07:48 Гринуич+2, Vinícius Ferrão via Users 
>  написа: 
> 
> 
> 
> 
> 
> Hi Strahil.
> 
> I’m not using barrier options on mount. It’s the default settings from CentOS 
> install.
> 
> I have some additional findings, there’s a big number of discarded packets 
> on the switch on the hypervisor interfaces.
> 
> Discards are OK as far as I know, I hope TCP handles this and do the proper 
> retransmissions, but I ask if this may be related or not. Our storage is over 
> NFS. My general expertise is with iSCSI and I’ve never seen this kind of 
> issue with iSCSI, not that I’m aware of.
> 
> In other clusters, I’ve seen a high number of discards with iSCSI on 
> XenServer 7.2 but there’s no corruption on the VMs there...
> 
> Thanks,
> 
> Sent from my iPhone
> 
>> On 29 Nov 2020, at 04:00, Strahil Nikolov  wrote:
>> 
>> Are you using "nobarrier" mou

[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-11-29 Thread Vinícius Ferrão via Users
Yes!

I have a live VM right now that will be dead on a reboot:

[root@kontainerscomk ~]# cat /etc/*release
NAME="Red Hat Enterprise Linux"
VERSION="8.3 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.3"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.3 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.3:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.3
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.3"
Red Hat Enterprise Linux release 8.3 (Ootpa)
Red Hat Enterprise Linux release 8.3 (Ootpa)

[root@kontainerscomk ~]# sysctl -a | grep dirty
vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_expire_centisecs = 3000
vm.dirty_ratio = 30
vm.dirty_writeback_centisecs = 500
vm.dirtytime_expire_seconds = 43200

[root@kontainerscomk ~]# xfs_db -r /dev/dm-0
xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 
0xa82a)
Use -F to force a read attempt.
[root@kontainerscomk ~]# xfs_db -r /dev/dm-0 -F
xfs_db: /dev/dm-0 is not a valid XFS filesystem (unexpected SB magic number 
0xa82a)
xfs_db: size check failed
xfs_db: V1 inodes unsupported. Please try an older xfsprogs.

[root@kontainerscomk ~]# cat /etc/fstab
#
# /etc/fstab
# Created by anaconda on Thu Nov 19 22:40:39 2020
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
/dev/mapper/rhel-root                      /           xfs   defaults                    0 0
UUID=ad84d1ea-c9cc-4b22-8338-d1a6b2c7d27e  /boot       xfs   defaults                    0 0
UUID=4642-2FF6                             /boot/efi   vfat  umask=0077,shortname=winnt  0 2
/dev/mapper/rhel-swap                      none        swap  defaults                    0 0

Thanks,


-Original Message-
From: Strahil Nikolov  
Sent: Sunday, November 29, 2020 2:33 PM
To: Vinícius Ferrão 
Cc: users 
Subject: Re: [ovirt-users] Re: Constantly XFS in memory corruption inside VMs

Can you check the output on the VM that was affected:
# cat /etc/*release
# sysctl -a | grep dirty


Best Regards,
Strahil Nikolov





В неделя, 29 ноември 2020 г., 19:07:48 Гринуич+2, Vinícius Ferrão via Users 
 написа: 





Hi Strahil.

I’m not using barrier options on mount. It’s the default settings from CentOS 
install.

I have some additional findings, there’s a big number of discarded packets on 
the switch on the hypervisor interfaces.

Discards are OK as far as I know, I hope TCP handles this and do the proper 
retransmissions, but I ask if this may be related or not. Our storage is over 
NFS. My general expertise is with iSCSI and I’ve never seen this kind of issue 
with iSCSI, not that I’m aware of.

In other clusters, I’ve seen a high number of discards with iSCSI on XenServer 
7.2 but there’s no corruption on the VMs there...

Thanks,

Sent from my iPhone

> On 29 Nov 2020, at 04:00, Strahil Nikolov  wrote:
> 
> Are you using "nobarrier" mount options in the VM ?
> 
> If yes, can you try to remove the "nobarrrier" option.
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В събота, 28 ноември 2020 г., 19:25:48 Гринуич+2, Vinícius Ferrão 
>  написа: 
> 
> 
> 
> 
> 
> Hi Strahil,
> 
> I moved a running VM to other host, rebooted and no corruption was found. If 
> there's any corruption it may be silent corruption... I've cases where the VM 
> was new, just installed, run dnf -y update to get the updated packages, 
> rebooted, and boom XFS corruption. So perhaps the motion process isn't the 
> one to blame.
> 
> But, in fact, I remember when moving a VM that it went down during the 
> process and when I rebooted it was corrupted. But this may not seems related. 
> It perhaps was already in a inconsistent state.
> 
> Anyway, here's the mount options:
> 
> Host1:
> 192.168.10.14:/mnt/pool0/ovirt/vm on 
> /rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
> (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,noshar
> ecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,l
> ocal_lock=none,addr=192.168.10.14)
> 
> Host2:
> 192.168.10.14:/mnt/pool0/ovirt/vm on 
> /rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
> (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,noshar
> ecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,l
> ocal_lock=none,addr=192.168.10.14)
> 
> The option

[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-11-29 Thread Vinícius Ferrão via Users
Hi Strahil.

I’m not using barrier options on mount. They’re the default settings from the 
CentOS install.

I have some additional findings: there’s a big number of discarded packets on 
the switch on the hypervisor interfaces.

Discards are OK as far as I know, I hope TCP handles this and does the proper 
retransmissions, but I wonder if this may be related or not. Our storage is over 
NFS. My general expertise is with iSCSI and I’ve never seen this kind of issue 
with iSCSI, not that I’m aware of.

In other clusters, I’ve seen a high number of discards with iSCSI on XenServer 
7.2 but there’s no corruption on the VMs there...

Thanks,

Sent from my iPhone

> On 29 Nov 2020, at 04:00, Strahil Nikolov  wrote:
> 
> Are you using "nobarrier" mount options in the VM ?
> 
> If yes, can you try to remove the "nobarrrier" option.
> 
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В събота, 28 ноември 2020 г., 19:25:48 Гринуич+2, Vinícius Ferrão 
>  написа: 
> 
> 
> 
> 
> 
> Hi Strahil,
> 
> I moved a running VM to other host, rebooted and no corruption was found. If 
> there's any corruption it may be silent corruption... I've cases where the VM 
> was new, just installed, run dnf -y update to get the updated packages, 
> rebooted, and boom XFS corruption. So perhaps the motion process isn't the 
> one to blame.
> 
> But, in fact, I remember when moving a VM that it went down during the 
> process and when I rebooted it was corrupted. But this may not seems related. 
> It perhaps was already in a inconsistent state.
> 
> Anyway, here's the mount options:
> 
> Host1:
> 192.168.10.14:/mnt/pool0/ovirt/vm on 
> /rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
> (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)
> 
> Host2:
> 192.168.10.14:/mnt/pool0/ovirt/vm on 
> /rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
> (rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)
> 
> The options are the default ones. I haven't changed anything when configuring 
> this cluster.
> 
> Thanks.
> 
> 
> 
> -Original Message-
> From: Strahil Nikolov  
> Sent: Saturday, November 28, 2020 1:54 PM
> To: users ; Vinícius Ferrão 
> Subject: Re: [ovirt-users] Constantly XFS in memory corruption inside VMs
> 
> Can you try with a test vm, if this happens after a Virtual Machine migration 
> ?
> 
> What are your mount options for the storage domain ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> В събота, 28 ноември 2020 г., 18:25:15 Гринуич+2, Vinícius Ferrão via Users 
>  написа: 
> 
> 
> 
> 
> 
>   
> 
> 
> Hello,
> 
>  
> 
> I’m trying to discover why an oVirt 4.4.3 Cluster with two hosts and NFS 
> shared storage on TrueNAS 12.0 is constantly getting XFS corruption inside 
> the VMs.
> 
>  
> 
> For random reasons VM’s gets corrupted, sometimes halting it or just being 
> silent corrupted and after a reboot the system is unable to boot due to 
> “corruption of in-memory data detected”. Sometimes the corrupted data are 
> “all zeroes”, sometimes there’s data there. In extreme cases the XFS 
> superblock 0 get’s corrupted and the system cannot even detect a XFS 
> partition anymore since the magic XFS key is corrupted on the first blocks of 
> the virtual disk.
> 
>  
> 
> This is happening for a month now. We had to rollback some backups, and I 
> don’t trust anymore on the state of the VMs.
> 
>  
> 
> Using xfs_db I can see that some VM’s have corrupted superblocks but the VM 
> is up. One in specific, was with sb0 corrupted, so I knew when a reboot kicks 
> in the machine will be gone, and that’s exactly what happened.
> 
>  
> 
> Another day I was just installing a new CentOS 8 VM for random reasons, and 
> after running dnf -y update and a reboot the VM was corrupted needing XFS 
> repair. That was an extreme case.
> 
>  
> 
> So, I’ve looked on the TrueNAS logs, and there’s apparently nothing wrong on 
> the system. No errors logged on dmesg, nothing on /var/log/messages and no 
> errors on the “zpools”, not even after scrub operations. On the switch, a 
> Catalyst 2960X, we’ve been monitoring it and all it’s interfaces. There are 
> no “up and down” and zero errors on all interfaces (we have a 4x Port LACP on 
> the TrueNAS side and 2x Port LACP on each hosts), everything seems to be 
> fine. The only metric that I was unable to get is “dr

[ovirt-users] Re: Constantly XFS in memory corruption inside VMs

2020-11-28 Thread Vinícius Ferrão via Users
Hi Strahil,

I moved a running VM to another host, rebooted, and no corruption was found. If 
there's any corruption it may be silent corruption... I've had cases where the 
VM was new, just installed; I ran dnf -y update to get the updated packages, 
rebooted, and boom, XFS corruption. So perhaps the motion process isn't the one 
to blame.

But, in fact, I remember when moving a VM that it went down during the process 
and when I rebooted it was corrupted. But this may not be related. It was 
perhaps already in an inconsistent state.

Anyway, here's the mount options:

Host1:
192.168.10.14:/mnt/pool0/ovirt/vm on 
/rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)

Host2:
192.168.10.14:/mnt/pool0/ovirt/vm on 
/rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)

The options are the default ones. I haven't changed anything when configuring 
this cluster.
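Since these are soft mounts with retrans=3, one thing that can be checked on 
each host is whether the client is actually retransmitting RPCs (which would 
tie the switch discards to NFS); a minimal sketch:

# nfsstat -rc              # RPC client stats: calls, retrans, authrefrsh
# cat /proc/net/rpc/nfs    # raw counters behind nfsstat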

Thanks.



-Original Message-
From: Strahil Nikolov  
Sent: Saturday, November 28, 2020 1:54 PM
To: users ; Vinícius Ferrão 
Subject: Re: [ovirt-users] Constantly XFS in memory corruption inside VMs

Can you try with a test vm, if this happens after a Virtual Machine migration ?

What are your mount options for the storage domain ?

Best Regards,
Strahil Nikolov






В събота, 28 ноември 2020 г., 18:25:15 Гринуич+2, Vinícius Ferrão via Users 
 написа: 





  


Hello,

 

I’m trying to discover why an oVirt 4.4.3 Cluster with two hosts and NFS shared 
storage on TrueNAS 12.0 is constantly getting XFS corruption inside the VMs.

 

For random reasons VM’s gets corrupted, sometimes halting it or just being 
silent corrupted and after a reboot the system is unable to boot due to 
“corruption of in-memory data detected”. Sometimes the corrupted data are “all 
zeroes”, sometimes there’s data there. In extreme cases the XFS superblock 0 
get’s corrupted and the system cannot even detect a XFS partition anymore since 
the magic XFS key is corrupted on the first blocks of the virtual disk.

 

This is happening for a month now. We had to rollback some backups, and I don’t 
trust anymore on the state of the VMs.

 

Using xfs_db I can see that some VM’s have corrupted superblocks but the VM is 
up. One in specific, was with sb0 corrupted, so I knew when a reboot kicks in 
the machine will be gone, and that’s exactly what happened.

 

Another day I was just installing a new CentOS 8 VM for random reasons, and 
after running dnf -y update and a reboot the VM was corrupted needing XFS 
repair. That was an extreme case.

 

So, I’ve looked on the TrueNAS logs, and there’s apparently nothing wrong on 
the system. No errors logged on dmesg, nothing on /var/log/messages and no 
errors on the “zpools”, not even after scrub operations. On the switch, a 
Catalyst 2960X, we’ve been monitoring it and all it’s interfaces. There are no 
“up and down” and zero errors on all interfaces (we have a 4x Port LACP on the 
TrueNAS side and 2x Port LACP on each hosts), everything seems to be fine. The 
only metric that I was unable to get is “dropped packages”, but I’m don’t know 
if this can be an issue or not.

 

Finally, on oVirt, I can’t find anything either. I looked on /var/log/messages 
and /var/log/sanlock.log but there’s nothing that I found suspicious.

 

Is there’s anyone out there experiencing this? Our VM’s are mainly CentOS 7/8 
with XFS, there’s 3 Windows VM’s that does not seems to be affected, everything 
else is affected.

 

Thanks all.



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VLYSE7HCFNWTWFZZTL2EJHV36OENHUGB/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OWT5U6UTXNSBZELWFVID42XKYMUSCPDF/


[ovirt-users] Constantly XFS in memory corruption inside VMs

2020-11-28 Thread Vinícius Ferrão via Users
Hello,

I'm trying to discover why an oVirt 4.4.3 Cluster with two hosts and NFS shared 
storage on TrueNAS 12.0 is constantly getting XFS corruption inside the VMs.

For random reasons VMs get corrupted, sometimes halting or just being silently 
corrupted, and after a reboot the system is unable to boot due to "corruption 
of in-memory data detected". Sometimes the corrupted data is "all zeroes", 
sometimes there's data there. In extreme cases the XFS superblock 0 gets 
corrupted and the system cannot even detect an XFS partition anymore since the 
magic XFS key is corrupted on the first blocks of the virtual disk.

This has been happening for a month now. We had to roll back some backups, and 
I don't trust the state of the VMs anymore.

Using xfs_db I can see that some VMs have corrupted superblocks while the VM is 
up. One in specific had sb0 corrupted, so I knew that when a reboot kicked in 
the machine would be gone, and that's exactly what happened.
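For reference, this is roughly the kind of read-only check I mean; a minimal 
sketch, where /dev/vdb1 is a placeholder for the VM's XFS block device (run it 
against an unmounted or snapshotted disk):

# xfs_db -r -c 'sb 0' -c 'print magicnum' /dev/vdb1   # a healthy XFS prints magicnum = 0x58465342 ("XFSB")
# xfs_repair -n /dev/vdb1                             # no-modify mode, only reports problems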

Another day I was just installing a new CentOS 8 VM for random reasons, and 
after running dnf -y update and a reboot the VM was corrupted needing XFS 
repair. That was an extreme case.

So, I've looked at the TrueNAS logs, and there's apparently nothing wrong on 
the system. No errors logged in dmesg, nothing in /var/log/messages and no 
errors on the "zpools", not even after scrub operations. On the switch, a 
Catalyst 2960X, we've been monitoring it and all its interfaces. There are no 
"up and down" events and zero errors on all interfaces (we have a 4x port LACP 
on the TrueNAS side and a 2x port LACP on each host), everything seems to be 
fine. The only metric that I was unable to get is "dropped packets", but I 
don't know if this can be an issue or not.

Finally, on oVirt, I can't find anything either. I looked on /var/log/messages 
and /var/log/sanlock.log but there's nothing that I found suspicious.

Is there anyone out there experiencing this? Our VMs are mainly CentOS 7/8 
with XFS; there are 3 Windows VMs that do not seem to be affected, everything 
else is affected.

Thanks all.
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VLYSE7HCFNWTWFZZTL2EJHV36OENHUGB/


[ovirt-users] Re: EPYC CPU not being detected correctly on cluster

2020-11-25 Thread Vinícius Ferrão via Users
Lucia, I ended up figuring it out.

The culprit is that I was pinned to the wrong virt module stream; after running 
these commands the CPU was properly detected:

# dnf module reset virt
# dnf module enable virt:8.3
# dnf upgrade --nobest

I think virt was in 8.2.
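If someone hits the same thing, the enabled stream can be verified before and 
after with something like this (a minimal sketch; the enabled stream is marked 
with [e]):

# dnf module list virt
# dnf module info virt:8.3 | head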

Thank you!

From: Lucia Jelinkova 
Sent: Monday, November 23, 2020 6:25 AM
To: Vinícius Ferrão 
Cc: users 
Subject: Re: [ovirt-users] EPYC CPU not being detected correctly on cluster

Hi Vinícius,

Thank you for the libvirt output - libvirt marked the EPYC CPU as not usable. 
Let's query qemu why that is.  You do not need an oVirt VM to do that, just any 
VM running on qemu, e.g. created by Virtual Machines Manager or you can follow 
the command from the answer here:

https://unix.stackexchange.com/questions/309788/how-to-create-a-vm-from-scratch-with-virsh

Then you can use the following commands:
sudo virsh list --all
sudo virsh qemu-monitor-command [your-vm's-name] --pretty 
'{"execute":"query-cpu-definitions"}'

I do not know if this could be related to the UEFI firmware; let's check the 
qemu output first.

Regards,

Lucia


On Fri, Nov 20, 2020 at 4:07 PM Vinícius Ferrão 
<fer...@versatushpc.com.br> wrote:
Hi Lucia,

I had to create a user for virsh:
# saslpasswd2 -a libvirt test
Password:
Again (for verification):

With that in mind, here’s the outputs:


  /usr/libexec/qemu-kvm
  kvm
  pc-i440fx-rhel7.6.0
  x86_64
  
  
  


  /usr/share/OVMF/OVMF_CODE.secboot.fd
  
rom
pflash
  
  
yes
no
  
  
no
  

  
  


  EPYC-IBPB
  AMD
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


  qemu64
  qemu32
  phenom
  pentium3
  pentium2
  pentium
  n270
  kvm64
  kvm32
  coreduo
  core2duo
  athlon
  Westmere-IBRS
  Westmere
  Skylake-Server-noTSX-IBRS
  Skylake-Server-IBRS
  Skylake-Server
  Skylake-Client-noTSX-IBRS
  Skylake-Client-IBRS
  Skylake-Client
  SandyBridge-IBRS
  SandyBridge
  Penryn
  Opteron_G5
  Opteron_G4
  Opteron_G3
  Opteron_G2
  Opteron_G1
  Nehalem-IBRS
  Nehalem
  IvyBridge-IBRS
  IvyBridge
  Icelake-Server-noTSX
  Icelake-Server
  Icelake-Client-noTSX
  Icelake-Client
  Haswell-noTSX-IBRS
  Haswell-noTSX
  Haswell-IBRS
  Haswell
  EPYC-IBPB
  EPYC
  Dhyana
 Cooperlake
  Conroe
  Cascadelake-Server-noTSX
  Cascadelake-Server
  Broadwell-noTSX-IBRS
  Broadwell-noTSX
  Broadwell-IBRS
  Broadwell
  486

  
  

  
disk
cdrom
floppy
lun
  
  
ide
fdc
scsi
virtio
usb
sata
  
  
virtio
virtio-transitional
virtio-non-transitional
  


  
sdl
vnc
spice
  


  
vga
cirrus
qxl
virtio
none
bochs
ramfb
  


  
subsystem
  
  
default
mandatory
requisite
optional
  
  
usb
pci
scsi
  
  
  
default
vfio
  


  
virtio
virtio-transitional
virtio-non-transitional
  
  
random
egd
  

  
  






  47
  1

  


Regarding the last two commands, I don’t have any VM running, since I cannot 
start anything on the engine.

I’m starting to suspect that this may be something in the UEFI Firmware.

Any thoughts?

Thanks,

From: Lucia Jelinkova <ljeli...@redhat.com>
Sent: Friday, November 20, 2020 5:30 AM
To: Vinícius Ferrão <fer...@versatushpc.com.br>
Cc: users <users@ovirt.org>
Subject: Re: [ovirt-users] EPYC CPU not being detected correctly on cluster

Hi,

oVirt CPU detection depends on libvirt (and that depends on qemu) CPU models. 
Could you please run the following command to see what libvirt reports?

virsh domcapabilities

That should give you the list of CPUs known to libvirt with a usability flag 
for each CPU.

If you find out that the CPU is not usable by libvirt, you might want to dig 
deeper by querying qemu directly.

Locate any VM running on the system by
sudo virsh list --all

Use the name of a VM in the following command:
sudo virsh qemu-monitor-command [your-vm's-name] --pretty 
'{"execute":"query-cpu-definitions"}'

That would give you the list of all CPUs supported by qemu, and it will list any 
CPU features that are not available on your system.
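
Since that output is quite long, one way to pull out just the EPYC entry and its 
unavailable features (a minimal sketch; "your-vm" is a placeholder name):

sudo virsh qemu-monitor-command your-vm --pretty '{"execute":"query-cpu-definitions"}' | grep -A 15 '"name": "EPYC"'

The "unavailable-features" list printed under that entry is what should explain 
why libvirt marks the model as not usable.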

Regards,

Lucia

On Thu, Nov 19, 2020 at 9:38 PM Vinícius Ferrão via Users 
mailto:users@ovirt.

[ovirt-users] Re: EPYC CPU not being detected correctly on cluster

2020-11-20 Thread Vinícius Ferrão via Users
Hi Lucia,

I had to create a user for virsh:
# saslpasswd2 -a libvirt test
Password:
Again (for verification):

With that in mind, here are the outputs:


  /usr/libexec/qemu-kvm
  kvm
  pc-i440fx-rhel7.6.0
  x86_64
  
  
  


  /usr/share/OVMF/OVMF_CODE.secboot.fd
  
rom
pflash
  
  
yes
no
  
  
no
  

  
  


  EPYC-IBPB
  AMD
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  


  qemu64
  qemu32
  phenom
  pentium3
  pentium2
  pentium
  n270
  kvm64
  kvm32
  coreduo
  core2duo
  athlon
  Westmere-IBRS
  Westmere
  Skylake-Server-noTSX-IBRS
  Skylake-Server-IBRS
  Skylake-Server
  Skylake-Client-noTSX-IBRS
  Skylake-Client-IBRS
  Skylake-Client
  SandyBridge-IBRS
  SandyBridge
  Penryn
  Opteron_G5
  Opteron_G4
  Opteron_G3
  Opteron_G2
  Opteron_G1
  Nehalem-IBRS
  Nehalem
  IvyBridge-IBRS
  IvyBridge
  Icelake-Server-noTSX
  Icelake-Server
  Icelake-Client-noTSX
  Icelake-Client
  Haswell-noTSX-IBRS
  Haswell-noTSX
  Haswell-IBRS
  Haswell
  EPYC-IBPB
  EPYC
  Dhyana
 Cooperlake
  Conroe
  Cascadelake-Server-noTSX
  Cascadelake-Server
  Broadwell-noTSX-IBRS
  Broadwell-noTSX
  Broadwell-IBRS
  Broadwell
  486

  
  

  
disk
cdrom
floppy
lun
  
  
ide
fdc
scsi
virtio
usb
sata
  
  
virtio
virtio-transitional
virtio-non-transitional
  


  
sdl
vnc
spice
  


  
vga
cirrus
qxl
virtio
none
bochs
ramfb
  


  
subsystem
  
  
default
mandatory
requisite
optional
  
  
usb
pci
scsi
  
  
  
default
vfio
  


  
virtio
virtio-transitional
virtio-non-transitional
  
  
random
egd
  

  
  






  47
  1

  


Regarding the last two commands, I don’t have any VM running, since I cannot 
start anything on the engine.

I’m starting to suspect that this may be something in the UEFI Firmware.

Any thoughts?

Thanks,

From: Lucia Jelinkova 
Sent: Friday, November 20, 2020 5:30 AM
To: Vinícius Ferrão 
Cc: users 
Subject: Re: [ovirt-users] EPYC CPU not being detected correctly on cluster

Hi,

oVirt CPU detection depends on libvirt (and that depends on qemu) CPU models. 
Could you please run the following command to see what libvirt reports?

virsh domcapabilities

That should give you the list of CPUs known to libvirt with a usability flag 
for each CPU.

If you find out that the CPU is not usable by libvirt, you might want to dig 
deeper by querying qemu directly.

Locate any VM running on the system by
sudo virsh list --all

Use the name of a VM in the following command:
sudo virsh qemu-monitor-command [your-vm's-name] --pretty 
'{"execute":"query-cpu-definitions"}'

That would give you the list of all CPUs supported by qemu, and it will list any 
CPU features that are not available on your system.

Regards,

Lucia

On Thu, Nov 19, 2020 at 9:38 PM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Hi

I've got a strange issue with two hosts (not using the hypervisor image) with EPYC 
CPUs; on the engine I get this message:

The host CPU does not match the Cluster CPU Type and is running in a degraded 
mode. It is missing the following CPU flags: model_EPYC. Please update the host 
CPU microcode or change the Cluster CPU Type.

But it is an EPYC CPU and the firmware is updated to the latest version, yet for 
some reason oVirt does not like it.

Here’s the relevant output from VDSM:
"cpuCores": "128",
"cpuFlags": 
"ibs,vme,abm,sep,ssse3,perfctr_core,sse4_2,skip-l1dfl-vmentry,cx16,pae,misalignsse,avx2,smap,movbe,vgif,rdctl-no,extapic,clflushopt,de,sse4_1,xsaveerptr,perfctr_llc,fma,mca,sse,rdtscp,monitor,umip,mwaitx,cr8_legacy,mtrr,stibp,bmi2,pclmulqdq,amd-ssbd,lbrv,pdpe1gb,constant_tsc,vmmcall,f16c,ibrs,fsgsbase,invtsc,nopl,lm,3dnowprefetch,smca,ht,tsc_adjust,popcnt,cpb,bmi1,mmx,arat,aperfmperf,bpext,cqm_occup_llc,virt-ssbd,tce,pse,xsave,xgetbv1,topoext,sha_ni,amd_ppin,rdrand,cpuid,tsc_scale,extd_apicid,cqm,rep_good,tsc,sse4a,flushbyasid,pschange-mc-no,mds-no,ibpb,smep,clflush,tsc-deadline,fxsr,pat,avx,pfthreshold,v_vmsave_vmload,osvw,xsavec,cdp_l3,clzero,svm_lock,nonstop_tsc,adx,hw_pstate,spec-ctrl,arch-capabilities,xsaveopt,skinit,rdt_a,svm,rdpid,lah

[ovirt-users] EPYC CPU not being detected correctly on cluster

2020-11-19 Thread Vinícius Ferrão via Users
Hi

I've got a strange issue with two hosts (not using the hypervisor image) with EPYC 
CPUs; on the engine I get this message:

The host CPU does not match the Cluster CPU Type and is running in a degraded 
mode. It is missing the following CPU flags: model_EPYC. Please update the host 
CPU microcode or change the Cluster CPU Type.

But it is an EPYC CPU and the firmware is updated to the latest version, yet for 
some reason oVirt does not like it.

Here's the relevant output from VDSM:
"cpuCores": "128",
"cpuFlags": 
"ibs,vme,abm,sep,ssse3,perfctr_core,sse4_2,skip-l1dfl-vmentry,cx16,pae,misalignsse,avx2,smap,movbe,vgif,rdctl-no,extapic,clflushopt,de,sse4_1,xsaveerptr,perfctr_llc,fma,mca,sse,rdtscp,monitor,umip,mwaitx,cr8_legacy,mtrr,stibp,bmi2,pclmulqdq,amd-ssbd,lbrv,pdpe1gb,constant_tsc,vmmcall,f16c,ibrs,fsgsbase,invtsc,nopl,lm,3dnowprefetch,smca,ht,tsc_adjust,popcnt,cpb,bmi1,mmx,arat,aperfmperf,bpext,cqm_occup_llc,virt-ssbd,tce,pse,xsave,xgetbv1,topoext,sha_ni,amd_ppin,rdrand,cpuid,tsc_scale,extd_apicid,cqm,rep_good,tsc,sse4a,flushbyasid,pschange-mc-no,mds-no,ibpb,smep,clflush,tsc-deadline,fxsr,pat,avx,pfthreshold,v_vmsave_vmload,osvw,xsavec,cdp_l3,clzero,svm_lock,nonstop_tsc,adx,hw_pstate,spec-ctrl,arch-capabilities,xsaveopt,skinit,rdt_a,svm,rdpid,lahf_lm,fpu,rdseed,fxsr_opt,sse2,nrip_save,vmcb_clean,sme,cat_l3,cqm_mbm_local,irperf,overflow_recov,avic,mce,mmxext,msr,cx8,hypervisor,wdt,mba,nx,decodeassists,cmp_legacy,x2apic,perfctr_nb,succor,pni,xsaves,clwb,cqm_llc,syscall,apic,pge,npt,pse36,cmov,ssbd,pausefilter,sev,aes,wbnoinvd,cqm_mbm_total,spec_ctrl,model_qemu32,model_Opteron_G3,model_Nehalem-IBRS,model_qemu64,model_Conroe,model_kvm64,model_Penryn,model_SandyBridge,model_pentium,model_pentium2,model_kvm32,model_Nehalem,model_Opteron_G2,model_pentium3,model_Opteron_G1,model_SandyBridge-IBRS,model_486,model_Westmere-IBRS,model_Westmere",
"cpuModel": "AMD EPYC 7H12 64-Core Processor",
"cpuSockets": "2",
"cpuSpeed": "3293.405",
"cpuThreads": "256",

Any idea on why, or what to do to fix it?

Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WP6XL6ODTLJVB46MAXKCOA34PEFN576Q/


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
 0009  ActivityState = 
> [519192.551137] *** Host State ***
> [519192.551963] RIP = 0xc150a034  RSP = 0x88cd9cafbc90
> [519192.552805] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
> [519192.553646] FSBase=7f7da762a700 GSBase=88d45f2c 
> TRBase=88d45f2c4000
> [519192.554496] GDTBase=88d45f2cc000 IDTBase=ff528000
> [519192.555347] CR0=80050033 CR3=00033dc82000 CR4=001627e0
> [519192.556202] Sysenter RSP= CS:RIP=0010:91596cc0
> [519192.557058] EFER = 0x0d01  PAT = 0x0007050600070106
> [519192.557913] *** Control State ***
> [519192.558757] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
> [519192.559605] EntryControls=d1ff ExitControls=002fefff
> [519192.560453] ExceptionBitmap=00060042 PFECmask= PFECmatch=
> [519192.561306] VMEntry: intr_info= errcode=0006 ilen=
> [519192.562158] VMExit: intr_info= errcode= ilen=0001
> [519192.563006] reason=8021 qualification=
> [519192.563860] IDTVectoring: info= errcode=
> [519192.564695] TSC Offset = 0xfffcc6c7d53f16d7
> [519192.565526] TPR Threshold = 0x00
> [519192.566345] EPT pointer = 0x000b9397901e
> [519192.567162] PLE Gap=0080 Window=1000
> [519192.567984] Virtual processor ID = 0x0005
> 
> 
> 
> 
> 
> 
> 
> Thank you!
> 
> 
> 
> 
> 
> 
>>   
>> On 22 Sep 2020, at 02:30, Strahil Nikolov  wrote:
>> 
>> 
>>   
>> Interesting is that I don't find anything recent , but this one:
>> https://devblogs.microsoft.com/oldnewthing/20120511-00/?p=7653
>> 
>> Can you check if anything in the OS was updated/changed recently ?
>> 
>> Also check if the VM is with nested virtualization enabled. 
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> 
>> 
>> On Monday, 21 September 2020 at 23:56:26 GMT+3, Vinícius Ferrão 
>> wrote: 
>> 
>> 
>> 
>> 
>> 
>> Strahil, thank you man. We finally got some output:
>> 
>> 2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any 
>> NUMA nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 
>> [socket-id: 11, core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 
>> 0, thread-id: 0], CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 
>> [socket-id: 14, core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 
>> 0, thread-id: 0]
>> 2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus 
>> should be described in NUMA config, ability to start up with partial NUMA 
>> mappings is obsoleted and will be removed in future
>> KVM: entry failed, hardware error 0x8021
>> 
>> If you're running a guest on an Intel machine without unrestricted mode
>> support, the failure can be most likely due to the guest entering an invalid
>> state for Intel VT. For example, the guest maybe running in big real mode
>> which is not supported on less recent Intel processors.
>> 
>> EAX= EBX=01746180 ECX=4be7c002 EDX=000400b6
>> ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
>> EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
>> ES =   00809300
>> CS =8d00 7ff8d000  00809300
>> SS =   00809300
>> DS =   00809300
>> FS =   00809300
>> GS =   00809300
>> LDT=  000f 
>> TR =0040 04c59000 0067 8b00
>> GDT=04c5afb0 0057
>> IDT= 
>> CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=
>> DR0= DR1= DR2= 
>> DR3= 
>> DR6=0ff0 DR7=0400
>> EFER=
>> Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff 
>> ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff 
>> ff ff
>> 2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 
>> ()
>> 2020-09-16 04:12:02.212+: shutting down, reason=shutdown
>> 
>> 
>> 
>> 
>> 
>> 
>> That’s the issue, I got this on the logs of both physical machines. The 
>> probability of both machines are damaged is not quite common right? So even 
>> with the log saying it’s a hardware error it may be software related? And 
>> again, this only happens with this VM.
>> 
>> 
>>> On

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
Hi Gianluca.

On 22 Sep 2020, at 04:24, Gianluca Cecchi 
<gianluca.cec...@gmail.com> wrote:



On Tue, Sep 22, 2020 at 9:12 AM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Hi Strahil, yes, I can't find anything recent either. You dug way further 
than I did; I found some regressions in the kernel but I don't know if they're 
related or not:

https://patchwork.kernel.org/patch/5526561/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1045027

Regarding the OS, nothing new was installed, just regular Windows Updates.
And finally, about nested virtualisation: it's disabled on the hypervisor.



In your original post you wrote about the VM going suspended.
So I think there could be something useful in engine.log on the engine and/or 
vdsm.log on the hypervisor.
Could you check those?

Yes, it goes to suspend. I think this is just the engine not knowing what 
really happened and guessing it was suspended. In engine.log I only have these 
two lines:

# grep "2020-09-22 01:51" /var/log/ovirt-engine/engine.log
2020-09-22 01:51:52,604-03 INFO  
[org.ovirt.engine.core.vdsbroker.monitoring.VmAnalyzer] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] VM 
'351db98a-5f74-439f-99a4-31f611b2d250'(cerulean) moved from 'Up' --> 'Paused'
2020-09-22 01:51:52,699-03 INFO  
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] 
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-57) [] 
EVENT_ID: VM_PAUSED(1,025), VM cerulean has been paused.

Note that I've grepped by time. There are only these two lines from when it 
crashed, about 2h30m ago.

In vdsm.log, around that time, all I found with the name of the VM was a huge 
JSON with the characteristics of the VM. Is there something that I should check 
specifically? I tried some combinations of grep but nothing really useful came up.
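
For the record, these are the kinds of grep combinations I mean (the VM id is 
the one from the engine.log lines above; the exact patterns are just 
illustrative):

# grep '351db98a-5f74-439f-99a4-31f611b2d250' /var/log/vdsm/vdsm.log | grep -iE 'pause|abnormal|error' | tail
# grep '2020-09-22 01:5' /var/log/vdsm/vdsm.log | grep -iE 'pause|libvirt'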

Also, do you see anything in the event viewer of the Windows VM and/or in the 
FreeNAS logs?

FreeNAS is just cool, nothing wrong there. No errors on dmesg, nor resource 
starvation on ZFS. No overload on the disks, nothing… the storage is running 
easy.

The Windows Event Viewer is my Achilles' heel; nothing relevant there either, as 
far as I'm concerned. There are of course some mentions of an improper shutdown 
due to the crash, but nothing else. I'm looking further here and will report back 
if I find something useful.

Thanks,


Gianluca

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XTTUYAGYB6EE5I3XNNLBZEBWY363XTIQ/


[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-22 Thread Vinícius Ferrão via Users
output:

2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA 
nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, 
core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], 
CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, 
core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should 
be described in NUMA config, ability to start up with partial NUMA mappings is 
obsoleted and will be removed in future
KVM: entry failed, hardware error 0x8021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX= EBX=01746180 ECX=4be7c002 EDX=000400b6
ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =   00809300
CS =8d00 7ff8d000  00809300
SS =   00809300
DS =   00809300
FS =   00809300
GS =   00809300
LDT=  000f 
TR =0040 04c59000 0067 8b00
GDT=04c5afb0 0057
IDT= 
CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
EFER=
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 
()
2020-09-16 04:12:02.212+: shutting down, reason=shutdown






That's the issue; I got this in the logs of both physical machines. It's unlikely 
that both machines have damaged hardware, right? So even with the log saying 
it's a hardware error, it may be software related? And again, this only happens 
with this VM.

On 21 Sep 2020, at 17:36, Strahil Nikolov  wrote:

Usually libvirt's log might provide hints (yet , no clues) of any issues.

For example:
/var/log/libvirt/qemu/.log

Anything changed recently (maybe oVirt version was increased) ?

Best Regards,
Strahil Nikolov






On Monday, 21 September 2020 at 23:28:13 GMT+3, Vinícius Ferrão 
wrote:





Hi Strahil,



Both disks are VirtIO-SCSI and are Preallocated:














Thanks,









On 21 Sep 2020, at 17:09, Strahil Nikolov  wrote:



What type of disks are you using ? Any change you use thin disks ?

Best Regards,
Strahil Nikolov






On Monday, 21 September 2020 at 07:20:23 GMT+3, Vinícius Ferrão via 
Users wrote:





Hi, sorry to bump the thread.

But I'm still having this issue with the VM. These crashes are still happening, 
and I really don't know what to do. Since there's nothing in the logs, except for 
that message in `dmesg` on the host machine, I started changing settings to see 
if anything changes or if I at least get a pattern.

What I’ve tried:
1. Disabled I/O threading on the VM.
2. Increased I/O threads to 2 from 1.
3. Disabled memory ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system to see if there's any resource starvation, but 
everything seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the 
fabrics: 0 errors.

I’m really running out of ideas. The VM was working normally and suddenly this 
started.

Thanks,

PS: When I was typing this message it crashed again:

[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x00050032, shadow=0x00050032, 
gh_mask=fff7
[427483.128505] CR4: actual=0x2050, shadow=0x, 
gh_mask=f871
[427483.129342] CR3 = 0x0001849ff002
[427483.130177] RSP = 0xb10186b0  RIP = 0x8000
[427483.131014] RFLAGS=0x0002DR7 = 0x0400
[427483.131859] Sysenter RSP= CS:RIP=:
[427483.132708] CS:  sel=0x9b00, attr=0x08093, limit=0x, 
base=0x7ff9b000
[427483.133559] DS:  sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.134413] SS:  sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.135237] ES:  sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.136040] FS:  sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.136842] GS:  sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.137629] GDTR:  limit=0x0057, 
base=0xb10186eb4fb0
[427483.138409] LDTR: sel=0x

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-21 Thread Vinícius Ferrão via Users
Strahil, thank you man. We finally got some output:

2020-09-15T12:34:49.362238Z qemu-kvm: warning: CPU(s) not present in any NUMA 
nodes: CPU 10 [socket-id: 10, core-id: 0, thread-id: 0], CPU 11 [socket-id: 11, 
core-id: 0, thread-id: 0], CPU 12 [socket-id: 12, core-id: 0, thread-id: 0], 
CPU 13 [socket-id: 13, core-id: 0, thread-id: 0], CPU 14 [socket-id: 14, 
core-id: 0, thread-id: 0], CPU 15 [socket-id: 15, core-id: 0, thread-id: 0]
2020-09-15T12:34:49.362265Z qemu-kvm: warning: All CPU(s) up to maxcpus should 
be described in NUMA config, ability to start up with partial NUMA mappings is 
obsoleted and will be removed in future
KVM: entry failed, hardware error 0x8021

If you're running a guest on an Intel machine without unrestricted mode
support, the failure can be most likely due to the guest entering an invalid
state for Intel VT. For example, the guest maybe running in big real mode
which is not supported on less recent Intel processors.

EAX= EBX=01746180 ECX=4be7c002 EDX=000400b6
ESI=8b3d6080 EDI=02d70400 EBP=a19bbdfe ESP=82883770
EIP=8000 EFL=0002 [---] CPL=0 II=0 A20=1 SMM=1 HLT=0
ES =   00809300
CS =8d00 7ff8d000  00809300
SS =   00809300
DS =   00809300
FS =   00809300
GS =   00809300
LDT=  000f 
TR =0040 04c59000 0067 8b00
GDT= 04c5afb0 0057
IDT=  
CR0=00050032 CR2=c1b7ec48 CR3=001ad002 CR4=
DR0= DR1= DR2= 
DR3= 
DR6=0ff0 DR7=0400
EFER=
Code=ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ff ff ff 
ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
2020-09-16T04:11:55.344128Z qemu-kvm: terminating on signal 15 from pid 1 
()
2020-09-16 04:12:02.212+: shutting down, reason=shutdown






That's the issue; I got this in the logs of both physical machines. It's unlikely 
that both machines have damaged hardware, right? So even with the log saying 
it's a hardware error, it may be software related? And again, this only happens 
with this VM.

> On 21 Sep 2020, at 17:36, Strahil Nikolov  wrote:
> 
> Usually libvirt's log might provide hints (yet , no clues) of any issues.
> 
> For example: 
> /var/log/libvirt/qemu/.log
> 
> Anything changed recently (maybe oVirt version was increased) ?
> 
> Best Regards,
> Strahil Nikolov
> 
> 
> 
> 
> 
> 
> On Monday, 21 September 2020 at 23:28:13 GMT+3, Vinícius Ferrão 
> wrote: 
> 
> 
> 
> 
> 
> Hi Strahil, 
> 
> 
> 
> Both disks are VirtIO-SCSI and are Preallocated:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Thanks,
> 
> 
> 
> 
> 
> 
> 
> 
>>   
>> On 21 Sep 2020, at 17:09, Strahil Nikolov  wrote:
>> 
>> 
>>   
>> What type of disks are you using ? Any change you use thin disks ?
>> 
>> Best Regards,
>> Strahil Nikolov
>> 
>> 
>> 
>> 
>> 
>> 
>> On Monday, 21 September 2020 at 07:20:23 GMT+3, Vinícius Ferrão via 
>> Users wrote: 
>> 
>> 
>> 
>> 
>> 
>> Hi, sorry to bump the thread.
>> 
>> But I still with this issue on the VM. This crashes are still happening, and 
>> I really don’t know what to do. Since there’s nothing on logs, except from 
>> that message on `dmesg` of the host machine I started changing setting to 
>> see if anything changes or if I at least I get a pattern.
>> 
>> What I’ve tried:
>> 1. Disabled I/O Threading on VM.
>> 2. Increased I/O Threading to 2 form 1.
>> 3. Disabled Memory Balooning.
>> 4. Reduced VM resources form 10 CPU’s and 48GB of RAM to 6 CPU’s and 24GB of 
>> RAM.
>> 5. Moved the VM to another host.
>> 6. Dedicated a host specific to this VM.
>> 7. Check on the storage system to see if there’s any resource starvation, 
>> but everything seems to be fine.
>> 8. Checked both iSCSI switches to see if there’s something wrong with the 
>> fabrics: 0 errors.
>> 
>> I’m really running out of ideas. The VM was working normally and suddenly 
>> this started.
>> 
>> Thanks,
>> 
>> PS: When I was typing this message it crashed again:
>> 
>> [427483.126725] *** Guest State ***
>> [427483.127661] CR0: actual=0x00050032, shadow=0x00050032, 
>> gh_mask=fff7
>> [427483.128505] CR4: actual=0x2050, shadow=0x, 
>> gh_mask=f871
>> [427483.129342] CR3 = 0x0001849ff002
>> [427483.130177] RSP = 0xb10186b0  RI

[ovirt-users] Re: How to discover why a VM is getting suspended without recovery possibility?

2020-09-20 Thread Vinícius Ferrão via Users
Hi, sorry to bump the thread.

But I'm still having this issue with the VM. These crashes are still happening, 
and I really don't know what to do. Since there's nothing in the logs, except for 
that message in `dmesg` on the host machine, I started changing settings to see 
if anything changes or if I at least get a pattern.

What I’ve tried:
1. Disabled I/O threading on the VM.
2. Increased I/O threads to 2 from 1.
3. Disabled memory ballooning.
4. Reduced VM resources from 10 CPUs and 48GB of RAM to 6 CPUs and 24GB of RAM.
5. Moved the VM to another host.
6. Dedicated a host specifically to this VM.
7. Checked the storage system to see if there's any resource starvation, but 
everything seems to be fine.
8. Checked both iSCSI switches to see if there’s something wrong with the 
fabrics: 0 errors.

I’m really running out of ideas. The VM was working normally and suddenly this 
started.

Thanks,

PS: When I was typing this message it crashed again:

[427483.126725] *** Guest State ***
[427483.127661] CR0: actual=0x00050032, shadow=0x00050032, 
gh_mask=fff7
[427483.128505] CR4: actual=0x2050, shadow=0x, 
gh_mask=f871
[427483.129342] CR3 = 0x0001849ff002
[427483.130177] RSP = 0xb10186b0  RIP = 0x8000
[427483.131014] RFLAGS=0x0002 DR7 = 0x0400
[427483.131859] Sysenter RSP= CS:RIP=:
[427483.132708] CS:   sel=0x9b00, attr=0x08093, limit=0x, 
base=0x7ff9b000
[427483.133559] DS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.134413] SS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.135237] ES:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.136040] FS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.136842] GS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[427483.137629] GDTR:   limit=0x0057, 
base=0xb10186eb4fb0
[427483.138409] LDTR: sel=0x, attr=0x1, limit=0x000f, 
base=0x
[427483.139202] IDTR:   limit=0x, 
base=0x
[427483.139998] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, 
base=0xb10186eb3000
[427483.140816] EFER = 0x  PAT = 0x0007010600070106
[427483.141650] DebugCtl = 0x  DebugExceptions = 
0x
[427483.142503] Interruptibility = 0009  ActivityState = 
[427483.143353] *** Host State ***
[427483.144194] RIP = 0xc0c65024  RSP = 0x9253c0b9bc90
[427483.145043] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[427483.145903] FSBase=7fcc13816700 GSBase=925adf24 
TRBase=925adf244000
[427483.146766] GDTBase=925adf24c000 IDTBase=ff528000
[427483.147630] CR0=80050033 CR3=0010597b6000 CR4=001627e0
[427483.148498] Sysenter RSP= CS:RIP=0010:8f196cc0
[427483.149365] EFER = 0x0d01  PAT = 0x0007050600070106
[427483.150231] *** Control State ***
[427483.151077] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[427483.151942] EntryControls=d1ff ExitControls=002fefff
[427483.152800] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[427483.153661] VMEntry: intr_info= errcode=0006 ilen=
[427483.154521] VMExit: intr_info= errcode= ilen=0004
[427483.155376] reason=8021 qualification=
[427483.156230] IDTVectoring: info= errcode=
[427483.157068] TSC Offset = 0xfffccfc261506dd9
[427483.157905] TPR Threshold = 0x0d
[427483.158728] EPT pointer = 0x0009b437701e
[427483.159550] PLE Gap=0080 Window=0008
[427483.160370] Virtual processor ID = 0x0004


> On 16 Sep 2020, at 17:11, Vinícius Ferrão  wrote:
> 
> Hello,
> 
> I’m an Exchange Server VM that’s going down to suspend without possibility of 
> recovery. I need to click on shutdown and them power on. I can’t find 
> anything useful on the logs, except on “dmesg” of the host:
> 
> [47807.747606] *** Guest State ***
> [47807.747633] CR0: actual=0x00050032, shadow=0x00050032, 
> gh_mask=fff7
> [47807.747671] CR4: actual=0x2050, shadow=0x, 
> gh_mask=f871
> [47807.747721] CR3 = 0x001ad002
> [47807.747739] RSP = 0xc20904fa3770  RIP = 0x8000
> [47807.747766] RFLAGS=0x0002 DR7 = 0x0400
> [47807.747792] Sysenter RSP= CS:RIP=:
> [47807.747821] CS:   sel=0x9100, attr=0x08093, limit=0x, 
> base=0x7ff91000
> [47807.747855] DS:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [47807.747889] SS:   sel=0x, attr=0x08093, limit=0x, 
> base=0x
> [47807.747923] ES:   sel=0x, 

[ovirt-users] How to discover why a VM is getting suspended without recovery possibility?

2020-09-16 Thread Vinícius Ferrão via Users
Hello,

I have an Exchange Server VM that's going into suspend without the possibility of 
recovery. I need to click on shutdown and then power on. I can't find anything 
useful in the logs, except in "dmesg" on the host:

[47807.747606] *** Guest State ***
[47807.747633] CR0: actual=0x00050032, shadow=0x00050032, 
gh_mask=fff7
[47807.747671] CR4: actual=0x2050, shadow=0x, 
gh_mask=f871
[47807.747721] CR3 = 0x001ad002
[47807.747739] RSP = 0xc20904fa3770  RIP = 0x8000
[47807.747766] RFLAGS=0x0002 DR7 = 0x0400
[47807.747792] Sysenter RSP= CS:RIP=:
[47807.747821] CS:   sel=0x9100, attr=0x08093, limit=0x, 
base=0x7ff91000
[47807.747855] DS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[47807.747889] SS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[47807.747923] ES:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[47807.747957] FS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[47807.747991] GS:   sel=0x, attr=0x08093, limit=0x, 
base=0x
[47807.748025] GDTR:   limit=0x0057, 
base=0x80817e7d5fb0
[47807.748059] LDTR: sel=0x, attr=0x1, limit=0x000f, 
base=0x
[47807.748093] IDTR:   limit=0x, 
base=0x
[47807.748128] TR:   sel=0x0040, attr=0x0008b, limit=0x0067, 
base=0x80817e7d4000
[47807.748162] EFER = 0x  PAT = 0x0007010600070106
[47807.748189] DebugCtl = 0x  DebugExceptions = 
0x
[47807.748221] Interruptibility = 0009  ActivityState = 
[47807.748248] *** Host State ***
[47807.748263] RIP = 0xc0c65024  RSP = 0x9252bda5fc90
[47807.748290] CS=0010 SS=0018 DS= ES= FS= GS= TR=0040
[47807.748318] FSBase=7f46d462a700 GSBase=9252ffac 
TRBase=9252ffac4000
[47807.748351] GDTBase=9252ffacc000 IDTBase=ff528000
[47807.748377] CR0=80050033 CR3=00105ac8c000 CR4=001627e0
[47807.748407] Sysenter RSP= CS:RIP=0010:8f196cc0
[47807.748435] EFER = 0x0d01  PAT = 0x0007050600070106
[47807.748461] *** Control State ***
[47807.748478] PinBased=003f CPUBased=b6a1edfa SecondaryExec=0ceb
[47807.748507] EntryControls=d1ff ExitControls=002fefff
[47807.748531] ExceptionBitmap=00060042 PFECmask= PFECmatch=
[47807.748561] VMEntry: intr_info= errcode=0006 ilen=
[47807.748589] VMExit: intr_info= errcode= ilen=0001
[47807.748618] reason=8021 qualification=
[47807.748645] IDTVectoring: info= errcode=
[47807.748669] TSC Offset = 0xf9b8c8d943b6
[47807.748699] TPR Threshold = 0x00
[47807.748715] EPT pointer = 0x00105cd5601e
[47807.748735] PLE Gap=0080 Window=1000
[47807.748755] Virtual processor ID = 0x0003

So something really went crazy. The VM has been going down at least twice a day 
for the last 5 days.

At first I thought it would be a hardware issue, so I restarted the VM on 
another host, and the same thing happened.

As for the VM, it's configured with 10 CPUs and 48GB of RAM, running on oVirt 
4.3.10 with iSCSI storage on a FreeNAS box, where the VM disks live; there is 
a 300GB disk for C:\ and a 2TB disk for D:\.

Any idea on how to start troubleshooting it?

Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/X34PTPXY5GLAULTQ2ZCB3PGZA2MON5KX/


[ovirt-users] Re: Multiple GPU Passthrough with NVLink (Invalid I/O region)

2020-09-04 Thread Vinícius Ferrão via Users
Thanks Michael and Arman.

To make things clear, you guys are using passthrough, right? It's not vGPU. The 
4x GPUs are added on the "Host Devices" tab of the VM.
What I'm trying to achieve is to add the 4x V100 directly to one specific VM.

And finally, can you guys confirm which BIOS type is being used in your 
machines? I'm using the Q35 chipset with UEFI BIOS. I haven't tested it with 
legacy; perhaps I'll give it a try.

Thanks again.

On 4 Sep 2020, at 14:09, Michael Jones 
<m...@mikejonesey.co.uk> wrote:

Also use multiple t4, also p4, titans, no issues but never used the nvlink

On Fri, 4 Sep 2020, 16:02 Arman Khalatyan, 
<arm2...@gmail.com> wrote:
hi,
with the 2xT4 we haven't seen any trouble. we have no nvlink there.

did u try to disable the nvlink?



Vinícius Ferrão via Users <users@ovirt.org> wrote on 
Fri., 4 Sept. 2020, 08:39:
Hello, here we go again.

I'm trying to pass through 4x NVIDIA Tesla V100 GPUs (with NVLink) to a single 
VM, but things aren't going well. Only one GPU is usable in the VM; lspci is able 
to show all the GPUs, but three of them are unusable:

08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)
09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)
0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)
0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)

There are some errors in dmesg regarding a misconfigured BIOS:

[   27.295972] nvidia: loading out-of-tree module taints kernel.
[   27.295980] nvidia: module license 'NVIDIA' taints kernel.
[   27.295981] Disabling lock debugging due to kernel taint
[   27.304180] nvidia: module verification failed: signature and/or required 
key missing - tainting kernel
[   27.364244] nvidia-nvlink: Nvlink Core is being initialized, major device 
number 241
[   27.579261] nvidia :09:00.0: enabling device ( -> 0002)
[   27.579560] NVRM: This PCI I/O region assigned to your NVIDIA device is 
invalid:
   NVRM: BAR1 is 0M @ 0x0 (PCI::09:00.0)
[   27.579560] NVRM: The system BIOS may have misconfigured your GPU.
[   27.579566] nvidia: probe of :09:00.0 failed with error -1
[   27.580727] NVRM: This PCI I/O region assigned to your NVIDIA device is 
invalid:
   NVRM: BAR0 is 0M @ 0x0 (PCI::0a:00.0)
[   27.580729] NVRM: The system BIOS may have misconfigured your GPU.
[   27.580734] nvidia: probe of :0a:00.0 failed with error -1
[   27.581299] NVRM: This PCI I/O region assigned to your NVIDIA device is 
invalid:
   NVRM: BAR0 is 0M @ 0x0 (PCI::0b:00.0)
[   27.581300] NVRM: The system BIOS may have misconfigured your GPU.
[   27.581305] nvidia: probe of :0b:00.0 failed with error -1
[   27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
[   27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.51.06  Sun 
Jul 19 20:02:54 UTC 2020
[   27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for 
UNIX platforms  450.51.06  Sun Jul 19 20:06:42 UTC 2020

The host is Secure Intel Skylake (x86_64). VM is running with Q35 Chipset with 
UEFI (pc-q35-rhel8.2.0)

I've tried changing the I/O mapping options on the host, with 56TB and 12TB, 
without success. Same results. I didn't try 512GB since the machine has 768GB 
of system RAM.

Tried blacklisting nouveau on the host: nothing.
Installed the NVIDIA drivers on the host: nothing.

On the host I can use the 4x V100, but inside a single VM it's impossible.
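
One thing that can be checked from inside the guest is how the BARs actually got 
assigned (a minimal sketch; the PCI addresses are the ones from the lspci 
listing above):

lspci -vvv -s 09:00.0 | grep -i region

A working card should show large 64-bit memory regions here; on the three broken 
ones the regions come up empty/unassigned, consistent with the NVRM messages above.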

Any suggestions?



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/73CXU27AX6ND6EXUJKBKKRWM6DJH7UL7/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PIO4DIVUU4JWG5FXYW3NQSVXCFZWYV26/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/FY5J2VGAZXUOE3K5QJIS3ETXP76M3CHO/


[ovirt-users] Multiple GPU Passthrough with NVLink (Invalid I/O region)

2020-09-04 Thread Vinícius Ferrão via Users
Hello, here we go again.

I'm trying to pass through 4x NVIDIA Tesla V100 GPUs (with NVLink) to a single 
VM, but things aren't going well. Only one GPU is usable in the VM; lspci is able 
to show all the GPUs, but three of them are unusable:

08:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)
09:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)
0a:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)
0b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB] (rev 
a1)

There are some errors in dmesg regarding a misconfigured BIOS:

[   27.295972] nvidia: loading out-of-tree module taints kernel.
[   27.295980] nvidia: module license 'NVIDIA' taints kernel.
[   27.295981] Disabling lock debugging due to kernel taint
[   27.304180] nvidia: module verification failed: signature and/or required 
key missing - tainting kernel
[   27.364244] nvidia-nvlink: Nvlink Core is being initialized, major device 
number 241
[   27.579261] nvidia :09:00.0: enabling device ( -> 0002)
[   27.579560] NVRM: This PCI I/O region assigned to your NVIDIA device is 
invalid:
   NVRM: BAR1 is 0M @ 0x0 (PCI::09:00.0)
[   27.579560] NVRM: The system BIOS may have misconfigured your GPU.
[   27.579566] nvidia: probe of :09:00.0 failed with error -1
[   27.580727] NVRM: This PCI I/O region assigned to your NVIDIA device is 
invalid:
   NVRM: BAR0 is 0M @ 0x0 (PCI::0a:00.0)
[   27.580729] NVRM: The system BIOS may have misconfigured your GPU.
[   27.580734] nvidia: probe of :0a:00.0 failed with error -1
[   27.581299] NVRM: This PCI I/O region assigned to your NVIDIA device is 
invalid:
   NVRM: BAR0 is 0M @ 0x0 (PCI::0b:00.0)
[   27.581300] NVRM: The system BIOS may have misconfigured your GPU.
[   27.581305] nvidia: probe of :0b:00.0 failed with error -1
[   27.581333] NVRM: The NVIDIA probe routine failed for 3 device(s).
[   27.581334] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  450.51.06  Sun 
Jul 19 20:02:54 UTC 2020
[   27.649128] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for 
UNIX platforms  450.51.06  Sun Jul 19 20:06:42 UTC 2020

The host is Secure Intel Skylake (x86_64). VM is running with Q35 Chipset with 
UEFI (pc-q35-rhel8.2.0)

I've tried changing the I/O mapping options on the host, with 56TB and 12TB, 
without success. Same results. I didn't try 512GB since the machine has 768GB 
of system RAM.

Tried blacklisting nouveau on the host: nothing.
Installed the NVIDIA drivers on the host: nothing.

On the host I can use the 4x V100, but inside a single VM it's impossible.

Any suggestions?



___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/73CXU27AX6ND6EXUJKBKKRWM6DJH7UL7/


[ovirt-users] Mellanox OFED with oVirt

2020-09-01 Thread Vinícius Ferrão via Users
Hello,

Has anyone had success using Mellanox OFED with oVirt? I've already learned some things:

1. I can’t use oVirt Node.
2. Mellanox OFED cannot be installed with mlnx-ofed-all since it breaks dnf. We 
need to rely on the upstream RDMA implementation.
3. The way to go is running: dnf install mlnx-ofed-dpdk-upstream-libs

But after the installation I ended up with broken dnf:

[root@c4140 ~]# dnf update
Updating Subscription Management repositories.
Last metadata expiration check: 0:03:54 ago on Tue 01 Sep 2020 11:52:41 PM -03.
Error: 
 Problem: both package mlnx-ofed-all-user-only-5.1-0.6.6.0.rhel8.2.noarch and 
mlnx-ofed-all-5.1-0.6.6.0.rhel8.2.noarch obsolete glusterfs-rdma
  - cannot install the best update candidate for package 
glusterfs-rdma-6.0-37.el8.x86_64
  - package ovirt-host-4.4.1-4.el8ev.x86_64 requires glusterfs-rdma, but none 
of the providers can be installed
  - package mlnx-ofed-all-5.1-0.6.6.0.rhel8.2.noarch obsoletes glusterfs-rdma 
provided by glusterfs-rdma-6.0-37.el8.x86_64
  - package glusterfs-rdma-3.12.2-40.2.el8.x86_64 requires glusterfs(x86-64) = 
3.12.2-40.2.el8, but none of the providers can be installed
  - package glusterfs-rdma-6.0-15.el8.x86_64 requires glusterfs(x86-64) = 
6.0-15.el8, but none of the providers can be installed
  - package glusterfs-rdma-6.0-20.el8.x86_64 requires glusterfs(x86-64) = 
6.0-20.el8, but none of the providers can be installed
  - cannot install both glusterfs-3.12.2-40.2.el8.x86_64 and 
glusterfs-6.0-37.el8.x86_64
  - cannot install both glusterfs-6.0-15.el8.x86_64 and 
glusterfs-6.0-37.el8.x86_64
  - cannot install both glusterfs-6.0-20.el8.x86_64 and 
glusterfs-6.0-37.el8.x86_64
  - cannot install the best update candidate for package 
ovirt-host-4.4.1-4.el8ev.x86_64
  - cannot install the best update candidate for package 
glusterfs-6.0-37.el8.x86_64
(try to add '--allowerasing' to command line to replace conflicting packages or 
'--skip-broken' to skip uninstallable packages or '--nobest' to use not only 
best candidate packages)

These are the packages installed:

[root@c4140 ~]# rpm -qa *mlnx*
mlnx-dpdk-19.11.0-1.51066.x86_64
mlnx-ofa_kernel-devel-5.1-OFED.5.1.0.6.6.1.rhel8u2.x86_64
mlnx-ethtool-5.4-1.51066.x86_64
mlnx-dpdk-devel-19.11.0-1.51066.x86_64
mlnx-ofa_kernel-5.1-OFED.5.1.0.6.6.1.rhel8u2.x86_64
mlnx-dpdk-doc-19.11.0-1.51066.noarch
mlnx-dpdk-tools-19.11.0-1.51066.x86_64
mlnx-ofed-dpdk-upstream-libs-5.1-0.6.6.0.rhel8.2.noarch
kmod-mlnx-ofa_kernel-5.1-OFED.5.1.0.6.6.1.rhel8u2.x86_64
mlnx-iproute2-5.6.0-1.51066.x86_64

And finally this is the repo that I’m using:
[root@c4140 ~]# cat /etc/yum.repos.d/mellanox_mlnx_ofed.repo 
#
# Mellanox Technologies Ltd. public repository configuration file.
# For more information, refer to http://linux.mellanox.com
#

[mlnx_ofed_latest_base]
name=Mellanox Technologies rhel8.2-$basearch mlnx_ofed latest
baseurl=http://linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.2/$basearch
enabled=1
gpgkey=http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox
gpgcheck=1


So, has anyone had success with this?
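
In case it helps someone reproduce this, the conflicting Obsoletes can be listed 
straight from the repos (a minimal sketch, assuming the Mellanox repo above is 
enabled):

# dnf repoquery --whatobsoletes glusterfs-rdma
# dnf repoquery --obsoletes mlnx-ofed-all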

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z2QBLGLN5NNUUCGHYM5HL4QDHIPZ6J72/


[ovirt-users] Re: Missing model_FLAGS on specific host

2020-08-28 Thread Vinícius Ferrão via Users
Hi, just a follow-up here.

Something was messed up with the Dell firmware.

I've refreshed it and completely erased the NVRAM. I reconfigured the firmware, 
and now the host behaves as expected:

[root@c4140 ~]# !vdsm
vdsm-client Host getCapabilities | egrep "cpuFlags|cpuModel|Skylake"
"cpuFlags": 
"pln,msr,acpi,sse2,smx,rdrand,cqm_occup_llc,xsaveopt,rdseed,rtm,epb,sse,hypervisor,ibrs,cmov,nopl,cpuid_fault,pse,f16c,spec_ctrl,adx,constant_tsc,bts,rdt_a,pae,nx,tsc,x2apic,sep,pat,cqm_mbm_total,pebs,xsave,smep,ds_cpl,fma,ospke,mca,mmx,pge,pku,pcid,aperfmperf,ssse3,flexpriority,cqm,avx512dq,avx512vl,fpu,umip,flush_l1d,ssbd,lm,syscall,movbe,vpid,ht,xsavec,invpcid_single,3dnowprefetch,tsc_deadline_timer,cx8,rep_good,tm2,avx,cx16,rdtscp,ss,popcnt,lahf_lm,stibp,arch_perfmon,smap,clflushopt,invtsc,vmx,dts,xsaves,md_clear,dtherm,avx512f,bmi2,mpx,arch-capabilities,dtes64,avx512cd,mba,avx2,ept,pts,vme,vnmi,fxsr,pschange-mc-no,dca,avx512bw,tsc_adjust,cqm_llc,pclmulqdq,cat_l3,bmi1,monitor,pti,arat,abm,cpuid,clflush,mce,sse4_2,erms,nonstop_tsc,apic,cdp_l3,fsgsbase,sdbg,art,xgetbv1,tpr_shadow,cqm_mbm_local,clwb,pdpe1gb,xtpr,ida,de,pbe,intel_pt,ibpb,est,intel_ppin,tm,pni,aes,amd-ssbd,md-clear,skip-l1dfl-vmentry,hle,pdcm,invpcid,mtrr,pse36,sse4_1,xtopology,model_core2duo,model_pentium2,model_Skylake-Server-IBRS,model_Haswell,model_Skylake-Server,model_IvyBridge-IBRS,model_Penryn,model_Broadwell-noTSX-IBRS,model_qemu64,model_n270,model_kvm32,model_coreduo,model_Broadwell-IBRS,model_Skylake-Client-noTSX-IBRS,model_Haswell-IBRS,model_Broadwell-noTSX,model_Skylake-Client,model_SandyBridge,model_Skylake-Server-noTSX-IBRS,model_SandyBridge-IBRS,model_Broadwell,model_kvm64,model_Nehalem-IBRS,model_IvyBridge,model_pentium,model_Skylake-Client-IBRS,model_Conroe,model_Haswell-noTSX,model_Opteron_G2,model_Westmere,model_qemu32,model_486,model_pentium3,model_Opteron_G1,model_Westmere-IBRS,model_Haswell-noTSX-IBRS,model_Nehalem",
"cpuModel": "Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz",

Not sure what really happened, but those actions solved the issue.

On 28 Aug 2020, at 03:29, Vinícius Ferrão via Users 
<users@ovirt.org> wrote:

Hi,

I've got a strange issue on one of my hosts; it's missing a lot of the CPU flags 
that oVirt seems to require:

[root@c4140 ~]# vdsm-client Host getCapabilities | egrep "cpuFlags|cpuModel"
"cpuFlags": 
"ssse3,mca,ept,pdpe1gb,vmx,clwb,smep,msr,acpi,pge,sse4_2,nopl,cqm_mbm_total,cx16,avx512vl,aperfmperf,xsaves,3dnowprefetch,nonstop_tsc,cmov,mce,intel_pt,avx512f,fpu,pku,tsc,sdbg,erms,pse36,md_clear,apic,sse,pcid,clflushopt,xtopology,pts,monitor,vpid,cpuid,hle,mba,ss,cqm,avx2,ibpb,xgetbv1,flush_l1d,mmx,epb,pti,fxsr,dca,nx,syscall,stibp,mtrr,cx8,sse2,avx,sep,intel_ppin,lm,tm,bts,adx,bmi1,smx,popcnt,pclmulqdq,lahf_lm,mpx,rdseed,cqm_llc,avx512cd,cdp_l3,f16c,invpcid,fsgsbase,cpuid_fault,tm2,smap,dts,pse,xsave,sse4_1,constant_tsc,pat,tsc_deadline_timer,vnmi,avx512dq,dtes64,xsaveopt,ida,pdcm,tpr_shadow,pln,de,x2apic,avx512bw,pae,rdrand,clflush,rdtscp,art,cqm_mbm_local,pebs,ssbd,movbe,pbe,tsc_adjust,vme,ht,est,bmi2,cat_l3,dtherm,ospke,rdt_a,aes,ibrs,rep_good,fma,xtpr,ds_cpl,abm,xsavec,invpcid_single,flexpriority,cqm_occup_llc,pni,rtm,arat,arch_perfmon",
"cpuModel": "Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz",

A properly working host has flags like these:

model_Westmere-IBRS,model_kvm32,model_core2duo,model_Opteron_G1,model_Broadwell,model_qemu64,model_Broadwell-noTSX,model_Nehalem-IBRS,model_Haswell-IBRS,model_pentium2,model_Broadwell-IBRS,model_Haswell-noTSX,model_Haswell,model_Haswell-noTSX-IBRS,model_Conroe,model_pentium,model_n270,model_Nehalem,model_IvyBridge-IBRS,model_kvm64,model_SandyBridge,model_pentium3,model_Broadwell-noTSX-IBRS,model_qemu32,model_486,model_IvyBridge,model_SandyBridge-IBRS,model_Westmere,model_Penryn,model_Opteron_G2,model_coreduo",

But on this machine they're totally missing. I know these model_ flags are an 
oVirt thing, since they aren't default on the CPUs.

The host machine is a Dell C4140 compute node and the firmware is fully updated, 
so I've done the basics to figure out what's happening.

Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XKXBZLYRMMJGBFATFUOWLN2CBS6T75DM/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Z5DOWN4F4TYI6FMCOVYDEGT4GBZ27SSF/


[ovirt-users] Missing model_FLAGS on specific host

2020-08-28 Thread Vinícius Ferrão via Users
Hi,

I've got a strange issue on one of my hosts; it's missing a lot of the CPU flags 
that oVirt seems to require:

[root@c4140 ~]# vdsm-client Host getCapabilities | egrep "cpuFlags|cpuModel"
"cpuFlags": 
"ssse3,mca,ept,pdpe1gb,vmx,clwb,smep,msr,acpi,pge,sse4_2,nopl,cqm_mbm_total,cx16,avx512vl,aperfmperf,xsaves,3dnowprefetch,nonstop_tsc,cmov,mce,intel_pt,avx512f,fpu,pku,tsc,sdbg,erms,pse36,md_clear,apic,sse,pcid,clflushopt,xtopology,pts,monitor,vpid,cpuid,hle,mba,ss,cqm,avx2,ibpb,xgetbv1,flush_l1d,mmx,epb,pti,fxsr,dca,nx,syscall,stibp,mtrr,cx8,sse2,avx,sep,intel_ppin,lm,tm,bts,adx,bmi1,smx,popcnt,pclmulqdq,lahf_lm,mpx,rdseed,cqm_llc,avx512cd,cdp_l3,f16c,invpcid,fsgsbase,cpuid_fault,tm2,smap,dts,pse,xsave,sse4_1,constant_tsc,pat,tsc_deadline_timer,vnmi,avx512dq,dtes64,xsaveopt,ida,pdcm,tpr_shadow,pln,de,x2apic,avx512bw,pae,rdrand,clflush,rdtscp,art,cqm_mbm_local,pebs,ssbd,movbe,pbe,tsc_adjust,vme,ht,est,bmi2,cat_l3,dtherm,ospke,rdt_a,aes,ibrs,rep_good,fma,xtpr,ds_cpl,abm,xsavec,invpcid_single,flexpriority,cqm_occup_llc,pni,rtm,arat,arch_perfmon",
"cpuModel": "Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz",

A properly working host has flags like these:

model_Westmere-IBRS,model_kvm32,model_core2duo,model_Opteron_G1,model_Broadwell,model_qemu64,model_Broadwell-noTSX,model_Nehalem-IBRS,model_Haswell-IBRS,model_pentium2,model_Broadwell-IBRS,model_Haswell-noTSX,model_Haswell,model_Haswell-noTSX-IBRS,model_Conroe,model_pentium,model_n270,model_Nehalem,model_IvyBridge-IBRS,model_kvm64,model_SandyBridge,model_pentium3,model_Broadwell-noTSX-IBRS,model_qemu32,model_486,model_IvyBridge,model_SandyBridge-IBRS,model_Westmere,model_Penryn,model_Opteron_G2,model_coreduo",

But on this machine they're totally missing. I know these model_ flags are an 
oVirt thing, since they aren't default on the CPUs.

The host machine is a Dell C4140 compute node and the firmware is fully updated, 
so I've done the basics to figure out what's happening.

Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/XKXBZLYRMMJGBFATFUOWLN2CBS6T75DM/


[ovirt-users] Re: POWER9 (ppc64le) Support on oVirt 4.4.1

2020-08-27 Thread Vinícius Ferrão via Users


On 27 Aug 2020, at 16:03, Arik Hadas 
<aha...@redhat.com> wrote:



On Thu, Aug 27, 2020 at 8:40 PM Vinícius Ferrão via Users 
<users@ovirt.org> wrote:
Hi Michal,

On 27 Aug 2020, at 05:08, Michal Skrivanek 
<michal.skriva...@redhat.com> wrote:



On 26 Aug 2020, at 20:50, Vinícius Ferrão via Users 
<users@ovirt.org> wrote:

Okay here we go Arik.

With your insight I’ve done the following:

# rpm -Va

This showed what's zeroed out on the machine; since it was a lot of things, I 
just went crazy and did:

you should still have host deploy logs on the engine machine. it’s weird it 
succeeded, unless it somehow happened afterwards?

It only succeeded after my yum reinstall rampage.

yum list installed | cut -f 1 -d " " > file
yum -y reinstall `cat file | xargs`

Reinstalled everything.
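
(In hindsight, a less brute-force option would have been to reinstall only the 
owners of the damaged files; a rough sketch, assuming the affected path is the 
last field of each rpm -Va line:)

rpm -Va | awk '{print $NF}' | xargs -r rpm -qf 2>/dev/null | grep -v ' ' | sort -u > /tmp/broken-pkgs
yum -y reinstall $(cat /tmp/broken-pkgs)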

Everything worked as expected and I finally added the machine back to the 
cluster. It’s operational.

eh, I wouldn’t trust it much. did you run redeploy at least?

I've done a reinstall from the web interface of the engine. I can reinstall the 
host; there's nothing running on it… gonna try a third format.



Now I've got another issue: I have 3 VMs that are ppc64le, and when trying to 
import them, the Hosted Engine identifies them as x86_64:



So…

This appears to be a bug. Any idea on how to force it back to ppc64? I can't 
manually force the import on the Hosted Engine since there are no buttons to do 
this…

how exactly did you import them? could be a bug indeed.
we don’t support changing it as it doesn’t make sense, the guest can’t be 
converted

Yeah. I did the normal procedure: added the storage domain to the engine and 
clicked on "Import VM". Immediately it was detected as x86_64.

Since I wasn't able to upgrade my environment from 4.3.10 to 4.4.1 due to 
random errors when redeploying the engine with the backup from 4.3.10, I just 
reinstalled it, reconfigured everything and then imported the storage domains.

I don't know where the information about the architecture is stored in the 
storage domain; I tried to search for some metadata files inside the domain but 
nothing came up. Is there a way to force this change? There must be a way.

I even tried to import the machine as x86_64, so I could delete the VM and just 
reattach the disks to a new one, effectively not losing the data, but…



Yeah, so something is broken. The check during the import appears to be OK, but 
the interface does not allow me to import it to the ppc64le machine, since it's 
read as x86_64.

Could you please provide the output of the following query from the database:
select * from unregistered_ovf_of_entities where 
entity_name='energy.versatushpc.com.br';

Sure, there you go:

 46ad1d80-2649-48f5-92e6-e5489d11d30c | 
energy.versatushpc.com.br<http://energy.versatushpc.com.br> | VM  | 
   1 | | d19456e4-0051-456e-b33c-57348a78c2e0 |
 http://schemas.dmtf.org/ovf/envelope/1/; 
xmlns:rasd="http://schemas.dmtf.org/wbem/wscim/1/cim
-schema/2/CIM_ResourceAllocationSettingData" 
xmlns:vssd="http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_VirtualSystemSettingData;
 xmlns:xsi="http://ww
w.w3.org/2001/XMLSchema-instance<http://w.w3.org/2001/XMLSchema-instance>" 
ovf:version="4.1.0.0">List of networks
List of Virtual Diskshttp://www.vmwa
re.com/specifications/vmdk.html#sparse<http://re.com/specifications/vmdk.html#sparse>"
 ovf:volume-format="RAW" ovf:volume-type="Sparse" 
ovf:disk-interface="VirtIO_SCSI" ovf:read-only="false" ovf:shareable
="false" ovf:boot="true" ovf:pass-discard="false" 
ovf:disk-alias="energy.versatushpc.com.br_Disk1" ovf:disk-description="" 
ovf:wipe-after-delete="false">energy.versatushpc.com.br<http://energy.versatushpc.com.br>Holds
 Kosen backend and frontend prod
 services (nginx + 
docker)2020/08/19 
20:11:332020/08/20 18:37:41falseguest_agentfalse1Etc/GMT984.31AUTO_RESUME2730falsefalsefalse16ea16f22-45d7-11ea-bd83-00163e518b7c0falsetruetruetrueLOCK_SCREEN016384truefalseBlastoise----Blanktrue032644894-755e-4588-b967-8fb9dc3277952false000
0----Blankfalse2020/08/20 17:52:35Guest Operating 
Systemother_linux_ppc642 CPU, 4096 MemoryENGINE 
4.1.0.02 virtual 
cpuNumber of virtual 
CPU132111624096 
>MB of memoryMemory Size24MegaBytes4096energy.versatushpc.com.br_Disk1b1d9832e-076f-48f3-a300-0b5cdf0949af17775b24a9-6a32-431a-831f-4ac9b3b31152/b1d9832e-076f-48f3-a300-0b5cdf0949af--------d19456e4-0051-456e-b33c-57348a78c2e06c54f91e-89bf-45b4-bc48-56e74c4efd5e2020/08/19 
20:13:051970/01/01 
00:00:002020/08
/20 
18:37:41diskdisk{type=drive,
 bus=0, controller=1, target=0, unit=0}<
BootOrder>1truefalseua

[ovirt-users] Re: POWER9 (ppc64le) Support on oVirt 4.4.1

2020-08-27 Thread Vinícius Ferrão via Users
Hi Michal,

On 27 Aug 2020, at 05:08, Michal Skrivanek 
<michal.skriva...@redhat.com> wrote:



On 26 Aug 2020, at 20:50, Vinícius Ferrão via Users 
<users@ovirt.org> wrote:

Okay here we go Arik.

With your insight I’ve done the following:

# rpm -Va

This showed what's zeroed out on the machine; since it was a lot of things, I 
just went crazy and did:

you should still have host deploy logs on the engine machine. it’s weird it 
succeeded, unless it somehow happened afterwards?

It only succeeded after my yum reinstall rampage.

yum list installed | cut -f 1 -d " " > file
yum -y reinstall `cat file | xargs`

Reinstalled everything.

Everything worked as expected and I finally added the machine back to the 
cluster. It’s operational.

eh, I wouldn’t trust it much. did you run redeploy at least?

I've done a reinstall from the web interface of the engine. I can reinstall the 
host; there's nothing running on it… gonna try a third format.



Now I've got another issue: I have 3 VMs that are ppc64le, and when trying to 
import them, the Hosted Engine identifies them as x86_64:



So…

This appears to be a bug. Any idea on how to force it back to ppc64? I can't 
manually force the import on the Hosted Engine since there are no buttons to do 
this…

how exactly did you import them? could be a bug indeed.
we don’t support changing it as it doesn’t make sense, the guest can’t be 
converted

Yeah. I did the normal procedure: added the storage domain to the engine and 
clicked on "Import VM". Immediately it was detected as x86_64.

Since I wasn’t able to upgrade my environment from 4.3.10 to 4.4.1 due to 
random errors when redeploying the engine with the backup from 4.3.10, I just 
reinstalled it, reconfigured everything and them imported the storage domains.

I don’t know where the information about architecture is stored in the storage 
domain, I tried to search for some metadata files inside the domain but nothing 
come up. Is there a way to force this change? It must be a way.
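
For what it's worth, on a file-based (NFS) domain the per-VM OVF lives inside the 
OVF_STORE disks, which are plain tar archives, so it is at least possible to see 
what the engine will read back on import. A rough sketch (the paths are 
placeholders for the real image/volume UUIDs under the domain):

# sketch: dump the OVF_STORE contents and check which architecture/OS type is recorded
cd /rhev/data-center/mnt/<server>:<export>/<domain-uuid>/images/<ovf-store-image-uuid>
tar -xOf <ovf-store-volume-uuid> | grep -o 'other_linux_ppc64\|ppc64\|x86_64' | sort | uniq -c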

I even tried to import the machine as x86_64, so I could delete the VM and just 
reattach the disks to a new one, effectively not losing the data, but…


Yeah, so something is broken. The check during the import appears to be OK, but 
the interface does not allow me to import it to the ppc64le machine, since it's 
read as x86_64.


Thanks,
michal


Ideas?

On 26 Aug 2020, at 15:04, Vinícius Ferrão 
mailto:fer...@versatushpc.com.br>> wrote:

What a strange thing is happening here:

[root@power ~]# file /usr/bin/vdsm-client
/usr/bin/vdsm-client: empty
[root@power ~]# ls -l /usr/bin/vdsm-client
-rwxr-xr-x. 1 root root 0 Jul  3 06:23 /usr/bin/vdsm-client

A lot of files are just empty. I tried reinstalling vdsm-client and that worked, 
but there are other zeroed files:

Transaction test succeeded.
Running transaction
  Preparing:
 1/1
  Reinstalling : vdsm-client-4.40.22-1.el8ev.noarch 
 1/2
  Cleanup  : vdsm-client-4.40.22-1.el8ev.noarch 
 2/2
  Running scriptlet: vdsm-client-4.40.22-1.el8ev.noarch 
 2/2
/sbin/ldconfig: File /lib64/libkadm5clnt.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so.11 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so.11.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so.11 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so.11.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libsensors.so.4 is empty, not checked.
/sbin/ldconfig: File /lib64/libsensors.so.4.4.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-admin.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-admin.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-lxc.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-lxc.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-qemu.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-qemu.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libisns.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libiscsi.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libopeniscsiusr.so.0 is emp

[ovirt-users] Re: POWER9 (ppc64le) Support on oVirt 4.4.1

2020-08-26 Thread Vinícius Ferrão via Users
sm-client-4.40.22-1.el8ev.noarch 
 1/2
  Verifying: vdsm-client-4.40.22-1.el8ev.noarch 
 2/2
Installed products updated.

Reinstalled:
  vdsm-client-4.40.22-1.el8ev.noarch



I’ve never seen something like this.

I’ve already reinstalled the host from the ground and the same thing happens.


On 26 Aug 2020, at 14:28, Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:

Hello Arik,
This is probably the issue. Output totally empty:

[root@power ~]# vdsm-client Host getCapabilities
[root@power ~]#

Here are the packages installed on the machine: (grepped ovirt and vdsm on rpm 
-qa)
ovirt-imageio-daemon-2.0.8-1.el8ev.ppc64le
ovirt-imageio-client-2.0.8-1.el8ev.ppc64le
ovirt-host-4.4.1-4.el8ev.ppc64le
ovirt-vmconsole-host-1.0.8-1.el8ev.noarch
ovirt-host-dependencies-4.4.1-4.el8ev.ppc64le
ovirt-imageio-common-2.0.8-1.el8ev.ppc64le
ovirt-vmconsole-1.0.8-1.el8ev.noarch
vdsm-hook-vmfex-dev-4.40.22-1.el8ev.noarch
vdsm-hook-fcoe-4.40.22-1.el8ev.noarch
vdsm-hook-ethtool-options-4.40.22-1.el8ev.noarch
vdsm-hook-openstacknet-4.40.22-1.el8ev.noarch
vdsm-common-4.40.22-1.el8ev.noarch
vdsm-python-4.40.22-1.el8ev.noarch
vdsm-jsonrpc-4.40.22-1.el8ev.noarch
vdsm-api-4.40.22-1.el8ev.noarch
vdsm-yajsonrpc-4.40.22-1.el8ev.noarch
vdsm-4.40.22-1.el8ev.ppc64le
vdsm-network-4.40.22-1.el8ev.ppc64le
vdsm-http-4.40.22-1.el8ev.noarch
vdsm-client-4.40.22-1.el8ev.noarch
vdsm-hook-vhostmd-4.40.22-1.el8ev.noarch

Any ideas to try?

Thanks.

On 26 Aug 2020, at 05:09, Arik Hadas 
mailto:aha...@redhat.com>> wrote:



On Mon, Aug 24, 2020 at 1:30 AM Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:
Hello, I was using oVirt 4.3.10 with IBM AC922 (POWER9 / ppc64le) without any 
issues.

Since I’ve moved to 4.4.1 I can’t add the AC922 machine to the engine anymore, 
it complains with the following error:
The host CPU does not match the Cluster CPU type and is running in degraded 
mode. It is missing the following CPU flags: model_POWER9, powernv.

Any ideia of what’s may be happening? The engine runs on x86_64, and I was 
using this way on 4.3.10.

Machine info:
timebase: 51200
platform: PowerNV
model   : 8335-GTH
machine : PowerNV 8335-GTH
firmware: OPAL
MMU : Radix

Can you please provide the output of 'vdsm-client Host getCapabilities' on that 
host?


Thanks,


___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RV6FHRGKGPPZHVR36WKUHBFDMCQHEJHP/

___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3DFMIR7764V6P4U3DIMDKP6I2RNNNA3T/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/MLSRBXRNNBPHFVGYHVPTDHDMUSUN7YZS/


[ovirt-users] Re: POWER9 (ppc64le) Support on oVirt 4.4.1

2020-08-26 Thread Vinícius Ferrão via Users
What a strange thing is happening here:

[root@power ~]# file /usr/bin/vdsm-client
/usr/bin/vdsm-client: empty
[root@power ~]# ls -l /usr/bin/vdsm-client
-rwxr-xr-x. 1 root root 0 Jul  3 06:23 /usr/bin/vdsm-client

A lot of files are just empty. I tried reinstalling vdsm-client and that worked, 
but there are other zeroed files:

Transaction test succeeded.
Running transaction
  Preparing:
 1/1
  Reinstalling : vdsm-client-4.40.22-1.el8ev.noarch 
 1/2
  Cleanup  : vdsm-client-4.40.22-1.el8ev.noarch 
 2/2
  Running scriptlet: vdsm-client-4.40.22-1.el8ev.noarch 
 2/2
/sbin/ldconfig: File /lib64/libkadm5clnt.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so.11 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so.11.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so.11 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so.11.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libsensors.so.4 is empty, not checked.
/sbin/ldconfig: File /lib64/libsensors.so.4.4.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-admin.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-admin.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-lxc.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-lxc.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-qemu.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-qemu.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libisns.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libiscsi.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libopeniscsiusr.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libopeniscsiusr.so.0.2.0 is empty, not checked.

/sbin/ldconfig: File /lib64/libkadm5clnt.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so.11 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5clnt_mit.so.11.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so.11 is empty, not checked.
/sbin/ldconfig: File /lib64/libkadm5srv_mit.so.11.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libsensors.so.4 is empty, not checked.
/sbin/ldconfig: File /lib64/libsensors.so.4.4.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-admin.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-admin.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-lxc.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-lxc.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-qemu.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt-qemu.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libvirt.so.0.6000.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libisns.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libiscsi.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libopeniscsiusr.so.0 is empty, not checked.
/sbin/ldconfig: File /lib64/libopeniscsiusr.so.0.2.0 is empty, not checked.

  Verifying: vdsm-client-4.40.22-1.el8ev.noarch 
 1/2
  Verifying: vdsm-client-4.40.22-1.el8ev.noarch 
 2/2
Installed products updated.

Reinstalled:
  vdsm-client-4.40.22-1.el8ev.noarch



I’ve never seen something like this.

I’ve already reinstalled the host from the ground and the same thing happens.


On 26 Aug 2020, at 14:28, Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:

Hello Arik,
This is probably the issue. Output totally empty:

[root@power ~]# vdsm-client Host getCapabilities
[root@power ~]#

Here are the packages installed on the machine: (grepped ovirt and vdsm on rpm 
-qa)
ovirt-imageio-daemon-2.0.8-1.el8ev.ppc64le
ovirt-i

[ovirt-users] Re: POWER9 (ppc64le) Support on oVirt 4.4.1

2020-08-26 Thread Vinícius Ferrão via Users
Hello Arik,
This is probably the issue. Output totally empty:

[root@power ~]# vdsm-client Host getCapabilities
[root@power ~]#

Here are the packages installed on the machine: (grepped ovirt and vdsm on rpm 
-qa)
ovirt-imageio-daemon-2.0.8-1.el8ev.ppc64le
ovirt-imageio-client-2.0.8-1.el8ev.ppc64le
ovirt-host-4.4.1-4.el8ev.ppc64le
ovirt-vmconsole-host-1.0.8-1.el8ev.noarch
ovirt-host-dependencies-4.4.1-4.el8ev.ppc64le
ovirt-imageio-common-2.0.8-1.el8ev.ppc64le
ovirt-vmconsole-1.0.8-1.el8ev.noarch
vdsm-hook-vmfex-dev-4.40.22-1.el8ev.noarch
vdsm-hook-fcoe-4.40.22-1.el8ev.noarch
vdsm-hook-ethtool-options-4.40.22-1.el8ev.noarch
vdsm-hook-openstacknet-4.40.22-1.el8ev.noarch
vdsm-common-4.40.22-1.el8ev.noarch
vdsm-python-4.40.22-1.el8ev.noarch
vdsm-jsonrpc-4.40.22-1.el8ev.noarch
vdsm-api-4.40.22-1.el8ev.noarch
vdsm-yajsonrpc-4.40.22-1.el8ev.noarch
vdsm-4.40.22-1.el8ev.ppc64le
vdsm-network-4.40.22-1.el8ev.ppc64le
vdsm-http-4.40.22-1.el8ev.noarch
vdsm-client-4.40.22-1.el8ev.noarch
vdsm-hook-vhostmd-4.40.22-1.el8ev.noarch

Any ideas to try?

Thanks.
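
Once the host is back in a sane state, a quick sanity check for the flags the 
engine complained about could look like this (just a sketch; it assumes jq is 
installed and that getCapabilities returns its usual JSON):

vdsm-client Host getCapabilities | jq -r '.cpuModel'
vdsm-client Host getCapabilities | jq -r '.cpuFlags' | tr ',' '\n' | grep -E 'model_POWER9|powernv'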

On 26 Aug 2020, at 05:09, Arik Hadas 
mailto:aha...@redhat.com>> wrote:



On Mon, Aug 24, 2020 at 1:30 AM Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:
Hello, I was using oVirt 4.3.10 with IBM AC922 (POWER9 / ppc64le) without any 
issues.

Since I’ve moved to 4.4.1 I can’t add the AC922 machine to the engine anymore, 
it complains with the following error:
The host CPU does not match the Cluster CPU type and is running in degraded 
mode. It is missing the following CPU flags: model_POWER9, powernv.

Any ideia of what’s may be happening? The engine runs on x86_64, and I was 
using this way on 4.3.10.

Machine info:
timebase: 51200
platform: PowerNV
model   : 8335-GTH
machine : PowerNV 8335-GTH
firmware: OPAL
MMU : Radix

Can you please provide the output of 'vdsm-client Host getCapabilities' on that 
host?


Thanks,


___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RV6FHRGKGPPZHVR36WKUHBFDMCQHEJHP/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3DFMIR7764V6P4U3DIMDKP6I2RNNNA3T/


[ovirt-users] POWER9 (ppc64le) Support on oVirt 4.4.1

2020-08-23 Thread Vinícius Ferrão via Users
Hello, I was using oVirt 4.3.10 with IBM AC922 (POWER9 / ppc64le) without any 
issues.

Since I’ve moved to 4.4.1 I can’t add the AC922 machine to the engine anymore, 
it complains with the following error:
The host CPU does not match the Cluster CPU type and is running in degraded 
mode. It is missing the following CPU flags: model_POWER9, powernv.

Any ideia of what’s may be happening? The engine runs on x86_64, and I was 
using this way on 4.3.10.

Machine info:
timebase: 51200
platform: PowerNV
model   : 8335-GTH
machine : PowerNV 8335-GTH
firmware: OPAL
MMU : Radix

Thanks,


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/RV6FHRGKGPPZHVR36WKUHBFDMCQHEJHP/


[ovirt-users] Hosted Engine stuck in Firmware

2020-08-22 Thread Vinícius Ferrão via Users
Hello, I’ve an strange issue with oVirt 4.4.1

The hosted engine is stuck in the UEFI firmware and never actually boots.

I think this happened when I changed the default VM mode for the cluster inside 
the datacenter.

There’s a way to fix this without redeploying the engine?

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/GBFY2F4FNVZ25TLCR6IZ5YP32PUQLDLI/


[ovirt-users] Re: Support for Shared SAS storage

2020-08-07 Thread Vinícius Ferrão via Users
Really??

Treat it as FC??

That's new for me.


On 8 Aug 2020, at 00:35, Jeff Bailey 
mailto:bai...@cs.kent.edu>> wrote:


I haven't tried with 4.4 but shared SAS works just fine with 4.3 (and has for 
many, many years).  You simply treat it as Fibre Channel.  If your LUNs aren't 
showing up I'd make sure they're being claimed as multipath devices.  You want 
them to be.  After that, just make sure they're sufficiently wiped so they 
don't look like they're in use.
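
A rough sketch of that check and cleanup (device names are examples only; 
triple-check before wiping anything):

multipath -ll                  # the shared SAS LUNs should appear here as multipath devices
lsblk -o NAME,SIZE,TYPE,WWN    # cross-check sizes and WWNs against the array
wipefs -a /dev/mapper/<wwid>   # destructive: clears leftover signatures so the LUN no longer looks in use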


On 8/7/2020 10:49 PM, Lao Dh via Users wrote:
Wow, that sounds bad. Then which storage type did you end up choosing (with your 
SAS-connected storage)? VMware vSphere supports DAS. Red Hat should do something.

On Saturday, 8 August 2020 at 4:06:34 GMT+8, Vinícius Ferrão via Users 
<mailto:users@ovirt.org> wrote:


No, there’s no support for direct attached shared SAS storage on oVirt/RHV.

Fibre Channel is a different thing that oVirt/RHV supports.

> On 7 Aug 2020, at 08:52, hkexdong--- via Users 
> mailto:users@ovirt.org>> wrote:
>
> Hello Vinícius,
> Were you able to connect the SAS external storage?
> Now I have a problem during hosted engine setup. I select Fibre Channel and it 
> ends up showing "No LUNs found".
> ___
> Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
> To unsubscribe send an email to 
> users-le...@ovirt.org<mailto:users-le...@ovirt.org>
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/RDPLKGIRN5ZGIEPWGOKMGNFZNMCEN5RC/


___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives:
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2CLI3YSYU7BPI62YANJXZV7RIQFOXXED/




___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/WOBHQDCBZZK5WKRAUNHP5CGFYY3HQYYU/


___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/UY52JRY5EMQJTMKG3POE2YXSFGL7P55S/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/275GTMI6XQKR52U32WPESHA4BPSAHWZF/


[ovirt-users] Re: Support for Shared SAS storage

2020-08-07 Thread Vinícius Ferrão via Users
No, there’s no support for direct attached shared SAS storage on oVirt/RHV.

Fibre Channel is a different thing that oVirt/RHV supports.

> On 7 Aug 2020, at 08:52, hkexdong--- via Users  wrote:
> 
> Hello Vinícius,
> Were you able to connect the SAS external storage?
> Now I have a problem during hosted engine setup. I select Fibre Channel and it 
> ends up showing "No LUNs found".
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/RDPLKGIRN5ZGIEPWGOKMGNFZNMCEN5RC/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/2CLI3YSYU7BPI62YANJXZV7RIQFOXXED/


[ovirt-users] Re: iSCSI multipath with separate subnets... still not possible in 4.4.x?

2020-07-18 Thread Vinícius Ferrão via Users
I second that, I’ve tirelessly talked about this and just given up, it’s a 
basic feature that keeps oVirt lagging behind.

> On 18 Jul 2020, at 04:47, Uwe Laverenz  wrote:
> 
> Hi Mark,
> 
> Am 14.07.20 um 02:14 schrieb Mark R:
> 
>> I'm looking through quite a few bug reports and mailing list threads,
>> but want to make sure I'm not missing some recent development.  It
>> appears that doing iSCSI with two separate, non-routed subnets is
>> still not possible with 4.4.x. I have the dead-standard iSCSI setup
>> with two separate switches, separate interfaces on hosts and storage,
>> and separate subnets that have no gateway and are completely
>> unreachable except from directly attached interfaces.
> 
> I haven't tested 4.4 yet but AFAIK nothing has changed, OVirt iSCSI bonds 
> don't work with separated, isolated subnets:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1474904
> 
> I don't use them as multipathing generally works without OVirt bonds in my 
> setup, I configured multipathd directly to use round robin e.g..
> 
> cu,
> Uwe
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/EXRSANPZHZ2JE2DKRB6KBMYVVMDSGSJV/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZOAM3K7EU4UXMIZSPMHJHR72YVWYSJFK/


[ovirt-users] Re: New fenceType in oVirt code for IBM OpenBMC

2020-07-07 Thread Vinícius Ferrão via Users
@Martin, if needed I can raise an RFE for this. Just point me to where to do it, 
and I will.

Thank you.

On 1 Jul 2020, at 03:33, Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:

Hi Martin,

On 1 Jul 2020, at 03:26, Martin Perina 
mailto:mper...@redhat.com>> wrote:



On Wed, Jul 1, 2020 at 1:57 AM Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:
Hello,

After some days scratching my head I found that oVirt is probably missing 
fenceTypes for IBM’s implementation of OpenBMC in the Power Management section. 
The host machine is an OpenPOWER AC922 (ppc64le).

The BMC basically is an “ipmilan” device but the ciphers must be defined as 3 
or 17 by default:

[root@h01 ~]# ipmitool -I lanplus -H 10.20.10.2 root -P 0penBmc -L operator -C 
3 channel getciphers ipmi
ID   IANAAuth AlgIntegrity Alg   Confidentiality Alg
3N/A hmac_sha1   hmac_sha1_96aes_cbc_128
17   N/A hmac_sha256 sha256_128  aes_cbc_128

The default ipmilan connector forces the option cipher=1 which breaks the 
communication.

Hi,

have you tried to overwrite the default by adding cipher=3 into Options field 
when adding/updating fence agent configuration for specific host?

Eli, looking at 
https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ipmi-second-gen-interface-spec-v2-rev1-1.pdf
 I'm not sure our defaults make sense, because by default we enable IPMIv2 
(lanplus=1), but we set IPMIv1 cipher support (cipher=1). Or am I missing 
something?

Yes I’m running this way right now: ipmilan with cipher=17 on options.

But it took me almost a month to figure it out. Really. I sent a message to the 
list on 5 June (Power Management on IBM AC922 Power9 (ppc64le)) and had been 
trying to solve it since then.

This was mainly due to poor documentation. I only figured it out after doing a 
lot of searches on GitHub to read the oVirt code. That's when the cipher=1 default 
showed up, and I guessed that might be it. And it was…

I know that no one cares about ppc64le, haha. But I think a change to the list of 
supported fenceTypes would save people the time I've lost on this. Something 
like an "openbmc" type would be great.

Or at least a better explanation in the Power Management configuration box. The 
options aren't explained properly; even guessing lanplus=1 was hard. I tried a 
lot of combinations like:
I=lanplus
-I lanplus
-I=lanplus

Thanks,


Regards,
Martin

So I was reading the code and found this “fenceType” class, but I wasn't able 
to find where those classes are defined, so that I could create another one, 
called something like openbmc, that sets cipher=17 by default.

Another issue is how poor the error output is: it only returns a generic 
JSON-RPC error. But I don't know how to suggest a fix for this.

Thanks,

___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BP33DZ3AET53DGS7TAD6L765WKQIOW7B/


--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.

___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/H2YXAPKSR4BAONK7JMRPT4B3WEMMFTWR/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/J3NMDDGEG4FE2PM66SVIRXULARBODWZU/


[ovirt-users] Re: New fenceType in oVirt code for IBM OpenBMC

2020-07-01 Thread Vinícius Ferrão via Users
Hi Martin,

On 1 Jul 2020, at 03:26, Martin Perina 
mailto:mper...@redhat.com>> wrote:



On Wed, Jul 1, 2020 at 1:57 AM Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:
Hello,

After some days scratching my head I found that oVirt is probably missing 
fenceTypes for IBM’s implementation of OpenBMC in the Power Management section. 
The host machine is an OpenPOWER AC922 (ppc64le).

The BMC basically is an “ipmilan” device but the ciphers must be defined as 3 
or 17 by default:

[root@h01 ~]# ipmitool -I lanplus -H 10.20.10.2 root -P 0penBmc -L operator -C 
3 channel getciphers ipmi
ID   IANAAuth AlgIntegrity Alg   Confidentiality Alg
3N/A hmac_sha1   hmac_sha1_96aes_cbc_128
17   N/A hmac_sha256 sha256_128  aes_cbc_128

The default ipmilan connector forces the option cipher=1 which breaks the 
communication.

Hi,

have you tried to overwrite the default by adding cipher=3 into Options field 
when adding/updating fence agent configuration for specific host?

Eli, looking at 
https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/ipmi-second-gen-interface-spec-v2-rev1-1.pdf
 I'm not sure our defaults make sense, because by default we enable IPMIv2 
(lanplus=1), but we set IPMIv1 cipher support (cipher=1). Or am I missing 
something?

Yes I’m running this way right now: ipmilan with cipher=17 on options.

But it took me almost a month to figure it out. Really. I sent a message to the 
list on 5 June (Power Management on IBM AC922 Power9 (ppc64le)) and had been 
trying to solve it since then.

This was mainly due to poor documentation. I only figured it out after doing a 
lot of searches on GitHub to read the oVirt code. That's when the cipher=1 default 
showed up, and I guessed that might be it. And it was…

I know that no one cares about ppc64le, haha. But I think a change to the list of 
supported fenceTypes would save people the time I've lost on this. Something 
like an "openbmc" type would be great.

Or at least a better explanation in the Power Management configuration box. The 
options aren't explained properly; even guessing lanplus=1 was hard. I tried a 
lot of combinations like:
I=lanplus
-I lanplus
-I=lanplus

Thanks,


Regards,
Martin

So I was reading the code and found this “fenceType” class, but I wasn't able 
to find where those classes are defined, so that I could create another one, 
called something like openbmc, that sets cipher=17 by default.

Another issue is how poor the error output is: it only returns a generic 
JSON-RPC error. But I don't know how to suggest a fix for this.

Thanks,

___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BP33DZ3AET53DGS7TAD6L765WKQIOW7B/


--
Martin Perina
Manager, Software Engineering
Red Hat Czech s.r.o.

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/H2YXAPKSR4BAONK7JMRPT4B3WEMMFTWR/


[ovirt-users] New fenceType in oVirt code for IBM OpenBMC

2020-06-30 Thread Vinícius Ferrão via Users
Hello,

After some days scratching my head I found that oVirt is probably missing 
fenceTypes for IBM’s implementation of OpenBMC in the Power Management section. 
The host machine is an OpenPOWER AC922 (ppc64le).

The BMC basically is an “ipmilan” device but the ciphers must be defined as 3 
or 17 by default:

[root@h01 ~]# ipmitool -I lanplus -H 10.20.10.2 root -P 0penBmc -L operator -C 
3 channel getciphers ipmi
ID   IANAAuth AlgIntegrity Alg   Confidentiality Alg
3N/A hmac_sha1   hmac_sha1_96aes_cbc_128
17   N/A hmac_sha256 sha256_128  aes_cbc_128 

The default ipmilan connector forces the option cipher=1 which breaks the 
communication.

So I was reading the code and found this “fenceType” class, but I wasn't able 
to find where those classes are defined, so that I could create another one, 
called something like openbmc, that sets cipher=17 by default.

Another issue is how poor the error output is: it only returns a generic 
JSON-RPC error. But I don't know how to suggest a fix for this.

Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BP33DZ3AET53DGS7TAD6L765WKQIOW7B/


[ovirt-users] Re: Clean old mount points in hosts VDSM

2020-06-25 Thread Vinícius Ferrão via Users
Strahil, thank you.

Reinstalling the host solved the issue.

> On 25 Jun 2020, at 15:48, Vinícius Ferrão via Users  wrote:
> 
> I think yes. But I’m not sure. 
> 
> I can do it again, there’s an update so I’ll do both and report back.
> 
> Thank you Strahil.
> 
>> On 25 Jun 2020, at 00:37, Strahil Nikolov  wrote:
>> 
>> Did you reinstall the node via the WEB UI ?
>> 
>> Best Regards,
>> Strahil  Nikolov
>> 
>> На 25 юни 2020 г. 3:23:15 GMT+03:00, "Vinícius Ferrão via Users" 
>>  написа:
>>> Hello,
>>> 
>>> For reasons unknown one of my hosts is trying to mount an old storage
>>> point that’s been removed some time ago.
>>> 
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:35,958-0300 INFO 
>>> (tmap-65016/0) [IOProcessClient] (/192.168.10.6:_mnt_pool0_ovirt_he)
>>> Starting client (__init__:308)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:35,968-0300 INFO 
>>> (ioprocess/12115) [IOProcess] (/192.168.10.6:_mnt_pool0_ovirt_he)
>>> Starting ioprocess (__init__:434)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,167-0300 INFO  (jsonrpc/6)
>>> [vdsm.api] START connectStorageServer(domType=1,
>>> spUUID=u'----',
>>> conList=[{u'protocol_version': u'auto', u'connection':
>>> u'192.168.10.6:/mnt/pool0/ovirt/he', u'user': u'kvm', u'id':
>>> u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}], options=None)
>>> from=::1,59090, task_id=5ea81925-ec92-4031-aa36-bb6f436321d5 (api:48)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6)
>>> [storage.StorageServer.MountConnection] Creating directory
>>> u'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he'
>>> (storageServer:168)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6)
>>> [storage.fileUtils] Creating directory:
>>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0ovirt_he mode: None
>>> (fileUtils:199)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6)
>>> [storage.Mount] mounting 192.168.10.6:/mnt/pool0/ovirt/he at
>>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he (mount:204)
>>> /var/log/vdsm/vdsm.log:MountError: (32, ';mount.nfs: mounting
>>> 192.168.10.6:/mnt/pool0/ovirt/he failed, reason given by server: No
>>> such file or directory\n')
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,683-0300 INFO  (jsonrpc/5)
>>> [vdsm.api] START connectStorageServer(domType=1,
>>> spUUID=u'----',
>>> conList=[{u'protocol_version': u'auto', u'connection':
>>> u'192.168.10.6:/mnt/pool0/ovirt/he', u'user': u'kvm', u'id':
>>> u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}], options=None)
>>> from=::1,59094, task_id=9ce61858-dea1-4059-b942-a52c8c82afdc (api:48)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5)
>>> [storage.StorageServer.MountConnection] Creating directory
>>> u'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he'
>>> (storageServer:168)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5)
>>> [storage.fileUtils] Creating directory:
>>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0ovirt_he mode: None
>>> (fileUtils:199)
>>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5)
>>> [storage.Mount] mounting 192.168.10.6:/mnt/pool0/ovirt/he at
>>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he (mount:204)
>>> /var/log/vdsm/vdsm.log:MountError: (32, ';mount.nfs: mounting
>>> 192.168.10.6:/mnt/pool0/ovirt/he failed, reason given by server: No
>>> such file or directory\n’)
>>> 
>>> This only happens in one host and it’s spamming /var/log/vdsm/vdsm.log.
>>> 
>>> Any ideia on how to debug this and remove the entry?
>>> 
>>> Thanks,
>>> 
>>> ___
>>> Users mailing list -- users@ovirt.org
>>> To unsubscribe send an email to users-le...@ovirt.org
>>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>>> oVirt Code of Conduct:
>>> https://www.ovirt.org/community/about/community-guidelines/
>>> List Archives:
>>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/DS2ZIZVAXJVLPR6BFSZU63TU7KJWTZVA/
> 
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/4TSSMMITEHBXTGS76BYQ5ZPKRB7REZF7/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OICOEQ47EFZH2EPOLRXH7CKAT3XL4XJE/


[ovirt-users] Re: Clean old mount points in hosts VDSM

2020-06-25 Thread Vinícius Ferrão via Users
I think yes. But I’m not sure. 

I can do it again, there’s an update so I’ll do both and report back.

Thank you Strahil.

> On 25 Jun 2020, at 00:37, Strahil Nikolov  wrote:
> 
> Did you reinstall the node via the WEB UI ?
> 
> Best Regards,
> Strahil  Nikolov
> 
> На 25 юни 2020 г. 3:23:15 GMT+03:00, "Vinícius Ferrão via Users" 
>  написа:
>> Hello,
>> 
>> For reasons unknown one of my hosts is trying to mount an old storage
>> point that’s been removed some time ago.
>> 
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:35,958-0300 INFO 
>> (tmap-65016/0) [IOProcessClient] (/192.168.10.6:_mnt_pool0_ovirt_he)
>> Starting client (__init__:308)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:35,968-0300 INFO 
>> (ioprocess/12115) [IOProcess] (/192.168.10.6:_mnt_pool0_ovirt_he)
>> Starting ioprocess (__init__:434)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,167-0300 INFO  (jsonrpc/6)
>> [vdsm.api] START connectStorageServer(domType=1,
>> spUUID=u'----',
>> conList=[{u'protocol_version': u'auto', u'connection':
>> u'192.168.10.6:/mnt/pool0/ovirt/he', u'user': u'kvm', u'id':
>> u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}], options=None)
>> from=::1,59090, task_id=5ea81925-ec92-4031-aa36-bb6f436321d5 (api:48)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6)
>> [storage.StorageServer.MountConnection] Creating directory
>> u'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he'
>> (storageServer:168)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6)
>> [storage.fileUtils] Creating directory:
>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0ovirt_he mode: None
>> (fileUtils:199)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6)
>> [storage.Mount] mounting 192.168.10.6:/mnt/pool0/ovirt/he at
>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he (mount:204)
>> /var/log/vdsm/vdsm.log:MountError: (32, ';mount.nfs: mounting
>> 192.168.10.6:/mnt/pool0/ovirt/he failed, reason given by server: No
>> such file or directory\n')
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,683-0300 INFO  (jsonrpc/5)
>> [vdsm.api] START connectStorageServer(domType=1,
>> spUUID=u'----',
>> conList=[{u'protocol_version': u'auto', u'connection':
>> u'192.168.10.6:/mnt/pool0/ovirt/he', u'user': u'kvm', u'id':
>> u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}], options=None)
>> from=::1,59094, task_id=9ce61858-dea1-4059-b942-a52c8c82afdc (api:48)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5)
>> [storage.StorageServer.MountConnection] Creating directory
>> u'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he'
>> (storageServer:168)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5)
>> [storage.fileUtils] Creating directory:
>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0ovirt_he mode: None
>> (fileUtils:199)
>> /var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5)
>> [storage.Mount] mounting 192.168.10.6:/mnt/pool0/ovirt/he at
>> /rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he (mount:204)
>> /var/log/vdsm/vdsm.log:MountError: (32, ';mount.nfs: mounting
>> 192.168.10.6:/mnt/pool0/ovirt/he failed, reason given by server: No
>> such file or directory\n’)
>> 
>> This only happens in one host and it’s spamming /var/log/vdsm/vdsm.log.
>> 
>> Any ideia on how to debug this and remove the entry?
>> 
>> Thanks,
>> 
>> ___
>> Users mailing list -- users@ovirt.org
>> To unsubscribe send an email to users-le...@ovirt.org
>> Privacy Statement: https://www.ovirt.org/privacy-policy.html
>> oVirt Code of Conduct:
>> https://www.ovirt.org/community/about/community-guidelines/
>> List Archives:
>> https://lists.ovirt.org/archives/list/users@ovirt.org/message/DS2ZIZVAXJVLPR6BFSZU63TU7KJWTZVA/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4TSSMMITEHBXTGS76BYQ5ZPKRB7REZF7/


[ovirt-users] Clean old mount points in hosts VDSM

2020-06-24 Thread Vinícius Ferrão via Users
Hello,

For reasons unknown one of my hosts is trying to mount an old storage point 
that’s been removed some time ago.

/var/log/vdsm/vdsm.log:2020-06-24 19:57:35,958-0300 INFO  (tmap-65016/0) 
[IOProcessClient] (/192.168.10.6:_mnt_pool0_ovirt_he) Starting client 
(__init__:308)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:35,968-0300 INFO  (ioprocess/12115) 
[IOProcess] (/192.168.10.6:_mnt_pool0_ovirt_he) Starting ioprocess 
(__init__:434)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:42,167-0300 INFO  (jsonrpc/6) 
[vdsm.api] START connectStorageServer(domType=1, 
spUUID=u'----', conList=[{u'protocol_version': 
u'auto', u'connection': u'192.168.10.6:/mnt/pool0/ovirt/he', u'user': u'kvm', 
u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}], options=None) from=::1,59090, 
task_id=5ea81925-ec92-4031-aa36-bb6f436321d5 (api:48)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6) 
[storage.StorageServer.MountConnection] Creating directory 
u'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he' (storageServer:168)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6) 
[storage.fileUtils] Creating directory: 
/rhev/data-center/mnt/192.168.10.6:_mnt_pool0ovirt_he mode: None (fileUtils:199)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:42,169-0300 INFO  (jsonrpc/6) 
[storage.Mount] mounting 192.168.10.6:/mnt/pool0/ovirt/he at 
/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he (mount:204)
/var/log/vdsm/vdsm.log:MountError: (32, ';mount.nfs: mounting 
192.168.10.6:/mnt/pool0/ovirt/he failed, reason given by server: No such file 
or directory\n')
/var/log/vdsm/vdsm.log:2020-06-24 19:57:43,683-0300 INFO  (jsonrpc/5) 
[vdsm.api] START connectStorageServer(domType=1, 
spUUID=u'----', conList=[{u'protocol_version': 
u'auto', u'connection': u'192.168.10.6:/mnt/pool0/ovirt/he', u'user': u'kvm', 
u'id': u'e29cf818-5ee5-46e1-85c1-8aeefa33e95d'}], options=None) from=::1,59094, 
task_id=9ce61858-dea1-4059-b942-a52c8c82afdc (api:48)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5) 
[storage.StorageServer.MountConnection] Creating directory 
u'/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he' (storageServer:168)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5) 
[storage.fileUtils] Creating directory: 
/rhev/data-center/mnt/192.168.10.6:_mnt_pool0ovirt_he mode: None (fileUtils:199)
/var/log/vdsm/vdsm.log:2020-06-24 19:57:43,685-0300 INFO  (jsonrpc/5) 
[storage.Mount] mounting 192.168.10.6:/mnt/pool0/ovirt/he at 
/rhev/data-center/mnt/192.168.10.6:_mnt_pool0_ovirt_he (mount:204)
/var/log/vdsm/vdsm.log:MountError: (32, ';mount.nfs: mounting 
192.168.10.6:/mnt/pool0/ovirt/he failed, reason given by server: No such file 
or directory\n’)

This only happens in one host and it’s spamming /var/log/vdsm/vdsm.log.

Any ideia on how to debug this and remove the entry?

Thanks,
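
Before resorting to a host reinstall, a quick grep can at least show where the 
removed export is still referenced; a sketch, assuming the usual config locations 
on an EL host:

grep -R "pool0/ovirt/he" /etc/ovirt-hosted-engine* /etc/vdsm /var/lib/vdsm 2>/dev/null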

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/DS2ZIZVAXJVLPR6BFSZU63TU7KJWTZVA/


[ovirt-users] Re: teaming vs bonding

2020-06-10 Thread Vinícius Ferrão via Users
Only bonding; teaming is not supported by the hypervisor.

This was valid up to 4.3; not sure if something changed in 4.4, since I haven't 
checked it.


> On 10 Jun 2020, at 15:30, Diggy Mc  wrote:
> 
> Does 4.4.x support adapter teaming?  If yes, which is preferred, teaming or 
> bonding?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/THMDPSFEX4GAISJ5ELGEWBEFMLKGQVE5/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/IXTUQTYSZS6QR4BYDHNCUBBFRHDRHAXS/


[ovirt-users] Re: What happens when shared storage is down?

2020-06-09 Thread Vinícius Ferrão via Users


> On 7 Jun 2020, at 08:34, Strahil Nikolov  wrote:
> 
> 
> 
> На 7 юни 2020 г. 1:58:27 GMT+03:00, "Vinícius Ferrão via Users" 
>  написа:
>> Hello,
>> 
>> This is a pretty vague and difficult question to answer. But what
>> happens if the shared storage holding the VMs is down or unavailable
>> for a period of time?
> Once  a  pending I/O is blocked, libvirt will pause the VM .
> 
>> I’m aware that a longer timeout may put the VMs on pause state, but how
>> this is handled? Is it a time limit? Requests limit? Who manages this?
> You got sanlock.service, which notifies the engine when a storage domain is 
> inaccessible for more than 60s.
> 
> Libvirt also will pause  a  VM when a pending I/O cannot be done.
> 
>> In an event of self recovery of the storage backend what happens next?
> Usually the engine should resume the VM,  and from application perspective 
> nothing has happened.

Hmm, thanks Strahil. I was thinking of upgrading the storage backend of one of my 
oVirt clusters without powering off the VMs, just to be lazy.

The storage does not have dual controllers, so downtime is needed. I’m trying 
to understand what happens so I can evaluate this update without turning off 
the VMs.
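
For the maintenance window itself, a few read-only checks on a host give a decent 
picture of what sanlock and libvirt are doing while the storage is away (a sketch, 
nothing oVirt-specific beyond the usual daemons):

sanlock client status    # lockspaces and whether lease renewals are still succeeding
virsh -r list --all      # VMs that hit an I/O error should show up as "paused"
tail -f /var/log/sanlock.log /var/log/vdsm/vdsm.log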

>> Manual intervention is required? The VMs may be down or they just
>> continue to run? It depends on the guest OS running like in XenServer
>> where different scenarios may happen?
>> 
>> I’ve looked here:
>> https://www.ovirt.org/documentation/admin-guide/chap-Storage.html but
>> there’s nothing that goes about this question.
>> 
>> Thanks,
>> 
>> Sent from my iPhone

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/BVZAG2V3KBB364U5VBRCBIU42LJNGCI6/


[ovirt-users] Re: Cannot start ppc64le VM's

2020-06-09 Thread Vinícius Ferrão via Users


On 8 Jun 2020, at 07:43, Michal Skrivanek 
mailto:michal.skriva...@redhat.com>> wrote:



On 5 Jun 2020, at 20:23, Vinícius Ferrão 
mailto:fer...@versatushpc.com.br>> wrote:

Hi Michal

On 5 Jun 2020, at 04:39, Michal Skrivanek 
mailto:michal.skriva...@redhat.com>> wrote:



On 5 Jun 2020, at 08:19, Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:

Hello, I’m trying to run ppc64le VM’s on POWER9 but qemu-kvm fails complaining 
about NUMA issues:

that is not a line you should be looking at, it’s just a harmless warning.
I suppose it’s the other one, about spectre fixes

VM ppc64le.local.versatushpc.com.br<http://ppc64le.local.versatushpc.com.br/> 
is down with error. Exit message: internal error: qemu unexpectedly closed the 
monitor: 2020-06-05T06:16:10.716052Z qemu-kvm: warning: CPU(s) not present in 
any NUMA nodes: CPU 4 [core-id: 4], CPU 5 [core-id: 5], CPU 6 [core-id: 6], CPU 
7 [core-id: 7], CPU 8 [core-id: 8], CPU 9 [core-id: 9], CPU 10 [core-id: 10], 
CPU 11 [core-id: 11], CPU 12 [core-id: 12], CPU 13 [core-id: 13], CPU 14 
[core-id: 14], CPU 15 [core-id: 15] 2020-06-05T06:16:10.716067Z qemu-kvm: 
warning: All CPU(s) up to maxcpus should be described in NUMA config, ability 
to start up with partial NUMA mappings is obsoleted and will be removed in 
future 2020-06-05T06:16:11.155924Z qemu-kvm: Requested safe indirect branch 
capability level not supported by kvm, try cap-ibs=fixed-ibs.

Any idea of what’s happening?

I found some links, but I’m not sure if they are related or not:
https://bugzilla.redhat.com/show_bug.cgi?id=1732726
https://bugzilla.redhat.com/show_bug.cgi?id=1592648

yes, they look relevant if that’s the hw you have. We do use 
pseries-rhel7.6.0-sxxm machine type in 4.3 (not in 4.4. that would be the 
preferred solution, to upgrade).
If you don’t care about security you can also modify the machine type per VM 
(or in engine db for all VMs) to "pseries-rhel7.6.0"

I’m using an AC922 machine.

and is it oVirt  4.3 or 4.4?
Bug 1732726 is on RHEL 8, so relevant only for oVirt 4.4, i.e. you’d have to 
have a 4.3 cluster level?
if you really want to keep using -sxxm you need to modify it to add the extra 
flag the bug talks about

this shouldn’t be needed in 4.4 cluster level though

Hi Michal, I’m running 4.3.10. Not in 4.4 yet.

So the workaround would be to add cap-ibs=fixed-ibs to the VM parameters so that 
sxxm would work? Where do I add this? Do you know?

Thanks.
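
Regarding Michal's earlier suggestion of changing the machine type per VM or in 
the engine DB: in the UI this is Edit VM, System, Advanced Parameters, Custom 
Emulated Machine; on the engine it would be roughly the following, but this is 
only a sketch and the column name is an assumption from the 4.3-era schema, so 
verify it (\d vm_static) before running anything:

# sketch: force one VM back to the plain machine type (column name assumed, verify first)
sudo -u postgres psql engine -c "UPDATE vm_static SET custom_emulated_machine = 'pseries-rhel7.6.0' WHERE vm_name = 'ppc64le.local.versatushpc.com.br';"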



In fact I can boot the VMs with pseries-rhel7.6.0 but not with 
pseries-rhel7.6.0-sxxm; how did you make pseries-rhel7.6.0-sxxm work on the 4.3 
release?

# lscpu
Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):128
On-line CPU(s) list:   0-127
Thread(s) per core:4
Core(s) per socket:16
Socket(s): 2
NUMA node(s):  6
Model: 2.2 (pvr 004e 1202)
Model name:POWER9, altivec supported
CPU max MHz:   3800.
CPU min MHz:   2300.
L1d cache: 32K
L1i cache: 32K
L2 cache:  512K
L3 cache:  10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):

Thank you for helping out.


Thanks,
michal

Thanks,

___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PVVQDBO2XJYBQN7EUDMM74QZJ2UTLRJ2/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SC2ERGD6UZ7SCNOM52F3MDFMZVWY7B5E/


[ovirt-users] Re: Power Management on IBM AC922 Power9 (ppc64le)

2020-06-08 Thread Vinícius Ferrão via Users
Yes… actually IBM uses pretty standard stuff. IPMI is enabled by default and, as 
I said, I can use ipmitool on the CLI and it works normally.
I do have some updates: I upgraded the OpenBMC firmware and now I can use 
ipmitool like anything else, with -U and -P; so I was hoping that oVirt would 
handle the Power Management with IPMI over LAN (exactly as you suggested), but 
the issue remains. JSON-RPC error. :(

Now I really think this is a bug, but I would like to get some confirmation 
from the oVirt devs to raise it on bugzilla.

> On 8 Jun 2020, at 14:00, bernadette.pfau--- via Users  wrote:
> 
> Making a guess here -- on Dell iDRAC there is a setting for "IPMI over LAN".  
> Is there an equivalent on the IBM?
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/BYLLNDCJ2VO3RRTJXS45CNUQYF3GYR6R/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3ZTOY2JM3EOHYDQ5XQBPNQ3YATTTX3BE/


[ovirt-users] What happens when shared storage is down?

2020-06-06 Thread Vinícius Ferrão via Users
Hello,

This is a pretty vague and difficult question to answer. But what happens if 
the shared storage holding the VMs is down or unavailable for a period of time?

I’m aware that a longer timeout may put the VMs on pause state, but how this is 
handled? Is it a time limit? Requests limit? Who manages this?

In the event that the storage backend recovers on its own, what happens next? Is 
manual intervention required? Will the VMs be down, or do they just continue to 
run? Does it depend on the guest OS, as in XenServer, where different scenarios 
may happen?

I’ve looked here: 
https://www.ovirt.org/documentation/admin-guide/chap-Storage.html but there’s 
nothing that goes about this question.

Thanks,

Sent from my iPhone
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/SQ3AQNL5VOYGRH63SHQJFB7MYUWQDGO3/


[ovirt-users] Re: Cannot start ppc64le VM's

2020-06-05 Thread Vinícius Ferrão via Users
Hi Michal

On 5 Jun 2020, at 04:39, Michal Skrivanek 
mailto:michal.skriva...@redhat.com>> wrote:



On 5 Jun 2020, at 08:19, Vinícius Ferrão via Users 
mailto:users@ovirt.org>> wrote:

Hello, I’m trying to run ppc64le VM’s on POWER9 but qemu-kvm fails complaining 
about NUMA issues:

that is not a line you should be looking at, it’s just a harmless warning.
I suppose it’s the other one, about spectre fixes

VM ppc64le.local.versatushpc.com.br<http://ppc64le.local.versatushpc.com.br/> 
is down with error. Exit message: internal error: qemu unexpectedly closed the 
monitor: 2020-06-05T06:16:10.716052Z qemu-kvm: warning: CPU(s) not present in 
any NUMA nodes: CPU 4 [core-id: 4], CPU 5 [core-id: 5], CPU 6 [core-id: 6], CPU 
7 [core-id: 7], CPU 8 [core-id: 8], CPU 9 [core-id: 9], CPU 10 [core-id: 10], 
CPU 11 [core-id: 11], CPU 12 [core-id: 12], CPU 13 [core-id: 13], CPU 14 
[core-id: 14], CPU 15 [core-id: 15] 2020-06-05T06:16:10.716067Z qemu-kvm: 
warning: All CPU(s) up to maxcpus should be described in NUMA config, ability 
to start up with partial NUMA mappings is obsoleted and will be removed in 
future 2020-06-05T06:16:11.155924Z qemu-kvm: Requested safe indirect branch 
capability level not supported by kvm, try cap-ibs=fixed-ibs.

Any idea of what’s happening?

I found some links, but I’m not sure if they are related or not:
https://bugzilla.redhat.com/show_bug.cgi?id=1732726
https://bugzilla.redhat.com/show_bug.cgi?id=1592648

yes, they look relevant if that’s the hw you have. We do use 
pseries-rhel7.6.0-sxxm machine type in 4.3 (not in 4.4. that would be the 
preferred solution, to upgrade).
If you don’t care about security you can also modify the machine type per VM 
(or in engine db for all VMs) to "pseries-rhel7.6.0"

I’m using an AC922 machine.

In fact I can boot the VMs with pseries-rhel7.6.0 but not with 
pseries-rhel7.6.0-sxxm; how did you make pseries-rhel7.6.0-sxxm work on the 4.3 
release?

# lscpu
Architecture:  ppc64le
Byte Order:Little Endian
CPU(s):128
On-line CPU(s) list:   0-127
Thread(s) per core:4
Core(s) per socket:16
Socket(s): 2
NUMA node(s):  6
Model: 2.2 (pvr 004e 1202)
Model name:POWER9, altivec supported
CPU max MHz:   3800.
CPU min MHz:   2300.
L1d cache: 32K
L1i cache: 32K
L2 cache:  512K
L3 cache:  10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
NUMA node252 CPU(s):
NUMA node253 CPU(s):
NUMA node254 CPU(s):
NUMA node255 CPU(s):

Thank you for helping out.


Thanks,
michal

Thanks,

___
Users mailing list -- users@ovirt.org<mailto:users@ovirt.org>
To unsubscribe send an email to 
users-le...@ovirt.org<mailto:users-le...@ovirt.org>
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PVVQDBO2XJYBQN7EUDMM74QZJ2UTLRJ2/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TH36FTKGIQR2WZ5D7KJUYLY46C5GXO7Z/


[ovirt-users] Cannot start ppc64le VM's

2020-06-05 Thread Vinícius Ferrão via Users
Hello, I’m trying to run ppc64le VM’s on POWER9 but qemu-kvm fails complaining 
about NUMA issues:

VM ppc64le.local.versatushpc.com.br is 
down with error. Exit message: internal error: qemu unexpectedly closed the 
monitor: 2020-06-05T06:16:10.716052Z qemu-kvm: warning: CPU(s) not present in 
any NUMA nodes: CPU 4 [core-id: 4], CPU 5 [core-id: 5], CPU 6 [core-id: 6], CPU 
7 [core-id: 7], CPU 8 [core-id: 8], CPU 9 [core-id: 9], CPU 10 [core-id: 10], 
CPU 11 [core-id: 11], CPU 12 [core-id: 12], CPU 13 [core-id: 13], CPU 14 
[core-id: 14], CPU 15 [core-id: 15] 2020-06-05T06:16:10.716067Z qemu-kvm: 
warning: All CPU(s) up to maxcpus should be described in NUMA config, ability 
to start up with partial NUMA mappings is obsoleted and will be removed in 
future 2020-06-05T06:16:11.155924Z qemu-kvm: Requested safe indirect branch 
capability level not supported by kvm, try cap-ibs=fixed-ibs.

Any idea of what’s happening?

I found some links, but I’m not sure if they are related or not:
https://bugzilla.redhat.com/show_bug.cgi?id=1732726
https://bugzilla.redhat.com/show_bug.cgi?id=1592648

Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/PVVQDBO2XJYBQN7EUDMM74QZJ2UTLRJ2/


[ovirt-users] Power Management on IBM AC922 Power9 (ppc64le)

2020-06-04 Thread Vinícius Ferrão via Users
Hello,

I would like to know how to enable Power Management on AC922 hardware from IBM. 
It’s a ppc64le-architecture machine and runs OpenBMC as its management controller.

I only get "Test failed: Internal JSON-RPC error" when adding the details with 
ipmilan on the engine. From the command line I can use ipmitool without 
specifying any user, but on the Engine I must specify a user; there’s no way to 
leave it blank.
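
Since the engine’s ipmilan agent (fence_ipmilan) always needs a username, one 
thing worth trying is to confirm from the shell which credentials the OpenBMC 
accepts over IPMI-over-LAN, and then enter exactly those in the engine, adding 
lanplus=1 in the Options field of the Power Management dialog. A rough sketch 
(the BMC address, user and password below are placeholders):

# ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> chassis power status
# ipmitool -I lanplus -H <bmc-address> -U <user> -P <password> user list 1

If lanplus works from the shell but the engine test still fails with the 
JSON-RPC error, vdsm.log on the fence proxy host should show the exact agent 
invocation that was attempted.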

Thanks,


 
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5THZCU737LSCVOGLUF3INB5DEKWO56YD/


[ovirt-users] Re: POWER9 Support: VDSM requiring LVM2 package that's missing

2020-05-14 Thread Vinícius Ferrão via Users
Hi Amit, I think I found the answer: It’s not available yet.

https://bugzilla.redhat.com/show_bug.cgi?id=1829348

It's this bug, right?

Thanks,

On 14 May 2020, at 20:14, Vinícius Ferrão <fer...@versatushpc.com.br> wrote:

Hi Amit, thanks for confirming.

Do you know in which repository VDSM 4.30.46 is available?

It’s not available in either of these:
rhel-7-for-power-9-rpms/ppc64le                          Red Hat Enterprise Linux 7 for POWER9 (RPMs)                                   9,156
rhel-7-server-rhv-4-mgmt-agent-for-power-9-rpms/ppc64le  Red Hat Virtualization 4 Management Agents (for RHEL 7 Server for IBM POWER9    814
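
For reference, a quick way to see which vdsm builds an enabled repo currently 
ships is simply:

# yum clean metadata
# yum --showduplicates list vdsm

If the fixed build does not show up in that output, it just has not reached the 
channel yet.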


Thank you!


On 14 May 2020, at 20:09, Amit Bawer <aba...@redhat.com> wrote:


On Fri, May 15, 2020 at 12:19 AM Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hello,

I would like to know if this is a bug or not; if it is, I will submit it to Red Hat.
Fixed on vdsm-4.30.46

I’m trying to add a ppc64le (POWER9) machine to the hosts pool, but there are 
missing dependencies on VDSM:

--> Processing Dependency: lvm2 >= 7:2.02.186-7.el7_8.1 for package: 
vdsm-4.30.44-1.el7ev.ppc64le
--> Finished Dependency Resolution
Error: Package: vdsm-4.30.44-1.el7ev.ppc64le 
(rhel-7-server-rhv-4-mgmt-agent-for-power-9-rpms)
   Requires: lvm2 >= 7:2.02.186-7.el7_8.1
   Available: 7:lvm2-2.02.171-8.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.171-8.el7
   Available: 7:lvm2-2.02.177-4.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.177-4.el7
   Available: 7:lvm2-2.02.180-8.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-8.el7
   Available: 7:lvm2-2.02.180-10.el7_6.1.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.1
   Available: 7:lvm2-2.02.180-10.el7_6.2.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.2
   Available: 7:lvm2-2.02.180-10.el7_6.3.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.3
   Available: 7:lvm2-2.02.180-10.el7_6.7.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.7
   Available: 7:lvm2-2.02.180-10.el7_6.8.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.8
   Installing: 7:lvm2-2.02.180-10.el7_6.9.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.9


Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/I3YDM2VN7K2GHNLNLWCEXZRSAHI4F4L7/


___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/4ITYU2XWLABXTRSQ3VE3G7AUXADWP3DB/


[ovirt-users] Re: POWER9 Support: VDSM requiring LVM2 package that's missing

2020-05-14 Thread Vinícius Ferrão via Users
Hi Amit, thanks for confirming.

Do you know in which repository VDSM 4.30.46 is available?

It’s not available in either of these:
rhel-7-for-power-9-rpms/ppc64le                          Red Hat Enterprise Linux 7 for POWER9 (RPMs)                                   9,156
rhel-7-server-rhv-4-mgmt-agent-for-power-9-rpms/ppc64le  Red Hat Virtualization 4 Management Agents (for RHEL 7 Server for IBM POWER9    814


Thank you!


On 14 May 2020, at 20:09, Amit Bawer <aba...@redhat.com> wrote:


On Fri, May 15, 2020 at 12:19 AM Vinícius Ferrão via Users <users@ovirt.org> wrote:
Hello,

I would like to know if this is a bug or not; if it is, I will submit it to Red Hat.
Fixed on vdsm-4.30.46

I’m trying to add a ppc64le (POWER9) machine to the hosts pool, but there are 
missing dependencies on VDSM:

--> Processing Dependency: lvm2 >= 7:2.02.186-7.el7_8.1 for package: 
vdsm-4.30.44-1.el7ev.ppc64le
--> Finished Dependency Resolution
Error: Package: vdsm-4.30.44-1.el7ev.ppc64le 
(rhel-7-server-rhv-4-mgmt-agent-for-power-9-rpms)
   Requires: lvm2 >= 7:2.02.186-7.el7_8.1
   Available: 7:lvm2-2.02.171-8.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.171-8.el7
   Available: 7:lvm2-2.02.177-4.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.177-4.el7
   Available: 7:lvm2-2.02.180-8.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-8.el7
   Available: 7:lvm2-2.02.180-10.el7_6.1.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.1
   Available: 7:lvm2-2.02.180-10.el7_6.2.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.2
   Available: 7:lvm2-2.02.180-10.el7_6.3.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.3
   Available: 7:lvm2-2.02.180-10.el7_6.7.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.7
   Available: 7:lvm2-2.02.180-10.el7_6.8.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.8
   Installing: 7:lvm2-2.02.180-10.el7_6.9.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.9


Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/I3YDM2VN7K2GHNLNLWCEXZRSAHI4F4L7/

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/W3P6LKEPJBNECFYFJEML6G5W3XDAS43Q/


[ovirt-users] POWER9 Support: VDSM requiring LVM2 package that's missing

2020-05-14 Thread Vinícius Ferrão via Users
Hello,

I would like to know if this is a bug or not; if it is, I will submit it to Red Hat.

I’m trying to add a ppc64le (POWER9) machine to the hosts pool, but there are 
missing dependencies on VDSM:

--> Processing Dependency: lvm2 >= 7:2.02.186-7.el7_8.1 for package: 
vdsm-4.30.44-1.el7ev.ppc64le
--> Finished Dependency Resolution
Error: Package: vdsm-4.30.44-1.el7ev.ppc64le 
(rhel-7-server-rhv-4-mgmt-agent-for-power-9-rpms)
   Requires: lvm2 >= 7:2.02.186-7.el7_8.1
   Available: 7:lvm2-2.02.171-8.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.171-8.el7
   Available: 7:lvm2-2.02.177-4.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.177-4.el7
   Available: 7:lvm2-2.02.180-8.el7.ppc64le (rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-8.el7
   Available: 7:lvm2-2.02.180-10.el7_6.1.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.1
   Available: 7:lvm2-2.02.180-10.el7_6.2.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.2
   Available: 7:lvm2-2.02.180-10.el7_6.3.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.3
   Available: 7:lvm2-2.02.180-10.el7_6.7.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.7
   Available: 7:lvm2-2.02.180-10.el7_6.8.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.8
   Installing: 7:lvm2-2.02.180-10.el7_6.9.ppc64le 
(rhel-7-for-power-9-rpms)
   lvm2 = 7:2.02.180-10.el7_6.9
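
The required lvm2 (2.02.186-7.el7_8.1) is RHEL 7.8 content, while everything 
offered above tops out at el7_6.x, so it may be worth checking whether the host 
is pinned to an older minor release or only subscribed to 7.6 content. A rough 
sketch:

# subscription-manager release --show
# yum --showduplicates list lvm2 | tail

If the release turns out to be pinned (e.g. to 7.6), clearing it with 
"subscription-manager release --unset" and refreshing the yum metadata should 
make the el7_8 lvm2 visible.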


Thanks,

___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/I3YDM2VN7K2GHNLNLWCEXZRSAHI4F4L7/