Are you using the "nobarrier" mount option in the VM?

If yes, can you try removing the "nobarrier" option?
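
One possible way to check this inside a guest is sketched below. It assumes a Linux guest where the active mount options are listed in /proc/mounts; the function names are made up for illustration, and whether a given kernel still accepts or reports "nobarrier" is not something this sketch verifies.

```python
def mounts_with_option(mounts_text, option="nobarrier"):
    """Return (mount_point, fs_type) pairs whose option list contains `option`.

    `mounts_text` is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if option in options.split(","):
            hits.append((mount_point, fs_type))
    return hits


def check_guest(path="/proc/mounts"):
    """Hypothetical helper: read the guest's mount table and report matches."""
    with open(path) as f:
        return mounts_with_option(f.read())
```

Running check_guest() inside the VM returns an empty list when no filesystem is mounted with "nobarrier".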


Best Regards,
Strahil Nikolov






On Saturday, 28 November 2020, 19:25:48 GMT+2, Vinícius Ferrão 
<fer...@versatushpc.com.br> wrote: 





Hi Strahil,

I moved a running VM to the other host, rebooted it, and no corruption was 
found. If there's any corruption, it may be silent corruption. I've had cases 
where the VM was new, just installed: I ran dnf -y update to get the updated 
packages, rebooted, and boom, XFS corruption. So perhaps the migration process 
isn't the one to blame.

That said, I do remember moving a VM that went down during the migration, and 
when I rebooted it, it was corrupted. But this may not be related; it was 
perhaps already in an inconsistent state.

Anyway, here's the mount options:

Host1:
192.168.10.14:/mnt/pool0/ovirt/vm on 
/rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)

Host2:
192.168.10.14:/mnt/pool0/ovirt/vm on 
/rhev/data-center/mnt/192.168.10.14:_mnt_pool0_ovirt_vm type nfs4 
(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)

The options are the default ones. I haven't changed anything when configuring 
this cluster.
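
To confirm that both hosts really mount the storage domain with the same options, the two option strings above can be compared mechanically. This is only a sketch: the helper names are illustrative, and the strings are copied from the mount output quoted in this message.

```python
HOST1_OPTIONS = "(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)"
HOST2_OPTIONS = "(rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.10.1,local_lock=none,addr=192.168.10.14)"


def parse_mount_options(option_string):
    """Turn '(rw,vers=4.1,soft)' into {'rw': True, 'vers': '4.1', 'soft': True}."""
    options = {}
    for item in option_string.strip("()").split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True  # flags without '=' become True
    return options


def diff_options(a, b):
    """Return {option: (value_in_a, value_in_b)} for every option that differs."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


differences = diff_options(
    parse_mount_options(HOST1_OPTIONS), parse_mount_options(HOST2_OPTIONS)
)
print(differences or "no differences between the two hosts")
```

Any option that differs between the hosts shows up as a key with the pair of values seen on each side.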

Thanks.



-----Original Message-----
From: Strahil Nikolov <hunter86...@yahoo.com> 
Sent: Saturday, November 28, 2020 1:54 PM
To: users <users@ovirt.org>; Vinícius Ferrão <fer...@versatushpc.com.br>
Subject: Re: [ovirt-users] Constantly XFS in memory corruption inside VMs

Can you check with a test VM whether this happens after a virtual machine 
migration?

What are your mount options for the storage domain?

Best Regards,
Strahil Nikolov






On Saturday, 28 November 2020, 18:25:15 GMT+2, Vinícius Ferrão via Users 
<users@ovirt.org> wrote: 





  


Hello,

 

I’m trying to discover why an oVirt 4.4.3 Cluster with two hosts and NFS shared 
storage on TrueNAS 12.0 is constantly getting XFS corruption inside the VMs.

 

VMs get corrupted at random. Sometimes the VM halts, and sometimes it is 
silently corrupted, so that after a reboot the system is unable to boot due to 
“corruption of in-memory data detected”. Sometimes the corrupted data is “all 
zeroes”, sometimes there’s data there. In extreme cases XFS superblock 0 gets 
corrupted and the system cannot even detect an XFS partition anymore, since 
the XFS magic number is corrupted in the first blocks of the virtual disk.
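
For reference, the magic number in question is the four ASCII bytes "XFSB" at the very start of the primary superblock. A minimal sketch of a read-only check follows; the function names are illustrative, and the offset parameter is an assumption to cover the case where the filesystem starts at a partition offset inside the disk image rather than at byte 0.

```python
XFS_MAGIC = b"XFSB"  # first four bytes of an XFS superblock


def looks_like_xfs(first_bytes):
    """Return True if the given bytes start with the XFS superblock magic."""
    return first_bytes[:4] == XFS_MAGIC


def has_xfs_magic(path, offset=0):
    """Read 4 bytes at `offset` from a device or image file and test the magic.

    `offset` is the byte position where the filesystem starts; this is 0 for
    a partition device, and the partition's start offset for a whole-disk image.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return looks_like_xfs(f.read(4))
```

The file is opened read-only, so the check itself does not modify the disk, but it says nothing about the rest of the superblock.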

 

This has been happening for a month now. We had to roll back to some backups, 
and I no longer trust the state of the VMs.

 

Using xfs_db I can see that some VMs have corrupted superblocks while the VM 
is still up. One in particular had sb0 corrupted, so I knew that when a reboot 
kicked in the machine would be gone, and that’s exactly what happened.

 

Another day I was just installing a new CentOS 8 VM for unrelated reasons, and 
after running dnf -y update and rebooting, the VM was corrupted and needed an 
XFS repair. That was an extreme case.

 

So, I’ve looked at the TrueNAS logs, and there’s apparently nothing wrong on 
the system. No errors are logged in dmesg, nothing in /var/log/messages, and 
there are no errors on the “zpools”, not even after scrub operations. On the 
switch, a Catalyst 2960X, we’ve been monitoring it and all its interfaces. 
There are no “up and down” events and zero errors on all interfaces (we have a 
4-port LACP on the TrueNAS side and a 2-port LACP on each host); everything 
seems to be fine. The only metric that I was unable to get is “dropped 
packets”, and I don’t know if this can be an issue or not.
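
On the oVirt hosts, at least, drop and error counters can be read from sysfs. The sketch below assumes a Linux host exposing per-interface counters under /sys/class/net/<iface>/statistics/. It does not cover the TrueNAS side or the switch, and the function name is made up for illustration.

```python
from pathlib import Path

# Counter files exposed by the Linux kernel for each network interface.
COUNTERS = ("rx_dropped", "tx_dropped", "rx_errors", "tx_errors")


def read_interface_counters(base="/sys/class/net"):
    """Return {interface: {counter_name: value}} for every interface under `base`."""
    results = {}
    for iface_dir in sorted(Path(base).iterdir()):
        stats_dir = iface_dir / "statistics"
        if not stats_dir.is_dir():
            continue  # not an interface entry (e.g. a plain file)
        counters = {}
        for name in COUNTERS:
            counter_file = stats_dir / name
            if counter_file.is_file():
                counters[name] = int(counter_file.read_text().strip())
        results[iface_dir.name] = counters
    return results
```

Taking two readings some minutes apart and comparing them shows whether any counter on the bond members is still increasing.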

 

Finally, on oVirt, I can’t find anything either. I looked at /var/log/messages 
and /var/log/sanlock.log but found nothing suspicious.

 

Is anyone out there experiencing this? Our VMs are mainly CentOS 7/8 with XFS. 
There are 3 Windows VMs that do not seem to be affected; everything else is 
affected.

 

Thanks all.



_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VLYSE7HCFNWTWFZZTL2EJHV36OENHUGB/
