We have seen something similar in the past, and patches were posted to deal with this issue, but the work is still in progress [1].
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1553133

On Mon, Jul 22, 2019 at 8:07 PM Strahil <[email protected]> wrote:

> I have a theory... But after all, without any proof, it will remain a theory.
>
> The storage domains are just VGs over shared storage. The SPM host is supposed to be the only one working with the LVM metadata, but I have observed that when someone executes a simple LVM command (for example lvs, vgs or pvs) while another operation is running on another host, your metadata can get corrupted, due to the lack of clvmd.
>
> As a protection, I can offer you the following solution to try:
> 1. Create a new iSCSI LUN.
> 2. Share it with all nodes and create the storage domain. Set it to maintenance.
> 3. Start the dlm & clvmd services on all hosts.
> 4. Convert the VG of your shared storage domain to have the 'clustered' flag set:
> vgchange -c y mynewVG
> 5. Check the LVs of that VG.
> 6. Activate the storage domain.
>
> Of course, test it on a test cluster before implementing it in production (see the sketch after this message). This is one of the approaches used in Linux HA clusters to avoid LVM metadata corruption.
>
> Best Regards,
> Strahil Nikolov
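A minimal sketch of steps 3-5 above, assuming EL7-style service names from the dlm and lvm2-cluster packages (mynewVG is Strahil's placeholder VG name; adjust for your distribution):

    # On every host: start the cluster locking daemons (step 3)
    systemctl start dlm clvmd

    # On one host: set the clustered flag on the storage domain's VG (step 4)
    vgchange -c y mynewVG

    # Verify: the sixth vg_attr character should now be 'c' (clustered),
    # and all LVs should still be visible (step 5)
    vgs -o vg_name,vg_attr mynewVG
    lvs mynewVG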
> On Jul 22, 2019 15:46, Martijn Grendelman <[email protected]> wrote:
>
> Hi,
>
> On 22-7-2019 at 14:30, Strahil wrote:
>
> If you can give directions (some kind of history), the devs might try to reproduce this type of issue. If it is reproducible, a fix can be provided. Based on my experience, if something as widely used as Linux LVM gets broken, the case is very hard to reproduce.
>
> Yes, I'd think so too, especially since this activity (online moving of disk images) is done all the time, mostly without problems. In this case, there was a lot of activity on all storage domains, because I'm moving all my storage (>10 TB in 185 disk images) to a new storage platform. During the online move of one of the images, the metadata checksum became corrupted and the storage domain went offline.
>
> Of course, I could dig up the engine logs and vdsm logs from when it happened, but that would be some work, and I'm not very confident that the actual cause would be in there. If any oVirt devs are interested in the logs, I'll provide them, but otherwise I think I'll just see it as an incident and move on.
>
> Best regards,
> Martijn.
>
> On Jul 22, 2019 10:17, Martijn Grendelman <[email protected]> wrote:
>
> Hi,
>
> Thanks for the tips! I didn't know about 'pvmove', thanks.
>
> In the meantime, I managed to get it fixed by restoring the VG metadata on the iSCSI server, on the underlying zvol directly, rather than via the iSCSI session on the oVirt host. That allowed me to perform the restore without bringing all VMs down, which was important to me, because if I had to shut down VMs, I was sure I wouldn't be able to restart them before the storage domain was back online.
>
> Of course this is more a Linux problem than an oVirt problem, but oVirt did cause it ;-)
>
> Thanks,
> Martijn.
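Martijn doesn't name the exact commands he used; the standard LVM tool for restoring VG metadata from a backup is vgcfgrestore. A minimal sketch, assuming LVM's automatic metadata archives are available (VGname and the archive filename are placeholders):

    # List the metadata versions LVM has archived for the VG
    vgcfgrestore --list VGname

    # Restore a known-good version picked from the list above
    vgcfgrestore -f /etc/lvm/archive/VGname_00042-1234567890.vg VGname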
> On 19-7-2019 at 19:06, Strahil Nikolov wrote:
>
> Hi Martijn,
>
> First, check what went wrong with the VG, as it could be something simple. 'vgcfgbackup -f VGname' will create a file which you can use to compare the current metadata with a previous version.
>
> If you have Linux boxes, you can add disks from another storage and then pvmove the data inside the VM. Of course, you will need to reinstall grub on the new OS disk, or you won't be able to boot afterwards. If possible, try it with a test VM before proceeding with important ones.
>
> Backing up the VMs is very important, because working on LVM metadata is risky.
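A minimal sketch of the metadata comparison Strahil suggests (note that vgcfgbackup's -f option takes the output filename; VGname is a placeholder):

    # Dump the VG's current metadata to a file
    vgcfgbackup -f /tmp/VGname-current.vg VGname

    # Compare against the automatic backup LVM kept after the last change
    diff -u /etc/lvm/backup/VGname /tmp/VGname-current.vg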

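A minimal sketch of the pvmove approach, run inside a guest after attaching a disk from the new storage (the device names /dev/vdb and /dev/vda2 and the VG name rootvg are assumptions):

    pvcreate /dev/vdb             # prepare the new disk as a physical volume
    vgextend rootvg /dev/vdb      # add it to the guest's VG
    pvmove /dev/vda2 /dev/vdb     # migrate all extents off the old PV
    vgreduce rootvg /dev/vda2     # remove the old PV from the VG
    grub2-install /dev/vdb        # reinstall the bootloader on the new disk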
