We have seen something similar in the past, and patches were posted to deal
with this issue, but it's still in progress [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1553133

On Mon, Jul 22, 2019 at 8:07 PM Strahil <[email protected]> wrote:

> I have a theory... but without any proof, it will remain just a theory.
>
> The storage domains are just VGs over shared storage. The SPM host is
> supposed to be the only one working with the LVM metadata, but I have
> observed that when someone executes even a simple LVM command (for
> example lvs, vgs or pvs) while another operation is running on another
> host, your metadata can get corrupted, due to the lack of clvmd.
>
> As a protection, I can suggest trying the following solution:
> 1. Create new iSCSI lun
> 2. Share it to all nodes and create the storage domain. Set it to
> maintenance.
> 3. Start dlm & clvmd services on all hosts
> 4. Convert the VG of your shared storage domain to carry the 'clustered'
> flag:
> vgchange -c y mynewVG
> 5. Check the lvs of that VG.
> 6. Activate the storage domain.
>
> Of course, test it on a test cluster before implementing it on Prod.
> This is one of the approaches used in Linux HA clusters in order to avoid
> LVM metadata corruption.
>
> Best Regards,
> Strahil Nikolov
> On Jul 22, 2019 15:46, Martijn Grendelman <[email protected]>
> wrote:
>
> Hi,
>
> Op 22-7-2019 om 14:30 schreef Strahil:
>
> If you can give directions (some kind of history), the devs might try to
> reproduce this type of issue.
>
> If it is reproducible, a fix can be provided.
>
> Based on my experience, if something as widely used as Linux LVM gets
> broken, the case is very hard to reproduce.
>
>
> Yes, I'd think so too, especially since this activity (online moving of
> disk images) is done all the time, mostly without problems. In this case,
> there was a lot of activity on all storage domains, because I'm moving all
> my storage (> 10TB in 185 disk images) to a new storage platform. During
> the online move of one of the images, the metadata checksum became corrupted
> and the storage domain went offline.
>
> Of course, I could dig up the engine logs and vdsm logs of when it
> happened, but that would be some work and I'm not very confident that the
> actual cause would be in there.
>
> If any oVirt devs are interested in the logs, I'll provide them, but
> otherwise I think I'll just see it as an incident and move on.
>
> Best regards,
> Martijn.
>
>
>
>
> On Jul 22, 2019 10:17, Martijn Grendelman <[email protected]>
> <[email protected]> wrote:
>
> Hi,
>
> Thanks for the tips! I didn't know about 'pvmove'.
>
> In the meantime, I managed to get it fixed by restoring the VG metadata
> on the iSCSI server, so on the underlying Zvol directly, rather than via
> the iSCSI session on the oVirt host. That allowed me to perform the restore
> without bringing all VMs down, which was important to me, because if I had
> to shut down VMs, I was sure I wouldn't be able to restart them before the
> storage domain was back online.
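A metadata restore of this kind is typically done with vgcfgrestore. A sketch; the VG name "ovirt-data" and the archive file name are hypothetical, not the actual names from this thread:

```shell
# List the metadata backups LVM has archived for this VG, then restore
# a chosen version. Names below are placeholders.
vgcfgrestore --list ovirt-data
vgcfgrestore -f /etc/lvm/archive/ovirt-data_00042-1234567890.vg ovirt-data
```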
>
> Of course this is a more a Linux problem than an oVirt problem, but oVirt
> did cause it ;-)
>
> Thanks,
> Martijn.
>
>
>
> Op 19-7-2019 om 19:06 schreef Strahil Nikolov:
>
> Hi Martin,
>
> First, check what went wrong with the VG, as it could be something simple.
> Running 'vgcfgbackup -f <file> VGname' will create a file which you can use
> to compare the current metadata with a previous version.
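That comparison could look like the following sketch. "myVG" is a placeholder, and /etc/lvm/backup is LVM's default location for automatic metadata backups:

```shell
# Dump the current metadata to a file and diff it against the last
# automatic backup. "myVG" is a placeholder VG name.
vgcfgbackup -f /tmp/myVG.current myVG
diff -u /etc/lvm/backup/myVG /tmp/myVG.current
```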
>
> If you have Linux boxes, you can add disks from another storage and then
> pvmove the data inside the VM. Of course, you will need to reinstall grub
> on the new OS disk, or you won't be able to boot afterwards.
> If possible, try with a test VM before proceeding with important ones.
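Inside a VM, that migration might look like the following sketch; the device names and the VG name "rootvg" are hypothetical:

```shell
# Move all extents from the old PV to a newly attached disk, then
# reinstall the bootloader so the VM can boot from the new disk.
pvcreate /dev/sdb               # new disk from the new storage
vgextend rootvg /dev/sdb        # add it to the VM's VG
pvmove /dev/sda2 /dev/sdb       # migrate all extents off the old PV
vgreduce rootvg /dev/sda2       # drop the old PV from the VG
grub2-install /dev/sdb          # required if this is the boot disk
```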
>
> Backing up the VMs is very important, because working on LVM metadata
>
>
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/SHYGSEOGHWPBQHXQXOPRWWBOMRSTPADH/