On 24-7-2019 at 10:07, Benny Zlotnik wrote:
We have seen something similar in the past and patches were posted to deal with
this issue, but the work is still in progress [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1553133

That's some interesting reading, and it sure looks like the problem I had. 
Thanks!

Best regards,
Martijn.



On Mon, Jul 22, 2019 at 8:07 PM Strahil <hunter86...@yahoo.com> wrote:

I have a theory... but without any proof, it will remain just a theory.

The storage domains are just VGs on shared storage. The SPM host is supposed
to be the only one working with the LVM metadata, but I have observed that
when someone executes a simple LVM command (for example lvs, vgs or pvs) on
one host while another operation is running on another host, your metadata
can get corrupted, due to the lack of clvmd.

As a protection, I can suggest trying the following solution:
1. Create a new iSCSI LUN.
2. Share it with all nodes and create the storage domain. Set it to maintenance.
3. Start the dlm & clvmd services on all hosts.
4. Convert the VG of your shared storage domain to a clustered VG:
vgchange -c y mynewVG
5. Check the LVs of that VG.
6. Activate the storage domain.
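
A minimal sketch of steps 3-5, assuming the services are called dlm and clvmd
on your distribution (names vary; on EL7 clvmd comes from lvm2-cluster) and
the new VG is called mynewVG:

# step 3: start the cluster locking services (run on every host)
systemctl start dlm clvmd

# step 4: mark the VG as clustered, so clvmd coordinates metadata access
vgchange -c y mynewVG

# step 5: verify the 'c' attribute is set and the LVs are still visible
vgs -o vg_name,vg_attr mynewVG
lvs mynewVG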

Of course, test it on a test cluster before implementing it in production.
This is one of the approaches used in Linux HA clusters to avoid LVM
metadata corruption.

Best Regards,
Strahil Nikolov

On Jul 22, 2019 15:46, Martijn Grendelman <martijn.grendel...@isaac.nl> wrote:
Hi,

On 22-7-2019 at 14:30, Strahil wrote:

If you can give directions (some kind of history), the devs might try to
reproduce this type of issue.

If it is reproducible, a fix can be provided.

Based on my experience, when something as widely used as Linux LVM gets
broken, the case is very hard to reproduce.

Yes, I'd think so too, especially since this activity (online moving of disk
images) is done all the time, mostly without problems. In this case, there was
a lot of activity on all storage domains, because I'm moving all my storage (>
10TB in 185 disk images) to a new storage platform. During the online move of
one of the images, the metadata checksum became corrupted and the storage
domain went offline.

Of course, I could dig up the engine logs and vdsm logs of when it happened, 
but that would be some work and I'm not very confident that the actual cause 
would be in there.

If any oVirt devs are interested in the logs, I'll provide them, but otherwise 
I think I'll just see it as an incident and move on.

Best regards,
Martijn.




On Jul 22, 2019 10:17, Martijn Grendelman <martijn.grendel...@isaac.nl> wrote:
Hi,

Thanks for the tips! I didn't know about 'pvmove', thanks.

In the meantime, I managed to get it fixed by restoring the VG metadata on
the iSCSI server, i.e. on the underlying zvol directly, rather than via the
iSCSI session on the oVirt host. That allowed me to perform the restore
without bringing all VMs down, which was important to me, because if I had to
shut down VMs, I was sure I wouldn't be able to restart them before the
storage domain was back online.
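
For reference, the restore itself boils down to something like the following
(a sketch; the VG name and archive file name are hypothetical, taken from the
--list output, and this must run on a machine that sees the underlying device):

# list the metadata versions that LVM has archived for this VG
vgcfgrestore --list myVG

# restore a known-good version; deactivate the LVs first if at all possible
vgcfgrestore -f /etc/lvm/archive/myVG_00042-1234567890.vg myVG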

Of course this is more a Linux problem than an oVirt problem, but oVirt did
cause it ;-)

Thanks,
Martijn.



On 19-7-2019 at 19:06, Strahil Nikolov wrote:
Hi Martijn,

First, check what went wrong with the VG, as it could be something simple.
'vgcfgbackup -f <file> VGname' will create a file that you can use to compare
the current metadata with a previous version.
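
For example (a sketch, with VGname standing in for your actual VG):

# dump the current metadata to a file
vgcfgbackup -f /tmp/VGname-current.cfg VGname

# compare it with the newest automatic archive that LVM keeps
diff -u "$(ls -t /etc/lvm/archive/VGname_*.vg | head -1)" /tmp/VGname-current.cfg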

If you have Linux boxes, you can add disks from another storage and then
pvmove the data inside the VM. Of course, you will need to reinstall grub on
the new OS disk, or you won't be able to boot afterwards.
If possible, try with a test VM before proceeding with important ones.
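
Inside the VM that would look roughly like this (the device names /dev/vda2
and /dev/vdb and the VG name vg_root are assumptions; adjust to your layout):

pvcreate /dev/vdb            # the newly attached disk
vgextend vg_root /dev/vdb    # add it to the VM's VG
pvmove /dev/vda2 /dev/vdb    # move all extents off the old PV
vgreduce vg_root /dev/vda2   # drop the old PV from the VG
grub2-install /dev/vdb       # reinstall the bootloader (command varies per distro)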

Backing up the VMs is very important, because working on LVM metadata is risky.

