Hi Dietmar

On 01/09/16 08:38, Dietmar Maurer wrote:
>> In reply to Dietmar in absence of John:
>>
>> root@pve:~# lvchange -a y pve/data
>> Thin pool transaction_id is 0, while expected 3.
>
>
> Does it help if you reboot the node?
>
No - we tried that (it's a single machine/node - no cluster)

> Some people reported that is possible to fix this with
> vgcfgbackup/vgcfgrestore,
> but you should be really careful when doing such things (backup everything
> first).
>

We can try it, but we have a very large VM that we have nowhere to store
easily at present. As the main part of the array is taken up by the
inaccessible lvm-thin partition, we cannot back up to the small 'local'
partition, and we cannot see another way to get a backup off the server,
since you can't back up a VM that you cannot access.

> # vgcfgbackup -f lvmbackup.txt pve
>
> edit transaction_id inside lvmbackup.txt
>
> # vgcfgrestore --file lvmbackup.txt --force pve

We can try this, but clearly we risk losing the VM.

>
> But John wrote:
>
>> There was an issue with the BBWC on the RAID
>
> So it is likely that more than this is damaged (what happened exactly?)...
>

Extremely unlikely. The RAID would normally just disable any write caching
to preserve data integrity. From what we can see, the server looks like it
was shut down cleanly before they replaced the battery. There are no obvious
errors in the logs.

Due to an error by the datacentre staff, who managed to plug the network
cables into the wrong ports (!!!!!!) after replacing the battery, the machine
was rebooted a number of times, but the VMs were all set to manual start so
none were run until a few hours later.
The machine seemed to boot cleanly with no obvious errors.

What is worrying is:

a) there is no effective way to back up a VM if you cannot mount the
   partition
b) for a single machine, having a default setting that provisions lvm-thin
   with no easily accessible backup space is clearly dangerous
c) what actually caused this to happen

Having looked a bit further, the only thing I can see that *might* be
related is this:

Aug 31 15:35:55 pve pve-manager[5584]: shutdown VM 701: UPID:pve:000015D0:147C9EF2:57C6DD3B:qmshutdown:701:root@pam:
Aug 31 15:36:06 pve pve-manager[5583]: end task UPID:pve:000015D0:147C9EF2:57C6DD3B:qmshutdown:701:root@pam:
Aug 31 15:36:10 pve pve-manager[5583]: all VMs and CTs stopped
Aug 31 15:36:10 pve pve-manager[5577]: <root@pam> end task UPID:pve:000015CF:147C9EEF:57C6DD3B:stopall::root@pam: OK
Aug 31 15:36:12 pve fusermount[5672]: /bin/fusermount: failed to unmount /var/lib/lxcfs: Invalid argument

This seems very similar:
https://forum.proxmox.com/threads/cannot-start-kvm.25085/

We wonder whether the VM was shut down cleanly but the file system was not
unmounted cleanly?

Any thoughts appreciated.

B. Rgds
John
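
For reference, the "edit transaction_id inside lvmbackup.txt" step quoted
above could be sketched roughly as below. This is only a minimal sketch,
assuming the file written by vgcfgbackup uses the usual LVM text metadata
layout, where the pool's segment has a `type = "thin-pool"` line followed by a
`transaction_id = N` line. The function name, the sample metadata and the
volume name vm-701-disk-1 are made up for illustration. It only rewrites text
and runs no LVM command, and which value is actually correct for the pool is
not something this thread establishes.

```python
import re

# Hypothetical, heavily trimmed sample of vgcfgbackup output (assumed layout).
SAMPLE = '''\
data {
    segment1 {
        type = "thin-pool"
        transaction_id = 3
        chunk_size = 128
    }
}
vm-701-disk-1 {
    segment1 {
        type = "thin"
        thin_pool = "data"
        transaction_id = 2
    }
}
'''


def set_thin_pool_transaction_id(metadata_text: str, new_id: int) -> str:
    """Return a copy of the metadata with transaction_id changed
    only inside segments whose type is "thin-pool"."""
    out = []
    in_thin_pool = False
    for line in metadata_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("type ="):
            # Remember whether the current segment is the pool itself.
            in_thin_pool = stripped == 'type = "thin-pool"'
        elif stripped == "}":
            # End of a block: stop treating lines as part of the pool segment.
            in_thin_pool = False
        elif in_thin_pool and re.match(r"transaction_id\s*=\s*\d+", stripped):
            # Replace only the number, keeping indentation and spacing.
            line = re.sub(r"(transaction_id\s*=\s*)\d+", rf"\g<1>{new_id}", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(set_thin_pool_transaction_id(SAMPLE, 0))
```

The sketch leaves the transaction_id of the individual thin volumes alone,
because those lines sit in segments of type "thin" rather than "thin-pool".
Working on a copy of lvmbackup.txt and diffing it against the original before
any vgcfgrestore would be the cautious way to use something like this.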
_______________________________________________
pve-user mailing list
[email protected]
http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
