Hi everyone,

Thank you to everyone who answered.
In fact, I will gladly file a bug once I am done recovering this critical VM. But my main concern right now is to get it running ASAP, or else fall back to the painful route of tape recovery.

I found similarities between some already-filed bugs and my issue, but I think mine is much simpler. In my case:
- the VM has only one disk
- the whole oVirt setup is using an iSCSI SAN
- the VM was shut down; there was no attempt at a live snapshot
- I did not stop the engine during the deletion, nor did I perform any other disruptive action
- I did the exact same steps two days ago on a test VM and it ran fine
- in between, I did not upgrade or reset anything

I found many common points with the thread below:

http://list-archives.org/2013/10/25/users-ovirt-org/vm-snapshot-delete-failed-iscsi-domain/f/6837397684

When reading my logs, some of you jumped to the Python errors, but looking further up, one can see earlier (non-Python) errors complaining that a logical volume was not found.
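For instance, here is a sketch of how one can surface those earlier errors, assuming the default VDSM log location:

# Surface the non-Python LVM errors that precede the tracebacks:
grep -iE 'logical volume|lvm' /var/log/vdsm/vdsm.log | less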


Today, nothing more was being written to engine.log, so I decided to restart the engine:
- Logs came back (...).
- On the faulty VM, I now see NO snapshot at all.
- I still see the disk.
- Trying to start the VM leads to the following error:

VM uc-674 is down. Exit message: internal error process exited while connecting to monitor: qemu-kvm: -drive file=/rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/11a077c7-658b-49bb-8596-a785109c24c9/images/69220da6-eeed-4435-aad0-7aa33f3a0d21/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23,if=none,id=drive-virtio-disk0,format=qcow2,serial=69220da6-eeed-4435-aad0-7aa33f3a0d21,cache=none,werror=stop,rerror=stop,aio=native: could not open disk image /rhev/data-center/5849b030-626e-47cb-ad90-3ce782d831b3/11a077c7-658b-49bb-8596-a785109c24c9/images/69220da6-eeed-4435-aad0-7aa33f3a0d21/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23: Invalid argument .

And indeed, when I look for the device from the SPM, I see nothing:
[root@serv-vm-adm9 11a077c7-658b-49bb-8596-a785109c24c9]# ls -la /dev/11a077c7-658b-49bb-8596-a785109c24c9/
total 0
drwxr-xr-x.  2 root root  200  7 janv. 08:23 .
drwxr-xr-x. 21 root root 4480  7 janv. 08:23 ..
lrwxrwxrwx.  1 root root    8  5 déc.  11:58 5c71e53b-21f2-4671-94f8-4603d1b0bf5e -> ../dm-19
lrwxrwxrwx.  1 root root    8  5 déc.  11:58 7369a73a-fea5-40d9-ad0a-7d81a43fe931 -> ../dm-20
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 ids -> ../dm-5
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 inbox -> ../dm-7
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 leases -> ../dm-6
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 master -> ../dm-9
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 metadata -> ../dm-4
lrwxrwxrwx.  1 root root    7 10 oct.  17:22 outbox -> ../dm-8

There is no trace of the LV it should be using (/dev/11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23).
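For the record, this is roughly how I am checking the LV state from the SPM (a sketch; the VG name is my storage domain UUID, and your LVM filter may differ):

# List the LVs of the storage domain VG with their attribute flags and tags;
# the 5th character of lv_attr is 'a' when the LV is active (i.e. has a /dev node).
lvs -o lv_name,lv_attr,lv_tags 11a077c7-658b-49bb-8596-a785109c24c9

An inactive LV still shows up in lvs but gets no symlink under /dev/{VG}/, which would match what I am seeing.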

In the thread I linked above, the OP was able to "lvchange -aey" the device.
In my case, although lvmdiskscan and lvs both show me the LV, there is no device node at /dev/{the proper VG}/{my missing LV}.

Well, the last thing to ask is:

Is there a way to recover it, i.e. to recreate a device node for this LV and activate it?
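The best lead I can see so far (untested on my production domain, so please treat it as a sketch) is to activate the LV by hand and let device-mapper recreate the node:

# Activate the missing LV so device-mapper recreates /dev/{VG}/{LV}:
lvchange -ay 11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23

# If lvchange is blocked by the LVM filter, a one-off relaxed filter may help
# (adjust the regex to your multipath devices before running this):
lvchange -ay --config 'devices { filter = [ "a|.*|" ] }' \
    11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23

# Sanity-check the image before booting the VM:
qemu-img info /dev/11a077c7-658b-49bb-8596-a785109c24c9/c50561d9-c3ba-4366-b2bc-49bbfaa4cd23

Given that a snapshot merge was interrupted, I would also want to double-check the qcow2 backing chain before actually starting the VM.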

--
Nicolas Ecarnot

On 07/01/2014 04:09, Maor Lipchuk wrote:
Hi Nicolas,
I think that the initial problem started at 10:06 when VDSM tried to
clear records of the ancestor volume
c50561d9-c3ba-4366-b2bc-49bbfaa4cd23 (see [1]).

Looking at bugzilla, it could be related to
https://bugzilla.redhat.com/1029069
(based on the exception described at
https://bugzilla.redhat.com/show_bug.cgi?id=1029069#c1)

The issue there was fixed after an upgrade to 3.3.1 (as Sander mentioned
earlier on the mailing list).

Could you give it a try and check if that works for you?

Also, it would be great if you could open a bug on this with the full
VDSM and engine logs and the list of LVs.
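(On a default install, the engine log lives at /var/log/ovirt-engine/engine.log and the VDSM log at /var/log/vdsm/vdsm.log; a minimal sketch for capturing the LV list on the SPM:)

# Capture the LV list with tags for the bug report:
lvs -o +lv_tags > /tmp/lvs-output.txt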

Regards,
Maor



[1]
236b3c5a-452a-4614-801a-c30cefbce87e::ERROR::2014-01-06
10:06:14,407::task::850::TaskManager.Task::(_setError)
Task=`236b3c5a-452a-4614-801a-c30cefbce87e`::Unexpected error
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/task.py", line 857, in _run
     return fn(*args, **kargs)
   File "/usr/share/vdsm/storage/task.py", line 318, in run
     return self.cmd(*self.argslist, **self.argsdict)
   File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
     return f(self, *args, **kwargs)
   File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
     sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
   File "/usr/share/vdsm/storage/image.py", line 1162, in merge
     srcVol.shrinkToOptimalSize()
   File "/usr/share/vdsm/storage/blockVolume.py", line 315, in
shrinkToOptimalSize
     volParams = self.getVolumeParams()
   File "/usr/share/vdsm/storage/volume.py", line 1008, in getVolumeParams
     volParams['imgUUID'] = self.getImage()
   File "/usr/share/vdsm/storage/blockVolume.py", line 494, in getImage
     return self.getVolumeTag(TAG_PREFIX_IMAGE)
   File "/usr/share/vdsm/storage/blockVolume.py", line 464, in getVolumeTag
     return _getVolumeTag(self.sdUUID, self.volUUID, tagPrefix)
   File "/usr/share/vdsm/storage/blockVolume.py", line 662, in _getVolumeTag
     tags = lvm.getLV(sdUUID, volUUID).tags
   File "/usr/share/vdsm/storage/lvm.py", line 851, in getLV
     raise se.LogicalVolumeDoesNotExistError("%s/%s" % (vgName, lvName))
LogicalVolumeDoesNotExistError: Logical volume does not exist:
('11a077c7-658b-49bb-8596-a785109c24c9/_remove_me_aVmPgweS_c50561d9-c3ba-4366-b2bc-49bbfaa4cd23',)
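(The renamed volume in the error above appears to be VDSM's doing: as far as I can tell, it renames a volume with a "_remove_me_" prefix before deleting it. A quick check for leftovers, as a sketch:)

# Look for both the original and the renamed ("_remove_me_") LV in the domain VG:
lvs -o lv_name 11a077c7-658b-49bb-8596-a785109c24c9 | grep -E 'c50561d9|_remove_me_'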


On 01/06/2014 04:39 PM, Meital Bourvine wrote:
I got the attachment.

This is the relevant error:
6caec3bc-fc66-42be-a642-7733fc033103::ERROR::2014-01-06 
10:13:21,068::task::850::TaskManager.Task::(_setError) 
Task=`6caec3bc-fc66-42be-a642-7733fc033103`::Unexpected error
Traceback (most recent call last):
   File "/usr/share/vdsm/storage/task.py", line 857, in _run
     return fn(*args, **kargs)
   File "/usr/share/vdsm/storage/task.py", line 318, in run
     return self.cmd(*self.argslist, **self.argsdict)
   File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
     return f(self, *args, **kwargs)
   File "/usr/share/vdsm/storage/sp.py", line 1937, in mergeSnapshots
     sdUUID, vmUUID, imgUUID, ancestor, successor, postZero)
   File "/usr/share/vdsm/storage/image.py", line 1101, in merge
     dstVol = vols[ancestor]
KeyError: '506085b6-40e0-4176-a4df-9102857f51f2'

I don't know why it happens, so you'll have to wait for someone else to answer.

----- Original Message -----
From: "Nicolas Ecarnot" <[email protected]>
To: "users" <[email protected]>
Sent: Monday, January 6, 2014 4:22:57 PM
Subject: Re: [Users] Unable to delete a snapshot

On 06/01/2014 12:51, Nicolas Ecarnot wrote:
Also, please attach the whole vdsm.log; it's hard to read it this way
(lines are broken).

See attachment.

Actually, I don't know if this mailing list allows attachments?

--
Nicolas Ecarnot


