On Sun, Sep 5, 2021 at 6:00 PM Pavel Bar <[email protected]> wrote:

> Hi,
> Please try the instructions below and update whether it helped.
>
> Thank you!
>
> Pavel
>
>
Thanks for the input.
If I understand correctly, I have to complete the steps described by Nir
and then work at the DB level.

Right now what I see in the table is:

engine=# \x
Expanded display is on.
engine=# select * from vm_backups;
-[ RECORD 1 ]------+-------------------------------------
backup_id          | 68f83141-9d03-4cb0-84d4-e71fdd8753bb
from_checkpoint_id |
to_checkpoint_id   | d31e35b6-bd16-46d2-a053-eabb26d283f5
vm_id              | dc386237-1e98-40c8-9d3d-45658163d1e2
phase              | Finalizing
_create_date       | 2021-09-03 15:31:11.447+02
host_id            | cc241ec7-64fc-4c93-8cec-9e0e7005a49d

engine=#

See my doubts below...

On Sun, 5 Sept 2021 at 18:41, Nir Soffer <[email protected]> wrote:
>
>> On Sat, Sep 4, 2021 at 1:08 AM Gianluca Cecchi
>> <[email protected]> wrote:
>> ...
>> >>> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed:
>> [Error 28] No space left on device
>> >> This error is expected if you don't have space to write the data.
>> > ok.
>>
>> I forgot to mention that running backup on engine host is not recommended.
>> It is better to run the backup on the hypervisor, speeding up the data
>> copy.
>>
>
OK, I will take care of it, thanks.
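
(If I understand correctly, that means running the same example script
directly on the hypervisor; something like the command below, assuming the
SDK examples and a matching connection config are installed there and that
"full" is the right subcommand for a full backup:)

```
python3 /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py \
    -c ovmgr1 full dc386237-1e98-40c8-9d3d-45658163d1e2
```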

>> > How can I clean the situation?
>>
>> 1. Stop the current backup
>>
>> If stopping the backup failed, stopping the VM will stop the backup.
>>
OK, I will try to fix it with the VM running if possible, before resorting
to stopping it.


>> > But if I try the stop command I get the error
>> >
>> > [g.cecchi@ovmgr1 ~]$ python3
>> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1
>> stop dc386237-1e98-40c8-9d3d-45658163d1e2
>> 68f83141-9d03-4cb0-84d4-e71fdd8753bb
>> > [   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
>> > Traceback (most recent call last):
>> ...
>> > ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
>> "[Cannot stop VM backup. The VM backup is not in READY phase, backup phase
>> is FINALIZING. Please try again when the backup is in READY phase.]". HTTP
>> response code is 409.
>>
>> So your backup was already finalized, and it is stuck in "finalizing"
>> phase.
>>
>> Usually this means the backup on libvirt side was already stopped, but
>> engine
>> failed to detect this and failed to complete the finalize step
>> (ovirt-engine bug).
>>
>> You need to check whether the backup was stopped on the vdsm side.
>>
>> - If the VM was stopped, the backup is not running
>> - If the VM is running, we can make sure the backup is stopped using
>>
>>     vdsm-client VM stop_backup vmID=dc386237-1e98-40c8-9d3d-45658163d1e2 \
>>         backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
>>
>
The VM is still running.
The host (I see it in the events related to the backup errors) is ov200.
BTW: how can I see the mapping between host id and hostname (from the db
and/or api)?
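
(My own guess would be a query like the one below, but I'm not sure of the
exact table/column names, so treat it as a sketch only:)

```sql
-- Assumption: the engine DB has a vds_static table with vds_id / vds_name
-- columns; the id is the host_id from the vm_backups row above
SELECT vds_id, vds_name
  FROM vds_static
 WHERE vds_id = 'cc241ec7-64fc-4c93-8cec-9e0e7005a49d';
```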

[root@ov200 ~]# vdsm-client VM stop_backup
vmID=dc386237-1e98-40c8-9d3d-45658163d1e2
backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
{
    "code": 0,
    "message": "Done"
}
[root@ov200 ~]#


>> If this succeeds, the backup is not running on vdsm side.
>>
>
I presume from the output above that the command succeeded, correct?

>> If this fails, you may need to stop the VM to end the backup.
>>
>> If the backup was stopped, you may need to delete the scratch disks
>> used in this backup.
>> You can find the scratch disks ids in engine logs, and delete them
>> from engine UI.
>>
>
Any insight on finding the scratch disk ids in engine.log?
Here is my engine.log; the backup timestamp (as seen in the database above)
is 15:31 on 03 September:

https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing
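
(In the meantime I suppose I could just grep for them on the engine host,
assuming the default log location:)

```
grep -i scratch /var/log/ovirt-engine/engine.log
```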


>> Finally, after you cleaned up vdsm side, you can delete the backup
>> from engine database,
>> and unlock the disks.
>>
>> Pavel, can you provide instructions on how to clean up engine db after
>> stuck backup?
>>
>
> Can you please try manually updating the 'phase' of the problematic
> backup entry in the "vm_backups" DB table to one of the final phases,
> which are either "Succeeded" or "Failed"?
> This should allow creating a new backup.
> [image: image.png]
>
>
>>
>> After vdsm and engine were cleaned, new backup should work normally.
>>
>
OK, so I'll wait for Nir's input about scratch disk removal and then change
the phase column for the backup.
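
For reference, this is the update I plan to run in psql on the engine DB
(assuming the phase is stored as a plain string, as in the table dump
above; 'Failed' seems the safer of the two final phases here):

```sql
-- Move the stuck backup out of 'Finalizing' to a final phase
UPDATE vm_backups
   SET phase = 'Failed'
 WHERE backup_id = '68f83141-9d03-4cb0-84d4-e71fdd8753bb';
```

For unlocking the disks, I understand oVirt ships
/usr/share/ovirt-engine/setup/dbutils/unlock_entity.sh
(e.g. "unlock_entity.sh -t disk -i <disk_id>"), though I haven't tried it
yet.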


>> >> 2. File a bug about this
>> > Filed this one, hope it is correct; I chose ovirt-imageio as the
>> product and Client as the component:
>>
>> In general, backup bugs should be filed against ovirt-engine;
>> ovirt-imageio is rarely the cause of a bug. We will move the bug to
>> ovirt-imageio if needed.
>>
>> > https://bugzilla.redhat.com/show_bug.cgi?id=2001136
>>
>> Thanks!
>>
>> Nir
>>
>
ok.

Gianluca
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/DYYI3VQABMRUSD2W2C2SY24CRUDDAAHX/
