[ovirt-users] Re: how to remove a failed backup operation

2021-09-09 Thread Gianluca Cecchi
On Thu, Sep 9, 2021 at 2:21 PM Nir Soffer  wrote:

> On Thu, Sep 9, 2021 at 12:53 PM Nir Soffer  wrote:
> ...
> >> Any insight for finding the scratch disks ids in engine.log?
> >> See here my engine.log and timestamp of backup (as seen in database
> above) is 15:31 on 03 September:
> >>
> >>
> https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing
> >
> >
> > To find the scratch disks, the best way is to use the UI: open the
> > Storage > Disks tab and change the content type to "Backup scratch disks"
> > (see attached screenshot).
>

I confirm that no scratch disks have been left in my case


> Regardless, it is useful to understand the engine log; here are the
> relevant events in your log:
>
>
[snip]

11. Error in the backup command - not sure why...
>
> [snip]
>
> 12. Errors writing to database - no space left
>
>
>
[snip]


> This seems to be the root cause for the engine failure - engine cannot
> write to the
> database, so it cannot complete handling of the backup command.
>

[snip]


>
> So both scratch disks were removed as expected, and the only issue is the
> backup
> stuck in the finalizing state.
>
> Because the root cause is no space on the database disk, caused by user
> error
> (filling up engine disk by mistake), I don't think we can do much about
> this.
>
> Nir
>


Indeed I didn't recall my filesystem layout. The file that filled up was in
my home directory, but as I have no dedicated /home filesystem, it filled
the / filesystem, also impacting the engine's PostgreSQL database, which
lives under /var/lib/pgsql/data/base.
This confirms your recommendation not to run the backup on the engine.

In fact, the current filesystem layout of my external engine is:

[g.cecchi@ovmgr1 ~]$ df -h
Filesystem  Size  Used Avail Use% Mounted on
devtmpfs4.9G 0  4.9G   0% /dev
tmpfs   4.9G   24K  4.9G   1% /dev/shm
tmpfs   4.9G   25M  4.9G   1% /run
tmpfs   4.9G 0  4.9G   0% /sys/fs/cgroup
/dev/mapper/cl-root  43G  5.1G   36G  13% /
/dev/sda2   976M  199M  710M  22% /boot
/dev/sda1   599M  7.3M  592M   2% /boot/efi
tmpfs   998M 0  998M   0% /run/user/1000
[g.cecchi@ovmgr1 ~]$
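Since the root cause here was the backup filling the / filesystem, a backup script could refuse to start when the target filesystem is low on space. A minimal sketch; the `enough_space` helper and the 10% headroom are my own illustration, not part of backup_vm.py:

```python
import shutil

def enough_space(backup_dir, required_bytes, headroom=0.1):
    """Return True if the filesystem holding backup_dir has room for
    required_bytes plus a safety headroom (default 10%)."""
    usage = shutil.disk_usage(backup_dir)
    return usage.free >= required_bytes * (1 + headroom)

# Example: check for ~21 GiB, the virtual size of the disks in this thread.
print(enough_space("/", 21474836480))
```

Running such a check before creating the image transfer would turn the mid-transfer "No space left on device" failure into an early, clean error.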

Thanks very much for the detailed analysis.

OK also with closing the Bugzilla.

Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/3F2MU7CM2LXSYREAGXRX6OMZZBF2B5M6/


[ovirt-users] Re: how to remove a failed backup operation

2021-09-09 Thread Nir Soffer
On Thu, Sep 9, 2021 at 12:53 PM Nir Soffer  wrote:
...
>> Any insight for finding the scratch disks ids in engine.log?
>> See here my engine.log and timestamp of backup (as seen in database above) 
>> is 15:31 on 03 September:
>>
>> https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing
>
>
> To find the scratch disks, the best way is to use the UI: open the
> Storage > Disks tab and change the content type to "Backup scratch disks"
> (see attached screenshot).

Regardless, it is useful to understand the engine log; here are the
relevant events in your log:

$ grep 68f83141-9d03-4cb0-84d4-e71fdd8753bb engine.log
...

1. Backup started

2021-09-03 15:31:11,551+02 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-50) [b302cff2-fb05-4f10-9d02-aa03b10b10e1] EVENT_ID:
VM_BACKUP_STARTED(10,790), Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb
for VM c8server started (User: tekka@mydomain@mydomain).

2. Creating scratch disk for disk c8_data_c8server1

2021-09-03 15:31:12,550+02 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] START, CreateVolumeVDSCommand(
CreateVolumeVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false',
storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='a6ce101a-f7ce-4944-93a5-e71f32dd6c12',
imageSizeInBytes='21474836480', volumeFormat='COW',
newImageId='33aa1bac-4152-492d-8a4a-b6d6c0337fec', imageType='Sparse',
newImageDescription='{"DiskAlias":"VM c8server backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for
c8_data_c8server1","DiskDescription":"Backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk"}',
imageInitialSizeInBytes='1073741824',
imageId='----',
sourceImageGroupId='----',
shouldAddBitmaps='false'}), log id: 164ff0c7

3. Creating scratch disk for disk c8_bootdisk_c8server1

2021-09-03 15:31:12,880+02 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] START, CreateVolumeVDSCommand(
CreateVolumeVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false',
storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='c9521211-8e24-46ae-aa2e-6f76503527dc',
imageSizeInBytes='21474836480', volumeFormat='COW',
newImageId='48244767-c8dc-4821-be21-935207068e69', imageType='Sparse',
newImageDescription='{"DiskAlias":"VM c8server backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for
c8_bootdisk_c8server1","DiskDescription":"Backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk"}',
imageInitialSizeInBytes='1073741824',
imageId='----',
sourceImageGroupId='----',
shouldAddBitmaps='false'}), log id: 367fe98d

We can grep for the scratch disk UUIDs:
- a6ce101a-f7ce-4944-93a5-e71f32dd6c12
- c9521211-8e24-46ae-aa2e-6f76503527dc
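The scratch disk IDs can also be pulled out of engine.log programmatically. A rough sketch, assuming the `CreateVolumeVDSCommand` entries look like the ones quoted above; the entries are condensed here to one line each, while in the real log each entry spans several lines, so you would join them first:

```python
import re

# Condensed sample of the two CreateVolumeVDSCommand entries quoted above
# (assumed format; one line per entry for this sketch).
LOG = """
2021-09-03 15:31:12,550+02 INFO CreateVolumeVDSCommand( ... imageGroupId='a6ce101a-f7ce-4944-93a5-e71f32dd6c12', ... newImageDescription='{"DiskAlias":"VM c8server backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for c8_data_c8server1"}' ...
2021-09-03 15:31:12,880+02 INFO CreateVolumeVDSCommand( ... imageGroupId='c9521211-8e24-46ae-aa2e-6f76503527dc', ... newImageDescription='{"DiskAlias":"VM c8server backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for c8_bootdisk_c8server1"}' ...
"""

def scratch_disk_ids(log_text, backup_id):
    """Collect imageGroupId values from scratch-disk creation entries
    belonging to the given backup."""
    ids = []
    for line in log_text.splitlines():
        if ("CreateVolumeVDSCommand" in line
                and backup_id in line
                and "scratch disk" in line):
            m = re.search(r"imageGroupId='([0-9a-f-]{36})'", line)
            if m:
                ids.append(m.group(1))
    return ids

print(scratch_disk_ids(LOG, "68f83141-9d03-4cb0-84d4-e71fdd8753bb"))
```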

But let's first understand what happens to this backup...

4. Backup was started

2021-09-03 15:31:29,883+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.StartVmBackupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] START,
StartVmBackupVDSCommand(HostName = ov200,
VmBackupVDSParameters:{hostId='cc241ec7-64fc-4c93-8cec-9e0e7005a49d',
backupId='68f83141-9d03-4cb0-84d4-e71fdd8753bb',
requireConsistency='false'}), log id: 154dbdc5
{96b0e701-7595-4f04-8569-fb1c72e6f8e0=nbd:unix:/run/vdsm/backup/68f83141-9d03-4cb0-84d4-e71fdd8753bb:exportname=sdb,
33b0f6fb-a855-465d-a628-5fce9b64496a=nbd:unix:/run/vdsm/backup/68f83141-9d03-4cb0-84d4-e71fdd8753bb:exportname=sda}
  checkpoint for backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb

The next step is creating image transfer for downloading the disks.
Based on your mail:

[ 157.8 ] Image transfer 'ccc386d3-9f9d-4727-832a-56d355d60a95' is ready

We can follow the image transfer UUID ccc386d3-9f9d-4727-832a-56d355d60a95:

5. Creating image transfer

2021-09-03 15:33:46,892+02 INFO
[org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
(default task-48) [a79ec359-6da7-4b21-a018-1a9360a2f7d8] Creating
ImageTransfer entity for command
'ccc386d3-9f9d-4727-832a-56d355d60a95', proxyEnabled: true

6. Image transfer is ready

2021-09-03 15:33:46,922+02 INFO
[org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(default task-48) [a79ec359-6da7-4b21-a018-1a9360a2f7d8] Updating
image transfer ccc386d3-9f9d-4727-832a-56d355d60a95 (image
33b0f6fb-a855-465d-a628-5fce9b64496a) phase to Transferring

The next step is finalizing this transfer, after data was downloaded,
or download
failed...

7. Image transfer finalized

2021-09-03 15:35:34,141+02 INFO

[ovirt-users] Re: how to remove a failed backup operation

2021-09-09 Thread Nir Soffer
On Wed, Sep 8, 2021 at 11:52 AM Gianluca Cecchi 
wrote:
...

> Right now what I see in the table is:
>
> engine=# \x
> Expanded display is on.
>

Nice! I didn't know about that


> engine=# select * from vm_backups;
> -[ RECORD 1 ]--+-
> backup_id  | 68f83141-9d03-4cb0-84d4-e71fdd8753bb
> from_checkpoint_id |
> to_checkpoint_id   | d31e35b6-bd16-46d2-a053-eabb26d283f5
> vm_id  | dc386237-1e98-40c8-9d3d-45658163d1e2
> phase  | Finalizing
>

In the current code, this means that the VM.stop_backup call was successful
when you asked to finalize the backup.


> _create_date   | 2021-09-03 15:31:11.447+02
> host_id| cc241ec7-64fc-4c93-8cec-9e0e7005a49d
>
> engine=#
>
> see below my doubts...
>
...

> The VM is still running.
> The host (I see backup-related errors in its events) is ov200.
> BTW: how can I see the mapping between host id and hostname (from the db
> and/or api)?
>
> [root@ov200 ~]# vdsm-client VM stop_backup
> vmID=dc386237-1e98-40c8-9d3d-45658163d1e2
> backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
> {
> "code": 0,
> "message": "Done"
> }
> [root@ov200 ~]#
>
>
>>> If this succeeds, the backup is not running on vdsm side.
>>>
>>
> I presume from the output above that the command succeeded, correct?
>

Yes, this is what a successful command looks like. If the command fails, you
will get a non-zero code and the message will explain the failure.
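A wrapper script calling vdsm-client could check that response mechanically. A small sketch; the `check_vdsm_response` helper is hypothetical and only assumes the `{"code": ..., "message": ...}` shape shown above:

```python
import json

def check_vdsm_response(output):
    """Parse vdsm-client JSON output and raise if the code is non-zero.
    Assumes the {"code": ..., "message": ...} reply shape shown above."""
    reply = json.loads(output)
    if reply.get("code", 0) != 0:
        raise RuntimeError("vdsm command failed: %s" % reply.get("message"))
    return reply

print(check_vdsm_response('{"code": 0, "message": "Done"}'))
```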

If this fails, you may need to stop the VM to end the backup.
>>>
>>> If the backup was stopped, you may need to delete the scratch disks
>>> used in this backup.
>>> You can find the scratch disks ids in engine logs, and delete them
>>> from engine UI.
>>>
>>
> Any insight for finding the scratch disks ids in engine.log?
> See here my engine.log and timestamp of backup (as seen in database above)
> is 15:31 on 03 September:
>
>
> https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing
>

To find the scratch disks, the best way is to use the UI: open the
Storage > Disks tab and change the content type to "Backup scratch disks"
(see attached screenshot).

The description and comment of the scratch disk should be enough to
detect stale scratch disks that failed to be removed after a backup.

You should be able to delete the disks from the UI/API.


>>> Finally, after you cleaned up vdsm side, you can delete the backup
>>> from engine database,
>>> and unlock the disks.
>>>
>>> Pavel, can you provide instructions on how to clean up engine db after
>>> stuck backup?
>>>
>>
>> Can you please try manually updating the "phase" of the problematic
>> backup entry in the "vm_backups" DB table to one of the final phases,
>> which are either "Succeeded" or "Failed"?
>> This should allow creating a new backup.
>> [image: image.png]
>>
>>
>>>
>>> After vdsm and engine were cleaned, new backup should work normally.
>>>
>>
> OK, so I will wait for Nir's input about scratch disk removal and then
> change the phase column for the backup.
>

Once you stop the backup on the vdsm side, you can fix the backup phase in
the database. You don't need to delete the scratch disks before that; they
can be deleted later.

A backup stuck in the finalizing phase blocks future backups of the VM,
while scratch disks only take logical space in your storage domain, and
some physical space in your storage.

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/TBIGXOBHK2DN5Y6EJTYIAPWG3MTCCQP2/


[ovirt-users] Re: how to remove a failed backup operation

2021-09-08 Thread Gianluca Cecchi
On Sun, Sep 5, 2021 at 6:00 PM Pavel Bar  wrote:

> Hi,
> Please try the instructions below and update whether it helped.
>
> Thank you!
>
> Pavel
>
>
Thanks for input.
If I understand correctly, I have to complete the steps described by Nir
and then work at the DB level.

Right now what I see in the table is:

engine=# \x
Expanded display is on.
engine=# select * from vm_backups;
-[ RECORD 1 ]--+-
backup_id  | 68f83141-9d03-4cb0-84d4-e71fdd8753bb
from_checkpoint_id |
to_checkpoint_id   | d31e35b6-bd16-46d2-a053-eabb26d283f5
vm_id  | dc386237-1e98-40c8-9d3d-45658163d1e2
phase  | Finalizing
_create_date   | 2021-09-03 15:31:11.447+02
host_id| cc241ec7-64fc-4c93-8cec-9e0e7005a49d

engine=#

see below my doubts...

On Sun, 5 Sept 2021 at 18:41, Nir Soffer  wrote:
>
>> On Sat, Sep 4, 2021 at 1:08 AM Gianluca Cecchi
>>  wrote:
>> ...
>> >>> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed:
>> [Error 28] No space left on device
>> >> This error is expected if you don't have space to write the data.
>> > ok.
>>
>> I forgot to mention that running backup on engine host is not recommended.
>> It is better to run the backup on the hypervisor, speeding up the data
>> copy.
>>
>
OK, I will take care of it, thanks.

>>> How can I clean the situation?
>> >>
>> >> 1. Stop the current backup
>>
>>

> >> If stopping the backup failed, stopping the VM will stop the backup.
>>
>
OK, I will try to fix it with the VM running, if possible, before
resorting to stopping it.


>> > But if I try the stop command I get the error
>> >
>> > [g.cecchi@ovmgr1 ~]$ python3
>> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1
>> stop dc386237-1e98-40c8-9d3d-45658163d1e2
>> 68f83141-9d03-4cb0-84d4-e71fdd8753bb
>> > [   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
>> > Traceback (most recent call last):
>> ...
>> > ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
>> "[Cannot stop VM backup. The VM backup is not in READY phase, backup phase
>> is FINALIZING. Please try again when the backup is in READY phase.]". HTTP
>> response code is 409.
>>
>> So your backup was already finalized, and it is stuck in "finalizing"
>> phase.
>>
>> Usually this means the backup on libvirt side was already stopped, but
>> engine
>> failed to detect this and failed to complete the finalize step
>> (ovirt-engine bug).
>>
>> You need to check whether the backup was stopped on the vdsm side.
>>
>> - If the VM was stopped, the backup is not running
>> - If the VM is running, we can make sure the backup is stopped using
>>
>> vdsm-client VM stop_backup
>> vmID=dc386237-1e98-40c8-9d3d-45658163d1e2
>> backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
>>
>
The VM is still running.
The host (I see backup-related errors in its events) is ov200.
BTW: how can I see the mapping between host id and hostname (from the db
and/or api)?
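On the host-id-to-hostname question: one way via the API is to list the hosts with the SDK and match on id. A sketch; the `host_name_by_id` helper is my own illustration, and only the `hosts_service().list()` call mentioned in the comment is the SDK's:

```python
from types import SimpleNamespace

def host_name_by_id(hosts, host_id):
    """Map a host UUID to its name, given an iterable of host objects
    exposing .id and .name (as the oVirt SDK's Host objects do)."""
    for host in hosts:
        if host.id == host_id:
            return host.name
    return None

# With the SDK (not run here) the host list would come from:
#   connection.system_service().hosts_service().list()
# Stand-in objects for illustration:
hosts = [SimpleNamespace(id="cc241ec7-64fc-4c93-8cec-9e0e7005a49d", name="ov200")]
print(host_name_by_id(hosts, "cc241ec7-64fc-4c93-8cec-9e0e7005a49d"))  # → ov200
```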

[root@ov200 ~]# vdsm-client VM stop_backup
vmID=dc386237-1e98-40c8-9d3d-45658163d1e2
backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
{
"code": 0,
"message": "Done"
}
[root@ov200 ~]#


>> If this succeeds, the backup is not running on vdsm side.
>>
>
I presume from the output above that the command succeeded, correct?

If this fails, you may need to stop the VM to end the backup.
>>
>> If the backup was stopped, you may need to delete the scratch disks
>> used in this backup.
>> You can find the scratch disks ids in engine logs, and delete them
>> from engine UI.
>>
>
Any insight for finding the scratch disks ids in engine.log?
See here my engine.log and timestamp of backup (as seen in database above)
is 15:31 on 03 September:

https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing


>> Finally, after you cleaned up vdsm side, you can delete the backup
>> from engine database,
>> and unlock the disks.
>>
>> Pavel, can you provide instructions on how to clean up engine db after
>> stuck backup?
>>
>
> Can you please try manually updating the "phase" of the problematic
> backup entry in the "vm_backups" DB table to one of the final phases, which
> are either "Succeeded" or "Failed"?
> This should allow creating a new backup.
> [image: image.png]
>
>
>>
>> After vdsm and engine were cleaned, new backup should work normally.
>>
>
OK, so I will wait for Nir's input about scratch disk removal and then
change the phase column for the backup.


>> >> 2. File a bug about this
> Filed this one; I hope it is correct. I chose ovirt-imageio as the
>> product and Client as the component:
>>
>> In general backup bugs should be filed for ovirt-engine. ovirt-imageio
>> is rarely the
>> cause for a bug. We will move the bug to ovirt-imageio if needed.
>>
>> > https://bugzilla.redhat.com/show_bug.cgi?id=2001136
>>
>> Thanks!
>>
>> Nir
>>
>
ok.

Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email 

[ovirt-users] Re: how to remove a failed backup operation

2021-09-05 Thread Pavel Bar
Hi,
Please try the instructions below and update whether it helped.

Thank you!

Pavel

On Sun, 5 Sept 2021 at 18:41, Nir Soffer  wrote:

> On Sat, Sep 4, 2021 at 1:08 AM Gianluca Cecchi
>  wrote:
> ...
> >>> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed: [Error
> 28] No space left on device
> >> This error is expected if you don't have space to write the data.
> > ok.
>
> I forgot to mention that running backup on engine host is not recommended.
> It is better to run the backup on the hypervisor, speeding up the data
> copy.
>
> You can mount the backup directory on the hypervisor (e.g. nfs) and
> use --backup-dir
> to store the backup where it should be.
>
> >>> Now if I try the same backup command (so with "full" option) and I get
> >>>
> >>> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
> "[Cannot backup VM. The VM is during a backup operation.]". HTTP response
> code is 409.
> >> This looks like a bug in the backup script - the backup should be
> finalized
> >> even if the image transfer failed, but the error you get says the VM is
> >> still in backup mode.
> >>
> >>> How can I clean the situation?
> >>
> >> 1. Stop the current backup
> >>
> >> If you still have the output from the command, we log the backup UUID.
> >>
> >> If you lost the backup id, you can get it using the API - visit this
> address in your browser:
> >>
> >> https://myengine/ovirt-engine/api/vms/{vm-id}/backups/
> >>
> >> Then stop the current backup using:
> >>
> >> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py stop
> vm-id backup-id
> >>
> >> If stopping the backup failed, stopping the VM will stop the backup.
> >> I hope you are running recent enough version, since in early versions
> there
> >> was a bug when you cannot stop the vm during a backup.
> >
> > It is the latest 4.4.7. I run the backup_vm.py script from the engine:
> >
> > ovirt-engine-4.4.7.7-1.el8.noarch
> > ovirt-engine-setup-plugin-imageio-4.4.7.7-1.el8.noarch
> > ovirt-imageio-common-2.2.0-1.el8.x86_64
> > ovirt-imageio-client-2.2.0-1.el8.x86_64
> > ovirt-imageio-daemon-2.2.0-1.el8.x86_64
> > python3-ovirt-engine-sdk4-4.4.13-1.el8.x86_64
>
> Looks good.
>
> > But if I try the stop command I get the error
> >
> > [g.cecchi@ovmgr1 ~]$ python3
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1
> stop dc386237-1e98-40c8-9d3d-45658163d1e2
> 68f83141-9d03-4cb0-84d4-e71fdd8753bb
> > [   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
> > Traceback (most recent call last):
> ...
> > ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
> "[Cannot stop VM backup. The VM backup is not in READY phase, backup phase
> is FINALIZING. Please try again when the backup is in READY phase.]". HTTP
> response code is 409.
>
> So your backup was already finalized, and it is stuck in "finalizing"
> phase.
>
> Usually this means the backup on libvirt side was already stopped, but
> engine
> failed to detect this and failed to complete the finalize step
> (ovirt-engine bug).
>
> You need to check whether the backup was stopped on the vdsm side.
>
> - If the VM was stopped, the backup is not running
> - If the VM is running, we can make sure the backup is stopped using
>
> vdsm-client VM stop_backup
> vmID=dc386237-1e98-40c8-9d3d-45658163d1e2
> backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb
>
> If this succeeds, the backup is not running on vdsm side.
> If this fails, you may need to stop the VM to end the backup.
>
> If the backup was stopped, you may need to delete the scratch disks
> used in this backup.
> You can find the scratch disks ids in engine logs, and delete them
> from engine UI.
>
> Finally, after you cleaned up vdsm side, you can delete the backup
> from engine database,
> and unlock the disks.
>
> Pavel, can you provide instructions on how to clean up engine db after
> stuck backup?
>

Can you please try manually updating the "phase" of the problematic backup
entry in the "vm_backups" DB table to one of the final phases, which are
either "Succeeded" or "Failed"?
This should allow creating a new backup.
[image: image.png]
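Pavel's fix boils down to a single SQL UPDATE. A sketch using an in-memory SQLite table as a stand-in, since the real vm_backups table lives in the engine's PostgreSQL database and would be updated from psql; the table is reduced here to the two relevant columns:

```python
import sqlite3

# In-memory SQLite stand-in for the relevant vm_backups columns
# (the real engine DB is PostgreSQL, accessed via psql on the engine host).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vm_backups (backup_id TEXT, phase TEXT)")
db.execute("INSERT INTO vm_backups VALUES (?, ?)",
           ("68f83141-9d03-4cb0-84d4-e71fdd8753bb", "Finalizing"))

# The statement you would run against the engine database:
db.execute("UPDATE vm_backups SET phase = 'Failed' "
           "WHERE backup_id = '68f83141-9d03-4cb0-84d4-e71fdd8753bb'")

print(db.execute("SELECT phase FROM vm_backups").fetchone()[0])  # → Failed
```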


>
> After vdsm and engine were cleaned, new backup should work normally.
>
> >> 2. File a bug about this
> > Filed this one; I hope it is correct. I chose ovirt-imageio as the
> product and Client as the component:
>
> In general backup bugs should be filed for ovirt-engine. ovirt-imageio
> is rarely the
> cause for a bug. We will move the bug to ovirt-imageio if needed.
>
> > https://bugzilla.redhat.com/show_bug.cgi?id=2001136
>
> Thanks!
>
> Nir
>
>
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/KRYTLD3RLUETGTOADFO2JA6CV77B3ZE5/


[ovirt-users] Re: how to remove a failed backup operation

2021-09-05 Thread Nir Soffer
On Sat, Sep 4, 2021 at 1:08 AM Gianluca Cecchi
 wrote:
...
>>> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed: [Error 28] 
>>> No space left on device
>> This error is expected if you don't have space to write the data.
> ok.

I forgot to mention that running backup on engine host is not recommended.
It is better to run the backup on the hypervisor, speeding up the data copy.

You can mount the backup directory on the hypervisor (e.g. NFS) and use
--backup-dir to store the backup where it should be.

>>> Now if I try the same backup command (so with "full" option) and I get
>>>
>>> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is 
>>> "[Cannot backup VM. The VM is during a backup operation.]". HTTP response 
>>> code is 409.
>> This looks like a bug in the backup script - the backup should be finalized
>> even if the image transfer failed, but the error you get says the VM is
>> still in backup mode.
>>
>>> How can I clean the situation?
>>
>> 1. Stop the current backup
>>
>> If you still have the output from the command, we log the backup UUID.
>>
>> If you lost the backup id, you can get it using the API - visit this address 
>> in your browser:
>>
>> https://myengine/ovirt-engine/api/vms/{vm-id}/backups/
>>
>> Then stop the current backup using:
>>
>> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py stop 
>> vm-id backup-id
>>
>> If stopping the backup failed, stopping the VM will stop the backup.
>> I hope you are running a recent enough version, since in early versions
>> there was a bug where you could not stop the VM during a backup.
>
> It is the latest 4.4.7. I run the backup_vm.py script from the engine:
>
> ovirt-engine-4.4.7.7-1.el8.noarch
> ovirt-engine-setup-plugin-imageio-4.4.7.7-1.el8.noarch
> ovirt-imageio-common-2.2.0-1.el8.x86_64
> ovirt-imageio-client-2.2.0-1.el8.x86_64
> ovirt-imageio-daemon-2.2.0-1.el8.x86_64
> python3-ovirt-engine-sdk4-4.4.13-1.el8.x86_64

Looks good.

> But if I try the stop command I get the error
>
> [g.cecchi@ovmgr1 ~]$ python3 
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1 stop 
> dc386237-1e98-40c8-9d3d-45658163d1e2 68f83141-9d03-4cb0-84d4-e71fdd8753bb
> [   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
> Traceback (most recent call last):
...
> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot 
> stop VM backup. The VM backup is not in READY phase, backup phase is 
> FINALIZING. Please try again when the backup is in READY phase.]". HTTP 
> response code is 409.

So your backup was already finalized, and it is stuck in "finalizing" phase.

Usually this means the backup on libvirt side was already stopped, but engine
failed to detect this and failed to complete the finalize step
(ovirt-engine bug).

You need to check whether the backup was stopped on the vdsm side.

- If the VM was stopped, the backup is not running
- If the VM is running, we can make sure the backup is stopped using

vdsm-client VM stop_backup
vmID=dc386237-1e98-40c8-9d3d-45658163d1e2
backup_id=68f83141-9d03-4cb0-84d4-e71fdd8753bb

If this succeeds, the backup is not running on vdsm side.
If this fails, you may need to stop the VM to end the backup.

If the backup was stopped, you may need to delete the scratch disks
used in this backup.
You can find the scratch disks ids in engine logs, and delete them
from engine UI.

Finally, after you cleaned up vdsm side, you can delete the backup
from engine database,
and unlock the disks.

Pavel, can you provide instructions on how to clean up engine db after
stuck backup?

After vdsm and engine were cleaned, new backup should work normally.

>> 2. File a bug about this
> Filed this one; I hope it is correct. I chose ovirt-imageio as the product
> and Client as the component:

In general, backup bugs should be filed for ovirt-engine; ovirt-imageio
is rarely the cause of a bug. We will move the bug to ovirt-imageio if needed.

> https://bugzilla.redhat.com/show_bug.cgi?id=2001136

Thanks!

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/LVI36HQIAM4CIC5YNKBZ5JX5RIZSCDY3/


[ovirt-users] Re: how to remove a failed backup operation

2021-09-03 Thread Gianluca Cecchi
On Fri, Sep 3, 2021 at 9:35 PM Nir Soffer  wrote:

> On Fri, Sep 3, 2021 at 4:45 PM Gianluca Cecchi 
> wrote:
>
>> Hello,
>> I was trying incremental backup with the provided
>> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py and began
>> using the "full" option.
>> But I specified an incorrect dir and during the backup I got an error
>> due to the filesystem filling up
>>
>> [ 156.7 ] Creating image transfer for disk
>> '33b0f6fb-a855-465d-a628-5fce9b64496a'
>>
>> [snip]

> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed: [Error 28]
>> No space left on device
>>
>
> This error is expected if you don't have space to write the data.
>

ok.


>
>>
>> Now if I try the same backup command (so with "full" option) and I get
>>
>> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
>> "[Cannot backup VM. The VM is during a backup operation.]". HTTP response
>> code is 409.
>>
>
> This looks like a bug in the backup script - the backup should be finalized
> even if the image transfer failed, but the error you get says the VM is
> still in backup mode.
>
>
>>
>> How can I clean the situation?
>>
>
> 1. Stop the current backup
>
> If you still have the output from the command, we log the backup UUID.
>
> If you lost the backup id, you can get it using the API - visit this
> address in your browser:
>
> https://myengine/ovirt-engine/api/vms/{vm-id}/backups/
>
> Then stop the current backup using:
>
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py stop
> vm-id backup-id
>
> If stopping the backup failed, stopping the VM will stop the backup.
> I hope you are running a recent enough version, since in early versions
> there was a bug where you could not stop the VM during a backup.
>

It is the latest 4.4.7. I run the backup_vm.py script from the engine:

ovirt-engine-4.4.7.7-1.el8.noarch
ovirt-engine-setup-plugin-imageio-4.4.7.7-1.el8.noarch
ovirt-imageio-common-2.2.0-1.el8.x86_64
ovirt-imageio-client-2.2.0-1.el8.x86_64
ovirt-imageio-daemon-2.2.0-1.el8.x86_64
python3-ovirt-engine-sdk4-4.4.13-1.el8.x86_64

But if I try the stop command I get the error

[g.cecchi@ovmgr1 ~]$ python3
/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py -c ovmgr1
stop dc386237-1e98-40c8-9d3d-45658163d1e2
68f83141-9d03-4cb0-84d4-e71fdd8753bb
[   0.0 ] Finalizing backup '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
Traceback (most recent call last):
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py",
line 493, in 
main()
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py",
line 176, in main
args.command(args)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py",
line 262, in cmd_stop
stop_backup(connection, args.backup_uuid, args)
  File "/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py",
line 345, in stop_backup
backup_service.finalize()
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line
33869, in finalize
return self._internal_action(action, 'finalize', None, headers, query,
wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 299,
in _internal_action
return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55,
in wait
return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 296,
in callback
self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132,
in _check_fault
self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118,
in _raise_error
raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
"[Cannot stop VM backup. The VM backup is not in READY phase, backup phase
is FINALIZING. Please try again when the backup is in READY phase.]". HTTP
response code is 409.
[g.cecchi@ovmgr1 ~]$



>
> 2. File a bug about this
>


Filed this one; I hope it is correct. I chose ovirt-imageio as the product
and Client as the component:

https://bugzilla.redhat.com/show_bug.cgi?id=2001136

I also included information about the error received with the stop command.



>
>
>>
>> BTW: the parameter to put into ovirt.conf is backup-dir or backup_dir or
>> what?
>>
>
> ovirt.conf does not include the backup dir, only details about the engine.
> Adding backup-dir to ovirt.conf or to a backup-specific configuration
> sounds like a good idea.
>
> Nir
>

I agree

Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/OESNSO7MWVWZR2MS374ATPGYQRM2AXC3/


[ovirt-users] Re: how to remove a failed backup operation

2021-09-03 Thread Nir Soffer
On Fri, Sep 3, 2021 at 4:45 PM Gianluca Cecchi 
wrote:

> Hello,
> I was trying incremental backup with the provided
> /usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py and began
> using the "full" option.
> But I specified an incorrect dir and during the backup I got an error due
> to the filesystem filling up
>
> [ 156.7 ] Creating image transfer for disk
> '33b0f6fb-a855-465d-a628-5fce9b64496a'
> [ 157.8 ] Image transfer 'ccc386d3-9f9d-4727-832a-56d355d60a95' is ready
> --- Logging error ---, 105.02 seconds, 147.48 MiB/s
>
> Traceback (most recent call last):
>   File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py",
> line 242, in _run
> handler.copy(req)
>   File "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/io.py",
> line 286, in copy
> self._src.write_to(self._dst, req.length, self._buf)
>   File
> "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/http.py",
> line 216, in write_to
> writer.write(view[:n])
>   File
> "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/backends/nbd.py",
> line 118, in write
> self._client.write(self._position, buf)
>   File
> "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line
> 445, in write
> self._recv_reply(cmd)
>   File
> "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line
> 980, in _recv_reply
> if self._recv_reply_chunk(cmd):
>   File
> "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line
> 1031, in _recv_reply_chunk
> self._handle_error_chunk(length, flags)
>   File
> "/usr/lib64/python3.6/site-packages/ovirt_imageio/_internal/nbd.py", line
> 1144, in _handle_error_chunk
> raise ReplyError(code, message)
> ovirt_imageio._internal.nbd.ReplyError: Writing to file failed: [Error 28]
> No space left on device
>

This error is expected if you don't have space to write the data.
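A wrapper script can avoid this failure mode by checking free space in the
backup directory before starting the transfer. A minimal sketch, assuming the
required size is known up front (e.g. from the disk's provisioned size; the
50 GiB figure below is a made-up placeholder):

```python
import shutil

def has_free_space(path, required_bytes):
    """Return True if the filesystem holding 'path' has at least
    'required_bytes' available."""
    return shutil.disk_usage(path).free >= required_bytes

# Example: refuse to start a full backup unless a hypothetical
# 50 GiB disk image would fit in the backup dir.
if not has_free_space(".", 50 * 1024**3):
    print("not enough space in backup dir, aborting")
```

This only guards the start of the transfer; a qcow2 backup of a thin disk can
still grow, so the check is a precaution, not a guarantee.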


>
> Now if I try the same backup command (so with the "full" option) I get
>
> ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is
> "[Cannot backup VM. The VM is during a backup operation.]". HTTP response
> code is 409.
>

This looks like a bug in the backup script - the backup should be finalized
even if the image transfer failed, but the error you get says the VM is still
in backup mode.
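For what it's worth, the fix in a client script is the usual try/finally
pattern: finalize the backup even when the transfer raises. A sketch with stub
callables (in the real script these would be SDK calls to the backup and
transfer services; the names here are placeholders):

```python
def run_backup(start_backup, transfer, finalize):
    """Start a backup, run the transfer, and always finalize,
    so a failed transfer does not leave the VM in backup mode."""
    backup = start_backup()
    try:
        transfer(backup)
    finally:
        finalize(backup)

def failing_transfer(backup):
    # Simulate the ENOSPC failure from the report above.
    raise OSError(28, "No space left on device")

events = []
try:
    run_backup(
        start_backup=lambda: "backup-uuid",
        transfer=failing_transfer,
        finalize=lambda b: events.append(("finalized", b)),
    )
except OSError:
    pass
print(events)  # finalize ran even though the transfer failed
```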


>
> How can I clean the situation?
>

1. Stop the current backup

If you still have the output from the command, the backup UUID is logged there.

If you lost the backup id, you can get it using the API - visit this
address in your browser:

https://myengine/ovirt-engine/api/vms/{vm-id}/backups/

Then stop the current backup using:

/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py stop
vm-id backup-id

If stopping the backup fails, stopping the VM will stop the backup.
I hope you are running a recent enough version, since early versions had
a bug where you could not stop the VM during a backup.
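The lookup step above can be scripted as well. Here is a minimal sketch: the
helper just builds the REST address shown above, and the commented fragment is
an untested outline of the same lookup via the Python SDK (connection details
are placeholders):

```python
def backups_url(engine_fqdn, vm_id):
    """Build the REST API address that lists a VM's backups
    (open it in a browser to find the backup id)."""
    return "https://{}/ovirt-engine/api/vms/{}/backups".format(engine_fqdn, vm_id)

print(backups_url("myengine", "123e4567-e89b-12d3-a456-426614174000"))

# With the Python SDK the same information is reachable roughly like this
# (untested sketch):
#
#   import ovirtsdk4 as sdk
#   conn = sdk.Connection(url="https://myengine/ovirt-engine/api",
#                         username="admin@internal", password="...",
#                         ca_file="ca.pem")
#   backups = (conn.system_service().vms_service()
#                  .vm_service("VM-ID").backups_service().list())
#   for b in backups:
#       print(b.id, b.phase)
```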

2. File a bug about this


>
> BTW: the parameter to put into ovirt.conf is backup-dir or backup_dir or
> what?
>

ovirt.conf does not include the backup dir, only details about the engine.
Adding backup-dir
to ovirt.conf or to a backup-specific configuration sounds like a good idea.
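Since ovirt.conf is INI-style, such a key could be read with stock
configparser. A sketch, stressing that backup_dir is a hypothetical key (not a
real option today, as said above) and the section contents are placeholders:

```python
import configparser

# Hypothetical ovirt.conf with an extra backup_dir key per engine section.
CONF = """
[myengine]
engine_url = https://myengine/ovirt-engine/api
username = admin@internal
backup_dir = /backup/myengine
"""

cfg = configparser.ConfigParser()
cfg.read_string(CONF)
# Fall back to a default when the hypothetical key is absent.
backup_dir = cfg.get("myengine", "backup_dir", fallback="/var/tmp/backups")
print(backup_dir)  # prints /backup/myengine
```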

Nir
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/HSUZOKSHCEWQAHUOAQ6EVUHWACTGANH7/


[ovirt-users] Re: how to remove a failed backup operation

2021-09-03 Thread Strahil Nikolov via Users
This looks like a bug. It should have 'recovered' from the failure.
I'm not sure which logs would help identify the root cause.

Best Regards,
Strahil Nikolov
 
 
On Fri, Sep 3, 2021 at 16:45, Gianluca Cecchi 
wrote:
Hello,
I was trying incremental backup with the provided
/usr/share/doc/python3-ovirt-engine-sdk4/examples/backup_vm.py and began using
the "full" option. But I specified an incorrect dir and during backup I got
an error due to filesystem full
[snip]
Now if I try the same backup command (so with the "full" option) I get

ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot 
backup VM. The VM is during a backup operation.]". HTTP response code is 409.
How can I clean the situation?
BTW: the parameter to put into ovirt.conf is backup-dir or backup_dir or what?
Thanks,
Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/6OZ7ZNH5GSNFCHDSDOPBNVXMN7WLWUXC/
  
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/VIVHL2UBPXM43SU7CQCFFV2O2IO73UTQ/