On Thu, Sep 9, 2021 at 12:53 PM Nir Soffer <[email protected]> wrote:
...
>> Any insight for finding the scratch disks ids in engine.log?
>> See here my engine.log and timestamp of backup (as seen in database above) 
>> is 15:31 on 03 September:
>>
>> https://drive.google.com/file/d/1Ao1CIA2wlFCqMMKeXbxKXrWZXUrnJN2h/view?usp=sharing
>
>
> To find the scratch disks the best way is to use the UI - open the storage > 
> disks tab
> and change the content type to "Backup scratch disks"
> (see attached screenshot)

Regardless, it is useful to understand engine log, here are the
relevant events in
your log:

$ grep 68f83141-9d03-4cb0-84d4-e71fdd8753bb engine.log
...

1. Backup started

2021-09-03 15:31:11,551+02 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-50) [b302cff2-fb05-4f10-9d02-aa03b10b10e1] EVENT_ID:
VM_BACKUP_STARTED(10,790), Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb
for VM c8server started (User: tekka@mydomain@mydomain).

2. Creating scratch disk for disk c8_data_c8server1

2021-09-03 15:31:12,550+02 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] START, CreateVolumeVDSCommand(
CreateVolumeVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false',
storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='a6ce101a-f7ce-4944-93a5-e71f32dd6c12',
imageSizeInBytes='21474836480', volumeFormat='COW',
newImageId='33aa1bac-4152-492d-8a4a-b6d6c0337fec', imageType='Sparse',
newImageDescription='{"DiskAlias":"VM c8server backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for
c8_data_c8server1","DiskDescription":"Backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk"}',
imageInitialSizeInBytes='1073741824',
imageId='00000000-0000-0000-0000-000000000000',
sourceImageGroupId='00000000-0000-0000-0000-000000000000',
shouldAddBitmaps='false'}), log id: 164ff0c7

3. Creating scratch disk for disk c8_bootdisk_c8server1

2021-09-03 15:31:12,880+02 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.CreateVolumeVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-39)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] START, CreateVolumeVDSCommand(
CreateVolumeVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false',
storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='c9521211-8e24-46ae-aa2e-6f76503527dc',
imageSizeInBytes='21474836480', volumeFormat='COW',
newImageId='48244767-c8dc-4821-be21-935207068e69', imageType='Sparse',
newImageDescription='{"DiskAlias":"VM c8server backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk for
c8_bootdisk_c8server1","DiskDescription":"Backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb scratch disk"}',
imageInitialSizeInBytes='1073741824',
imageId='00000000-0000-0000-0000-000000000000',
sourceImageGroupId='00000000-0000-0000-0000-000000000000',
shouldAddBitmaps='false'}), log id: 367fe98d

We can grep for the scratch disk UUIDs:
- a6ce101a-f7ce-4944-93a5-e71f32dd6c12
- c9521211-8e24-46ae-aa2e-6f76503527dc

But let's first understand what happens to this backup...

4. Backup was started

2021-09-03 15:31:29,883+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.StartVmBackupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-32)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] START,
StartVmBackupVDSCommand(HostName = ov200,
VmBackupVDSParameters:{hostId='cc241ec7-64fc-4c93-8cec-9e0e7005a49d',
backupId='68f83141-9d03-4cb0-84d4-e71fdd8753bb',
requireConsistency='false'}), log id: 154dbdc5
{96b0e701-7595-4f04-8569-fb1c72e6f8e0=nbd:unix:/run/vdsm/backup/68f83141-9d03-4cb0-84d4-e71fdd8753bb:exportname=sdb,
33b0f6fb-a855-465d-a628-5fce9b64496a=nbd:unix:/run/vdsm/backup/68f83141-9d03-4cb0-84d4-e71fdd8753bb:exportname=sda}<domaincheckpoint>
  <description>checkpoint for backup
&apos;68f83141-9d03-4cb0-84d4-e71fdd8753bb&apos;</description>

The next step is creating image transfer for downloading the disks.
Based on your mail:

[ 157.8 ] Image transfer 'ccc386d3-9f9d-4727-832a-56d355d60a95' is ready

We can follow the image transfer UUID ccc386d3-9f9d-4727-832a-56d355d60a95:

5. Creating image transfer

2021-09-03 15:33:46,892+02 INFO
[org.ovirt.engine.core.bll.storage.disk.image.TransferDiskImageCommand]
(default task-48) [a79ec359-6da7-4b21-a018-1a9360a2f7d8] Creating
ImageTransfer entity for command
'ccc386d3-9f9d-4727-832a-56d355d60a95', proxyEnabled: true

6. Image transfer is ready

2021-09-03 15:33:46,922+02 INFO
[org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(default task-48) [a79ec359-6da7-4b21-a018-1a9360a2f7d8] Updating
image transfer ccc386d3-9f9d-4727-832a-56d355d60a95 (image
33b0f6fb-a855-465d-a628-5fce9b64496a) phase to Transferring

The next step is finalizing this transfer, after data was downloaded,
or download
failed...

7. Image transfer finalized

2021-09-03 15:35:34,141+02 INFO
[org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(default task-48) [cb1714ef-e36c-4984-aee8-3ca2185be724] Updating
image transfer ccc386d3-9f9d-4727-832a-56d355d60a95 (image
33b0f6fb-a855-465d-a628-5fce9b64496a) phase to Finalizing Success

8. Image transfer marked as finished...

2021-09-03 15:35:42,337+02 INFO
[org.ovirt.engine.core.bll.storage.disk.image.ImageTransferUpdater]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-87)
[a79ec359-6da7-4b21-a018-1a9360a2f7d8] Updating image transfer
ccc386d3-9f9d-4727-832a-56d355d60a95 (image
33b0f6fb-a855-465d-a628-5fce9b64496a) phase to Finished Success

9. Backup is stopped

99-4fd7-8ab3-e2a0f4b13f06] Running command: StopVmBackupCommand
internal: false. Entities affected :  ID:
dc386237-1e98-40c8-9d3d-45658163d1e2 Type: VMAction group BACKUP_DISK
with role type ADMIN,  ID: 96b0e701-7595-4f04-8569-fb1c72e6f8e0 Type:
DiskAction group BACKUP_DISK with role type ADMIN,  ID:
33b0f6fb-a855-465d-a628-5fce9b64496a Type: DiskAction group
BACKUP_DISK with role type ADMIN

10. Stopping backup on the host

2021-09-03 15:35:43,364+02 INFO
[org.ovirt.engine.core.vdsbroker.vdsbroker.StopVmBackupVDSCommand]
(default task-48) [893250d4-f599-4fd7-8ab3-e2a0f4b13f06] FINISH,
StopVmBackupVDSCommand, return: , log id: 228ad88b

2021-09-03 15:35:43,374+02 INFO
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(default task-48) [893250d4-f599-4fd7-8ab3-e2a0f4b13f06] EVENT_ID:
VM_BACKUP_FINALIZED(10,794), Backup
68f83141-9d03-4cb0-84d4-e71fdd8753bb for VM c8server finalized (User:
tekka@mydomain@mydomain).

11. Error in the backup command - not sure why...

2021-09-03 15:35:44,390+02 INFO
[org.ovirt.engine.core.utils.transaction.TransactionSupport]
(EE-ManagedScheduledExecutorServ
ice-engineScheduledThreadPool-Thread-7)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] transaction rolled back
2021-09-03 15:35:44,390+02 ERROR
[org.ovirt.engine.core.bll.SerialChildCommandsExecutionCallback]
(EE-ManagedScheduledExecutor
Service-engineScheduledThreadPool-Thread-7)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] Command 'StartVmBackup' id:
'68f83141-9d03-
4cb0-84d4-e71fdd8753bb' with children
[c98edaa9-e3e9-48bc-a775-39d09f34f17c,
0767fbc6-d966-4d0d-bd96-33da00dd7797] failed when
 attempting to perform the next operation, marking as 'ACTIVE'
Template.java:1050)
        ... 30 more

12. Errors writing to database - no space left

2021-09-03 15:35:45,422+02 ERROR
[org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] EVENT_ID:
VM_BACKUP_FAILED(10,792), Backup 68f83141-9d03-4cb0-84d4-e71fdd8753bb
for VM c8server failed (User: tekka@mydomain@mydomain).
2021-09-03 15:35:45,422+02 ERROR
[org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-54)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] Failed invoking callback end
method 'onFailed' for command '68f83141-9d03-4cb0-84d4-e71fdd8753bb'
with exception 'CallableStatementCallback; SQL [{call
updatestatusofimagesbyimagegroupid(?, ?)}ERROR: could not extend file
"base/16385/1398189_fsm": No space left on device
  Hint: Check free disk space.
  Where: SQL statement "UPDATE images
    SET imageStatus = v_status
    WHERE image_group_id = v_image_group_id"
PL/pgSQL function updatestatusofimagesbyimagegroupid(uuid,integer)
line 3 at SQL statement; nested exception is
org.postgresql.util.PSQLException: ERROR: could not extend file
"base/16385/1398189_fsm": No space left on device
  Hint: Check free disk space.
  Where: SQL statement "UPDATE images
    SET imageStatus = v_status
    WHERE image_group_id = v_image_group_id"
PL/pgSQL function updatestatusofimagesbyimagegroupid(uuid,integer)
line 3 at SQL statement', the callback is marked for end method
retries

This seems to be the root cause for the engine failure - engine cannot
write to the
database, so it cannot complete handling of the backup command.

Back to the scratch disk, if we grep the UUIDs we see:

$ grep a6ce101a-f7ce-4944-93a5-e71f32dd6c12 engine.log
...

2021-09-03 15:36:19,625+02 INFO
[org.ovirt.engine.core.bll.storage.backup.RemoveScratchDisksCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] Teardown
'a6ce101a-f7ce-4944-93a5-e71f32dd6c12/33aa1bac-4152-492d-8a4a-b6d6c0337fec'
on the VM host 'cc241ec7-64fc-4c93-8cec-9e0e7005a49d'

2021-09-03 15:36:20,423+02 INFO
[org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[4dd3b3c7] Lock Acquired to object
'EngineLock:{exclusiveLocks='[a6ce101a-f7ce-4944-93a5-e71f32dd6c12=DISK]',
sharedLocks=''}'

2021-09-03 15:36:20,429+02 INFO
[org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[4dd3b3c7] Running command: RemoveDiskCommand internal: true. Entities
affected :  ID: a6ce101a-f7ce-4944-93a5-e71f32dd6c12 Type: DiskAction
group DELETE_DISK with role type USER

2021-09-03 15:36:20,483+02 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[4dd3b3c7] START, DeleteImageGroupVDSCommand(
DeleteImageGroupVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false',
storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='a6ce101a-f7ce-4944-93a5-e71f32dd6c12',
postZeros='false', discard='false', forceDelete='false'}), log id:
63ed36b8

2021-09-03 15:36:21,041+02 INFO
[org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[4dd3b3c7] Lock freed to object
'EngineLock:{exclusiveLocks='[a6ce101a-f7ce-4944-93a5-e71f32dd6c12=DISK]',
sharedLocks=''}'

$ grep c9521211-8e24-46ae-aa2e-6f76503527dc engine.log
...
2021-09-03 15:36:20,084+02 INFO
[org.ovirt.engine.core.bll.storage.backup.RemoveScratchDisksCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[b302cff2-fb05-4f10-9d02-aa03b10b10e1] Teardown
'c9521211-8e24-46ae-aa2e-6f76503527dc/48244767-c8dc-4821-be21-935207068e69'
on the VM host 'cc241ec7-64fc-4c93-8cec-9e0e7005a49d'

2021-09-03 15:36:21,043+02 INFO
[org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[79f71902] Lock Acquired to object
'EngineLock:{exclusiveLocks='[c9521211-8e24-46ae-aa2e-6f76503527dc=DISK]',
sharedLocks=''}'

2021-09-03 15:36:21,048+02 INFO
[org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[79f71902] Running command: RemoveDiskCommand internal: true. Entities
affected :  ID: c9521211-8e24-46ae-aa2e-6f76503527dc Type: DiskAction
group DELETE_DISK with role type USER

2021-09-03 15:36:21,067+02 INFO
[org.ovirt.engine.core.vdsbroker.irsbroker.DeleteImageGroupVDSCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[79f71902] START, DeleteImageGroupVDSCommand(
DeleteImageGroupVDSCommandParameters:{storagePoolId='ef17cad6-7724-4cd8-96e3-9af6e529db51',
ignoreFailoverLimit='false',
storageDomainId='1de66cb8-c9f6-49cd-8a35-d5da8adad570',
imageGroupId='c9521211-8e24-46ae-aa2e-6f76503527dc',
postZeros='false', discard='false', forceDelete='false'}), log id:
5670f62c

2021-09-03 15:36:21,518+02 INFO
[org.ovirt.engine.core.bll.storage.disk.RemoveDiskCommand]
(EE-ManagedScheduledExecutorService-engineScheduledThreadPool-Thread-70)
[79f71902] Lock freed to object
'EngineLock:{exclusiveLocks='[c9521211-8e24-46ae-aa2e-6f76503527dc=DISK]',
sharedLocks=''}'

So both scratch disks were removed as expected, and the only issue is the backup
stuck in the finalizing state.

Because the root cause is no space on the database disk, caused by user error
(filling up engine disk by mistake), I don't think we can do much about this.

Nir
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/E3XFPZQEAEBMUZREWRF2SFPL7CZZP256/

Reply via email to