I forgot to add `\x on` to make the output readable. Can you run it with:

$ psql -U engine -d engine -c "\x on" -c "<rest of the query...>"
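For example, for the first query it would look like this (psql's -x switch is equivalent to \x on; the disk id is taken from the output quoted below):

$ psql -U engine -d engine -x -c "select * from images where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e'"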
On Mon, Jul 20, 2020 at 2:50 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
> Hi,
>
> Please find the output:
>
> select * from images where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e';
>
> -[ RECORD 1 ]---------+--------------------------------------
> image_guid            | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
> creation_date         | 2020-04-23 14:59:23+02
> size                  | 161061273600
> it_guid               | 00000000-0000-0000-0000-000000000000
> parentid              | 00000000-0000-0000-0000-000000000000
> imagestatus           | 1
> lastmodified          | 2020-07-06 20:38:36.093+02
> vm_snapshot_id        | 6bc03db7-82a3-4b7e-9674-0bdd76933eb8
> volume_type           | 2
> volume_format         | 4
> image_group_id        | d7bd480d-2c51-4141-a386-113abf75219e
> _create_date          | 2020-04-23 14:59:20.919344+02
> _update_date          | 2020-07-06 20:38:36.093788+02
> active                | f
> volume_classification | 1
> qcow_compat           | 2
> -[ RECORD 2 ]---------+--------------------------------------
> image_guid            | 6197b30d-0732-4cc7-aef0-12f9f6e9565b
> creation_date         | 2020-07-06 20:38:38+02
> size                  | 161061273600
> it_guid               | 00000000-0000-0000-0000-000000000000
> parentid              | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
> imagestatus           | 1
> lastmodified          | 1970-01-01 01:00:00+01
> vm_snapshot_id        | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2
> volume_type           | 2
> volume_format         | 4
> image_group_id        | d7bd480d-2c51-4141-a386-113abf75219e
> _create_date          | 2020-07-06 20:38:36.093788+02
> _update_date          | 2020-07-06 20:38:52.139003+02
> active                | t
> volume_classification | 0
> qcow_compat           | 2
>
> SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id
> and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';
>
> -[ RECORD 1 ]-----------+--------------------------------------
> snapshot_id             | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2
> vm_id                   | b5534254-660f-44b1-bc83-d616c98ba0ba
> snapshot_type           | ACTIVE
> status                  | OK
> description             | Active VM
> creation_date           | 2020-04-23 14:59:20.171+02
> app_list                | kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt-guest-agent-common-1.0.14-1.el7
> vm_configuration        |
> _create_date            | 2020-04-23 14:59:20.154023+02
> _update_date            | 2020-07-03 17:33:17.483215+02
> memory_metadata_disk_id |
> memory_dump_disk_id     |
> vm_configuration_broken | f
>
> Thanks,
> Arsene
>
> On Sun, 2020-07-19 at 16:34 +0300, Benny Zlotnik wrote:
> > Sorry, I only replied to the question. In addition to removing the
> > image from the images table, you may also need to set the parent as
> > the active image and remove the snapshot referenced by this image
> > from the database.
> >
> > Can you provide the output of:
> >
> > $ psql -U engine -d engine -c "select * from images where
> >   image_group_id = <disk_id>";
> >
> > As well as:
> >
> > $ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i
> >   where i.vm_snapshot_id = s.snapshot_id and i.image_guid =
> >   '6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
> >
> > On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik <bzlot...@redhat.com> wrote:
> > > It can be done by deleting from the images table:
> > >
> > > $ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =
> > >   '6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
> > >
> > > Of course, the database should be backed up before doing this.
> > >
> > > On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer <nsof...@redhat.com> wrote:
> > > > On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
> > > > <arsene.gschw...@unibas.ch> wrote:
> > > > > It looks like the Pivot completed successfully, see attached vdsm.log.
> > > > > Is there a way to recover that VM?
> > > > > Or would it be better to recover the VM from Backup?
> > > >
> > > > This is what we see in the log:
> > > >
> > > > 1. Merge request received
> > > >
> > > > 2020-07-13 11:18:30,282+0200 INFO (jsonrpc/7) [api.virt] START
> > > > merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
> > > > u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':
> > > > u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
> > > > u'00000002-0002-0002-0002-000000000289'},
> > > > baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
> > > > topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
> > > > jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
> > > > from=::ffff:10.34.38.31,39226,
> > > > flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
> > > > vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)
> > > >
> > > > To track this job, we can use the jobUUID:
> > > > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> > > > and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > >
> > > > 2. Starting the merge
> > > >
> > > > 2020-07-13 11:18:30,690+0200 INFO (jsonrpc/7) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
> > > > jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
> > > > chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > > > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]',
> > > > top=None, bandwidth=0, flags=12 (vm:5945)
> > > >
> > > > We see the original chain:
> > > > 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > > > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
> > > >
> > > > 3. The merge was completed, ready for pivot
> > > >
> > > > 2020-07-13 11:19:00,992+0200 INFO (libvirt/events) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> > > > for drive
> > > > /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > > is ready (vm:5847)
> > > >
> > > > At this point the parent volume contains all the data in the top
> > > > volume, and we can pivot to the parent volume.
> > > >
> > > > 4. Vdsm detects that the merge is ready, and starts the cleanup thread
> > > > that will complete the merge
> > > >
> > > > 2020-07-13 11:19:06,166+0200 INFO (periodic/1) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
> > > > for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)
> > > >
> > > > 5. Requesting pivot to parent volume:
> > > >
> > > > 2020-07-13 11:19:06,717+0200 INFO (merge/720410c3) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
> > > > complete active layer commit (job
> > > > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)
> > > >
> > > > 6. Pivot was successful
> > > >
> > > > 2020-07-13 11:19:06,734+0200 INFO (libvirt/events) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> > > > for drive
> > > > /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > > has completed (vm:5838)
> > > >
> > > > 7. Vdsm waits until libvirt updates the XML:
> > > >
> > > > 2020-07-13 11:19:06,756+0200 INFO (merge/720410c3) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
> > > > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
> > > >
> > > > 8. Synchronizing vdsm metadata
> > > >
> > > > 2020-07-13 11:19:06,776+0200 INFO (merge/720410c3) [vdsm.api] START
> > > > imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
> > > > imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
> > > > volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
> > > > newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
> > > > task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)
> > > >
> > > > We can see the new chain:
> > > > ['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']
> > > >
> > > > 2020-07-13 11:19:07,005+0200 INFO (merge/720410c3) [storage.Image]
> > > > Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > > > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top) (image:1221)
> > > >
> > > > The old chain:
> > > > 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > > > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
> > > >
> > > > 2020-07-13 11:19:07,006+0200 INFO (merge/720410c3) [storage.Image]
> > > > Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b']
> > > > (image:1231)
> > > > 2020-07-13 11:19:07,017+0200 INFO (merge/720410c3) [storage.Image]
> > > > Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from
> > > > the chain. Marking it ILLEGAL to prevent data corruption (image:1239)
> > > >
> > > > This matches what we see on storage.
> > > >
> > > > 9. Merge job is untracked
> > > >
> > > > 2020-07-13 11:19:21,134+0200 INFO (periodic/1) [virt.vm]
> > > > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread
> > > > <vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fa1e0370350>
> > > > successfully completed, untracking job
> > > > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> > > > (base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
> > > > top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)
> > > >
> > > > This was a successful merge on the vdsm side.
> > > >
> > > > We don't see any more requests for the top volume in this log. The
> > > > next step to complete the merge is to delete the volume
> > > > 6197b30d-0732-4cc7-aef0-12f9f6e9565b, but this can be done only on
> > > > the SPM.
> > > >
> > > > To understand why this did not happen, we need the engine log showing
> > > > this interaction, and logs from the SPM host from the same time.
> > > >
> > > > Please file a bug about this and attach these logs (and the vdsm log
> > > > you sent here). Fixing this VM is important, but preventing this bug
> > > > for other users is even more important.
> > > >
> > > > How to fix the volume metadata:
> > > >
> > > > 1. Edit 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
> > > >
> > > > Change:
> > > > VOLTYPE=INTERNAL
> > > >
> > > > To:
> > > > VOLTYPE=LEAF
> > > >
> > > > See attached file for reference.
> > > >
> > > > 2. Truncate the file to 512 bytes
> > > >
> > > > truncate -s 512 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
> > > >
> > > > 3. Verify the file size
> > > >
> > > > $ ls -lh 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
> > > > -rw-r--r--. 1 nsoffer nsoffer 512 Jul 17 18:17
> > > > 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
> > > >
> > > > 4. Get the slot number for this volume from the LV using the MD_N tag
> > > > and compute the offset
> > > >
> > > > (copied from your pdf)
> > > >
> > > > lvs -o vg_name,lv_name,tags | grep d7bd480d-2c51-4141-a386-113abf75219e
> > > >
> > > > 33777993-a3a5-4aad-a24c-dfe5e473faca 6197b30d-0732-4cc7-aef0-12f9f6e9565b IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_58,PU_8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
> > > > 33777993-a3a5-4aad-a24c-dfe5e473faca 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_28,PU_00000000-0000-0000-0000-000000000000
> > > >
> > > > 5. Get the metadata from the slot to verify that we are changing the
> > > > right metadata
> > > >
> > > > dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512
> > > > count=1 skip=1277952 iflag=skip_bytes >
> > > > /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad
> > > >
> > > > Compare 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad with the fixed
> > > > file; the only change should be the VOLTYPE=LEAF line and the amount
> > > > of padding.
> > > >
> > > > 6. Write new metadata to storage
> > > >
> > > > dd of=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512
> > > > count=1 seek=1277952 oflag=direct,seek_bytes conv=fsync <
> > > > /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed
> > > >
> > > > 7. Delete the lv 6197b30d-0732-4cc7-aef0-12f9f6e9565b on the SPM host
> > > >
> > > > WARNING: this must be done on the SPM host, otherwise you may corrupt
> > > > the VG metadata.
> > > >
> > > > If you selected the wipe-after-delete option for this disk, you want
> > > > to wipe it before deleting. If you selected the discard-after-delete
> > > > option, you want to discard the lv before deleting it.
> > > >
> > > > Activate the lv on the SPM host:
> > > >
> > > > lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > >
> > > > If needed, wipe it:
> > > >
> > > > blkdiscard --zeroout --step 32m /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > >
> > > > If needed, discard it:
> > > >
> > > > blkdiscard /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > >
> > > > Deactivate the lv:
> > > >
> > > > lvchange -an 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > >
> > > > Remove the lv:
> > > >
> > > > lvremove 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > > >
> > > > 8. Fixing the engine db
> > > >
> > > > Benny, Eyal: can you add instructions on how to remove the volume on
> > > > the engine side?
> > > >
> > > > After the volume is removed on the engine side, starting the VM will
> > > > succeed.
>
> --
> Arsène Gschwind
> Sapify AG, on behalf of the University of Basel
> IT Services
> Klinelbergstr. 70 | CH-4056 Basel | Switzerland
> Tel: +41 79 449 25 63 | http://its.unibas.ch
> ITS-ServiceDesk: support-...@unibas.ch | +41 61 267 14 11
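
A couple of additions below the quoted thread, for whoever ends up following Nir's steps.

For step 4 (finding the MD_ slot from the lv tags), one possible way to pull out the slot number of the parent volume; only lvs itself comes from the steps above, the awk filter is just an illustration:

  lvs --noheadings -o lv_name,tags 33777993-a3a5-4aad-a24c-dfe5e473faca |
    awk -F 'MD_' '/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8/ {split($2, a, ","); print a[1]}'

This should print 28, matching the MD_28 tag shown in the output above.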
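For step 8 (fixing the engine db), a rough sketch based on the earlier replies in this thread, assuming the column names shown in the query output above; this is not a final procedure, the exact statements depend on the verified database state, and the engine database must be backed up first:

  -- back up the engine database first, e.g.: pg_dump -U engine engine > engine-db-backup.sql
  -- remove the leftover top volume row:
  DELETE FROM images WHERE image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';
  -- mark the parent volume as the active image of this disk:
  UPDATE images SET active = true WHERE image_guid = '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8';

Whether the snapshot row referenced by the removed image needs any change is not covered by this sketch; per the output above, that row is the VM's "Active VM" snapshot.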
_______________________________________________
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/ELB3C6SUS5ZU3QKTQAD2WEABW4ZBO5HU/