On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
<arsene.gschw...@unibas.ch> wrote:

> It looks like the Pivot completed successfully, see attached vdsm.log.
> Is there a way to recover that VM?
> Or would it be better to recover the VM from Backup?

This is what we see in the log:

1. Merge request received

2020-07-13 11:18:30,282+0200 INFO  (jsonrpc/7) [api.virt] START
merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':
u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
u'00000002-0002-0002-0002-000000000289'},
baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
from=::ffff:10.34.38.31,39226,
flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)

To track this job, we can use the jobUUID: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
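
To follow the whole flow, you can grep the job UUID in the vdsm log (assuming
the default log location /var/log/vdsm/vdsm.log):

    grep 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 /var/log/vdsm/vdsm.log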

2. Starting the merge

2020-07-13 11:18:30,690+0200 INFO  (jsonrpc/7) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]',
top=None, bandwidth=0, flags=12 (vm:5945)

We see the original chain:
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
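
If you want to verify the chain on storage, qemu-img can report it; this is a
sketch assuming the volume symlinks under /rhev are still present on the host
running the VM (the -U/--force-share flag avoids the image lock held by the
running VM):

    qemu-img info -U --backing-chain \
        /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b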

3. The merge was completed, ready for pivot

2020-07-13 11:19:00,992+0200 INFO  (libvirt/events) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
for drive 
/rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
is ready (vm:5847)

At this point the parent volume contains all the data in the top volume, and
we can pivot to the parent volume.
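
While the block job is running, the same ready state should be visible from
libvirt, for example with read-only virsh (a sketch, assuming the VM is still
running on this host):

    virsh -r blockjob b5534254-660f-44b1-bc83-d616c98ba0ba sda --info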

4. Vdsm detects that the merge is ready and starts the cleanup thread that
will complete the merge

2020-07-13 11:19:06,166+0200 INFO  (periodic/1) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)

5. Requesting pivot to the parent volume:

2020-07-13 11:19:06,717+0200 INFO  (merge/720410c3) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
complete active layer commit (job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)

6. Pivot was successful

2020-07-13 11:19:06,734+0200 INFO  (libvirt/events) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
for drive 
/rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
has completed (vm:5838)

7. Vdsm waits until libvirt updates the XML:

2020-07-13 11:19:06,756+0200 INFO  (merge/720410c3) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
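
To confirm the pivot from the libvirt side, one could dump the domain XML and
check that the disk source now points to the base volume (again only possible
while the VM is running):

    virsh -r dumpxml b5534254-660f-44b1-bc83-d616c98ba0ba | grep -A 3 '<disk'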

8. Synchronizing vdsm metadata

2020-07-13 11:19:06,776+0200 INFO  (merge/720410c3) [vdsm.api] START
imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)

We can see the new chain:
['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']

2020-07-13 11:19:07,005+0200 INFO  (merge/720410c3) [storage.Image]
Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)  (image:1221)

The old chain:
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)

2020-07-13 11:19:07,006+0200 INFO  (merge/720410c3) [storage.Image]
Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b']
(image:1231)
2020-07-13 11:19:07,017+0200 INFO  (merge/720410c3) [storage.Image]
Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from
the chain. Marking it ILLEGAL to prevent data corruption (image:1239)

This matches what we see on storage.
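
To check this on storage, you can read the metadata slot of the top volume
(MD_58 in the lvs output shown in the fix instructions below) with the same
kind of dd read used there; assuming the same slot layout (1 MiB header plus
8 KiB per slot), the offset is 1048576 + 58 * 8192 = 1523712 and the output
should contain LEGALITY=ILLEGAL:

    dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 \
        skip=1523712 iflag=skip_bytes | grep -a LEGALITY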

9. Merge job is untracked

2020-07-13 11:19:21,134+0200 INFO  (periodic/1) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread
<vdsm.virt.vm.LiveMergeCleanupThread object at 0x7fa1e0370350>
successfully completed, untracking job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97
(base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)

This was a successful merge on the vdsm side.

We don't see any more requests for the top volume in this log. The next step to
complete the merge is to delete the volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b,
but this can be done only on the SPM host.

To understand why this did not happen, we need the engine log showing this
interaction, and the logs from the SPM host from the same time.

Please file a bug about this and attach these logs (and the vdsm log you sent
here). Fixing this VM is important, but preventing this bug for other users is
even more important.


How to fix the volume metadata:

1. Edit 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta

Change:
VOLTYPE=INTERNAL

To:
VOLTYPE=LEAF

See attached file for reference.
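
If you prefer to do this edit non-interactively, a sed one-liner should work
(assuming the .meta file is the plain-text copy you are editing locally):

    sed -i 's/^VOLTYPE=INTERNAL$/VOLTYPE=LEAF/' 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta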

2. Truncate the file to 512 bytes

truncate -s 512 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta

3. Verify the file size

$ ls -lh 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
-rw-r--r--. 1 nsoffer nsoffer 512 Jul 17 18:17
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta

4. Get the slot number for this volume from the LV tags (MD_N) and compute
the offset

(copied from your pdf)

lvs -o vg_name,lv_name,tags | grep d7bd480d-2c51-4141-a386-113abf75219e

33777993-a3a5-4aad-a24c-dfe5e473faca 6197b30d-0732-4cc7-aef0-12f9f6e9565b IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_58,PU_8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

33777993-a3a5-4aad-a24c-dfe5e473faca 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 IU_d7bd480d-2c51-4141-a386-113abf75219e,MD_28,PU_00000000-0000-0000-0000-000000000000
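
The base volume 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is in slot 28 (MD_28).
Assuming a V5 storage domain, where the metadata LV has a 1 MiB header and
one 8 KiB slot per volume, the offset used in the next steps is:

    $ echo $((1024 * 1024 + 28 * 8192))
    1277952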

5. Get the metadata from the slot to verify that we are changing the right
metadata

dd if=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 \
    skip=1277952 iflag=skip_bytes > /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad

Compare 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad with the fixed file;
the only change should be the VOLTYPE=LEAF line and the amount of padding.
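
For example, diff should show only that difference (the -a/--text flag is
needed because the block read from storage may contain padding bytes; here I
assume the fixed copy is the .meta file edited in step 1):

    diff -a /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.bad \
        8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta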

6. Write new metadata to storage

dd of=/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/metadata bs=512 count=1 \
    seek=1277952 oflag=direct,seek_bytes conv=fsync \
    < /tmp/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed

7. Delete the lv 6197b30d-0732-4cc7-aef0-12f9f6e9565b on the SPM host

WARNING: this must be done on the SPM host, otherwise you may corrupt
the VG metadata.

If you selected the wipe-after-delete option for this disk, you want to wipe
the lv before deleting it. If you selected the discard-after-delete option,
you want to discard the lv before deleting it.

Activate the lv on the SPM host:

    lvchange -ay 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

If needed, wipe it:

    blkdiscard --zeroout --step 32m /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

If needed, discard it:

    blkdiscard /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

Deactivate the lv:

    lvchange -an 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

Remove the lv:

    lvremove 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

8. Fixing the engine db

Benny, Eyal: can you add instructions on how to remove the volume on the
engine side?

After the volume is removed on the engine side, starting the VM will succeed.

Attachment: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta.fixed
Description: Binary data
