[ovirt-users] Re: VM Snapshot inconsistent

2020-07-30 Thread Arsène Gschwind
On Thu, 2020-07-23 at 15:17 +0300, Benny Zlotnik wrote:
I think you can remove 6197b30d-0732-4cc7-aef0-12f9f6e9565b from images and the
corresponding snapshot, set the parent, 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8, as
active (active = 't' field), and change its snapshot to be the active snapshot.
That is, if I correctly understand the current layout:
6197b30d-0732-4cc7-aef0-12f9f6e9565b was removed from the storage and
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is now the only volume for the disk.

What do you mean by "change its snapshot to be the active snapshot"?
Yes, correct: 6197b30d-0732-4cc7-aef0-12f9f6e9565b was removed from the storage
and 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is now the only volume for the disk.

Thanks,
arsene

On Wed, Jul 22, 2020 at 1:32 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
Please find the result:

psql -d engine -c "\x on" -c "select * from images where image_group_id = 
'd7bd480d-2c51-4141-a386-113abf75219e';"

Expanded display is on.

-[ RECORD 1 ]-+-

image_guid| 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

creation_date | 2020-04-23 14:59:23+02

size  | 161061273600

it_guid   | ----

parentid  | ----

imagestatus   | 1

lastmodified  | 2020-07-06 20:38:36.093+02

vm_snapshot_id| 6bc03db7-82a3-4b7e-9674-0bdd76933eb8

volume_type   | 2

volume_format | 4

image_group_id| d7bd480d-2c51-4141-a386-113abf75219e

_create_date  | 2020-04-23 14:59:20.919344+02

_update_date  | 2020-07-06 20:38:36.093788+02

active| f

volume_classification | 1

qcow_compat   | 2

-[ RECORD 2 ]-+-

image_guid| 6197b30d-0732-4cc7-aef0-12f9f6e9565b

creation_date | 2020-07-06 20:38:38+02

size  | 161061273600

it_guid   | ----

parentid  | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

imagestatus   | 1

lastmodified  | 1970-01-01 01:00:00+01

vm_snapshot_id| fd5193ac-dfbc-4ed2-b86c-21caa8009bb2

volume_type   | 2

volume_format | 4

image_group_id| d7bd480d-2c51-4141-a386-113abf75219e

_create_date  | 2020-07-06 20:38:36.093788+02

_update_date  | 2020-07-06 20:38:52.139003+02

active| t

volume_classification | 0

qcow_compat   | 2


psql -d engine -c "\x on" -c "SELECT s.* FROM snapshots s, images i where 
i.vm_snapshot_id = s.snapshot_id and i.image_guid = 
'6197b30d-0732-4cc7-aef0-12f9f6e9565b';"

Expanded display is on.

-[ RECORD 1 
]---+--

snapshot_id | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2

vm_id   | b5534254-660f-44b1-bc83-d616c98ba0ba

snapshot_type   | ACTIVE

status  | OK

description | Active VM

creation_date   | 2020-04-23 14:59:20.171+02

app_list| 
kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt-guest-agent-common-1.0.14-1.el7

vm_configuration|

_create_date| 2020-04-23 14:59:20.154023+02

_update_date| 2020-07-03 17:33:17.483215+02

memory_metadata_disk_id |

memory_dump_disk_id |

vm_configuration_broken | f


Thanks.


On Tue, 2020-07-21 at 13:45 +0300, Benny Zlotnik wrote:
I forgot to add the `\x on` to make the output readable, can you run it with:
$ psql -U engine -d engine -c "\x on" -c ""

On Mon, Jul 20, 2020 at 2:50 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
Hi,

Please find the output:

select * from images where image_group_id = 
'd7bd480d-2c51-4141-a386-113abf75219e';


  image_guid  | creation_date  | size | 
  it_guid|   parentid   | 
imagestatus |lastmodified|vm_snapshot_id
| volume_type | volume_for

mat |image_group_id| _create_date  |
 _update_date  | active | volume_classification | qcow_compat

--++--+--+--+-++--+-+---

+--+---+---++---+-

 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 | 
---- | 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-23 Thread Benny Zlotnik
I think you can remove 6197b30d-0732-4cc7-aef0-12f9f6e9565b from images and
the corresponding snapshot, set the parent, 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
as active (active = 't' field), and change its snapshot to be the active snapshot.
That is, if I correctly understand the current layout:
6197b30d-0732-4cc7-aef0-12f9f6e9565b was removed from the storage and
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 is now the only volume for the disk.
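
As a rough, untested sketch only - this is one literal reading of the above, the
UUIDs are taken from your output below, and the engine database must be backed up
first - the changes could look something like this in psql:

-- drop the volume that no longer exists on storage, and the snapshot row it points to
DELETE FROM images    WHERE image_guid  = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';
DELETE FROM snapshots WHERE snapshot_id = 'fd5193ac-dfbc-4ed2-b86c-21caa8009bb2';

-- make the remaining volume the active one for the disk
UPDATE images SET active = 't'
 WHERE image_guid = '8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8';

-- turn the snapshot that volume belongs to into the active snapshot
UPDATE snapshots SET snapshot_type = 'ACTIVE'
 WHERE snapshot_id = '6bc03db7-82a3-4b7e-9674-0bdd76933eb8';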

On Wed, Jul 22, 2020 at 1:32 PM Arsène Gschwind 
wrote:

> Please find the result:
>
> psql -d engine -c "\x on" -c "select * from images where image_group_id = 
> 'd7bd480d-2c51-4141-a386-113abf75219e';"
>
> Expanded display is on.
>
> -[ RECORD 1 ]-+-
>
> image_guid| 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
>
> creation_date | 2020-04-23 14:59:23+02
>
> size  | 161061273600
>
> it_guid   | ----
>
> parentid  | ----
>
> imagestatus   | 1
>
> lastmodified  | 2020-07-06 20:38:36.093+02
>
> vm_snapshot_id| 6bc03db7-82a3-4b7e-9674-0bdd76933eb8
>
> volume_type   | 2
>
> volume_format | 4
>
> image_group_id| d7bd480d-2c51-4141-a386-113abf75219e
>
> _create_date  | 2020-04-23 14:59:20.919344+02
>
> _update_date  | 2020-07-06 20:38:36.093788+02
>
> active| f
>
> volume_classification | 1
>
> qcow_compat   | 2
>
> -[ RECORD 2 ]-+-
>
> image_guid| 6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> creation_date | 2020-07-06 20:38:38+02
>
> size  | 161061273600
>
> it_guid   | ----
>
> parentid  | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
>
> imagestatus   | 1
>
> lastmodified  | 1970-01-01 01:00:00+01
>
> vm_snapshot_id| fd5193ac-dfbc-4ed2-b86c-21caa8009bb2
>
> volume_type   | 2
>
> volume_format | 4
>
> image_group_id| d7bd480d-2c51-4141-a386-113abf75219e
>
> _create_date  | 2020-07-06 20:38:36.093788+02
>
> _update_date  | 2020-07-06 20:38:52.139003+02
>
> active| t
>
> volume_classification | 0
>
> qcow_compat   | 2
>
>
> psql -d engine -c "\x on" -c "SELECT s.* FROM snapshots s, images i where 
> i.vm_snapshot_id = s.snapshot_id and i.image_guid = 
> '6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
>
> Expanded display is on.
>
> -[ RECORD 1 
> ]---+--
>
> snapshot_id | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2
>
> vm_id   | b5534254-660f-44b1-bc83-d616c98ba0ba
>
> snapshot_type   | ACTIVE
>
> status  | OK
>
> description | Active VM
>
> creation_date   | 2020-04-23 14:59:20.171+02
>
> app_list| 
> kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt-guest-agent-common-1.0.14-1.el7
>
> vm_configuration|
>
> _create_date| 2020-04-23 14:59:20.154023+02
>
> _update_date| 2020-07-03 17:33:17.483215+02
>
> memory_metadata_disk_id |
>
> memory_dump_disk_id |
>
> vm_configuration_broken | f
>
>
> Thanks.
>
>
>
> On Tue, 2020-07-21 at 13:45 +0300, Benny Zlotnik wrote:
>
> I forgot to add the `\x on` to make the output readable, can you run it
> with:
> $ psql -U engine -d engine -c "\x on" -c ""
>
> On Mon, Jul 20, 2020 at 2:50 PM Arsène Gschwind 
> wrote:
>
> Hi,
>
> Please find the output:
>
> select * from images where image_group_id = 
> 'd7bd480d-2c51-4141-a386-113abf75219e';
>
>
>   image_guid  | creation_date  | size 
> |   it_guid|   parentid   
> | imagestatus |lastmodified|vm_snapshot_id
> | volume_type | volume_for
>
> mat |image_group_id| _create_date  |  
>_update_date  | active | volume_classification | qcow_compat
>
> --++--+--+--+-++--+-+---
>
> +--+---+---++---+-
>
>  8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 
> | ---- | ---- 
> |   1 | 2020-07-06 20:38:36.093+02 | 
> 6bc03db7-82a3-4b7e-9674-0bdd76933eb8 |   2 |
>
>   4 | 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-22 Thread Arsène Gschwind
Please find the result:

psql -d engine -c "\x on" -c "select * from images where image_group_id = 
'd7bd480d-2c51-4141-a386-113abf75219e';"

Expanded display is on.

-[ RECORD 1 ]-+-

image_guid| 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

creation_date | 2020-04-23 14:59:23+02

size  | 161061273600

it_guid   | ----

parentid  | ----

imagestatus   | 1

lastmodified  | 2020-07-06 20:38:36.093+02

vm_snapshot_id| 6bc03db7-82a3-4b7e-9674-0bdd76933eb8

volume_type   | 2

volume_format | 4

image_group_id| d7bd480d-2c51-4141-a386-113abf75219e

_create_date  | 2020-04-23 14:59:20.919344+02

_update_date  | 2020-07-06 20:38:36.093788+02

active| f

volume_classification | 1

qcow_compat   | 2

-[ RECORD 2 ]-+-

image_guid| 6197b30d-0732-4cc7-aef0-12f9f6e9565b

creation_date | 2020-07-06 20:38:38+02

size  | 161061273600

it_guid   | ----

parentid  | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

imagestatus   | 1

lastmodified  | 1970-01-01 01:00:00+01

vm_snapshot_id| fd5193ac-dfbc-4ed2-b86c-21caa8009bb2

volume_type   | 2

volume_format | 4

image_group_id| d7bd480d-2c51-4141-a386-113abf75219e

_create_date  | 2020-07-06 20:38:36.093788+02

_update_date  | 2020-07-06 20:38:52.139003+02

active| t

volume_classification | 0

qcow_compat   | 2


psql -d engine -c "\x on" -c "SELECT s.* FROM snapshots s, images i where 
i.vm_snapshot_id = s.snapshot_id and i.image_guid = 
'6197b30d-0732-4cc7-aef0-12f9f6e9565b';"

Expanded display is on.

-[ RECORD 1 
]---+--

snapshot_id | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2

vm_id   | b5534254-660f-44b1-bc83-d616c98ba0ba

snapshot_type   | ACTIVE

status  | OK

description | Active VM

creation_date   | 2020-04-23 14:59:20.171+02

app_list| 
kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt-guest-agent-common-1.0.14-1.el7

vm_configuration|

_create_date| 2020-04-23 14:59:20.154023+02

_update_date| 2020-07-03 17:33:17.483215+02

memory_metadata_disk_id |

memory_dump_disk_id |

vm_configuration_broken | f


Thanks.



On Tue, 2020-07-21 at 13:45 +0300, Benny Zlotnik wrote:
I forgot to add the `\x on` to make the output readable, can you run it with:
$ psql -U engine -d engine -c "\x on" -c ""

On Mon, Jul 20, 2020 at 2:50 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
Hi,

Please find the output:

select * from images where image_group_id = 
'd7bd480d-2c51-4141-a386-113abf75219e';


  image_guid  | creation_date  | size | 
  it_guid|   parentid   | 
imagestatus |lastmodified|vm_snapshot_id
| volume_type | volume_for

mat |image_group_id| _create_date  |
 _update_date  | active | volume_classification | qcow_compat

--++--+--+--+-++--+-+---

+--+---+---++---+-

 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 | 
---- | ---- |   
1 | 2020-07-06 20:38:36.093+02 | 6bc03db7-82a3-4b7e-9674-0bdd76933eb8 | 
  2 |

  4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-04-23 14:59:20.919344+02 | 
2020-07-06 20:38:36.093788+02 | f  | 1 |   2

 6197b30d-0732-4cc7-aef0-12f9f6e9565b | 2020-07-06 20:38:38+02 | 161061273600 | 
---- | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 |   
1 | 1970-01-01 01:00:00+01 | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | 
  2 |

  4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-07-06 20:38:36.093788+02 | 
2020-07-06 20:38:52.139003+02 | t  | 0 |   2

(2 rows)



SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id 
and i.image_guid = 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-21 Thread Benny Zlotnik
I forgot to add the `\x on` to make the output readable, can you run it
with:
$ psql -U engine -d engine -c "\x on" -c ""

On Mon, Jul 20, 2020 at 2:50 PM Arsène Gschwind 
wrote:

> Hi,
>
> Please find the output:
>
> select * from images where image_group_id = 
> 'd7bd480d-2c51-4141-a386-113abf75219e';
>
>
>   image_guid  | creation_date  | size 
> |   it_guid|   parentid   
> | imagestatus |lastmodified|vm_snapshot_id
> | volume_type | volume_for
>
> mat |image_group_id| _create_date  |  
>_update_date  | active | volume_classification | qcow_compat
>
> --++--+--+--+-++--+-+---
>
> +--+---+---++---+-
>
>  8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 
> | ---- | ---- 
> |   1 | 2020-07-06 20:38:36.093+02 | 
> 6bc03db7-82a3-4b7e-9674-0bdd76933eb8 |   2 |
>
>   4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-04-23 14:59:20.919344+02 | 
> 2020-07-06 20:38:36.093788+02 | f  | 1 |   2
>
>  6197b30d-0732-4cc7-aef0-12f9f6e9565b | 2020-07-06 20:38:38+02 | 161061273600 
> | ---- | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 
> |   1 | 1970-01-01 01:00:00+01 | 
> fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 |   2 |
>
>   4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-07-06 20:38:36.093788+02 | 
> 2020-07-06 20:38:52.139003+02 | t  | 0 |   2
>
> (2 rows)
>
>
>
> SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id 
> and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';
>
>  snapshot_id  |vm_id 
> | snapshot_type | status | description |   creation_date| 
>   app_list
>
>  | vm_configuration | _create_date
>   | _update_date  | memory_metadata_disk_id | 
> memory_dump_disk_id | vm_configuration_broken
>
> --+--+---++-++--
>
> -+--+---+---+-+-+-
>
>  fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | b5534254-660f-44b1-bc83-d616c98ba0ba 
> | ACTIVE| OK | Active VM   | 2020-04-23 14:59:20.171+02 | 
> kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt
>
> -guest-agent-common-1.0.14-1.el7 |  | 2020-04-23 
> 14:59:20.154023+02 | 2020-07-03 17:33:17.483215+02 | 
> | | f
>
> (1 row)
>
>
> Thanks,
> Arsene
>
> On Sun, 2020-07-19 at 16:34 +0300, Benny Zlotnik wrote:
>
> Sorry, I only replied to the question, in addition to removing the
>
> image from the images table, you may also need to set the parent as
>
> the active image and remove the snapshot referenced by this image from
>
> the database. Can you provide the output of:
>
> $ psql -U engine -d engine -c "select * from images where
>
> image_group_id = ";
>
>
> As well as
>
> $ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i
>
> where i.vm_snapshot_id = s.snapshot_id and i.image_guid =
>
> '6197b30d-0732-4cc7-aef0-12f9f6e9565b';"
>
>
> On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik <bzlot...@redhat.com> wrote:
>
>
> It can be done by deleting from the images table:
>
> $ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =
>
> '6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
>
>
> of course the database should be backed up before doing this
>
>
>
>
> On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer <nsof...@redhat.com> wrote:
>
>
> On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
>
>
> It looks like the Pivot completed successfully, see attached vdsm.log.
>
> Is there a way to recover that VM?
>
> Or would it be better to recover the VM from Backup?
>
>
> This what we see in the log:
>
>
> 1. Merge request recevied
>
>
> 2020-07-13 11:18:30,282+0200 INFO  

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-20 Thread Arsène Gschwind
Hi,

Please find the output:

select * from images where image_group_id = 
'd7bd480d-2c51-4141-a386-113abf75219e';


  image_guid  | creation_date  | size | 
  it_guid|   parentid   | 
imagestatus |lastmodified|vm_snapshot_id
| volume_type | volume_for

mat |image_group_id| _create_date  |
 _update_date  | active | volume_classification | qcow_compat

--++--+--+--+-++--+-+---

+--+---+---++---+-

 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 | 2020-04-23 14:59:23+02 | 161061273600 | 
---- | ---- |   
1 | 2020-07-06 20:38:36.093+02 | 6bc03db7-82a3-4b7e-9674-0bdd76933eb8 | 
  2 |

  4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-04-23 14:59:20.919344+02 | 
2020-07-06 20:38:36.093788+02 | f  | 1 |   2

 6197b30d-0732-4cc7-aef0-12f9f6e9565b | 2020-07-06 20:38:38+02 | 161061273600 | 
---- | 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 |   
1 | 1970-01-01 01:00:00+01 | fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | 
  2 |

  4 | d7bd480d-2c51-4141-a386-113abf75219e | 2020-07-06 20:38:36.093788+02 | 
2020-07-06 20:38:52.139003+02 | t  | 0 |   2

(2 rows)



SELECT s.* FROM snapshots s, images i where i.vm_snapshot_id = s.snapshot_id 
and i.image_guid = '6197b30d-0732-4cc7-aef0-12f9f6e9565b';

 snapshot_id  |vm_id | 
snapshot_type | status | description |   creation_date| 
  app_list

 | vm_configuration | _create_date  
| _update_date  | memory_metadata_disk_id | 
memory_dump_disk_id | vm_configuration_broken

--+--+---++-++--

-+--+---+---+-+-+-

 fd5193ac-dfbc-4ed2-b86c-21caa8009bb2 | b5534254-660f-44b1-bc83-d616c98ba0ba | 
ACTIVE| OK | Active VM   | 2020-04-23 14:59:20.171+02 | 
kernel-3.10.0-957.12.2.el7,xorg-x11-drv-qxl-0.1.5-4.el7.1,kernel-3.10.0-957.12.1.el7,kernel-3.10.0-957.38.1.el7,ovirt

-guest-agent-common-1.0.14-1.el7 |  | 2020-04-23 
14:59:20.154023+02 | 2020-07-03 17:33:17.483215+02 | |  
   | f

(1 row)

Thanks,
Arsene

On Sun, 2020-07-19 at 16:34 +0300, Benny Zlotnik wrote:

Sorry, I only replied to the question, in addition to removing the

image from the images table, you may also need to set the parent as

the active image and remove the snapshot referenced by this image from

the database. Can you provide the output of:

$ psql -U engine -d engine -c "select * from images where

image_group_id = ";


As well as

$ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i

where i.vm_snapshot_id = s.snapshot_id and i.image_guid =

'6197b30d-0732-4cc7-aef0-12f9f6e9565b';"


On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik <bzlot...@redhat.com> wrote:


It can be done by deleting from the images table:

$ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =

'6197b30d-0732-4cc7-aef0-12f9f6e9565b'";


of course the database should be backed up before doing this




On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer <nsof...@redhat.com> wrote:


On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:


It looks like the Pivot completed successfully, see attached vdsm.log.

Is there a way to recover that VM?

Or would it be better to recover the VM from Backup?


This what we see in the log:


1. Merge request recevied


2020-07-13 11:18:30,282+0200 INFO  (jsonrpc/7) [api.virt] START

merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',

u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':

u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':

u'0002-0002-0002-0002-0289'},


[ovirt-users] Re: VM Snapshot inconsistent

2020-07-19 Thread Benny Zlotnik
Sorry, I only replied to the question, in addition to removing the
image from the images table, you may also need to set the parent as
the active image and remove the snapshot referenced by this image from
the database. Can you provide the output of:
$ psql -U engine -d engine -c "select * from images where
image_group_id = ";

As well as
$ psql -U engine -d engine -c "SELECT s.* FROM snapshots s, images i
where i.vm_snapshot_id = s.snapshot_id and i.image_guid =
'6197b30d-0732-4cc7-aef0-12f9f6e9565b';"

On Sun, Jul 19, 2020 at 12:49 PM Benny Zlotnik  wrote:
>
> It can be done by deleting from the images table:
> $ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =
> '6197b30d-0732-4cc7-aef0-12f9f6e9565b'";
>
> of course the database should be backed up before doing this
>
>
>
> On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer  wrote:
> >
> > On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
> >  wrote:
> >
> > > It looks like the Pivot completed successfully, see attached vdsm.log.
> > > Is there a way to recover that VM?
> > > Or would it be better to recover the VM from Backup?
> >
> > This is what we see in the log:
> >
> > 1. Merge request received
> >
> > 2020-07-13 11:18:30,282+0200 INFO  (jsonrpc/7) [api.virt] START
> > merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
> > u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':
> > u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
> > u'0002-0002-0002-0002-0289'},
> > baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
> > topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
> > jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
> > from=:::10.34.38.31,39226,
> > flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
> > vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)
> >
> > To track this job, we can use the jobUUID: 
> > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> > and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
> >
> > 2. Starting the merge
> >
> > 2020-07-13 11:18:30,690+0200 INFO  (jsonrpc/7) [virt.vm]
> > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
> > jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
> > chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]',
> > top=None, bandwidth=0, flags=12 (vm:5945)
> >
> > We see the original chain:
> > 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
> >
> > 3. The merge was completed, ready for pivot
> >
> > 2020-07-13 11:19:00,992+0200 INFO  (libvirt/events) [virt.vm]
> > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> > for drive 
> > /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > is ready (vm:5847)
> >
> > At this point the parent volume contains all the data in the top volume and
> > we can pivot to the parent volume.
> >
> > 4. Vdsm detects that the merge is ready, and starts the cleanup thread
> > that will complete the merge
> >
> > 2020-07-13 11:19:06,166+0200 INFO  (periodic/1) [virt.vm]
> > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
> > for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)
> >
> > 5. Requesting pivot to parent volume:
> >
> > 2020-07-13 11:19:06,717+0200 INFO  (merge/720410c3) [virt.vm]
> > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
> > complete active layer commit (job
> > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)
> >
> > 6. Pivot was successful
> >
> > 2020-07-13 11:19:06,734+0200 INFO  (libvirt/events) [virt.vm]
> > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> > for drive 
> > /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> > has completed (vm:5838)
> >
> > 7. Vdsm waits until libvirt updates the XML:
> >
> > 2020-07-13 11:19:06,756+0200 INFO  (merge/720410c3) [virt.vm]
> > (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
> > 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
> >
> > 8. Synchronizing vdsm metadata
> >
> > 2020-07-13 11:19:06,776+0200 INFO  (merge/720410c3) [vdsm.api] START
> > imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
> > imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
> > volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
> > newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
> > task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)
> >
> > We can see the new chain:
> > ['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']
> >
> > 2020-07-13 11:19:07,005+0200 INFO  (merge/720410c3) [storage.Image]
> > Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)  (image:1221)
> >
> > The old chain:
> > 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> > 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-19 Thread Benny Zlotnik
It can be done by deleting from the images table:
$ psql -U engine -d engine -c "DELETE FROM images WHERE image_guid =
'6197b30d-0732-4cc7-aef0-12f9f6e9565b'";

of course the database should be backed up before doing this
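
For example, a full engine backup (or at least a plain dump of the engine database)
taken beforehand could look roughly like this, assuming a standard engine host:

$ engine-backup --mode=backup --scope=all \
    --file=engine-backup-before-fix.tar.gz --log=engine-backup-before-fix.log

# or, database only:
$ pg_dump -U engine -d engine -f engine-before-fix.sql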



On Fri, Jul 17, 2020 at 6:45 PM Nir Soffer  wrote:
>
> On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
>  wrote:
>
> > It looks like the Pivot completed successfully, see attached vdsm.log.
> > Is there a way to recover that VM?
> > Or would it be better to recover the VM from Backup?
>
> This is what we see in the log:
>
> 1. Merge request received
>
> 2020-07-13 11:18:30,282+0200 INFO  (jsonrpc/7) [api.virt] START
> merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
> u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':
> u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
> u'0002-0002-0002-0002-0289'},
> baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
> topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
> jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
> from=:::10.34.38.31,39226,
> flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
> vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)
>
> To track this job, we can use the jobUUID: 
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> 2. Starting the merge
>
> 2020-07-13 11:18:30,690+0200 INFO  (jsonrpc/7) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
> jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
> chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]',
> top=None, bandwidth=0, flags=12 (vm:5945)
>
> We see the original chain:
> 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
>
> 3. The merge was completed, ready for pivot
>
> 2020-07-13 11:19:00,992+0200 INFO  (libvirt/events) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> for drive 
> /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> is ready (vm:5847)
>
> At this point the parent volume contains all the data in the top volume and
> we can pivot to the parent volume.
>
> 4. Vdsm detects that the merge is ready, and starts the cleanup thread
> that will complete the merge
>
> 2020-07-13 11:19:06,166+0200 INFO  (periodic/1) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
> for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)
>
> 5. Requesting pivot to parent volume:
>
> 2020-07-13 11:19:06,717+0200 INFO  (merge/720410c3) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
> complete active layer commit (job
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)
>
> 6. Pivot was successful
>
> 2020-07-13 11:19:06,734+0200 INFO  (libvirt/events) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
> for drive 
> /rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
> has completed (vm:5838)
>
> 7. Vdsm waits until libvirt updates the XML:
>
> 2020-07-13 11:19:06,756+0200 INFO  (merge/720410c3) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)
>
> 8. Synchronizing vdsm metadata
>
> 2020-07-13 11:19:06,776+0200 INFO  (merge/720410c3) [vdsm.api] START
> imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
> imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
> volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
> newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
> task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)
>
> We can see the new chain:
> ['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']
>
> 2020-07-13 11:19:07,005+0200 INFO  (merge/720410c3) [storage.Image]
> Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)  (image:1221)
>
> The old chain:
> 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)
>
> 2020-07-13 11:19:07,006+0200 INFO  (merge/720410c3) [storage.Image]
> Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b']
> (image:1231)
> 2020-07-13 11:19:07,017+0200 INFO  (merge/720410c3) [storage.Image]
> Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from
> the chain. Marking it ILLEGAL to prevent data corruption (image:1239)
>
> This matches what we see on storage.
>
> 9. Merge job is untracked
>
> 2020-07-13 11:19:21,134+0200 INFO  (periodic/1) [virt.vm]
> (vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread
> 
> successfully completed, untracking job
> 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
> (base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
> top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)
>
> This was a successful 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-17 Thread Nir Soffer
On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
 wrote:

> It looks like the Pivot completed successfully, see attached vdsm.log.
> Is there a way to recover that VM?
> Or would it be better to recover the VM from Backup?

This is what we see in the log:

1. Merge request received

2020-07-13 11:18:30,282+0200 INFO  (jsonrpc/7) [api.virt] START
merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e',
u'volumeID': u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID':
u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID':
u'0002-0002-0002-0002-0289'},
baseVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8',
topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0',
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97')
from=:::10.34.38.31,39226,
flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227,
vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:48)

To track this job, we can use the jobUUID: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97
and the top volume UUID: 6197b30d-0732-4cc7-aef0-12f9f6e9565b

2. Starting the merge

2020-07-13 11:18:30,690+0200 INFO  (jsonrpc/7) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting merge with
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97', original
chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top), disk='sda', base='sda[1]',
top=None, bandwidth=0, flags=12 (vm:5945)

We see the original chain:
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)

3. The merge was completed, ready for pivot

2020-07-13 11:19:00,992+0200 INFO  (libvirt/events) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
for drive 
/rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
is ready (vm:5847)

At this point the parent volume contains all the data in the top volume and we can
pivot to the parent volume.

4. Vdsm detects that the merge is ready, and starts the cleanup thread
that will complete the merge

2020-07-13 11:19:06,166+0200 INFO  (periodic/1) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Starting cleanup thread
for job: 720410c3-f1a0-4b25-bf26-cf40aa6b1f97 (vm:5809)

5. Requesting pivot to parent volume:

2020-07-13 11:19:06,717+0200 INFO  (merge/720410c3) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Requesting pivot to
complete active layer commit (job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6205)

6. Pivot was successful

2020-07-13 11:19:06,734+0200 INFO  (libvirt/events) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Block job ACTIVE_COMMIT
for drive 
/rhev/data-center/mnt/blockSD/33777993-a3a5-4aad-a24c-dfe5e473faca/images/d7bd480d-2c51-4141-a386-113abf75219e/6197b30d-0732-4cc7-aef0-12f9f6e9565b
has completed (vm:5838)

7. Vdsm waits until libvirt updates the XML:

2020-07-13 11:19:06,756+0200 INFO  (merge/720410c3) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Pivot completed (job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97) (vm:6219)

8. Synchronizing vdsm metadata

2020-07-13 11:19:06,776+0200 INFO  (merge/720410c3) [vdsm.api] START
imageSyncVolumeChain(sdUUID='33777993-a3a5-4aad-a24c-dfe5e473faca',
imgUUID='d7bd480d-2c51-4141-a386-113abf75219e',
volUUID='6197b30d-0732-4cc7-aef0-12f9f6e9565b',
newChain=['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']) from=internal,
task_id=b8f605bd-8549-4983-8fc5-f2ebbe6c4666 (api:48)

We can see the new chain:
['8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8']

2020-07-13 11:19:07,005+0200 INFO  (merge/720410c3) [storage.Image]
Current chain=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)  (image:1221)

The old chain:
8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 <
6197b30d-0732-4cc7-aef0-12f9f6e9565b (top)

2020-07-13 11:19:07,006+0200 INFO  (merge/720410c3) [storage.Image]
Unlinking subchain: ['6197b30d-0732-4cc7-aef0-12f9f6e9565b']
(image:1231)
2020-07-13 11:19:07,017+0200 INFO  (merge/720410c3) [storage.Image]
Leaf volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b is being removed from
the chain. Marking it ILLEGAL to prevent data corruption (image:1239)

This matches what we see on storage.

9. Merge job is untracked

2020-07-13 11:19:21,134+0200 INFO  (periodic/1) [virt.vm]
(vmId='b5534254-660f-44b1-bc83-d616c98ba0ba') Cleanup thread

successfully completed, untracking job
720410c3-f1a0-4b25-bf26-cf40aa6b1f97
(base=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8,
top=6197b30d-0732-4cc7-aef0-12f9f6e9565b) (vm:5752)

This was a successful merge on vdsm side.

We don't see any more requests for the top volume in this log. The next step to
complete the merge is to delete the volume 6197b30d-0732-4cc7-aef0-12f9f6e9565b,
but this can be done only on the SPM.

To understand why this did not happen, we need the engine log showing this
interaction, and logs from the SPM host from the same time.
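
A rough way to pull those out, assuming the default log locations, is to grep for
the flow id of this merge and for the image UUID:

# on the engine host
$ grep 4a8b9527-06a3-4be6-9bb9-88630febc227 /var/log/ovirt-engine/engine.log*

# on the SPM host
$ grep d7bd480d-2c51-4141-a386-113abf75219e /var/log/vdsm/vdsm.log*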

Please file a bug about this and attach these logs (and the vdsm log
you sent here).
Fixing this vm is important but preventing this bug for 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-17 Thread Nir Soffer
On Thu, Jul 16, 2020 at 11:33 AM Arsène Gschwind
 wrote:
>
> On Wed, 2020-07-15 at 22:54 +0300, Nir Soffer wrote:
>
> On Wed, Jul 15, 2020 at 7:54 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
>
>
> On Wed, 2020-07-15 at 17:46 +0300, Nir Soffer wrote:
>
>
> What we see in the data you sent:
>
>
>
> Qemu chain:
>
>
>
> $ qemu-img info --backing-chain
>
>
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
>
> image: 
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
>
> file format: qcow2
>
>
> virtual size: 150G (161061273600 bytes)
>
>
> disk size: 0
>
>
> cluster_size: 65536
>
>
> backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path:
>
>
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
>
>
> backing file format: qcow2
>
>
> Format specific information:
>
>
> compat: 1.1
>
>
> lazy refcounts: false
>
>
> refcount bits: 16
>
>
> corrupt: false
>
>
>
> image: 
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
>
>
> file format: qcow2
>
>
> virtual size: 150G (161061273600 bytes)
>
>
> disk size: 0
>
>
> cluster_size: 65536
>
>
> Format specific information:
>
>
> compat: 1.1
>
>
> lazy refcounts: false
>
>
> refcount bits: 16
>
>
> corrupt: false
>
>
>
> Vdsm chain:
>
>
>
> $ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta
>
>
> CAP=161061273600
>
>
> CTIME=1594060718
>
>
> DESCRIPTION=
>
>
> DISKTYPE=DATA
>
>
> DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
>
>
> FORMAT=COW
>
>
> GEN=0
>
>
> IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
>
>
> LEGALITY=ILLEGAL
>
>
>
> ^^
>
>
> This is the issue, the top volume is illegal.
>
>
>
> PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
>
>
> TYPE=SPARSE
>
>
> VOLTYPE=LEAF
>
>
>
> $ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
>
>
> CAP=161061273600
>
>
> CTIME=1587646763
>
>
> DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM
>
>
> H11 HDB D13"}
>
>
> DISKTYPE=DATA
>
>
> DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
>
>
> FORMAT=COW
>
>
> GEN=0
>
>
> IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
>
>
> LEGALITY=LEGAL
>
>
> PUUID=----
>
>
> TYPE=SPARSE
>
>
> VOLTYPE=INTERNAL
>
>
>
> We set volume to ILLEGAL when we merge the top volume into the parent volume,
>
>
> and both volumes contain the same data.
>
>
>
> After we mark the volume as ILLEGAL, we pivot to the parent volume
>
>
> (8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).
>
>
>
> If the pivot was successful, the parent volume may have new data, and starting
>
>
> the vm using the top volume may corrupt the vm filesystem. The ILLEGAL state
>
>
> prevents this.
>
>
>
> If the pivot was not successful, the vm must be started using the top
>
>
> volume, but it
>
>
> will always fail if the volume is ILLEGAL.
>
>
>
> If the volume is ILLEGAL, trying to merge again when the VM is not running 
> will
>
>
> always fail, since vdsm does not know if the pivot succeeded or not, and cannot
> merge
>
>
> the volume in a safe way.
>
>
>
> Do you have the vdsm from all merge attempts on this disk?
>
>
> This is an extract of the vdsm logs, I may provide the complete log if it
> would help.
>
>
> Yes, this is only the start of the merge. We see the success message
>
> but this only means the merge
>
> job was started.
>
>
> Please share the complete log, and if needed the next log. The
>
> important messages we look for are:
>
>
> Requesting pivot to complete active layer commit ...
>
>
> Followed by:
>
>
> Pivot completed ...
>
>
> If pivot failed, we expect to see this message:
>
>
> Pivot failed: ...
>
>
> After these messages we may find very important logs that explain why
>
> your disk was left
>
> in an inconsistent state.
>
> It looks like the Pivot completed successfully, see attached vdsm.log.

That's good, I'm looking in your log.

> Is there a way to recover that VM?

If the pivot was successful, qemu started to use the parent volume instead
of the top volume. In this case you can delete the top volume, and fix the
metadata of the parent volume.

Then you need to remove the top volume from engine db, and fix the metadata
of the parent volume in engine db.

Let me verify first that the pivot was successful, and then I'll add instructions
on how to fix the engine and volume metadata.

> Or would it be better to recover the VM from Backup?

If the backup is recent enough, it will be easier. But fixing the VM will
prevent any data loss since the last backup.

It is not clear from the previous mails (or maybe I missed it) - is the VM
running now or stopped?

If the vm is running, checking the vm xml will show very clearly that it is not
using the top volume. You can do:

virsh -r dumpxml vm-name-or-id
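
To see only the disk sources, something like this should be enough (vm-name-or-id
is whatever engine shows for the VM):

virsh -r dumpxml vm-name-or-id | grep '<source'

On block storage the source paths contain the volume UUIDs, so if the pivot
succeeded, 6197b30d-0732-4cc7-aef0-12f9f6e9565b should no longer appear there.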

> Thanks a lot
> Arsene
>
>
> Since this looks like a bug and may be useful to others, I think it is
>
> time to file a vdsm bug,
>
> and attach the logs to the bug.
>
>
> 2020-07-13 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Nir Soffer
On Wed, Jul 15, 2020 at 7:54 PM Arsène Gschwind
 wrote:
>
> On Wed, 2020-07-15 at 17:46 +0300, Nir Soffer wrote:
>
> What we see in the data you sent:
>
>
> Qemu chain:
>
>
> $ qemu-img info --backing-chain
>
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> image: 
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> file format: qcow2
>
> virtual size: 150G (161061273600 bytes)
>
> disk size: 0
>
> cluster_size: 65536
>
> backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path:
>
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
>
> backing file format: qcow2
>
> Format specific information:
>
> compat: 1.1
>
> lazy refcounts: false
>
> refcount bits: 16
>
> corrupt: false
>
>
> image: 
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
>
> file format: qcow2
>
> virtual size: 150G (161061273600 bytes)
>
> disk size: 0
>
> cluster_size: 65536
>
> Format specific information:
>
> compat: 1.1
>
> lazy refcounts: false
>
> refcount bits: 16
>
> corrupt: false
>
>
> Vdsm chain:
>
>
> $ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta
>
> CAP=161061273600
>
> CTIME=1594060718
>
> DESCRIPTION=
>
> DISKTYPE=DATA
>
> DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
>
> FORMAT=COW
>
> GEN=0
>
> IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
>
> LEGALITY=ILLEGAL
>
>
> ^^
>
> This is the issue, the top volume is illegal.
>
>
> PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
>
> TYPE=SPARSE
>
> VOLTYPE=LEAF
>
>
> $ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
>
> CAP=161061273600
>
> CTIME=1587646763
>
> DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM
>
> H11 HDB D13"}
>
> DISKTYPE=DATA
>
> DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
>
> FORMAT=COW
>
> GEN=0
>
> IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
>
> LEGALITY=LEGAL
>
> PUUID=----
>
> TYPE=SPARSE
>
> VOLTYPE=INTERNAL
>
>
> We set volume to ILLEGAL when we merge the top volume into the parent volume,
>
> and both volumes contain the same data.
>
>
> After we mark the volume as ILLEGAL, we pivot to the parent volume
>
> (8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).
>
>
> If the pivot was successful, the parent volume may have new data, and starting
>
> the vm using the top volume may corrupt the vm filesystem. The ILLEGAL state
>
> prevents this.
>
>
> If the pivot was not successful, the vm must be started using the top
>
> volume, but it
>
> will always fail if the volume is ILLEGAL.
>
>
> If the volume is ILLEGAL, trying to merge again when the VM is not running 
> will
>
> always fail, since vdsm does not know if the pivot succeeded or not, and cannot
> merge
>
> the volume in a safe way.
>
>
> Do you have the vdsm from all merge attempts on this disk?
>
> This is an extract of the vdsm logs, I may provide the complete log if it
> would help.

Yes, this is only the start of the merge. We see the success message
but this only means the merge
job was started.

Please share the complete log, and if needed the next log. The
important messages we look for are:

Requesting pivot to complete active layer commit ...

Followed by:

Pivot completed ...

If pivot failed, we expect to see this message:

Pivot failed: ...

After these messages we may find very important logs that explain why
your disk was left
in an inconsistent state.

Since this looks like a bug and may be useful to others, I think it is
time to file a vdsm bug,
and attach the logs to the bug.

> 2020-07-13 11:18:30,257+0200 INFO  (jsonrpc/5) [api.virt] START 
> merge(drive={u'imageID': u'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 
> u'volumeID': u'6172a270-5f73-464d-bebd-8bf0658c1de0', u'domainID': 
> u'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', u'poolID': 
> u'0002-0002-0002-0002-0289'}, ba
> seVolUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 
> topVolUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', bandwidth=u'0', 
> jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667') 
> from=:::10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, 
> vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
> 8)
> 2020-07-13 11:18:30,271+0200 INFO  (jsonrpc/5) [vdsm.api] START 
> getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 
> spUUID='0002-0002-0002-0002-0289', 
> imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 
> volUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', options=None) from=::fff
> f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, 
> task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:48)
> 2020-07-13 11:18:30,281+0200 INFO  (jsonrpc/6) [api.virt] START 
> merge(drive={u'imageID': u'b8e8b8b6-edd1-4d40-b80b-259268ff4878', 
> u'volumeID': u'28ed1acb-9697-43bd-980b-fe4317a06f24', u'domainID': 
> u'6b82f31b-fa2a-406b-832d-64d9666e1bcc', u'poolID': 
> u'0002-0002-0002-0002-0289'}, ba
> seVolUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 
> 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Arsène Gschwind
On Wed, 2020-07-15 at 17:46 +0300, Nir Soffer wrote:

What we see in the data you sent:


Qemu chain:


$ qemu-img info --backing-chain

/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

image: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

file format: qcow2

virtual size: 150G (161061273600 bytes)

disk size: 0

cluster_size: 65536

backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path:

/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)

backing file format: qcow2

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false


image: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

file format: qcow2

virtual size: 150G (161061273600 bytes)

disk size: 0

cluster_size: 65536

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false


Vdsm chain:


$ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta

CAP=161061273600

CTIME=1594060718

DESCRIPTION=

DISKTYPE=DATA

DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca

FORMAT=COW

GEN=0

IMAGE=d7bd480d-2c51-4141-a386-113abf75219e

LEGALITY=ILLEGAL


^^

This is the issue, the top volume is illegal.


PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

TYPE=SPARSE

VOLTYPE=LEAF


$ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta

CAP=161061273600

CTIME=1587646763

DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM

H11 HDB D13"}

DISKTYPE=DATA

DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca

FORMAT=COW

GEN=0

IMAGE=d7bd480d-2c51-4141-a386-113abf75219e

LEGALITY=LEGAL

PUUID=----

TYPE=SPARSE

VOLTYPE=INTERNAL


We set volume to ILLEGAL when we merge the top volume into the parent volume,

and both volumes contain the same data.


After we mark the volume as ILLEGAL, we pivot to the parent volume

(8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).


If the pivot was successful, the parent volume may have new data, and starting

the vm using the top volume may corrupt the vm filesystem. The ILLEGAL state

prevents this.


If the pivot was not successful, the vm must be started using the top

volume, but it

will always fail if the volume is ILLEGAL.


If the volume is ILLEGAL, trying to merge again when the VM is not running will

always fail, since vdsm does not know if the pivot succeeded or not, and cannot merge

the volume in a safe way.


Do you have the vdsm from all merge attempts on this disk?

This is an extract of the vdsm logs, I may provide the complete log if it would
help.

2020-07-13 11:18:30,257+0200 INFO  (jsonrpc/5) [api.virt] START 
merge(drive={u'imageID': u'6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', u'volumeID': 
u'6172a270-5f73-464d-bebd-8bf0658c1de0', u'domainID': 
u'a6f2625d-0f21-4d81-b98c-f545d5f86f8e', u'poolID': 
u'0002-0002-0002-0002-0289'}, ba
seVolUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', 
topVolUUID=u'6172a270-5f73-464d-bebd-8bf0658c1de0', bandwidth=u'0', 
jobUUID=u'5059c2ce-e2a0-482d-be93-2b79e8536667') from=:::10.34.38.31,39226, 
flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, 
vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
8)
2020-07-13 11:18:30,271+0200 INFO  (jsonrpc/5) [vdsm.api] START 
getVolumeInfo(sdUUID='a6f2625d-0f21-4d81-b98c-f545d5f86f8e', 
spUUID='0002-0002-0002-0002-0289', 
imgUUID='6c1445b3-33ac-4ec4-8e43-483d4a6da4e3', 
volUUID=u'a9d5fe18-f1bd-462e-95f7-42a50e81eb11', options=None) from=::fff
f:10.34.38.31,39226, flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, 
task_id=877c30b3-660c-4bfa-a215-75df8d03657e (api:48)
2020-07-13 11:18:30,281+0200 INFO  (jsonrpc/6) [api.virt] START 
merge(drive={u'imageID': u'b8e8b8b6-edd1-4d40-b80b-259268ff4878', u'volumeID': 
u'28ed1acb-9697-43bd-980b-fe4317a06f24', u'domainID': 
u'6b82f31b-fa2a-406b-832d-64d9666e1bcc', u'poolID': 
u'0002-0002-0002-0002-0289'}, ba
seVolUUID=u'29f99f8d-d8a6-475a-928c-e2ffdba76d80', 
topVolUUID=u'28ed1acb-9697-43bd-980b-fe4317a06f24', bandwidth=u'0', 
jobUUID=u'241dfab0-2ef2-45a6-a22f-c7122e9fc193') from=:::10.34.38.31,39226, 
flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, 
vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
8)
2020-07-13 11:18:30,282+0200 INFO  (jsonrpc/7) [api.virt] START 
merge(drive={u'imageID': u'd7bd480d-2c51-4141-a386-113abf75219e', u'volumeID': 
u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', u'domainID': 
u'33777993-a3a5-4aad-a24c-dfe5e473faca', u'poolID': 
u'0002-0002-0002-0002-0289'}, ba
seVolUUID=u'8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8', 
topVolUUID=u'6197b30d-0732-4cc7-aef0-12f9f6e9565b', bandwidth=u'0', 
jobUUID=u'720410c3-f1a0-4b25-bf26-cf40aa6b1f97') from=:::10.34.38.31,39226, 
flow_id=4a8b9527-06a3-4be6-9bb9-88630febc227, 
vmId=b5534254-660f-44b1-bc83-d616c98ba0ba (api:4
8)
2020-07-13 11:18:30,299+0200 INFO  (jsonrpc/6) [vdsm.api] START 
getVolumeInfo(sdUUID='6b82f31b-fa2a-406b-832d-64d9666e1bcc', 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Nir Soffer
What we see in the data you sent:

Qemu chain:

$ qemu-img info --backing-chain
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
image: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path:
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

image: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
file format: qcow2
virtual size: 150G (161061273600 bytes)
disk size: 0
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false

Vdsm chain:

$ cat 6197b30d-0732-4cc7-aef0-12f9f6e9565b.meta
CAP=161061273600
CTIME=1594060718
DESCRIPTION=
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=ILLEGAL

^^
This is the issue, the top volume is illegal.

PUUID=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8
TYPE=SPARSE
VOLTYPE=LEAF

$ cat 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8.meta
CAP=161061273600
CTIME=1587646763
DESCRIPTION={"DiskAlias":"cpslpd01_Disk1","DiskDescription":"SAP SLCM
H11 HDB D13"}
DISKTYPE=DATA
DOMAIN=33777993-a3a5-4aad-a24c-dfe5e473faca
FORMAT=COW
GEN=0
IMAGE=d7bd480d-2c51-4141-a386-113abf75219e
LEGALITY=LEGAL
PUUID=----
TYPE=SPARSE
VOLTYPE=INTERNAL

We set volume to ILLEGAL when we merge the top volume into the parent volume,
and both volumes contain the same data.

After we mark the volume as ILLEGAL, we pivot to the parent volume
(8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8).

If the pivot was successful, the parent volume may have new data, and starting
the vm using the top volume may corrupt the vm filesystem. The ILLEGAL state
prevents this.

If the pivot was not successful, the vm must be started using the top
volume, but it
will always fail if the volume is ILLEGAL.

If the volume is ILLEGAL, trying to merge again when the VM is not running will
always fail, since vdsm does not know if the pivot succeeded or not, and cannot merge
the volume in a safe way.
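
As a side note, the engine side has its own view of this in the imagestatus column
of the images table - if I recall the enum correctly, 1 is OK and 4 is ILLEGAL - so
a quick check of how engine sees the volumes would be something like:

$ psql -U engine -d engine -c "select image_guid, imagestatus, active from images
  where image_group_id = 'd7bd480d-2c51-4141-a386-113abf75219e';"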

Do you have the vdsm from all merge attempts on this disk?

The most important log is the one showing the original merge. If the merge
succeeded, we should see a log showing the new libvirt chain, which
should contain
only the parent volume.

Nir


[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Arsène Gschwind
On Wed, 2020-07-15 at 16:28 +0300, Nir Soffer wrote:

On Wed, Jul 15, 2020 at 4:00 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:


On Wed, 2020-07-15 at 15:42 +0300, Nir Soffer wrote:


On Wed, Jul 15, 2020 at 3:12 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:



Hi Nir,



I've followed your guide, please find attached the information.


Thanks a lot for your help.



Thanks, looking at the data.



A quick look in the pdf shows that one qemu-img info command failed:



---


lvchange -ay 
33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b



lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert


6197b30d-0732-4cc7-aef0-12f9f6e9565b


33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a- 5.00g



qemu-img info --backing-chain


/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


qemu-img: Could not open


'/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':


It is clear now - qemu could not open the backing file:


lv=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8


You must activate all the volumes in this image. I think my

instructions were not clear enough.


1. Find all lvs related to this image


2. Activate all of them


for lv_name in lv-name-1 lv-name-2 lv-name-3; do

lvchange -ay vg-name/$lv_name

done


3. Run qemu-img info on the LEAF volume


4. Deactivate the lvs activated in step 2.


Oops, sorry.

Now it should be correct.


qemu-img info --backing-chain 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

image: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

file format: qcow2

virtual size: 150G (161061273600 bytes)

disk size: 0

cluster_size: 65536

backing file: 8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8 (actual path: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8)

backing file format: qcow2

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false


image: 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

file format: qcow2

virtual size: 150G (161061273600 bytes)

disk size: 0

cluster_size: 65536

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false



---



Maybe this lv was deactivated by vdsm after you activate it? Please


try to activate it again and


run the command again.



Sending all the info in text format in the mail message would make it


easier to respond.


I did it again with the same result, and the LV was still activated.


lvchange -ay 
33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


  LV   VG   
Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert


  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca 
-wi-a- 5.00g


qemu-img info --backing-chain 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


qemu-img: Could not open 
'/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
 Could not open 
'/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
 No such file or directory


lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


  LV   VG   
Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert


  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca 
-wi-a- 5.00g



Sorry for the PDF, it was easier for me, but I will post everything in the mail 
from now on.




Arsene



On Tue, 2020-07-14 at 23:47 +0300, Nir Soffer wrote:



On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:




On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:




On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:





Hi,





I am running oVirt 4.3.9 with FC-based storage.




I'm running several VMs with 3 disks on 3 different SDs. Lately we deleted a
VM Snapshot and that task failed after a while, and since then the Snapshot is
inconsistent.




disk1 : Snapshot still visible in DB and on Storage using LVM commands




disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
merge did run correctly)




disk3: Snapshot still visible in DB but not on storage anymore (It seems the
merge did run correctly)





When I try to delete the snapshot again it runs forever and 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Nir Soffer
On Wed, Jul 15, 2020 at 4:00 PM Arsène Gschwind
 wrote:
>
> On Wed, 2020-07-15 at 15:42 +0300, Nir Soffer wrote:
>
> On Wed, Jul 15, 2020 at 3:12 PM Arsène Gschwind <arsene.gschw...@unibas.ch> wrote:
>
>
> Hi Nir,
>
>
> I've followed your guide, please find attached the information.
>
> Thanks a lot for your help.
>
>
> Thanks, looking at the data.
>
>
> A quick look in the pdf shows that one qemu-img info command failed:
>
>
> ---
>
> lvchange -ay 
> 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
>
> lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
>
> 6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> 33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a- 5.00g
>
>
> qemu-img info --backing-chain
>
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> qemu-img: Could not open
>
> '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':

It is clear now - qemu could not open the backing file:

lv=8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8

You must activate all the volumes in this image. I think my
instructions were not clear enough.

1. Find all lvs related to this image

2. Activate all of them

for lv_name in lv-name-1 lv-name-2 lv-name-3; do
lvchange -ay vg-name/$lv_name
done

3. Run qemu-img info on the LEAF volume

4. Deactivate the lvs activated in step 2.
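
For reference, a minimal sketch of steps 1-4 as one script. The VG/disk/leaf UUIDs
are the ones quoted in this thread and are placeholders for your own, the LV lookup
uses the lvs-tags trick suggested elsewhere in this thread, and you may need
Gianluca's --config filter override if lvm.conf hides the RHV LVs:

VG=33777993-a3a5-4aad-a24c-dfe5e473faca      # storage domain (VG) uuid
DISK=d7bd480d-2c51-4141-a386-113abf75219e    # disk (image group) uuid
LEAF=6197b30d-0732-4cc7-aef0-12f9f6e9565b    # leaf volume uuid

# 1. Find all LVs related to this image (the disk uuid appears in their tags)
LVS=$(lvs --noheadings -o lv_name,lv_tags "$VG" | grep "$DISK" | awk '{print $1}')

# 2. Activate all of them so the whole backing chain can be opened
for lv_name in $LVS; do
    lvchange -ay "$VG/$lv_name"
done

# 3. Run qemu-img info on the LEAF volume
qemu-img info --backing-chain "/dev/$VG/$LEAF"

# 4. Deactivate the LVs activated in step 2
for lv_name in $LVS; do
    lvchange -an "$VG/$lv_name"
done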

>
> ---
>
>
> Maybe this lv was deactivated by vdsm after you activate it? Please
>
> try to activate it again and
>
> run the command again.
>
>
> Sending all the info in text format in the mail message would make it
>
> easier to respond.
>
> I did it again with the same result, and the LV was still activated.
>
> lvchange -ay 
> 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
>   LV   VG   
> Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>
>   6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca 
> -wi-a- 5.00g
>
> qemu-img info --backing-chain 
> /dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
> qemu-img: Could not open 
> '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
>  Could not open 
> '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
>  No such file or directory
>
> lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
>
>   LV   VG   
> Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>
>   6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca 
> -wi-a- 5.00g
>
>
> Sorry for the PDF, it was easier for me, but I will post everything in the 
> mail from now on.
>
>
>
> Arsene
>
>
> On Tue, 2020-07-14 at 23:47 +0300, Nir Soffer wrote:
>
>
> On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind
>
>
> <
>
>
> arsene.gschw...@unibas.ch
>
>
>
> wrote:
>
>
>
> On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:
>
>
>
> On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind
>
>
>
> <
>
>
>
> arsene.gschw...@unibas.ch
>
>
>
>
>
> wrote:
>
>
>
>
> Hi,
>
>
>
>
> I running oVirt 4.3.9 with FC based storage.
>
>
>
> I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
> VM Snapshot and that task failed after a while and since then the Snapshot is 
> inconsistent.
>
>
>
> disk1 : Snapshot still visible in DB and on Storage using LVM commands
>
>
>
> disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
> merge did run correctly)
>
>
>
> disk3: Snapshot still visible in DB but not on storage ansmore (It seems the 
> merge did run correctly)
>
>
>
>
> When I try to delete the snapshot again it runs forever and nothing happens.
>
>
>
>
> Did you try also when the vm is not running?
>
>
>
> Yes I've tried that without success
>
>
>
>
> In general the system is designed so trying again a failed merge will complete
>
>
>
> the merge.
>
>
>
>
> If the merge does complete, there may be some bug that the system cannot
>
>
>
> handle.
>
>
>
>
> Is there a way to suppress that snapshot?
>
>
>
> Is it possible to merge disk1 with its snapshot using LVM commands and then 
> cleanup the Engine DB?
>
>
>
>
> Yes but it is complicated. You need to understand the qcow2 chain
>
>
>
> on storage, complete the merge manually using qemu-img commit,
>
>
>
> update the metadata manually (even harder), then update engine db.
>
>
>
>
> The best way - if the system cannot recover, is to fix the bad metadata
>
>
>
> that cause the system to fail, and the let the system recover itself.
>
>
>
>
> Which storage domain format are you using? V5? V4?
>
>
>
> I'm using storage format V5 on FC.
>
>
>
> Fixing the 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Arsène Gschwind
On Wed, 2020-07-15 at 15:42 +0300, Nir Soffer wrote:

On Wed, Jul 15, 2020 at 3:12 PM Arsène Gschwind
<arsene.gschw...@unibas.ch> wrote:


Hi Nir,


I've followed your guide, please find attached the information.

Thanks a lot for your help.


Thanks, looking at the data.


A quick look in the PDF shows that one qemu-img info command failed:


---

lvchange -ay 
33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b


lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert

6197b30d-0732-4cc7-aef0-12f9f6e9565b

33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a- 5.00g


qemu-img info --backing-chain

/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

qemu-img: Could not open

'/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':

Could not open '/dev/33777993-

a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8': No

such file or directory

---


Maybe this lv was deactivated by vdsm after you activated it? Please
try to activate it again and run the command again.


Sending all the info in text format in the mail message would make it

easier to respond.

I did it again with the same result, and the LV was still activated.

lvchange -ay 
33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

  LV   VG   
Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert

  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca 
-wi-a- 5.00g

qemu-img info --backing-chain 
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

qemu-img: Could not open 
'/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
 Could not open 
'/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
 No such file or directory

lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

  LV   VG   
Attr   LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert

  6197b30d-0732-4cc7-aef0-12f9f6e9565b 33777993-a3a5-4aad-a24c-dfe5e473faca 
-wi-a- 5.00g


Sorry for the PDF, it was easier for me, but I will post everything in the mail 
from now on.



Arsene


On Tue, 2020-07-14 at 23:47 +0300, Nir Soffer wrote:


On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind


<




arsene.gschw...@unibas.ch



wrote:



On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:



On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind



<





arsene.gschw...@unibas.ch





wrote:




Hi,




I running oVirt 4.3.9 with FC based storage.



I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
VM Snapshot and that task failed after a while and since then the Snapshot is 
inconsistent.



disk1 : Snapshot still visible in DB and on Storage using LVM commands



disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
merge did run correctly)



disk3: Snapshot still visible in DB but not on storage ansmore (It seems the 
merge did run correctly)




When I try to delete the snapshot again it runs forever and nothing happens.




Did you try also when the vm is not running?



Yes I've tried that without success




In general the system is designed so trying again a failed merge will complete



the merge.




If the merge does complete, there may be some bug that the system cannot



handle.




Is there a way to suppress that snapshot?



Is it possible to merge disk1 with its snapshot using LVM commands and then 
cleanup the Engine DB?




Yes but it is complicated. You need to understand the qcow2 chain



on storage, complete the merge manually using qemu-img commit,



update the metadata manually (even harder), then update engine db.




The best way - if the system cannot recover, is to fix the bad metadata



that cause the system to fail, and the let the system recover itself.




Which storage domain format are you using? V5? V4?



I'm using storage format V5 on FC.



Fixing the metadata is not easy.



First you have to find the volumes related to this disk. You can find


the disk uuid and storage


domain uuid in engine ui, and then you can find the volumes like this:



lvs -o vg_name,lv_name,tags | grep disk-uuid



For every lv, you will have a tag MD_N where n is a number. This is


the slot number


in the metadata volume.



You need to calculate the offset of the metadata area for every volume using:



offset = 1024*1024 + 8192 * N



Then you can copy the metadata block using:



dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset



[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Nir Soffer
On Wed, Jul 15, 2020 at 3:12 PM Arsène Gschwind
 wrote:
>
> Hi Nir,
>
> I've followed your guide, please find attached the informations.
> Thanks a lot for your help.

Thanks, looking at the data.

A quick look in the PDF shows that one qemu-img info command failed:

---
lvchange -ay 
33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b

lvs 33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
6197b30d-0732-4cc7-aef0-12f9f6e9565b
33777993-a3a5-4aad-a24c-dfe5e473faca -wi-a- 5.00g

qemu-img info --backing-chain
/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/6197b30d-0732-4cc7-aef0-12f9f6e9565b
qemu-img: Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8':
Could not open '/dev/33777993-a3a5-4aad-a24c-dfe5e473faca/8e412b5a-85ec-4c53-a5b8-dfb4d6d987b8': No such file or directory
---

Maybe this lv was deactivated by vdsm after you activated it? Please
try to activate it again and run the command again.

Sending all the info in text format in the mail message would make it
easier to respond.

>
> Arsene
>
> On Tue, 2020-07-14 at 23:47 +0300, Nir Soffer wrote:
>
> On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind
>
> <
>
> arsene.gschw...@unibas.ch
>
> > wrote:
>
>
> On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:
>
>
> On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind
>
>
> <
>
>
> arsene.gschw...@unibas.ch
>
>
>
> wrote:
>
>
>
> Hi,
>
>
>
> I running oVirt 4.3.9 with FC based storage.
>
>
> I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
> VM Snapshot and that task failed after a while and since then the Snapshot is 
> inconsistent.
>
>
> disk1 : Snapshot still visible in DB and on Storage using LVM commands
>
>
> disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
> merge did run correctly)
>
>
> disk3: Snapshot still visible in DB but not on storage ansmore (It seems the 
> merge did run correctly)
>
>
>
> When I try to delete the snapshot again it runs forever and nothing happens.
>
>
>
> Did you try also when the vm is not running?
>
>
> Yes I've tried that without success
>
>
>
> In general the system is designed so trying again a failed merge will complete
>
>
> the merge.
>
>
>
> If the merge does complete, there may be some bug that the system cannot
>
>
> handle.
>
>
>
> Is there a way to suppress that snapshot?
>
>
> Is it possible to merge disk1 with its snapshot using LVM commands and then 
> cleanup the Engine DB?
>
>
>
> Yes but it is complicated. You need to understand the qcow2 chain
>
>
> on storage, complete the merge manually using qemu-img commit,
>
>
> update the metadata manually (even harder), then update engine db.
>
>
>
> The best way - if the system cannot recover, is to fix the bad metadata
>
>
> that cause the system to fail, and the let the system recover itself.
>
>
>
> Which storage domain format are you using? V5? V4?
>
>
> I'm using storage format V5 on FC.
>
>
> Fixing the metadata is not easy.
>
>
> First you have to find the volumes related to this disk. You can find
>
> the disk uuid and storage
>
> domain uuid in engine ui, and then you can find the volumes like this:
>
>
> lvs -o vg_name,lv_name,tags | grep disk-uuid
>
>
> For every lv, you will have a tag MD_N where n is a number. This is
>
> the slot number
>
> in the metadata volume.
>
>
> You need to calculate the offset of the metadata area for every volume using:
>
>
> offset = 1024*1024 + 8192 * N
>
>
> Then you can copy the metadata block using:
>
>
> dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset
>
> conv=skip_bytes > lv-name.meta
>
>
> Please share these files.
>
>
> This part is not needed in 4.4, we have a new StorageDomain dump API,
>
> that can find the same
>
> info in one command:
>
>
> vdsm-client StorageDomain dump sd_id=storage-domain-uuid | \
>
> jq '.volumes | .[] | select(.image=="disk-uuid")'
>
>
> The second step is to see what is the actual qcow2 chain. Find the
>
> volume which is the LEAF
>
> by grepping the metadata files. In some cases you may have more than
>
> one LEAF (which may
>
> be the problem).
>
>
> Then activate all volumes using:
>
>
> lvchange -ay vg-name/lv-name
>
>
> Now you can get the backing chain using qemu-img and the LEAF volume.
>
>
> qemu-img info --backing-chain /dev/vg-name/lv-name
>
>
> If you have more than one LEAF, run this on all LEAFs. Ony one of them
>
> will be correct.
>
>
> Please share also output of qemu-img.
>
>
> Once we finished with the volumes, deactivate them:
>
>
> lvchange -an vg-name/lv-name
>
>
> Based on the output, we can tell what is the real chain, and what is
>
> the chain as seen by
>
> vdsm metadata, and what is the required fix.
>
>
> Nir
>
>
>
> Thanks.
>
>
>
> Thanks for any hint or help.
>
>
> rgds , arsene
>
>
>
> --
>
>
>
> Arsène Gschwind <
>
>
> arsene.gschw...@unibas.ch
>
>
>
>
>
> 

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-15 Thread Arsène Gschwind
Hi Nir,

I've followed your guide, please find attached the information.
Thanks a lot for your help.

Arsene

On Tue, 2020-07-14 at 23:47 +0300, Nir Soffer wrote:

On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind

<



arsene.gschw...@unibas.ch

> wrote:


On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:


On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind


<




arsene.gschw...@unibas.ch



wrote:



Hi,



I running oVirt 4.3.9 with FC based storage.


I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
VM Snapshot and that task failed after a while and since then the Snapshot is 
inconsistent.


disk1 : Snapshot still visible in DB and on Storage using LVM commands


disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
merge did run correctly)


disk3: Snapshot still visible in DB but not on storage ansmore (It seems the 
merge did run correctly)



When I try to delete the snapshot again it runs forever and nothing happens.



Did you try also when the vm is not running?


Yes I've tried that without success



In general the system is designed so trying again a failed merge will complete


the merge.



If the merge does complete, there may be some bug that the system cannot


handle.



Is there a way to suppress that snapshot?


Is it possible to merge disk1 with its snapshot using LVM commands and then 
cleanup the Engine DB?



Yes but it is complicated. You need to understand the qcow2 chain


on storage, complete the merge manually using qemu-img commit,


update the metadata manually (even harder), then update engine db.



The best way - if the system cannot recover, is to fix the bad metadata


that cause the system to fail, and the let the system recover itself.



Which storage domain format are you using? V5? V4?


I'm using storage format V5 on FC.


Fixing the metadata is not easy.


First you have to find the volumes related to this disk. You can find

the disk uuid and storage

domain uuid in engine ui, and then you can find the volumes like this:


lvs -o vg_name,lv_name,tags | grep disk-uuid


For every lv, you will have a tag MD_N where n is a number. This is

the slot number

in the metadata volume.


You need to calculate the offset of the metadata area for every volume using:


offset = 1024*1024 + 8192 * N


Then you can copy the metadata block using:


dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset

conv=skip_bytes > lv-name.meta


Please share these files.


This part is not needed in 4.4, we have a new StorageDomain dump API,

that can find the same

info in one command:


vdsm-client StorageDomain dump sd_id=storage-domain-uuid | \

jq '.volumes | .[] | select(.image=="disk-uuid")'


The second step is to see what is the actual qcow2 chain. Find the

volume which is the LEAF

by grepping the metadata files. In some cases you may have more than

one LEAF (which may

be the problem).


Then activate all volumes using:


lvchange -ay vg-name/lv-name


Now you can get the backing chain using qemu-img and the LEAF volume.


qemu-img info --backing-chain /dev/vg-name/lv-name


If you have more than one LEAF, run this on all LEAFs. Ony one of them

will be correct.


Please share also output of qemu-img.


Once we finished with the volumes, deactivate them:


lvchange -an vg-name/lv-name


Based on the output, we can tell what is the real chain, and what is

the chain as seen by

vdsm metadata, and what is the required fix.


Nir



Thanks.



Thanks for any hint or help.


rgds , arsene



--



Arsène Gschwind <




arsene.gschw...@unibas.ch





Universitaet Basel


___

Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WZ6KO2LVD3ZA2JNNIHJRCXG65HO4LMZ/





--

Arsène Gschwind
Fa. Sapify AG im Auftrag der universitaet Basel
IT Services
Klinelbergstr. 70 | CH-4056 Basel | Switzerland
Tel: +41 79 449 25 63 | http://its.unibas.ch
ITS-ServiceDesk: support-...@unibas.ch | +41 61 267 14 11


--

Arsène Gschwind <arsene.gschw...@unibas.ch>
Universitaet Basel


cpslpd01.pdf
Description: cpslpd01.pdf
CAP=354334801920
CTIME=158765
DESCRIPTION={"DiskAlias":"cpslpd01_HANADB_Disk1","DiskDescription":"SAP SLCM H11 HDB D13 data"}

[ovirt-users] Re: VM Snapshot inconsistent

2020-07-14 Thread Gianluca Cecchi
On Tue, Jul 14, 2020 at 10:50 PM Nir Soffer  wrote:

>
>
> Fixing the metadata is not easy.
>
> First you have to find the volumes related to this disk. You can find
> the disk uuid and storage
> domain uuid in engine ui, and then you can find the volumes like this:
>
> lvs -o vg_name,lv_name,tags | grep disk-uuid
>

Only to add that possibly the RHV logical volumes are filtered out at
lvm.conf level, so in this case it could be necessary to bypass the filter
to see the information, like this:

lvs --config 'devices { filter = [ "a|.*|" ] }' -o vg_name,lv_name,tags |
grep disk-uuid

Gianluca
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/QQBFG4AJPG5TUQUVY2NQF37YAK7PPEBI/


[ovirt-users] Re: VM Snapshot inconsistent

2020-07-14 Thread Nir Soffer
On Tue, Jul 14, 2020 at 7:51 PM Arsène Gschwind
 wrote:
>
> On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:
>
> On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind
>
> <
>
> arsene.gschw...@unibas.ch
>
> > wrote:
>
>
> Hi,
>
>
> I'm running oVirt 4.3.9 with FC based storage.
>
> I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
> VM Snapshot and that task failed after a while and since then the Snapshot is 
> inconsistent.
>
> disk1 : Snapshot still visible in DB and on Storage using LVM commands
>
> disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
> merge did run correctly)
>
> disk3: Snapshot still visible in DB but not on storage anymore (It seems the 
> merge did run correctly)
>
>
> When I try to delete the snapshot again it runs forever and nothing happens.
>
>
> Did you try also when the vm is not running?
>
> Yes I've tried that without success
>
>
> In general the system is designed so trying again a failed merge will complete
>
> the merge.
>
>
> If the merge does complete, there may be some bug that the system cannot
>
> handle.
>
>
> Is there a way to suppress that snapshot?
>
> Is it possible to merge disk1 with its snapshot using LVM commands and then 
> cleanup the Engine DB?
>
>
> Yes but it is complicated. You need to understand the qcow2 chain
>
> on storage, complete the merge manually using qemu-img commit,
>
> update the metadata manually (even harder), then update engine db.
>
>
> The best way - if the system cannot recover, is to fix the bad metadata
>
> that cause the system to fail, and the let the system recover itself.
>
>
> Which storage domain format are you using? V5? V4?
>
> I'm using storage format V5 on FC.

Fixing the metadata is not easy.

First you have to find the volumes related to this disk. You can find
the disk uuid and storage
domain uuid in engine ui, and then you can find the volumes like this:

lvs -o vg_name,lv_name,tags | grep disk-uuid

For every lv, you will have a tag MD_N where n is a number. This is
the slot number
in the metadata volume.

You need to calculate the offset of the metadata area for every volume using:

offset = 1024*1024 + 8192 * N

Then you can copy the metadata block using:

dd if=/dev/vg-name/metadata bs=512 count=1 skip=$offset iflag=skip_bytes > lv-name.meta

Please share these files.
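
To make that concrete, a small sketch of the extraction loop, reusing the UUIDs from
this thread as placeholders and the MD_N tag / offset formula above (the *.meta file
names are only illustrative):

VG=33777993-a3a5-4aad-a24c-dfe5e473faca      # storage domain (VG) uuid
DISK=d7bd480d-2c51-4141-a386-113abf75219e    # disk (image group) uuid

# Dump the metadata block of every LV that belongs to this disk
lvs --noheadings -o lv_name,lv_tags "$VG" | grep "$DISK" | while read lv tags; do
    slot=$(echo "$tags" | tr ',' '\n' | sed -n 's/^MD_//p')   # N from the MD_N tag
    offset=$((1024*1024 + 8192 * slot))                       # e.g. MD_5 -> 1048576 + 40960 = 1089536
    dd if="/dev/$VG/metadata" bs=512 count=1 skip=$offset iflag=skip_bytes > "$lv.meta"
done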

This part is not needed in 4.4, we have a new StorageDomain dump API,
that can find the same
info in one command:

vdsm-client StorageDomain dump sd_id=storage-domain-uuid | \
jq '.volumes | .[] | select(.image=="disk-uuid")'
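
For example, with the UUIDs from this thread filled in; the voltype test is an
assumption that the dump exposes the volume metadata's VOLTYPE field in lower case,
so drop it if your build names the field differently:

vdsm-client StorageDomain dump sd_id=33777993-a3a5-4aad-a24c-dfe5e473faca | \
    jq '.volumes | .[] | select(.image=="d7bd480d-2c51-4141-a386-113abf75219e" and .voltype=="LEAF")'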

The second step is to see what is the actual qcow2 chain. Find the
volume which is the LEAF
by grepping the metadata files. In some cases you may have more than
one LEAF (which may
be the problem).
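
A quick way to do that grep, assuming the *.meta files were produced with the dd
command above (block volume metadata stores the volume type as a VOLTYPE=... key):

# List the dumped metadata blocks that describe a LEAF volume
grep -l "VOLTYPE=LEAF" *.meta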

Then activate all volumes using:

lvchange -ay vg-name/lv-name

Now you can get the backing chain using qemu-img and the LEAF volume.

qemu-img info --backing-chain /dev/vg-name/lv-name

If you have more than one LEAF, run this on all LEAFs. Only one of them
will be correct.

Please share also output of qemu-img.

Once we finished with the volumes, deactivate them:

lvchange -an vg-name/lv-name

Based on the output, we can tell what is the real chain, and what is
the chain as seen by
vdsm metadata, and what is the required fix.

Nir

>
> Thanks.
>
>
> Thanks for any hint or help.
>
> rgds , arsene
>
>
> --
>
>
> Arsène Gschwind <
>
> arsene.gschw...@unibas.ch
>
> >
>
> Universitaet Basel
>
> ___
>
> Users mailing list --
>
> users@ovirt.org
>
>
> To unsubscribe send an email to
>
> users-le...@ovirt.org
>
>
> Privacy Statement:
>
> https://www.ovirt.org/privacy-policy.html
>
>
> oVirt Code of Conduct:
>
> https://www.ovirt.org/community/about/community-guidelines/
>
>
> List Archives:
>
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WZ6KO2LVD3ZA2JNNIHJRCXG65HO4LMZ/
>
>
>
> --
>
> Arsène Gschwind
> Fa. Sapify AG im Auftrag der universitaet Basel
> IT Services
> Klinelbergstr. 70 | CH-4056 Basel | Switzerland
> Tel: +41 79 449 25 63 | http://its.unibas.ch
> ITS-ServiceDesk: support-...@unibas.ch | +41 61 267 14 11
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5DZ2TEEFGFGTACLMPVZXSQO7AXJGIB37/


[ovirt-users] Re: VM Snapshot inconsistent

2020-07-14 Thread Arsène Gschwind
On Tue, 2020-07-14 at 16:50 +, Arsène Gschwind wrote:
On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:

On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind

<



arsene.gschw...@unibas.ch

> wrote:


Hi,


I running oVirt 4.3.9 with FC based storage.

I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
VM Snapshot and that task failed after a while and since then the Snapshot is 
inconsistent.

disk1 : Snapshot still visible in DB and on Storage using LVM commands

disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
merge did run correctly)

disk3: Snapshot still visible in DB but not on storage ansmore (It seems the 
merge did run correctly)


When I try to delete the snapshot again it runs forever and nothing happens.


Did you try also when the vm is not running?

Yes I've tried that without success


In general the system is designed so trying again a failed merge will complete

the merge.


If the merge does complete, there may be some bug that the system cannot

handle.


Is there a way to suppress that snapshot?

Is it possible to merge disk1 with its snapshot using LVM commands and then 
cleanup the Engine DB?


Yes but it is complicated. You need to understand the qcow2 chain

on storage, complete the merge manually using qemu-img commit,

update the metadata manually (even harder), then update engine db.


The best way - if the system cannot recover, is to fix the bad metadata

that cause the system to fail, and the let the system recover itself.

Do you have some hint how to fix the metadata?

Thanks a lot.


Which storage domain format are you using? V5? V4?

I'm using storage format V5 on FC.

Thanks.


Thanks for any hint or help.

rgds , arsene


--


Arsène Gschwind <



arsene.gschw...@unibas.ch

>

Universitaet Basel

___

Users mailing list --



users@ovirt.org


To unsubscribe send an email to



users-le...@ovirt.org


Privacy Statement:



https://www.ovirt.org/privacy-policy.html


oVirt Code of Conduct:



https://www.ovirt.org/community/about/community-guidelines/


List Archives:



https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WZ6KO2LVD3ZA2JNNIHJRCXG65HO4LMZ/



--

Arsène Gschwind
Fa. Sapify AG im Auftrag der universitaet Basel
IT Services
Klinelbergstr. 70 | CH-4056 Basel | Switzerland
Tel: +41 79 449 25 63 | http://its.unibas.ch
ITS-ServiceDesk: support-...@unibas.ch | +41 61 
267 14 11

___

Users mailing list --



users@ovirt.org


To unsubscribe send an email to



users-le...@ovirt.org


Privacy Statement:



https://www.ovirt.org/privacy-policy.html


oVirt Code of Conduct:



https://www.ovirt.org/community/about/community-guidelines/


List Archives:



https://lists.ovirt.org/archives/list/users@ovirt.org/message/Y4A2PG6PNTSW2DR72QREH3IW6DITCU4U/


--

Arsène Gschwind
Fa. Sapify AG im Auftrag der universitaet Basel
IT Services
Klinelbergstr. 70 | CH-4056 Basel | Switzerland
Tel: +41 79 449 25 63 | http://its.unibas.ch
ITS-ServiceDesk: support-...@unibas.ch | +41 61 
267 14 11
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WXK7YFY7NBER6HWZ5G2MCGMFX3VD5KH/


[ovirt-users] Re: VM Snapshot inconsistent

2020-07-14 Thread Arsène Gschwind
On Tue, 2020-07-14 at 19:10 +0300, Nir Soffer wrote:

On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind

<



arsene.gschw...@unibas.ch

> wrote:


Hi,


I running oVirt 4.3.9 with FC based storage.

I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
VM Snapshot and that task failed after a while and since then the Snapshot is 
inconsistent.

disk1 : Snapshot still visible in DB and on Storage using LVM commands

disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
merge did run correctly)

disk3: Snapshot still visible in DB but not on storage ansmore (It seems the 
merge did run correctly)


When I try to delete the snapshot again it runs forever and nothing happens.


Did you try also when the vm is not running?

Yes I've tried that without success


In general the system is designed so trying again a failed merge will complete

the merge.


If the merge does complete, there may be some bug that the system cannot

handle.


Is there a way to suppress that snapshot?

Is it possible to merge disk1 with its snapshot using LVM commands and then 
cleanup the Engine DB?


Yes but it is complicated. You need to understand the qcow2 chain

on storage, complete the merge manually using qemu-img commit,

update the metadata manually (even harder), then update engine db.


The best way - if the system cannot recover, is to fix the bad metadata

that cause the system to fail, and the let the system recover itself.


Which storage domain format are you using? V5? V4?

I'm using storage format V5 on FC.

Thanks.


Thanks for any hint or help.

rgds , arsene


--


Arsène Gschwind <



arsene.gschw...@unibas.ch

>

Universitaet Basel

___

Users mailing list --



users@ovirt.org


To unsubscribe send an email to



users-le...@ovirt.org


Privacy Statement:



https://www.ovirt.org/privacy-policy.html


oVirt Code of Conduct:



https://www.ovirt.org/community/about/community-guidelines/


List Archives:



https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WZ6KO2LVD3ZA2JNNIHJRCXG65HO4LMZ/



--

Arsène Gschwind
Fa. Sapify AG im Auftrag der universitaet Basel
IT Services
Klinelbergstr. 70 | CH-4056 Basel | Switzerland
Tel: +41 79 449 25 63 | http://its.unibas.ch
ITS-ServiceDesk: support-...@unibas.ch | +41 61 
267 14 11
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/Y4A2PG6PNTSW2DR72QREH3IW6DITCU4U/


[ovirt-users] Re: VM Snapshot inconsistent

2020-07-14 Thread Nir Soffer
On Tue, Jul 14, 2020 at 5:37 PM Arsène Gschwind
 wrote:
>
> Hi,
>
> I'm running oVirt 4.3.9 with FC based storage.
> I'm running several VM with 3 disks on 3 different SD. Lately we did delete a 
> VM Snapshot and that task failed after a while and since then the Snapshot is 
> inconsistent.
> disk1 : Snapshot still visible in DB and on Storage using LVM commands
> disk2: Snapshot still visible in DB but not on storage anymore (It seems the 
> merge did run correctly)
> disk3: Snapshot still visible in DB but not on storage anymore (It seems the 
> merge did run correctly)
>
> When I try to delete the snapshot again it runs forever and nothing happens.

Did you try also when the vm is not running?

In general the system is designed so that retrying a failed merge will complete
the merge.

If the merge does complete, there may be some bug that the system cannot
handle.

> Is there a way to suppress that snapshot?
> Is it possible to merge disk1 with its snapshot using LVM commands and then 
> cleanup the Engine DB?

Yes but it is complicated. You need to understand the qcow2 chain
on storage, complete the merge manually using qemu-img commit,
update the metadata manually (even harder), then update engine db.
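
For the record, the qemu-img part of such a manual merge would look roughly like
this. It is only a sketch with placeholder LV names, it assumes the leaf's recorded
backing path resolves once both LVs are active (as the /dev/vg/... paths in this
thread do), and the vdsm metadata and engine db updates that must follow are the
genuinely hard part:

# Activate both volumes of the chain
lvchange -ay vg-name/base-lv-name
lvchange -ay vg-name/leaf-lv-name

# Fold the leaf (snapshot delta) back into its qcow2 backing volume
qemu-img commit /dev/vg-name/leaf-lv-name

# Sanity-check the merged base before touching vdsm metadata or the engine db
qemu-img check /dev/vg-name/base-lv-name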

The best way - if the system cannot recover - is to fix the bad metadata
that causes the system to fail, and then let the system recover itself.

Which storage domain format are you using? V5? V4?

> Thanks for any hint or help.
> rgds , arsene
>
> --
>
> Arsène Gschwind 
> Universitaet Basel
> ___
> Users mailing list -- users@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/privacy-policy.html
> oVirt Code of Conduct: 
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives: 
> https://lists.ovirt.org/archives/list/users@ovirt.org/message/5WZ6KO2LVD3ZA2JNNIHJRCXG65HO4LMZ/
___
Users mailing list -- users@ovirt.org
To unsubscribe send an email to users-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/privacy-policy.html
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/users@ovirt.org/message/ZC5534S2DTTYAJT5RQQWCSPOCEAT532Y/