On Fri, Mar 18, 2016 at 7:55 PM, Nathanaël Blanchet <blanc...@abes.fr> wrote:
> Hello,
>
> I can create a snapshot when none exists, but I'm not able to remove it
> afterwards.
Are you trying to remove it while the VM is running?

> It concerns many of my VMs, and once they are stopped they can't boot
> anymore because of the illegal status of the disks, which leaves me in a
> critical situation:
>
> VM fedora23 is down with error. Exit message: Unable to get volume size
> for domain 5ef8572c-0ab5-4491-994a-e4c30230a525 volume
> e5969faa-97ea-41df-809b-cc62161ab1bc
>
> Since I didn't initiate any live merge, am I affected by this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1306741?
> I'm running 3.6.2; will upgrading to 3.6.3 solve this issue?

If you tried to remove a snapshot while the VM was running, then you did
initiate a live merge, and this bug may affect you. Adding Greg to provide
more information about this.

> 2016-03-18 18:26:57,652 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand]
> (org.ovirt.thread.pool-8-thread-39) [a1e222d] Ending command
> 'org.ovirt.engine.core.bll.RemoveSnapshotCommand' with failure.
> 2016-03-18 18:26:57,663 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotCommand]
> (org.ovirt.thread.pool-8-thread-39) [a1e222d] Could not delete image
> '46e9ecc8-e168-4f4d-926c-e769f5df1f2c' from snapshot
> '88fcf167-4302-405e-825f-ad7e0e9f6564'
> 2016-03-18 18:26:57,678 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector]
> (org.ovirt.thread.pool-8-thread-39) [a1e222d] Correlation ID: a1e222d, Job
> ID: 00d3e364-7e47-4022-82ff-f772cd79d4a1, Call Stack: null, Custom Event ID:
> -1, Message: Due to partial snapshot removal, Snapshot 'test' of VM
> 'fedora23' now contains only the following disks: 'fedora23_Disk1'.
> 2016-03-18 18:26:57,695 ERROR [org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand]
> (org.ovirt.thread.pool-8-thread-39) [724e99fd] Ending command
> 'org.ovirt.engine.core.bll.RemoveSnapshotSingleDiskCommand' with failure.
> 2016-03-18 18:26:57,708 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandlin
>
> Thank you for your help.
>
> On 23/02/2016 19:51, Greg Padgett wrote:
>> On 02/22/2016 07:10 AM, Marcelo Leandro wrote:
>>> Hello,
>>>
>>> Will the snapshot bug be fixed in oVirt 3.6.3?
>>>
>>> Thanks.
>>
>> Hi Marcelo,
>>
>> Yes, the bug below (bug 1301709) is now targeted to 3.6.3.
>>
>> Thanks,
>> Greg
>>
>>> 2016-02-18 11:34 GMT-03:00 Adam Litke <ali...@redhat.com>:
>>>> On 18/02/16 10:37 +0100, Rik Theys wrote:
>>>>> Hi,
>>>>>
>>>>> On 02/17/2016 05:29 PM, Adam Litke wrote:
>>>>>> On 17/02/16 11:14 -0500, Greg Padgett wrote:
>>>>>>> On 02/17/2016 03:42 AM, Rik Theys wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 02/16/2016 10:52 PM, Greg Padgett wrote:
>>>>>>>>> On 02/16/2016 08:50 AM, Rik Theys wrote:
>>>>>>>>>> From the above I conclude that the disk with id that ends with
>>>>>>>>>
>>>>>>>>> Similar to what I wrote to Marcelo above in the thread, I'd recommend
>>>>>>>>> running the "VM disk info gathering tool" attached to [1]. It's the
>>>>>>>>> best way to ensure the merge was completed and determine which image
>>>>>>>>> is the "bad" one that is no longer in use by any volume chains.
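As an aside, for anyone who wants a quick manual cross-check in addition to
that tool: on the host, libvirt and qemu-img can show which volumes a disk
chain actually uses. This is only a rough sketch from memory; the /rhev path
layout and all the <...> values below are placeholders for your own
data-center, storage-domain, image and volume IDs, and "fedora23" stands in
for the affected VM:

    # Which volume paths is the running VM actually using?
    virsh -r dumpxml fedora23 | grep -E "source (file|dev)"

    # Follow the qcow2 backing chain from the top volume downwards.
    # On block storage domains the LVs must be active to be readable.
    qemu-img info --backing-chain \
        /rhev/data-center/<dc-uuid>/<sd-uuid>/images/<disk-uuid>/<top-volume-uuid>

A volume that appears in neither output is a candidate for the orphaned
"bad" image the tool is meant to flag.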
>>>>>>>> I've run the disk info gathering tool, and it outputs (for the
>>>>>>>> affected VM):
>>>>>>>>
>>>>>>>> VM lena
>>>>>>>>   Disk b2390535-744f-4c02-bdc8-5a897226554b (sd:a7ba2db3-517c-408a-8b27-ea45989d6416)
>>>>>>>>     Volumes:
>>>>>>>>       24d78600-22f4-44f7-987b-fbd866736249
>>>>>>>>
>>>>>>>> The ID of the volume is the ID of the snapshot that is marked
>>>>>>>> "illegal". So the "bad" image would be the dc39 one, which according
>>>>>>>> to the UI is in use by the "Active VM" snapshot. Does this make sense?
>>>>>>>
>>>>>>> It looks accurate. Live merges are "backwards" merges, so the merge
>>>>>>> would have pushed data from the volume associated with "Active VM"
>>>>>>> into the volume associated with the snapshot you're trying to remove.
>>>>>>>
>>>>>>> Upon completion, we "pivot" so that the VM uses that older volume, and
>>>>>>> we update the engine database to reflect this (basically we
>>>>>>> re-associate that older volume with, in your case, "Active VM").
>>>>>>>
>>>>>>> In your case, it seems the pivot operation was done, but the database
>>>>>>> wasn't updated to reflect it. Given snapshot/image associations, e.g.:
>>>>>>>
>>>>>>> VM Name   Snapshot Name   Volume
>>>>>>> -------   -------------   -------
>>>>>>> My-VM     Active VM       123-abc
>>>>>>> My-VM     My-Snapshot     789-def
>>>>>>>
>>>>>>> My-VM in your case is actually running on volume 789-def. If you run
>>>>>>> the db fixup script and supply ("My-VM", "My-Snapshot", "123-abc")
>>>>>>> (note the volume is the newer, "bad" one), then it will switch the
>>>>>>> volume association for you and remove the invalid entries.
>>>>>>>
>>>>>>> Of course, I'd shut down the VM and back up the db beforehand.
>>>>>
>>>>> I've executed the SQL script and it seems to have worked. Thanks!
>>>>>
>>>>>>> "Active VM" should now be unused; it previously (pre-merge) held the
>>>>>>> data written since the snapshot was taken. Normally the larger actual
>>>>>>> size might be from qcow format overhead. If your listing above is
>>>>>>> complete (i.e. one volume for the VM), then I'm not sure why the base
>>>>>>> volume would have a larger actual size than virtual size.
>>>>>>>
>>>>>>> Adam, Nir--any thoughts on this?
>>>>>>
>>>>>> There is a bug which has caused inflation of the snapshot volumes when
>>>>>> performing a live merge. We are submitting fixes for 3.5, 3.6, and
>>>>>> master right at this moment.
>>>>>
>>>>> Which bug number is assigned to this bug? Will upgrading to a release
>>>>> with a fix reduce the disk usage again?
>>>>
>>>> See https://bugzilla.redhat.com/show_bug.cgi?id=1301709 for the bug.
>>>> It's about a clone disk failure after the problem occurs.
>>>> Unfortunately, there is not an automatic way to repair the raw base
>>>> volumes if they were affected by this bug. They will need to be
>>>> manually shrunk using lvreduce if you are certain that they are
>>>> inflated.
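One caveat on the lvreduce route, since it is easy to destroy a disk with
it: only attempt it with the VM shut down, with backups in place, and only
if the raw base LV really is larger than the disk's virtual size. Below is a
very rough sketch of the kind of check involved, not a validated procedure;
on block storage domains the VG name is the storage domain UUID and the LV
name is the volume UUID, and every <...> value is a placeholder you would
replace with your own IDs and sizes. Please sanity-check it on the list
before touching production storage:

    # List the LV sizes in the storage domain's VG, in bytes
    lvs --units b -o lv_name,lv_size <sd-uuid>

    # Activate the base volume and read its virtual size
    lvchange -ay <sd-uuid>/<base-volume-uuid>
    qemu-img info /dev/<sd-uuid>/<base-volume-uuid>    # note the "virtual size" in bytes

    # Only if the LV of a RAW base volume is larger than its virtual size,
    # shrink it back to exactly the virtual size (never below it):
    lvreduce --size <virtual-size-in-bytes>B /dev/<sd-uuid>/<base-volume-uuid>
    lvchange -an <sd-uuid>/<base-volume-uuid>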
>>>> --
>>>> Adam Litke
>
> --
> Nathanaël Blanchet
>
> Supervision réseau
> Pôle Infrastrutures Informatiques
> 227 avenue Professeur-Jean-Louis-Viala
> 34193 MONTPELLIER CEDEX 5
> Tél. 33 (0)4 67 54 84 55
> Fax 33 (0)4 67 54 84 14
> blanc...@abes.fr

_______________________________________________
Users mailing list
Users@ovirt.org
http://lists.ovirt.org/mailman/listinfo/users