On Thu, Apr 9, 2015 at 6:10 PM, Eric Blake <ebl...@redhat.com> wrote:
> On 04/08/2015 11:22 PM, Deepak Shetty wrote:
> > + [Cinder] and [Tempest] in the $subject since this affects them too
> >
> > On Thu, Apr 9, 2015 at 4:22 AM, Eric Blake <ebl...@redhat.com> wrote:
> >
> >> On 04/08/2015 12:01 PM, Deepak Shetty wrote:
> >>>
> >>> Questions:
> >>>
> >>> 1) Is this a valid scenario being tested ? Some say yes, I am not sure,
> >>> since the test makes sure that the instance is OFF before the snap is
> >>> deleted, and this doesn't work for fs-backed drivers as they use hyp
> >>> assisted snap which needs the domain to be active.
> >>
> >> Logically, it should be possible to delete snapshots when a domain is
> >> off (qemu-img can do it, but libvirt has not yet been taught how to
> >> manage it, in part because qemu-img is not as friendly as qemu in having
> >> a re-connectible Unix socket monitor for tracking long-running progress).
> >>
> >
> > Is there a bug/feature already opened for this ?
>
> Libvirt has this bug: https://bugzilla.redhat.com/show_bug.cgi?id=987719
> which tracks the generic ability of libvirt to delete snapshots; ideally,
> the code to manage snapshots will work for both online and persistent
> offline guests, but it may result in splitting the work into multiple bugs.

I can't access this bug report, it seems "private", I need to authenticate.

> > I didn't understand much on what you mean by re-connectible unix socket
> > :)... are you hinting that qemu-img doesn't have the ability to attach to
> > a qemu / VM process for a long time over a unix socket ?
>
> For online guest control, libvirt normally creates a Unix socket, then
> starts qemu with its -qmp monitor pointing to that socket. That way, if
> libvirtd goes away and then restarts, it can reconnect as a client to
> the existing socket file, and qemu never has to know that the person on
> the other end changed. With that QMP monitor, libvirt can query qemu's
> current state at will, get event notifications when long-running jobs
> have finished, and issue commands to terminate long-running jobs early,
> even if it is a different libvirtd issuing a later command than the one
> that started the command.
>
> qemu-img, on the other hand, only has the -p option or SIGUSR1 signal
> for outputting progress to stderr on a long-running operation (not the
> most machine-parseable), but is not otherwise controllable. It does not
> have a management connection through a Unix socket. I guess in thinking
> about it a bit more, a Unix socket is not essential; as long as the old
> libvirtd starts qemu-img in a manner that tracks its pid and collects
> stderr reliably, then restarting libvirtd can send SIGUSR1 to the pid
> and track the changes to stderr to estimate how far along things are.
>
> Also, the idea has been proposed that qemu-img is not necessary; libvirt
> could use qemu -M none to create a dummy machine with no CPUs and JUST
> disk images, and then use the qemu QMP monitor as usual to perform block
> operations on those disks by reusing the code it already has working for
> online guests. But even this approach needs coding into libvirt.
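Just to check that I follow the "-M none" idea: I imagine it would look
something like the snippet below (a rough, untested sketch on my side;
every path, device id and helper here is a placeholder, not code from
Nova or libvirt):

    import json
    import socket
    import subprocess
    import time

    DISK = "/path/to/snapshot-chain.qcow2"  # placeholder image path
    QMP_SOCK = "/tmp/dummy-qmp.sock"

    # Dummy machine: no CPUs or board emulation, just the disk, with a
    # QMP monitor listening on a Unix socket.
    qemu = subprocess.Popen([
        "qemu-system-x86_64", "-M", "none", "-nographic",
        "-qmp", "unix:%s,server,nowait" % QMP_SOCK,
        "-drive", "file=%s,if=none,id=drive0" % DISK,
    ])
    time.sleep(1)  # crude: wait for the socket file to show up

    sock = socket.socket(socket.AF_UNIX)
    sock.connect(QMP_SOCK)
    fp = sock.makefile("rw")

    def qmp(command, **arguments):
        # Send one QMP command and read back one reply line.
        fp.write(json.dumps({"execute": command,
                             "arguments": arguments}) + "\n")
        fp.flush()
        return json.loads(fp.readline())

    fp.readline()            # discard the QMP greeting banner
    qmp("qmp_capabilities")  # leave capabilities-negotiation mode

    # From here a management app could drive the same block jobs libvirt
    # already uses on live guests, e.g. merging an external snapshot back
    # into its backing file:
    print(qmp("block-commit", device="drive0"))

And then it is the same event/monitoring code as for a live guest, which
I guess is the whole appeal of that approach.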
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org

Hi,
I'd like to progress on this issue, so I will spend some time on it.
Let's recap.

The issue is: "deleting a Cinder snapshot that was created during a Nova
instance snapshot (booted from a Cinder volume) doesn't work when the
original Nova instance is stopped". This bug only arises when a Cinder
driver uses the feature called "QEMU Assisted Snapshots" / live-snapshot
(currently only GlusterFS, but soon the generic NFS driver, once
https://blueprints.launchpad.net/cinder/+spec/nfs-snapshots gets in).

This issue is triggered by the Tempest scenario test_volume_boot_pattern.
This scenario:

[does some stuff]
1) Creates a Cinder volume from a Cirros image
2) Boots a Nova instance on that volume
3) Makes a snapshot of this instance (which creates a Cinder snapshot,
   because the instance was booted from a volume), using the QEMU
   Assisted Snapshots feature
[does some other stuff]
4) Stops the instance created in step 2, then deletes the snapshot
   created in step 3

The deletion of the snapshot created in step 3 fails because Nova asks
libvirt to do a blockRebase (see
https://github.com/openstack/nova/blob/68f6f080b2cddd3d4e97dc25a98e0c84c4979b8a/nova/virt/libvirt/driver.py#L1920
), which only works on a running domain.

For reference, there's a bug targeting Cinder for this:
https://bugs.launchpad.net/cinder/+bug/1444806

What I'd like to do, but I am asking your advice first, is: just before
the call to virt_dom.blockRebase(), check whether the domain is running,
and if it is not, call "qemu-img rebase -b $rebase_base $rebase_disk"
instead (this idea was brought up by Eric Blake in the previous reply).
There's a rough sketch of this in the P.S. below.

Questions: Is it safe to do so ? Is it the right approach ? (Given that I
don't really want to wait for libvirt to support blockRebase on an
offline domain.)

Thanks a lot !
Jordan
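P.S. For concreteness, here is roughly the change I have in mind in the
Nova libvirt driver (an untested sketch only: the function name and the
way the external command is run are illustrative, not the real Nova
code):

    import libvirt
    from oslo_concurrency import processutils

    def _rebase(virt_dom, rebase_disk, rebase_base, rebase_flags):
        if virt_dom.isActive():
            # Online domain: keep the existing behaviour and let qemu
            # do the rebase through libvirt.
            virt_dom.blockRebase(rebase_disk, rebase_base, 0,
                                 rebase_flags)
        else:
            # Offline domain: qemu-img in its default "safe" mode copies
            # any clusters that differ from the new backing file, so the
            # guest-visible data is unchanged. In Nova this would go
            # through the usual rootwrap-ed execute helper.
            processutils.execute('qemu-img', 'rebase',
                                 '-b', rebase_base, rebase_disk)

If the approach looks sane, I'll turn this into a proper patch.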
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev