Re: [openstack-dev] [Nova] [Cinder] [Tempest] Regarding deleting snapshot when instance is OFF

2015-06-17 Thread Jordan Pittier
On Tue, Jun 16, 2015 at 3:33 PM, Jordan Pittier jordan.pitt...@scality.com
wrote:

 On Thu, Apr 9, 2015 at 6:10 PM, Eric Blake ebl...@redhat.com wrote:

 On 04/08/2015 11:22 PM, Deepak Shetty wrote:
  + [Cinder] and [Tempest] in the $subject since this affects them too
 
  On Thu, Apr 9, 2015 at 4:22 AM, Eric Blake ebl...@redhat.com wrote:
 
  On 04/08/2015 12:01 PM, Deepak Shetty wrote:
 
  Questions:
 
  1) Is this a valid scenario to test? Some say yes; I am not sure,
  since the test makes sure the instance is OFF before the snapshot is
  deleted, and this doesn't work for fs-backed drivers, as they use
  hypervisor-assisted snapshots, which need the domain to be active.
 
  Logically, it should be possible to delete snapshots when a domain is
  off (qemu-img can do it, but libvirt has not yet been taught how to
  manage it, in part because qemu-img is not as friendly as qemu in having
  a re-connectible Unix socket monitor for tracking long-running progress).
 
 
  Is there a bug/feature already opened for this?

 Libvirt has this bug: https://bugzilla.redhat.com/show_bug.cgi?id=987719
 which tracks the generic ability of libvirt to delete snapshots; ideally,
 the code to manage snapshots will work for both online and persistent
 offline guests, but it may result in splitting the work into multiple
 bugs.


 I can't access this bug report; it seems to be private and asks me to authenticate.


  I didn't quite understand what you mean by a re-connectible unix
  socket :)... are you hinting that qemu-img doesn't have the ability to
  attach to a qemu / VM process over a unix socket for a long-running
  operation?

 For online guest control, libvirt normally creates a Unix socket, then
 starts qemu with its -qmp monitor pointing to that socket.  That way, if
 libvirtd goes away and then restarts, it can reconnect as a client to
 the existing socket file, and qemu never has to know that the person on
 the other end changed.  With that QMP monitor, libvirt can query qemu's
 current state at will, get event notifications when long-running jobs
 have finished, and issue commands to terminate long-running jobs early,
 even if it is a different libvirtd issuing a later command than the one
 that started the command.

 qemu-img, on the other hand, only has the -p option or SIGUSR1 signal
 for outputting progress to stderr on a long-running operation (not the
 most machine-parseable), but is not otherwise controllable.  It does not
 have a management connection through a Unix socket.  I guess in thinking
 about it a bit more, a Unix socket is not essential; as long as the old
 libvirtd starts qemu-img in a manner that tracks its pid and collects
 stderr reliably, then restarting libvirtd can send SIGUSR1 to the pid
 and track the changes to stderr to estimate how far along things are.

 Also, the idea has been proposed that qemu-img is not necessary; libvirt
 could use qemu -M none to create a dummy machine with no CPUs and JUST
 disk images, and then use the qemu QMP monitor as usual to perform block
 operations on those disks by reusing the code it already has working for
 online guests.  But even this approach needs coding into libvirt.

 --
 Eric Blake   eblake redhat com   +1-919-301-3266
 Libvirt virtualization library http://libvirt.org




 Hi,
 I'd like to make progress on this issue, so I will spend some time on it.

 Let's recap. The issue is that deleting a Cinder snapshot that was created
 during a Nova instance snapshot (of an instance booted from a Cinder
 volume) doesn't work when that Nova instance is stopped. This bug only
 arises when the Cinder driver uses the QEMU Assisted Snapshots /
 live-snapshot feature (currently only GlusterFS, but soon generic NFS once
 https://blueprints.launchpad.net/cinder/+spec/nfs-snapshots gets in).

 This issue is triggered by the Tempest scenario
 test_volume_boot_pattern. This scenario:
 [does some stuff]
 1) Creates a Cinder volume from a Cirros image
 2) Boots a Nova instance from that volume
 3) Makes a snapshot of this instance (which creates a Cinder snapshot
 because the instance was booted from a volume), using the QEMU Assisted
 Snapshots feature
 [does some other stuff]
 4) Stops the instance created in step 2, then deletes the snapshot created
 in step 3.

 The deletion of the snapshot created in step 3 fails because Nova asks
 libvirt to do a blockRebase (see
 https://github.com/openstack/nova/blob/68f6f080b2cddd3d4e97dc25a98e0c84c4979b8a/nova/virt/libvirt/driver.py#L1920
 ).

 For reference, there's a bug targeting Cinder for this:
 https://bugs.launchpad.net/cinder/+bug/1444806

 What I'd like to do, but I am asking for your advice first, is: just
 before the call to virt_dom.blockRebase(), check whether the domain is
 running, and if it is not, call qemu-img rebase -b $rebase_base
 rebase_disk instead (this idea was brought up by Eric Blake in the
 previous reply).

Re: [openstack-dev] [Nova] [Cinder] [Tempest] Regarding deleting snapshot when instance is OFF

2015-06-16 Thread Jordan Pittier
On Thu, Apr 9, 2015 at 6:10 PM, Eric Blake ebl...@redhat.com wrote:

 On 04/08/2015 11:22 PM, Deepak Shetty wrote:
  + [Cinder] and [Tempest] in the $subject since this affects them too
 
  On Thu, Apr 9, 2015 at 4:22 AM, Eric Blake ebl...@redhat.com wrote:
 
  On 04/08/2015 12:01 PM, Deepak Shetty wrote:
 
  Questions:
 
  1) Is this a valid scenario to test? Some say yes; I am not sure,
  since the test makes sure the instance is OFF before the snapshot is
  deleted, and this doesn't work for fs-backed drivers, as they use
  hypervisor-assisted snapshots, which need the domain to be active.
 
  Logically, it should be possible to delete snapshots when a domain is
  off (qemu-img can do it, but libvirt has not yet been taught how to
  manage it, in part because qemu-img is not as friendly as qemu in having
  a re-connectible Unix socket monitor for tracking long-running
 progress).
 
 
  Is there a bug/feature already opened for this?

 Libvirt has this bug: https://bugzilla.redhat.com/show_bug.cgi?id=987719
 which tracks the generic ability of libvirt to delete snapshots; ideally,
 the code to manage snapshots will work for both online and persistent
 offline guests, but it may result in splitting the work into multiple bugs.


I can't access this bug report; it seems to be private and asks me to authenticate.


  I didn't quite understand what you mean by a re-connectible unix
  socket :)... are you hinting that qemu-img doesn't have the ability to
  attach to a qemu / VM process over a unix socket for a long-running
  operation?

 For online guest control, libvirt normally creates a Unix socket, then
 starts qemu with its -qmp monitor pointing to that socket.  That way, if
 libvirtd goes away and then restarts, it can reconnect as a client to
 the existing socket file, and qemu never has to know that the person on
 the other end changed.  With that QMP monitor, libvirt can query qemu's
 current state at will, get event notifications when long-running jobs
 have finished, and issue commands to terminate long-running jobs early,
 even if it is a different libvirtd issuing a later command than the one
 that started the command.

 qemu-img, on the other hand, only has the -p option or SIGUSR1 signal
 for outputting progress to stderr on a long-running operation (not the
 most machine-parseable), but is not otherwise controllable.  It does not
 have a management connection through a Unix socket.  I guess in thinking
 about it a bit more, a Unix socket is not essential; as long as the old
 libvirtd starts qemu-img in a manner that tracks its pid and collects
 stderr reliably, then restarting libvirtd can send SIGUSR1 to the pid
 and track the changes to stderr to estimate how far along things are.

 Also, the idea has been proposed that qemu-img is not necessary; libvirt
 could use qemu -M none to create a dummy machine with no CPUs and JUST
 disk images, and then use the qemu QMP monitor as usual to perform block
 operations on those disks by reusing the code it already has working for
 online guests.  But even this approach needs coding into libvirt.

 --
 Eric Blake   eblake redhat com   +1-919-301-3266
 Libvirt virtualization library http://libvirt.org




Hi,
I'd like to make progress on this issue, so I will spend some time on it.

Let's recap. The issue is that deleting a Cinder snapshot that was created
during a Nova instance snapshot (of an instance booted from a Cinder volume)
doesn't work when that Nova instance is stopped. This bug only arises when
the Cinder driver uses the QEMU Assisted Snapshots / live-snapshot feature
(currently only GlusterFS, but soon generic NFS once
https://blueprints.launchpad.net/cinder/+spec/nfs-snapshots gets in).

This issue is triggered by the Tempest scenario test_volume_boot_pattern.
This scenario (a rough code sketch follows the list):
[does some stuff]
1) Creates a Cinder volume from a Cirros image
2) Boots a Nova instance from that volume
3) Makes a snapshot of this instance (which creates a Cinder snapshot
because the instance was booted from a volume), using the QEMU Assisted
Snapshots feature
[does some other stuff]
4) Stops the instance created in step 2, then deletes the snapshot created
in step 3.
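
For illustration, here is a rough Python sketch of those steps. The client
objects and method names are placeholders, not the actual Tempest code (the
real scenario lives in tempest/scenario/test_volume_boot_pattern.py):

    # Illustrative sketch only: the client objects and method names are
    # placeholders for whatever API wrappers drive the scenario.
    def volume_boot_pattern(volumes_client, servers_client, snapshots_client,
                            image_ref):
        # 1) Create a bootable Cinder volume from a Cirros image
        volume = volumes_client.create_volume(size=1, image_ref=image_ref)

        # 2) Boot a Nova instance from that volume (no local image disk)
        server = servers_client.create_server(
            name='bfv-instance',
            block_device_mapping=[{'source_type': 'volume',
                                   'uuid': volume['id'],
                                   'destination_type': 'volume',
                                   'boot_index': 0}])

        # 3) Snapshot the instance; since it booted from a volume, Nova asks
        #    Cinder for a volume snapshot (QEMU assisted for fs-backed drivers)
        servers_client.create_image(server['id'], name='bfv-snapshot')
        snapshot = snapshots_client.list_snapshots(volume_id=volume['id'])[0]

        # 4) Stop the instance, then delete the snapshot: the step that fails
        #    today for GlusterFS/NFS-backed volumes
        servers_client.stop_server(server['id'])
        snapshots_client.delete_snapshot(snapshot['id'])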

The deletion of the snapshot created in step 3 fails because Nova asks
libvirt to do a blockRebase (see
https://github.com/openstack/nova/blob/68f6f080b2cddd3d4e97dc25a98e0c84c4979b8a/nova/virt/libvirt/driver.py#L1920
).

For reference, there's a bug targeting Cinder for this:
https://bugs.launchpad.net/cinder/+bug/1444806

What I'd like to do, but I am asking for your advice first, is: just before
the call to virt_dom.blockRebase(), check whether the domain is running,
and if it is not, call qemu-img rebase -b $rebase_base rebase_disk instead
(this idea was brought up by Eric Blake in the previous reply).
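
In code, a minimal sketch of that branch could look roughly like this
(illustrative only: the helper name, variable names and the use of
nova.utils.execute are assumptions, not the current Nova code):

    # Illustrative sketch, not an actual Nova patch. Error handling,
    # rootwrap configuration and the surrounding driver plumbing are omitted.
    from nova import utils

    def _rebase_backing_file(virt_dom, rebase_disk, rebase_base, rebase_flags):
        if virt_dom.isActive():
            # Online: let qemu do the rebase through libvirt, as today.
            virt_dom.blockRebase(rebase_disk, rebase_base, 0, rebase_flags)
        else:
            # Offline: libvirt can't help yet, so fall back to qemu-img.
            utils.execute('qemu-img', 'rebase', '-b', rebase_base,
                          rebase_disk, run_as_root=True)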


Re: [openstack-dev] [Nova] [Cinder] [Tempest] Regarding deleting snapshot when instance is OFF

2015-04-09 Thread Eric Blake
On 04/08/2015 11:22 PM, Deepak Shetty wrote:
 + [Cinder] and [Tempest] in the $subject since this affects them too
 
 On Thu, Apr 9, 2015 at 4:22 AM, Eric Blake ebl...@redhat.com wrote:
 
 On 04/08/2015 12:01 PM, Deepak Shetty wrote:

 Questions:

 1) Is this a valid scenario to test? Some say yes; I am not sure,
 since the test makes sure the instance is OFF before the snapshot is
 deleted, and this doesn't work for fs-backed drivers, as they use
 hypervisor-assisted snapshots, which need the domain to be active.

 Logically, it should be possible to delete snapshots when a domain is
 off (qemu-img can do it, but libvirt has not yet been taught how to
 manage it, in part because qemu-img is not as friendly as qemu in having
 a re-connectible Unix socket monitor for tracking long-running progress).

 
 Is there a bug/feature already opened for this?

Libvirt has this bug: https://bugzilla.redhat.com/show_bug.cgi?id=987719
which tracks the generic ability of libvirt to delete snapshots; ideally,
the code to manage snapshots will work for both online and persistent
offline guests, but it may result in splitting the work into multiple bugs.

 I didn't quite understand what you mean by a re-connectible unix
 socket :)... are you hinting that qemu-img doesn't have the ability to
 attach to a qemu / VM process over a unix socket for a long-running
 operation?

For online guest control, libvirt normally creates a Unix socket, then
starts qemu with its -qmp monitor pointing to that socket.  That way, if
libvirtd goes away and then restarts, it can reconnect as a client to
the existing socket file, and qemu never has to know that the person on
the other end changed.  With that QMP monitor, libvirt can query qemu's
current state at will, get event notifications when long-running jobs
have finished, and issue commands to terminate long-running jobs early,
even if it is a different libvirtd issuing a later command than the one
that started the command.
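
To make the reconnection idea concrete, here is a minimal sketch of the
kind of QMP exchange described above (the socket path is an assumption for
illustration; in practice libvirt holds this connection itself, and a QMP
socket normally accepts only one client at a time):

    # Minimal QMP client sketch: connect to the Unix socket qemu was
    # started with (-qmp unix:...,server), negotiate capabilities, then
    # query the progress of long-running block jobs.
    import json
    import socket

    def qmp_query_block_jobs(path='/var/lib/libvirt/qemu/instance-1.monitor'):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(path)
        chan = sock.makefile('rw')

        json.loads(chan.readline())          # read qemu's greeting banner
        chan.write(json.dumps({'execute': 'qmp_capabilities'}) + '\n')
        chan.flush()
        json.loads(chan.readline())          # expect {"return": {}}

        chan.write(json.dumps({'execute': 'query-block-jobs'}) + '\n')
        chan.flush()
        return json.loads(chan.readline())['return']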

qemu-img, on the other hand, only has the -p option or SIGUSR1 signal
for outputting progress to stderr on a long-running operation (not the
most machine-parseable), but is not otherwise controllable.  It does not
have a management connection through a Unix socket.  I guess in thinking
about it a bit more, a Unix socket is not essential; as long as the old
libvirtd starts qemu-img in a manner that tracks its pid and collects
stderr reliably, then restarting libvirtd can send SIGUSR1 to the pid
and track the changes to stderr to estimate how far along things are.
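
As a sketch of that fallback (the command, file names and log path are
placeholders): start qemu-img with its pid known and its stderr collected
in a file, then poke it with SIGUSR1 to make it report progress.

    # Illustrative only: a management process that survives restarts could
    # re-read the log file and re-send SIGUSR1 using the recorded pid.
    import os
    import signal
    import subprocess
    import time

    with open('/tmp/qemu-img-rebase.log', 'wb') as log:
        proc = subprocess.Popen(
            ['qemu-img', 'rebase', '-b', 'base.qcow2', 'overlay.qcow2'],
            stderr=log)
        while proc.poll() is None:
            os.kill(proc.pid, signal.SIGUSR1)   # qemu-img prints progress
            time.sleep(5)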

Also, the idea has been proposed that qemu-img is not necessary; libvirt
could use qemu -M none to create a dummy machine with no CPUs and JUST
disk images, and then use the qemu QMP monitor as usual to perform block
operations on those disks by reusing the code it already has working for
online guests.  But even this approach needs coding into libvirt.
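
A rough illustration of that idea (the exact flags are assumptions; this is
not something libvirt does today):

    # Start a throwaway qemu with no machine emulation, just the disk
    # attached as a block backend, plus a QMP socket to drive block jobs.
    import subprocess

    qemu = subprocess.Popen([
        'qemu-system-x86_64',
        '-M', 'none',                    # dummy machine, no guest CPUs
        '-nodefaults', '-nographic',
        '-qmp', 'unix:/tmp/qmp-none.sock,server,nowait',
        '-drive', 'file=overlay.qcow2,if=none,id=drive0',
    ])
    # A QMP client (as in the earlier sketch) could then connect to
    # /tmp/qmp-none.sock and issue block-commit / block-stream on "drive0"
    # to merge the snapshot, reusing the same code paths as for live guests.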

-- 
Eric Blake   eblake redhat com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [openstack-dev] [Nova] [Cinder] [Tempest] Regarding deleting snapshot when instance is OFF

2015-04-08 Thread Deepak Shetty
+ [Cinder] and [Tempest] in the $subject since this affects them too

On Thu, Apr 9, 2015 at 4:22 AM, Eric Blake ebl...@redhat.com wrote:

 On 04/08/2015 12:01 PM, Deepak Shetty wrote:
 
  Questions:
 
  1) Is this a valid scenario to test? Some say yes; I am not sure,
  since the test makes sure the instance is OFF before the snapshot is
  deleted, and this doesn't work for fs-backed drivers, as they use
  hypervisor-assisted snapshots, which need the domain to be active.

 Logically, it should be possible to delete snapshots when a domain is
 off (qemu-img can do it, but libvirt has not yet been taught how to
 manage it, in part because qemu-img is not as friendly as qemu in having
 a re-connectible Unix socket monitor for tracking long-running progress).


Is there a bug/feature already opened for this? I didn't quite understand
what you mean by a re-connectible unix socket :)... are you hinting that
qemu-img doesn't have the ability to attach to a qemu / VM process over a
unix socket for a long-running operation?

Looks like many believe this should be a valid scenario, but it currently
breaks the fs-backed Cinder drivers, as the test case shows.



 
 
  2) If this is a valid scenario, then it means libvirt.py in Nova should be
  modified NOT to raise an error, but to continue with the snapshot delete
  (as if the volume was not attached) and take care of the domain XML (so
  that the domain is still bootable after the snapshot deletion). Is this
  the way to go?

 Obviously, it would be nice to get libvirt to support offline snapshot
 deletion, but until then, upper layers will have to work around
 libvirt's shortcomings.  I don't know if that helps answer your
 questions, though.


Thanks, it does in a way.

Q to the Tempest folks:
Given that libvirt doesn't support this scenario yet, can the fs-backed
Cinder drivers affected by this skip this testcase (using
storage_protocol = 'glusterfs' for the Gluster case) until either libvirt
supports it or some workaround in Nova is decided upon?
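
For what it's worth, a sketch of the kind of skip this could be (the base
class, import paths and skip message are illustrative; the exact placement
would be up to the Tempest folks):

    # Illustrative sketch of a config-driven skip, not the actual Tempest code.
    import testtools

    from tempest import config
    from tempest import test

    CONF = config.CONF

    class TestVolumeBootPattern(test.BaseTestCase):

        @testtools.skipIf(
            CONF.volume.storage_protocol == 'glusterfs',
            'Deleting a QEMU-assisted snapshot of a stopped instance is not '
            'supported for fs-backed volume drivers yet')
        def test_volume_boot_pattern(self):
            ...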

Appreciate your inputs.

thanx,
deepak