Reviewed:  https://review.openstack.org/519464
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=bca425a33f52584051348a3ace832be8151299a7
Submitter: Zuul
Branch:    master
commit bca425a33f52584051348a3ace832be8151299a7
Author: Eric M Gonzalez <[email protected]>
Date:   Mon Nov 13 14:02:27 2017 -0600

    unquiesce instance on volume snapshot failure

    This patch adds an exception catch to "snapshot_volume_backed()" of
    compute/api.py that catches (at the moment) _all_ exceptions from the
    underlying cinderclient. Previously, if the instance is quiesced
    (frozen filesystem) then the exception will break execution of the
    function, skipping the needed unquiesce, and leave the instance in a
    frozen state.

    Now, the exception catch will unquiesce the instance if it was prior
    to the failure.

    Got a unit test in place with the help of Matt Riedemann.
    test_snapshot_volume_backed_with_quiesce_create_snap_fails

    Change-Id: I60de179c72eede6746696f29462ee9d805dace47
    Closes-bug: #1731986

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1731986

Title:
  nova snapshot_volume_backed failure does not thaw filesystems

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Confirmed
Status in OpenStack Compute (nova) pike series:
  In Progress
Status in OpenStack Compute (nova) queens series:
  In Progress

Bug description:
  Noticed in OpenStack Mitaka (commit 9825c80), but the function
  (snapshot_volume_backed) is unchanged as of commit a4fc1bcd.

  backends: Libvirt + Ceph.

  When Nova attempts to create an image / snapshot of a volume-backed
  instance it first quiesces the instance in `snapshot_volume_backed()`.
  It then loops over all of the block devices associated with that
  instance. However, there is no exception handling in the for loop and
  any failures on the part of Cinder are bubbled up and through the
  `snapshot_volume_backed()` function.
  This causes the needed `unquiesce()` to never be called on the
  instance, leaving it in an inconsistent (read-only) state. This can
  cause operational errors in the instance, leaving it unusable.

  In my case, the steps for reproduction are:

  1) nova create image / ("create snapshot" via Horizon)
  2) nova/compute/api snapshot_volume_backed() calls quiesce
  3) "qemu-ga: info: guest-fsfreeze called" is seen in instance
  4) cinder fails snapshot of volume due to OverLimit
  5) cinder raises OverLimit
  6) snapshot_volume_backed() never finishes due to OverLimit
  7) filesystem is never thawed
  8) instance unusable

  I am in the process of writing and testing a patch and will have a
  review for it soon.
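
The failure mode in the reproduction steps, and the shape of the fix described in the commit message, can be sketched in a few lines of Python. This is only an illustrative model and not the Nova code: `FakeInstance`, `failing_create_snapshot`, the local `OverLimit` class, and both `snapshot_*` functions are hypothetical stand-ins, assumed here to mimic quiesce, a failing Cinder snapshot call, and the loop over block devices.

```python
class OverLimit(Exception):
    """Hypothetical stand-in for the quota error raised by the volume service."""


class FakeInstance:
    """Hypothetical guest that only tracks whether its filesystems are frozen."""

    def __init__(self):
        self.frozen = False

    def quiesce(self):
        self.frozen = True

    def unquiesce(self):
        self.frozen = False


def failing_create_snapshot(volume_id):
    """Hypothetical snapshot call that always fails with OverLimit."""
    raise OverLimit("snapshot quota exceeded for %s" % volume_id)


def snapshot_without_handling(instance, volume_ids, create_snapshot):
    """Pre-fix shape: an exception inside the loop skips the unquiesce."""
    instance.quiesce()
    for volume_id in volume_ids:
        create_snapshot(volume_id)
    instance.unquiesce()


def snapshot_with_handling(instance, volume_ids, create_snapshot):
    """Post-fix shape: thaw the guest on any failure, then re-raise."""
    instance.quiesce()
    try:
        for volume_id in volume_ids:
            create_snapshot(volume_id)
    except Exception:
        # Catch everything, as the commit message says the patch does for now.
        instance.unquiesce()
        raise
    instance.unquiesce()


def run(snapshot_func):
    """Run one variant against the failing backend; return the frozen state."""
    instance = FakeInstance()
    try:
        snapshot_func(instance, ["vol-1"], failing_create_snapshot)
    except OverLimit:
        pass
    return instance.frozen
```

In this model, `run(snapshot_without_handling)` leaves the fake instance frozen, while `run(snapshot_with_handling)` leaves it thawed and still propagates `OverLimit` to the caller. A `try`/`finally` would be another way to guarantee the thaw; the sketch uses `except` plus re-raise only because that matches the "exception catch" wording of the commit message.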

