Author: Eric M Gonzalez <e...@awnix.com>
Date: Mon Nov 13 14:02:27 2017 -0600
unquiesce instance on volume snapshot failure
This patch adds an exception catch to "snapshot_volume_backed()" of
compute/api.py that catches (at the moment) _all_ exceptions from the
underlying cinderclient. Previously, if the instance was quiesced (frozen
filesystem), the exception would break execution of the function,
skip the needed unquiesce, and leave the instance in a frozen state.
Now, the exception catch will unquiesce the instance if it was prior to
Got a unit test in place with the help of Matt Riedemann.
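
For readers who want the shape of the change without opening the review,
here is a minimal sketch of the pattern the commit message describes. It is
not the actual nova code: FakeInstance, OverLimit, the create_snapshot
callable, and this function signature are hypothetical stand-ins, and
re-raising after the thaw is an assumption on my part.

```python
class OverLimit(Exception):
    """Hypothetical stand-in for the quota error raised via cinderclient."""


class FakeInstance:
    """Hypothetical stand-in for a guest whose filesystem can be frozen."""

    def __init__(self):
        self.frozen = False

    def quiesce(self):
        self.frozen = True

    def unquiesce(self):
        self.frozen = False


def snapshot_volume_backed(instance, volume_ids, create_snapshot):
    """Snapshot every volume; thaw the instance even if a snapshot fails."""
    instance.quiesce()
    quiesced = True
    snapshots = []
    try:
        for volume_id in volume_ids:
            snapshots.append(create_snapshot(volume_id))
    except Exception:
        # Catch everything (as the patch does for now), thaw, then let the
        # original error propagate to the caller (assumed behaviour).
        if quiesced:
            instance.unquiesce()
        raise
    # Normal path: thaw once all snapshots have been requested.
    instance.unquiesce()
    return snapshots
```

With this shape, a failing create_snapshot still surfaces its exception to
the caller, but the instance is no longer left frozen.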
** Changed in: nova
Status: In Progress => Fix Released
nova snapshot_volume_backed failure does not thaw filesystems
Status in OpenStack Compute (nova):
Status in OpenStack Compute (nova) ocata series:
Status in OpenStack Compute (nova) pike series:
Status in OpenStack Compute (nova) queens series:
Noticed in OpenStack Mitaka (commit 9825c80), but the function
(snapshot_volume_backed) is unchanged as of commit a4fc1bcd. Backends:
Libvirt + Ceph.
When Nova attempts to create an image / snapshot of a volume-backed
instance it first quiesces the instance in `snapshot_volume_backed()`.
It then loops over all of the block devices associated with that
instance. However, there is no exception handling in the for loop and
any failures on the part of Cinder are bubbled up and through the
`snapshot_volume_backed()` function. This causes the needed
`unquiesce()` to never be called on the instance, leaving it in an
inconsistent (read-only) state. This can cause operational errors in
the instance, leaving it unusable.
In my case, the steps for reproduction are:
1) nova create image (or "create snapshot" via Horizon)
2) nova/compute/api snapshot_volume_backed() calls quiesce
3) "qemu-ga: info: guest-fsfreeze called" is seen in instance
4) cinder fails snapshot of volume due to OverLimit
5) cinder raises OverLimit
6) snapshot_volume_backed() never finishes due to OverLimit
7) filesystem is never thawed
8) instance unusable
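
The steps above can be modelled in a few lines of Python. This is only an
illustrative simulation of the unguarded flow, not nova or cinder code:
Guest, OverLimit, cinder_create_snapshot, and the quota_left counter are
all hypothetical names made up for the example.

```python
class OverLimit(Exception):
    """Hypothetical stand-in for the quota error cinder raises."""


class Guest:
    """Hypothetical guest that tracks whether its filesystem is frozen."""

    def __init__(self):
        self.frozen = False

    def fsfreeze(self):
        self.frozen = True

    def fsthaw(self):
        self.frozen = False


def cinder_create_snapshot(volume_id, quota_left):
    """Pretend snapshot call that fails once the quota is used up."""
    if quota_left <= 0:
        raise OverLimit("snapshot quota exceeded for " + volume_id)
    return "snap-" + volume_id


def unguarded_snapshot(guest, volume_ids, quota_left):
    """Mimics the reported flow: no exception handling around the loop."""
    guest.fsfreeze()                      # step 2/3: quiesce the guest
    snapshots = []
    for volume_id in volume_ids:
        # steps 4/5: the second call raises OverLimit when quota_left is 1
        snapshots.append(cinder_create_snapshot(volume_id, quota_left))
        quota_left -= 1
    guest.fsthaw()                        # skipped when the loop raises
    return snapshots


def reproduce():
    """Run the failing scenario and report whether the guest stays frozen."""
    guest = Guest()
    try:
        unguarded_snapshot(guest, ["vol-1", "vol-2"], quota_left=1)
    except OverLimit:
        pass                              # step 6: the call never finishes
    return guest.frozen                   # step 7: still frozen
```

In this model reproduce() reports the guest as still frozen after the
OverLimit error, which matches steps 6 through 8.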
I am in the process of writing and testing a patch and will have a
review for it soon.