Reviewed:  https://review.opendev.org/649951
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=56ca4d32ddf944b541b8a6c46f07275e7d8472bc
Submitter: Zuul
Branch:    master
commit 56ca4d32ddf944b541b8a6c46f07275e7d8472bc
Author: Lee Yarwood <[email protected]>
Date:   Thu Apr 4 09:09:04 2019 +0100

    libvirt: Avoid using os-brick encryptors when device_path isn't provided

    When disconnecting an encrypted volume, the Libvirt driver uses the
    presence of a Libvirt secret associated with the volume to determine
    whether the new-style native QEMU LUKS decryption or the original
    decryption method using os-brick encryptors is used.

    While this works well in most deployments, some issues have been
    observed in Kolla-based environments where the Libvirt secrets are not
    fully persisted between host reboots or container upgrades. This can
    lead to _detach_encryptor attempting to build an encryptor, which will
    fail if the associated connection_info for the volume does not contain
    a device_path, as is the case for encrypted rbd volumes.

    This change adds a simple conditional to _detach_encryptor to ensure we
    return when device_path is not present in connection_info and native
    QEMU LUKS decryption is available. This handles the specific use case
    where we are certain that the encrypted volume was never decrypted
    using the os-brick encryptors, as these require a local block device on
    the compute host and have thus never supported rbd.

    It is still safe to build an encryptor and call detach_volume when a
    device_path is present, however, as change I9f52f89b8466d036 made such
    calls idempotent within os-brick.

    Change-Id: Id670f13a7f197e71c77dc91276fc2fba2fc5f314
    Closes-bug: #1821696

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1821696

Title:
  Failed to start instances with encrypted volumes

Status in kolla-ansible:
  In Progress
Status in kolla-ansible rocky series:
  New
Status in kolla-ansible stein series:
  In Progress
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========
  We hit this bug after doing a complete cluster shutdown due to server
  room maintenance. The bug is, however, more easily reproducible.

  When cold starting an instance with an encrypted volume attached, it
  fails to start with a VolumeEncryptionNotSupported error.

  https://github.com/openstack/os-brick/blob/stable/rocky/os_brick/encryptors/cryptsetup.py#L52

  Steps to reproduce
  ==================
  * Deploy OpenStack with Barbican support using Kolla.
  * Create an encrypted volume type.
  * Create an encrypted volume.
  * Create an instance and attach the encrypted volume.
  * Enjoy your new instance and volume, install software and store data.
  * In our case, we shut down the entire cluster and restarted it again.
    First, all instances were stopped in Horizon using the Shut down
    instance command. We use Ceph, so we then stopped that using the
    procedures at
    https://ceph.com/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/
    and then shut down the compute / storage nodes and then the
    controller nodes one by one. Then we started the cluster in the
    reverse order, verified all services were up and running, examined
    logs and then started the instances.
  * Instances without encrypted volumes started fine.
  * Instances with encrypted volumes fail to start with
    VolumeEncryptionNotSupported.

  Note: It is possible to recreate the problem by using a Hard Reboot
  (possibly related: https://bugs.launchpad.net/nova/+bug/1597234) or by
  shutting down instances and then restarting all OpenStack services on
  that compute node.

  Expected results
  ================
  Instances with encrypted volumes should start fine, even after a Hard
  Reboot or a complete cluster shutdown.
  Actual results
  ==============
  Instances with encrypted volumes failed to start with
  VolumeEncryptionNotSupported

  https://pastebin.com/mvMbJQRb

  Environment
  ===========
  1. Openstack version
     Environment is established by Kolla (Rocky release).

  2. Hypervisor
     KVM on RHEL

  3. Storage type
     Ceph using Kolla (Rocky release)

  Analysis
  ========
  There seems to be a problem related to this code not behaving as
  expected:
  https://github.com/openstack/nova/blob/stable/rocky/nova/virt/libvirt/driver.py#L1049

  It seems that it is expected that the exception should be ignored and
  logged, but for some reason, the `ctxt.reraise = False` does not work
  as expected: self.force_reraise() is called in
  https://github.com/openstack/oslo.utils/blob/stable/rocky/oslo_utils/excutils.py#L220
  which it should not have hit since `reraise` is expected to be `False`.

  We did some hacking and just swallowed the exception by commenting out
  the `excutils.save_and_reraise_exception()` section and replacing it
  with a simple `pass`. Then the instance booted - but it could not boot
  from the image. But, it was then possible to remove the encrypted
  volume attachment, reboot the server and then reattach the encrypted
  volume.

To manage notifications about this bug go to:
https://bugs.launchpad.net/kolla-ansible/+bug/1821696/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
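
For readers following the commit message above, the guard it describes (return early when device_path is absent from connection_info and native QEMU LUKS decryption is available) can be sketched roughly as below. This is a standalone illustration, not the code merged into nova: the function name `can_skip_os_brick_detach`, the `native_luks_available` flag, and the assumption that `device_path` sits under a nested `data` dictionary are all hypothetical choices made for the example.

```python
from typing import Any, Mapping


def can_skip_os_brick_detach(connection_info: Mapping[str, Any],
                             native_luks_available: bool) -> bool:
    """Decide whether building an os-brick encryptor can be skipped.

    Hypothetical helper: os-brick encryptors need a local block device,
    so a volume without a device_path (for example rbd) can never have
    been decrypted by them.
    """
    # Assumed layout: connection_info["data"]["device_path"].
    data = connection_info.get("data") or {}
    has_device_path = bool(data.get("device_path"))
    return native_luks_available and not has_device_path


# Illustrative connection_info dictionaries (values are made up).
rbd_volume = {"driver_volume_type": "rbd",
              "data": {"name": "volumes/volume-1"}}
iscsi_volume = {"driver_volume_type": "iscsi",
                "data": {"device_path": "/dev/sdb"}}

if __name__ == "__main__":
    print(can_skip_os_brick_detach(rbd_volume, True))
    print(can_skip_os_brick_detach(iscsi_volume, True))
```

When a device_path is present, the sketch still lets the caller fall through to the os-brick path, matching the commit message's note that such detach calls are idempotent.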

