Reviewed: https://review.openstack.org/383859 Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4aa39c44a4b08ee4e05548d5c258e795089b2bdd Submitter: Jenkins Branch: master
commit 4aa39c44a4b08ee4e05548d5c258e795089b2bdd Author: Matthew Booth <mbo...@redhat.com> Date: Fri Oct 7 19:14:38 2016 +0100 libvirt: Fix races with nfs volume mount/umount A single nfs export typically contains multiple volumes. We were handling this in the libvirt driver by: 1. On mount, we 'ensure' the mount is available, so we don't fail if another instance already has it mounted. 2. On umount, we trap and ignore 'device is busy' so we don't fail if another instance is already using it. Unfortunately, while this works for serial mounts and unmounts, there are multiple failure cases when volumes from the same export are mounted and unmounted simultaneously. It causes an error if an instance is stopped: as the qemu process is not actively using the mountpoint it will not prevent an unmount for another volume on the same mountpoint from succeeding. It will not be possible to restart the instance, because its mountpoint will not be mounted. To fix this, we create a singleton manager object, which tracks mounts and umount requests per export, and calls the real mount/umount only when required. It uses per-export locks to allow concurrency while avoiding races. Because we now expect to know the state of the host at all times, we no longer need to execute speculative mount/umount commands. As we track attachments (a mapping from volume to instance) rather than volumes, we also gracefully support multi-attach. This change implements this for nfs, but the solution is intended to be extended to all LibvirtBaseFileSystemVolumeDrivers. Closes-Bug: #1421550 Change-Id: I3155984d76df06371a6c45f633aa448168a96d64 ** Changed in: nova Status: In Progress => Fix Released -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1421550 Title: Creating VM image fails under the race condition with detaching volume Status in OpenStack Compute (nova): Fix Released Bug description: Environment: nova 2014.2.1 cinder 2014.2.1 Ubuntu 14.04 LTS Cinder volumes whose backend is NFS are used. There are two 'ACTIVE' VM instances on the same compute node. Creating VM image(VM snapshot) fails under the race condition with detaching volume for the other VM instance. In creating VM image, starting VM instance fails(remains 'SHUTOFF' state) and the VM image is deleted. nova-compute's log is as follows: --------------------------------------------------------------------------------------------------------------- 2015-01-26 10:28:47,000.744 11535 ERROR nova.virt.libvirt.driver [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] Error launching a defined domain with XML: <domain type='kvm'> (snipped...) 2015-01-26 10:28:47,000.767 11535 DEBUG nova.compute.manager [req-7ee6f579-63f5-4822-a2b3-10e53bb1dce0 None] [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Cleaning up image e8c3255c-b4e8-4324-addb-365c5d7b1868 decorated_function /usr/lib/python2.7/dist-packages/nova/compute/manager.py:373 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] Traceback (most recent call last): 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 369, in decorated_function 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] *args, **kwargs) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3027, in snapshot_instance 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] task_states.IMAGE_SNAPSHOT) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 3058, in _snapshot_instance 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] update_task_state) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 1733, in snapshot 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] new_dom = self._create_domain(domain=virt_dom) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4338, in _create_domain 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] LOG.error(err) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/openstack/common/excutils.py", line 82, in __exit__ 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] six.reraise(self.type_, self.value, self.tb) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 4329, in _create_domain 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] domain.createWithFlags(launch_flags) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] result = proxy_call(self._autowrap, f, *args, **kwargs) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] rv = execute(f, *args, **kwargs) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] six.reraise(c, e, tb) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] rv = meth(*args, **kwargs) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] File "/usr/lib/python2.7/dist-packages/libvirt.py", line 896, in createWithFlags 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self) 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] libvirtError: Failed to open file '/var/lib/nova/mnt/339b9d35866664794a8155657a049127/volume-27f59813-76f0-4b56-ba42-75922537c36c': No such file or directory 2015-01-26 10:28:47,000.767 11535 TRACE nova.compute.manager [instance: d613cfc9-109a-4920-bbc1-41ce4146ace0] --------------------------------------------------------------------------------------------------------------- In detaching volume, umounting NFS is performed without checking whether the other VM instance is being attached volumes or not. So if the other VM is stopped, umounting NFS succeeds. If there are no processes using the NFS directory, the NFS directory is umounted. nova/virt/libvirt/volumes.py(2014.2.1): --------------------------------------------------------------------------------------------------------------- class LibvirtNFSVolumeDriver(LibvirtBaseVolumeDriver): (snipped...) def disconnect_volume(self, connection_info, disk_dev): """Disconnect the volume.""" export = connection_info['data']['export'] mount_path = os.path.join(CONF.libvirt.nfs_mount_point_base, utils.get_hash_str(export)) try: utils.execute('umount', mount_path, run_as_root=True) except processutils.ProcessExecutionError as exc: if ('device is busy' in exc.message or 'target is busy' in exc.message): LOG.debug("The NFS share %s is still in use.", export) else: LOG.exception(_LE("Couldn't unmount the NFS share %s"), export) --------------------------------------------------------------------------------------------------------------- * This code has been added in https://review.openstack.org/#/c/76558/. A VM instance is stopped once by creating VM image. And then detaching volume for the other VM instance on the same compute node is executed. If there are no VMs connecting cinder volumes, umounting NFS directory succeeds. After VM snapshot is completed, the VM instance is restarted. But the VM instance cannot access volumes because NFS directory has been umounted. So the error occurs and the VM instance cannot be restarted. And this issue also occurs under the race condition with starting a VM instance and detaching volumes for another VM instance('ACTIVE') on the same compute node. 1. _connect_volume in starting a VM instance(mount NFS directory if not mounted.) 2. _disconnect_volume in detaching volume(umount NFS directory if no processes use it.) 3. The libvirt domain starts in starting a VM instance To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1421550/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp