I think regardless of how we ended up in this situation, we're still in
a position where we have a public-facing API that can lead to data
corruption when used in a specific way. That should never be the case. I
would think reusing the already-possible 400 response code for
update-volume when used with a multiattach volume, to indicate that it
can't be done, without a new microversion, would be the cleanest way of
getting out of this pickle.
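As a rough illustration of what I mean, the guard could live in the API
layer and short-circuit before any virt driver work happens. This is a
minimal sketch with made-up names (BadRequest, swap_volume, the dict
shapes) standing in for nova's actual classes, not the real
implementation:

```python
# Hypothetical sketch: reject swap_volume with HTTP 400 when the volume
# is multiattach, reusing the existing 400 response rather than adding
# a new microversion. All names here are illustrative, not nova's.

class BadRequest(Exception):
    """Stands in for the exception an API layer maps to HTTP 400."""
    status_code = 400

def swap_volume(volume, new_volume):
    # A drive-mirror copy is only safe with a single writer, so refuse
    # the operation outright for multiattach volumes.
    if volume.get("multiattach"):
        raise BadRequest(
            "swap_volume is not supported for multiattach volumes")
    # Normal single-attach path would proceed from here.
    return {"old": volume["id"], "new": new_volume["id"]}
```

Since 400 is already a documented response for this call, existing
clients wouldn't see a new error shape, just a new reason for it.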
On Wed, Jun 6, 2018 at 2:55 PM, Jay Pipes <[email protected]> wrote:
> On 06/06/2018 07:46 AM, Matthew Booth wrote:
>>
>> TL;DR I think we need to entirely disable swap volume for multiattach
>> volumes, and this will be an api breaking change with no immediate
>> workaround.
>>
>> I was looking through tempest and came across
>> api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach.
>> This test does:
>>
>> Create 2 multiattach volumes
>> Create 2 servers
>> Attach volume 1 to both servers
>> ** Swap volume 1 for volume 2 on server 1 **
>> Check all is attached as expected
>>
>> The problem with this is that swap volume is a copy operation.
>
> Is it, though? The original blueprint and implementation seem to
> suggest that the swap_volume operation was nothing more than changing
> the mountpoint for a volume to point to a different location (in a
> safe manner that didn't lose any reads or writes).
>
> https://blueprints.launchpad.net/nova/+spec/volume-swap
>
> Nothing about the description of swap_volume() in the virt driver
> interface mentions swap_volume() being a "copy operation":
>
> https://github.com/openstack/nova/blob/76ec078d3781fb55c96d7aaca4fb73a74ce94d96/nova/virt/driver.py#L476
>
>> We don't just replace one volume with another: we copy the contents
>> from one to the other and then do the swap. We do this with a qemu
>> drive mirror operation, which is able to do this copy safely without
>> needing to make the source read-only, because it can also track
>> writes to the source and ensure the target is updated again. Here's
>> a link to the libvirt logs showing a drive mirror operation during
>> the swap volume of an execution of the above test:
>
> After checking the source code, the libvirt virt driver is the only
> virt driver that implements swap_volume(), so it looks to me like a
> public HTTP API method was added that was specific to libvirt's
> implementation of drive mirroring.
> Yay, more implementation leaking out through the API.
>
>> http://logs.openstack.org/58/567258/5/check/nova-multiattach/d23fad8/logs/libvirt/libvirtd.txt.gz#_2018-06-04_10_57_05_201
>>
>> The problem is that when the volume is attached to more than one VM,
>> the hypervisor doing the drive mirror *doesn't* know about writes on
>> the other attached VMs, so it can't do that copy safely, and the
>> result is data corruption.
>
> Would it be possible to swap the volume by doing what Vish originally
> described in the blueprint: pause the VM, swap the volume mountpoints
> (potentially after migrating the underlying volume), start the VM?
>
>> Note that swap volume isn't visible to the guest os, so this can't
>> be addressed by the user. This is a data corrupter, and we shouldn't
>> allow it. However, it is in released code and users might be doing
>> it already, so disabling it would be a user-visible api change with
>> no immediate workaround.
>
> I'd love to know who is actually using the swap_volume()
> functionality. I'd especially like to know who is using swap_volume()
> with multiattach.
>
>> However, I think we're attempting to do the wrong thing here anyway,
>> and the above tempest test is explicitly testing behaviour that we
>> don't want. The use case for swap volume is that a user needs to
>> move volume data for attached volumes, e.g. to new
>> faster/supported/maintained hardware.
>
> Is that the use case?
>
> As was typical, there's no mention of a use case on the original
> blueprint. It just says "This feature allows a user or administrator
> to transparently swap out a cinder volume that connected to an
> instance." Which is hardly a use case, since it uses the feature name
> in a description of the feature itself. :(
>
> The commit message (there was only a single commit for this
> functionality [1]) mentions overwriting data on the new volume:
>
>     Adds support for transparently swapping an attached volume with
>     another volume.
>     Note that this overwrites all data on the new volume
>     with data from the old volume.
>
> Yes, that is the commit message in its entirety. Of course, the commit
> had no documentation at all in it, so there's no ability to understand
> what the original use case really was here.
>
> [1] https://review.openstack.org/#/c/28995/
>
> If the use case was really "that a user needs to move volume data for
> attached volumes", why not just pause the VM, detach the volume, do an
> openstack volume migrate to the new destination, reattach the volume
> and start the VM? That would mean no libvirt/QEMU-specific
> implementation behaviour leaking out of the public HTTP API, and it
> would allow the volume service (Cinder) to do its job properly.
>
>> With single attach that's exactly what they get: the end user should
>> never notice. With multi-attach they don't get that. We're basically
>> forking the shared volume at a point in time, with the instance
>> which did the swap writing to the new location while all others
>> continue writing to the old location. Except that even the fork is
>> broken, because they'll get a corrupt, inconsistent copy rather than
>> a point-in-time one. I can't think of a use case for this behaviour,
>> and it certainly doesn't meet the original design intent.
>>
>> What they really want is for the multi-attached volume to be copied
>> from location A to location B and for all attachments to be updated.
>> Unfortunately I don't think we're going to be in a position to do
>> that any time soon, but I also think users will be unhappy if
>> they're no longer able to move data at all because it's
>> multi-attach. We can compromise, though, if we allow a multiattach
>> volume to be moved as long as it only has a single attachment. This
>> means the operator can't move this data without disruption to users,
>> but at least it's not fundamentally immovable.
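The divergence Matthew describes is easy to see with a toy model. This
is purely illustrative Python, not QEMU or nova code: a "mirror" that
only observes one attachment's writes, while a second attachment keeps
writing to the source behind its back.

```python
# Illustrative simulation: a drive mirror can only keep the target
# consistent for writes it can see. Writes from a second attachment
# bypass the mirror, so the copy silently diverges from the source.

source = [0] * 8          # the old volume, attached to both servers
target = [None] * 8       # the new volume being mirrored

def vm1_write(block, value):
    # The hypervisor doing the mirror sees VM1's writes and can
    # replay them onto the target, keeping it consistent.
    source[block] = value
    target[block] = value

def vm2_write(block, value):
    # The other attachment's writes go straight to the source; the
    # mirror never learns about them.
    source[block] = value

# Initial bulk copy of the volume contents.
for i, v in enumerate(source):
    target[i] = v

vm1_write(0, 1)   # tracked: block 0 stays consistent on the target
vm2_write(1, 2)   # untracked: block 1 diverges

print(source)  # [1, 2, 0, 0, 0, 0, 0, 0]
print(target)  # [1, 0, 0, 0, 0, 0, 0, 0]
```

With a single attachment every write goes down the tracked path and the
copy is exact; the second writer is what makes the operation unsafe.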
>>
>> This would require some cooperation with cinder to achieve, as we
>> need to be able to temporarily prevent cinder from allowing new
>> attachments. A natural way to achieve this would be to allow a
>> multi-attach volume with only a single attachment to be redesignated
>> not multiattach, but there might be others. The flow would then be:
>>
>> Detach volume from server 2
>> Set multiattach=False on volume
>> Migrate volume on server 1
>> Set multiattach=True on volume
>> Attach volume to server 2
>>
>> Combined with a patch to nova to disallow swap_volume on any
>> multiattach volume, this would then be possible, if inconvenient.
>>
>> Regardless of any other changes, though, I think it's urgent that we
>> disable the ability to swap_volume a multiattach volume, because we
>> don't want users to start using this relatively new, but broken,
>> feature.
>
> Or we could deprecate the swap_volume Compute API operation and use
> Cinder for all of this.
>
> But sure, we could also add more cruft to the Compute API and add more
> conditional "it works but only when X" docs to the API reference.
>
> Just my two cents,
> -jay
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

-- 
Artom Lifshitz
Software Engineer, OpenStack Compute DFG
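For the record, Matthew's five-step flow can be sketched as a small
orchestration. The functions below are stand-ins for the real
nova/cinder calls, and the redesignation step (toggling multiattach off
and back on) does not exist in Cinder today; this only shows the
intended ordering:

```python
# Hedged sketch of the proposed workaround flow. Every function here is
# a stand-in recording what a real client would do; the multiattach
# toggle is the piece that would need new Cinder support.

calls = []

def detach(server, volume):
    calls.append(f"detach {volume} from {server}")

def set_multiattach(volume, flag):
    calls.append(f"multiattach={flag} on {volume}")

def migrate(volume):
    calls.append(f"migrate {volume}")

def attach(server, volume):
    calls.append(f"attach {volume} to {server}")

def move_multiattach_volume(volume, other_server):
    # Shrink to a single attachment so the copy has one writer, lock
    # out new attachments, migrate, then restore multiattach.
    detach(other_server, volume)
    set_multiattach(volume, False)
    migrate(volume)
    set_multiattach(volume, True)
    attach(other_server, volume)

move_multiattach_volume("vol1", "server2")
```

The ordering is the point: the multiattach flag must be cleared before
the migration starts and restored only after it completes, so no new
attachment can appear while the data is in flight.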
