TL;DR: I think we need to entirely disable swap volume for multiattach volumes, and this will be an API-breaking change with no immediate workaround.
I was looking through tempest and came across api.compute.admin.test_volume_swap.TestMultiAttachVolumeSwap.test_volume_swap_with_multiattach. This test does:

  Create 2 multiattach volumes
  Create 2 servers
  Attach volume 1 to both servers
  ** Swap volume 1 for volume 2 on server 1 **
  Check all is attached as expected

The problem with this is that swap volume is a copy operation. We don't just replace one volume with another, we copy the contents from one to the other and then do the swap. We do this with a qemu drive mirror operation, which is able to do the copy safely without needing to make the source read-only, because it also tracks writes to the source and ensures the target is updated accordingly. Here's a link to the libvirt logs showing a drive mirror operation during the swap volume of an execution of the above test:

http://logs.openstack.org/58/567258/5/check/nova-multiattach/d23fad8/logs/libvirt/libvirtd.txt.gz#_2018-06-04_10_57_05_201

The problem is that when the volume is attached to more than one VM, the hypervisor doing the drive mirror *doesn't* know about writes from the other attached VMs, so it can't do that copy safely, and the result is data corruption. Note that swap volume isn't visible to the guest OS, so this can't be addressed by the user.

This is a data corrupter, and we shouldn't allow it. However, it is in released code and users might be doing it already, so disabling it would be a user-visible API change with no immediate workaround.

However, I think we're attempting to do the wrong thing here anyway, and the above tempest test is explicitly testing behaviour that we don't want. The use case for swap volume is that a user needs to move volume data for attached volumes, e.g. to new faster/supported/maintained hardware. With single attach that's exactly what they get: the end user should never notice. With multi-attach they don't get that. We're basically forking the shared volume at a point in time, with the instance which did the swap writing to the new location while all the others continue writing to the old location. Except that even the fork is broken, because they'll get a corrupt, inconsistent copy rather than a point-in-time one. I can't think of a use case for this behaviour, and it certainly doesn't meet the original design intent.

What they really want is for the multi-attached volume to be copied from location a to location b and for all attachments to be updated. Unfortunately I don't think we're going to be in a position to do that any time soon, but I also think users will be unhappy if they're no longer able to move data at all just because it's multi-attach. We can compromise, though, if we allow a multiattach volume to be moved as long as it only has a single attachment. This means the operator can't move this data without disruption to users, but at least it's not fundamentally immovable. This would require some cooperation with cinder to achieve, as we need to be able to temporarily prevent cinder from allowing new attachments. A natural way to achieve this would be to allow a multi-attach volume with only a single attachment to be redesignated as not multiattach, but there might be others. The flow would then be:

  Detach volume from server 2
  Set multiattach=False on volume
  Migrate volume on server 1
  Set multiattach=True on volume
  Attach volume to server 2

Combined with a patch to nova to disallow swap_volume on any multiattach volume, this would then be possible, if inconvenient. Rough sketches of both the flow and the nova-side guard follow below.
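To make the proposed flow concrete, here's a minimal Python sketch. To be clear, every helper here is hypothetical: set_multiattach() is precisely the new cooperation from cinder described above (it doesn't exist today), and the detach/attach/migrate helpers just stand in for the existing nova and cinder operations.

# Hypothetical sketch of the proposed "move a multiattach volume" flow.
# None of these helpers exist as-is: set_multiattach() would need new
# cinder support, and the others are placeholders for the existing
# detach/attach/migrate operations.

def detach_volume(server_id: str, volume_id: str) -> None:
    """Placeholder for the existing nova volume detach."""
    raise NotImplementedError("use the real nova API here")

def attach_volume(server_id: str, volume_id: str) -> None:
    """Placeholder for the existing nova volume attach."""
    raise NotImplementedError("use the real nova API here")

def set_multiattach(volume_id: str, enabled: bool) -> None:
    """Hypothetical cinder call: only valid while the volume has <= 1 attachment."""
    raise NotImplementedError("does not exist in cinder today")

def migrate_volume(volume_id: str, dest_host: str) -> None:
    """Placeholder for the existing cinder volume migration of an in-use volume."""
    raise NotImplementedError("use the real cinder API here")

def move_multiattach_volume(volume_id, server_1, server_2, dest_host):
    """The flow above: disruptive for server_2, but the data is movable."""
    detach_volume(server_2, volume_id)   # volume now has a single attachment
    set_multiattach(volume_id, False)    # cinder refuses new attachments
    migrate_volume(volume_id, dest_host) # safe: only server_1 is writing
    set_multiattach(volume_id, True)     # restore multiattach
    attach_volume(server_2, volume_id)   # reattach to server_2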
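And a rough sketch of the nova-side guard I mean. This is illustrative only, not an actual patch: the exception name is made up, and the real check would live wherever nova validates the swap request before starting the drive mirror.

# Illustrative only: roughly what "disallow swap_volume on any multiattach
# volume" looks like. The exception type and exact location are made up.

class MultiattachSwapVolumeNotSupported(Exception):
    """Hypothetical error, surfaced to the user as a 400."""

def swap_volume(context, instance, old_volume, new_volume):
    # old_volume/new_volume are the cinder volume dicts nova already has.
    if old_volume.get('multiattach') or new_volume.get('multiattach'):
        # Refuse up front: a drive mirror running against one attachment
        # cannot see writes coming from the other attachments, so the
        # resulting copy would be corrupt.
        raise MultiattachSwapVolumeNotSupported()
    # ... existing swap volume path continues here ...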
Regardless of any other changes, though, I think it's urgent that we disable the ability to swap_volume a multiattach volume, because we don't want users to start using this relatively new, but broken, feature.

Matt
--
Matthew Booth
Red Hat OpenStack Engineer, Compute DFG

Phone: +442070094448 (UK)
