I've submitted one more patch with potential fix: https://review.openstack.org/#/c/326258/
Timofey On Mon, Jun 6, 2016 at 11:58 PM, Timofei Durakov <[email protected]> wrote: > On Mon, Jun 6, 2016 at 11:26 PM, Matt Riedemann < > [email protected]> wrote: > >> On 6/6/2016 12:15 PM, Matt Riedemann wrote: >> >>> On 1/8/2016 12:28 PM, Mark McLoughlin wrote: >>> >>>> On Fri, 2016-01-08 at 14:11 +0000, Daniel P. Berrange wrote: >>>> >>>>> On Thu, Jan 07, 2016 at 09:07:00PM +0000, Mark McLoughlin wrote: >>>>> >>>>>> On Thu, 2016-01-07 at 12:23 +0100, Sahid Orentino Ferdjaoui >>>>>> wrote: >>>>>> >>>>>>> On Mon, Jan 04, 2016 at 09:12:06PM +0000, Mark McLoughlin >>>>>>> wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> commit 8ecf93e[1] got me thinking - the live_migration_flag >>>>>>>> config option unnecessarily allows operators choose arbitrary >>>>>>>> behavior of the migrateToURI() libvirt call, to the extent >>>>>>>> that we allow the operator to configure a behavior that can >>>>>>>> result in data loss[1]. >>>>>>>> >>>>>>>> I see that danpb recently said something similar: >>>>>>>> >>>>>>>> https://review.openstack.org/171098 >>>>>>>> >>>>>>>> "Honestly, I wish we'd just kill off 'live_migration_flag' >>>>>>>> and 'block_migration_flag' as config options. We really >>>>>>>> should not be exposing low level libvirt API flags as admin >>>>>>>> tunable settings. >>>>>>>> >>>>>>>> Nova should really be in charge of picking the correct set of >>>>>>>> flags for the current libvirt version, and the operation it >>>>>>>> needs to perform. We might need to add other more sensible >>>>>>>> config options in their place [..]" >>>>>>>> >>>>>>> >>>>>>> Nova should really handle internal flags and this serie is >>>>>>> running in the right way. >>>>>>> >>>>>>> ... >>>>>>>> >>>>>>> >>>>>>> 4) Add a new config option for tunneled versus native: >>>>>>>> >>>>>>>> [libvirt] live_migration_tunneled = true >>>>>>>> >>>>>>>> This enables the use of the VIR_MIGRATE_TUNNELLED flag. We >>>>>>>> have historically defaulted to tunneled mode because it >>>>>>>> requires the least configuration and is currently the only >>>>>>>> way to have a secure migration channel. >>>>>>>> >>>>>>>> danpb's quote above continues with: >>>>>>>> >>>>>>>> "perhaps a "live_migration_secure_channel" to indicate that >>>>>>>> migration must use encryption, which would imply use of >>>>>>>> TUNNELLED flag" >>>>>>>> >>>>>>>> So we need to discuss whether the config option should >>>>>>>> express the choice of tunneled vs native, or whether it >>>>>>>> should express another choice which implies tunneled vs >>>>>>>> native. >>>>>>>> >>>>>>>> https://review.openstack.org/263434 >>>>>>>> >>>>>>> >>>>>>> We probably have to consider that operator does not know much >>>>>>> about internal libvirt flags, so options we are exposing for >>>>>>> him should reflect benefice of using them. I commented on your >>>>>>> review we should at least explain benefice of using this option >>>>>>> whatever the name is. >>>>>>> >>>>>> >>>>>> As predicted, plenty of discussion on this point in the review >>>>>> :) >>>>>> >>>>>> You're right that we don't give the operator any guidance in the >>>>>> help message about how to choose true or false for this: >>>>>> >>>>>> Whether to use tunneled migration, where migration data is >>>>>> transported over the libvirtd connection. If True, we use the >>>>>> VIR_MIGRATE_TUNNELLED migration flag >>>>>> >>>>>> libvirt's own docs on this are here: >>>>>> >>>>>> https://libvirt.org/migration.html#transport >>>>>> >>>>>> which emphasizes: >>>>>> >>>>>> - the data copies involved in tunneling - the extra configuration >>>>>> steps required for native - the encryption support you get when >>>>>> tunneling >>>>>> >>>>>> The discussions I've seen on this topic wrt Nova have revolved >>>>>> around: >>>>>> >>>>>> - that tunneling allows for an encrypted transport[1] - that >>>>>> qemu's NBD based drive-mirror block migration isn't supported >>>>>> using tunneled mode, and that danpb is working on fixing this >>>>>> limitation in libvirt - "selective" block migration[2] won't work >>>>>> with the fallback qemu block migration support, and so won't >>>>>> currently work in tunneled mode >>>>>> >>>>> >>>>> I'm not working on fixing it, but IIRC some other dev had proposed >>>>> patches. >>>>> >>>>> >>>>>> So, the advise to operators would be: >>>>>> >>>>>> - You may want to choose tunneled=False for improved block >>>>>> migration capabilities, but this limitation will go away in >>>>>> future. - You may want to choose tunneled=False if you wish to >>>>>> trade and encrypted transport for a (potentially negligible) >>>>>> performance improvement. >>>>>> >>>>>> Does that make sense? >>>>>> >>>>>> As for how to name the option, and as I said in the review, I >>>>>> think it makes sense to be straightforward here and make it >>>>>> clearly about choosing to disable libvirt's tunneled transport. >>>>>> >>>>>> If we name it any other way, I think our explanation for >>>>>> operators will immediately jump to explaining (a) that it >>>>>> influences the TUNNELLED flag, and (b) the differences between >>>>>> the tunneled and native transports. So, if we're going to have to >>>>>> talk about tunneled versus native, why obscure that detail? >>>>>> >>>>> >>>>> Ultimately we need to recognise that libvirt's tunnelled mode was >>>>> added as a hack, to work around fact that QEMU lacked any kind of >>>>> native security capabilities & didn't appear likely to ever get >>>>> them at that time. As well as not working with modern NBD based >>>>> block device encryption, it really sucks for performance because it >>>>> introduces many extra data copies. So it is going to be quite poor >>>>> for large VMs with heavy rate of data dirtying. >>>>> >>>>> The only long term relative "benefit" of tunnelled mode is that it >>>>> avoids the need to open extra firewall ports. >>>>> >>>>> IMHO, the long term future is to *never* use tunnelled mode for >>>>> QEMU. This will be viable when my support for native TLS support in >>>>> QEMU migration + NBD protocols is merged. I'm hopeful this wil be >>>>> for QEMU 2.6 >>>>> >>>>> But, Pawel strongly disagrees. >>>>>> >>>>>> One last point I'd make is this isn't about adding a *new* >>>>>> configuration capability for operators. As we deprecate and >>>>>> remove these configuration options, we need to be careful not to >>>>>> remove a capability that operators are currently depending on for >>>>>> arguably reasonable reasons. >>>>>> >>>>> >>>>> My view is that "live_migration_tunneled" is a reasonable >>>>> parameter to add, because there is a genuine need to let admins >>>>> choose this behaviour. We should make sure it is correctly done as >>>>> a tri-state flag though, when it is 'None', Nova should pick what >>>>> it things is the best approach based on QEMU version. Probably to >>>>> use QEMU native when it supports TLS, otherwise use tunnelled if >>>>> possible to get security. >>>>> >>>> >>>> Great feedback. I buy that. >>>> >>>> [1] - https://review.openstack.org/#/c/171098/ [2] - >>>>>> https://review.openstack.org/#/c/227278 >>>>>> >>>>>> >>>>>> 5) Add a new config option for additional migration flags: >>>>>>>> >>>>>>>> [libvirt] live_migration_extra_flags = >>>>>>>> VIR_MIGRATE_COMPRESSED >>>>>>>> >>>>>>>> This allows operators to continue to experiment with libvirt >>>>>>>> behaviors in safe ways without each use case having to be >>>>>>>> accounted for. >>>>>>>> >>>>>>>> https://review.openstack.org/263435 >>>>>>>> >>>>>>>> We would disallow setting the following flags via this >>>>>>>> option: >>>>>>>> >>>>>>>> VIR_MIGRATE_LIVE VIR_MIGRATE_PEER2PEER VIR_MIGRATE_TUNNELLED >>>>>>>> VIR_MIGRATE_PERSIST_DEST VIR_MIGRATE_UNDEFINE_SOURCE >>>>>>>> VIR_MIGRATE_NON_SHARED_INC VIR_MIGRATE_NON_SHARED_DISK >>>>>>>> >>>>>>>> which would allow the following currently available flags to >>>>>>>> be set: >>>>>>>> >>>>>>> >>>>>>> VIR_MIGRATE_PAUSED VIR_MIGRATE_CHANGE_PROTECTION >>>>>>>> VIR_MIGRATE_UNSAFE VIR_MIGRATE_OFFLINE >>>>>>>> VIR_MIGRATE_COMPRESSED VIR_MIGRATE_ABORT_ON_ERROR >>>>>>>> VIR_MIGRATE_AUTO_CONVERGE VIR_MIGRATE_RDMA_PIN_ALL >>>>>>>> >>>>>>> >>>>>>> We can probably consider to provide VIR_MIGRATE_PAUSED and >>>>>>> VIR_MIGRATE_COMPRESSED as dedicated options too ? >>>>>>> >>>>>> >>>>>> Yes, I think any options we see regularly added to extra_flags >>>>>> by operators, and as we understand the use cases better, then we >>>>>> can add dedicated options for them. >>>>>> >>>>> >>>>> I really don't see a case for letting the admin set >>>>> VIR_MIGRATE_PAUSED at a host level. If we want the ability to force >>>>> a running VM to end up paused after migration, this is a feature to >>>>> be added to the Nova migration API. >>>>> >>>>> The VIR_MIGRATE_COMPRESSED is not as simple as just enabling a >>>>> flag, there are other associated runtime tunables that need >>>>> setting. There was a spec discussing this which was not approved as >>>>> a suitable strategy for using it could not be agreed. >>>>> >>>>> In the review, Pawel is making a case for not allowing the >>>>>> operator to enable COMPRESSED or AUTO_CONVERGE. >>>>>> >>>>> >>>>> I agree really. As per my comments, I in fact struggle to see a >>>>> credible case for allowing any of the remaining flags to be >>>>> enabled. They are all cases that Nova should be made todo the right >>>>> thing, possibly in relation to API parameters or other deployment >>>>> choices. >>>>> >>>> >>>> Fair enough. I figured it would be a necessary safety valve against >>>> operators who value the flexibility of the current configuration >>>> options, but you make a good case. >>>> >>>> I'll drop the extra_flags option and make tunneled a tri-state. >>>> >>>> Thanks, Mark. >>>> >>>> >>>> __________________________________________________________________________ >>>> >>>> >>>> >>>> OpenStack Development Mailing List (not for usage questions) >>> >>>> Unsubscribe: >>>> [email protected]?subject:unsubscribe >>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >>>> >>>> >>> I'm going to be a jackass and skip this entire thread with a question. >>> >>> The live migration CI job we have in the experimental queue is blocked >>> on this failure right now [1]. >>> >>> "Live Migration failure: Operation not supported: Selecting disks to >>> migrate is not implemented for tunnelled migration" >>> >>> That's running with ubuntu 16.04 and libvirt 1.2.17. >>> >>> I was looking at the deprecated live_migration_flag which tells you to >>> use the live_migration_tunnelled flag, which defaults to None and is a >>> no-op, even though the help text on the option says if it's omitted >>> we'll do our best to figure out if you want to use tunneling based on >>> the hypervisor's encryption support. That doesn't actually happen by >>> default [2]. >>> >>> So what should happen? It seems some of this series was stalled or >>> incomplete? >>> >>> We could change the configuration of our live migration CI job, but I'd >>> like to see the job work with the defaults that we ship. >>> >>> I'm not entirely familiar with the thing that's causing the failure >>> except I believe we're calling migrateToURI3 with the tunnelled flag >>> (because it's in the default live_migration_flag option) and a list of >>> devices [3][4]. >>> >>> Would it be better to use migrateToURI2 if migrate_disks is in params >>> but the VIR_MIGRATE_TUNNELLED flag is set? >>> >>> [1] https://bugs.launchpad.net/nova/+bug/1589591 >>> [2] >>> >>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L559 >>> >>> [3] >>> >>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5740-5751 >>> >>> [4] >>> >>> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/guest.py#L523 >>> >>> >>> >> Timofey is working on a fix here: >> >> https://review.openstack.org/#/c/326111/ >> >> I've left some comments inline, but I'm mostly wondering about this now: >> >> >> https://github.com/openstack/nova/blob/f5c9ebd56075f8eb04f9f0e683f85bacdcd68c38/nova/virt/libvirt/driver.py#L5360 >> >> The code there says that if there are BDMs and libvirt<1.2.17 and you're >> doing block migration, it's unsupported and fails. The comment also says >> that it can't do that in tunneled method either - which is the bug we're >> hitting. >> >> But should we add the block_migration_flag check in >> check_can_live_migrate_source? Or does it work to live migrate in tunneled >> mode as long as we don't pass 'migrate_disks'? > > So we got 2 options: > > - raise exception in case of tunneled block migration with > migrate_disks param > - as migrate flags are supposed to be nova internals this is bad > option > - get rid of one of these option: > - don't use VIR_MIGRATE_TUNNELLED - looks most preferable for now > - don't pass migrate disks param (it will force block live > migration to fail for instances with attached volumes, and volume-backed > instances with config-drive(vfat)) > > > > >> -- >> >> Thanks, >> >> Matt Riedemann >> >> >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: >> [email protected]?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > >
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: [email protected]?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
