On Fri, Feb 12, 2016 at 04:21:27PM +0000, Murray, Paul (HP Cloud) wrote:
> This time with a tag in case anyone is filtering...

Yep, I was filtering, and would've missed it without your tag. :-)

> From: Murray, Paul (HP Cloud)
> Sent: 12 February 2016 16:16
> To: [email protected]
> Subject: [openstack-dev] Update on live migration priority
>
> The objective for the live migration priority is to improve the
> stability of migrations based on operator experience. The high-level
> approach is to do the following:
>
> 1. Improve CI
> 2. Improve documentation
> 3. Improve manageability of migrations
> 4. Fix bugs
>
> In this cycle we targeted a few immediately implementable features
> that would help, specifically giving operators commands to allow them
> to manage migrations (inspect progress, force completion, and cancel)
> and improving security (split networking and removing ssh-based
> resize/migration, aka storage pools).
>
> Most of these are on track to be completed in this cycle, with the
> exception of the storage pools work, which is being deferred. Further
> details follow.
>
> Expand CI coverage - in progress
>
> There is a job in the experimental queue called
> gate-tempest-dsvm-multinode-live-migration. This will become the job
> that performs live migration tests; any live migration tests in
> other jobs will be removed. At present the job has been configured to
> cover different storage configurations, including cinder, NFS and
> ceph. Tests are now being added to the job. Patches are currently up
> for live migration of instances with swap and instances with
> ephemeral disks.
>
> Please trigger the experimental queue if your patches touch
> migrations in some way so we can check the stability of the jobs.
> Once the job is stable and has sufficient tests, we will promote it
> from the experimental queue so that it always runs.
>
> See: https://review.openstack.org/#/q/topic:lm_test
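(Aside, for anyone who hasn't used it before: you trigger the
experimental queue by leaving a plain review comment on your patch in
Gerrit:

    check experimental

Zuul picks that up, runs the experimental jobs, including the live
migration job above, and reports the results back on the review.)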
> Improve API docs - done
>
> Some changes were made to the API guide for moving servers, including
> better descriptions of the server actions migrate, live migrate,
> shelve, resize and evacuate (
> http://developer.openstack.org/api-guide/compute/server_concepts.html#server-actions
> ) and a section that describes reasons for moving VMs, with common
> use cases outlined (
> http://developer.openstack.org/api-guide/compute/server_concepts.html#moving-servers
> ).
>
> Block live migration with attached volumes - done
>
> The selective block device migration API in libvirt 1.2.17 is used to
> allow block migration when volumes are attached. A follow-on patch to
> allow read-only drives to be copied in block migration has not been
> completed. This patch is required to allow iso9660-format config
> drives to be migrated; without it, only vfat config drives can be
> migrated. There is still some thought going into that - see:
> https://review.openstack.org/#/c/234659
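For the curious, the libvirt API being used there looks roughly like
this. An untested sketch on my part, not the actual Nova code; the
domain name, disk target ("vda") and destination URI are invented:

    import libvirt

    conn = libvirt.open("qemu:///system")
    dom = conn.lookupByName("instance-00000001")

    # Name only the local root disk; an attached cinder volume (say
    # vdb) is left out of the list, so it is not copied and is simply
    # reattached on the destination.
    params = {libvirt.VIR_MIGRATE_PARAM_MIGRATE_DISKS: ["vda"]}
    flags = (libvirt.VIR_MIGRATE_LIVE |
             libvirt.VIR_MIGRATE_PEER2PEER |
             libvirt.VIR_MIGRATE_NON_SHARED_INC)
    dom.migrateToURI3("qemu+tcp://dest-host/system", params, flags)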
> Force complete - requires python-novaclient change
>
> Force-complete forces a live migration to complete by pausing the VM
> and resuming it once the migration has completed. This is intended as
> a brute-force way to make a VM complete its migration when it is
> taking too long. In the future, auto-converge and post-copy will be
> looked at; these became available in qemu 2.5.
>
> Force complete is done in nova but still requires a change to
> python-novaclient to implement the CLI.
>
> Cancel - in progress
>
> Cancel stops a live migration, leaving the VM on the source host with
> the migration status set to "cancelled". This is in progress and
> follows the pattern of force-complete. Unfortunately it needs to be
> bundled up into one patch to avoid multiple API bumps.
>
> Patches for review:
> https://review.openstack.org/#/q/status:open+topic:bp/abort-live-migration
>
> Progress reporting - in progress (no pun intended)
>
> Progress reporting introduces migrations as a sub-resource of servers
> and adds progress data to the migration record. There was some debate
> at the mid-cycle and on the mailing list about how to record this
> transient data. It is a waste to keep writing it to the database, but
> as it is generated at the compute manager and examined at the API, it
> was felt that writing it to the database is necessary to fit the
> existing architecture. The conclusion was that writing to the
> database every 5 seconds would not cause significant overhead;
> alternatives could be pursued later if necessary. For discussion see
> this ML thread:
> http://lists.openstack.org/pipermail/openstack-dev/2016-February/085662.html
> and the IRC meeting transcript here:
> http://eavesdrop.openstack.org/meetings/nova_live_migration/2016/nova_live_migration.2016-02-09-14.01.log.html
>
> Patches for review:
> https://review.openstack.org/#/q/status:open+topic:bp/live-migration-progress-report
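To make the above concrete, here is my reading of what the new
sub-resource API will look like, based on the specs. Untested, and the
exact paths, field names and microversion may well change before the
patches merge:

    import requests

    NOVA = "http://controller:8774/v2.1"   # assumed endpoint
    HEADERS = {
        "X-Auth-Token": "<token>",
        # the new calls are gated behind a new API microversion
        "X-OpenStack-Nova-API-Version": "latest",
    }
    server = "<server-uuid>"

    # Progress: read the data the compute manager writes to the
    # migration record every ~5 seconds.
    r = requests.get("%s/servers/%s/migrations" % (NOVA, server),
                     headers=HEADERS)
    migrations = r.json()["migrations"]
    for m in migrations:
        print(m["id"], m["status"], m.get("memory_processed_bytes"))

    mig = migrations[0]["id"]

    # Force complete: pause the VM so the migration can converge.
    requests.post(
        "%s/servers/%s/migrations/%s/action" % (NOVA, server, mig),
        headers=HEADERS, json={"force_complete": None})

    # Cancel: abort the migration, leaving the VM on the source host.
    requests.delete(
        "%s/servers/%s/migrations/%s" % (NOVA, server, mig),
        headers=HEADERS)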
> Split networking - done
>
> Split networking adds a configuration parameter,
> live_migration_inbound_addr, specifying the IP address or host name
> to be used as the target for migration traffic. This allows migration
> traffic to be isolated on a separate network from other management
> traffic, providing an opportunity to isolate service levels for the
> two networks and improve security by moving unencrypted migration
> traffic to an isolated network.
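In nova.conf terms that is something like the following (the section
name is my assumption from the patch; double-check before relying on
it):

    [libvirt]
    # Address of *this* host on the dedicated migration network;
    # other compute nodes will use it as the migration target.
    live_migration_inbound_addr = 10.1.2.3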
> Resize/cold migrate using storage pools - deferred
>
> The objective here was to change the libvirt implementation of
> migrate and resize to use libvirt storage pools instead of scp/rsync
> over ssh with passwordless keys. Storage pools are supported in all
> versions of libvirt supported by nova, so it was thought that by
> changing the implementation it would be possible to drop the
> ssh-based code. However, two flaws in this approach arose: the
> recently added ploop storage device does not work with storage pools
> in libvirt, and the libvirt data copy implementation is very
> inefficient and so slower than scp or rsync.
>
> The guys at Parallels kindly agreed to implement storage pools
> support for ploop in libvirt, and this work is already making
> progress. Work was also started in libvirt to improve the copy
> performance. These features will be available in a future release, so
> we will need to maintain the old ssh-based migration for libvirt as
> well as refactor and implement the storage-pools-based alternative.
>
> Work has started on refactoring the libvirt driver code, but the
> following blueprints will be deferred beyond mitaka:
> http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/use-libvirt-storage-pools.html
> http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/migrate-libvirt-volumes.html
>
> Deprecate migration flags - done
>
> There are a lot of migration flags used with libvirt that are either
> redundant or can be inferred from the deployed configuration. These
> are being deprecated and will be removed in the next cycle.
>
> See:
> https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:deprecate-migration-flags-config

This is a nice cleanup, as now I can stop triaging countless bugs or
comments on IRC about what flags one ought to set.

Thanks for the overall summary/update!

PS: If it's possible to make your email client wrap long lines, please
do so; it's a little hard to read. </me-stops-being-a-pest>

--
/kashyap