The objective for the live migration priority is to improve the stability of 
migrations based on operator experience. The high-level approach is to do the 
following:

1. Improve CI

2. Improve documentation

3. Improve manageability of migrations

4. Fix bugs

In this cycle we targeted a few immediately implementable features that would 
help: giving operators commands to manage migrations (inspect progress, force 
completion, and cancel) and improving security (split networks, and removing 
ssh-based resize/migration, aka storage pools).

Most of these are on track to be completed in this cycle, with the exception of 
the storage pools work, which is being deferred. Further details follow.

Expand CI coverage - in progress

There is a job in the experimental queue called: 
gate-tempest-dsvm-multinode-live-migrationqueued. This will become the job that 
performs live migration tests; any live migration tests in other jobs will be 
removed. At present the job has been configured to cover different storage 
configurations including Cinder, NFS and Ceph. Tests are now being added to the 
job. Patches are currently up for live migration of instances with swap and 
instances with ephemeral disks.

Please trigger the experimental queue if your patches touch migrations in some 
way so we can check the stability of the jobs. Once stable and with sufficient 
tests we will promote the job from the experimental queue so that it always 
runs.

See: https://review.openstack.org/#/q/topic:lm_test

Improve API docs - done

Some changes were made to the API guide for moving servers, including better 
descriptions for the server actions migrate, live migrate, shelve, resize and 
evacuate:
http://developer.openstack.org/api-guide/compute/server_concepts.html#server-actions

There is also a section that describes reasons for moving VMs, with common use 
cases outlined:
http://developer.openstack.org/api-guide/compute/server_concepts.html#moving-servers

Block live migration with attached volumes - done

The selective block device migration API in libvirt 1.2.17 is used to allow 
block migration when volumes are attached. A follow-on patch to allow read-only 
drives to be copied in block migration has not been completed. This patch is 
required to allow iso9660 format config drives to be migrated; without it only 
vfat config drives can be migrated. There is still some thought going into that 
- see: https://review.openstack.org/#/c/234659
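
To illustrate the selection step, here is a minimal Python sketch of how the 
list of disks to copy might be chosen. The Disk type and the 
disks_to_block_migrate function are hypothetical names made up for this 
example, not nova code. It assumes that attached volumes are reachable from the 
destination and so need no copy, and that read-only disks are left out until 
the follow-on patch lands.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Disk:
    """Hypothetical description of one guest disk (not a nova object)."""
    target_dev: str   # device name seen by the guest, e.g. "vda"
    is_volume: bool   # True for an attached volume
    read_only: bool   # True for e.g. an ISO format config drive


def disks_to_block_migrate(disks: List[Disk]) -> List[str]:
    """Return the device names to copy during block migration.

    Assumptions of this sketch: attached volumes are skipped because the
    destination can connect to them directly, and read-only disks are
    skipped because copying them is not supported yet.
    """
    return [d.target_dev for d in disks
            if not d.is_volume and not d.read_only]


if __name__ == "__main__":
    guest = [
        Disk("vda", is_volume=False, read_only=False),  # local root disk
        Disk("vdb", is_volume=True, read_only=False),   # attached volume
        Disk("hdd", is_volume=False, read_only=True),   # ISO config drive
    ]
    print(disks_to_block_migrate(guest))
```

A list like this is the kind of input the selective migration API takes; how 
nova actually builds it may well differ.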

Force complete - requires python-novaclient change

Force-complete forces a live migration to complete by pausing the VM and 
restarting it when it has completed migration. This is intended as a brute 
force way to make a VM complete its migration when it is taking too long. In 
the future auto-converge and post-copy will be looked at. These became 
available in qemu 2.5.

Force complete is done in nova but still requires a change to python-novaclient 
to implement the CLI.
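
As a rough sketch of the mechanism only: pausing the guest stops it from 
dirtying memory, so the remaining pages can be transferred. The force_complete 
helper and the FakeDomain class below are hypothetical and exist just so the 
example runs without libvirt; suspend() mirrors the libvirt-python virDomain 
call that pauses a guest. This is not the nova implementation.

```python
class FakeDomain:
    """Stand-in for a libvirt domain so the sketch runs without libvirt."""

    def __init__(self) -> None:
        self.paused = False

    def suspend(self) -> None:
        # The real virDomain.suspend() pauses the guest's vCPUs.
        self.paused = True


def force_complete(domain) -> None:
    """Pause the guest so an in-flight live migration can finish.

    `domain` is any object with a suspend() method, for example a
    libvirt.virDomain. Resuming the guest on the destination is out of
    scope for this sketch.
    """
    domain.suspend()


if __name__ == "__main__":
    dom = FakeDomain()
    force_complete(dom)
    print("paused:", dom.paused)
```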

Cancel - in progress

Cancel stops a live migration, leaving the VM running on the source host with 
the migration status set to "cancelled". This is in progress and follows the 
pattern of force-complete. Unfortunately this needs to be bundled up into one 
patch to avoid multiple API bumps.

Patches for review:
https://review.openstack.org/#/q/status:open+topic:bp/abort-live-migration

Progress reporting - in progress (no pun intended)

Progress reporting introduces migrations as a sub-resource of servers and adds 
progress data to the migration record. There was some debate at the mid-cycle 
and on the mailing list about how to record this transient data. It is a waste 
to keep writing it to the database, but because it is generated at the compute 
manager and examined at the API, it was felt that writing it to the database is 
necessary to fit the existing architecture. The conclusion was that writing to 
the database every 5 seconds would not cause a significant overhead. 
Alternatives could be pursued later if necessary. For discussion see this ML 
thread: 
http://lists.openstack.org/pipermail/openstack-dev/2016-February/085662.html 
and the IRC meeting transcript here: 
http://eavesdrop.openstack.org/meetings/nova_live_migration/2016/nova_live_migration.2016-02-09-14.01.log.html
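
The 5 second write interval can be pictured with a small Python sketch. The 
ThrottledProgressWriter class, its save callback and the memory_remaining 
field are all hypothetical names for illustration; the patches under review 
may structure this quite differently.

```python
import time
from typing import Callable, Dict, Optional


class ThrottledProgressWriter:
    """Persist migration progress at most once per `interval` seconds.

    Hypothetical helper: `save` stands in for whatever writes the
    migration record to the database.
    """

    def __init__(self,
                 save: Callable[[Dict[str, int]], None],
                 interval: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._save = save
        self._interval = interval
        self._clock = clock
        self._last_write: Optional[float] = None

    def update(self, progress: Dict[str, int]) -> bool:
        """Offer a new sample; return True if it was written out."""
        now = self._clock()
        too_soon = (self._last_write is not None
                    and now - self._last_write < self._interval)
        if too_soon:
            return False  # drop the sample, a fresher one will follow
        self._save(dict(progress))
        self._last_write = now
        return True


if __name__ == "__main__":
    writes = []
    writer = ThrottledProgressWriter(writes.append, interval=5.0)
    writer.update({"memory_remaining": 1024})
    writer.update({"memory_remaining": 900})  # too soon, not written
    print(len(writes), "write(s) recorded")
```

Samples that arrive inside the interval are simply dropped, which is 
acceptable here because progress data is transient and soon superseded.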

Patches for review:
https://review.openstack.org/#/q/status:open+topic:bp/live-migration-progress-report

Split networking - done

Split networking adds a configuration parameter, live_migration_inbound_addr, 
to specify the IP address or host name to be used as the target for migration 
traffic. This allows migration traffic to be isolated on a network separate 
from other management traffic, providing an opportunity to isolate service 
levels for the two networks and improve security by moving unencrypted 
migration traffic to an isolated network.
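
A minimal Python sketch of the idea follows. The migration_uri function is a 
hypothetical helper, and both the fallback to the destination host name when 
the option is unset and the URI template are assumptions of this example 
rather than a description of the nova code.

```python
from typing import Optional


def migration_uri(dest_hostname: str,
                  inbound_addr: Optional[str] = None,
                  template: str = "qemu+tcp://%s/system") -> str:
    """Build the URI that migration traffic is sent to.

    If the destination advertises an inbound address (the value of
    live_migration_inbound_addr on that host), use it so the traffic
    goes over the dedicated network. Otherwise fall back to the
    destination host name (an assumption of this sketch).
    """
    target = inbound_addr if inbound_addr else dest_hostname
    return template % target


if __name__ == "__main__":
    # Illustrative host name and address only.
    print(migration_uri("compute2"))
    print(migration_uri("compute2", inbound_addr="10.1.0.12"))
```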

Resize/cold migrate using storage pools - deferred

The objective here was to change the libvirt implementation of migrate and 
resize to use libvirt storage pools instead of scp/rsync over ssh with 
passwordless keys. Storage pools are supported in all versions of libvirt 
supported by nova, so it was thought that by changing the implementation it 
would be possible to drop the ssh-based code. However, two flaws in this 
approach arose: the recently added ploop storage device does not work with 
storage pools in libvirt, and the libvirt data copy implementation is very 
inefficient and so slower than scp or rsync.

The guys at Parallels kindly agreed to implement storage pools support for 
ploop in libvirt and this work is already making progress. Work was also 
started in libvirt to improve the copy performance. These features will only be 
available in a future release, so we will need to maintain the old ssh-based 
migration for libvirt as well as refactor and implement the storage pools based 
alternative.

Work has started on refactoring the libvirt driver code, but the following 
blueprints will be deferred beyond Mitaka:
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/use-libvirt-storage-pools.html
http://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/migrate-libvirt-volumes.html

Deprecate migration flags - done

There are a lot of migration flags used with libvirt that are either redundant 
or can be inferred from the deployed configuration. These are being deprecated 
and will be removed in the next cycle.

See:
https://review.openstack.org/#/q/project:openstack/nova+branch:master+topic:deprecate-migration-flags-config


Feel free to respond with corrections or additions.

Regards,
Paul

Paul Murray
Technical Lead, HPE Cloud
Hewlett Packard Enterprise
+44 117 316 2527


