Hi,

I have been exercising the NUMA-related features in Kilo (CPU pinning, NUMA topology, huge pages) and have seen issues whenever an operation moves an instance between compute nodes. In summary, the numa_topology is not recalculated for the destination node, so the instance runs with the wrong topology (or fails to run at all if the topology isn't supported on the destination). This impacts live migration, cold migration, resize and evacuate.
I have spent some time over the last couple of weeks on this and have a working fix for these issues that I would like to push upstream. The fix for cold migration and resize is the most straightforward, so I plan to start there.

At a high level, here is what I have done to fix cold migrate and resize:

- Add source_numa_topology and dest_numa_topology to the migration object and migrations table.
- When a resize_claim is done, store the claimed NUMA topology as dest_numa_topology in the migration record. Also store the current NUMA topology as source_numa_topology in the migration record.
- Use source_numa_topology and dest_numa_topology from the migration record in the resource accounting when referencing migration claims, as appropriate. This is done for claims, dropped claims and the resource audit.
- After the cold migration/resize finishes, set the numa_topology in the instance to the dest_numa_topology from the migration object. This is done in the finish_resize RPC on the destination compute, to match where the rest of the instance's resources are updated (there is a call to _set_instance_info here that sets the memory, vcpus, disk space, etc. for the migrated instance).
- If the cold migration/resize is reverted, set the numa_topology in the instance to the source_numa_topology from the migration object. This is done in the finish_revert_resize RPC on the source compute.

I would appreciate any comments on my approach. I plan to start submitting the code for this against bug 1417667, split into several chunks to make it easier to review.

Fixing live migration was significantly more effort. I'll start a different thread on that once I have feedback on the above approach.
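To make the cold migrate/resize flow above a bit more concrete, here is a rough, self-contained sketch. The Migration and Instance classes are simplified stand-ins, not the real nova objects; the source_compute/dest_compute fields, the function signatures and the topology_for_host helper are assumptions made only for illustration, and the topologies are plain dicts rather than real topology objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    # Simplified stand-in for an instance; the topology is just a dict here.
    host: str
    numa_topology: Optional[dict] = None


@dataclass
class Migration:
    # Simplified stand-in for the migration record with the two new fields.
    source_compute: str
    dest_compute: str
    source_numa_topology: Optional[dict] = None
    dest_numa_topology: Optional[dict] = None


def resize_claim(instance, dest_host, claimed_topology):
    """Record both topologies in the migration when the claim is made."""
    return Migration(
        source_compute=instance.host,
        dest_compute=dest_host,
        source_numa_topology=instance.numa_topology,
        dest_numa_topology=claimed_topology,
    )


def topology_for_host(migration, host):
    """Hypothetical helper: pick the topology to account for on a host."""
    if host == migration.dest_compute:
        return migration.dest_numa_topology
    if host == migration.source_compute:
        return migration.source_numa_topology
    return None


def finish_resize(instance, migration):
    """Destination side: the instance takes on the claimed topology."""
    instance.host = migration.dest_compute
    instance.numa_topology = migration.dest_numa_topology


def finish_revert_resize(instance, migration):
    """Source side: the instance goes back to its original topology."""
    instance.host = migration.source_compute
    instance.numa_topology = migration.source_numa_topology
```

The point of keeping both topologies in the migration record is that either outcome (confirm or revert) can be applied without recomputing anything, and the resource tracker on each host can account for the right topology while the migration is in progress.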
Thanks,
Bart Wensley, Member of Technical Staff, Wind River

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev