I believe TWC (medberry on IRC) was lamenting to me about cpusets, differing hypervisor hardware configs, and unassigned vCPUs in NUMA nodes.
The problem is that the migration does not re-define the domain.xml, specifically the vCPU mapping, to match what makes sense on the new host. I believe the issue is more pronounced when you go from a compute node with more cores to a compute node with fewer cores. I believe the opposite migration works, but the vCPU/NUMA node mapping ends up all wrong.

CC'ing him as well.
___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

On 9/25/15, 11:53 AM, "Steve Gordon" <sgor...@redhat.com> wrote:

>Adding Nikola as he has been working on this.
>
>----- Original Message -----
>> From: "Aubrey Wells" <awe...@digiumcloud.com>
>> To: openstack-operators@lists.openstack.org
>>
>> Greetings,
>> Trying to decide if this is a bug or just a config option that I can't
>> find. The setup I'm currently testing in my lab with is two compute nodes
>> running Kilo, one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c
>> + HT). I don't have any CPU pinning enabled in my nova config, which seems
>> to have the effect of setting in libvirt.xml a vcpu cpuset element like (if
>> created on the 40c node):
>>
>> <vcpu cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu>
>>
>> And then if I migrate that instance to the 16c node, it will bomb out with
>> an exception:
>>
>> Live Migration failure: Invalid value
>> '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus':
>> Invalid argument
>>
>> Which makes sense, since that node doesn't have any vcpus after 15 (0-15).
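As an aside, the failure quoted above can be reproduced on paper with a few lines of Python. This is only a minimal sketch and not anything taken from nova or libvirt: `parse_cpuset` and `invalid_cpus` are hypothetical helper names, the parser handles only comma-separated ids and `a-b` ranges (not the `^` exclusion syntax), and it assumes the destination host's CPUs are numbered 0 through N-1.

```python
def parse_cpuset(spec):
    """Parse a cpuset string such as "0-3,8,10" into a set of CPU ids.

    Hypothetical helper for illustration; handles plain ids and a-b ranges only.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def invalid_cpus(spec, host_cpu_count):
    """Return the sorted ids in spec that do not exist on a host whose
    CPUs are assumed to be numbered 0 .. host_cpu_count - 1."""
    return sorted(c for c in parse_cpuset(spec) if c >= host_cpu_count)


# The cpuset from the error message, checked against the 16-core destination.
source_cpuset = "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38"
print(invalid_cpus(source_cpuset, 16))
```

Every id the function reports is one the 16-core node cannot satisfy, which is consistent with the "Invalid argument" on 'cpuset.cpus'. Checked against a 40-core host, the same string yields an empty list.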
>>
>> I can fix the symptom by commenting out a line in
>> nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
>> cpuset and thus doesn't write that line to libvirt.xml:
>>
>> # vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))
>>
>> And the instance will happily migrate to the host with fewer CPUs, but this
>> loses some of the benefit of OpenStack trying to evenly spread out the core
>> usage on the host, at least that's what I think the purpose of that is.
>>
>> I'd rather fix it the right way if there's a config option I don't see, or
>> file a bug if it's a bug.
>>
>> What I think should be happening is that when it creates the libvirt
>> definition on the destination compute node, it writes out the correct cpuset
>> per the specs of the hardware it's going on to.
>>
>> If it matters, in my nova-compute.conf file, I also have cpu mode and model
>> defined to allow me to migrate between the two different architectures to
>> begin with (the 40c is Sandybridge and the 16c is Westmere so I set it to
>> the lowest common denominator of Westmere):
>>
>> cpu_mode=custom
>> cpu_model=Westmere
>>
>> Any help is appreciated.
>>
>> ---------------------
>> Aubrey
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>
>--
>Steve Gordon, RHCE
>Sr. Technical Product Manager,
>Red Hat Enterprise Linux OpenStack Platform