This is a long-standing issue. Nikola has been working on it in Liberty for the CPU pinning case; I'm not sure about the non-pinned case. And of course backporting to Kilo hasn't been done yet.

Aubrey, what you're seeing is definitely a bug. There is an existing bug https://bugs.launchpad.net/nova/+bug/1417667 but that is specifically for dedicated CPUs which doesn't apply in your case. Please feel free to open a new bug.

Chris

On 09/25/2015 12:16 PM, Kris G. Lindgren wrote:
I believe TWC - (medberry on irc) was lamenting to me about cpusets, different 
hypervisor HW configs, and unassigned vCPUs in NUMA nodes.

The problem is that the migration does not re-define the domain.xml, specifically 
the vcpu mapping, to match what makes sense on the new host.  I believe the 
issue is more pronounced when you go from a compute node with more cores to a 
compute node with fewer cores. I believe the opposite migration works, but the 
vCPU/NUMA mappings are all wrong.

CC'ing him as well.
___________________________________________________________________
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

On 9/25/15, 11:53 AM, "Steve Gordon" <[email protected]> wrote:

Adding Nikola as he has been working on this.

----- Original Message -----
From: "Aubrey Wells" <[email protected]>
To: [email protected]

Greetings,
Trying to decide if this is a bug or just a config option that I can't
find. The setup I'm currently testing in my lab is two compute nodes
running Kilo: one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c
+ HT). I don't have any CPU pinning enabled in my nova config, which seems
to have the effect of writing into libvirt.xml a vcpu cpuset element like (if
created on the 40c node):

<vcpu cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu>
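For reference, the cpuset string in that element comes out of nova's hardware.format_cpu_spec helper. A rough sketch of the behavior (simplified; the real helper can also collapse consecutive ids into ranges like "0-3"):

```python
def format_cpu_spec(cpus):
    """Render a set of host CPU ids as a libvirt cpuset string.

    Simplified sketch of nova's hardware.format_cpu_spec: the real
    helper can also collapse consecutive ids into ranges like "0-3".
    """
    return ",".join(str(c) for c in sorted(cpus))

# The odd-numbered CPUs of a 40-core host, as in the element above:
print(format_cpu_spec(set(range(1, 40, 2))))
# -> 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
```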

And then if I migrate that instance to the 16c node, it will bomb out with
an exception:

Live Migration failure: Invalid value
'0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus':
Invalid argument

Which makes sense, since that node doesn't have any CPUs above 15 (it only has 0-15).
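The "Invalid argument" is the cgroup cpuset controller rejecting CPU ids that don't exist on the destination host. A quick check illustrating which ids in the failing spec are out of range on the 16-core node (illustration only, not libvirt or kernel code):

```python
def invalid_cpus(cpuset_spec, host_cpu_count):
    """Return the CPU ids in a libvirt-style cpuset string that do not
    exist on a host whose logical CPUs are numbered 0..host_cpu_count-1."""
    requested = {int(c) for c in cpuset_spec.split(",")}
    return sorted(c for c in requested if c >= host_cpu_count)

spec = "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38"
print(invalid_cpus(spec, 16))
# -> [16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38]
```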

I can fix the symptom by commenting out a line in
nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
cpuset and thus doesn't write that line to libvirt.xml:
# vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))

And the instance will happily migrate to the host with fewer CPUs, but this
loses some of the benefit of OpenStack trying to evenly spread out core
usage on the host, at least that's what I think the purpose of that is.

I'd rather fix it the right way if there's a config option I'm missing, or
file a bug if it's a bug.

What I think should be happening is that when nova creates the libvirt
definition on the destination compute node, it writes out a correct cpuset
for the specs of the hardware it's landing on.
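One way to sketch that behavior (hypothetical helper, not an actual nova API; assumes the destination's logical CPUs are numbered 0..N-1):

```python
def rebuild_cpuset(source_cpuset, dest_cpu_count):
    """Keep only the pinned CPU ids that exist on the destination host.

    Hypothetical sketch: if nothing survives the filter, an empty set
    could signal that the vcpu element should be written without a
    cpuset attribute, letting the vCPU float across all host CPUs.
    """
    return {c for c in source_cpuset if c < dest_cpu_count}

# Instance built on the 40c node, migrating to the 16c node:
src = set(range(1, 40, 2))  # "1,3,...,39" from the original libvirt.xml
print(sorted(rebuild_cpuset(src, 16)))
# -> [1, 3, 5, 7, 9, 11, 13, 15]
```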

If it matters, in my nova-compute.conf file, I also have cpu mode and model
defined to allow me to migrate between the two different architectures to
begin with (the 40c is Sandybridge and the 16c is Westmere so I set it to
the lowest common denominator of Westmere):

cpu_mode=custom
cpu_model=Westmere

Any help is appreciated.

---------------------
Aubrey

_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


--
Steve Gordon, RHCE
Sr. Technical Product Manager,
Red Hat Enterprise Linux OpenStack Platform



