On 19.11.2013 20:18, yunhong jiang wrote:
On Tue, 2013-11-19 at 12:52 +0000, Daniel P. Berrange wrote:
On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
Hi all,

I would like to hear your thoughts about core pinning in Openstack.
Currently nova (with qemu-kvm) supports defining a cpu set of PCPUs
that can be used by instances. I didn't find a blueprint, but I think
this feature is meant to isolate the cpus used by the host from the
cpus used by instances (VCPUs).

But from a performance point of view it is better to exclusively
dedicate PCPUs to VCPUs and the emulator. In some cases you may want to
guarantee that only one instance (and its VCPUs) is using certain
PCPUs. By using core pinning you can optimize instance performance
based on e.g. cache sharing, NUMA topology, interrupt handling, PCI
passthrough (SR-IOV) on multi-socket hosts, etc.

We have already implemented a feature like this (a PoC with limitations)
on top of the Nova Grizzly release and would like to hear your opinion
about it.

The current implementation consists of three main parts:
- Definition of pcpu-vcpu maps for instances and instance spawning
- (optional) Compute resource and capability advertising including
free pcpus and NUMA topology.
- (optional) Scheduling based on free cpus and NUMA topology.

The implementation is quite simple:

(additional/optional parts)
Nova-computes advertise their free pcpus and NUMA topology in the same
manner as host capabilities. Instances are scheduled based on this
information.

(core pinning)
the admin can set PCPUs for the VCPUs and for the emulator process, or
select a NUMA cell for the instance's vcpus, by adding key:value pairs
to the flavor's extra specs.

EXAMPLE:
instance has 4 vcpus
<key>:<value>
vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
emulator:5 --> emulator pinned to pcpu5
or
numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
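
For example, the keys could be attached to a flavor with python-novaclient
roughly like this (flavor name, credentials and endpoint are placeholders;
the keys come from our PoC and are not upstream Nova):

# Sketch only: attach the PoC pinning keys to a flavor.
from novaclient.v1_1 import client

nova = client.Client('admin', 'secret', 'admin_tenant',
                     'http://controller:5000/v2.0')

flavor = nova.flavors.find(name='m1.pinned')      # hypothetical flavor
flavor.set_keys({'vcpus': '1,2,3,4',              # vcpu0->pcpu1, vcpu1->pcpu2, ...
                 'emulator': '5'})                # emulator process -> pcpu5

# or, instead of explicit pinning, keep the instance inside one NUMA cell:
# flavor.set_keys({'numacell': '0'})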

In nova-compute, the core pinning information is read from the extra specs
and added to the domain XML in the same way as the cpu quota values (cputune).

<cputune>
       <vcpupin vcpu='0' cpuset='1'/>
       <vcpupin vcpu='1' cpuset='2'/>
       <vcpupin vcpu='2' cpuset='3'/>
       <vcpupin vcpu='3' cpuset='4'/>
       <emulatorpin cpuset='5'/>
</cputune>
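
Roughly, the translation could look like this (a sketch, not the actual PoC
code; the key names and parsing follow the example above):

# Sketch only: turn the example extra specs into the <cputune> element.
from xml.etree import ElementTree as ET

def build_cputune(extra_specs, num_vcpus):
    cputune = ET.Element('cputune')
    pcpus = extra_specs.get('vcpus')
    if pcpus:
        # "1,2,3,4" -> vcpu0 pinned to pcpu1, vcpu1 to pcpu2, ...
        for vcpu, pcpu in enumerate(pcpus.split(',')[:num_vcpus]):
            ET.SubElement(cputune, 'vcpupin',
                          vcpu=str(vcpu), cpuset=pcpu.strip())
    if 'emulator' in extra_specs:
        ET.SubElement(cputune, 'emulatorpin', cpuset=extra_specs['emulator'])
    return ET.tostring(cputune)

print(build_cputune({'vcpus': '1,2,3,4', 'emulator': '5'}, num_vcpus=4))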

What do you think? Implementation alternatives? Is this worth a
blueprint? All related comments are welcome!
I think there are several use cases mixed up in your description
here which should likely be considered independently:

  - pCPU/vCPU pinning

    I don't really think this is a good idea as a general purpose
    feature in its own right. It tends to lead to fairly inefficient
    use of CPU resources when you consider that a large % of guests
    will be mostly idle most of the time. It also carries a fairly high
    administrative burden to maintain explicit pinning. This
    feels like a data center virt use case rather than a cloud use
    case, really.

  - Dedicated CPU reservation

    The ability of an end user to request that their VM (or their
    group of VMs) gets assigned a dedicated host CPU set to run on.
    This is obviously something that would have to be controlled
    at a flavour level, and in a commercial deployment would carry
    a hefty pricing premium.

    I don't think you want to expose explicit pCPU/vCPU placement
    for this though. Just request the high-level concept and allow
    the virt host to decide the actual placement.
I think pcpu/vcpu pinning could be considered an extension of the dedicated cpu reservation feature. And I agree that exclusively dedicating pcpus to VMs is inefficient from a cloud point of view, but in some cases an end user may want to be sure (and be ready to pay for it) that their VMs have resources available, e.g. for sudden load peaks.

So, here is my proposal for how dedicated cpu reservation would function at a high level:

When an end user wants a VM with N vcpus running on a dedicated host cpu set, the admin could enable it by setting a new "dedicate_pcpu" parameter in a flavor (e.g. as an optional flavor parameter). By default, the number of pcpus and vcpus could be the same. As an option, explicit vcpu/pcpu pinning could be done by defining the vcpu/pcpu relations in the flavor's extra specs (vcpupin:0 0 ...).

In the virt driver there are two alternatives for how to do the pcpu sharing: 1. all dedicated pcpus are shared by all vcpus (the default case), or 2. each vcpu gets a dedicated pcpu (vcpu 0 is pinned to the first pcpu in the cpu set, vcpu 1 to the second pcpu, and so on). The vcpu/pcpu pinning option could be used to extend the latter case.
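
To make the two alternatives concrete, a small sketch (names invented, not
Nova code) of the resulting vcpu->pcpu maps for an instance that got the
dedicated pcpu set [1, 2, 3, 4]:

def pin_shared(num_vcpus, dedicated_pcpus):
    """Alternative 1: every vcpu may run on any of the dedicated pcpus."""
    return {vcpu: set(dedicated_pcpus) for vcpu in range(num_vcpus)}

def pin_one_to_one(num_vcpus, dedicated_pcpus):
    """Alternative 2: vcpu N is pinned to the N-th dedicated pcpu."""
    if num_vcpus > len(dedicated_pcpus):
        raise ValueError("not enough dedicated pcpus for 1:1 pinning")
    return {vcpu: {pcpu} for vcpu, pcpu in enumerate(dedicated_pcpus[:num_vcpus])}

# pin_shared(4, [1, 2, 3, 4])     -> {0: {1, 2, 3, 4}, 1: {1, 2, 3, 4}, ...}
# pin_one_to_one(4, [1, 2, 3, 4]) -> {0: {1}, 1: {2}, 2: {3}, 3: {4}}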

In any case, before a VM with or without dedicated pcpus is launched, the virt driver must make sure that the dedicated pcpus are excluded from both existing and new VMs, and that there are enough free pcpus for placement. And I think the minimum number of pcpus left for VMs without dedicated pcpus must be configurable somewhere.
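
A toy sketch of the bookkeeping this implies (the minimum-shared-pcpus
option name is made up):

MIN_SHARED_PCPUS = 2  # hypothetical config option

def reserve_dedicated_pcpus(free_pcpus, wanted):
    """Reserve 'wanted' pcpus for a dedicated-cpu instance, or fail."""
    if len(free_pcpus) - wanted < MIN_SHARED_PCPUS:
        raise RuntimeError("not enough pcpus would remain for shared instances")
    reserved = sorted(free_pcpus)[:wanted]
    return reserved, set(free_pcpus) - set(reserved)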

Comments?

Br, Tuomas


  - Host NUMA placement.

    By not taking NUMA into account, the libvirt driver at least is
    currently wasting resources badly. Having too much cross-NUMA-node
    memory access by guests just kills scalability. The virt
    driver should really figure out cpu & memory pinning
    within the scope of a NUMA node automatically. No admin config
    should be required for this (see the sketch after the next point).

  - Guest NUMA topology

    If the flavour memory size / cpu count exceeds the size of a
    single NUMA node, then the flavour should likely have a way to
    express that the guest should see multiple NUMA nodes. The
    virt host would then set guest NUMA topology to match the way
    it places vCPUs & memory on host NUMA nodes. Again you don't
    want explicit pcpu/vcpu mapping done by the admin for this.
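
To illustrate both of these points, a rough sketch (the cell data structures
and helper names are invented, not libvirt driver code) of automatically
picking a host NUMA cell and, when the flavour does not fit a single node,
splitting it into guest NUMA cells:

import math

def pick_host_cell(cells, vcpus, mem_mb):
    """Return the first host NUMA cell with enough free cpus and memory."""
    for cell in cells:
        if len(cell['free_cpus']) >= vcpus and cell['free_mem_mb'] >= mem_mb:
            return cell
    return None  # does not fit a single node

def split_into_guest_cells(vcpus, mem_mb, node_cpus, node_mem_mb):
    """Split the flavour across several guest NUMA cells mirroring the host."""
    n = int(max(math.ceil(vcpus / float(node_cpus)),
                math.ceil(mem_mb / float(node_mem_mb))))
    return [{'id': i,
             'vcpus': vcpus // n + (1 if i < vcpus % n else 0),
             'mem_mb': mem_mb // n}
            for i in range(n)]

host_cells = [{'id': 0, 'free_cpus': {0, 1, 2, 3}, 'free_mem_mb': 16384},
              {'id': 1, 'free_cpus': {4, 5, 6, 7}, 'free_mem_mb': 16384}]

# small flavour: confine it to one host cell, no admin input needed
cell = pick_host_cell(host_cells, vcpus=2, mem_mb=4096)
# the driver could then emit e.g. <vcpu cpuset='0-3'> and
# <numatune><memory mode='strict' nodeset='0'/></numatune>

# big flavour (8 vcpus / 32 GB on 4-core / 16 GB nodes): expose two guest cells
guest_cells = split_into_guest_cells(8, 32768, node_cpus=4, node_mem_mb=16384)
# -> [{'id': 0, 'vcpus': 4, 'mem_mb': 16384}, {'id': 1, 'vcpus': 4, 'mem_mb': 16384}]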



Regards,
Daniel
Quite a clear split, and +1 for the P/V pinning option.

--jyh



_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



