Re: [openstack-dev] [nova] Core pinning

2013-11-27 Thread Tuomas Paappanen

On 19.11.2013 20:18, yunhong jiang wrote:

On Tue, 2013-11-19 at 12:52 +, Daniel P. Berrange wrote:

On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:

Hi all,

I would like to hear your thoughts about core pinning in OpenStack.
Currently nova (with qemu-kvm) supports a cpu set of PCPUs that can
be used by instances. I didn't find a blueprint, but I think this
feature is meant to isolate the cpus used by the host from the cpus
used by instances (VCPUs).

But from a performance point of view it is better to exclusively
dedicate PCPUs to VCPUs and the emulator. In some cases you may want
to guarantee that only one instance (and its VCPUs) is using certain
PCPUs. By using core pinning you can optimize instance performance
based on e.g. cache sharing, NUMA topology, interrupt handling, PCI
pass-through (SR-IOV) in multi-socket hosts, etc.

We have already implemented a feature like this (a PoC with
limitations) on top of the Nova Grizzly version and would like to
hear your opinion about it.

The current implementation consists of three main parts:
- Definition of pcpu-vcpu maps for instances and instance spawning
- (optional) Compute resource and capability advertising including
free pcpus and NUMA topology.
- (optional) Scheduling based on free cpus and NUMA topology.

The implementation is quite simple:

(additional/optional parts)
Nova-computes advertise free pcpus and NUMA topology in the same
manner as host capabilities. Instances are scheduled based on this
information.

(core pinning)
An admin can set PCPUs for VCPUs and for the emulator process, or
select a NUMA cell for the instance vcpus, by adding key:value pairs
to the flavor's extra specs.

EXAMPLE:
instance has 4 vcpus
key:value
vcpus:1,2,3,4 -- vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
emulator:5 -- emulator pinned to pcpu5
or
numacell:0 -- all vcpus are pinned to pcpus in numa cell 0.

In nova-compute, the core pinning information is read from the extra
specs and added to the domain XML in the same way as the cpu quota
values (cputune).

<cputune>
   <vcpupin vcpu='0' cpuset='1'/>
   <vcpupin vcpu='1' cpuset='2'/>
   <vcpupin vcpu='2' cpuset='3'/>
   <vcpupin vcpu='3' cpuset='4'/>
   <emulatorpin cpuset='5'/>
</cputune>

What do you think? Implementation alternatives? Is this worth a
blueprint? All related comments are welcome!
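
As an illustration of the extra-spec-to-XML step described above, here is a
minimal Python sketch (not the actual PoC code; the 'vcpus' and 'emulator'
key names are taken from the example, the helper name is made up):

# Hypothetical sketch: build a libvirt <cputune> element from the
# flavor extra spec keys used in the example above.
from xml.etree import ElementTree as ET

def build_cputune(extra_specs):
    cputune = ET.Element('cputune')
    pcpus = extra_specs.get('vcpus')          # e.g. "1,2,3,4"
    if pcpus:
        for vcpu, pcpu in enumerate(pcpus.split(',')):
            ET.SubElement(cputune, 'vcpupin',
                          {'vcpu': str(vcpu), 'cpuset': pcpu.strip()})
    emulator = extra_specs.get('emulator')    # e.g. "5"
    if emulator:
        ET.SubElement(cputune, 'emulatorpin', {'cpuset': emulator})
    return ET.tostring(cputune).decode()

print(build_cputune({'vcpus': '1,2,3,4', 'emulator': '5'}))
# -> <cputune><vcpupin vcpu="0" cpuset="1" /> ... <emulatorpin cpuset="5" /></cputune>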

I think there are several use cases mixed up in your descriptions
here which should likely be considered independently

  - pCPU/vCPU pinning

I don't really think this is a good idea as a general purpose
feature in its own right. It tends to lead to fairly inefficient
use of CPU resources when you consider that a large % of guests
will be mostly idle most of the time. It has a fairly high
administrative burden to maintain explicit pinning too. This
feels like a data center virt use case rather than cloud use
case really.

  - Dedicated CPU reservation

The ability of an end user to request that their VM (or their
group of VMs) gets assigned a dedicated host CPU set to run on.
This is obviously something that would have to be controlled
at a flavour level, and in a commercial deployment would carry
a hefty pricing premium.

I don't think you want to expose explicit pCPU/vCPU placement
for this though. Just request the high level concept and allow
the virt host to decide actual placement
I think pcpu/vcpu pinning could be considered an extension of the 
dedicated cpu reservation feature. And I agree that if we exclusively 
dedicate pcpus to VMs it is inefficient from the cloud point of view, 
but in some cases an end user may want to be sure (and be ready to pay) 
that their VMs have the resources available, e.g. for sudden load peaks.


So, here is my proposal for how dedicated cpu reservation would function 
at a high level:


When an end user wants a VM with nn vcpus running on a dedicated host 
cpu set, an admin could enable it by setting a new dedicate_pcpu 
parameter in a flavor (e.g. as an optional flavor parameter). By 
default, the number of pcpus and vcpus could be the same. And as an 
option, explicit vcpu/pcpu pinning could be done by defining vcpu/pcpu 
relations in the flavor's extra specs (vcpupin:0 0 ...).


In the virt driver there are two alternatives for how to do the pcpu 
sharing: 1. all dedicated pcpus are shared by all vcpus (the default 
case), or 2. each vcpu has a dedicated pcpu (vcpu 0 is pinned to the 
first pcpu in the cpu set, vcpu 1 to the second pcpu, and so on). The 
vcpu/pcpu pinning option could be used to extend the latter case.
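
To make the two alternatives concrete, here is a small sketch of the
vcpu-to-cpuset mapping each mode would produce (plain Python; the function
and parameter names are illustrative, not taken from the PoC):

# Sketch of the two sharing modes described above (names are illustrative).
def assign_pcpus(dedicated_pcpus, num_vcpus, per_vcpu_pinning=False):
    """Return a mapping of vcpu index -> cpuset string."""
    if per_vcpu_pinning:
        # Mode 2: each vcpu gets its own pcpu from the dedicated set.
        if num_vcpus > len(dedicated_pcpus):
            raise ValueError('not enough dedicated pcpus')
        return {v: str(dedicated_pcpus[v]) for v in range(num_vcpus)}
    # Mode 1 (default): all vcpus float over the whole dedicated set.
    cpuset = ','.join(str(p) for p in dedicated_pcpus)
    return {v: cpuset for v in range(num_vcpus)}

print(assign_pcpus([8, 9, 10, 11], 4))                        # shared set
print(assign_pcpus([8, 9, 10, 11], 4, per_vcpu_pinning=True)) # 1:1 pinning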


In any case, before a VM with or without dedicated pcpus is launched, 
the virt driver must ensure that the dedicated pcpus are excluded from 
existing VMs and from new VMs, and that there are enough free pcpus for 
the placement. And I think the minimum number of pcpus left for VMs 
without dedicated pcpus must be configurable somewhere.


Comments?

Br, Tuomas



  - Host NUMA placement.

By not taking NUMA into account currently the libvirt driver
at least is badly wasting resources. Having too much cross-numa
node memory access by guests just kills scalability.

Re: [openstack-dev] [nova] Core pinning

2013-11-27 Thread Daniel P. Berrange
On Wed, Nov 27, 2013 at 03:50:47PM +0200, Tuomas Paappanen wrote:
 On Tue, 2013-11-19 at 12:52 +, Daniel P. Berrange wrote:
 I think there are several use cases mixed up in your descriptions
 here which should likely be considered independently
 
   - pCPU/vCPU pinning
 
 I don't really think this is a good idea as a general purpose
 feature in its own right. It tends to lead to fairly inefficient
 use of CPU resources when you consider that a large % of guests
 will be mostly idle most of the time. It has a fairly high
 administrative burden to maintain explicit pinning too. This
 feels like a data center virt use case rather than cloud use
 case really.
 
   - Dedicated CPU reservation
 
 The ability of an end user to request that their VM (or their
 group of VMs) gets assigned a dedicated host CPU set to run on.
 This is obviously something that would have to be controlled
 at a flavour level, and in a commercial deployment would carry
 a hefty pricing premium.
 
 I don't think you want to expose explicit pCPU/vCPU placement
 for this though. Just request the high level concept and allow
 the virt host to decide actual placement
 I think pcpu/vcpu pinning could be considered an extension of the
 dedicated cpu reservation feature. And I agree that if we
 exclusively dedicate pcpus to VMs it is inefficient from the cloud
 point of view, but in some cases an end user may want to be sure
 (and be ready to pay) that their VMs have the resources available,
 e.g. for sudden load peaks.

 So, here is my proposal for how dedicated cpu reservation would
 function at a high level:

 When an end user wants a VM with nn vcpus running on a dedicated
 host cpu set, an admin could enable it by setting a new
 dedicate_pcpu parameter in a flavor (e.g. as an optional flavor
 parameter). By default, the number of pcpus and vcpus could be the
 same. And as an option, explicit vcpu/pcpu pinning could be done by
 defining vcpu/pcpu relations in the flavor's extra specs
 (vcpupin:0 0 ...).

 In the virt driver there are two alternatives for how to do the
 pcpu sharing: 1. all dedicated pcpus are shared by all vcpus (the
 default case), or 2. each vcpu has a dedicated pcpu (vcpu 0 is
 pinned to the first pcpu in the cpu set, vcpu 1 to the second pcpu,
 and so on). The vcpu/pcpu pinning option could be used to extend
 the latter case.

 In any case, before a VM with or without dedicated pcpus is
 launched, the virt driver must ensure that the dedicated pcpus are
 excluded from existing VMs and from new VMs, and that there are
 enough free pcpus for the placement. And I think the minimum number
 of pcpus left for VMs without dedicated pcpus must be configurable
 somewhere.

 Comments?

I still don't believe that vcpu:pcpu pinning is something we want
to do, even with dedicated CPUs. There are always threads in the
host doing work on behalf of the VM that are not related to vCPUs.
For example the main QEMU emulator thread, the QEMU I/O threads,
kernel threads. Other hypervisors have similar behaviour. It is
better to let the kernel / hypervisor scheduler decide how to
balance the competing workloads than forcing a fixed & suboptimally
performing vcpu:pcpu mapping. The only time I've seen fixed pinning
make a consistent benefit is when you have NUMA involved and want to
prevent a VM spanning NUMA nodes. Even then you'd just be best pinning
to the set of CPUs in a node and then letting the vCPUs float amongst
the pCPUs in that node.
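
For reference, pinning to a whole NUMA node rather than to individual cores
could look roughly like the sketch below on a Linux host: it reads the node's
cpu list from sysfs (/sys/devices/system/node/nodeN/cpulist) and pins every
vCPU and the emulator to that full set, so they still float within the node.
The helper names are illustrative only:

# Illustrative only: pin all vcpus of a guest to the cpus of one NUMA node,
# letting the scheduler float them within that node.
from xml.etree import ElementTree as ET

def node_cpulist(node):
    # e.g. "0-7,16-23" on a typical two-socket host
    with open('/sys/devices/system/node/node%d/cpulist' % node) as f:
        return f.read().strip()

def cputune_for_node(node, num_vcpus):
    cpuset = node_cpulist(node)
    cputune = ET.Element('cputune')
    for vcpu in range(num_vcpus):
        ET.SubElement(cputune, 'vcpupin',
                      {'vcpu': str(vcpu), 'cpuset': cpuset})
    ET.SubElement(cputune, 'emulatorpin', {'cpuset': cpuset})
    return ET.tostring(cputune).decode()

print(cputune_for_node(0, 4))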

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Core pinning

2013-11-19 Thread Tuomas Paappanen

Hi Roman,

I haven't yet, but I will write a blueprint for the core pinning part.
I considered using the vcpu element as well, but in that case you cannot 
set e.g. vcpu-0 to run on pcpu-0; the vcpus and the emulator share all 
pcpus defined in the cpuset, so I decided to use the cputune element.
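
For readers comparing the two approaches, a small illustrative sketch of the
difference: the vcpu element's cpuset applies to all vcpus (and by default the
emulator) as a group, while cputune/vcpupin pins each vcpu individually. Both
snippets are built with ElementTree purely for illustration:

# Illustration of the two libvirt XML styles discussed here.
from xml.etree import ElementTree as ET

# Style 1: <vcpu placement='static' cpuset='...'>N</vcpu>
# -- one cpuset shared by all vcpus; no per-vcpu control.
vcpu = ET.Element('vcpu', {'placement': 'static', 'cpuset': '1-4'})
vcpu.text = '4'

# Style 2: <cputune> with one <vcpupin> per vcpu -- per-vcpu control,
# e.g. vcpu 0 on pcpu 1, vcpu 1 on pcpu 2, ...
cputune = ET.Element('cputune')
for v, p in enumerate([1, 2, 3, 4]):
    ET.SubElement(cputune, 'vcpupin', {'vcpu': str(v), 'cpuset': str(p)})

print(ET.tostring(vcpu).decode())
print(ET.tostring(cputune).decode())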


Are you using extra specs for carrying cpuset attributes in your 
implementation?


Br,Tuomas

On 18.11.2013 17:14, Roman Verchikov wrote:

Tuomas,

Have you published your code/blueprints anywhere? Looks like we’re working on 
the same stuff. I have implemented almost the same feature set (haven’t 
published anything yet because of this thread), except for the scheduler part. 
The main goal is to be able to pin VCPUs in NUMA environment.

Have you considered adding placement and cpuset attributes to vcpu element? 
For example:
<vcpu placement='static' cpuset='%whatever%'>

Thanks,
Roman

On Nov 13, 2013, at 14:46, Tuomas Paappanen tuomas.paappa...@tieto.com wrote:


Hi all,

I would like to hear your thoughts about core pinning in OpenStack. Currently 
nova (with qemu-kvm) supports a cpu set of PCPUs that can be used by 
instances. I didn't find a blueprint, but I think this feature is meant to 
isolate the cpus used by the host from the cpus used by instances (VCPUs).

But from a performance point of view it is better to exclusively dedicate PCPUs 
to VCPUs and the emulator. In some cases you may want to guarantee that only one 
instance (and its VCPUs) is using certain PCPUs. By using core pinning you can 
optimize instance performance based on e.g. cache sharing, NUMA topology, 
interrupt handling, PCI pass-through (SR-IOV) in multi-socket hosts, etc.

We have already implemented a feature like this (a PoC with limitations) on top 
of the Nova Grizzly version and would like to hear your opinion about it.

The current implementation consists of three main parts:
- Definition of pcpu-vcpu maps for instances and instance spawning
- (optional) Compute resource and capability advertising, including free pcpus 
and NUMA topology.
- (optional) Scheduling based on free cpus and NUMA topology.

The implementation is quite simple:

(additional/optional parts)
Nova-computes advertise free pcpus and NUMA topology in the same manner as 
host capabilities. Instances are scheduled based on this information.

(core pinning)
An admin can set PCPUs for VCPUs and for the emulator process, or select a NUMA 
cell for the instance vcpus, by adding key:value pairs to the flavor's extra specs.

EXAMPLE:
instance has 4 vcpus
key:value
vcpus:1,2,3,4 -- vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
emulator:5 -- emulator pinned to pcpu5
or
numacell:0 -- all vcpus are pinned to pcpus in numa cell 0.

In nova-compute, the core pinning information is read from the extra specs and 
added to the domain XML in the same way as the cpu quota values (cputune).

<cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='2'/>
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='4'/>
  <emulatorpin cpuset='5'/>
</cputune>

What do you think? Implementation alternatives? Is this worth a blueprint? All 
related comments are welcome!

Regards,
Tuomas







Re: [openstack-dev] [nova] Core pinning

2013-11-19 Thread Daniel P. Berrange
On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
 Hi all,
 
 I would like to hear your thoughts about core pinning in OpenStack.
 Currently nova (with qemu-kvm) supports a cpu set of PCPUs that can
 be used by instances. I didn't find a blueprint, but I think this
 feature is meant to isolate the cpus used by the host from the cpus
 used by instances (VCPUs).
 
 But from a performance point of view it is better to exclusively
 dedicate PCPUs to VCPUs and the emulator. In some cases you may want
 to guarantee that only one instance (and its VCPUs) is using certain
 PCPUs. By using core pinning you can optimize instance performance
 based on e.g. cache sharing, NUMA topology, interrupt handling, PCI
 pass-through (SR-IOV) in multi-socket hosts, etc.
 
 We have already implemented a feature like this (a PoC with
 limitations) on top of the Nova Grizzly version and would like to
 hear your opinion about it.
 
 The current implementation consists of three main parts:
 - Definition of pcpu-vcpu maps for instances and instance spawning
 - (optional) Compute resource and capability advertising, including
 free pcpus and NUMA topology.
 - (optional) Scheduling based on free cpus and NUMA topology.
 
 The implementation is quite simple:
 
 (additional/optional parts)
 Nova-computes advertise free pcpus and NUMA topology in the same
 manner as host capabilities. Instances are scheduled based on this
 information.
 
 (core pinning)
 An admin can set PCPUs for VCPUs and for the emulator process, or
 select a NUMA cell for the instance vcpus, by adding key:value pairs
 to the flavor's extra specs.
 
 EXAMPLE:
 instance has 4 vcpus
 key:value
 vcpus:1,2,3,4 -- vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
 emulator:5 -- emulator pinned to pcpu5
 or
 numacell:0 -- all vcpus are pinned to pcpus in numa cell 0.
 
 In nova-compute, the core pinning information is read from the extra
 specs and added to the domain XML in the same way as the cpu quota
 values (cputune).
 
 <cputune>
   <vcpupin vcpu='0' cpuset='1'/>
   <vcpupin vcpu='1' cpuset='2'/>
   <vcpupin vcpu='2' cpuset='3'/>
   <vcpupin vcpu='3' cpuset='4'/>
   <emulatorpin cpuset='5'/>
 </cputune>
 
 What do you think? Implementation alternatives? Is this worth a
 blueprint? All related comments are welcome!

I think there are several use cases mixed up in your descriptions
here which should likely be considered independently

 - pCPU/vCPU pinning

   I don't really think this is a good idea as a general purpose
   feature in its own right. It tends to lead to fairly inefficient
   use of CPU resources when you consider that a large % of guests
   will be mostly idle most of the time. It has a fairly high
   administrative burden to maintain explicit pinning too. This
   feels like a data center virt use case rather than cloud use
   case really.

 - Dedicated CPU reservation

   The ability of an end user to request that their VM (or their
   group of VMs) gets assigned a dedicated host CPU set to run on.
   This is obviously something that would have to be controlled
   at a flavour level, and in a commercial deployment would carry
   a hefty pricing premium.

   I don't think you want to expose explicit pCPU/vCPU placement
   for this though. Just request the high level concept and allow
   the virt host to decide actual placement

 - Host NUMA placement.

    By not taking NUMA into account currently the libvirt driver
    at least is badly wasting resources. Having too much cross-numa
    node memory access by guests just kills scalability. The virt
    driver should really figure out cpu & memory pinning within the
    scope of a NUMA node automatically. No admin config should be
    required for this.

 - Guest NUMA topology

   If the flavour memory size / cpu count exceeds the size of a
   single NUMA node, then the flavour should likely have a way to
   express that the guest should see multiple NUMA nodes. The
   virt host would then set guest NUMA topology to match the way
    it places vCPUs & memory on host NUMA nodes. Again you don't
   want explicit pcpu/vcpu mapping done by the admin for this.
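
As a rough illustration of that last point, a guest NUMA topology can be
expressed in the libvirt domain XML with cpu/numa/cell elements; the sketch
below builds a two-cell topology (the cell sizes are made-up example values,
and how nova would choose them is exactly the open design question):

# Sketch: express a two-node guest NUMA topology (example values only).
from xml.etree import ElementTree as ET

cpu = ET.Element('cpu')
numa = ET.SubElement(cpu, 'numa')
# 8 vcpus / 16 GiB split across two guest NUMA cells.
for cpus, mem_kib in [('0-3', 8 * 1024 * 1024), ('4-7', 8 * 1024 * 1024)]:
    ET.SubElement(numa, 'cell', {'cpus': cpus,
                                 'memory': str(mem_kib),
                                 'unit': 'KiB'})

print(ET.tostring(cpu).decode())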



Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Core pinning

2013-11-19 Thread Daniel P. Berrange
On Wed, Nov 13, 2013 at 11:57:22AM -0600, Chris Friesen wrote:
 On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:
 
 But, from performance point of view it is better to exclusively
 dedicate PCPUs for VCPUs and emulator. In some cases you may want
 to guarantee that only one instance(and its VCPUs) is using certain
 PCPUs.  By using core pinning you can optimize instance performance
 based on e.g. cache sharing, NUMA topology, interrupt handling, pci
 pass through(SR-IOV) in multi socket hosts etc.
 
 My 2 cents. When you talking about  performance point of view, are
 you talking about guest performance, or overall performance? Pin PCPU
 is sure to benefit guest performance, but possibly not for overall
 performance, especially if the vCPU is not consume 100% of the CPU
 resources.
 
 It can actually be both.  If a guest has several virtual cores that
 both access the same memory, it can be highly beneficial all around
 if all the memory/cpus for that guest come from a single NUMA node
 on the host.  That way you reduce the cross-NUMA-node memory
 traffic, increasing overall efficiency.  Alternately, if a guest has
 several cores that use lots of memory bandwidth but don't access the
 same data, you might want to ensure that the cores are on different
 NUMA nodes to equalize utilization of the different NUMA nodes.
 
 Similarly, once you start talking about doing SR-IOV networking I/O
 passthrough into a guest (for SDN/NFV stuff) for optimum efficiency
 it is beneficial to be able to steer interrupts on the physical host
 to the specific cpus on which the guest will be running.  This
 implies some form of pinning.

I would say intelligent NUMA placement is something that virt drivers
should address automatically without any need for admin defined pinning.
The latter is just imposing too much admin burden, for something we can
figure out automatically to a good enough extent.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Core pinning

2013-11-19 Thread yunhong jiang
On Tue, 2013-11-19 at 12:52 +, Daniel P. Berrange wrote:
 On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
  Hi all,
  
  I would like to hear your thoughts about core pinning in Openstack.
  Currently nova(with qemu-kvm) supports usage of cpu set of PCPUs
  what can be used by instances. I didn't find blueprint, but I think
  this feature is for isolate cpus used by host from cpus used by
  instances(VCPUs).
  
  But, from performance point of view it is better to exclusively
  dedicate PCPUs for VCPUs and emulator. In some cases you may want to
  guarantee that only one instance(and its VCPUs) is using certain
  PCPUs.  By using core pinning you can optimize instance performance
  based on e.g. cache sharing, NUMA topology, interrupt handling, pci
  pass through(SR-IOV) in multi socket hosts etc.
  
  We have already implemented feature like this(PoC with limitations)
  to Nova Grizzly version and would like to hear your opinion about
  it.
  
  The current implementation consists of three main parts:
  - Definition of pcpu-vcpu maps for instances and instance spawning
  - (optional) Compute resource and capability advertising including
  free pcpus and NUMA topology.
  - (optional) Scheduling based on free cpus and NUMA topology.
  
  The implementation is quite simple:
  
  (additional/optional parts)
  Nova-computes are advertising free pcpus and NUMA topology in same
  manner than host capabilities. Instances are scheduled based on this
  information.
  
  (core pinning)
  admin can set PCPUs for VCPUs and for emulator process, or select
  NUMA cell for instance vcpus, by adding key:value pairs to flavor's
  extra specs.
  
  EXAMPLE:
  instance has 4 vcpus
  key:value
  vcpus:1,2,3,4 -- vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
  emulator:5 -- emulator pinned to pcpu5
  or
  numacell:0 -- all vcpus are pinned to pcpus in numa cell 0.
  
  In nova-compute, core pinning information is read from extra specs
  and added to domain xml same way as cpu quota values(cputune).
  
  <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='4'/>
    <emulatorpin cpuset='5'/>
  </cputune>
  
  What do you think? Implementation alternatives? Is this worth of
  blueprint? All related comments are welcome!
 
 I think there are several use cases mixed up in your descriptions
 here which should likely be considered independently
 
  - pCPU/vCPU pinning
 
I don't really think this is a good idea as a general purpose
feature in its own right. It tends to lead to fairly inefficient
use of CPU resources when you consider that a large % of guests
will be mostly idle most of the time. It has a fairly high
administrative burden to maintain explicit pinning too. This
feels like a data center virt use case rather than cloud use
case really.
 
  - Dedicated CPU reservation
 
The ability of an end user to request that their VM (or their
group of VMs) gets assigned a dedicated host CPU set to run on.
This is obviously something that would have to be controlled
at a flavour level, and in a commercial deployment would carry
a hefty pricing premium.
 
I don't think you want to expose explicit pCPU/vCPU placement
for this though. Just request the high level concept and allow
the virt host to decide actual placement
 
  - Host NUMA placement.
 
By not taking NUMA into account currently the libvirt driver
at least is badly wasting resources. Having too much cross-numa
node memory access by guests just kills scalability. The virt
driver should really figure out cpu & memory pinning
within the scope of a NUMA node automatically. No admin config
should be required for this.
 
  - Guest NUMA topology
 
If the flavour memory size / cpu count exceeds the size of a
single NUMA node, then the flavour should likely have a way to
express that the guest should see multiple NUMA nodes. The
virt host would then set guest NUMA topology to match the way
it places vCPUs & memory on host NUMA nodes. Again you don't
want explicit pcpu/vcpu mapping done by the admin for this.
 
 
 
 Regards,
 Daniel

A quite clear split, and +1 for the P/V pin option.

--jyh





Re: [openstack-dev] [nova] Core pinning

2013-11-18 Thread Roman Verchikov
Tuomas,

Have you published your code/blueprints anywhere? Looks like we’re working on 
the same stuff. I have implemented almost the same feature set (haven’t 
published anything yet because of this thread), except for the scheduler part. 
The main goal is to be able to pin VCPUs in NUMA environment.

Have you considered adding placement and cpuset attributes to vcpu element? 
For example:
<vcpu placement='static' cpuset='%whatever%'>

Thanks,
Roman

On Nov 13, 2013, at 14:46, Tuomas Paappanen tuomas.paappa...@tieto.com wrote:

 Hi all,
 
 I would like to hear your thoughts about core pinning in OpenStack. Currently 
 nova (with qemu-kvm) supports a cpu set of PCPUs that can be used by 
 instances. I didn't find a blueprint, but I think this feature is meant to 
 isolate the cpus used by the host from the cpus used by instances (VCPUs).
 
 But from a performance point of view it is better to exclusively dedicate 
 PCPUs to VCPUs and the emulator. In some cases you may want to guarantee that 
 only one instance (and its VCPUs) is using certain PCPUs. By using core 
 pinning you can optimize instance performance based on e.g. cache sharing, 
 NUMA topology, interrupt handling, PCI pass-through (SR-IOV) in multi-socket 
 hosts, etc.
 
 We have already implemented a feature like this (a PoC with limitations) on 
 top of the Nova Grizzly version and would like to hear your opinion about it.
 
 The current implementation consists of three main parts:
 - Definition of pcpu-vcpu maps for instances and instance spawning
 - (optional) Compute resource and capability advertising, including free 
 pcpus and NUMA topology.
 - (optional) Scheduling based on free cpus and NUMA topology.
 
 The implementation is quite simple:
 
 (additional/optional parts)
 Nova-computes advertise free pcpus and NUMA topology in the same manner as 
 host capabilities. Instances are scheduled based on this information.
 
 (core pinning)
 An admin can set PCPUs for VCPUs and for the emulator process, or select a 
 NUMA cell for the instance vcpus, by adding key:value pairs to the flavor's 
 extra specs.
 
 EXAMPLE:
 instance has 4 vcpus
 key:value
 vcpus:1,2,3,4 -- vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
 emulator:5 -- emulator pinned to pcpu5
 or
 numacell:0 -- all vcpus are pinned to pcpus in numa cell 0.
 
 In nova-compute, the core pinning information is read from the extra specs 
 and added to the domain XML in the same way as the cpu quota values (cputune).
 
 <cputune>
  <vcpupin vcpu='0' cpuset='1'/>
  <vcpupin vcpu='1' cpuset='2'/>
  <vcpupin vcpu='2' cpuset='3'/>
  <vcpupin vcpu='3' cpuset='4'/>
  <emulatorpin cpuset='5'/>
 </cputune>
 
 What do you think? Implementation alternatives? Is this worth a blueprint? 
 All related comments are welcome!
 
 Regards,
 Tuomas
 
 
 
 
 


Re: [openstack-dev] [nova] Core pinning

2013-11-15 Thread Tapio Tallgren
Hi,

The use cases for CPU pinning are exactly as discussed above: (1)
lowering guest scheduling latencies and (2) improving networking latencies
by pinning the SR-IOV IRQs to specific cores. There is also a third use
case, (3) avoiding long latencies with spinlocks.

 On Wed, Nov 13, 2013 at 8:20 PM, Jiang, Yunhong yunhong.ji...@intel.com
 wrote:


 Similarly, once you start talking about doing SR-IOV networking I/O
 passthrough into a guest (for SDN/NFV stuff) for optimum efficiency it
 is beneficial to be able to steer interrupts on the physical host to the
 specific cpus on which the guest will be running.  This implies some
 form of pinning.

 Still, I think hypervisor should achieve this, instead of openstack.

How would this work? As a solution it would be much better, since
OpenStack would then have much less low-level work to do.

-Tapio


Re: [openstack-dev] [nova] Core pinning

2013-11-14 Thread Tuomas Paappanen

On 13.11.2013 20:20, Jiang, Yunhong wrote:



-Original Message-
From: Chris Friesen [mailto:chris.frie...@windriver.com]
Sent: Wednesday, November 13, 2013 9:57 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [nova] Core pinning

On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:


But, from performance point of view it is better to exclusively
dedicate PCPUs for VCPUs and emulator. In some cases you may want
to guarantee that only one instance(and its VCPUs) is using certain
PCPUs.  By using core pinning you can optimize instance performance
based on e.g. cache sharing, NUMA topology, interrupt handling, pci
pass through(SR-IOV) in multi socket hosts etc.

My 2 cents. When you talking about  performance point of view, are
you talking about guest performance, or overall performance? Pin PCPU
is sure to benefit guest performance, but possibly not for overall
performance, especially if the vCPU is not consume 100% of the CPU
resources.

It can actually be both.  If a guest has several virtual cores that both
access the same memory, it can be highly beneficial all around if all
the memory/cpus for that guest come from a single NUMA node on the
host.
   That way you reduce the cross-NUMA-node memory traffic, increasing
overall efficiency.  Alternately, if a guest has several cores that use
lots of memory bandwidth but don't access the same data, you might want
to ensure that the cores are on different NUMA nodes to equalize
utilization of the different NUMA nodes.

I think the Tuomas is talking about  exclusively dedicate PCPUs for VCPUs, in 
that situation, that pCPU can't be shared by other vCPU anymore. If this vCPU like cost 
only 50% of the PCPU usage, it's sure to be a waste of the overall performance.

As to the cross NUMA node access, I'd let hypervisor, instead of cloud OS, to 
reduce the cross NUMA access as much as possible.

I'm not against such usage, it's sure to be used on data center virtualization. 
Just question if it's for cloud.



Similarly, once you start talking about doing SR-IOV networking I/O
passthrough into a guest (for SDN/NFV stuff) for optimum efficiency it
is beneficial to be able to steer interrupts on the physical host to the
specific cpus on which the guest will be running.  This implies some
form of pinning.

Still, I think hypervisor should achieve this, instead of openstack.



I think pin CPU is common to data center virtualization, but not sure
if it's in scope of cloud, which provide computing power, not
hardware resources.

And I think part of your purpose can be achieved through
https://wiki.openstack.org/wiki/CPUEntitlement and
https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I
hope a well implemented hypervisor will avoid needless vcpu migration
if the vcpu is very busy and required most of the pCPU's computing
capability (I knew Xen used to have some issue in the scheduler to
cause frequent vCPU migration long before).

I'm not sure the above stuff can be done with those.  It's not just
about quantity of resources, but also about which specific resources
will be used so that other things can be done based on that knowledge.

With the above stuff, it ensure the QoS and the compute capability for the 
guest, I think.

--jyh
  

Chris



Hi,

thank you for your comments. I am talking about guest performance. We 
are using OpenStack for managing Telco cloud applications where guest 
performance optimization is needed.
That example where pcpus are dedicated exclusively to vcpus is not a 
problem. It can be implemented with a scheduling filter, and if you 
need that feature you can enable the filter. Without it, pcpus are 
shared in the normal way.
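
For illustration only, such a filter could follow the usual nova scheduler
filter pattern of that era, roughly as below; the dedicate_pcpu extra spec and
the free_pcpus stat are hypothetical names, not the actual PoC code:

# Hypothetical sketch of a scheduler filter that only passes hosts with
# enough free pcpus for an instance requesting dedicated cores.
from nova.scheduler import filters


class DedicatedPcpuFilter(filters.BaseHostFilter):

    def host_passes(self, host_state, filter_properties):
        instance_type = filter_properties.get('instance_type') or {}
        extra_specs = instance_type.get('extra_specs', {})
        if extra_specs.get('dedicate_pcpu') != 'true':
            return True  # nothing special requested, any host is fine
        # 'free_pcpus' would have to be advertised by nova-compute,
        # e.g. as part of the host capabilities/stats.
        free_pcpus = host_state.stats.get('free_pcpus', 0)
        return int(free_pcpus) >= instance_type.get('vcpus', 0)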


As Chris said, core pinning based on e.g. NUMA topology is 
beneficial, and I think it is beneficial with or without exclusive 
dedication of pcpus.


Regards,
Tuomas



Re: [openstack-dev] [nova] Core pinning

2013-11-13 Thread Jiang, Yunhong


 -Original Message-
 From: Tuomas Paappanen [mailto:tuomas.paappa...@tieto.com]
 Sent: Wednesday, November 13, 2013 4:46 AM
 To: openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [nova] Core pinning
 
 Hi all,
 
 I would like to hear your thoughts about core pinning in OpenStack.
 Currently nova (with qemu-kvm) supports a cpu set of PCPUs that can
 be used by instances. I didn't find a blueprint, but I think this
 feature is meant to isolate the cpus used by the host from the cpus
 used by instances (VCPUs).
 
 But from a performance point of view it is better to exclusively
 dedicate PCPUs to VCPUs and the emulator. In some cases you may want
 to guarantee that only one instance (and its VCPUs) is using certain
 PCPUs. By using core pinning you can optimize instance performance
 based on e.g. cache sharing, NUMA topology, interrupt handling, PCI
 pass-through (SR-IOV) in multi-socket hosts, etc.

My 2 cents.
When you talk about the performance point of view, are you talking about 
guest performance or overall performance? Pinning PCPUs is sure to benefit guest 
performance, but possibly not overall performance, especially if the vCPU 
does not consume 100% of the CPU resources. 

I think CPU pinning is common in data center virtualization, but I am not sure 
it is in scope for cloud, which provides computing power, not hardware resources.

And I think part of your purpose can be achieved through 
https://wiki.openstack.org/wiki/CPUEntitlement and 
https://wiki.openstack.org/wiki/InstanceResourceQuota . In particular, I hope a 
well-implemented hypervisor will avoid needless vcpu migration if the vcpu is 
very busy and requires most of the pCPU's computing capability (I know Xen used 
to have an issue in its scheduler that caused frequent vCPU migration long 
ago).
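
For context, the quota-style controls referenced above end up in the same
libvirt cputune element that the pinning proposal would extend; a rough
sketch (the numbers are arbitrary examples):

# Sketch: cpu quota values (shares/period/quota) live in <cputune>, the same
# element the pinning proposal would add <vcpupin>/<emulatorpin> entries to.
from xml.etree import ElementTree as ET

cputune = ET.Element('cputune')
for tag, value in (('shares', 1024),      # relative weight vs other guests
                   ('period', 100000),    # enforcement period, microseconds
                   ('quota', 50000)):     # cap: 50% of one pcpu per vcpu
    elem = ET.SubElement(cputune, tag)
    elem.text = str(value)

print(ET.tostring(cputune).decode())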

--jyh


 
 We have already implemented a feature like this (a PoC with
 limitations) on top of the Nova Grizzly version and would like to
 hear your opinion about it.
 
 The current implementation consists of three main parts:
 - Definition of pcpu-vcpu maps for instances and instance spawning
 - (optional) Compute resource and capability advertising, including
 free pcpus and NUMA topology.
 - (optional) Scheduling based on free cpus and NUMA topology.
 
 The implementation is quite simple:
 
 (additional/optional parts)
 Nova-computes advertise free pcpus and NUMA topology in the same
 manner as host capabilities. Instances are scheduled based on this
 information.
 
 (core pinning)
 An admin can set PCPUs for VCPUs and for the emulator process, or
 select a NUMA cell for the instance vcpus, by adding key:value pairs
 to the flavor's extra specs.
 
 EXAMPLE:
 instance has 4 vcpus
 key:value
 vcpus:1,2,3,4 -- vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
 emulator:5 -- emulator pinned to pcpu5
 or
 numacell:0 -- all vcpus are pinned to pcpus in numa cell 0.
 
 In nova-compute, the core pinning information is read from the extra
 specs and added to the domain XML in the same way as the cpu quota
 values (cputune).
 
 <cputune>
    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='2'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='4'/>
    <emulatorpin cpuset='5'/>
 </cputune>
 
 What do you think? Implementation alternatives? Is this worth a
 blueprint? All related comments are welcome!
 
 Regards,
 Tuomas
 
 
 
 
 


Re: [openstack-dev] [nova] Core pinning

2013-11-13 Thread Chris Friesen

On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:


But, from performance point of view it is better to exclusively
dedicate PCPUs for VCPUs and emulator. In some cases you may want
to guarantee that only one instance(and its VCPUs) is using certain
PCPUs.  By using core pinning you can optimize instance performance
based on e.g. cache sharing, NUMA topology, interrupt handling, pci
pass through(SR-IOV) in multi socket hosts etc.


My 2 cents. When you talking about  performance point of view, are
you talking about guest performance, or overall performance? Pin PCPU
is sure to benefit guest performance, but possibly not for overall
performance, especially if the vCPU is not consume 100% of the CPU
resources.


It can actually be both.  If a guest has several virtual cores that both 
access the same memory, it can be highly beneficial all around if all 
the memory/cpus for that guest come from a single NUMA node on the host. 
 That way you reduce the cross-NUMA-node memory traffic, increasing 
overall efficiency.  Alternately, if a guest has several cores that use 
lots of memory bandwidth but don't access the same data, you might want 
to ensure that the cores are on different NUMA nodes to equalize 
utilization of the different NUMA nodes.


Similarly, once you start talking about doing SR-IOV networking I/O 
passthrough into a guest (for SDN/NFV stuff) for optimum efficiency it 
is beneficial to be able to steer interrupts on the physical host to the 
specific cpus on which the guest will be running.  This implies some 
form of pinning.
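
On Linux that kind of interrupt steering is typically done through /proc/irq;
a minimal illustrative sketch (needs root, and assumes the IRQ numbers of the
VF have already been looked up in /proc/interrupts):

# Illustrative sketch: steer a device IRQ to the cpus a guest is pinned to.
def set_irq_affinity(irq, cpulist):
    """Write a cpu list (e.g. "8-11") to the IRQ's affinity file."""
    with open('/proc/irq/%d/smp_affinity_list' % irq, 'w') as f:
        f.write(cpulist + '\n')

# e.g. steer the VF's interrupts to the pcpus dedicated to the guest
for irq in (57, 58):          # example IRQ numbers, see /proc/interrupts
    set_irq_affinity(irq, '8-11')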



I think pin CPU is common to data center virtualization, but not sure
if it's in scope of cloud, which provide computing power, not
hardware resources.

And I think part of your purpose can be achieved through
https://wiki.openstack.org/wiki/CPUEntitlement and
https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I
hope a well implemented hypervisor will avoid needless vcpu migration
if the vcpu is very busy and required most of the pCPU's computing
capability (I knew Xen used to have some issue in the scheduler to
cause frequent vCPU migration long before).


I'm not sure the above stuff can be done with those.  It's not just 
about quantity of resources, but also about which specific resources 
will be used so that other things can be done based on that knowledge.


Chris



Re: [openstack-dev] [nova] Core pinning

2013-11-13 Thread Jiang, Yunhong


 -Original Message-
 From: Chris Friesen [mailto:chris.frie...@windriver.com]
 Sent: Wednesday, November 13, 2013 9:57 AM
 To: openstack-dev@lists.openstack.org
 Subject: Re: [openstack-dev] [nova] Core pinning
 
 On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:
 
  But, from performance point of view it is better to exclusively
  dedicate PCPUs for VCPUs and emulator. In some cases you may want
  to guarantee that only one instance(and its VCPUs) is using certain
  PCPUs.  By using core pinning you can optimize instance performance
  based on e.g. cache sharing, NUMA topology, interrupt handling, pci
  pass through(SR-IOV) in multi socket hosts etc.
 
  My 2 cents. When you talking about  performance point of view, are
  you talking about guest performance, or overall performance? Pin PCPU
  is sure to benefit guest performance, but possibly not for overall
  performance, especially if the vCPU is not consume 100% of the CPU
  resources.
 
 It can actually be both.  If a guest has several virtual cores that both
 access the same memory, it can be highly beneficial all around if all
 the memory/cpus for that guest come from a single NUMA node on the
 host.
   That way you reduce the cross-NUMA-node memory traffic, increasing
 overall efficiency.  Alternately, if a guest has several cores that use
 lots of memory bandwidth but don't access the same data, you might want
 to ensure that the cores are on different NUMA nodes to equalize
 utilization of the different NUMA nodes.

I think Tuomas is talking about exclusively dedicating PCPUs to VCPUs; in 
that situation the pCPU can't be shared by any other vCPU anymore. If the vCPU 
only costs, say, 50% of the PCPU, that is sure to be a waste of overall 
performance. 

As to cross-NUMA-node access, I'd let the hypervisor, instead of the cloud OS, 
reduce the cross-NUMA access as much as possible.

I'm not against such usage; it's sure to be used in data center virtualization. 
I just question whether it's for cloud.


 
 Similarly, once you start talking about doing SR-IOV networking I/O
 passthrough into a guest (for SDN/NFV stuff) for optimum efficiency it
 is beneficial to be able to steer interrupts on the physical host to the
 specific cpus on which the guest will be running.  This implies some
 form of pinning.

Still, I think the hypervisor should achieve this, instead of OpenStack.


 
  I think pin CPU is common to data center virtualization, but not sure
  if it's in scope of cloud, which provide computing power, not
  hardware resources.
 
  And I think part of your purpose can be achieved through
  https://wiki.openstack.org/wiki/CPUEntitlement and
  https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I
  hope a well implemented hypervisor will avoid needless vcpu migration
  if the vcpu is very busy and required most of the pCPU's computing
  capability (I knew Xen used to have some issue in the scheduler to
  cause frequent vCPU migration long before).
 
 I'm not sure the above stuff can be done with those.  It's not just
 about quantity of resources, but also about which specific resources
 will be used so that other things can be done based on that knowledge.

With the above stuff, it ensures the QoS and the compute capability for the 
guest, I think.

--jyh
 
 
 Chris
 