Re: [openstack-dev] [nova] Core pinning
On Wed, Nov 27, 2013 at 03:50:47PM +0200, Tuomas Paappanen wrote:
> > On Tue, 2013-11-19 at 12:52 +0000, Daniel P. Berrange wrote:
> > > I think there are several use cases mixed up in your descriptions
> > > here which should likely be considered independently:
> > >
> > >  - pCPU/vCPU pinning
> > >
> > >    I don't really think this is a good idea as a general purpose
> > >    feature in its own right. It tends to lead to fairly inefficient
> > >    use of CPU resources when you consider that a large % of guests
> > >    will be mostly idle most of the time. It also carries a fairly
> > >    high administrative burden to maintain explicit pinning. This
> > >    feels like a data center virt use case rather than a cloud use
> > >    case really.
> > >
> > >  - Dedicated CPU reservation
> > >
> > >    The ability of an end user to request that their VM (or their
> > >    group of VMs) gets assigned a dedicated host CPU set to run on.
> > >    This is obviously something that would have to be controlled
> > >    at a flavour level, and in a commercial deployment would carry
> > >    a hefty pricing premium.
> > >
> > >    I don't think you want to expose explicit pCPU/vCPU placement
> > >    for this though. Just request the high level concept and allow
> > >    the virt host to decide actual placement.
>
> I think pcpu/vcpu pinning could be considered an extension of the
> dedicated cpu reservation feature. And I agree that if we exclusively
> dedicate pcpus for VMs it is inefficient from a cloud point of view,
> but in some cases an end user may want to be sure (and be ready to
> pay) that their VMs have resources available, e.g. for sudden load
> peaks.
>
> So, here is my proposal for how dedicated cpu reservation would
> function at a high level:
>
> When an end user wants a VM with nn vcpus running on a dedicated host
> cpu set, the admin could enable it by setting a new "dedicate_pcpu"
> parameter in a flavor (e.g. an optional flavor parameter). By
> default, the number of pcpus and vcpus could be the same. As an
> option, explicit vcpu/pcpu pinning could be done by defining
> vcpu/pcpu relations in the flavor's extra specs (vcpupin:0 0 ...).
>
> In the virt driver there are two alternatives for how to do the pcpu
> sharing: 1. all dedicated pcpus are shared by all vcpus (default
> case), or 2. each vcpu has a dedicated pcpu (vcpu 0 is pinned to the
> first pcpu in the cpu set, vcpu 1 to the second pcpu, and so on). The
> vcpu/pcpu pinning option could be used to extend the latter case.
>
> In any case, before a VM with or without dedicated pcpus is launched,
> the virt driver must take care that the dedicated pcpus are excluded
> from existing and new VMs and that there are enough free pcpus for
> placement. And I think the minimum number of pcpus for VMs without
> dedicated pcpus must be configurable somewhere.
>
> Comments?

I still don't believe that vcpu:pcpu pinning is something we want to
do, even with dedicated CPUs. There are always threads in the host
doing work on behalf of the VM that are not related to vCPUs: for
example the main QEMU emulator thread, the QEMU I/O threads, and
kernel threads. Other hypervisors have similar behaviour. It is better
to let the kernel / hypervisor scheduler decide how to balance the
competing workloads than to force a fixed and suboptimally performing
vcpu:pcpu mapping.

The only time I've seen fixed pinning give a consistent benefit is
when NUMA is involved and you want to prevent a VM spanning NUMA
nodes. Even then you'd be best off pinning to the set of CPUs in a
node and then letting the vCPUs float amongst the pCPUs in that node.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
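The two pCPU-sharing alternatives in the proposal above (all dedicated pCPUs shared by all vCPUs, versus a 1:1 vCPU:pCPU mapping) can be sketched as a small helper. This is an illustrative sketch only, not Nova code; the function name `plan_pinning` and the returned dict shape are invented for the example, and the pCPUs are assumed to have already been reserved by the virt driver.

```python
def plan_pinning(pcpus, n_vcpus, per_vcpu=False):
    """Return a {vcpu: set_of_pcpus} pinning plan (hypothetical helper).

    pcpus: list of host CPUs dedicated to this instance.
    per_vcpu=False -> mode 1: every vCPU may float over all dedicated pCPUs.
    per_vcpu=True  -> mode 2: vCPU 0 pinned to the first pCPU in the set,
                      vCPU 1 to the second, and so on.
    """
    if not per_vcpu:
        # Mode 1 (default): all dedicated pCPUs shared by all vCPUs.
        return {v: set(pcpus) for v in range(n_vcpus)}
    # Mode 2 requires at least one dedicated pCPU per vCPU.
    if n_vcpus > len(pcpus):
        raise ValueError("not enough dedicated pCPUs for 1:1 pinning")
    return {v: {pcpus[v]} for v in range(n_vcpus)}
```

Mode 1 corresponds to the "dedicated CPU reservation" concept Daniel describes (the set is exclusive to the instance, but placement inside it is left to the scheduler); mode 2 is the explicit pinning he argues against.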
Re: [openstack-dev] [nova] Core pinning
On 19.11.2013 20:18, yunhong jiang wrote:
> On Tue, 2013-11-19 at 12:52 +0000, Daniel P. Berrange wrote:
>> On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
>>> Hi all,
>>>
>>> I would like to hear your thoughts about core pinning in Openstack.
>>> Currently nova (with qemu-kvm) supports usage of a cpu set of pCPUs
>>> that can be used by instances. I didn't find a blueprint, but I
>>> think this feature is for isolating the cpus used by the host from
>>> the cpus used by instances (vCPUs).
>>>
>>> But, from a performance point of view it is better to exclusively
>>> dedicate pCPUs for vCPUs and the emulator. In some cases you may
>>> want to guarantee that only one instance (and its vCPUs) is using
>>> certain pCPUs. By using core pinning you can optimize instance
>>> performance based on e.g. cache sharing, NUMA topology, interrupt
>>> handling, pci passthrough (SR-IOV) in multi socket hosts, etc.
>>>
>>> We have already implemented a feature like this (a PoC with
>>> limitations) on the Nova Grizzly version and would like to hear
>>> your opinion about it.
>>>
>>> The current implementation consists of three main parts:
>>> - Definition of pcpu-vcpu maps for instances and instance spawning
>>> - (optional) Compute resource and capability advertising including
>>>   free pcpus and NUMA topology.
>>> - (optional) Scheduling based on free cpus and NUMA topology.
>>>
>>> The implementation is quite simple:
>>>
>>> (additional/optional parts)
>>> Nova-computes advertise free pcpus and NUMA topology in the same
>>> manner as host capabilities. Instances are scheduled based on this
>>> information.
>>>
>>> (core pinning)
>>> The admin can set pCPUs for vCPUs and for the emulator process, or
>>> select a NUMA cell for instance vcpus, by adding key:value pairs to
>>> the flavor's extra specs.
>>>
>>> EXAMPLE:
>>> instance has 4 vcpus
>>> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
>>> emulator:5 --> emulator pinned to pcpu5
>>> or
>>> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>>>
>>> In nova-compute, core pinning information is read from the extra
>>> specs and added to the domain xml the same way as cpu quota values
>>> (cputune).
>>>
>>> What do you think? Implementation alternatives? Is this worth a
>>> blueprint? All related comments are welcome!
>>
>> I think there are several use cases mixed up in your descriptions
>> here which should likely be considered independently:
>>
>>  - pCPU/vCPU pinning
>>
>>    I don't really think this is a good idea as a general purpose
>>    feature in its own right. It tends to lead to fairly inefficient
>>    use of CPU resources when you consider that a large % of guests
>>    will be mostly idle most of the time. It also carries a fairly
>>    high administrative burden to maintain explicit pinning. This
>>    feels like a data center virt use case rather than a cloud use
>>    case really.
>>
>>  - Dedicated CPU reservation
>>
>>    The ability of an end user to request that their VM (or their
>>    group of VMs) gets assigned a dedicated host CPU set to run on.
>>    This is obviously something that would have to be controlled
>>    at a flavour level, and in a commercial deployment would carry
>>    a hefty pricing premium.
>>
>>    I don't think you want to expose explicit pCPU/vCPU placement
>>    for this though. Just request the high level concept and allow
>>    the virt host to decide actual placement.

I think pcpu/vcpu pinning could be considered an extension of the
dedicated cpu reservation feature. And I agree that if we exclusively
dedicate pcpus for VMs it is inefficient from a cloud point of view,
but in some cases an end user may want to be sure (and be ready to
pay) that their VMs have resources available, e.g. for sudden load
peaks.

So, here is my proposal for how dedicated cpu reservation would
function at a high level:

When an end user wants a VM with nn vcpus running on a dedicated host
cpu set, the admin could enable it by setting a new "dedicate_pcpu"
parameter in a flavor (e.g. an optional flavor parameter). By default,
the number of pcpus and vcpus could be the same. As an option,
explicit vcpu/pcpu pinning could be done by defining vcpu/pcpu
relations in the flavor's extra specs (vcpupin:0 0 ...).

In the virt driver there are two alternatives for how to do the pcpu
sharing: 1. all dedicated pcpus are shared by all vcpus (default
case), or 2. each vcpu has a dedicated pcpu (vcpu 0 is pinned to the
first pcpu in the cpu set, vcpu 1 to the second pcpu, and so on). The
vcpu/pcpu pinning option could be used to extend the latter case.

In any case, before a VM with or without dedicated pcpus is launched,
the virt driver must take care that the dedicated pcpus are excluded
from existing and new VMs and that there are enough free pcpus for
placement. And I think the minimum number of pcpus for VMs without
dedicated pcpus must be configurable somewhere.

Comments?

Br,
Tuomas

>>  - Host NUMA placement.
>>
>>    By not taking NUMA into account, the libvirt driver at least is
>>    currently badly wasting resources. Having too much cross-NUMA-node
>>    memory access by guests just kills scalability. The virt driver
>>    should really figure out cpu & memory pinning within the scope of
>>    a NUMA node automatically.
Re: [openstack-dev] [nova] Core pinning
Tuomas,

> I haven't but I will write a blueprint for the core pinning part.

Can't wait to see it!

> Are you using extra specs for carrying cpuset attributes in your
> implementation?

Yes, exactly. Although we're using slightly different syntax to update
the flavor, for example:

$ nova flavor-key <flavor> set vcpupin:0=1-5,12-17

Here '0' is the vCPU, and '1-5,12-17' the pCPUs. Basically this
command results in the following libvirt xml:

<cputune>
  <vcpupin vcpu='0' cpuset='1-5,12-17'/>
</cputune>

We're also using the 'placement' attribute of <vcpu> set to 'static':

$ nova flavor-key <flavor> set vcpu:placement=static

Which results in the following libvirt xml:

<vcpu placement='static'>...</vcpu>

Otherwise, the functionality and implementation seem to be identical.

[offtopic] Apologies for the delayed answer; for some reason I thought
your email would arrive in my personal mailbox [/offtopic]

-Roman

On Nov 19, 2013, at 14:35, Tuomas Paappanen wrote:
> Hi Roman,
>
> I haven't, but I will write a blueprint for the core pinning part.
> I considered vcpu element usage as well, but in that case you cannot
> set e.g. vcpu-0 to run on pcpu-0. The vCPUs and the emulator share
> all pcpus defined in cpuset, so I decided to use the cputune element.
>
> Are you using extra specs for carrying cpuset attributes in your
> implementation?
>
> Br, Tuomas
>
> On 18.11.2013 17:14, Roman Verchikov wrote:
>> Tuomas,
>>
>> Have you published your code/blueprints anywhere? Looks like we're
>> working on the same stuff. I have implemented almost the same
>> feature set (haven't published anything yet because of this thread),
>> except for the scheduler part. The main goal is to be able to pin
>> vCPUs in a NUMA environment.
>>
>> Have you considered adding placement and cpuset attributes to the
>> <vcpu> element? For example:
>>
>> Thanks,
>> Roman
>>
>> On Nov 13, 2013, at 14:46, Tuomas Paappanen wrote:
>>
>>> Hi all,
>>>
>>> I would like to hear your thoughts about core pinning in Openstack.
>>> Currently nova (with qemu-kvm) supports usage of a cpu set of pCPUs
>>> that can be used by instances.
I didn't find a blueprint, but I think this feature is
>>> for isolating the cpus used by the host from the cpus used by
>>> instances (vCPUs).
>>>
>>> But, from a performance point of view it is better to exclusively
>>> dedicate pCPUs for vCPUs and the emulator. In some cases you may
>>> want to guarantee that only one instance (and its vCPUs) is using
>>> certain pCPUs. By using core pinning you can optimize instance
>>> performance based on e.g. cache sharing, NUMA topology, interrupt
>>> handling, pci passthrough (SR-IOV) in multi socket hosts, etc.
>>>
>>> We have already implemented a feature like this (a PoC with
>>> limitations) on the Nova Grizzly version and would like to hear
>>> your opinion about it.
>>>
>>> The current implementation consists of three main parts:
>>> - Definition of pcpu-vcpu maps for instances and instance spawning
>>> - (optional) Compute resource and capability advertising including
>>>   free pcpus and NUMA topology.
>>> - (optional) Scheduling based on free cpus and NUMA topology.
>>>
>>> The implementation is quite simple:
>>>
>>> (additional/optional parts)
>>> Nova-computes advertise free pcpus and NUMA topology in the same
>>> manner as host capabilities. Instances are scheduled based on this
>>> information.
>>>
>>> (core pinning)
>>> The admin can set pCPUs for vCPUs and for the emulator process, or
>>> select a NUMA cell for instance vcpus, by adding key:value pairs to
>>> the flavor's extra specs.
>>>
>>> EXAMPLE:
>>> instance has 4 vcpus
>>> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
>>> emulator:5 --> emulator pinned to pcpu5
>>> or
>>> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>>>
>>> In nova-compute, core pinning information is read from the extra
>>> specs and added to the domain xml the same way as cpu quota values
>>> (cputune).
>>>
>>> What do you think? Implementation alternatives? Is this worth a
>>> blueprint? All related comments are welcome!
>>> Regards,
>>> Tuomas
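For illustration, the `vcpupin:0=1-5,12-17` flavor extra spec Roman describes above could be turned into libvirt-style cputune XML roughly as follows. This is a hedged sketch using only the Python standard library; `parse_cpuset` and `cputune_xml` are hypothetical helper names for this example, not Nova internals.

```python
import xml.etree.ElementTree as ET


def parse_cpuset(spec):
    """Expand a cpuset string such as '1-5,12-17' into a sorted CPU list."""
    cpus = set()
    for part in spec.split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cputune_xml(extra_specs):
    """Build a <cputune> element from 'vcpupin:N' flavor extra specs."""
    cputune = ET.Element('cputune')
    for key, value in sorted(extra_specs.items()):
        if key.startswith('vcpupin:'):
            vcpu = key.split(':', 1)[1]
            # One <vcpupin vcpu='N' cpuset='...'/> child per pinned vCPU.
            ET.SubElement(cputune, 'vcpupin',
                          {'vcpu': vcpu, 'cpuset': value})
    return ET.tostring(cputune, encoding='unicode')
```

Usage would be e.g. `cputune_xml({'vcpupin:0': '1-5,12-17'})`, matching the shape of libvirt's cputune/vcpupin elements.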
Re: [openstack-dev] [nova] Core pinning
On Tue, 2013-11-19 at 12:52 +0000, Daniel P. Berrange wrote:
> On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
> > Hi all,
> >
> > I would like to hear your thoughts about core pinning in Openstack.
> > Currently nova (with qemu-kvm) supports usage of a cpu set of pCPUs
> > that can be used by instances. I didn't find a blueprint, but I
> > think this feature is for isolating the cpus used by the host from
> > the cpus used by instances (vCPUs).
> >
> > But, from a performance point of view it is better to exclusively
> > dedicate pCPUs for vCPUs and the emulator. In some cases you may
> > want to guarantee that only one instance (and its vCPUs) is using
> > certain pCPUs. By using core pinning you can optimize instance
> > performance based on e.g. cache sharing, NUMA topology, interrupt
> > handling, pci passthrough (SR-IOV) in multi socket hosts, etc.
> >
> > We have already implemented a feature like this (a PoC with
> > limitations) on the Nova Grizzly version and would like to hear
> > your opinion about it.
> >
> > The current implementation consists of three main parts:
> > - Definition of pcpu-vcpu maps for instances and instance spawning
> > - (optional) Compute resource and capability advertising including
> >   free pcpus and NUMA topology.
> > - (optional) Scheduling based on free cpus and NUMA topology.
> >
> > The implementation is quite simple:
> >
> > (additional/optional parts)
> > Nova-computes advertise free pcpus and NUMA topology in the same
> > manner as host capabilities. Instances are scheduled based on this
> > information.
> >
> > (core pinning)
> > The admin can set pCPUs for vCPUs and for the emulator process, or
> > select a NUMA cell for instance vcpus, by adding key:value pairs to
> > the flavor's extra specs.
> >
> > EXAMPLE:
> > instance has 4 vcpus
> > vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> > emulator:5 --> emulator pinned to pcpu5
> > or
> > numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
> >
> > In nova-compute, core pinning information is read from the extra
> > specs and added to the domain xml the same way as cpu quota values
> > (cputune).
> >
> > What do you think? Implementation alternatives? Is this worth a
> > blueprint? All related comments are welcome!
>
> I think there are several use cases mixed up in your descriptions
> here which should likely be considered independently:
>
>  - pCPU/vCPU pinning
>
>    I don't really think this is a good idea as a general purpose
>    feature in its own right. It tends to lead to fairly inefficient
>    use of CPU resources when you consider that a large % of guests
>    will be mostly idle most of the time. It also carries a fairly
>    high administrative burden to maintain explicit pinning. This
>    feels like a data center virt use case rather than a cloud use
>    case really.
>
>  - Dedicated CPU reservation
>
>    The ability of an end user to request that their VM (or their
>    group of VMs) gets assigned a dedicated host CPU set to run on.
>    This is obviously something that would have to be controlled
>    at a flavour level, and in a commercial deployment would carry
>    a hefty pricing premium.
>
>    I don't think you want to expose explicit pCPU/vCPU placement
>    for this though. Just request the high level concept and allow
>    the virt host to decide actual placement.
>
>  - Host NUMA placement.
>
>    By not taking NUMA into account, the libvirt driver at least is
>    currently badly wasting resources. Having too much cross-NUMA-node
>    memory access by guests just kills scalability. The virt driver
>    should really figure out cpu & memory pinning within the scope of
>    a NUMA node automatically. No admin config should be required for
>    this.
>
>  - Guest NUMA topology
>
>    If the flavour memory size / cpu count exceeds the size of a
>    single NUMA node, then the flavour should likely have a way to
>    express that the guest should see multiple NUMA nodes. The virt
>    host would then set guest NUMA topology to match the way it
>    places vCPUs & memory on host NUMA nodes. Again you don't want
>    explicit pcpu/vcpu mapping done by the admin for this.
>
> Regards,
> Daniel

Quite clear splitting, and +1 for the P/V pin option.

--jyh
Re: [openstack-dev] [nova] Core pinning
On Wed, Nov 13, 2013 at 11:57:22AM -0600, Chris Friesen wrote: > On 11/13/2013 11:40 AM, Jiang, Yunhong wrote: > > >>But, from performance point of view it is better to exclusively > >>dedicate PCPUs for VCPUs and emulator. In some cases you may want > >>to guarantee that only one instance(and its VCPUs) is using certain > >>PCPUs. By using core pinning you can optimize instance performance > >>based on e.g. cache sharing, NUMA topology, interrupt handling, pci > >>pass through(SR-IOV) in multi socket hosts etc. > > > >My 2 cents. When you talking about " performance point of view", are > >you talking about guest performance, or overall performance? Pin PCPU > >is sure to benefit guest performance, but possibly not for overall > >performance, especially if the vCPU is not consume 100% of the CPU > >resources. > > It can actually be both. If a guest has several virtual cores that > both access the same memory, it can be highly beneficial all around > if all the memory/cpus for that guest come from a single NUMA node > on the host. That way you reduce the cross-NUMA-node memory > traffic, increasing overall efficiency. Alternately, if a guest has > several cores that use lots of memory bandwidth but don't access the > same data, you might want to ensure that the cores are on different > NUMA nodes to equalize utilization of the different NUMA nodes. > > Similarly, once you start talking about doing SR-IOV networking I/O > passthrough into a guest (for SDN/NFV stuff) for optimum efficiency > it is beneficial to be able to steer interrupts on the physical host > to the specific cpus on which the guest will be running. This > implies some form of pinning. I would say intelligent NUMA placement is something that virt drivers should address automatically without any need for admin defined pinning. The latter is just imposing too much admin burden, for something we can figure out automatically to a good enough extent. 
Daniel
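Daniel's "figure it out automatically" point could look roughly like this on the virt-driver side: pick a single host NUMA node that can hold the whole guest, with no admin input. This is a sketch under assumed data structures; the `nodes` dict fields (`id`, `free_cpus`, `free_mem_mb`) and the function name are invented for the example, not actual libvirt-driver code.

```python
def pick_numa_node(nodes, vcpus, mem_mb):
    """Pick a host NUMA node that can hold the whole guest, if any.

    nodes: list of dicts like {'id': 0, 'free_cpus': [...], 'free_mem_mb': N}
    Returns the chosen node id, or None if the guest cannot fit in a
    single node (in which case placement would have to span nodes).
    Prefers the node with the most free memory, to keep nodes balanced.
    """
    fits = [n for n in nodes
            if len(n['free_cpus']) >= vcpus and n['free_mem_mb'] >= mem_mb]
    if not fits:
        return None
    best = max(fits, key=lambda n: (n['free_mem_mb'], len(n['free_cpus'])))
    return best['id']
```

The guest's vCPUs would then be pinned to the chosen node's whole CPU set and left to float inside it, which is exactly the pattern Daniel describes as the one case where pinning consistently helps.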
Re: [openstack-dev] [nova] Core pinning
On Wed, Nov 13, 2013 at 02:46:06PM +0200, Tuomas Paappanen wrote:
> Hi all,
>
> I would like to hear your thoughts about core pinning in Openstack.
> Currently nova (with qemu-kvm) supports usage of a cpu set of pCPUs
> that can be used by instances. I didn't find a blueprint, but I think
> this feature is for isolating the cpus used by the host from the cpus
> used by instances (vCPUs).
>
> But, from a performance point of view it is better to exclusively
> dedicate pCPUs for vCPUs and the emulator. In some cases you may want
> to guarantee that only one instance (and its vCPUs) is using certain
> pCPUs. By using core pinning you can optimize instance performance
> based on e.g. cache sharing, NUMA topology, interrupt handling, pci
> passthrough (SR-IOV) in multi socket hosts, etc.
>
> We have already implemented a feature like this (a PoC with
> limitations) on the Nova Grizzly version and would like to hear your
> opinion about it.
>
> The current implementation consists of three main parts:
> - Definition of pcpu-vcpu maps for instances and instance spawning
> - (optional) Compute resource and capability advertising including
>   free pcpus and NUMA topology.
> - (optional) Scheduling based on free cpus and NUMA topology.
>
> The implementation is quite simple:
>
> (additional/optional parts)
> Nova-computes advertise free pcpus and NUMA topology in the same
> manner as host capabilities. Instances are scheduled based on this
> information.
>
> (core pinning)
> The admin can set pCPUs for vCPUs and for the emulator process, or
> select a NUMA cell for instance vcpus, by adding key:value pairs to
> the flavor's extra specs.
>
> EXAMPLE:
> instance has 4 vcpus
> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> emulator:5 --> emulator pinned to pcpu5
> or
> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>
> In nova-compute, core pinning information is read from the extra
> specs and added to the domain xml the same way as cpu quota values
> (cputune).
>
> What do you think? Implementation alternatives? Is this worth a
> blueprint? All related comments are welcome!

I think there are several use cases mixed up in your descriptions
here which should likely be considered independently:

 - pCPU/vCPU pinning

   I don't really think this is a good idea as a general purpose
   feature in its own right. It tends to lead to fairly inefficient
   use of CPU resources when you consider that a large % of guests
   will be mostly idle most of the time. It also carries a fairly high
   administrative burden to maintain explicit pinning. This feels like
   a data center virt use case rather than a cloud use case really.

 - Dedicated CPU reservation

   The ability of an end user to request that their VM (or their group
   of VMs) gets assigned a dedicated host CPU set to run on. This is
   obviously something that would have to be controlled at a flavour
   level, and in a commercial deployment would carry a hefty pricing
   premium.

   I don't think you want to expose explicit pCPU/vCPU placement for
   this though. Just request the high level concept and allow the virt
   host to decide actual placement.

 - Host NUMA placement.

   By not taking NUMA into account, the libvirt driver at least is
   currently badly wasting resources. Having too much cross-NUMA-node
   memory access by guests just kills scalability. The virt driver
   should really figure out cpu & memory pinning within the scope of a
   NUMA node automatically. No admin config should be required for
   this.

 - Guest NUMA topology

   If the flavour memory size / cpu count exceeds the size of a single
   NUMA node, then the flavour should likely have a way to express
   that the guest should see multiple NUMA nodes. The virt host would
   then set guest NUMA topology to match the way it places vCPUs &
   memory on host NUMA nodes. Again you don't want explicit pcpu/vcpu
   mapping done by the admin for this.

Regards,
Daniel
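As a sketch, the flavor extra-spec keys from the EXAMPLE above (`vcpus`, `emulator`, `numacell`) could be parsed into a pinning plan like this. The function name and the returned dict shape are hypothetical, invented for illustration; this is not the PoC's actual code.

```python
def pinning_from_extra_specs(extra_specs):
    """Interpret the example extra-spec keys from the proposal.

    'vcpus':    comma-separated pCPU list; vcpu0 -> first entry, etc.
    'emulator': pCPU for the emulator process.
    'numacell': NUMA cell to which all vCPUs are confined.
    """
    plan = {}
    if 'vcpus' in extra_specs:
        pcpus = [int(p) for p in extra_specs['vcpus'].split(',')]
        # vcpus:1,2,3,4 --> {0: 1, 1: 2, 2: 3, 3: 4}
        plan['vcpupin'] = {i: p for i, p in enumerate(pcpus)}
    if 'emulator' in extra_specs:
        plan['emulatorpin'] = int(extra_specs['emulator'])
    if 'numacell' in extra_specs:
        plan['numacell'] = int(extra_specs['numacell'])
    return plan
```

The resulting plan is what would then be rendered into the domain xml's cputune section, as the proposal describes.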
Re: [openstack-dev] [nova] Core pinning
Hi Roman,

I haven't, but I will write a blueprint for the core pinning part. I
considered vcpu element usage as well, but in that case you cannot set
e.g. vcpu-0 to run on pcpu-0. The vCPUs and the emulator share all
pcpus defined in cpuset, so I decided to use the cputune element.

Are you using extra specs for carrying cpuset attributes in your
implementation?

Br,
Tuomas

On 18.11.2013 17:14, Roman Verchikov wrote:
> Tuomas,
>
> Have you published your code/blueprints anywhere? Looks like we're
> working on the same stuff. I have implemented almost the same feature
> set (haven't published anything yet because of this thread), except
> for the scheduler part. The main goal is to be able to pin vCPUs in a
> NUMA environment.
>
> Have you considered adding placement and cpuset attributes to the
> <vcpu> element? For example:
>
> Thanks,
> Roman
>
> On Nov 13, 2013, at 14:46, Tuomas Paappanen wrote:
>> Hi all,
>>
>> I would like to hear your thoughts about core pinning in Openstack.
>> Currently nova (with qemu-kvm) supports usage of a cpu set of pCPUs
>> that can be used by instances. I didn't find a blueprint, but I
>> think this feature is for isolating the cpus used by the host from
>> the cpus used by instances (vCPUs).
>>
>> But, from a performance point of view it is better to exclusively
>> dedicate pCPUs for vCPUs and the emulator. In some cases you may
>> want to guarantee that only one instance (and its vCPUs) is using
>> certain pCPUs. By using core pinning you can optimize instance
>> performance based on e.g. cache sharing, NUMA topology, interrupt
>> handling, pci passthrough (SR-IOV) in multi socket hosts, etc.
>>
>> We have already implemented a feature like this (a PoC with
>> limitations) on the Nova Grizzly version and would like to hear your
>> opinion about it.
>>
>> The current implementation consists of three main parts:
>> - Definition of pcpu-vcpu maps for instances and instance spawning
>> - (optional) Compute resource and capability advertising including
>>   free pcpus and NUMA topology.
>> - (optional) Scheduling based on free cpus and NUMA topology.
>>
>> The implementation is quite simple:
>>
>> (additional/optional parts)
>> Nova-computes advertise free pcpus and NUMA topology in the same
>> manner as host capabilities. Instances are scheduled based on this
>> information.
>>
>> (core pinning)
>> The admin can set pCPUs for vCPUs and for the emulator process, or
>> select a NUMA cell for instance vcpus, by adding key:value pairs to
>> the flavor's extra specs.
>>
>> EXAMPLE:
>> instance has 4 vcpus
>> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
>> emulator:5 --> emulator pinned to pcpu5
>> or
>> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>>
>> In nova-compute, core pinning information is read from the extra
>> specs and added to the domain xml the same way as cpu quota values
>> (cputune).
>>
>> What do you think? Implementation alternatives? Is this worth a
>> blueprint? All related comments are welcome!
>>
>> Regards,
>> Tuomas
Re: [openstack-dev] [nova] Core pinning
Tuomas,

Have you published your code/blueprints anywhere? Looks like we're
working on the same stuff. I have implemented almost the same feature
set (haven't published anything yet because of this thread), except
for the scheduler part. The main goal is to be able to pin vCPUs in a
NUMA environment.

Have you considered adding placement and cpuset attributes to the
<vcpu> element? For example:

Thanks,
Roman

On Nov 13, 2013, at 14:46, Tuomas Paappanen wrote:
> Hi all,
>
> I would like to hear your thoughts about core pinning in Openstack.
> Currently nova (with qemu-kvm) supports usage of a cpu set of pCPUs
> that can be used by instances. I didn't find a blueprint, but I think
> this feature is for isolating the cpus used by the host from the cpus
> used by instances (vCPUs).
>
> But, from a performance point of view it is better to exclusively
> dedicate pCPUs for vCPUs and the emulator. In some cases you may want
> to guarantee that only one instance (and its vCPUs) is using certain
> pCPUs. By using core pinning you can optimize instance performance
> based on e.g. cache sharing, NUMA topology, interrupt handling, pci
> passthrough (SR-IOV) in multi socket hosts, etc.
>
> We have already implemented a feature like this (a PoC with
> limitations) on the Nova Grizzly version and would like to hear your
> opinion about it.
>
> The current implementation consists of three main parts:
> - Definition of pcpu-vcpu maps for instances and instance spawning
> - (optional) Compute resource and capability advertising including
>   free pcpus and NUMA topology.
> - (optional) Scheduling based on free cpus and NUMA topology.
>
> The implementation is quite simple:
>
> (additional/optional parts)
> Nova-computes advertise free pcpus and NUMA topology in the same
> manner as host capabilities. Instances are scheduled based on this
> information.
>
> (core pinning)
> The admin can set pCPUs for vCPUs and for the emulator process, or
> select a NUMA cell for instance vcpus, by adding key:value pairs to
> the flavor's extra specs.
> EXAMPLE:
> instance has 4 vcpus
> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> emulator:5 --> emulator pinned to pcpu5
> or
> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>
> In nova-compute, core pinning information is read from the extra
> specs and added to the domain xml the same way as cpu quota values
> (cputune).
>
> What do you think? Implementation alternatives? Is this worth a
> blueprint? All related comments are welcome!
>
> Regards,
> Tuomas
Re: [openstack-dev] [nova] Core pinning
Hi,

The use cases for CPU pinning are exactly as discussed above: (1)
lowering guest scheduling latencies and (2) improving networking
latencies by pinning the SR-IOV IRQs to specific cores. There is also
a third use case, (3) avoiding long latencies with spinlocks.

> On Wed, Nov 13, 2013 at 8:20 PM, Jiang, Yunhong wrote:
>
>> Similarly, once you start talking about doing SR-IOV networking I/O
>> passthrough into a guest (for SDN/NFV stuff), for optimum efficiency
>> it is beneficial to be able to steer interrupts on the physical host
>> to the specific cpus on which the guest will be running. This
>> implies some form of pinning.
>
> Still, I think the hypervisor should achieve this, instead of
> openstack.

How would this work? As a solution, this would be much better, since
then OpenStack would have much less low-level work to do.

-Tapio
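Use case (2), steering SR-IOV IRQs, is typically done on Linux by writing a hex CPU bitmask to `/proc/irq/<N>/smp_affinity`. Here is a minimal sketch; the helper names are invented for the example, and `steer_irq` requires root and a real IRQ number, so only the mask computation is meant to be exercised directly.

```python
def cpu_list_to_affinity_mask(cpus):
    """Hex bitmask string (as used by /proc/irq/<N>/smp_affinity)
    for a list of CPU numbers, e.g. [1, 3] -> 'a'."""
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return format(mask, 'x')


def steer_irq(irq, cpus):
    """Pin a host IRQ to the given CPUs (requires root; illustrative only)."""
    with open('/proc/irq/%d/smp_affinity' % irq, 'w') as f:
        f.write(cpu_list_to_affinity_mask(cpus))
```

With the guest's pCPU set known, steering the VF's IRQs onto those same cores is what keeps the interrupt handling local to the guest, as discussed in the thread.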
Re: [openstack-dev] [nova] Core pinning
On 13.11.2013 20:20, Jiang, Yunhong wrote:

> -----Original Message-----
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Wednesday, November 13, 2013 9:57 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Core pinning
>
> On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:
>
>>> But, from a performance point of view it is better to exclusively
>>> dedicate pCPUs for vCPUs and the emulator. In some cases you may
>>> want to guarantee that only one instance (and its vCPUs) is using
>>> certain pCPUs. By using core pinning you can optimize instance
>>> performance based on e.g. cache sharing, NUMA topology, interrupt
>>> handling, pci passthrough (SR-IOV) in multi socket hosts, etc.
>>
>> My 2 cents. When you talk about "performance point of view", are you
>> talking about guest performance, or overall performance? Pinning
>> pCPUs is sure to benefit guest performance, but possibly not overall
>> performance, especially if the vCPU does not consume 100% of the CPU
>> resources.
>
> It can actually be both. If a guest has several virtual cores that
> access the same memory, it can be highly beneficial all around if all
> the memory/cpus for that guest come from a single NUMA node on the
> host. That way you reduce the cross-NUMA-node memory traffic,
> increasing overall efficiency. Alternately, if a guest has several
> cores that use lots of memory bandwidth but don't access the same
> data, you might want to ensure that the cores are on different NUMA
> nodes to equalize utilization of the different NUMA nodes.

I think Tuomas is talking about "exclusively dedicating pCPUs for
vCPUs"; in that situation, a pCPU can't be shared by other vCPUs any
more. If a vCPU only costs 50% of the pCPU's usage, that is sure to be
a waste of overall performance. As to the cross NUMA node access, I'd
let the hypervisor, instead of the cloud OS, reduce cross NUMA access
as much as possible. I'm not against such usage; it's sure to be used
in data center virtualization. I just question whether it's for cloud.
Similarly, once you start talking about doing SR-IOV networking I/O passthrough into a guest (for SDN/NFV stuff) for optimum efficiency it is beneficial to be able to steer interrupts on the physical host to the specific cpus on which the guest will be running. This implies some form of pinning. Still, I think hypervisor should achieve this, instead of openstack. I think pin CPU is common to data center virtualization, but not sure if it's in scope of cloud, which provide computing power, not hardware resources. And I think part of your purpose can be achieved through https://wiki.openstack.org/wiki/CPUEntitlement and https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I hope a well implemented hypervisor will avoid needless vcpu migration if the vcpu is very busy and required most of the pCPU's computing capability (I knew Xen used to have some issue in the scheduler to cause frequent vCPU migration long before). I'm not sure the above stuff can be done with those. It's not just about quantity of resources, but also about which specific resources will be used so that other things can be done based on that knowledge. With the above stuff, it ensure the QoS and the compute capability for the guest, I think. --jyh Chris ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Hi, thank you for your comments. I am talking about quest performance. We are using openstack for managing Telco cloud applications where quest performance optimization is needed. That example where pcpus are dedicated exclusively for vcpus is not a problem. It can be implemented by using scheduling filters and if you need that feature you can take the filter in use. Without it, pcpus are shared in normal way. As Chris said, core pinning e.g. 
depending on NUMA topology is beneficial and I think its beneficial with or without exclusive dedication of pcpu. Regards, Tuomas ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
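[Editor's note: the scheduling-filter approach Tuomas describes could look roughly like the sketch below. This is a hypothetical illustration in the style of a Nova host filter, not the PoC code; the capability keys ("total_pcpus", "dedicated_pcpus") and the "dedicate_pcpu" extra spec are invented for the example.]

```python
# Hypothetical sketch of a Nova-style scheduler filter that only passes
# hosts with enough undedicated pCPUs for a flavor requesting exclusive
# dedication. Field names are illustrative, not real Nova attributes.

def pcpus_free(host_state):
    """Number of pCPUs on the host not yet dedicated to any instance."""
    return host_state["total_pcpus"] - len(host_state["dedicated_pcpus"])

def host_passes(host_state, flavor):
    """Accept the host only if it can dedicate one pCPU per requested vCPU."""
    wanted = int(flavor["extra_specs"].get("dedicate_pcpu", 0))
    if wanted == 0:
        return True  # no dedication requested; pCPUs stay shared as normal
    return pcpus_free(host_state) >= wanted

host = {"total_pcpus": 8, "dedicated_pcpus": {1, 2, 3}}
flavor = {"extra_specs": {"dedicate_pcpu": "4"}}
print(host_passes(host, flavor))  # 5 free pCPUs >= 4 wanted -> True
```

As Tuomas notes, hosts and flavors that never request dedication are unaffected: the filter passes everything when the extra spec is absent.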
Re: [openstack-dev] [nova] Core pinning
> -----Original Message-----
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Wednesday, November 13, 2013 9:57 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Core pinning
>
> On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:
>
> >> But, from a performance point of view it is better to exclusively
> >> dedicate pCPUs for vCPUs and the emulator. In some cases you may
> >> want to guarantee that only one instance (and its vCPUs) is using
> >> certain pCPUs. By using core pinning you can optimize instance
> >> performance based on e.g. cache sharing, NUMA topology, interrupt
> >> handling, PCI passthrough (SR-IOV) in multi-socket hosts, etc.
> >
> > My 2 cents. When you talk about the "performance point of view", do
> > you mean guest performance or overall performance? Pinning pCPUs is
> > sure to benefit guest performance, but possibly not overall
> > performance, especially if the vCPU does not consume 100% of the
> > CPU resources.
>
> It can actually be both. If a guest has several virtual cores that
> access the same memory, it can be highly beneficial all around if all
> the memory/cpus for that guest come from a single NUMA node on the
> host. That way you reduce the cross-NUMA-node memory traffic,
> increasing overall efficiency. Alternately, if a guest has several
> cores that use lots of memory bandwidth but don't access the same
> data, you might want to ensure that the cores are on different NUMA
> nodes to equalize utilization of the different NUMA nodes.

I think Tuomas is talking about "exclusively dedicating pCPUs for
vCPUs"; in that situation the pCPU can't be shared by any other vCPU
anymore. If such a vCPU consumes only 50% of the pCPU, that is sure to
be a waste of overall capacity. As for cross-NUMA-node access, I'd let
the hypervisor, rather than the cloud OS, reduce cross-NUMA access as
much as possible. I'm not against such usage; it is sure to be used in
data center virtualization. I just question whether it's for the cloud.

> Similarly, once you start talking about doing SR-IOV networking I/O
> passthrough into a guest (for SDN/NFV stuff), for optimum efficiency
> it is beneficial to be able to steer interrupts on the physical host
> to the specific cpus on which the guest will be running. This implies
> some form of pinning.

Still, I think the hypervisor should achieve this, instead of
OpenStack.

> > I think CPU pinning is common in data center virtualization, but
> > I'm not sure it's in scope for the cloud, which provides computing
> > power, not hardware resources.
> >
> > And I think part of your purpose can be achieved through
> > https://wiki.openstack.org/wiki/CPUEntitlement and
> > https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially
> > I hope a well-implemented hypervisor will avoid needless vCPU
> > migration if the vCPU is very busy and requires most of the pCPU's
> > computing capability (I know Xen used to have an issue in the
> > scheduler that caused frequent vCPU migration long ago).
>
> I'm not sure the above can do that. It's not just about the quantity
> of resources, but also about which specific resources will be used,
> so that other things can be done based on that knowledge.

With the above, the QoS and the compute capability for the guest are
ensured, I think.

--jyh

> Chris
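[Editor's note: the InstanceResourceQuota mechanism jyh points to caps CPU consumption via libvirt's cputune period/quota values rather than pinning. The sketch below illustrates the arithmetic for the "vCPU consuming only 50% of a pCPU" case discussed above; the helper function is an illustration, not Nova code, though `<period>` and `<quota>` are real libvirt cputune elements.]

```python
# Sketch of capping a vCPU at a fraction of a pCPU the way the
# quota:cpu_period / quota:cpu_quota extra specs do: over each
# scheduling period (in microseconds), the vCPU may run for at most
# `quota` microseconds. Helper name is hypothetical.

def cputune_cap(percent_of_pcpu, period_us=100000):
    """Return (period, quota) capping one vCPU at the given CPU share."""
    quota = period_us * percent_of_pcpu // 100
    return period_us, quota

period, quota = cputune_cap(50)  # cap at half a pCPU
xml = "<cputune><period>%d</period><quota>%d</quota></cputune>" % (period, quota)
print(xml)  # <cputune><period>100000</period><quota>50000</quota></cputune>
```

This guarantees a ceiling on consumption, which is Chris's point: it controls the quantity of CPU time, but says nothing about which physical CPUs the guest runs on.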
Re: [openstack-dev] [nova] Core pinning
On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:

>> But, from a performance point of view it is better to exclusively
>> dedicate pCPUs for vCPUs and the emulator. In some cases you may
>> want to guarantee that only one instance (and its vCPUs) is using
>> certain pCPUs. By using core pinning you can optimize instance
>> performance based on e.g. cache sharing, NUMA topology, interrupt
>> handling, PCI passthrough (SR-IOV) in multi-socket hosts, etc.
>
> My 2 cents. When you talk about the "performance point of view", do
> you mean guest performance or overall performance? Pinning pCPUs is
> sure to benefit guest performance, but possibly not overall
> performance, especially if the vCPU does not consume 100% of the CPU
> resources.

It can actually be both. If a guest has several virtual cores that
access the same memory, it can be highly beneficial all around if all
the memory/cpus for that guest come from a single NUMA node on the
host. That way you reduce the cross-NUMA-node memory traffic,
increasing overall efficiency. Alternately, if a guest has several
cores that use lots of memory bandwidth but don't access the same data,
you might want to ensure that the cores are on different NUMA nodes to
equalize utilization of the different NUMA nodes.

Similarly, once you start talking about doing SR-IOV networking I/O
passthrough into a guest (for SDN/NFV stuff), for optimum efficiency it
is beneficial to be able to steer interrupts on the physical host to
the specific cpus on which the guest will be running. This implies some
form of pinning.

> I think CPU pinning is common in data center virtualization, but I'm
> not sure it's in scope for the cloud, which provides computing power,
> not hardware resources.
>
> And I think part of your purpose can be achieved through
> https://wiki.openstack.org/wiki/CPUEntitlement and
> https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I
> hope a well-implemented hypervisor will avoid needless vCPU migration
> if the vCPU is very busy and requires most of the pCPU's computing
> capability (I know Xen used to have an issue in the scheduler that
> caused frequent vCPU migration long ago).

I'm not sure the above can do that. It's not just about the quantity of
resources, but also about which specific resources will be used, so
that other things can be done based on that knowledge.

Chris
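[Editor's note: placing a guest within a single NUMA node, as Chris describes, requires knowing which pCPUs belong to which node. A virt driver can read this from libvirt's host capabilities XML; the sketch below parses an embedded two-cell sample, where a real driver would instead parse the string returned by `conn.getCapabilities()` on a libvirt connection.]

```python
# Minimal sketch of extracting NUMA cell -> pCPU mapping from libvirt
# capabilities XML. CAPS_XML is a trimmed, hand-written sample of the
# real <host><topology><cells> structure.
import xml.etree.ElementTree as ET

CAPS_XML = """
<capabilities>
  <host>
    <topology>
      <cells num="2">
        <cell id="0">
          <cpus num="2"><cpu id="0"/><cpu id="1"/></cpus>
        </cell>
        <cell id="1">
          <cpus num="2"><cpu id="2"/><cpu id="3"/></cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""

def numa_cells(caps_xml):
    """Map NUMA cell id -> list of pCPU ids found in capabilities XML."""
    root = ET.fromstring(caps_xml)
    cells = {}
    for cell in root.iter("cell"):
        cells[int(cell.get("id"))] = [int(c.get("id")) for c in cell.iter("cpu")]
    return cells

print(numa_cells(CAPS_XML))  # {0: [0, 1], 1: [2, 3]}
```

With this mapping, pinning a guest's vCPUs to the cpuset of one cell keeps its memory accesses local to that node.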
Re: [openstack-dev] [nova] Core pinning
> -----Original Message-----
> From: Tuomas Paappanen [mailto:tuomas.paappa...@tieto.com]
> Sent: Wednesday, November 13, 2013 4:46 AM
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] [nova] Core pinning
>
> Hi all,
>
> I would like to hear your thoughts about core pinning in OpenStack.
> Currently nova (with qemu-kvm) supports use of a cpu set of pCPUs
> that can be used by instances. I didn't find a blueprint, but I think
> the purpose of this feature is to isolate the cpus used by the host
> from the cpus used by instances (vCPUs).
>
> But, from a performance point of view it is better to exclusively
> dedicate pCPUs for vCPUs and the emulator. In some cases you may want
> to guarantee that only one instance (and its vCPUs) is using certain
> pCPUs. By using core pinning you can optimize instance performance
> based on e.g. cache sharing, NUMA topology, interrupt handling, PCI
> passthrough (SR-IOV) in multi-socket hosts, etc.

My 2 cents. When you talk about the "performance point of view", do you
mean guest performance or overall performance? Pinning pCPUs is sure to
benefit guest performance, but possibly not overall performance,
especially if the vCPU does not consume 100% of the CPU resources.

I think CPU pinning is common in data center virtualization, but I'm
not sure it's in scope for the cloud, which provides computing power,
not hardware resources.

And I think part of your purpose can be achieved through
https://wiki.openstack.org/wiki/CPUEntitlement and
https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I
hope a well-implemented hypervisor will avoid needless vCPU migration
if the vCPU is very busy and requires most of the pCPU's computing
capability (I know Xen used to have an issue in the scheduler that
caused frequent vCPU migration long ago).

--jyh

> We have already implemented a feature like this (a PoC with
> limitations) on the Nova Grizzly version and would like to hear your
> opinion about it.
>
> The current implementation consists of three main parts:
> - Definition of pCPU-vCPU maps for instances and instance spawning
> - (optional) Compute resource and capability advertising, including
>   free pCPUs and NUMA topology
> - (optional) Scheduling based on free cpus and NUMA topology
>
> The implementation is quite simple:
>
> (additional/optional parts)
> Nova-computes advertise free pCPUs and NUMA topology in the same
> manner as host capabilities. Instances are scheduled based on this
> information.
>
> (core pinning)
> The admin can set pCPUs for vCPUs and for the emulator process, or
> select a NUMA cell for instance vCPUs, by adding key:value pairs to
> the flavor's extra specs.
>
> EXAMPLE:
> an instance has 4 vcpus:
>
> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> emulator:5 --> emulator pinned to pcpu5
> or
> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
>
> In nova-compute, the core pinning information is read from the extra
> specs and added to the domain XML in the same way as the cpu quota
> values (cputune).
>
> What do you think? Implementation alternatives? Is this worth a
> blueprint? All related comments are welcome!
>
> Regards,
> Tuomas
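[Editor's note: the extra-specs-to-domain-XML step in the quoted proposal can be sketched as below. The extra spec keys ("vcpus", "emulator") follow the example in the proposal and may differ from the PoC's actual keys; `<vcpupin>` and `<emulatorpin>` are real libvirt cputune elements.]

```python
# Sketch of turning the proposal's flavor extra specs into libvirt
# <cputune> pinning elements: vcpu0 -> first listed pCPU, vcpu1 -> the
# second, and so on, plus an optional emulator-thread pin.

def cputune_xml(extra_specs):
    """Build <cputune> pinning XML from 'vcpus'/'emulator' extra specs."""
    lines = ["<cputune>"]
    pcpus = extra_specs.get("vcpus", "")
    for vcpu, pcpu in enumerate(p for p in pcpus.split(",") if p):
        lines.append("  <vcpupin vcpu='%d' cpuset='%s'/>" % (vcpu, pcpu))
    if "emulator" in extra_specs:
        lines.append("  <emulatorpin cpuset='%s'/>" % extra_specs["emulator"])
    lines.append("</cputune>")
    return "\n".join(lines)

print(cputune_xml({"vcpus": "1,2,3,4", "emulator": "5"}))
```

For the example flavor this emits four `<vcpupin>` entries (vcpu0 on pcpu1 through vcpu3 on pcpu4) and an `<emulatorpin>` on pcpu5, matching the EXAMPLE in the quoted message.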
[openstack-dev] [nova] Core pinning
Hi all,

I would like to hear your thoughts about core pinning in OpenStack.
Currently nova (with qemu-kvm) supports use of a cpu set of pCPUs that
can be used by instances. I didn't find a blueprint, but I think the
purpose of this feature is to isolate the cpus used by the host from
the cpus used by instances (vCPUs).

But, from a performance point of view it is better to exclusively
dedicate pCPUs for vCPUs and the emulator. In some cases you may want
to guarantee that only one instance (and its vCPUs) is using certain
pCPUs. By using core pinning you can optimize instance performance
based on e.g. cache sharing, NUMA topology, interrupt handling, PCI
passthrough (SR-IOV) in multi-socket hosts, etc.

We have already implemented a feature like this (a PoC with
limitations) on the Nova Grizzly version and would like to hear your
opinion about it.

The current implementation consists of three main parts:
- Definition of pCPU-vCPU maps for instances and instance spawning
- (optional) Compute resource and capability advertising, including
  free pCPUs and NUMA topology
- (optional) Scheduling based on free cpus and NUMA topology

The implementation is quite simple:

(additional/optional parts)
Nova-computes advertise free pCPUs and NUMA topology in the same manner
as host capabilities. Instances are scheduled based on this
information.

(core pinning)
The admin can set pCPUs for vCPUs and for the emulator process, or
select a NUMA cell for instance vCPUs, by adding key:value pairs to the
flavor's extra specs.

EXAMPLE:
an instance has 4 vcpus:

vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
emulator:5 --> emulator pinned to pcpu5
or
numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.

In nova-compute, the core pinning information is read from the extra
specs and added to the domain XML in the same way as the cpu quota
values (cputune).

What do you think? Implementation alternatives? Is this worth a
blueprint? All related comments are welcome!

Regards,
Tuomas
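[Editor's note: the "cpu set of pCPUs" mentioned at the top of this message is configured with a string such as "4-12,^8" (cpus 4 through 12, excluding 8). The sketch below is a simplified re-implementation of that parsing for illustration; it is not nova's own parser.]

```python
# Sketch of parsing a pCPU set string of the kind nova's vcpu_pin_set
# option accepts: comma-separated cpu ids, inclusive N-M ranges, and
# ^N exclusions.

def parse_cpu_set(spec):
    """Return the set of pCPU ids described by a cpuset string."""
    include, exclude = set(), set()
    for part in spec.split(","):
        part = part.strip()
        target = include
        if part.startswith("^"):  # ^N excludes a single cpu id
            target = exclude
            part = part[1:]
        if "-" in part:           # N-M is an inclusive range
            lo, hi = part.split("-")
            target.update(range(int(lo), int(hi) + 1))
        else:
            target.add(int(part))
    return include - exclude

print(sorted(parse_cpu_set("4-12,^8")))  # [4, 5, 6, 7, 9, 10, 11, 12]
```

Every instance's vCPUs then float over the resulting set, which is what gives the host/guest cpu isolation the message describes; the pinning proposal narrows this further, down to per-vCPU assignments.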