[Qemu-devel] Using Linux's CPUSET for KVM VCPUs

2010-07-22 Thread Andre Przywara

Hi all,

While working on NUMA host pinning, I experimented with vCPU affinity
within QEMU, but dropped it because it would complicate the code without
giving a better experience than the current practice of using taskset
with the thread IDs provided by the monitor. In the process I looked at
Linux's CPUSET implementation
(/src/linux-2.6/Documentation/cgroups/cpusets.txt).
In brief, this is a truly hierarchical mechanism, based on a pseudo file
system, for restricting a set of processes (or threads, since it uses
PIDs) to a certain subset of the machine.
Sadly we cannot leverage this for true guest NUMA memory assignment, but
it would work nicely for pinning (or not pinning) guest vCPUs. I had the
following structure in mind:
For each guest there is a new CPUSET
(mkdir $CPUSET_MNT/`cat /proc/$$/cpuset`/kvm_$guestname). One could then
assign the guest's global resources to this CPUSET.
For each vCPU there is a separate CPUSET located under this guest-global
one. This would allow easy manipulation of the vCPU pinning, even from
the console without any management app (although this could easily be
implemented in libvirt).


/
|
+--/ kvm_guest_01
|  |
|  +-- VCPU0
|  |
|  +-- VCPU1
|
+--/ kvm_guest_02
...
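
To make the layout above concrete, here is a minimal Python sketch (not
existing QEMU or libvirt code). It assumes a cgroup-style cpuset mount
where the control files are named cpuset.cpus, cpuset.mems and tasks (a
legacy "mount -t cpuset" drops the "cpuset." prefix). The function names,
the `parent` argument and the default memory node "0" are hypothetical.

```python
from pathlib import Path


def _write(path, value):
    # Control files take one value plus a newline, like `echo value > file`.
    Path(path).write_text(f"{value}\n")


def format_cpu_list(cpus):
    """Render host CPU numbers as a comma-separated list, e.g. [3, 2] -> '2,3'."""
    return ",".join(str(cpu) for cpu in sorted(set(cpus)))


def create_guest_cpusets(parent, guest_name, vcpu_pinning, mems="0"):
    """Create kvm_<guest_name>/VCPU<n> under `parent` and set CPUs and memory nodes.

    vcpu_pinning maps a vCPU index to the host CPUs it may run on (assumed input).
    Returns a dict mapping each vCPU index to its cpuset directory.
    """
    guest_dir = Path(parent) / f"kvm_{guest_name}"
    guest_dir.mkdir(exist_ok=True)
    # The guest-level cpuset has to cover every CPU used by its children.
    guest_cpus = set().union(*vcpu_pinning.values())
    _write(guest_dir / "cpuset.cpus", format_cpu_list(guest_cpus))
    _write(guest_dir / "cpuset.mems", mems)

    vcpu_dirs = {}
    for index, cpus in sorted(vcpu_pinning.items()):
        vcpu_dir = guest_dir / f"VCPU{index}"
        vcpu_dir.mkdir(exist_ok=True)
        _write(vcpu_dir / "cpuset.cpus", format_cpu_list(cpus))
        _write(vcpu_dir / "cpuset.mems", mems)
        vcpu_dirs[index] = vcpu_dir
    return vcpu_dirs


def attach_thread(cpuset_dir, tid):
    """Move one thread into a cpuset by writing its thread id to 'tasks'."""
    _write(Path(cpuset_dir) / "tasks", tid)
```

On a real host, `parent` would be $CPUSET_MNT plus the content of
/proc/self/cpuset, and the thread ids would come from the QEMU monitor.
Both cpus and mems have to be set before a task can be attached, and a
child's cpus must be a subset of its parent's, which is why the guest-level
set is built as the union of the per-vCPU sets.
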

What do you think about it? Is it worth implementing this?

Regards,
Andre.


--
Andre Przywara
AMD-Operating System Research Center (OSRC), Dresden, Germany
Tel: +49 351 448-3567-12




Re: [Qemu-devel] Using Linux's CPUSET for KVM VCPUs

2010-07-22 Thread Daniel P. Berrange
On Thu, Jul 22, 2010 at 04:03:13PM +0200, Andre Przywara wrote:
> Hi all,
>
> While working on NUMA host pinning, I experimented with vCPU affinity
> within QEMU, but dropped it because it would complicate the code without
> giving a better experience than the current practice of using taskset
> with the thread IDs provided by the monitor. In the process I looked at
> Linux's CPUSET implementation
> (/src/linux-2.6/Documentation/cgroups/cpusets.txt).
> In brief, this is a truly hierarchical mechanism, based on a pseudo file
> system, for restricting a set of processes (or threads, since it uses
> PIDs) to a certain subset of the machine.
> Sadly we cannot leverage this for true guest NUMA memory assignment, but
> it would work nicely for pinning (or not pinning) guest vCPUs.

IIUC the 'cpuset.mems' tunable lets you control the NUMA node that
memory allocations come from. It isn't as flexible as numactl
policies, since you can't request interleaving, but if you're just
looking to control node locality I think it would do.
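
As a rough illustration of that tunable, the following Python sketch
restricts a cpuset to a set of NUMA nodes. It assumes the cgroup-style file
names cpuset.mems and cpuset.memory_migrate; restrict_memory_nodes and its
arguments are hypothetical names, not an existing API.

```python
from pathlib import Path


def restrict_memory_nodes(cpuset_dir, nodes, migrate=False):
    """Limit memory allocations of the tasks in a cpuset to the given NUMA nodes.

    Returns the node list string that was written to cpuset.mems.
    """
    cpuset_dir = Path(cpuset_dir)
    if migrate:
        # Set the flag first so that the following change to cpuset.mems
        # also moves pages that were already allocated elsewhere.
        (cpuset_dir / "cpuset.memory_migrate").write_text("1\n")
    node_list = ",".join(str(node) for node in sorted(set(nodes)))
    (cpuset_dir / "cpuset.mems").write_text(node_list + "\n")
    return node_list
```

This only selects which nodes are allowed; it does not express a placement
policy such as interleaving across them.
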

> I had the following structure in mind:
> For each guest there is a new CPUSET
> (mkdir $CPUSET_MNT/`cat /proc/$$/cpuset`/kvm_$guestname). One could then
> assign the guest's global resources to this CPUSET.
> For each vCPU there is a separate CPUSET located under this guest-global
> one. This would allow easy manipulation of the vCPU pinning, even from
> the console without any management app (although this could easily be
> implemented in libvirt).

FYI, if you have any cgroup controllers mounted, libvirt will already
automatically create a dedicated sub-group for every guest you run.
The main reason we use cgroups is that it lets us apply controls to a
group of PIDs at once (e.g. cpu.shares to all threads within QEMU,
instead of nice(2) on each individual thread). At the individual vCPU
level we are dealing with single PIDs again, so libvirt hasn't needed
further cgroup subdivision and just uses the traditional Linux APIs
instead.
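
For a single vCPU thread, the "traditional API" route can be sketched in
Python as below. os.sched_setaffinity wraps sched_setaffinity(2), the same
call taskset uses. This is an illustration only, not libvirt's code;
pin_thread is a hypothetical helper, and the thread id is assumed to be one
reported by the QEMU monitor.

```python
import os


def pin_thread(tid, cpus):
    """Restrict one thread to the given host CPUs and return the resulting set.

    tid is a Linux thread id; 0 means the calling thread.
    """
    os.sched_setaffinity(tid, set(cpus))
    return os.sched_getaffinity(tid)
```

Because each vCPU is just one thread, a single call per vCPU is enough and
no extra cgroup level is required for this case.
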

> /
> |
> +--/ kvm_guest_01
> |  |
> |  +-- VCPU0
> |  |
> |  +-- VCPU1
> |
> +--/ kvm_guest_02
> ...
>
> What do you think about it? Is it worth implementing this?

Having at least one cgroup per guest has certainly proved valuable for
libvirt's needs. If you're not using a mgmt API, exposing vCPUs (and
other internal QEMU threads) via named sub-cgroups could be quite
convenient.

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org   -o-   http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|