On Tue, Apr 07, 2015 at 02:43:43PM +0200, Christian Borntraeger wrote:
> We had a call and I was asked to write a summary about our conclusion.
> 
> The more I wrote, the more uncertain I became that we really came to a
> conclusion, and the more certain that we want to define the QMP/HMP/CLI
> interfaces first (or quite early in the process).
> 
> As discussed, I will provide an initial document as a discussion
> starter.
> 
> So here is my current understanding, with each piece of information on
> one line, so that everybody can correct me or make additions:
> 
> current wrap-up of architecture support
> ---------------------------------------
> x86
> - Topology possible
> - can be hierarchical
> - interfaces to query topology
> - SMT: fanout in host, guest uses host threads to back guest vCPUs
> - supports cpu hotplug via cpu_add
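(For reference -- and this is part of the problem listed further down --
the existing x86 interface takes nothing but a bare cpu index, with no
placement information at all.  If I remember the syntax right:

    (qemu) cpu_add 2

from HMP, or the QMP equivalent:

    { "execute": "cpu-add", "arguments": { "id": 2 } }

so there is nowhere to say which socket, core or node the new vCPU
should land in.)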
> power
> - Topology possible
> - interfaces to query topology?

For power, topology information is communicated via the
"ibm,associativity" (and related) properties in the device tree.  This
can encode hierarchical topologies, but it is *not* bound to the
socket/core/thread hierarchy.  On the guest side in Power there's no
real notion of "socket", just cores with specified proximities to
various memory nodes.

> - SMT: Power8: no threads in host and full core passed in due to HW
>   design; may change in the future
> 
> s/390
> - Topology possible
> - can be hierarchical
> - interfaces to query topology
> - always virtualized via PR/SM LPAR
> - host topology from LPAR can be heterogeneous (e.g. 3 cpus in 1st
>   socket, 4 in 2nd)
> - SMT: fanout in host, guest uses host threads to back guest vCPUs
> 
> Current downsides of CPU definitions/hotplug
> --------------------------------------------
> - smp, sockets=,cores=,threads= builds only homogeneous topology
> - cpu_add does not tell where to add
> - artificial icc bus construct on x86 for several reasons (link, sysbus
>   not hotpluggable..)

Artificial though it may be, I think having a "cpus" pseudo-bus is not
such a bad idea.

> discussions
> -----------
> - we want to be able to (most important question, IMHO)
>   - hotplug CPUs on power/x86/s390 and maybe others
>   - define topology information
>   - bind the guest topology to the host topology in some way
>     - to host nodes
>     - maybe also for gang scheduling of threads (might face reluctance
>       from the linux scheduler folks)
>   - not really deeply outlined in this call
> - QOM links must be allocated at boot time, but can be set later on
>   - nothing that we want to expose to users
> - Machine provides QOM links that the device_add hotplug mechanism can
>   use to add new CPUs into preallocated slots.  "CPUs" can be groups
>   of cores and/or threads.
> - hotplug and initial config should use same semantics
> - cpu and memory topology might be somewhat independent
>   --> - define nodes
>       - map CPUs to nodes
>       - map memory to nodes
> - hotplug per
>   - socket
>   - core
>   - thread
>   ?
> 
> Now comes the part where I am not sure if we came to a conclusion or
> not:
> - hotplug/definition per core (but not per thread) seems to handle all
>   cases
>   - core might have multiple threads (and thus multiple cpustates)
> - as device statement (or object?)
> - mapping of cpus to nodes or defining the topology not really
>   outlined in this call
> 
> To be defined:
> - QEMU command line for initial setup
> - QEMU hmp/qmp interfaces for dynamic setup

So, I can't say I've entirely got my head around this, but here are my
thoughts so far.

I think the basic problem here is that the fixed socket -> core ->
thread hierarchy is something from x86 land that's become integrated
into qemu's generic code, where it doesn't entirely make sense.

Ignoring NUMA topology (I'll come back to that in a moment), qemu
should really only care about two things:

a) the unit of execution scheduling (a vCPU or "thread")

b) the unit of plug/unplug

Now, returning to NUMA topology.  What the guest, and therefore qemu,
really needs to know is the relative proximity of each thread to each
block of memory.  That usually forms some sort of node hierarchy, but
it doesn't necessarily correspond to a socket->core->thread hierarchy
you can see in physical units.

On Power, an arbitrary NUMA node hierarchy can be described in the
device tree without reference to "cores" or "sockets", so really qemu
has no business even talking about such units.
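To make that concrete, for each vCPU node in the guest device tree the
pseries code ends up doing something roughly like the below (a
from-memory sketch, not the literal qemu code -- the array contents are
illustrative, and you'd need libfdt.h plus qemu's bswap helpers for
cpu_to_be32()):

    /* Sketch: describe one vCPU's memory proximity in the guest
     * device tree.  fdt/offset are the usual libfdt handles;
     * numa_node and cpu_index are whatever the machine code decided.
     * Note there is no mention of "socket" or "core" anywhere, just a
     * list of proximity domain ids, outermost first. */
    static void cpu_associativity(void *fdt, int offset,
                                  uint32_t numa_node,
                                  uint32_t cpu_index)
    {
        uint32_t assoc[] = {
            cpu_to_be32(4),          /* number of cells that follow */
            cpu_to_be32(0),          /* outer proximity domains */
            cpu_to_be32(0),
            cpu_to_be32(numa_node),  /* memory node this cpu is near */
            cpu_to_be32(cpu_index),  /* per-vCPU identifier */
        };

        fdt_setprop(fdt, offset, "ibm,associativity",
                    assoc, sizeof(assoc));
    }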
IIUC, on x86 the NUMA topology is bound up with the socket->core->thread
hierarchy, so it needs to have a notion of those layers, but ideally
that would be specific to the pc machine type.

So, here's what I'd propose:

1) I think we really need some better terminology to refer to the unit
of plug/unplug.  Until someone comes up with something better, I'm
going to use "CPU Module" (CM), to distinguish from the NUMA baggage of
"socket" and also to refer more clearly to the thing that goes into the
socket, rather than the socket itself.

2) A Virtual CPU Module (vCM) need not correspond to a real physical
object.  For machine types where we want to faithfully represent a
specific physical machine, it would.  For generic or pure virtual
machines, the vCMs would be as small as possible.  So for current Power
they'd be one virtual core, for future Power (maybe) or s390 a single
virtual thread.  For x86 I'm not sure what they'd be.

3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
which would contain the vCMs (also QOM objects).  Their existence would
be generic, though we'd almost certainly use arch and/or machine
specific subtypes.  (There's a rough sketch of what I mean at the
bottom of this mail.)

4) There would be a (generic) way of finding the vCPUs (threads) in a
vCM and the vCM for a specific vCPU.

5) A vCM *might* have internal subdivisions into "cores" or "nodes" or
"chips" or "MCMs" or whatever, but that would be up to the machine type
specific code, and not represented in the QOM hierarchy.

6) Obviously we'd need some backwards compat goo to sort out existing
command line options referring to cores and sockets into the new
representation.  This will need machine type specific hooks - so for
x86 it would need to set up the right vCM subdivisions and make sure
the right NUMA topology info goes into ACPI.  For -machine pseries I'm
thinking that "-smp sockets=2,cores=1,threads=4" and
"-smp sockets=1,cores=2,threads=4" should result in exactly the same
thing internally.

Thoughts?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
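(The sketch promised in point 3.  To be clear, everything below is
hypothetical -- TYPE_CPU_MODULE and friends are made-up names, not
existing qemu API -- it's just to show the shape of the thing in QOM
terms:

    #include "hw/qdev-core.h"   /* DeviceState, TYPE_DEVICE */
    #include "qom/cpu.h"        /* CPUState */
    #include "qemu/module.h"    /* type_init() */

    #define TYPE_CPU_MODULE "cpu-module"

    /* One hot-pluggable unit.  The machine would create all of its
     * vCMs at startup, as slots on the "cpus" pseudo-bus; device_add
     * later fills an empty one in. */
    typedef struct CPUModule {
        DeviceState parent_obj;

        uint32_t nr_threads;    /* vCPUs backed by this vCM */
        CPUState **threads;     /* set when the vCM is realized */
    } CPUModule;

    static const TypeInfo cpu_module_info = {
        .name          = TYPE_CPU_MODULE,
        .parent        = TYPE_DEVICE,
        .instance_size = sizeof(CPUModule),
        /* arch/machine specific subtypes hang their class_init, and
         * any internal core/node subdivisions (point 5), off this */
    };

    static void cpu_module_register_types(void)
    {
        type_register_static(&cpu_module_info);
    }

    type_init(cpu_module_register_types)

The generic lookups in point 4 would then just walk the vCM's threads
array one way and a vCPU -> vCM back-pointer the other.)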