On Tue, Apr 07, 2015 at 02:43:43PM +0200, Christian Borntraeger wrote:
> We had a call and I was asked to write a summary about our conclusion.
> 
> The more I wrote, the more uncertain I became that we really came to a
> conclusion, and the more certain that we want to define the QMP/HMP/CLI
> interfaces first (or quite early in the process).
> 
> As discussed, I will provide an initial document as a discussion
> starter.
> 
> So here is my current understanding, with each piece of information on
> one line, so that everybody can correct me or make additions:
> 
> current wrap-up of architecture support
> ---------------------------------------
> x86
> - Topology possible
> - can be hierarchical
> - interfaces to query topology
> - SMT: fanout in host, guest uses host threads to back guest vCPUs
> - supports cpu hotplug via cpu_add
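(For reference -- and this is part of the problem listed further down --
the existing x86 interface takes nothing but a bare cpu index, with no
placement information at all.  If I remember the syntax right:

    (qemu) cpu_add 2

from HMP, or the QMP equivalent:

    { "execute": "cpu-add", "arguments": { "id": 2 } }

so there is nowhere to say which socket, core or node the new vCPU
should land in.)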
> power
> - Topology possible
> - interfaces to query topology?

For power, topology information is communicated via the
"ibm,associativity" (and related) properties in the device tree.  This
can encode hierarchical topologies, but it is *not* bound to the
socket/core/thread hierarchy.  On the guest side in Power there's no
real notion of "socket", just cores with specified proximities to
various memory nodes.

> - SMT: Power8: no threads in host and full core passed in due to HW
>   design; may change in the future
> 
> s/390
> - Topology possible
> - can be hierarchical
> - interfaces to query topology
> - always virtualized via PR/SM LPAR
> - host topology from LPAR can be heterogeneous (e.g. 3 cpus in 1st
>   socket, 4 in 2nd)
> - SMT: fanout in host, guest uses host threads to back guest vCPUs
> 
> Current downsides of CPU definitions/hotplug
> --------------------------------------------
> - smp, sockets=,cores=,threads= builds only homogeneous topology
> - cpu_add does not tell where to add
> - artificial icc bus construct on x86 for several reasons (link, sysbus
>   not hotpluggable..)

Artificial though it may be, I think having a "cpus" pseudo-bus is not
such a bad idea.

> discussions
> -----------
> - we want to be able to (most important question, IMHO)
>   - hotplug CPUs on power/x86/s390 and maybe others
>   - define topology information
>   - bind the guest topology to the host topology in some way
>     - to host nodes
>     - maybe also for gang scheduling of threads (might face reluctance
>       from the linux scheduler folks)
>   - not really deeply outlined in this call
> - QOM links must be allocated at boot time, but can be set later on
>   - nothing that we want to expose to users
> - Machine provides QOM links that the device_add hotplug mechanism can
>   use to add new CPUs into preallocated slots.  "CPUs" can be groups
>   of cores and/or threads.
> - hotplug and initial config should use same semantics
> - cpu and memory topology might be somewhat independent
>   --> - define nodes
>       - map CPUs to nodes
>       - map memory to nodes
> - hotplug per
>   - socket
>   - core
>   - thread
>   ?
> 
> Now comes the part where I am not sure if we came to a conclusion or
> not:
> - hotplug/definition per core (but not per thread) seems to handle all
>   cases
>   - core might have multiple threads (and thus multiple cpustates)
> - as device statement (or object?)
> - mapping of cpus to nodes or defining the topology not really
>   outlined in this call
> 
> To be defined:
> - QEMU command line for initial setup
> - QEMU hmp/qmp interfaces for dynamic setup

So, I can't say I've entirely got my head around this, but here are my
thoughts so far.

I think the basic problem here is that the fixed socket -> core ->
thread hierarchy is something from x86 land that's become integrated
into qemu's generic code, where it doesn't entirely make sense.

Ignoring NUMA topology (I'll come back to that in a moment), qemu
should really only care about two things:

a) the unit of execution scheduling (a vCPU or "thread")

b) the unit of plug/unplug

Now, returning to NUMA topology.  What the guest, and therefore qemu,
really needs to know is the relative proximity of each thread to each
block of memory.  That usually forms some sort of node hierarchy, but
it doesn't necessarily correspond to a socket->core->thread hierarchy
you can see in physical units.

On Power, an arbitrary NUMA node hierarchy can be described in the
device tree without reference to "cores" or "sockets", so really qemu
has no business even talking about such units.
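To make that concrete, for each vCPU node in the guest device tree the
pseries code ends up doing something roughly like the below (a
from-memory sketch, not the literal qemu code -- the array contents are
illustrative, and you'd need libfdt.h plus qemu's bswap helpers for
cpu_to_be32()):

    /* Sketch: describe one vCPU's memory proximity in the guest
     * device tree.  fdt/offset are the usual libfdt handles;
     * numa_node and cpu_index are whatever the machine code decided.
     * Note there is no mention of "socket" or "core" anywhere, just a
     * list of proximity domain ids, outermost first. */
    static void cpu_associativity(void *fdt, int offset,
                                  uint32_t numa_node,
                                  uint32_t cpu_index)
    {
        uint32_t assoc[] = {
            cpu_to_be32(4),          /* number of cells that follow */
            cpu_to_be32(0),          /* outer proximity domains */
            cpu_to_be32(0),
            cpu_to_be32(numa_node),  /* memory node this cpu is near */
            cpu_to_be32(cpu_index),  /* per-vCPU identifier */
        };

        fdt_setprop(fdt, offset, "ibm,associativity",
                    assoc, sizeof(assoc));
    }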
IIUC, on x86 the NUMA topology is bound up with the socket->core->thread
hierarchy, so it needs to have a notion of those layers, but ideally
that would be specific to the pc machine type.

So, here's what I'd propose:

1) I think we really need some better terminology to refer to the unit
of plug/unplug.  Until someone comes up with something better, I'm
going to use "CPU Module" (CM), to distinguish from the NUMA baggage of
"socket" and also to refer more clearly to the thing that goes into the
socket, rather than the socket itself.

2) A Virtual CPU Module (vCM) need not correspond to a real physical
object.  For machine types where we want to faithfully represent a
specific physical machine, it would.  For generic or pure virtual
machines, the vCMs would be as small as possible.  So for current Power
they'd be one virtual core, for future Power (maybe) or s390 a single
virtual thread.  For x86 I'm not sure what they'd be.

3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
which would contain the vCMs (also QOM objects).  Their existence would
be generic, though we'd almost certainly use arch and/or machine
specific subtypes.  (There's a rough sketch of what I mean at the
bottom of this mail.)

4) There would be a (generic) way of finding the vCPUs (threads) in a
vCM and the vCM for a specific vCPU.

5) A vCM *might* have internal subdivisions into "cores" or "nodes" or
"chips" or "MCMs" or whatever, but that would be up to the machine type
specific code, and not represented in the QOM hierarchy.

6) Obviously we'd need some backwards compat goo to sort out existing
command line options referring to cores and sockets into the new
representation.  This will need machine type specific hooks - so for
x86 it would need to set up the right vCM subdivisions and make sure
the right NUMA topology info goes into ACPI.  For -machine pseries I'm
thinking that "-smp sockets=2,cores=1,threads=4" and
"-smp sockets=1,cores=2,threads=4" should result in exactly the same
thing internally.

Thoughts?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
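(The sketch promised in point 3.  To be clear, everything below is
hypothetical -- TYPE_CPU_MODULE and friends are made-up names, not
existing qemu API -- it's just to show the shape of the thing in QOM
terms:

    #include "hw/qdev-core.h"   /* DeviceState, TYPE_DEVICE */
    #include "qom/cpu.h"        /* CPUState */
    #include "qemu/module.h"    /* type_init() */

    #define TYPE_CPU_MODULE "cpu-module"

    /* One hot-pluggable unit.  The machine would create all of its
     * vCMs at startup, as slots on the "cpus" pseudo-bus; device_add
     * later fills an empty one in. */
    typedef struct CPUModule {
        DeviceState parent_obj;

        uint32_t nr_threads;    /* vCPUs backed by this vCM */
        CPUState **threads;     /* set when the vCM is realized */
    } CPUModule;

    static const TypeInfo cpu_module_info = {
        .name          = TYPE_CPU_MODULE,
        .parent        = TYPE_DEVICE,
        .instance_size = sizeof(CPUModule),
        /* arch/machine specific subtypes hang their class_init, and
         * any internal core/node subdivisions (point 5), off this */
    };

    static void cpu_module_register_types(void)
    {
        type_register_static(&cpu_module_info);
    }

    type_init(cpu_module_register_types)

The generic lookups in point 4 would then just walk the vCM's threads
array one way and a vCPU -> vCM back-pointer the other.)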