On Thu, Apr 23, 2015 at 05:32:33PM +1000, David Gibson wrote:
> On Tue, Apr 07, 2015 at 02:43:43PM +0200, Christian Borntraeger wrote:
> > We had a call and I was asked to write a summary about our conclusion.
> > 
> > The more I wrote, the more I became uncertain whether we really came to
> > a conclusion, and the more certain I became that we want to define the
> > QMP/HMP/CLI interfaces first (or quite early in the process).
> > 
> > As discussed, I will provide an initial document as a discussion starter.
> > 
> > So here is my current understanding, with each piece of information on
> > one line, so that everybody can correct me or make additions:
> > 
> > current wrap-up of architecture support
> > ---------------------------------------
> > x86
> > - Topology possible
> > - can be hierarchical
> > - interfaces to query topology
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> > - supports cpu hotplug via cpu_add
> > 
> > power
> > - Topology possible
> > - interfaces to query topology?
> 
> For power, topology information is communicated via the
> "ibm,associativity" (and related) properties in the device tree. This
> can encode hierarchical topologies, but it is *not* bound to the
> socket/core/thread hierarchy. On the guest side on Power there's no
> real notion of "socket", just cores with specified proximities to
> various memory nodes.
> 
> > - SMT: Power8: no threads in host and full core passed in due to HW
> >   design; may change in the future
> > 
> > s/390
> > - Topology possible
> > - can be hierarchical
> > - interfaces to query topology
> > - always virtualized via PR/SM LPAR
> > - host topology from LPAR can be heterogeneous (e.g. 3 cpus in 1st
> >   socket, 4 in 2nd)
> > - SMT: fanout in host, guest uses host threads to back guest vCPUs
> > 
> > 
> > Current downsides of CPU definitions/hotplug
> > --------------------------------------------
> > - smp, sockets=,cores=,threads= builds only a homogeneous topology
> > - cpu_add does not tell where to add
> > - artificial icc bus construct on x86 for several reasons (link, sysbus
> >   not hotpluggable, ...)
> 
> Artificial though it may be, I think having a "cpus" pseudo-bus is not
> such a bad idea
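For readers following along, the cpu_add interface criticized above looks
roughly like this (a sketch from memory; the QMP spelling is "cpu-add" and
details may vary by QEMU version). It illustrates the downside: the caller
can only name an opaque index, not *where* in the topology the new CPU
should land:

```text
# HMP: plug a new VCPU, identified by a bare index only
(qemu) cpu-add 2

# QMP equivalent -- again only an id, no socket/core/thread or
# NUMA-node placement information:
{ "execute": "cpu-add", "arguments": { "id": 2 } }
```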
That was considered before[1][2]. We have use cases for adding additional
information about VCPUs to query-cpus, but we could simply use qom-get for
that. The only thing missing is a predictable QOM path for VCPU objects.

If we provide something like "/cpus/<cpu>" links on all machines, callers
could simply use qom-get to get just the information they need, instead of
getting too much information from query-cpus (which also has the
side-effect of interrupting all running VCPUs to synchronize register
information).

Quoting part of your proposal below:

> Ignoring NUMA topology (I'll come back to that in a moment) qemu
> should really only care about two things:
> 
>  a) the unit of execution scheduling (a vCPU or "thread")
>  b) the unit of plug/unplug
> 
[...]
> 3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
> which would contain the vCMs (also QOM objects). Their existence
> would be generic, though we'd almost certainly use arch and/or machine
> specific subtypes.
> 
> 4) There would be a (generic) way of finding the vCPUs (threads) in a
> vCM and the vCM for a specific vCPU.

What I propose now is a bit simpler: just a mechanism for enumerating
VCPUs/threads (a), which would replace query-cpus. Later we could also
have a generic mechanism for (b), if we decide to introduce a generic
"CPU module" abstraction for plug/unplug.

A more complex mechanism for enumerating vCMs and the vCPUs inside a vCM
would be a superset of (a), so in theory we wouldn't need both. But I
believe that: 1) we will take some time to define the details of the
vCM/plug/unplug abstractions; 2) we already have use cases today[2] that
could benefit from a generic QOM path for (a).
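To make that concrete, here is a hedged sketch of what a management client
could do once such links exist. qom-list and qom-get are existing QMP
commands; the "/machine/cpus/<n>" path is purely hypothetical here, and
"type" is used as the example property simply because every QOM object has
one:

```text
# Enumerate the predictable per-VCPU links (hypothetical path):
{ "execute": "qom-list", "arguments": { "path": "/machine/cpus" } }

# Fetch a single property of VCPU 0 directly, without the
# stop-all-VCPUs register synchronization that query-cpus implies:
{ "execute": "qom-get",
  "arguments": { "path": "/machine/cpus/0", "property": "type" } }
```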
[1] Message-ID: <20140516151641.gy3...@otherpad.lan.raisama.net>
    http://article.gmane.org/gmane.comp.emulators.qemu/273463
[2] Message-ID: <20150331131623.gg7...@thinpad.lan.raisama.net>
    http://article.gmane.org/gmane.comp.emulators.kvm.devel/134625

> > discussions
> > -----------
> > - we want to be able to (most important question, IMHO)
> >   - hotplug CPUs on power/x86/s390 and maybe others
> >   - define topology information
> >   - bind the guest topology to the host topology in some way
> >     - to host nodes
> >     - maybe also for gang scheduling of threads (might face reluctance
> >       from the linux scheduler folks)
> >   - not really deeply outlined in this call
> > - QOM links must be allocated at boot time, but can be set later on
> >   - nothing that we want to expose to users
> >   - Machine provides QOM links that the device_add hotplug mechanism
> >     can use to add new CPUs into preallocated slots. "CPUs" can be
> >     groups of cores and/or threads.
> > - hotplug and initial config should use the same semantics
> > - cpu and memory topology might be somewhat independent
> >   --> - define nodes
> >       - map CPUs to nodes
> >       - map memory to nodes
> > 
> > - hotplug per
> >   - socket
> >   - core
> >   - thread
> >   ?
> > Now comes the part where I am not sure if we came to a conclusion or not:
> > - hotplug/definition per core (but not per thread) seems to handle all
> >   cases
> >   - a core might have multiple threads (and thus multiple CPUStates)
> > - as device statement (or object?)
> > - mapping of cpus to nodes or defining the topology not really
> >   outlined in this call
> > 
> > To be defined:
> > - QEMU command line for initial setup
> > - QEMU hmp/qmp interfaces for dynamic setup
> 
> So, I can't say I've entirely got my head around this, but here's my
> thoughts so far.
> 
> I think the basic problem here is that the fixed socket -> core ->
> thread hierarchy is something from x86 land that's become integrated
> into qemu's generic code where it doesn't entirely make sense.
> 
> Ignoring NUMA topology (I'll come back to that in a moment) qemu
> should really only care about two things:
> 
>  a) the unit of execution scheduling (a vCPU or "thread")
>  b) the unit of plug/unplug
> 
> Now, returning to NUMA topology. What the guest, and therefore qemu,
> really needs to know is the relative proximity of each thread to each
> block of memory. That usually forms some sort of node hierarchy,
> but it doesn't necessarily correspond to a socket->core->thread
> hierarchy you can see in physical units.
> 
> On Power, an arbitrary NUMA node hierarchy can be described in the
> device tree without reference to "cores" or "sockets", so really qemu
> has no business even talking about such units.
> 
> IIUC, on x86 the NUMA topology is bound up with the socket->core->thread
> hierarchy, so it needs to have a notion of those layers, but ideally
> that would be specific to the pc machine type.
> 
> So, here's what I'd propose:
> 
> 1) I think we really need some better terminology to refer to the unit
> of plug/unplug. Until someone comes up with something better, I'm
> going to use "CPU Module" (CM), to distinguish from the NUMA baggage
> of "socket" and also to refer more clearly to the thing that goes into
> the socket, rather than the socket itself.
> 
> 2) A Virtual CPU Module (vCM) need not correspond to a real physical
> object. For machine types which we want to faithfully represent a
> specific physical machine, it would. For generic or pure virtual
> machines, the vCMs would be as small as possible. So for current
> Power, they'd be one virtual core; for future Power (maybe) or s390, a
> single virtual thread. For x86 I'm not sure what they'd be.
> 
> 3) I'm thinking we'd have a "cpus" virtual bus represented in QOM,
> which would contain the vCMs (also QOM objects). Their existence
> would be generic, though we'd almost certainly use arch and/or machine
> specific subtypes.
> 
> 4) There would be a (generic) way of finding the vCPUs (threads) in a
> vCM and the vCM for a specific vCPU.
> 
> 5) A vCM *might* have internal subdivisions into "cores" or "nodes" or
> "chips" or "MCMs" or whatever, but that would be up to the machine
> type specific code, and not represented in the QOM hierarchy.
> 
> 6) Obviously we'd need some backwards compat goo to sort the existing
> command line options referring to cores and sockets into the new
> representation. This will need machine type specific hooks - so for
> x86 it would need to set up the right vCM subdivisions and make sure
> the right NUMA topology info goes into ACPI. For -machine pseries I'm
> thinking that "-smp sockets=2,cores=1,threads=4" and "-smp
> sockets=1,cores=2,threads=4" should result in exactly the same thing
> internally.
> 
> 
> Thoughts?
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

-- 
Eduardo