On 27.05.2014 [14:59:03 -0700], Nishanth Aravamudan wrote:
> On 20.05.2014 [12:44:15 +1000], Alexey Kardashevskiy wrote:
> > On 05/20/2014 10:06 AM, Nishanth Aravamudan wrote:
> > > On 19.05.2014 [15:37:52 -0700], Nishanth Aravamudan wrote:
> > >> Hi Alexey,
> > >>
> > >> I've been looking at hw/ppc/spapr.c::spapr_populate_memory() and ran
> > >> into a few questions:
> > >>
> > >> 1) The values from 1 to nb_numa_nodes are used as indices into the
> > >> node_mem array, but that is not populated, necessarily, linearly.
> > >> vl.c::add_node() uses the nodeid parameter as the index into node_mem,
> > >> if it is specified.
> > >>
> > >> 2) The node ID is based upon the index into the array, but it seems like
> > >> it should actually be based upon the nodeid specified, if any. That is,
> > >> we set the value at index 4 (which is statically the reference point in
> > >> 'ibm,associativity-reference-points') of 'ibm,associativty' for each
> > >> 'ibm,memory@....' node to the index we are currently at. But as
> > >> mentioned in 1) above that index isn't necessarily currently the nodeid
> > >> specified on the command-line.
> > >>
> > >> What this all means, is that if I specify something like:
> > >>
> > >> -numa node,nodeid=1,cpus=0-7,mem=2048 -numa
> > >> node,nodeid=5,cpus=8-15,mem=0 -numa node,nodeid=9,mem=2048
> > >>
> > >> Linux sees:
> > >>
> > >> numactl --hardware
> > >> available: 3 nodes (0-2)
> > >> node 0 cpus: 8 9 10 11 12 13 14 15
> > >> node 0 size: 0 MB
> > >> node 0 free: 0 MB
> > >> node 1 cpus: 0 1 2 3 4 5 6 7
> > >> node 1 size: 2024 MB
> > >> node 1 free: 1560 MB
> > >> node 2 cpus:
> > >> node 2 size: 0 MB
> > >> node 2 free: 0 MB
> > >>
> > >> Maybe we don't really care about this, but I just noticed it when trying
> > >> to reproduce some really weird topologies from PowerVM.
> > >
> > > Upon further investigation into node_mem, it seems like this assumption
> > > is present throughout the qemu code, e.g, the qemu monitor 'info numa'
> > > command. Will just document it for myself as a weird way to make
> > > memoryless nodes show up :)
> >
> > I never looked closely at this NUMA business so I know as much as you do :)
> > You seem to be right, vl.c seems to get things right (it uses nodeid as an
> > index) but spapr.c is broken and we probably should fix it but it does not
> > sound very urgent to me...
>
> Well, and looking at it more, it feels like perhaps that none of the
> qemu code is particularly careful about this -- and since you can
> explicitly assign 0 memory to a node, you can't simply check for 0 in
> node_mem for an unassigned node (and node_mem is an unsigned array).
>
> I'll look at the behavior on x86 and get back to you.
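
To make the node_mem problem quoted above concrete, here is a minimal,
self-contained C sketch. It is not the actual qemu code: struct
numa_table, table_add_node(), total_mem_dense(), total_mem_sparse() and
MAX_NODES are hypothetical names made up for illustration, and the
present[] flag array is only an assumption about one way an unassigned
node could be told apart from a node explicitly given 0 memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 16 /* hypothetical limit for this sketch */

/* Simplified stand-in for the node_mem table discussed above. */
struct numa_table {
    uint64_t mem_mb[MAX_NODES];  /* indexed by the nodeid given by the user */
    bool present[MAX_NODES];     /* true only for nodes actually specified */
    int nb_nodes;                /* count of specified nodes */
};

/* Record a node under its user-specified ID, not under a running index. */
static void table_add_node(struct numa_table *t, int nodeid, uint64_t mem_mb)
{
    assert(nodeid >= 0 && nodeid < MAX_NODES);
    if (!t->present[nodeid]) {
        t->nb_nodes++;
    }
    t->present[nodeid] = true;
    t->mem_mb[nodeid] = mem_mb;
}

/* Problematic pattern: assume the node IDs are exactly 0..nb_nodes-1. */
static uint64_t total_mem_dense(const struct numa_table *t)
{
    uint64_t sum = 0;
    for (int i = 0; i < t->nb_nodes; i++) {
        sum += t->mem_mb[i];
    }
    return sum;
}

/* Alternative: walk every slot and rely on the presence flag. */
static uint64_t total_mem_sparse(const struct numa_table *t)
{
    uint64_t sum = 0;
    for (int i = 0; i < MAX_NODES; i++) {
        if (t->present[i]) {
            sum += t->mem_mb[i];
        }
    }
    return sum;
}
```

With the three nodes from the command line above (IDs 1, 5 and 9 with
2048, 0 and 2048 MB), the dense walk only visits slots 0-2 and sees
2048 MB, while the sparse walk sees all 4096 MB. Node 5 (present, 0 MB)
and node 0 (never specified) both hold 0 in mem_mb, and only the flag
distinguishes them.
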
Well, it looks like ppc is no worse off than x86 here -- passing a similar
command-line to qemu-system-x86_64, I get the same result in the VM
(nodes numbered starting at 0, etc).

Perhaps it makes sense to not allow non-sequential NUMA node ordering,
since it isn't really supported anyways? I'm not entirely sure I see why
it'd be necessary for a guest in any case.

Thanks,
Nish
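
P.S. If rejecting sparse node IDs turned out to be the way to go, the
check could be roughly as small as the sketch below. This is not existing
qemu code: node_ids_are_dense() and MAX_NODES are hypothetical names, and
"sequential" is read here as "the IDs cover 0..n-1 with no gaps or
duplicates, in any order", which is only one possible interpretation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 16 /* hypothetical limit for this sketch */

/*
 * Return true if ids[0..n-1] is a permutation of 0..n-1, i.e. the
 * user-specified node IDs leave no gaps and contain no duplicates.
 */
static bool node_ids_are_dense(const int *ids, size_t n)
{
    bool seen[MAX_NODES] = { false };

    assert(ids != NULL || n == 0);
    if (n > MAX_NODES) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        /* Out of range for n nodes, or already used by another node. */
        if (ids[i] < 0 || (size_t)ids[i] >= n || seen[ids[i]]) {
            return false;
        }
        seen[ids[i]] = true;
    }
    return true;
}
```

The IDs from my earlier example (1, 5, 9) would be rejected by this
check, while 0, 1, 2 in any order would pass.
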