On 17.03.2015 at 17:42, Eduardo Habkost wrote:
> On Tue, Mar 17, 2015 at 03:48:38PM +0000, Igor Mammedov wrote:
>> since commit
>>    dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT
>> Linux kernel actually tries to use CPU to Node mapping from
>> QEMU provided SRAT table instead of discarding it, and that
>> in some cases breaks build_sched_domains() which expects
>> sane mapping where cores/threads belonging to the same socket
>> are on the same NUMA node.
>>
>> With current default round-robin mapping of VCPUs to nodes
>> guest ends up with cores/threads belonging to the same socket
>> being on different NUMA nodes.
>>
>> For example with following CLI:
>>    qemu-kvm -m 4G -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
>>             -numa node,nodeid=0 -numa node,nodeid=1
>> 2.6.32 based kernels will hang on boot due to incorrectly built
>> sched_group-s list in update_sd_lb_stats()
>> so comment in QEMU justifying dumb default mapping:
>> "
>>  guest OSes must cope with this anyway, because there are BIOSes
>>  out there in real machines which also use this scheme.
>> "
>> isn't really valid.
>>
>> Replacing default mapping with a manual one, where VCPUs belonging to
>> the same socket are on the same NUMA node, fixes issue for
>> guests which can't handle nonsense topology, i.e. changing CLI to:
>>   -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7
>>
>> So instead of simply scattering VCPUs around nodes, map
>> the same socket VCPUs to the same NUMA node, which is what
>> guest would expect from a sane hardware/BIOS.
>>
>> Signed-off-by: Igor Mammedov <imamm...@redhat.com>
>
> I believe the proposed behavior is much better. But if we are going to
> break compatibility, shouldn't we at least do that before the first -rc
> so we get feedback in case it breaks existing configurations?
>
> About qemu_cpu_socket_id_from_index(): all qemu-system-* binaries have
> smp_cores and smp_threads available (even if machines ignore it), but
> the default stub can return values that are larger than the number of
> sockets if smp_cores*smp_threads > 1, which would be obviously
> incorrect. Isn't it easier to simply make
> "cpu_index/(smp_cores*smp_threads)" be the default cpu_index->socket
> mapping function, and allow machine-specific (not arch-specific)
> overrides if necessary?
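
For reference, the two mappings discussed above can be sketched as follows. This is only an illustration, not the code from the patch: the function names are hypothetical, the divisor is taken to be the number of VCPUs per socket (smp_cores*smp_threads), CPUs are assumed to be numbered socket by socket, and spreading sockets over nodes with a modulo is an assumption made here for the example.

```c
#include <assert.h>

/* Hypothetical helper (not QEMU's actual API): default cpu_index -> socket
 * mapping, assuming CPUs are numbered socket by socket and that each socket
 * holds smp_cores * smp_threads VCPUs. */
unsigned socket_id_from_index(unsigned cpu_index,
                              unsigned smp_cores, unsigned smp_threads)
{
    assert(smp_cores > 0 && smp_threads > 0);
    return cpu_index / (smp_cores * smp_threads);
}

/* Round-robin default described in the quoted patch: consecutive VCPUs
 * land on consecutive nodes, so one socket is scattered across nodes. */
unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return cpu_index % nb_nodes;
}

/* Socket-aware mapping: every VCPU of a socket gets the same node.
 * Distributing sockets over nodes with a modulo is an assumption of
 * this sketch. */
unsigned node_by_socket(unsigned cpu_index, unsigned smp_cores,
                        unsigned smp_threads, unsigned nb_nodes)
{
    assert(nb_nodes > 0);
    return socket_id_from_index(cpu_index, smp_cores, smp_threads) % nb_nodes;
}
```

With the values from the CLI in the quoted patch (cores=4, threads=1, two nodes), node_round_robin() puts VCPUs 0 and 1 on different nodes, while node_by_socket() yields node 0 for VCPUs 0-3 and node 1 for VCPUs 4-7, which matches the manual cpus=0-3 / cpus=4-7 assignment.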

Agree that the proposed stub solution is not so nice. Can you propose a
MachineClass based solution instead? The example I keep bringing up for
x86 is that the Galileo boards or even the Minnow boards don't really
have sockets, being SoCs.

Thanks,
Andreas

-- 
SUSE Linux GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Felix Imendörffer, Jane Smithard, Jennifer Guild, Dilip Upmanyu,
Graham Norton; HRB 21284 (AG Nürnberg)