On Tue, 17 Mar 2015 17:59:48 +0100 Andreas Färber <afaer...@suse.de> wrote:
> On 17.03.2015 at 17:42, Eduardo Habkost wrote:
> > On Tue, Mar 17, 2015 at 03:48:38PM +0000, Igor Mammedov wrote:
> >> since commit
> >> dd0247e0 pc: acpi: mark all possible CPUs as enabled in SRAT
> >> Linux kernel actually tries to use CPU to Node mapping from
> >> QEMU provided SRAT table instead of discarding it, and that
> >> in some cases breaks build_sched_domains() which expects
> >> sane mapping where cores/threads belonging to the same socket
> >> are on the same NUMA node.
> >>
> >> With current default round-robin mapping of VCPUs to nodes
> >> guest ends up with cores/threads belonging to the same socket
> >> being on different NUMA nodes.
> >>
> >> For example with following CLI:
> >> qemu-kvm -m 4G -smp 5,sockets=1,cores=4,threads=1,maxcpus=8 \
> >> -numa node,nodeid=0 -numa node,nodeid=1
> >> 2.6.32 based kernels will hang on boot due to incorrectly built
> >> sched_group-s list in update_sd_lb_stats()
> >> so comment in QEMU justifying dumb default mapping:
> >> "
> >> guest OSes must cope with this anyway, because there are BIOSes
> >> out there in real machines which also use this scheme.
> >> "
> >> isn't really valid.
> >>
> >> Replacing default mapping with a manual one, where VCPUs belonging to
> >> the same socket are on the same NUMA node, fixes issue for
> >> guests which can't handle nonsense topology i.e. changing CLI to:
> >> -numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7
> >>
> >> So instead of simply scattering VCPUs around nodes, map
> >> the same socket VCPUs to the same NUMA node, which is what
> >> guest would expect from a sane hardware/BIOS.
> >>
> >> Signed-off-by: Igor Mammedov <imamm...@redhat.com>
> >
> > I believe the proposed behavior is much better. But if we are going to
> > break compatibility, shouldn't we at least do that before the first -rc
> > so we get feedback in case it breaks existing configurations?
> >
> > About qemu_cpu_socket_id_from_index(): all qemu-system-* binaries have
> > smp_cores and smp_threads available (even if machines ignore them), but
> > the default stub can return values that are larger than the number of
> > sockets if smp_cores*smp_threads > 1, which would be obviously
> > incorrect. Isn't it easier to simply make
> > "cpu_index/(smp_cores*smp_threads)" be the default cpu_index->socket
> > mapping function, and allow machine-specific (not arch-specific)
> > overrides if necessary?
>
> Agree that the proposed stub solution is not so nice. Can you propose a
> MachineClass based solution instead?
sure
>
> The example I keep bringing up for x86 is that the Galileo boards or
> even the Minnow boards don't really have sockets, being a SoC.
>
> Thanks,
> Andreas
>
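A minimal C sketch of the two VCPU-to-node mappings discussed above, assuming the generic default cpu_index -> socket formula cpu_index / (smp_cores * smp_threads). The names socket_id_fn, default_socket_id(), socket_id(), node_round_robin(), node_by_socket() and print_mapping() are hypothetical and chosen for illustration only; they are not QEMU's actual code, and the nullable function pointer merely stands in for the kind of per-machine (MachineClass based) override being asked for, not its real shape.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for the -smp settings; values follow the
 * example CLI above (cores=4,threads=1). */
static unsigned smp_cores = 4;
static unsigned smp_threads = 1;

/* Hypothetical per-machine override; a board without real sockets
 * (e.g. a SoC) could install its own function. NULL = use default. */
typedef unsigned (*socket_id_fn)(unsigned cpu_index);

/* Generic default: consecutive cpu_index values fill one socket
 * (cores * threads VCPUs) before moving on to the next socket. */
static unsigned default_socket_id(unsigned cpu_index)
{
    unsigned per_socket = smp_cores * smp_threads;
    assert(per_socket > 0);
    return cpu_index / per_socket;
}

/* Use the machine-specific hook if one is set, else the default. */
static unsigned socket_id(socket_id_fn machine_hook, unsigned cpu_index)
{
    return machine_hook ? machine_hook(cpu_index) : default_socket_id(cpu_index);
}

/* Round-robin mapping: scatters VCPUs across nodes, ignoring topology. */
static unsigned node_round_robin(unsigned cpu_index, unsigned nb_nodes)
{
    return cpu_index % nb_nodes;
}

/* Socket-aware mapping: all VCPUs of one socket land on the same node. */
static unsigned node_by_socket(socket_id_fn machine_hook, unsigned cpu_index,
                               unsigned nb_nodes)
{
    return socket_id(machine_hook, cpu_index) % nb_nodes;
}

/* Print both mappings for max_cpus VCPUs spread over nb_nodes nodes. */
static void print_mapping(unsigned max_cpus, unsigned nb_nodes)
{
    for (unsigned i = 0; i < max_cpus; i++) {
        printf("cpu %u: socket %u, round-robin node %u, per-socket node %u\n",
               i, socket_id(NULL, i), node_round_robin(i, nb_nodes),
               node_by_socket(NULL, i, nb_nodes));
    }
}
```

With the example topology (cores=4, threads=1, 8 possible VCPUs, 2 nodes), print_mapping(8, 2) shows round-robin putting CPUs 0 and 1 of socket 0 on different nodes, while the socket-aware mapping assigns CPUs 0-3 to node 0 and CPUs 4-7 to node 1, the same layout as the manual "-numa node,nodeid=0,cpus=0-3 -numa node,nodeid=1,cpus=4-7".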