Igor, > -----Original Message----- > From: Igor Mammedov <imamm...@redhat.com> > Sent: Thursday, June 25, 2020 10:19 AM > To: Moger, Babu <babu.mo...@amd.com> > Cc: ehabk...@redhat.com; m...@redhat.com; qemu-devel@nongnu.org; > pbonz...@redhat.com; r...@twiddle.net > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and CPU > hotplug > > On Wed, 24 Jun 2020 12:35:59 -0500 > Babu Moger <babu.mo...@amd.com> wrote: > > > > -----Original Message----- > > > From: Igor Mammedov <imamm...@redhat.com> > > > Sent: Wednesday, June 24, 2020 8:48 AM > > > To: Moger, Babu <babu.mo...@amd.com> > > > Cc: ehabk...@redhat.com; m...@redhat.com; qemu-devel@nongnu.org; > > > pbonz...@redhat.com; r...@twiddle.net > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device and > CPU > > > hotplug > > > > > > On Tue, 16 Jun 2020 12:18:56 -0500 > > > Babu Moger <babu.mo...@amd.com> wrote: > > > > > > > > -----Original Message----- > > > > > From: Igor Mammedov <imamm...@redhat.com> > > > > > Sent: Tuesday, June 16, 2020 5:59 AM > > > > > To: Moger, Babu <babu.mo...@amd.com> > > > > > Cc: pbonz...@redhat.com; r...@twiddle.net; ehabk...@redhat.com; > > > > > m...@redhat.com; marcel.apfelb...@gmail.com; qemu- > de...@nongnu.org > > > > > Subject: Re: [PATCH 1/2] hw/386: Fix uninitialized memory with -device > and > > > CPU > > > > > hotplug > > > > > > > > > > On Mon, 08 Jun 2020 15:18:50 -0500 > > > > > Babu Moger <babu.mo...@amd.com> wrote: > > > > > > > > > > > Noticed the following command failure while testing CPU hotplug. > > > > > > > > > > > > $ qemu-system-x86_64 -machine q35,accel=kvm -smp 1,maxcpus=2, > > > > > > cores=1, threads=1,sockets=2 -cpu EPYC -device EPYC-x86_64- > > > > > > cpu,core-id=0,socket-id=1,thread-id=0 > > > > > > > > > > > > qemu-system-x86_64: -device EPYC-x86_64-cpu,core-id=0,socket- > id=1, > > > > > > thread-id=0: Invalid CPU [socket: 21855, die: 0, core: 0, thread: > > > > > > 0] > > > > > > with APIC ID 21855, valid index range 0:1 > > > > > > > > > > > > This happens because APIC ID is calculated using uninitialized > > > > > > memory. > > > > > > This is happening after the addition of new field node_id in > > > X86CPUTopoIDs > > > > > > structure. The node_id field is uninitialized while calling > > > > > > apicid_from_topo_ids. The problem is discussed in the thread below. > > > > > > > > > > > > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.ker > > > > > nel.org%2Fqemu- > > > > > > > > > devel%2F20200602171838.GG577771%40habkost.net%2F&data=02%7C01 > > > > > > > > > %7Cbabu.moger%40amd.com%7C02200d75fd8b48d1955608d811e44f5b%7C3d > > > > > > > > > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637279019564311233& > > > > > > > > > ;sdata=ry3QO0Z5dxLPoRxkYVkOsVm3nl%2BxfCGv8be%2BMHdoUPY%3D&r > > > > > eserved=0 > > > > > > > > > > > > Fix the problem by initializing the node_id properly. > > > > > > > > > > > > Signed-off-by: Babu Moger <babu.mo...@amd.com> > > > > > > --- > > > > > > hw/i386/pc.c | 2 ++ > > > > > > include/hw/i386/topology.h | 11 +++++++++++ > > > > > > 2 files changed, 13 insertions(+) > > > > > > > > > > > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > > > > > > index 2128f3d6fe..974cc30891 100644 > > > > > > --- a/hw/i386/pc.c > > > > > > +++ b/hw/i386/pc.c > > > > > > @@ -1585,6 +1585,8 @@ static void pc_cpu_pre_plug(HotplugHandler > > > > > *hotplug_dev, > > > > > > topo_ids.die_id = cpu->die_id; > > > > > > topo_ids.core_id = cpu->core_id; > > > > > > topo_ids.smt_id = cpu->thread_id; > > > > > > + topo_ids.node_id = cpu_x86_use_epyc_apic_id_encoding(ms- > > > >cpu_type) > > > > > ? > > > > > > + x86_node_id_for_epyc(&topo_info, > > > > > > &topo_ids) : 0; > > > > > > > > > > I'd rather not calculate some default value here, > > > > > this is the branch where we check user provided topology info and > > > > > error > out > > > > > asking > > > > > to provide missing bits. > > > > Noticed that cpu->node_id is initialized to > 0xFF(NUMA_NODE_UNASSIGNED). > > > > We can initialize cpu->node_id to default node like how we do it in > > > > x86_get_default_cpu_node_id. We can use it to initialize > topo_ids.node_id. > > > > This is consistent with other fields core_id, die_id etc.. > > > > > > > > > > > > > > I also wonder if we should force user to specify numa nodes on CLI if > EPYC > > > cpu is > > > > > used. > > > > > (i.e. I'm assuming that EPYC always requires numa) > > > > > > > > That is not true. Without numa all the cpus will be configured under one > > > > default numa node 0. Like we do it using x86_get_default_cpu_node_id. > > > > > > get_default_cpu_node_id() which is making things up, is going to be > removed > > > eventually in favor of asking user to provide numa mapping explicitly on > > > CLI. > > > > That will be good going forward. > > > > > > > > now if it's always only node 0, why do we need to calculate it then, > > > why not just assing 0 directly? > > > > > > what if we have several sockets, would all vCPUs still have node-id = 0? > > > > If there are several nodes then socket id becomes node id. > I wonder if node id == socket id then why bother with node_id at all, > probably node id is there to allow for design where several sockets are on > the same node > > > > > Another question is if we have CPUs with only 1 numa node set on all of > them, > > > does it require all other machinery like ACPI SRAT table defined or it's > > > just > > > internal CPU impl. detail? > > > > I am not very sure about it. To me it looks like it is just internal cpu > > implementation detail. > I'd think it might confuse guest OS, when it decodes more than 1 numa node > for APIC ID/CPUID but then there are no such nodes described in ACPI. > While it might work for caches, it would miss any relation of memory mapping > to nodes or get it wrong if one doesn't match another. > > > > I think we have two options here. > > > > 1. Update the patch to initialize the node_id the way it is done > > get_default_cpu_node_id. > > if it were only one node for every CPU (incl. multisocket), I'd go with > enabling > autonuma assigning all CPUs to default 0 node-id, since there is no ambiguity > where > CPUs and RAM are mapped to. > Is it possible to use node-id=0 for all EPYC CPUs even in multisocket config? > (it seems spec allows only one node per socket, but doesn't say that node ids > must > be different.) > If not, then making up node-id is not an option. > > > 2. Ask the user to pass the node_id while hot plugging the device. This is > > a clean solution. Will require some data structure changes. > > Here is my brain dump of current very non obvious flow: > > 1. x86_possible_cpu_arch_ids() > ms->possible_cpus->cpus[i].props.* = x86_topo_ids_from_idx() > > 2. possible numa_cpu_set() > ms->possible_cpus->cpus[i].props.node_id = user input|0 -in autonuma > case > > 3. x86_cpus_init() > // generate apic_id AND makeup node_id embedded into it > ms->possible_cpus->cpus[i].arch_id = > x86_cpu_apic_id_from_index(x86ms, > i); > -> x86_apicid_from_cpu_idx_epyc() -> > x86_topo_ids_from_idx_epyc() > same as > x86_topo_ids_from_idx() + node_id > or > -> x86_apicid_from_cpu_idx() -> > x86_topo_ids_from_idx() > > 4. pc_cpu_pre_plug() > // basically topo ids module node-id is not set or user provided > cpu_slot = pc_find_cpu_slot(MACHINE(pcms), cpu->apic_id, &idx); > > 5. > // do it again with diff that in EPYC case it my have different > node-id than > cpu_slot > x86ms->topo_ids_from_apicid(cpu->apic_id, &topo_info, &topo_ids); > > //i.e. user input of node-id is ignored > set socket-id/core-id/... (but not node-id) from topo_info > > numa_cpu_pre_plug(cpu_slot) > ^^^^^^ > if (node_id == CPU_UNSET_NUMA_NODE_ID) { > if (slot->props.has_node_id) > object_property_set_int(... slot->props.node_id, > "node-id",...); > // this applies to hotplugged without node-id and to initial > CPUs (-smp X) > // so we may end up with "node-id" being set to user defined > value > // or left unset (no numa enabled) > // while APIC ID will have some node-id encoded in it. > > > that's quite a mess, maybe we should unify both > amd make x86_apicid_from_cpu_idx_epyc()/x86_apicid_from_cpu_idx() use > ms->possible_cpus->cpus[i].props instead of x86_topo_ids_from_idx() > i.e > > x86_apicid_from_cpu_idx_epyc() { > topoids = x86_apicid_from_cpu_idx() { > return ms->possible_cpus->cpus[i].props > } > if (ms->possible_cpus->cpus[i].props.has_node_id) > topoids.node_id = ms->possible_cpus->cpus[i].props.node_id > else > error_fatal("EPYC requires use of -numa to define topology if > using > more than 1 socket") > } > > that way QEMU makes up only node[0] by enabling autonuma or whatever > user privided explicitly is encoded into APIC ID and it will be always > consistent > with cpu > *-id properties in possible_cpus and SRAT table QEMU generates. > > as cleanup we can get rid of back and forth conversion [5] and use cpu_slot to > set > the same ids. > > Also maybe we should have a check that node-id is the same within socket in > case of EPYC > if it's guarantied that EPYC won't support multiple nodes per socket. > > hope it makes at least some sense.
To make things clear, in case of autonuma we don't have to worry about node_id. We just have to set it topo_ids.node_id to 0 in pc_cpu_pre_plug, Everything will work as expected. This will solve our current problem of uninitialized variable. Problem here is, when user has configured the numa, then setting the topo_ids.node_id to 0 might not work because it might create duplicate apicids and device_add will be rejected. As per the comments in numa_cpu_pre_plug, this is already broken. Look at the comments below. Looks like node_id cannot be passed down. ============================================ if (node_id == CPU_UNSET_NUMA_NODE_ID) { /* due to bug in libvirt, it doesn't pass node-id from props on * device_add as expected, so we have to fix it up here */ if (slot->props.has_node_id) { object_property_set_int(OBJECT(dev), slot->props.node_id, "node-id", errp); } } else if (node_id != slot->props.node_id) { ============================================ I was trying to solve this problem setting the node_id correctly for EPYC at least. If you think, this is not important we can ignore it (by setting topo_ids.node_id to 0) and move forward. I don't see the need for changing other topology specific code as we have already made very generic. > > > > Let me know if you see any other option. > > > > > > > > > > > > > > > > > > > cpu->apic_id = x86ms->apicid_from_topo_ids(&topo_info, > > > &topo_ids); > > > > > > } > > > > > > > > > > > > diff --git a/include/hw/i386/topology.h b/include/hw/i386/topology.h > > > > > > index 07239f95f4..ee4deb84c4 100644 > > > > > > --- a/include/hw/i386/topology.h > > > > > > +++ b/include/hw/i386/topology.h > > > > > > @@ -140,6 +140,17 @@ static inline unsigned > > > > > apicid_pkg_offset_epyc(X86CPUTopoInfo *topo_info) > > > > > > apicid_node_width_epyc(topo_info); > > > > > > } > > > > > > > > > > > > +static inline unsigned x86_node_id_for_epyc(X86CPUTopoInfo > > > *topo_info, > > > > > > + const X86CPUTopoIDs > > > > > > *topo_ids) > > > > > > +{ > > > > > > + unsigned nr_nodes = MAX(topo_info->nodes_per_pkg, 1); > > > > > > + unsigned cores_per_node = DIV_ROUND_UP((topo_info- > >dies_per_pkg > > > * > > > > > > + > > > > > > topo_info->cores_per_die * > > > > > > + > > > > > > topo_info->threads_per_core), > > > > > > + nr_nodes); > > > > > > + > > > > > > + return (topo_ids->core_id / cores_per_node) % nr_nodes; > > > > > what if nr_nodes == 0? > > > > > > > > > > > +} > > > > > > /* > > > > > > * Make APIC ID for the CPU based on Pkg_ID, Core_ID, SMT_ID > > > > > > * > > > > > > > > > > > > > > > > > > > > > >