On Mon, 27 Apr 2020 19:28:09 -0600 Vishal Verma <vishal.l.ve...@intel.com> wrote:
> NVDIMMs can belong to their own proximity domains, as described by the
> NFIT. In such cases, the SRAT needs to have Memory Affinity structures
> in the SRAT for these NVDIMMs, otherwise Linux doesn't populate node
> data structures properly during NUMA initialization. See the following
> for an example failure case.
>
> https://lore.kernel.org/linux-nvdimm/20200416225438.15208-1-vishal.l.ve...@intel.com/
>
> Fix this by adding device address range and node information from
> NVDIMMs to the SRAT in build_srat().
>
> The relevant command line options to exercise this are below. Nodes 0-1
> contain CPUs and regular memory, and nodes 2-3 are the NVDIMM address
> space.
>
>   -numa node,nodeid=0,mem=2048M,
>   -numa node,nodeid=1,mem=2048M,
>   -numa node,nodeid=2,mem=0,
>   -object memory-backend-file,id=nvmem0,share,mem-path=nvdimm-0,size=16384M,align=128M
>   -device nvdimm,memdev=nvmem0,id=nv0,label-size=2M,node=2
>   -numa node,nodeid=3,mem=0,
>   -object memory-backend-file,id=nvmem1,share,mem-path=nvdimm-1,size=16384M,align=128M
>   -device nvdimm,memdev=nvmem1,id=nv1,label-size=2M,node=3
>
> Cc: Jingqi Liu <jingqi....@intel.com>
> Cc: Michael S. Tsirkin <m...@redhat.com>
> Signed-off-by: Vishal Verma <vishal.l.ve...@intel.com>
> ---
>  hw/i386/acpi-build.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 23c77eeb95..b0da67de0e 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -48,6 +48,7 @@
>  #include "migration/vmstate.h"
>  #include "hw/mem/memory-device.h"
>  #include "hw/mem/nvdimm.h"
> +#include "qemu/nvdimm-utils.h"
>  #include "sysemu/numa.h"
>  #include "sysemu/reset.h"
>
> @@ -2429,6 +2430,25 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
>                                MEM_AFFINITY_ENABLED);
>          }
>      }
> +
> +    if (machine->nvdimms_state->is_enabled) {
> +        GSList *device_list = nvdimm_get_device_list();
> +
> +        for (; device_list; device_list = device_list->next) {
> +            DeviceState *dev = device_list->data;
> +            int node = object_property_get_int(OBJECT(dev), PC_DIMM_NODE_PROP,
> +                                               NULL);
> +            uint64_t addr = object_property_get_uint(OBJECT(dev),
> +                                                     PC_DIMM_ADDR_PROP, NULL);
> +            uint64_t size = object_property_get_uint(OBJECT(dev),
> +                                                     PC_DIMM_SIZE_PROP, NULL);
> +

I suggest using error_abort in the getters.

> +            numamem = acpi_data_push(table_data, sizeof *numamem);
> +            build_srat_memory(numamem, addr, size, node,
> +                              MEM_AFFINITY_ENABLED | MEM_AFFINITY_NON_VOLATILE);
> +        }

Who is in charge of freeing device_list?

> +    }

There is an ARM version of build_srat(). I suggest putting this
NVDIMM-specific part in a helper function in hw/acpi/nvdimm.c and using
it from both build_srat() functions.

> +
>      slots = (table_data->len - numa_start) / sizeof *numamem;
>      for (; slots < pcms->numa_nodes + 2; slots++) {
>          numamem = acpi_data_push(table_data, sizeof *numamem);
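
To make the ownership and shared-helper comments above concrete, here is a
minimal standalone sketch. It deliberately uses no QEMU or GLib headers:
MockNvdimm, SratMemEntry, mock_get_device_list(), mock_free_device_list()
and nvdimm_build_srat_sketch() are hypothetical stand-ins invented for
illustration, and the device addresses are made up. The two flag values
follow the ACPI Memory Affinity flag bits. The point is only the shape:
walk the list with a separate cursor so the head pointer survives the
loop, free the list once, and keep the whole thing in one helper that
both build_srat() implementations could call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Flag bits as laid out in the ACPI SRAT Memory Affinity structure. */
#define MEM_AFFINITY_ENABLED      (1u << 0)
#define MEM_AFFINITY_NON_VOLATILE (1u << 2)

/* Hypothetical stand-in for an NVDIMM device plus its list node. */
typedef struct MockNvdimm {
    uint64_t addr;
    uint64_t size;
    int node;
    struct MockNvdimm *next;
} MockNvdimm;

/* Hypothetical, simplified view of one SRAT memory affinity entry. */
typedef struct SratMemEntry {
    uint64_t base;
    uint64_t length;
    uint32_t proximity_domain;
    uint32_t flags;
} SratMemEntry;

/* Returns a freshly allocated list; the caller owns it and must free it. */
static MockNvdimm *mock_get_device_list(void)
{
    MockNvdimm *head = NULL;

    /* Build back to front so the list reads node 2, then node 3. */
    for (int i = 1; i >= 0; i--) {
        MockNvdimm *d = malloc(sizeof(*d));
        if (!d) {
            abort();
        }
        d->addr = 0x100000000ULL + (uint64_t)i * 0x400000000ULL; /* made up */
        d->size = 0x400000000ULL;                                /* 16384M */
        d->node = 2 + i;
        d->next = head;
        head = d;
    }
    return head;
}

static void mock_free_device_list(MockNvdimm *list)
{
    while (list) {
        MockNvdimm *next = list->next;
        free(list);
        list = next;
    }
}

/*
 * Shared helper sketch: fills at most max entries and returns how many
 * were written. A separate cursor keeps the head pointer intact so the
 * list can be released after the loop.
 */
static size_t nvdimm_build_srat_sketch(SratMemEntry *out, size_t max)
{
    MockNvdimm *list = mock_get_device_list();
    size_t n = 0;

    for (MockNvdimm *it = list; it && n < max; it = it->next) {
        out[n].base = it->addr;
        out[n].length = it->size;
        out[n].proximity_domain = (uint32_t)it->node;
        out[n].flags = MEM_AFFINITY_ENABLED | MEM_AFFINITY_NON_VOLATILE;
        n++;
    }

    mock_free_device_list(list);
    return n;
}
```

In the patch as posted, the loop advances device_list itself, so by the
end of the loop the only pointer to the head is gone and nothing can free
it. In real QEMU code the equivalent of mock_free_device_list() would
presumably be g_slist_free() on the saved head, but that depends on what
nvdimm_get_device_list() actually hands back.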