On 02/21/18 14:55 +0100, Igor Mammedov wrote:
> On Tue, 20 Feb 2018 17:17:58 -0800
> Dan Williams <dan.j.willi...@intel.com> wrote:
>
> > On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov <imamm...@redhat.com> wrote:
> > > On Sat, 17 Feb 2018 14:31:35 +0800
> > > Haozhong Zhang <haozhong.zh...@intel.com> wrote:
> > >
> > >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
> > >> domain of an NVDIMM SPA range match the corresponding entry in the
> > >> SRAT table.
> > >>
> > >> The address ranges of vNVDIMM in QEMU are allocated from the
> > >> hot-pluggable address space, which is entirely covered by one SRAT
> > >> memory affinity structure. However, users can set the vNVDIMM
> > >> proximity domain in the NFIT SPA range structure, via the 'node'
> > >> property of '-device nvdimm', to a value different from the one in the
> > >> above SRAT memory affinity structure.
> > >>
> > >> In order to resolve such a proximity domain mismatch, this patch builds
> > >> one SRAT memory affinity structure for each NVDIMM device, with the
> > >> proximity domain used in the NFIT. The remaining hot-pluggable address
> > >> space is covered by one or more SRAT memory affinity structures with
> > >> the proximity domain of the last node, as before.
> > >>
> > >> Signed-off-by: Haozhong Zhang <haozhong.zh...@intel.com>
> > > If we consider a hotpluggable system, a correctly implemented OS should
> > > be able to pull the proximity from Device::_PXM and override any value
> > > from SRAT. Do we really have a problem here (anything that breaks if we
> > > were to use _PXM)?
> > > Maybe we should add a _PXM object to the nvdimm device nodes instead of
> > > massaging SRAT?
> >
> > Unfortunately _PXM is an awkward fit. Currently the proximity domain
> > is attached to the SPA range structure. The SPA range may be
> > associated with multiple DIMM devices, and those individual NVDIMMs may
> > have conflicting _PXM properties.
> There shouldn't be any conflict here, as the NVDIMM device's _PXM method
> should override at runtime any proximity specified by the parent scope
> (as parent scope I'd also count the boot-time NFIT/SRAT tables).
>
> To make it clearer, we could clear the "proximity domain valid" flag in the
> SPA structure, like this:
>
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 59d6e42..131bca5 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, DeviceState *dev)
>       */
>      nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
>                                         management during hot add/online
> -                                       operation */ |
> -                                   2 /* Data in Proximity Domain field is
> -                                      valid */);
> +                                       operation */);
>
>      /* NUMA node. */
>      nfit_spa->proximity_domain = cpu_to_le32(node);
>
> > Even if that were unified across
> > DIMMs, it is ambiguous whether a DIMM-device _PXM would relate to the
> > device's control interface or to the assembled persistent memory SPA
> > range.
> I'm not sure what you mean by "device's control interface";
> could you clarify where the ambiguity comes from?
>
> I read the spec as: _PXM applies to the address range covered by the
> NVDIMM device it belongs to.
>
> As for an assembled SPA, I'd assume that it applies to the interleaved set
> and that all NVDIMMs within it should be on the same node. It's a somewhat
> irrelevant question though, as QEMU so far implements only a
> 1:1:1 /SPA : Region Mapping : NVDIMM Device/ mapping.
>
> My main concern with using static configuration tables for the proximity
> mapping is that we'd miss the hotplug side of the equation. However, if we
> start from the dynamic side first, we could later complement it with static
> tables if there really were a need for it.
This patch affects only the static tables and statically plugged NVDIMMs.
For hot-plugged NVDIMMs, the guest OSPM still needs to evaluate _FIT to get
the information about the new NVDIMMs, including their proximity domains.

One intention of this patch is to simulate bare metal as closely as possible.
I have been using this patch to develop and test the NVDIMM enabling work on
Xen, and I think it might be useful for developers of other OSes and
hypervisors as well.

Haozhong