Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-26 Thread Haozhong Zhang
On 02/26/18 14:59 +0100, Igor Mammedov wrote:
> On Thu, 22 Feb 2018 09:40:00 +0800
> Haozhong Zhang  wrote:
> 
> > On 02/21/18 14:55 +0100, Igor Mammedov wrote:
> > > On Tue, 20 Feb 2018 17:17:58 -0800
> > > Dan Williams  wrote:
> > >   
> > > > On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  
> > > > wrote:  
> > > > > On Sat, 17 Feb 2018 14:31:35 +0800
> > > > > Haozhong Zhang  wrote:
> > > > >
> > > > >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> > > > >> domain of a NVDIMM SPA range must match with corresponding entry in
> > > > >> SRAT table.
> > > > >>
> > > > >> The address ranges of vNVDIMM in QEMU are allocated from the
> > > > >> hot-pluggable address space, which is entirely covered by one SRAT
> > > > >> memory affinity structure. However, users can set the vNVDIMM
> > > > >> proximity domain in NFIT SPA range structure by the 'node' property 
> > > > >> of
> > > > >> '-device nvdimm' to a value different than the one in the above SRAT
> > > > >> memory affinity structure.
> > > > >>
> > > > >> In order to solve such proximity domain mismatch, this patch build 
> > > > >> one
> > > > >> SRAT memory affinity structure for each NVDIMM device with the
> > > > >> proximity domain used in NFIT. The remaining hot-pluggable address
> > > > >> space is covered by one or multiple SRAT memory affinity structures
> > > > >> with the proximity domain of the last node as before.
> > > > >>
> > > > >> Signed-off-by: Haozhong Zhang 
> > > > > If we consider hotpluggable system, correctly implemented OS should
> > > > > be able pull proximity from Device::_PXM and override any value from 
> > > > > SRAT.
> > > > > Do we really have a problem here (anything that breaks if we would 
> > > > > use _PXM)?
> > > > > Maybe we should add _PXM object to nvdimm device nodes instead of 
> > > > > massaging SRAT?
> > > > 
> > > > Unfortunately _PXM is an awkward fit. Currently the proximity domain
> > > > is attached to the SPA range structure. The SPA range may be
> > > > associated with multiple DIMM devices and those individual NVDIMMs may
> > > > have conflicting _PXM properties.  
> > > There shouldn't be any conflict here as  NVDIMM device's _PXM method,
> > > should override in runtime any proximity specified by parent scope.
> > > (as parent scope I'd also count boot time NFIT/SRAT tables).
> > > 
> > > To make it more clear we could clear valid proximity domain flag in SPA
> > > like this:
> > > 
> > > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > > index 59d6e42..131bca5 100644
> > > --- a/hw/acpi/nvdimm.c
> > > +++ b/hw/acpi/nvdimm.c
> > > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, 
> > > DeviceState *dev)
> > >   */
> > >  nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> > > management during hot add/online
> > > -   operation */ |
> > > -  2 /* Data in Proximity Domain field is
> > > -   valid*/);
> > > +   operation */);
> > >  
> > >  /* NUMA node. */
> > >  nfit_spa->proximity_domain = cpu_to_le32(node);
> > >   
> > > > Even if that was unified across
> > > > DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> > > > device's control interface, or the assembled persistent memory SPA
> > > > range.  
> > > I'm not sure what you mean under 'device's control interface',
> > > could you clarify where the ambiguity comes from?
> > > 
> > > I read spec as: _PXM applies to address range covered by NVDIMM
> > > device it belongs to.
> > > 
> > > As for assembled SPA, I'd assume that it applies to interleaved set
> > > and all NVDIMMs with it should be on the same node. It's somewhat
> > > irrelevant question though as QEMU so far implements only
> > >   1:1:1/SPA:Region Mapping:NVDIMM Device/
> > > mapping.
> > > 
> > > My main concern with using static configuration tables for proximity
> > > mapping, we'd miss on hotplug side of equation. However if we start
> > > from dynamic side first, we could later complement it with static
> > > tables if there really were need for it.  
> > 
> > This patch affects only the static tables and static-plugged NVDIMM.
> > For hot-plugged NVDIMMs, guest OSPM still needs to evaluate _FIT to
> > get the information of the new NVDIMMs including their proximity
> > domains.
> > 
> > One intention of this patch is to simulate the bare metal as much as
> > possible. I have been using this patch to develop and test NVDIMM
> > enabling work on Xen, and think it might be useful for developers of
> > other OS and hypervisors.
> It's simpler on bare metal as systems usually statically partitioned
> according to capacity slots are able to handle.
> 
> The patch is 

Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-26 Thread Igor Mammedov
On Wed, 21 Feb 2018 06:51:11 -0800
Dan Williams  wrote:

> On Wed, Feb 21, 2018 at 5:55 AM, Igor Mammedov  wrote:
> > On Tue, 20 Feb 2018 17:17:58 -0800
> > Dan Williams  wrote:
> >  
> >> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  
> >> wrote:  
> >> > On Sat, 17 Feb 2018 14:31:35 +0800
> >> > Haozhong Zhang  wrote:
> >> >  
> >> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> >> >> domain of a NVDIMM SPA range must match with corresponding entry in
> >> >> SRAT table.
> >> >>
> >> >> The address ranges of vNVDIMM in QEMU are allocated from the
> >> >> hot-pluggable address space, which is entirely covered by one SRAT
> >> >> memory affinity structure. However, users can set the vNVDIMM
> >> >> proximity domain in NFIT SPA range structure by the 'node' property of
> >> >> '-device nvdimm' to a value different than the one in the above SRAT
> >> >> memory affinity structure.
> >> >>
> >> >> In order to solve such proximity domain mismatch, this patch build one
> >> >> SRAT memory affinity structure for each NVDIMM device with the
> >> >> proximity domain used in NFIT. The remaining hot-pluggable address
> >> >> space is covered by one or multiple SRAT memory affinity structures
> >> >> with the proximity domain of the last node as before.
> >> >>
> >> >> Signed-off-by: Haozhong Zhang   
> >> > If we consider hotpluggable system, correctly implemented OS should
> >> > be able pull proximity from Device::_PXM and override any value from 
> >> > SRAT.
> >> > Do we really have a problem here (anything that breaks if we would use 
> >> > _PXM)?
> >> > Maybe we should add _PXM object to nvdimm device nodes instead of 
> >> > massaging SRAT?  
> >>
> >> Unfortunately _PXM is an awkward fit. Currently the proximity domain
> >> is attached to the SPA range structure. The SPA range may be
> >> associated with multiple DIMM devices and those individual NVDIMMs may
> >> have conflicting _PXM properties.  
> > There shouldn't be any conflict here as  NVDIMM device's _PXM method,
> > should override in runtime any proximity specified by parent scope.
> > (as parent scope I'd also count boot time NFIT/SRAT tables).
> >
> > To make it more clear we could clear valid proximity domain flag in SPA
> > like this:
> >
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 59d6e42..131bca5 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, 
> > DeviceState *dev)
> >   */
> >  nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> > management during hot add/online
> > -   operation */ |
> > -  2 /* Data in Proximity Domain field is
> > -   valid*/);
> > +   operation */);
> >
> >  /* NUMA node. */
> >  nfit_spa->proximity_domain = cpu_to_le32(node);
> >  
> >> Even if that was unified across
> >> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> >> device's control interface, or the assembled persistent memory SPA
> >> range.  
> > I'm not sure what you mean under 'device's control interface',
> > could you clarify where the ambiguity comes from?  
> 
> There are multiple SPA range types. In addition to the typical
> Persistent Memory SPA range there are also Control Region SPA ranges
> for MMIO registers on the DIMM for Block Apertures and other purposes.
> 
> >
> > I read spec as: _PXM applies to address range covered by NVDIMM
> > device it belongs to.  
> 
> No, an NVDIMM may contribute to multiple SPA ranges and those ranges
> may span sockets.
Isn't an NVDIMM device plugged into a single socket, which belongs to
a single NUMA node?
If so, shouldn't the SPAs referencing it also have the same
proximity domain?


> > As for assembled SPA, I'd assume that it applies to interleaved set
> > and all NVDIMMs with it should be on the same node. It's somewhat
> > irrelevant question though as QEMU so far implements only
> >   1:1:1/SPA:Region Mapping:NVDIMM Device/
> > mapping.
> >
> > My main concern with using static configuration tables for proximity
> > mapping, we'd miss on hotplug side of equation. However if we start
> > from dynamic side first, we could later complement it with static
> > tables if there really were need for it.  
> 
> Especially when you consider the new HMAT table that wants to have
> proximity domains for describing performance characteristics of an
> address range relative to an initiator, the _PXM method on an
> individual NVDIMM device is a poor fit for describing a wider set.
> 




Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-26 Thread Igor Mammedov
On Thu, 22 Feb 2018 09:40:00 +0800
Haozhong Zhang  wrote:

> On 02/21/18 14:55 +0100, Igor Mammedov wrote:
> > On Tue, 20 Feb 2018 17:17:58 -0800
> > Dan Williams  wrote:
> >   
> > > On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  
> > > wrote:  
> > > > On Sat, 17 Feb 2018 14:31:35 +0800
> > > > Haozhong Zhang  wrote:
> > > >
> > > >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> > > >> domain of a NVDIMM SPA range must match with corresponding entry in
> > > >> SRAT table.
> > > >>
> > > >> The address ranges of vNVDIMM in QEMU are allocated from the
> > > >> hot-pluggable address space, which is entirely covered by one SRAT
> > > >> memory affinity structure. However, users can set the vNVDIMM
> > > >> proximity domain in NFIT SPA range structure by the 'node' property of
> > > >> '-device nvdimm' to a value different than the one in the above SRAT
> > > >> memory affinity structure.
> > > >>
> > > >> In order to solve such proximity domain mismatch, this patch build one
> > > >> SRAT memory affinity structure for each NVDIMM device with the
> > > >> proximity domain used in NFIT. The remaining hot-pluggable address
> > > >> space is covered by one or multiple SRAT memory affinity structures
> > > >> with the proximity domain of the last node as before.
> > > >>
> > > >> Signed-off-by: Haozhong Zhang 
> > > > If we consider hotpluggable system, correctly implemented OS should
> > > > be able pull proximity from Device::_PXM and override any value from 
> > > > SRAT.
> > > > Do we really have a problem here (anything that breaks if we would use 
> > > > _PXM)?
> > > > Maybe we should add _PXM object to nvdimm device nodes instead of 
> > > > massaging SRAT?
> > > 
> > > Unfortunately _PXM is an awkward fit. Currently the proximity domain
> > > is attached to the SPA range structure. The SPA range may be
> > > associated with multiple DIMM devices and those individual NVDIMMs may
> > > have conflicting _PXM properties.  
> > There shouldn't be any conflict here as  NVDIMM device's _PXM method,
> > should override in runtime any proximity specified by parent scope.
> > (as parent scope I'd also count boot time NFIT/SRAT tables).
> > 
> > To make it more clear we could clear valid proximity domain flag in SPA
> > like this:
> > 
> > diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> > index 59d6e42..131bca5 100644
> > --- a/hw/acpi/nvdimm.c
> > +++ b/hw/acpi/nvdimm.c
> > @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, 
> > DeviceState *dev)
> >   */
> >  nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> > management during hot add/online
> > -   operation */ |
> > -  2 /* Data in Proximity Domain field is
> > -   valid*/);
> > +   operation */);
> >  
> >  /* NUMA node. */
> >  nfit_spa->proximity_domain = cpu_to_le32(node);
> >   
> > > Even if that was unified across
> > > DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> > > device's control interface, or the assembled persistent memory SPA
> > > range.  
> > I'm not sure what you mean under 'device's control interface',
> > could you clarify where the ambiguity comes from?
> > 
> > I read spec as: _PXM applies to address range covered by NVDIMM
> > device it belongs to.
> > 
> > As for assembled SPA, I'd assume that it applies to interleaved set
> > and all NVDIMMs with it should be on the same node. It's somewhat
> > irrelevant question though as QEMU so far implements only
> >   1:1:1/SPA:Region Mapping:NVDIMM Device/
> > mapping.
> > 
> > My main concern with using static configuration tables for proximity
> > mapping, we'd miss on hotplug side of equation. However if we start
> > from dynamic side first, we could later complement it with static
> > tables if there really were need for it.  
> 
> This patch affects only the static tables and static-plugged NVDIMM.
> For hot-plugged NVDIMMs, guest OSPM still needs to evaluate _FIT to
> get the information of the new NVDIMMs including their proximity
> domains.
> 
> One intention of this patch is to simulate the bare metal as much as
> possible. I have been using this patch to develop and test NVDIMM
> enabling work on Xen, and think it might be useful for developers of
> other OS and hypervisors.
It's simpler on bare metal, as systems are usually statically partitioned
according to the capacity their slots are able to handle.

The patch is technically correct and might be useful, especially in the
current case, where the /* Data in Proximity Domain field is valid */ flag
is set, to conform to the spec. So just complement the patch with a test
case as requested and it should be fine to merge.

PS:
while adding ranges for 

Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-23 Thread Haozhong Zhang
Hi Fam,

On 02/23/18 17:17 -0800, no-re...@patchew.org wrote:
> Hi,
> 
> This series failed build test on s390x host. Please find the details below.
> 
> N/A. Internal error while reading log file

What does this message mean? Where can I get the log file?

Thanks,
Haozhong



Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-23 Thread no-reply
Hi,

This series failed build test on s390x host. Please find the details below.

N/A. Internal error while reading log file



---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-de...@freelists.org

Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-21 Thread Haozhong Zhang
On 02/21/18 14:55 +0100, Igor Mammedov wrote:
> On Tue, 20 Feb 2018 17:17:58 -0800
> Dan Williams  wrote:
> 
> > On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  wrote:
> > > On Sat, 17 Feb 2018 14:31:35 +0800
> > > Haozhong Zhang  wrote:
> > >  
> > >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> > >> domain of a NVDIMM SPA range must match with corresponding entry in
> > >> SRAT table.
> > >>
> > >> The address ranges of vNVDIMM in QEMU are allocated from the
> > >> hot-pluggable address space, which is entirely covered by one SRAT
> > >> memory affinity structure. However, users can set the vNVDIMM
> > >> proximity domain in NFIT SPA range structure by the 'node' property of
> > >> '-device nvdimm' to a value different than the one in the above SRAT
> > >> memory affinity structure.
> > >>
> > >> In order to solve such proximity domain mismatch, this patch build one
> > >> SRAT memory affinity structure for each NVDIMM device with the
> > >> proximity domain used in NFIT. The remaining hot-pluggable address
> > >> space is covered by one or multiple SRAT memory affinity structures
> > >> with the proximity domain of the last node as before.
> > >>
> > >> Signed-off-by: Haozhong Zhang   
> > > If we consider hotpluggable system, correctly implemented OS should
> > > be able pull proximity from Device::_PXM and override any value from SRAT.
> > > Do we really have a problem here (anything that breaks if we would use 
> > > _PXM)?
> > > Maybe we should add _PXM object to nvdimm device nodes instead of 
> > > massaging SRAT?  
> > 
> > Unfortunately _PXM is an awkward fit. Currently the proximity domain
> > is attached to the SPA range structure. The SPA range may be
> > associated with multiple DIMM devices and those individual NVDIMMs may
> > have conflicting _PXM properties.
> There shouldn't be any conflict here as  NVDIMM device's _PXM method,
> should override in runtime any proximity specified by parent scope.
> (as parent scope I'd also count boot time NFIT/SRAT tables).
> 
> To make it more clear we could clear valid proximity domain flag in SPA
> like this:
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 59d6e42..131bca5 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, 
> DeviceState *dev)
>   */
>  nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> management during hot add/online
> -   operation */ |
> -  2 /* Data in Proximity Domain field is
> -   valid*/);
> +   operation */);
>  
>  /* NUMA node. */
>  nfit_spa->proximity_domain = cpu_to_le32(node);
> 
> > Even if that was unified across
> > DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> > device's control interface, or the assembled persistent memory SPA
> > range.
> I'm not sure what you mean under 'device's control interface',
> could you clarify where the ambiguity comes from?
> 
> I read spec as: _PXM applies to address range covered by NVDIMM
> device it belongs to.
> 
> As for assembled SPA, I'd assume that it applies to interleaved set
> and all NVDIMMs with it should be on the same node. It's somewhat
> irrelevant question though as QEMU so far implements only
>   1:1:1/SPA:Region Mapping:NVDIMM Device/
> mapping.
> 
> My main concern with using static configuration tables for proximity
> mapping, we'd miss on hotplug side of equation. However if we start
> from dynamic side first, we could later complement it with static
> tables if there really were need for it.

This patch affects only the static tables and statically plugged NVDIMMs.
For hot-plugged NVDIMMs, guest OSPM still needs to evaluate _FIT to
get the information of the new NVDIMMs, including their proximity
domains.

One intention of this patch is to simulate bare metal as closely as
possible. I have been using this patch to develop and test NVDIMM
enabling work on Xen, and I think it might be useful for developers of
other OSes and hypervisors.


Haozhong



Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-21 Thread Dan Williams
On Wed, Feb 21, 2018 at 5:55 AM, Igor Mammedov  wrote:
> On Tue, 20 Feb 2018 17:17:58 -0800
> Dan Williams  wrote:
>
>> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  wrote:
>> > On Sat, 17 Feb 2018 14:31:35 +0800
>> > Haozhong Zhang  wrote:
>> >
>> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
>> >> domain of a NVDIMM SPA range must match with corresponding entry in
>> >> SRAT table.
>> >>
>> >> The address ranges of vNVDIMM in QEMU are allocated from the
>> >> hot-pluggable address space, which is entirely covered by one SRAT
>> >> memory affinity structure. However, users can set the vNVDIMM
>> >> proximity domain in NFIT SPA range structure by the 'node' property of
>> >> '-device nvdimm' to a value different than the one in the above SRAT
>> >> memory affinity structure.
>> >>
>> >> In order to solve such proximity domain mismatch, this patch build one
>> >> SRAT memory affinity structure for each NVDIMM device with the
>> >> proximity domain used in NFIT. The remaining hot-pluggable address
>> >> space is covered by one or multiple SRAT memory affinity structures
>> >> with the proximity domain of the last node as before.
>> >>
>> >> Signed-off-by: Haozhong Zhang 
>> > If we consider hotpluggable system, correctly implemented OS should
>> > be able pull proximity from Device::_PXM and override any value from SRAT.
>> > Do we really have a problem here (anything that breaks if we would use 
>> > _PXM)?
>> > Maybe we should add _PXM object to nvdimm device nodes instead of 
>> > massaging SRAT?
>>
>> Unfortunately _PXM is an awkward fit. Currently the proximity domain
>> is attached to the SPA range structure. The SPA range may be
>> associated with multiple DIMM devices and those individual NVDIMMs may
>> have conflicting _PXM properties.
> There shouldn't be any conflict here as  NVDIMM device's _PXM method,
> should override in runtime any proximity specified by parent scope.
> (as parent scope I'd also count boot time NFIT/SRAT tables).
>
> To make it more clear we could clear valid proximity domain flag in SPA
> like this:
>
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 59d6e42..131bca5 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, 
> DeviceState *dev)
>   */
>  nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
> management during hot add/online
> -   operation */ |
> -  2 /* Data in Proximity Domain field is
> -   valid*/);
> +   operation */);
>
>  /* NUMA node. */
>  nfit_spa->proximity_domain = cpu_to_le32(node);
>
>> Even if that was unified across
>> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
>> device's control interface, or the assembled persistent memory SPA
>> range.
> I'm not sure what you mean under 'device's control interface',
> could you clarify where the ambiguity comes from?

There are multiple SPA range types. In addition to the typical
Persistent Memory SPA range there are also Control Region SPA ranges
for MMIO registers on the DIMM for Block Apertures and other purposes.

>
> I read spec as: _PXM applies to address range covered by NVDIMM
> device it belongs to.

No, an NVDIMM may contribute to multiple SPA ranges and those ranges
may span sockets.

>
> As for assembled SPA, I'd assume that it applies to interleaved set
> and all NVDIMMs with it should be on the same node. It's somewhat
> irrelevant question though as QEMU so far implements only
>   1:1:1/SPA:Region Mapping:NVDIMM Device/
> mapping.
>
> My main concern with using static configuration tables for proximity
> mapping, we'd miss on hotplug side of equation. However if we start
> from dynamic side first, we could later complement it with static
> tables if there really were need for it.

Especially when you consider the new HMAT table that wants to have
proximity domains for describing performance characteristics of an
address range relative to an initiator, the _PXM method on an
individual NVDIMM device is a poor fit for describing a wider set.



Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-21 Thread Igor Mammedov
On Tue, 20 Feb 2018 17:17:58 -0800
Dan Williams  wrote:

> On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  wrote:
> > On Sat, 17 Feb 2018 14:31:35 +0800
> > Haozhong Zhang  wrote:
> >  
> >> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> >> domain of a NVDIMM SPA range must match with corresponding entry in
> >> SRAT table.
> >>
> >> The address ranges of vNVDIMM in QEMU are allocated from the
> >> hot-pluggable address space, which is entirely covered by one SRAT
> >> memory affinity structure. However, users can set the vNVDIMM
> >> proximity domain in NFIT SPA range structure by the 'node' property of
> >> '-device nvdimm' to a value different than the one in the above SRAT
> >> memory affinity structure.
> >>
> >> In order to solve such proximity domain mismatch, this patch build one
> >> SRAT memory affinity structure for each NVDIMM device with the
> >> proximity domain used in NFIT. The remaining hot-pluggable address
> >> space is covered by one or multiple SRAT memory affinity structures
> >> with the proximity domain of the last node as before.
> >>
> >> Signed-off-by: Haozhong Zhang   
> > If we consider hotpluggable system, correctly implemented OS should
> > be able pull proximity from Device::_PXM and override any value from SRAT.
> > Do we really have a problem here (anything that breaks if we would use 
> > _PXM)?
> > Maybe we should add _PXM object to nvdimm device nodes instead of massaging 
> > SRAT?  
> 
> Unfortunately _PXM is an awkward fit. Currently the proximity domain
> is attached to the SPA range structure. The SPA range may be
> associated with multiple DIMM devices and those individual NVDIMMs may
> have conflicting _PXM properties.
There shouldn't be any conflict here, as the NVDIMM device's _PXM method
should override at runtime any proximity specified by a parent scope
(as parent scope I'd also count the boot-time NFIT/SRAT tables).

To make it clearer, we could clear the 'proximity domain valid' flag in
the SPA like this:

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 59d6e42..131bca5 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -260,9 +260,7 @@ nvdimm_build_structure_spa(GArray *structures, DeviceState *dev)
      */
     nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
                                        management during hot add/online
-                                       operation */ |
-                                  2 /* Data in Proximity Domain field is
-                                       valid*/);
+                                       operation */);
 
     /* NUMA node. */
     nfit_spa->proximity_domain = cpu_to_le32(node);

> Even if that was unified across
> DIMMs it is ambiguous whether a DIMM-device _PXM would relate to the
> device's control interface, or the assembled persistent memory SPA
> range.
I'm not sure what you mean by 'device's control interface';
could you clarify where the ambiguity comes from?

I read the spec as: _PXM applies to the address range covered by the
NVDIMM device it belongs to.

As for an assembled SPA, I'd assume that it applies to the interleaved
set, and all NVDIMMs within it should be on the same node. It's a
somewhat irrelevant question, though, as QEMU so far implements only a
  1:1:1/SPA:Region Mapping:NVDIMM Device/
mapping.

My main concern with using static configuration tables for proximity
mapping is that we'd miss the hotplug side of the equation. However, if
we start from the dynamic side first, we could later complement it with
static tables if there is really a need for it.
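For illustration, a minimal sketch of what such a per-device _PXM object
could look like with QEMU's AML build helpers; the nvdimm_add_pxm()
helper and its call site inside the NVDIMM SSDT build code are
assumptions made for this sketch, not part of the posted patch:

/* sketch only -- not part of the posted patch */
#include "qemu/osdep.h"
#include "hw/acpi/aml-build.h"
#include "hw/mem/nvdimm.h"

/*
 * Attach Name(_PXM, <node>) to an NVDIMM device object so that OSPM can
 * override, at runtime, whatever proximity domain the static NFIT/SRAT
 * tables advertise for that device.
 */
static void nvdimm_add_pxm(Aml *nvdimm_dev, NVDIMMDevice *nvdimm)
{
    uint64_t node = object_property_get_uint(OBJECT(nvdimm),
                                             PC_DIMM_NODE_PROP, NULL);

    aml_append(nvdimm_dev, aml_name_decl("_PXM", aml_int(node)));
}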



Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-20 Thread Dan Williams
On Tue, Feb 20, 2018 at 6:10 AM, Igor Mammedov  wrote:
> On Sat, 17 Feb 2018 14:31:35 +0800
> Haozhong Zhang  wrote:
>
>> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
>> domain of a NVDIMM SPA range must match with corresponding entry in
>> SRAT table.
>>
>> The address ranges of vNVDIMM in QEMU are allocated from the
>> hot-pluggable address space, which is entirely covered by one SRAT
>> memory affinity structure. However, users can set the vNVDIMM
>> proximity domain in NFIT SPA range structure by the 'node' property of
>> '-device nvdimm' to a value different than the one in the above SRAT
>> memory affinity structure.
>>
>> In order to solve such proximity domain mismatch, this patch build one
>> SRAT memory affinity structure for each NVDIMM device with the
>> proximity domain used in NFIT. The remaining hot-pluggable address
>> space is covered by one or multiple SRAT memory affinity structures
>> with the proximity domain of the last node as before.
>>
>> Signed-off-by: Haozhong Zhang 
> If we consider hotpluggable system, correctly implemented OS should
> be able pull proximity from Device::_PXM and override any value from SRAT.
> Do we really have a problem here (anything that breaks if we would use _PXM)?
> Maybe we should add _PXM object to nvdimm device nodes instead of massaging 
> SRAT?

Unfortunately, _PXM is an awkward fit. Currently the proximity domain
is attached to the SPA range structure. The SPA range may be
associated with multiple DIMM devices, and those individual NVDIMMs may
have conflicting _PXM properties. Even if that were unified across
DIMMs, it would be ambiguous whether a DIMM-device _PXM relates to the
device's control interface or to the assembled persistent memory SPA
range.
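For reference, the SPA Range Structure that carries this proximity
domain, laid out per ACPI 6.2A Table 5-129 (the field names below are
descriptive; QEMU builds an equivalent packed structure in
hw/acpi/nvdimm.c):

#include <stdint.h>

/* NFIT System Physical Address (SPA) Range Structure, ACPI 6.2A Table 5-129 */
struct nfit_spa_range {
    uint16_t type;              /* 0 = SPA Range Structure */
    uint16_t length;
    uint16_t spa_index;         /* SPA Range Structure Index */
    uint16_t flags;             /* bit 0: control region, used only for
                                 *        management during hot add/online;
                                 * bit 1: Data in Proximity Domain field is valid */
    uint32_t reserved;
    uint32_t proximity_domain;  /* expected to match a covering SRAT entry */
    uint8_t  type_guid[16];     /* e.g. Persistent Memory vs. Control Region */
    uint64_t spa_base;          /* System Physical Address Range Base */
    uint64_t spa_length;        /* System Physical Address Range Length */
    uint64_t mem_attr;          /* memory mapping attributes (EFI) */
} __attribute__((packed));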



Re: [Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-20 Thread Igor Mammedov
On Sat, 17 Feb 2018 14:31:35 +0800
Haozhong Zhang  wrote:

> ACPI 6.2A Table 5-129 "SPA Range Structure" requires the proximity
> domain of a NVDIMM SPA range must match with corresponding entry in
> SRAT table.
> 
> The address ranges of vNVDIMM in QEMU are allocated from the
> hot-pluggable address space, which is entirely covered by one SRAT
> memory affinity structure. However, users can set the vNVDIMM
> proximity domain in NFIT SPA range structure by the 'node' property of
> '-device nvdimm' to a value different than the one in the above SRAT
> memory affinity structure.
> 
> In order to solve such proximity domain mismatch, this patch build one
> SRAT memory affinity structure for each NVDIMM device with the
> proximity domain used in NFIT. The remaining hot-pluggable address
> space is covered by one or multiple SRAT memory affinity structures
> with the proximity domain of the last node as before.
> 
> Signed-off-by: Haozhong Zhang 
If we consider a hotpluggable system, a correctly implemented OS should
be able to pull the proximity from Device::_PXM and override any value
from SRAT. Do we really have a problem here (does anything break if we
use _PXM)? Maybe we should add a _PXM object to the nvdimm device nodes
instead of massaging SRAT?

Besides the above policy decision, the patch looks good.
If we decide that this is the way to go (it shouldn't hurt;
the patch just adds more code to maintain), I'd like to see
tests added to tests/numa-test.c along with it to ensure
that it works as expected.
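As a rough illustration of such a test (assuming libqtest's
qtest_start()/qtest_end() helpers; the test name and command line below
are only an example, and real verification of the generated SRAT would
need the table-comparison machinery of tests/bios-tables-test.c):

#include "qemu/osdep.h"
#include "libqtest.h"

/* Start a guest with two NUMA nodes and one NVDIMM pinned to node 1,
 * which exercises the per-NVDIMM SRAT memory affinity path. */
static void test_nvdimm_srat_node(void)
{
    qtest_start("-machine pc,nvdimm=on -m 128M,slots=2,maxmem=1G "
                "-smp 2 -numa node,nodeid=0 -numa node,nodeid=1 "
                "-object memory-backend-ram,id=mem0,size=128M "
                "-device nvdimm,id=nv0,memdev=mem0,node=1");
    /* The sketch only checks that the machine comes up; comparing the
     * resulting SRAT against an expected table is left out here. */
    qtest_end();
}

int main(int argc, char **argv)
{
    g_test_init(&argc, &argv, NULL);
    qtest_add_func("/numa/nvdimm/srat-node", test_nvdimm_srat_node);
    return g_test_run();
}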

> ---
>  hw/acpi/nvdimm.c| 15 +--
>  hw/i386/acpi-build.c| 47 +++
>  include/hw/mem/nvdimm.h | 11 +++
>  3 files changed, 67 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index 59d6e4254c..dff0818e77 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -33,12 +33,23 @@
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/mem/nvdimm.h"
>  
> +static gint nvdimm_addr_sort(gconstpointer a, gconstpointer b)
> +{
> +uint64_t addr0 = object_property_get_uint(OBJECT(NVDIMM(a)),
> +  PC_DIMM_ADDR_PROP, NULL);
> +uint64_t addr1 = object_property_get_uint(OBJECT(NVDIMM(b)),
> +  PC_DIMM_ADDR_PROP, NULL);
> +
> +return addr0 < addr1 ? -1 :
> +   addr0 > addr1 ?  1 : 0;
> +}
> +
>  static int nvdimm_device_list(Object *obj, void *opaque)
>  {
>  GSList **list = opaque;
>  
>  if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
> -*list = g_slist_append(*list, DEVICE(obj));
> +*list = g_slist_insert_sorted(*list, DEVICE(obj), nvdimm_addr_sort);
>  }
>  
>  object_child_foreach(obj, nvdimm_device_list, opaque);
> @@ -52,7 +63,7 @@ static int nvdimm_device_list(Object *obj, void *opaque)
>   * Note: it is the caller's responsibility to free the list to avoid
>   * memory leak.
>   */
> -static GSList *nvdimm_get_device_list(void)
> +GSList *nvdimm_get_device_list(void)
>  {
>  GSList *list = NULL;
>  
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index deb440f286..637ac3a8f0 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2323,6 +2323,46 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, 
> GArray *tcpalog)
>  #define HOLE_640K_START  (640 * 1024)
>  #define HOLE_640K_END   (1024 * 1024)
>  
> +static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
> +   uint64_t len, int default_node)
> +{
> +GSList *nvdimms = nvdimm_get_device_list();
> +GSList *ent = nvdimms;
> +NVDIMMDevice *dev;
> +uint64_t end = base + len, addr, size;
> +int node;
> +AcpiSratMemoryAffinity *numamem;
> +
> +while (base < end) {
> +numamem = acpi_data_push(table_data, sizeof *numamem);
> +
> +if (!ent) {
> +build_srat_memory(numamem, base, end - base, default_node,
> +  MEM_AFFINITY_HOTPLUGGABLE | 
> MEM_AFFINITY_ENABLED);
> +break;
> +}
> +
> +dev = NVDIMM(ent->data);
> +addr = object_property_get_uint(OBJECT(dev), PC_DIMM_ADDR_PROP, 
> NULL);
> +size = object_property_get_uint(OBJECT(dev), PC_DIMM_SIZE_PROP, 
> NULL);
> +node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, 
> NULL);
> +
> +if (base < addr) {
> +build_srat_memory(numamem, base, addr - base, default_node,
> +  MEM_AFFINITY_HOTPLUGGABLE | 
> MEM_AFFINITY_ENABLED);
> +numamem = acpi_data_push(table_data, sizeof *numamem);
> +}
> +build_srat_memory(numamem, addr, size, node,
> +  MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED |
> +  MEM_AFFINITY_NON_VOLATILE);
> +
> +base = addr + size;
> +ent = ent->next;
> +}
> 

[Qemu-devel] [PATCH] hw/acpi-build: build SRAT memory affinity structures for NVDIMM

2018-02-16 Thread Haozhong Zhang
ACPI 6.2A Table 5-129 "SPA Range Structure" requires that the proximity
domain of an NVDIMM SPA range match the corresponding entry in the
SRAT table.

The address ranges of vNVDIMMs in QEMU are allocated from the
hot-pluggable address space, which is entirely covered by one SRAT
memory affinity structure. However, users can set the vNVDIMM proximity
domain in the NFIT SPA range structure, via the 'node' property of
'-device nvdimm', to a value different from the one in that SRAT
memory affinity structure.

In order to resolve such a proximity domain mismatch, this patch builds
one SRAT memory affinity structure for each NVDIMM device, using the
proximity domain from its NFIT entry. The remaining hot-pluggable
address space is covered by one or more SRAT memory affinity structures
with the proximity domain of the last node, as before.

Signed-off-by: Haozhong Zhang 
---
 hw/acpi/nvdimm.c| 15 +--
 hw/i386/acpi-build.c| 47 +++
 include/hw/mem/nvdimm.h | 11 +++
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 59d6e4254c..dff0818e77 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -33,12 +33,23 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 
+static gint nvdimm_addr_sort(gconstpointer a, gconstpointer b)
+{
+uint64_t addr0 = object_property_get_uint(OBJECT(NVDIMM(a)),
+  PC_DIMM_ADDR_PROP, NULL);
+uint64_t addr1 = object_property_get_uint(OBJECT(NVDIMM(b)),
+  PC_DIMM_ADDR_PROP, NULL);
+
+return addr0 < addr1 ? -1 :
+   addr0 > addr1 ?  1 : 0;
+}
+
 static int nvdimm_device_list(Object *obj, void *opaque)
 {
 GSList **list = opaque;
 
 if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
-*list = g_slist_append(*list, DEVICE(obj));
+*list = g_slist_insert_sorted(*list, DEVICE(obj), nvdimm_addr_sort);
 }
 
 object_child_foreach(obj, nvdimm_device_list, opaque);
@@ -52,7 +63,7 @@ static int nvdimm_device_list(Object *obj, void *opaque)
  * Note: it is the caller's responsibility to free the list to avoid
  * memory leak.
  */
-static GSList *nvdimm_get_device_list(void)
+GSList *nvdimm_get_device_list(void)
 {
 GSList *list = NULL;
 
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index deb440f286..637ac3a8f0 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2323,6 +2323,46 @@ build_tpm2(GArray *table_data, BIOSLinker *linker, GArray *tcpalog)
 #define HOLE_640K_START  (640 * 1024)
 #define HOLE_640K_END   (1024 * 1024)
 
+static void build_srat_hotpluggable_memory(GArray *table_data, uint64_t base,
+                                           uint64_t len, int default_node)
+{
+    GSList *nvdimms = nvdimm_get_device_list();
+    GSList *ent = nvdimms;
+    NVDIMMDevice *dev;
+    uint64_t end = base + len, addr, size;
+    int node;
+    AcpiSratMemoryAffinity *numamem;
+
+    while (base < end) {
+        numamem = acpi_data_push(table_data, sizeof *numamem);
+
+        if (!ent) {
+            build_srat_memory(numamem, base, end - base, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            break;
+        }
+
+        dev = NVDIMM(ent->data);
+        addr = object_property_get_uint(OBJECT(dev), PC_DIMM_ADDR_PROP, NULL);
+        size = object_property_get_uint(OBJECT(dev), PC_DIMM_SIZE_PROP, NULL);
+        node = object_property_get_uint(OBJECT(dev), PC_DIMM_NODE_PROP, NULL);
+
+        if (base < addr) {
+            build_srat_memory(numamem, base, addr - base, default_node,
+                              MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+            numamem = acpi_data_push(table_data, sizeof *numamem);
+        }
+        build_srat_memory(numamem, addr, size, node,
+                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED |
+                          MEM_AFFINITY_NON_VOLATILE);
+
+        base = addr + size;
+        ent = ent->next;
+    }
+
+    g_slist_free(nvdimms);
+}
+
 static void
 build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
 {
@@ -2434,10 +2474,9 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)
      * providing _PXM method if necessary.
      */
     if (hotplugabble_address_space_size) {
-        numamem = acpi_data_push(table_data, sizeof *numamem);
-        build_srat_memory(numamem, pcms->hotplug_memory.base,
-                          hotplugabble_address_space_size, pcms->numa_nodes - 1,
-                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+        build_srat_hotpluggable_memory(table_data, pcms->hotplug_memory.base,
+                                       hotplugabble_address_space_size,
+                                       pcms->numa_nodes - 1);
     }
 
 build_header(linker,