Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-03-01 Thread Morten Rasmussen
On Sat, Feb 24, 2018 at 11:05:53AM +0800, Xiongfeng Wang wrote:
> 
> Hi,
> On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> > On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >> Hi,
> >>
> >> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>> Hi Jeremy,
> >>>
> >>> I have tested the patch with the newest UEFI. It prints the below error:
> >>>
> >>> [4.017371] BUG: arch topology borken
> >>> [4.021069] BUG: arch topology borken
> >>> [4.024764] BUG: arch topology borken
> >>> [4.028460] BUG: arch topology borken
> >>> [4.032153] BUG: arch topology borken
> >>> [4.035849] BUG: arch topology borken
> >>> [4.039543] BUG: arch topology borken
> >>> [4.043239] BUG: arch topology borken
> >>> [4.046932] BUG: arch topology borken
> >>> [4.050629] BUG: arch topology borken
> >>> [4.054322] BUG: arch topology borken
> >>>
> >>> I checked the code and found that the newest UEFI set PPTT 
> >>> physical_package_flag on a physical package node and
> >>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The 
> >>> topology of our board is core->cluster->die->package).
> >>
> >> I commented about that on the EDK2 mailing list. While the current spec
> >> doesn't explicitly ban having the flag set multiple times between the leaf
> >> and the root I consider it a "bug" and there is an effort to clarify the
> >> spec and the use of that flag.
> >>>
> >>> When the kernel starts to build sched_domain, the multi-core sched_domain 
> >>> contains all the cores within a package,
> >>> and the lowest NUMA sched_domain contains all the cores within a die. But 
> >>> the kernel requires that the multi-core
> >>> sched_domain should be a subset of the lowest NUMA sched_domain, so the 
> >>> BUG info is printed.
> >>
> >> Right. I've mentioned this problem a couple of times.
> >>
> >> At the moment, the spec isn't clear about how the proximity domain is
> >> detected/located within the PPTT topology (a node with a 1:1 correspondence
> >> isn't even required). As you can see from this patch set, we are making the
> >> general assumption that the proximity domains are at the same level as the
> >> physical socket. This isn't ideal for NUMA topologies, like the D05, that
> >> don't align with the physical socket.
> >>
> >> There are efforts underway to clarify and expand upon the specification to
> >> deal with this general problem. The simple solution is another flag (say
> >> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used 
> >> to
> >> find nodes with 1:1 correspondence. At that point we could add a fairly
> >> trivial patch to correct just the scheduler topology without affecting the
> >> rest of the system topology code.
> > 
> > I think Morten asked already but isn't this the same end result we end
> > up having if we remove the DIE level if NUMA-within-package is detected
> > (instead of using the default_topology[]) and we create our own ARM64
> > domain hierarchy (with DIE level removed) through set_sched_topology()
> > accordingly ?
> > 
> > Put it differently: do we really need to rely on another PPTT flag to
> > collect this information ?
> > 
> > I can't merge code that breaks a platform with legitimate firmware
> > bindings.
> 
> I think we really need another PPTT flag, from which we can get information
> about how to build a multi-core sched_domain. I think cache-sharing
> information alone is not enough to decide how to build a multi-core
> sched_domain.
> 
> How about this? We assume the upper layer of the lowest layer to be 
> multi-core layer.
> After that flag is added into ACPI specs, we add another patch to adapt to 
> the change.

I'm not sure what you mean by upper layers of the lowest layer.

As I see it, for non-numa-in-package systems the PPTT physical package
flag should define the MC domains; any levels above should be
represented at the DIE level, and any levels below should be ignored,
except the lowest level if we have SMT. If we have SMT, the lowest level
in PPTT should define the SMT domains.

For numa-in-package, the MC domains should be shrunk to match the NUMA
nodes and DIE is ignored.



Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-03-01 Thread Morten Rasmussen
On Fri, Feb 23, 2018 at 10:37:33PM -0600, Jeremy Linton wrote:
> On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:
> >On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >>Hi,
> >>
> >>On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>>Hi Jeremy,
> >>>
> >>>I have tested the patch with the newest UEFI. It prints the below error:
> >>>
> >>>[4.017371] BUG: arch topology borken
> >>>[4.021069] BUG: arch topology borken
> >>>[4.024764] BUG: arch topology borken
> >>>[4.028460] BUG: arch topology borken
> >>>[4.032153] BUG: arch topology borken
> >>>[4.035849] BUG: arch topology borken
> >>>[4.039543] BUG: arch topology borken
> >>>[4.043239] BUG: arch topology borken
> >>>[4.046932] BUG: arch topology borken
> >>>[4.050629] BUG: arch topology borken
> >>>[4.054322] BUG: arch topology borken
> >>>
> >>>I checked the code and found that the newest UEFI set PPTT 
> >>>physical_package_flag on a physical package node and
> >>>the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology 
> >>>of our board is core->cluster->die->package).
> >>
> >>I commented about that on the EDK2 mailing list. While the current spec
> >>doesn't explicitly ban having the flag set multiple times between the leaf
> >>and the root I consider it a "bug" and there is an effort to clarify the
> >>spec and the use of that flag.
> >>>
> >>>When the kernel starts to build sched_domain, the multi-core sched_domain 
> >>>contains all the cores within a package,
> >>>and the lowest NUMA sched_domain contains all the cores within a die. But 
> >>>the kernel requires that the multi-core
> >>>sched_domain should be a subset of the lowest NUMA sched_domain, so the 
> >>>BUG info is printed.
> >>
> >>Right. I've mentioned this problem a couple of times.
> >>
> >>At the moment, the spec isn't clear about how the proximity domain is
> >>detected/located within the PPTT topology (a node with a 1:1 correspondence
> >>isn't even required). As you can see from this patch set, we are making the
> >>general assumption that the proximity domains are at the same level as the
> >>physical socket. This isn't ideal for NUMA topologies, like the D05, that
> >>don't align with the physical socket.
> >>
> >>There are efforts underway to clarify and expand upon the specification to
> >>deal with this general problem. The simple solution is another flag (say
> >>PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
> >>find nodes with 1:1 correspondence. At that point we could add a fairly
> >>trivial patch to correct just the scheduler topology without affecting the
> >>rest of the system topology code.
> >
> >I think Morten asked already but isn't this the same end result we end
> >up having if we remove the DIE level if NUMA-within-package is detected
> >(instead of using the default_topology[]) and we create our own ARM64
> >domain hierarchy (with DIE level removed) through set_sched_topology()
> >accordingly ?
> 
> I'm not sure what removing the die level does for you, but it's not really
> the problem AFAIK; the problem is that the MC layer is larger than the NUMA
> domains.

Do you mean MC domains are larger than NUMA domains because that
reflects the hardware topology, i.e. you have caches shared across NUMA
nodes, or do you mean the problem is that the current code generates too
large MC domains?

If it is the first, then you have to make a choice whether you want
multi-core scheduling or NUMA-scheduling at that level in the topology.
You can't have both. If you don't want NUMA scheduling at that level you
should define your NUMA nodes to be larger than (or equal to?) the MC
domains, or not define NUMA nodes at all. If you do want NUMA
scheduling at that level, we have to ignore any cache sharing between
NUMA nodes and reduce the size of the MC domains accordingly.

We should be able to reduce the size of the MC domains based on the
information already in the ACPI spec. SRAT defines the NUMA domains; if
the PPTT package level is larger than the NUMA nodes we should claim it
is NUMA-in-package, drop the DIE level, and reduce the size of the MC
domain to equal the NUMA node size, ignoring any PPTT topology
information above the NUMA node level.

AFAICT, x86 doesn't have this problem as they don't use PPTT, and the
last-level cache is always inside the NUMA node, even for
numa-in-package. For numa-in-package they seem to let SRAT define the
NUMA nodes, have a special topology table for the non-NUMA levels only
containing SMT and MC, and guarantee the MC isn't larger than the NUMA
node.

Can't we just follow the same approach with the addition that we have to
resize the MC domains if necessary?

> >Put it differently: do we really need to rely on another PPTT flag to
> >collect this information ?
> 
> Strictly no, and I have a partial patch around here I've been meaning to
> flesh out which uses the early node information to detect if there are nodes
> smaller 


RE: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-24 Thread vkilari
Hi,

> -Original Message-
> From: linux-arm-kernel
[mailto:linux-arm-kernel-boun...@lists.infradead.org]
> On Behalf Of Xiongfeng Wang
> Sent: Saturday, February 24, 2018 8:36 AM
> To: Lorenzo Pieralisi <lorenzo.pieral...@arm.com>; Jeremy Linton
> <jeremy.lin...@arm.com>
> Cc: mark.rutl...@arm.com; jonathan.zh...@cavium.com;
> jayachandran.n...@cavium.com; catalin.mari...@arm.com; Juri Lelli
> <juri.le...@arm.com>; gre...@linuxfoundation.org; jh...@codeaurora.org;
> r...@rjwysocki.net; linux...@vger.kernel.org; will.dea...@arm.com; linux-
> ker...@vger.kernel.org; morten.rasmus...@arm.com; linux-
> a...@vger.kernel.org; viresh.ku...@linaro.org; hanjun@linaro.org;
> sudeep.ho...@arm.com; austi...@codeaurora.org; vkil...@codeaurora.org;
> a...@redhat.com; linux-arm-ker...@lists.infradead.org; l...@kernel.org
> Subject: Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU
> topology
> 
> 
> Hi,
> On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> > On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> >> Hi,
> >>
> >> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >>> Hi Jeremy,
> >>>
> >>> I have tested the patch with the newest UEFI. It prints the below
error:
> >>>
> >>> [4.017371] BUG: arch topology borken
> >>> [4.021069] BUG: arch topology borken
> >>> [4.024764] BUG: arch topology borken
> >>> [4.028460] BUG: arch topology borken
> >>> [4.032153] BUG: arch topology borken
> >>> [4.035849] BUG: arch topology borken
> >>> [4.039543] BUG: arch topology borken
> >>> [4.043239] BUG: arch topology borken
> >>> [4.046932] BUG: arch topology borken
> >>> [4.050629] BUG: arch topology borken
> >>> [4.054322] BUG: arch topology borken
> >>>
> >>> I checked the code and found that the newest UEFI set PPTT
> >>> physical_package_flag on a physical package node and the NUMA domain
> (SRAT domains) starts from the layer of DIE. (The topology of our board is
core-
> >cluster->die->package).
> >>
> >> I commented about that on the EDK2 mailing list. While the current
> >> spec doesn't explicitly ban having the flag set multiple times
> >> between the leaf and the root I consider it a "bug" and there is an
> >> effort to clarify the spec and the use of that flag.
> >>>
> >>> When the kernel starts to build sched_domain, the multi-core
> >>> sched_domain contains all the cores within a package, and the lowest
> >>> NUMA sched_domain contains all the cores within a die. But the kernel
> requires that the multi-core sched_domain should be a subset of the lowest
> NUMA sched_domain, so the BUG info is printed.
> >>
> >> Right. I've mentioned this problem a couple of times.
> >>
> >> At the moment, the spec isn't clear about how the proximity domain
> >> is detected/located within the PPTT topology (a node with a 1:1
> >> correspondence isn't even required). As you can see from this patch
> >> set, we are making the general assumption that the proximity domains
> >> are at the same level as the physical socket. This isn't ideal for
> >> NUMA topologies, like the D05, that don't align with the physical
socket.
> >>
> >> There are efforts underway to clarify and expand upon the
> >> specification to deal with this general problem. The simple solution
> >> is another flag (say PPTT_PROXIMITY_DOMAIN which would map to the D05
> >> die) which could be used to find nodes with 1:1 correspondence. At
> >> that point we could add a fairly trivial patch to correct just the
> >> scheduler topology without affecting the rest of the system topology
code.
> >
> > I think Morten asked already but isn't this the same end result we end
> > up having if we remove the DIE level if NUMA-within-package is
> > detected (instead of using the default_topology[]) and we create our
> > own ARM64 domain hierarchy (with DIE level removed) through
> > set_sched_topology() accordingly ?

To overcome this, the DIE level is removed on x86 as well when
NUMA-within-package is detected, by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ar
ch/x86/kernel/smpboot.c?h=v4.16-rc2=8f37961cf22304fb286c7604d3a7f6104dcc1
283

Solving this with PPTT would be a cleaner approach than overriding
default_topology[].

> >
> > Put it differently: do we really need to rely on another PPTT flag to
> > collect this i


Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-23 Thread Jeremy Linton

On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:

On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:

Hi,

On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:

Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[4.017371] BUG: arch topology borken
[4.021069] BUG: arch topology borken
[4.024764] BUG: arch topology borken
[4.028460] BUG: arch topology borken
[4.032153] BUG: arch topology borken
[4.035849] BUG: arch topology borken
[4.039543] BUG: arch topology borken
[4.043239] BUG: arch topology borken
[4.046932] BUG: arch topology borken
[4.050629] BUG: arch topology borken
[4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI set PPTT 
physical_package_flag on a physical package node and
the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board 
is core->cluster->die->package).


I commented about that on the EDK2 mailing list. While the current spec
doesn't explicitly ban having the flag set multiple times between the leaf
and the root I consider it a "bug" and there is an effort to clarify the
spec and the use of that flag.


When the kernel starts to build sched_domain, the multi-core sched_domain 
contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the 
kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
info is printed.


Right. I've mentioned this problem a couple of times.

At the moment, the spec isn't clear about how the proximity domain is
detected/located within the PPTT topology (a node with a 1:1 correspondence
isn't even required). As you can see from this patch set, we are making the
general assumption that the proximity domains are at the same level as the
physical socket. This isn't ideal for NUMA topologies, like the D05, that
don't align with the physical socket.

There are efforts underway to clarify and expand upon the specification to
deal with this general problem. The simple solution is another flag (say
PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
find nodes with 1:1 correspondence. At that point we could add a fairly
trivial patch to correct just the scheduler topology without affecting the
rest of the system topology code.


I think Morten asked already but isn't this the same end result we end
up having if we remove the DIE level if NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with DIE level removed) through set_sched_topology()
accordingly ?


I'm not sure what removing the die level does for you, but it's not 
really the problem AFAIK; the problem is that the MC layer is larger than 
the NUMA domains.




Put it differently: do we really need to rely on another PPTT flag to
collect this information ?


Strictly no, and I have a partial patch around here I've been meaning to 
flesh out which uses the early node information to detect if there are 
nodes smaller than the package. Initially I claimed I was going to stay 
away from making scheduler topology changes in this patch set, but it 
seems that at least providing a patch which does the minimal bits is in 
the cards. The PXN flag is more of a shortcut to finding the cache 
levels at or below the NUMA domains, rather than any hard requirement. 
Similarly for the request someone else made for a leaf node flag (or 
node ordering) to avoid multiple passes over the table; that request 
would simplify the posted code a bit, but it works without it.




I can't merge code that breaks a platform with legitimate firmware
bindings.


"Breaks" in this case means a BUG warning that shows up right before it 
"corrects" a scheduler domain.


Basically, as I've mentioned a few times, this patch set corrects the 
existing topology problems; in doing so it uncovers issues with the way 
we are mapping that topology for the scheduler. That is actually not a 
difficult thing to fix. My assumption originally was that we would 
already be at the point of discussing the finer points of the scheduler 
changes, but we are still here.


Anyway, I was planning on posting a v7 this week, but time flies... I 
will include a further scheduler tweak to work around the inverted NUMA 
domain problem in that set early next week.


Thanks,



Thanks,
Lorenzo





If we modify the UEFI to make the NUMA sched_domain start from the package 
layer, then all the topology information within the package will be 
discarded. I think we need to build the multi-core sched_domain using the 
cores within the cluster instead of the cores within the package; I think 
that's what 'multi-core' means: multiple cores form a cluster, I guess.
If we build the multi-core sched_domain using the cores within a cluster, I 
think we need to add fields in struct cpu_topology
to record which cores are in each 

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-23 Thread Jeremy Linton

On 02/23/2018 05:02 AM, Lorenzo Pieralisi wrote:

On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:

Hi,

On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:

Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[4.017371] BUG: arch topology borken
[4.021069] BUG: arch topology borken
[4.024764] BUG: arch topology borken
[4.028460] BUG: arch topology borken
[4.032153] BUG: arch topology borken
[4.035849] BUG: arch topology borken
[4.039543] BUG: arch topology borken
[4.043239] BUG: arch topology borken
[4.046932] BUG: arch topology borken
[4.050629] BUG: arch topology borken
[4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI set PPTT 
physical_package_flag on a physical package node and
the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board 
is core->cluster->die->package).


I commented about that on the EDK2 mailing list. While the current spec
doesn't explicitly ban having the flag set multiple times between the leaf
and the root I consider it a "bug" and there is an effort to clarify the
spec and the use of that flag.


When the kernel starts to build sched_domain, the multi-core sched_domain 
contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the 
kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
info is printed.


Right. I've mentioned this problem a couple of times.

At the moment, the spec isn't clear about how the proximity domain is
detected/located within the PPTT topology (a node with a 1:1 correspondence
isn't even required). As you can see from this patch set, we are making the
general assumption that the proximity domains are at the same level as the
physical socket. This isn't ideal for NUMA topologies, like the D05, that
don't align with the physical socket.

There are efforts underway to clarify and expand upon the specification to
deal with this general problem. The simple solution is another flag (say
PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
find nodes with 1:1 correspondence. At that point we could add a fairly
trivial patch to correct just the scheduler topology without affecting the
rest of the system topology code.


I think Morten asked already but isn't this the same end result we end
up having if we remove the DIE level if NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with DIE level removed) through set_sched_topology()
accordingly ?


I'm not sure what removing the die level does for you, but it's not 
really the problem AFAIK; the problem is that the MC layer is larger 
than the NUMA domains.




Put it differently: do we really need to rely on another PPTT flag to
collect this information ?


Strictly no, and I have a partial patch around here I've been meaning to 
flesh out which uses the early node information to detect whether there 
are nodes smaller than the package. Initially I claimed I was going to 
stay away from making scheduler topology changes in this patch set, but 
it seems that at least providing a patch which does the minimal bits is 
in the cards. The PXN flag is more of a shortcut to finding the cache 
levels at or below the NUMA domains rather than a hard requirement. 
Similarly for the request someone else made for a leaf-node flag (or 
node ordering) to avoid multiple passes over the table; that would 
simplify the posted code a bit, but it works without it.




I can't merge code that breaks a platform with legitimate firmware
bindings.


"Breaks" in this case is a BUG warning that shows up right before it 
"corrects" a scheduler domain.


Basically, as I've mentioned a few times, this patch set corrects the 
existing topology problems; in doing so it uncovers issues with the way 
we are mapping that topology for the scheduler. That is actually not a 
difficult thing to fix; my original assumption was that we would 
already be at the point of discussing the finer points of the scheduler 
changes, but we are still here.


Anyway, I was planning on posting a v7 this week, but time flies... I 
will include a further scheduler tweak to work around the inverted NUMA 
domain problem in that set early next week.


Thanks,



Thanks,
Lorenzo





If we modify the UEFI to make the NUMA sched_domain start from the package 
layer, then all the topology information within the package will be 
discarded. I think we need to build the multi-core sched_domain using the 
cores within the cluster instead of the cores within the package; I think 
that's what 'multi-core' means: multiple cores form a cluster, I guess.
If we build the multi-core sched_domain using the cores within a cluster, I 
think we need to add fields in struct cpu_topology
to record which cores are in each 

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-23 Thread Xiongfeng Wang

Hi,
On 2018/2/23 19:02, Lorenzo Pieralisi wrote:
> On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
>> Hi,
>>
>> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>>> Hi Jeremy,
>>>
>>> I have tested the patch with the newest UEFI. It prints the below error:
>>>
>>> [4.017371] BUG: arch topology borken
>>> [4.021069] BUG: arch topology borken
>>> [4.024764] BUG: arch topology borken
>>> [4.028460] BUG: arch topology borken
>>> [4.032153] BUG: arch topology borken
>>> [4.035849] BUG: arch topology borken
>>> [4.039543] BUG: arch topology borken
>>> [4.043239] BUG: arch topology borken
>>> [4.046932] BUG: arch topology borken
>>> [4.050629] BUG: arch topology borken
>>> [4.054322] BUG: arch topology borken
>>>
>>> I checked the code and found that the newest UEFI set PPTT 
>>> physical_package_flag on a physical package node and
>>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology 
>>> of our board is core->cluster->die->package).
>>
>> I commented about that on the EDK2 mailing list. While the current spec
>> doesn't explicitly ban having the flag set multiple times between the leaf
>> and the root I consider it a "bug" and there is an effort to clarify the
>> spec and the use of that flag.
>>>
>>> When the kernel starts to build sched_domain, the multi-core sched_domain 
>>> contains all the cores within a package,
>>> and the lowest NUMA sched_domain contains all the cores within a die. But 
>>> the kernel requires that the multi-core
>>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
>>> info is printed.
>>
>> Right. I've mentioned this problem a couple of times.
>>
>> At the moment, the spec isn't clear about how the proximity domain is
>> detected/located within the PPTT topology (a node with a 1:1 correspondence
>> isn't even required). As you can see from this patch set, we are making the
>> general assumption that the proximity domains are at the same level as the
>> physical socket. This isn't ideal for NUMA topologies, like the D05, that
>> don't align with the physical socket.
>>
>> There are efforts underway to clarify and expand upon the specification to
>> deal with this general problem. The simple solution is another flag (say
>> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
>> find nodes with 1:1 correspondence. At that point we could add a fairly
>> trivial patch to correct just the scheduler topology without affecting the
>> rest of the system topology code.
> 
> I think Morten asked already but isn't this the same end result we end
> up having if we remove the DIE level if NUMA-within-package is detected
> (instead of using the default_topology[]) and we create our own ARM64
> domain hierarchy (with DIE level removed) through set_sched_topology()
> accordingly ?
> 
> Put it differently: do we really need to rely on another PPTT flag to
> collect this information ?
> 
> I can't merge code that breaks a platform with legitimate firmware
> bindings.

I think we really need another PPTT flag, from which we can get information
about how to build a multi-core sched_domain. I think cache-sharing 
information alone is not enough to tell us how to build a multi-core 
sched_domain.

How about this: we assume the layer immediately above the lowest layer is 
the multi-core layer. After that flag is added to the ACPI spec, we add 
another patch to adapt to the change.

Thanks,
Xiongfeng

> 
> Thanks,
> Lorenzo
> 
>>
>>>
>>> If we modify the UEFI to make NUMA sched_domain start from the layer of 
>>> package, then all the topology information
>>> within the package will be discarded. I think we need to build the 
>>> multi-core sched_domain using the cores within
>>> the cluster instead of the cores within the package. I think that's what 
>>> 'multi-core' means. Multi cores form a cluster. I guess.
>>> If we build the multi-core sched_domain using the cores within a cluster, I 
>>> think we need to add fields in struct cpu_topology
>>> to record which cores are in each cluster.
>>
>> The problem is that there isn't a generic way to identify which level of
>> cache sharing is the "correct" top layer MC domain. For one system cluster
>> might be appropriate, for another it might be the highest caching level
>> within a socket, for another it might be something in between or a group
>> of clusters or LLCs..
>>
>> Hence the effort to standardize/guarantee a PPTT node that exactly matches a
>> SRAT domain. With that, each SOC/system provider has a clearly defined method
>> for communicating where they want the proximity domain information to begin.
>>
>> Thanks,
>>
>>>
>>>
>>> Thanks,
>>> Xiongfeng
>>>
>>> On 2018/1/13 8:59, Jeremy Linton wrote:
 Propagate the topology information from the PPTT tree to the
 cpu_topology array. We can get the thread id, core_id and
 cluster_id by assuming certain levels of the PPTT tree correspond
 to those 

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-02-23 Thread Lorenzo Pieralisi
On Thu, Jan 25, 2018 at 09:56:30AM -0600, Jeremy Linton wrote:
> Hi,
> 
> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
> >Hi Jeremy,
> >
> >I have tested the patch with the newest UEFI. It prints the below error:
> >
> >[4.017371] BUG: arch topology borken
> >[4.021069] BUG: arch topology borken
> >[4.024764] BUG: arch topology borken
> >[4.028460] BUG: arch topology borken
> >[4.032153] BUG: arch topology borken
> >[4.035849] BUG: arch topology borken
> >[4.039543] BUG: arch topology borken
> >[4.043239] BUG: arch topology borken
> >[4.046932] BUG: arch topology borken
> >[4.050629] BUG: arch topology borken
> >[4.054322] BUG: arch topology borken
> >
> >I checked the code and found that the newest UEFI set PPTT 
> >physical_package_flag on a physical package node and
> >the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology 
> >of our board is core->cluster->die->package).
> 
> I commented about that on the EDK2 mailing list. While the current spec
> doesn't explicitly ban having the flag set multiple times between the leaf
> and the root I consider it a "bug" and there is an effort to clarify the
> spec and the use of that flag.
> >
> >When the kernel starts to build sched_domain, the multi-core sched_domain 
> >contains all the cores within a package,
> >and the lowest NUMA sched_domain contains all the cores within a die. But 
> >the kernel requires that the multi-core
> >sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
> >info is printed.
> 
> Right. I've mentioned this problem a couple of times.
> 
> At the moment, the spec isn't clear about how the proximity domain is
> detected/located within the PPTT topology (a node with a 1:1 correspondence
> isn't even required). As you can see from this patch set, we are making the
> general assumption that the proximity domains are at the same level as the
> physical socket. This isn't ideal for NUMA topologies, like the D05, that
> don't align with the physical socket.
> 
> There are efforts underway to clarify and expand upon the specification to
> deal with this general problem. The simple solution is another flag (say
> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to
> find nodes with 1:1 correspondence. At that point we could add a fairly
> trivial patch to correct just the scheduler topology without affecting the
> rest of the system topology code.

I think Morten asked already but isn't this the same end result we end
up having if we remove the DIE level if NUMA-within-package is detected
(instead of using the default_topology[]) and we create our own ARM64
domain hierarchy (with DIE level removed) through set_sched_topology()
accordingly ?

Put it differently: do we really need to rely on another PPTT flag to
collect this information ?

I can't merge code that breaks a platform with legitimate firmware
bindings.

Thanks,
Lorenzo

> 
> >
> >If we modify the UEFI to make NUMA sched_domain start from the layer of 
> >package, then all the topology information
> >within the package will be discarded. I think we need to build the 
> >multi-core sched_domain using the cores within
> >the cluster instead of the cores within the package. I think that's what 
> >'multi-core' means. Multi cores form a cluster. I guess.
> >If we build the multi-core sched_domain using the cores within a cluster, I 
> >think we need to add fields in struct cpu_topology
> >to record which cores are in each cluster.
> 
> The problem is that there isn't a generic way to identify which level of
> cache sharing is the "correct" top layer MC domain. For one system cluster
> might be appropriate, for another it might be the highest caching level
> within a socket, for another it might be something in between or a group
> of clusters or LLCs..
> 
> Hence the effort to standardize/guarantee a PPTT node that exactly matches a
> SRAT domain. With that, each SOC/system provider has a clearly defined method
> for communicating where they want the proximity domain information to begin.
> 
> Thanks,
> 
> >
> >
> >Thanks,
> >Xiongfeng
> >
> >On 2018/1/13 8:59, Jeremy Linton wrote:
> >>Propagate the topology information from the PPTT tree to the
> >>cpu_topology array. We can get the thread id, core_id and
> >>cluster_id by assuming certain levels of the PPTT tree correspond
> >>to those concepts. The package_id is flagged in the tree and can be
> >>found by calling find_acpi_cpu_topology_package() which terminates
> >>its search when it finds an ACPI node flagged as the physical
> >>package. If the tree doesn't contain enough levels to represent
> >>all of the requested levels then the root node will be returned
> >>for all subsequent levels.
> >>
> >>Cc: Juri Lelli 
> >>Signed-off-by: Jeremy Linton 
> >>---
> >>  arch/arm64/kernel/topology.c | 46 +++-
> >>  1 file changed, 45 insertions(+), 1 deletion(-)

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-01-25 Thread Xiongfeng Wang

Hi Jeremy,
On 2018/1/25 23:56, Jeremy Linton wrote:
> Hi,
> 
> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> I have tested the patch with the newest UEFI. It prints the below error:
>>
>> [4.017371] BUG: arch topology borken
>> [4.021069] BUG: arch topology borken
>> [4.024764] BUG: arch topology borken
>> [4.028460] BUG: arch topology borken
>> [4.032153] BUG: arch topology borken
>> [4.035849] BUG: arch topology borken
>> [4.039543] BUG: arch topology borken
>> [4.043239] BUG: arch topology borken
>> [4.046932] BUG: arch topology borken
>> [4.050629] BUG: arch topology borken
>> [4.054322] BUG: arch topology borken
>>
>> I checked the code and found that the newest UEFI set PPTT 
>> physical_package_flag on a physical package node and
>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology 
>> of our board is core->cluster->die->package).
> 
> I commented about that on the EDK2 mailing list. While the current spec 
> doesn't explicitly ban having the flag set multiple times between the leaf 
> and the root I consider it a "bug" and there is an effort to clarify the spec 
> and the use of that flag.
>>
>> When the kernel starts to build sched_domain, the multi-core sched_domain 
>> contains all the cores within a package,
>> and the lowest NUMA sched_domain contains all the cores within a die. But 
>> the kernel requires that the multi-core
>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
>> info is printed.
> 
> Right. I've mentioned this problem a couple of times.
> 
> At the moment, the spec isn't clear about how the proximity domain is 
> detected/located within the PPTT topology (a node with a 1:1 correspondence 
> isn't even required). As you can see from this patch set, we are making the 
> general assumption that the proximity domains are at the same level as the 
> physical socket. This isn't ideal for NUMA topologies, like the D05, that 
> don't align with the physical socket.
> 
> There are efforts underway to clarify and expand upon the specification to 
> deal with this general problem. The simple solution is another flag (say 
> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to 
> find nodes with 1:1 correspondence. At that point we could add a fairly 
> trivial patch to correct just the scheduler topology without affecting the 
> rest of the system topology code.
> 
>>
>> If we modify the UEFI to make NUMA sched_domain start from the layer of 
>> package, then all the topology information
>> within the package will be discarded. I think we need to build the 
>> multi-core sched_domain using the cores within
>> the cluster instead of the cores within the package. I think that's what 
>> 'multi-core' means. Multi cores form a cluster. I guess.
>> If we build the multi-core sched_domain using the cores within a cluster, I 
>> think we need to add fields in struct cpu_topology
>> to record which cores are in each cluster.
> 
> The problem is that there isn't a generic way to identify which level of 
> cache sharing is the "correct" top layer MC domain. For one system cluster 
> might be appropriate, for another it might be the highest caching level 
> within a socket, for another it might be something in between or a group of 
> clusters or LLCs..
> 
> Hence the effort to standardize/guarantee a PPTT node that exactly matches a 
> SRAT domain. With that, each SOC/system provider has a clearly defined method 
> for communicating where they want the proximity domain information to begin.
> 
Or maybe we can add a multi-core flag in PPTT to indicate which layer of the 
topology tree represents the multi-core level, and also require that the 
multi-core layer be below the NUMA layer, so we don't need the 
PPTT_PROXIMITY_DOMAIN flag either.
I think it's reasonable for PPTT to report multi-core topology information.

Thanks,
Xiongfeng

> Thanks,
> 
>>
>>
>> Thanks,
>> Xiongfeng
>>
>> On 2018/1/13 8:59, Jeremy Linton wrote:
>>> Propagate the topology information from the PPTT tree to the
>>> cpu_topology array. We can get the thread id, core_id and
>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>> to those concepts. The package_id is flagged in the tree and can be
>>> found by calling find_acpi_cpu_topology_package() which terminates
>>> its search when it finds an ACPI node flagged as the physical
>>> package. If the tree doesn't contain enough levels to represent
>>> all of the requested levels then the root node will be returned
>>> for all subsequent levels.
>>>
>>> Cc: Juri Lelli 
>>> Signed-off-by: Jeremy Linton 
>>> ---
>>>   arch/arm64/kernel/topology.c | 46 +++-
>>>   1 file changed, 45 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-01-25 Thread Xiongfeng Wang

Hi Jeremy,
On 2018/1/25 23:56, Jeremy Linton wrote:
> Hi,
> 
> On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> I have tested the patch with the newest UEFI. It prints the below error:
>>
>> [4.017371] BUG: arch topology borken
>> [4.021069] BUG: arch topology borken
>> [4.024764] BUG: arch topology borken
>> [4.028460] BUG: arch topology borken
>> [4.032153] BUG: arch topology borken
>> [4.035849] BUG: arch topology borken
>> [4.039543] BUG: arch topology borken
>> [4.043239] BUG: arch topology borken
>> [4.046932] BUG: arch topology borken
>> [4.050629] BUG: arch topology borken
>> [4.054322] BUG: arch topology borken
>>
>> I checked the code and found that the newest UEFI set PPTT 
>> physical_package_flag on a physical package node and
>> the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology 
>> of our board is core->cluster->die->package).
> 
> I commented about that on the EDK2 mailing list. While the current spec 
> doesn't explicitly ban having the flag set multiple times between the leaf 
> and the root I consider it a "bug" and there is an effort to clarify the spec 
> and the use of that flag.
>>
>> When the kernel starts to build sched_domain, the multi-core sched_domain 
>> contains all the cores within a package,
>> and the lowest NUMA sched_domain contains all the cores within a die. But 
>> the kernel requires that the multi-core
>> sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
>> info is printed.
> 
> Right. I've mentioned this problem a couple of times.
> 
> At the moment, the spec isn't clear about how the proximity domain is 
> detected/located within the PPTT topology (a node with a 1:1 correspondence 
> isn't even required). As you can see from this patch set, we are making the 
> general assumption that the proximity domains are at the same level as the 
> physical socket. This isn't ideal for NUMA topologies, like the D05, that 
> don't align with the physical socket.
> 
> There are efforts underway to clarify and expand upon the specification to 
> deal with this general problem. The simple solution is another flag (say 
> PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could be used to 
> find nodes with 1:1 correspondence. At that point we could add a fairly 
> trivial patch to correct just the scheduler topology without affecting the 
> rest of the system topology code.
> 
>>
>> If we modify the UEFI to make NUMA sched_domain start from the layer of 
>> package, then all the topology information
>> within the package will be discarded. I think we need to build the 
>> multi-core sched_domain using the cores within
>> the cluster instead of the cores within the package. I think that's what 
>> 'multi-core' means: multiple cores forming a cluster, I guess.
>> If we build the multi-core sched_domain using the cores within a cluster, I 
>> think we need to add fields in struct cpu_topology
>> to record which cores are in each cluster.
> 
> The problem is that there isn't a generic way to identify which level of 
> cache sharing is the "correct" top layer MC domain. For one system cluster 
> might be appropriate, for another it might be the highest caching level 
> within a socket, for another it might be something in between, or a group of 
> clusters or LLCs.
> 
> Hence the effort to standardize/guarantee a PPTT node that exactly matches a 
> SRAT domain. With that, each SoC/system provider has a clearly defined method 
> for communicating where they want the proximity domain information to begin.
> 
Or maybe we can add a multi-core flag in PPTT to indicate which layer of the 
topology tree represents the multi-core level,
and also require that the multi-core layer be below the NUMA layer. Then we 
wouldn't need the PPTT_PROXIMITY_DOMAIN flag either.
I think it's reasonable for PPTT to report multi-core topology information.

Thanks,
Xiongfeng

> Thanks,
> 
>>
>>
>> Thanks,
>> Xiongfeng
>>
>> On 2018/1/13 8:59, Jeremy Linton wrote:
>>> Propagate the topology information from the PPTT tree to the
>>> cpu_topology array. We can get the thread id, core_id and
>>> cluster_id by assuming certain levels of the PPTT tree correspond
>>> to those concepts. The package_id is flagged in the tree and can be
>>> found by calling find_acpi_cpu_topology_package() which terminates
>>> its search when it finds an ACPI node flagged as the physical
>>> package. If the tree doesn't contain enough levels to represent
>>> all of the requested levels then the root node will be returned
>>> for all subsequent levels.
>>>
>>> Cc: Juri Lelli 
>>> Signed-off-by: Jeremy Linton 
>>> ---
>>>   arch/arm64/kernel/topology.c | 46 
>>> +++-
>>>   1 file changed, 45 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 7b06e263fdd1..ce8ec7fd6b32 100644
>>> --- 

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-01-25 Thread Jeremy Linton

Hi,

On 01/25/2018 06:15 AM, Xiongfeng Wang wrote:

Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[4.017371] BUG: arch topology borken
[4.021069] BUG: arch topology borken
[4.024764] BUG: arch topology borken
[4.028460] BUG: arch topology borken
[4.032153] BUG: arch topology borken
[4.035849] BUG: arch topology borken
[4.039543] BUG: arch topology borken
[4.043239] BUG: arch topology borken
[4.046932] BUG: arch topology borken
[4.050629] BUG: arch topology borken
[4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI set PPTT 
physical_package_flag on a physical package node and
the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of our board 
is core->cluster->die->package).


I commented about that on the EDK2 mailing list. While the current spec 
doesn't explicitly ban having the flag set multiple times between the 
leaf and the root I consider it a "bug" and there is an effort to 
clarify the spec and the use of that flag.


When the kernel starts to build sched_domain, the multi-core sched_domain 
contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the 
kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
info is printed.


Right. I've mentioned this problem a couple of times.

At the moment, the spec isn't clear about how the proximity domain is 
detected/located within the PPTT topology (a node with a 1:1 
correspondence isn't even required). As you can see from this patch set, 
we are making the general assumption that the proximity domains are at 
the same level as the physical socket. This isn't ideal for NUMA 
topologies, like the D05, that don't align with the physical socket.


There are efforts underway to clarify and expand upon the specification 
to deal with this general problem. The simple solution is another flag 
(say PPTT_PROXIMITY_DOMAIN which would map to the D05 die) which could 
be used to find nodes with 1:1 correspondence. At that point we could 
add a fairly trivial patch to correct just the scheduler topology 
without affecting the rest of the system topology code.




If we modify the UEFI to make NUMA sched_domain start from the layer of 
package, then all the topology information
within the package will be discarded. I think we need to build the multi-core 
sched_domain using the cores within
the cluster instead of the cores within the package. I think that's what 
'multi-core' means: multiple cores forming a cluster, I guess.
If we build the multi-core sched_domain using the cores within a cluster, I 
think we need to add fields in struct cpu_topology
to record which cores are in each cluster.


The problem is that there isn't a generic way to identify which level of 
cache sharing is the "correct" top layer MC domain. For one system 
cluster might be appropriate, for another it might be the highest 
caching level within a socket, for another it might be something in 
between or a group of clusters or LLCs..


Hence the effort to standardize/guarantee a PPTT node that exactly 
matches a SRAT domain. With that, each SoC/system provider has a clearly 
defined method for communicating where they want the proximity domain 
information to begin.


Thanks,




Thanks,
Xiongfeng

On 2018/1/13 8:59, Jeremy Linton wrote:

Propagate the topology information from the PPTT tree to the
cpu_topology array. We can get the thread id, core_id and
cluster_id by assuming certain levels of the PPTT tree correspond
to those concepts. The package_id is flagged in the tree and can be
found by calling find_acpi_cpu_topology_package() which terminates
its search when it finds an ACPI node flagged as the physical
package. If the tree doesn't contain enough levels to represent
all of the requested levels then the root node will be returned
for all subsequent levels.

Cc: Juri Lelli 
Signed-off-by: Jeremy Linton 
---
  arch/arm64/kernel/topology.c | 46 +++-
  1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 7b06e263fdd1..ce8ec7fd6b32 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -11,6 +11,7 @@
   * for more details.
   */
  
+#include 

  #include 
  #include 
  #include 
@@ -22,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  #include 

@@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
}
  }
  
+#ifdef CONFIG_ACPI

+/*
+ * Propagate the topology information of the processor_topology_node tree to 
the
+ * cpu_topology array.
+ */
+static int __init parse_acpi_topology(void)
+{
+   bool is_threaded;
+   int cpu, topology_id;
+
+   is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
+
+   

Re: [PATCH v6 11/12] arm64: topology: enable ACPI/PPTT based CPU topology

2018-01-25 Thread Xiongfeng Wang
Hi Jeremy,

I have tested the patch with the newest UEFI. It prints the below error:

[4.017371] BUG: arch topology borken
[4.021069] BUG: arch topology borken
[4.024764] BUG: arch topology borken
[4.028460] BUG: arch topology borken
[4.032153] BUG: arch topology borken
[4.035849] BUG: arch topology borken
[4.039543] BUG: arch topology borken
[4.043239] BUG: arch topology borken
[4.046932] BUG: arch topology borken
[4.050629] BUG: arch topology borken
[4.054322] BUG: arch topology borken

I checked the code and found that the newest UEFI set PPTT 
physical_package_flag on a physical package node and
the NUMA domain (SRAT domains) starts from the layer of DIE. (The topology of 
our board is core->cluster->die->package).

When the kernel starts to build sched_domain, the multi-core sched_domain 
contains all the cores within a package,
and the lowest NUMA sched_domain contains all the cores within a die. But the 
kernel requires that the multi-core
sched_domain should be a subset of the lowest NUMA sched_domain, so the BUG 
info is printed.

If we modify the UEFI to make NUMA sched_domain start from the layer of 
package, then all the topology information
within the package will be discarded. I think we need to build the multi-core 
sched_domain using the cores within
the cluster instead of the cores within the package. I think that's what 
'multi-core' means: multiple cores forming a cluster, I guess.
If we build the multi-core sched_domain using the cores within a cluster, I 
think we need to add fields in struct cpu_topology
to record which cores are in each cluster.


Thanks,
Xiongfeng

On 2018/1/13 8:59, Jeremy Linton wrote:
> Propagate the topology information from the PPTT tree to the
> cpu_topology array. We can get the thread id, core_id and
> cluster_id by assuming certain levels of the PPTT tree correspond
> to those concepts. The package_id is flagged in the tree and can be
> found by calling find_acpi_cpu_topology_package() which terminates
> its search when it finds an ACPI node flagged as the physical
> package. If the tree doesn't contain enough levels to represent
> all of the requested levels then the root node will be returned
> for all subsequent levels.
> 
> Cc: Juri Lelli 
> Signed-off-by: Jeremy Linton 
> ---
>  arch/arm64/kernel/topology.c | 46 
> +++-
>  1 file changed, 45 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index 7b06e263fdd1..ce8ec7fd6b32 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -11,6 +11,7 @@
>   * for more details.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -22,6 +23,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -300,6 +302,46 @@ static void __init reset_cpu_topology(void)
>   }
>  }
>  
> +#ifdef CONFIG_ACPI
> +/*
> + * Propagate the topology information of the processor_topology_node tree to 
> the
> + * cpu_topology array.
> + */
> +static int __init parse_acpi_topology(void)
> +{
> + bool is_threaded;
> + int cpu, topology_id;
> +
> + is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
> +
> + for_each_possible_cpu(cpu) {
> + topology_id = find_acpi_cpu_topology(cpu, 0);
> + if (topology_id < 0)
> + return topology_id;
> +
> + if (is_threaded) {
> + cpu_topology[cpu].thread_id = topology_id;
> + topology_id = find_acpi_cpu_topology(cpu, 1);
> + cpu_topology[cpu].core_id   = topology_id;
> + topology_id = find_acpi_cpu_topology_package(cpu);
> + cpu_topology[cpu].package_id = topology_id;
> + } else {
> + cpu_topology[cpu].thread_id  = -1;
> + cpu_topology[cpu].core_id= topology_id;
> + topology_id = find_acpi_cpu_topology_package(cpu);
> + cpu_topology[cpu].package_id = topology_id;
> + }
> + }
> +
> + return 0;
> +}
> +
> +#else
> +static inline int __init parse_acpi_topology(void)
> +{
> + return -EINVAL;
> +}
> +#endif
>  
>  void __init init_cpu_topology(void)
>  {
> @@ -309,6 +351,8 @@ void __init init_cpu_topology(void)
>* Discard anything that was parsed if we hit an error so we
>* don't use partial information.
>*/
> - if (of_have_populated_dt() && parse_dt_topology())
> + if ((!acpi_disabled) && parse_acpi_topology())
> + reset_cpu_topology();
> + else if (of_have_populated_dt() && parse_dt_topology())
>   reset_cpu_topology();
>  }
> 


