Re: [PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity
On Thu, Aug 12, 2021 at 06:52:23PM +0530, Aneesh Kumar K.V wrote: > PAPR interface currently supports two different ways of communicating resource > grouping details to the OS. These are referred to as Form 0 and Form 1 > associativity grouping. Form 0 is the older format and is now considered > deprecated. This patch adds another resource grouping named FORM2. > > Signed-off-by: Daniel Henrique Barboza > Signed-off-by: Aneesh Kumar K.V Reviewed-by: David Gibson Though there a couple of cosmetic issues and one bad memory access issue (though only in the case of buggy firmware). [snip] > +Form 2 > +--- > +Form 2 associativity format adds separate device tree properties > representing NUMA node distance > +thereby making the node distance computation flexible. Form 2 also allows > flexible primary > +domain numbering. With numa distance computation now detached from the index > value in > +"ibm,associativity-reference-points" property, Form 2 allows a large number > of primary domain > +ids at the same domainID index representing resource groups of different > performance/latency > +characteristics. > + > +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 > in the > +"ibm,architecture-vec-5" property. > + > +"ibm,numa-lookup-index-table" property contains a list of one or more > numbers representing > +the domainIDs present in the system. The offset of the domainID in this > property is > +used as an index while computing numa distance information via > "ibm,numa-distance-table". > + > +prop-encoded-array: The number N of the domainIDs encoded as with > encode-int, followed by > +N domainID encoded as with encode-int > + > +For ex: > +"ibm,numa-lookup-index-table" = {4, 0, 8, 250, 252}. The offset of domainID > 8 (2) is used when Since you're using dts syntax below, it probably makes sense to use it here as well. > +computing the distance of domain 8 from other domains present in the system. > For the rest of > +this document, this offset will be referred to as domain distance offset. > + > +"ibm,numa-distance-table" property contains a list of one or more numbers > representing the NUMA > +distance between resource groups/domains present in the system. > + > +prop-encoded-array: The number N of the distance values encoded as with > encode-int, followed by > +N distance values encoded as with encode-bytes. The max distance value we > could encode is 255. > +The number N must be equal to the square of m where m is the number of > domainIDs in the > +numa-lookup-index-table. > + > +For ex: > +ibm,numa-lookup-index-table = <3 0 8 40>; > +ibm,numa-distace-table = <9>, /bits/ 8 < 10 20 80 > + 20 10 160 > + 80 160 10>; [snip] > + > + /* FORM2 affinity */ > + nid = of_node_to_nid_single(node); > + if (nid == NUMA_NO_NODE) > + return; > + > + /* > + * With FORM2 we expect NUMA distance of all possible NUMA > + * nodes to be provided during boot. > + */ > + WARN(numa_distance_table[nid][nid] == -1, > + "NUMA distance details for node %d not provided\n", nid); > +} > + > +/* > + * ibm,numa-lookup-index-table= {N, domainid1, domainid2, . domainidN} > + * ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6, N elements} .. and here too. > + */ > +static void initialize_form2_numa_distance_lookup_table(void) > +{ > + int i, j; > + struct device_node *root; > + const __u8 *numa_dist_table; > + const __be32 *numa_lookup_index; > + int numa_dist_table_length; > + int max_numa_index, distance_index; > + > + if (firmware_has_feature(FW_FEATURE_OPAL)) > + root = of_find_node_by_path("/ibm,opal"); > + else > + root = of_find_node_by_path("/rtas"); > + if (!root) > + root = of_find_node_by_path("/"); > + > + numa_lookup_index = of_get_property(root, > "ibm,numa-lookup-index-table", NULL); > + max_numa_index = of_read_number(&numa_lookup_index[0], 1); > + > + /* first element of the array is the size and is encode-int */ > + numa_dist_table = of_get_property(root, "ibm,numa-distance-table", > NULL); > + numa_dist_table_length = of_read_number((const __be32 > *)&numa_dist_table[0], 1); > + /* Skip the size which is encoded int */ > + numa_dist_table += sizeof(__be32); > + > + pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n", > + numa_dist_table_length, max_numa_index); You validate numa_dist_table_length below. However, AFAICT you don't anywhere check that the property actually has the length its first element claims it does. Yes, that represents a firmware bug, but it's probably best if we don't ready past the end of the array in that case, which I think is what will happen now. Same applies for the lookup-index-table. > + for (i = 0; i < max_numa_index; i++) > + /* +1 skip the ma
[PATCH v8 5/5] powerpc/pseries: Add support for FORM2 associativity
PAPR interface currently supports two different ways of communicating resource grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping. Form 0 is the older format and is now considered deprecated. This patch adds another resource grouping named FORM2. Signed-off-by: Daniel Henrique Barboza Signed-off-by: Aneesh Kumar K.V --- Documentation/powerpc/associativity.rst | 104 arch/powerpc/include/asm/firmware.h | 3 +- arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/kernel/prom_init.c | 3 +- arch/powerpc/mm/numa.c| 187 ++ arch/powerpc/platforms/pseries/firmware.c | 1 + 6 files changed, 262 insertions(+), 37 deletions(-) create mode 100644 Documentation/powerpc/associativity.rst diff --git a/Documentation/powerpc/associativity.rst b/Documentation/powerpc/associativity.rst new file mode 100644 index ..07e7dd3d6c87 --- /dev/null +++ b/Documentation/powerpc/associativity.rst @@ -0,0 +1,104 @@ + +NUMA resource associativity += + +Associativity represents the groupings of the various platform resources into +domains of substantially similar mean performance relative to resources outside +of that domain. Resources subsets of a given domain that exhibit better +performance relative to each other than relative to other resources subsets +are represented as being members of a sub-grouping domain. This performance +characteristic is presented in terms of NUMA node distance within the Linux kernel. +From the platform view, these groups are also referred to as domains. + +PAPR interface currently supports different ways of communicating these resource +grouping details to the OS. These are referred to as Form 0, Form 1 and Form2 +associativity grouping. Form 0 is the oldest format and is now considered deprecated. + +Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5 property". +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1. +A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity +bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used. + +Form 0 +- +Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE). + +Form 1 +- +With Form 1 a combination of ibm,associativity-reference-points, and ibm,associativity +device tree properties are used to determine the NUMA distance between resource groups/domains. + +The “ibm,associativity” property contains a list of one or more numbers (domainID) +representing the resource’s platform grouping domains. + +The “ibm,associativity-reference-points” property contains a list of one or more numbers +(domainID index) that represents the 1 based ordinal in the associativity lists. +The list of domainID indexes represents an increasing hierarchy of resource grouping. + +ex: +{ primary domainID index, secondary domainID index, tertiary domainID index.. } + +Linux kernel uses the domainID at the primary domainID index as the NUMA node id. +Linux kernel computes NUMA distance between two domains by recursively comparing +if they belong to the same higher-level domains. For mismatch at every higher +level of the resource group, the kernel doubles the NUMA distance between the +comparing domains. + +Form 2 +--- +Form 2 associativity format adds separate device tree properties representing NUMA node distance +thereby making the node distance computation flexible. Form 2 also allows flexible primary +domain numbering. With numa distance computation now detached from the index value in +"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain +ids at the same domainID index representing resource groups of different performance/latency +characteristics. + +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the +"ibm,architecture-vec-5" property. + +"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing +the domainIDs present in the system. The offset of the domainID in this property is +used as an index while computing numa distance information via "ibm,numa-distance-table". + +prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by +N domainID encoded as with encode-int + +For ex: +"ibm,numa-lookup-index-table" = {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when +computing the distance of domain 8 from other domains present in the system. For the rest of +this document, this offset will be referred to as domain distance offset. + +"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA +distance between resource groups/domains present in the system. + +prop-encoded-array: The number N of the distance values encoded as with encode-int, follow