On 27.10.2024 15:43, Bernhard Kaindl wrote:
> From: Bernhard Kaindl <[email protected]>
>
> At the moment, Xen keeps track of the spans of PFNs of the NUMA nodes.
> But the PFN span sometimes includes large MMIO holes, so these values
> might not be an exact representation of the total usable RAM of nodes.
>
> Xen does not need it, but the size of the NUMA node's memory can be
> helpful for management tools and HW information tools like hwloc/lstopo
> with its Xen backend for Dom0: https://github.com/xenserver-next/hwloc/
>
> First, introduce NODE_DATA(nodeid)->node_present_pages to node_data[],
> determine the sum of usable PFNs at boot and update them on memory_add().
>
> (The Linux kernel handles NODE_DATA->node_present_pages likewise)
>
> Signed-off-by: Bernhard Kaindl <[email protected]>
> ---
> Changes in v3:
> - Use PFN_UP/DOWN, refactored further to simplify the code while leaving
> compiler-level optimisations to the compiler's optimisation passes.
> Changes in v4:
> - Refactored code and doxygen documentation according to the review.
> ---
> xen/arch/x86/numa.c | 13 +++++++++++++
> xen/arch/x86/x86_64/mm.c | 3 +++
> xen/common/numa.c | 36 +++++++++++++++++++++++++++++++++---
> xen/include/xen/numa.h | 21 +++++++++++++++++++++
> 4 files changed, 70 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
> index 4b0b297c7e..3c0574f773 100644
> --- a/xen/arch/x86/numa.c
> +++ b/xen/arch/x86/numa.c
> @@ -100,6 +100,19 @@ unsigned int __init arch_get_dma_bitsize(void)
> + PAGE_SHIFT, 32);
> }
>
> +/**
> + * @brief Retrieves the RAM range for a given index from the e820 memory map.
> + *
> + * This function fetches the start and end address (exclusive) of a RAM range
> + * specified by the given index idx from the e820 memory map.
I think the use of (exclusive) here leaves room for ambiguity (as it may,
unusually, apply to start as well then). Imo it would better be put ...
> + * @param idx The index of the RAM range in the e820 memory map to retrieve.
> + * @param start Pointer to store the start address of the RAM range.
> + * @param end Pointer to store the end address of the RAM range.
... here, just like you have it ...
> + * @return 0 on success, -ENOENT if the index is out of bounds,
> + * or -ENODATA if the memory map at index idx is not of type
> E820_RAM.
> + */
> int __init arch_get_ram_range(unsigned int idx, paddr_t *start, paddr_t *end)
> {
> if ( idx >= e820.nr_map )
> --- a/xen/common/numa.c
> +++ b/xen/common/numa.c
> @@ -4,6 +4,7 @@
> * Adapted for Xen: Ryan Harper <[email protected]>
> */
>
> +#include "xen/pfn.h"
> #include <xen/init.h>
> #include <xen/keyhandler.h>
> #include <xen/mm.h>
> @@ -499,15 +500,44 @@ int __init compute_hash_shift(const struct node *nodes,
> return shift;
> }
>
> -/* Initialize NODE_DATA given nodeid and start/end */
> +/**
> + * @brief Initialize a NUMA node's node_data structure at boot.
> + *
> + * It is given the NUMA node's index in the node_data array as well
> + * as the start and exclusive end address of the node's memory span
> + * as arguments and initializes the node_data entry with this information.
> + *
> + * It then initializes the total number of usable memory pages within
> + * the NUMA node's memory span using the arch_get_ram_range() function.
> + *
> + * @param nodeid The index into the node_data array for the node.
> + * @param start The starting physical address of the node's memory range.
> + * @param end The exclusive ending physical address of the node's memory
> range.
... here.
> + */
> void __init setup_node_bootmem(nodeid_t nodeid, paddr_t start, paddr_t end)
> {
> unsigned long start_pfn = paddr_to_pfn(start);
> unsigned long end_pfn = paddr_to_pfn(end);
> + struct node_data *numa_node = NODE_DATA(nodeid);
> + paddr_t start_ram, end_ram;
> + unsigned int idx = 0;
> + unsigned long *pages = &numa_node->node_present_pages;
>
> - NODE_DATA(nodeid)->node_start_pfn = start_pfn;
> - NODE_DATA(nodeid)->node_spanned_pages = end_pfn - start_pfn;
> + numa_node->node_start_pfn = start_pfn;
> + numa_node->node_spanned_pages = end_pfn - start_pfn;
> +
> + /* Calculate the number of present RAM pages within the node: */
> + *pages = 0;
> + do {
> + int err = arch_get_ram_range(idx++, &start_ram, &end_ram);
> +
> + if (err == -ENOENT)
> + break;
> + if ( err || start_ram >= end || end_ram <= start )
> + continue; /* range is outside of the node, or not usable RAM */
>
> + *pages += PFN_DOWN(min(end_ram, end)) - PFN_UP(max(start_ram,
> start));
> + } while (1);
Nit: While we have ample bad examples, I think even in such while() uses style
ought to be followed (i.e. "while ( 1 )"). Personally, since this looks a little
odd to me, I generally prefer "for ( ; ; )" in such cases.
With respective adjustments (which I'm happy to make while committing, so long
as you agree):
Reviewed-by: Jan Beulich <[email protected]>
Jan