On 5/19/26 17:10, Juhyung Park wrote:
> free_pagetable() is called via free_hugepage_table() with
> get_order(PMD_SIZE) = 9 to free the 2 MB vmemmap PMD leaves that back
> struct page arrays on x86_64. After commit bf9e4e30f353 ("x86/mm: use
> pagetable_free()"), it goes through pagetable_free() instead of
> __free_pages(), and pagetable_free() ultimately calls
> __free_pages(page, compound_order(page)), which ignores the explicit
> order argument and infers the order from the page's compound metadata.
> 
> The vmemmap PMD chunks are allocated by vmemmap_alloc_block() using
> alloc_pages_node() without __GFP_COMP, so PG_head is not set and
> compound_order() returns 0. Only the first of 512 pages of each PMD
> chunk is returned to the buddy allocator on hot-remove; the remaining
> 511 pages stay allocated and become unreachable. In general, roughly
> 16 MB is leaked per GB of hot-removed memory on every cycle.
> 
> The leak affects every memory hot-remove path on x86_64 when
> memmap_on_memory=N (the default), including dax_kmem, virtio-mem,
> balloon drivers, ACPI memory hotplug, and direct sysfs offline+remove.
> memmap_on_memory=Y avoids it because free_hugepage_table() then takes
> the altmap branch and does not call free_pagetable().
> 
> Reproduced with CXL memory toggled through DAX in a loop:
> 
>   daxctl reconfigure-device --mode=system-ram dax0.0 --force
>   daxctl reconfigure-device --mode=devdax    dax0.0 --force
> 
> Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
> Cc: [email protected]
> Cc: Lu Baolu <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: David Hildenbrand <[email protected]>
> Cc: Mike Rapoport (Microsoft) <[email protected]>
> Cc: Oscar Salvador <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Andy Lutomirski <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Vishal Verma <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Juhyung Park <[email protected]>
> ---
>  arch/x86/mm/init_64.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index df2261fa4f98..a2301bddb647 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1024,7 +1024,12 @@ static void __meminit free_pagetable(struct page *page, int order)
>               free_reserved_pages(page, nr_pages);
>  #endif
>       } else {
> -             pagetable_free(page_ptdesc(page));
> +             /*
> +              * Use __free_pages() to honor @order: vmemmap PMD leaves
> +              * freed here are not compound pages, so pagetable_free()
>               * would leak 511 of 512 pages per 2 MB chunk.
> +              */
> +             __free_pages(page, order);
>       }
>  }
>  

I sent a proper fix for this already:

https://lore.kernel.org/all/[email protected]/

-- 
Cheers,

David
