free_pagetable() is called via free_hugepage_table() with
get_order(PMD_SIZE) = 9 to free the 2 MB vmemmap PMD leaves that back
struct page arrays on x86_64. After commit bf9e4e30f353 ("x86/mm: use
pagetable_free()"), free_pagetable() frees them with pagetable_free()
instead of __free_pages(). pagetable_free() ultimately calls
__free_pages(page, compound_order(page)), which drops the explicit order
argument and infers the order from the page's compound metadata.

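The order mismatch can be illustrated with a small userspace model. This is only a sketch: struct toy_page and the toy_*() helpers are hypothetical stand-ins, not kernel APIs, and they model nothing beyond "how many pages does this call hand back" for a page that was allocated without __GFP_COMP.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the fields this sketch needs. */
struct toy_page {
	bool head;		/* models PG_head; set only for compound pages */
	unsigned int order;	/* compound order, meaningful only if head is set */
};

/* Models compound_order(): 0 unless the page is a compound head. */
static unsigned int toy_compound_order(const struct toy_page *page)
{
	return page->head ? page->order : 0;
}

/* Models __free_pages(page, order): returns the number of pages freed. */
static unsigned long toy_free_pages(const struct toy_page *page,
				    unsigned int order)
{
	(void)page;
	assert(order < 8 * sizeof(unsigned long));
	return 1UL << order;
}

/* Models pagetable_free(): the order is taken from the page metadata. */
static unsigned long toy_pagetable_free(const struct toy_page *page)
{
	return toy_free_pages(page, toy_compound_order(page));
}
```

In this model, a non-compound page freed with an explicit order of 9 releases 512 pages, while the same page passed to toy_pagetable_free() releases only 1.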
The vmemmap PMD chunks are allocated by vmemmap_alloc_block() using
alloc_pages_node() without __GFP_COMP, so PG_head is not set and
compound_order() returns 0. As a result, only the first of the 512 pages
in each PMD chunk is returned to the buddy allocator on hot-remove; the
remaining 511 pages stay allocated and become unreachable. This leaks
roughly 16 MB per GB of hot-removed memory on every hot-add/hot-remove
cycle.

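The 16 MB figure can be checked with a short calculation. The sketch below assumes 4 KiB base pages, 2 MiB PMD mappings and a 64-byte struct page (the 64-byte size is an assumption, not stated above); the TOY_* constants and toy_*() helpers are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE		4096ULL		/* 4 KiB base page */
#define TOY_PMD_SIZE		(2ULL << 20)	/* 2 MiB PMD mapping */
#define TOY_STRUCT_PAGE_SIZE	64ULL		/* assumed sizeof(struct page) */

/* Bytes of vmemmap needed to describe mem_bytes of memory. */
static uint64_t toy_vmemmap_bytes(uint64_t mem_bytes)
{
	return mem_bytes / TOY_PAGE_SIZE * TOY_STRUCT_PAGE_SIZE;
}

/* Bytes leaked when only the first page of each PMD chunk is freed. */
static uint64_t toy_leaked_bytes(uint64_t mem_bytes)
{
	uint64_t chunks = toy_vmemmap_bytes(mem_bytes) / TOY_PMD_SIZE;
	uint64_t pages_per_chunk = TOY_PMD_SIZE / TOY_PAGE_SIZE;

	assert(pages_per_chunk == 512);
	return chunks * (pages_per_chunk - 1) * TOY_PAGE_SIZE;
}
```

Under these assumptions, 1 GiB of memory needs 16 MiB of vmemmap (8 PMD chunks), and losing 511 of 512 pages in each chunk leaves 8 * 511 * 4096 = 16744448 bytes unreachable, just under 16 MiB.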
The leak affects every memory hot-remove path on x86_64 when
memmap_on_memory=N (the default), including dax_kmem, virtio-mem,
balloon drivers, ACPI memory hotplug, and direct sysfs offline+remove.
memmap_on_memory=Y avoids it because free_hugepage_table() then takes
the altmap branch and does not call free_pagetable().

Reproduced with CXL memory toggled through DAX in a loop:

  daxctl reconfigure-device --mode=system-ram dax0.0 --force
  daxctl reconfigure-device --mode=devdax dax0.0 --force

Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
Cc: [email protected]
Cc: Lu Baolu <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Rapoport (Microsoft) <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Vishal Verma <[email protected]>
Cc: [email protected]
Cc: [email protected]
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Juhyung Park <[email protected]>
---
arch/x86/mm/init_64.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index df2261fa4f98..a2301bddb647 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1024,7 +1024,12 @@ static void __meminit free_pagetable(struct page *page, int order)
 		free_reserved_pages(page, nr_pages);
 #endif
 	} else {
-		pagetable_free(page_ptdesc(page));
+		/*
+		 * Use __free_pages() to honor @order: vmemmap PMD leaves
+		 * freed here are not compound pages, so pagetable_free()
+		 * would leak 511 of 512 pages per 2 MB chunk.
+		 */
+		__free_pages(page, order);
 	}
 }
--
2.54.0