Neat. Any sign of it getting merged?

Thanks.


On Wed, May 20, 2026 at 2:24 PM David Hildenbrand (Arm)
<[email protected]> wrote:
>
> On 5/19/26 17:10, Juhyung Park wrote:
> > free_pagetable() is called via free_hugepage_table() with
> > get_order(PMD_SIZE) = 9 to free the 2 MB vmemmap PMD leaves that back
> > struct page arrays on x86_64. After commit bf9e4e30f353 ("x86/mm: use
> > pagetable_free()"), it goes through pagetable_free() instead of
> > __free_pages(), and pagetable_free() ultimately calls
> > __free_pages(page, compound_order()) which ignores the explicit order
> > argument and infers it from the page's compound metadata.
> >
> > The vmemmap PMD chunks are allocated by vmemmap_alloc_block() using
> > alloc_pages_node() without __GFP_COMP, so PG_head is not set and
> > compound_order() returns 0. Only the first of 512 pages of each PMD
> > chunk is returned to the buddy allocator on hot-remove; the remaining
> > 511 pages stay allocated and become unreachable. In total, roughly
> > 16 MB leaks per GB of hot-removed memory per cycle.
> >
> > The leak affects every memory hot-remove path on x86_64 when
> > memmap_on_memory=N (the default), including dax_kmem, virtio-mem,
> > balloon drivers, ACPI memory hotplug, and direct sysfs offline+remove.
> > memmap_on_memory=Y avoids it because free_hugepage_table() then takes
> > the altmap branch and does not call free_pagetable().
> >
> > Reproduced with CXL memory toggled through DAX in a loop:
> >
> >   daxctl reconfigure-device --mode=system-ram dax0.0 --force
> >   daxctl reconfigure-device --mode=devdax    dax0.0 --force
> >
> > Fixes: bf9e4e30f353 ("x86/mm: use pagetable_free()")
> > Cc: [email protected]
> > Cc: Lu Baolu <[email protected]>
> > Cc: Jason Gunthorpe <[email protected]>
> > Cc: David Hildenbrand <[email protected]>
> > Cc: Mike Rapoport (Microsoft) <[email protected]>
> > Cc: Oscar Salvador <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > Cc: Dave Hansen <[email protected]>
> > Cc: Andy Lutomirski <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Thomas Gleixner <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Borislav Petkov <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Dave Jiang <[email protected]>
> > Cc: Vishal Verma <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Assisted-by: Claude:claude-opus-4-7
> > Signed-off-by: Juhyung Park <[email protected]>
> > ---
> >  arch/x86/mm/init_64.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index df2261fa4f98..a2301bddb647 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -1024,7 +1024,12 @@ static void __meminit free_pagetable(struct page *page, int order)
> >               free_reserved_pages(page, nr_pages);
> >  #endif
> >       } else {
> > -             pagetable_free(page_ptdesc(page));
> > +             /*
> > +              * Use __free_pages() to honor @order: vmemmap PMD leaves
> > +              * freed here are not compound pages, so pagetable_free()
> > +              * would leak 511 of 512 pages per 2 MB chunk.
> > +              */
> > +             __free_pages(page, order);
> >       }
> >  }
> >
>
> I sent a proper fix for this already:
>
> https://lore.kernel.org/all/[email protected]/
>
> --
> Cheers,
>
> David
