On 16.04.25 14:07, Petr Vaněk wrote:
Hi all,

I have discovered a regression introduced in commit a9b3c355c2e6
("asm-generic: pgalloc: provide generic __pgd_{alloc,free}") [1,2] in
kernel version 6.14. The problem occurs when the x86 kernel is
configured with CONFIG_DEBUG_VM_PGFLAGS=y and is run as a PV Dom0 in Xen
4.19.1. During the startup, the kernel panics with the error log below.

The commit changed the PGD allocation path. In the new implementation,
_pgd_alloc allocates memory with __pgd_alloc, which indirectly calls

   alloc_pages_noprof(gfp | __GFP_COMP, order);

This is in contrast to the old behavior, where __get_free_pages was
used, which indirectly called

   alloc_pages_noprof(gfp_mask & ~__GFP_HIGHMEM, order);

The key difference is that the new allocator can return a compound page.
When xen_pin_page is later called on such a page, it calls the
TestSetPagePinned function, which internally uses the PF_NO_COMPOUND
macro. This macro enforces VM_BUG_ON_PGFLAGS if PageCompound is true,
triggering the panic when CONFIG_DEBUG_VM_PGFLAGS is enabled.
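
To make the failure mode concrete, here is a small user-space model of the
check described above. It is only a sketch, not kernel code: every model_*
name and the MODEL_GFP_COMP value are made up for illustration, and it
assumes (as the page allocator appears to do) that __GFP_COMP only yields a
compound page when the order is greater than 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag value only; not the kernel's real __GFP_COMP bit. */
#define MODEL_GFP_COMP 0x1u

/* Hypothetical stand-in for struct page, reduced to the two relevant bits. */
struct model_page {
    bool compound;  /* models PageCompound() */
    bool pinned;    /* models the PG_pinned flag */
};

/*
 * Models the allocation: the result is treated as a compound page only
 * when the compound flag is requested AND order > 0 (assumption, see above).
 */
static struct model_page model_alloc_pages(unsigned int gfp, unsigned int order)
{
    struct model_page page = { .compound = false, .pinned = false };

    assert(order <= 10);  /* arbitrary sanity limit for the model */
    page.compound = (gfp & MODEL_GFP_COMP) && order > 0;
    return page;
}

/*
 * Models TestSetPagePinned under a PF_NO_COMPOUND policy. Returns false
 * where the kernel would hit VM_BUG_ON_PGFLAGS; otherwise stores the old
 * flag value in *was_pinned, sets the flag, and returns true.
 */
static bool model_test_set_pinned(struct model_page *page, bool *was_pinned)
{
    if (page->compound)
        return false;  /* the real check would BUG here */

    *was_pinned = page->pinned;
    page->pinned = true;
    return true;
}
```

In this model, an order-1 allocation without MODEL_GFP_COMP can be pinned,
an order-1 allocation with it trips the check, and an order-0 allocation
passes regardless of the flag.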

I am reporting this issue without a patch as I am not sure which part of
the code should be adapted to resolve the regression.

Thanks for the report AND the analysis.

I believe PGD_ALLOCATION_ORDER needs to be changed: in case the system is
running as a Xen PV domain (or with PTI disabled), PGD_ALLOCATION_ORDER
should be 0.

So I'd suggest switching PGD_ALLOCATION_ORDER to be defined either as 0
(in case PTI is not configured), or as pgd_allocation_order (a new global
variable having the value 0 or 1, depending on whether PTI is active).
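
Roughly what I have in mind, as a user-space sketch only (not the patch).
MODEL_CONFIG_PTI stands in for the PTI Kconfig option, and the model_*
helpers are hypothetical; only PGD_ALLOCATION_ORDER and
pgd_allocation_order are names from the proposal above.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the PTI Kconfig option; set to 0 to model a non-PTI build. */
#define MODEL_CONFIG_PTI 1

#if MODEL_CONFIG_PTI
/* New global: 0 or 1, decided once at boot depending on PTI being active. */
static unsigned int pgd_allocation_order;
#define PGD_ALLOCATION_ORDER pgd_allocation_order
#else
/* Without PTI support a single page is always enough. */
#define PGD_ALLOCATION_ORDER 0
#endif

/* Hypothetical boot-time hook: pick the order from the PTI state. */
static void model_init_pgd_order(bool pti_active)
{
#if MODEL_CONFIG_PTI
    pgd_allocation_order = pti_active ? 1 : 0;
#else
    (void)pti_active;
#endif
}

/* Number of pages a PGD allocation would cover in this model. */
static unsigned int model_pgd_pages(void)
{
    assert(PGD_ALLOCATION_ORDER <= 1);  /* the proposal only allows 0 or 1 */
    return 1u << PGD_ALLOCATION_ORDER;
}
```

With PTI inactive (as in a Xen PV domain) the order is 0, so the
allocation covers a single page and no compound page should be involved.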

I'll send a patch.


Juergen
