Hi Casey,

On Tue, 12 May 2026 at 14:13, Casey Connolly <[email protected]> wrote:
>
> As more platforms start ensuring they explicitly unmap reserved-memory
> regions a few issues have appeared with how the existing dynamic mapping
> code works. Fix these and get a small optimisation as well.
>
> 1. Teach pte_type() to actually respect the PTE_TYPE_VALID bit
> 2. Don't walk the TLB a second time if we call mmu_change_region_attr()
>    with PTE_TYPE_FAULT (since it would just be a slow nop)
> 3. Fix how set_one_region() decides to split blocks.
>
> Today set_one_region() will always split blocks until it reaches the
> smallest granule size (4k) and then update all of these pages. This
> appears to be due to a big in how is_aligned() is implemented, since
> it only evaluates to true if addr and size are both multiples of the
> current granule size, so a mapping aligned to 2M which is 4M in size
> will cleanly result in 2 blocks being set, but a mapping aligned to
> 2M which is 4M + 8k in size will result in blocks being split and 1026
> individual pages being set.
>
> While for the address it is correct to enforce that it is aligned to
> the current granule size, we only need to check if the region size is
> greater than the current granule size. This allows us to simplify our
> second example above to only 4 entries being updated (assuming no blocks
> have to be split) since we only need to update 2 blocks to map the first
> 4M, drastically improving the best-case performance.
>
> In the case where the address is 4k aligned rather than 2M aligned we
> will still be restricted to mapping 4k pages until we reach 2M alignment
> where we could then map a larger 2M granule which previously would never
> happen.
>
> Signed-off-by: Casey Connolly <[email protected]>
> ---
>  arch/arm/cpu/armv8/cache_v8.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> index e9ce6335ee9e..bcfc36603adb 100644
> --- a/arch/arm/cpu/armv8/cache_v8.c
> +++ b/arch/arm/cpu/armv8/cache_v8.c
> @@ -162,9 +162,9 @@ u64 get_tcr(u64 *pips, u64 *pva_bits)
>  #define MAX_PTE_ENTRIES 512
>
>  static int pte_type(u64 *pte)
>  {
> -       return *pte & PTE_TYPE_MASK;
> +       return *pte & PTE_TYPE_VALID ? *pte & PTE_TYPE_MASK : PTE_TYPE_FAULT;
>  }
>
>  /* Returns the LSB number for a PTE on level <level> */
>  static int level2shift(int level)
> @@ -980,11 +980,12 @@ u64 *__weak arch_get_page_table(void) {
>
>         return NULL;
>  }
>
> +/* Checks if the current PTE is an aligned subset of the region */
>  static bool is_aligned(u64 addr, u64 size, u64 align)
>  {
> -       return !(addr & (align - 1)) && !(size & (align - 1));
> +       return !(addr & (align - 1)) && size >= align;
>  }
>
>  /* Use flag to indicate if attrs has more than d-cache attributes */
>  static u64 set_one_region(u64 start, u64 size, u64 attrs, bool flag, int 
> level)
> @@ -992,11 +993,16 @@ static u64 set_one_region(u64 start, u64 size, u64 
> attrs, bool flag, int level)
>         int levelshift = level2shift(level);
>         u64 levelsize = 1ULL << levelshift;
>         u64 *pte = find_pte(start, level);
>
> -       /* Can we can just modify the current level block PTE? */
> +       /* Can we can just modify the current level block/page? */
>         if (is_aligned(start, size, levelsize)) {
> -               if (flag) {
> +               if (attrs == PTE_TYPE_FAULT) {
> +                       if (pte_type(pte) == PTE_TYPE_TABLE)

The code will still unmap the wrong regions here. One easy to way to
reprod this is apply this for QEMU, enabme meminfo and have a look at
the output.

index 38f0ec5f2fbb..c001ccca878d 100644
--- a/board/emulation/qemu-arm/qemu-arm.c
+++ b/board/emulation/qemu-arm/qemu-arm.c
@@ -114,6 +114,7 @@ int board_late_init(void)
        if (CONFIG_IS_ENABLED(USB_KEYBOARD))
                usb_init();

+       mmu_change_region_attr(0x50100000, 0x200000, PTE_TYPE_FAULT);
        return 0;
 }

So, what I think is happening is that pte_type() doesn't take the
level into account. But For L3 entries PTE_TYPE_TABLE (which is really
PTE_TYPE_PAGE for that level) will alway be true on level 3.

Cheers
/Ilias

/Ilias
> +                               *pte &= ~0;
> +                       else
> +                               *pte &= ~(PMD_ATTRMASK | PTE_TYPE_MASK | 
> PTE_BLOCK_INNER_SHARE);
> +               } else if (flag) {
>                         *pte &= ~PMD_ATTRMASK;
>                         *pte |= attrs & PMD_ATTRMASK;
>                 } else {
>                         *pte &= ~PMD_ATTRINDX_MASK;
> @@ -1097,8 +1103,12 @@ void mmu_change_region_attr(phys_addr_t addr, size_t 
> size, u64 attrs)
>         flush_dcache_range(gd->arch.tlb_addr,
>                            gd->arch.tlb_addr + gd->arch.tlb_size);
>         __asm_invalidate_tlb_all();
>
> +       /* If we were unmapping a region then we have nothing to make and can 
> return. */
> +       if (attrs == PTE_TYPE_FAULT)
> +               return;
> +
>         mmu_change_region_attr_nobreak(addr, size, attrs);
>  }
>
>  int pgprot_set_attrs(phys_addr_t addr, size_t size, enum pgprot_attrs perm)
>
> --
> 2.53.0
>

Reply via email to