Replying to myself here, but as we discussed in private the *pte &=
~0; is basically a nop as well

Cheers
/Ilias

On Wed, 13 May 2026 at 14:57, Ilias Apalodimas
<[email protected]> wrote:
>
> Hi Casey,
>
> On Tue, 12 May 2026 at 14:13, Casey Connolly <[email protected]> 
> wrote:
> >
> > As more platforms start ensuring they explicitly unmap reserved-memory
> > regions a few issues have appeared with how the existing dynamic mapping
> > code works. Fix these and get a small optimisation as well.
> >
> > 1. Teach pte_type() to actually respect the PTE_TYPE_VALID bit
> > 2. Don't walk the TLB a second time if we call mmu_change_region_attr()
> >    with PTE_TYPE_FAULT (since it would just be a slow nop)
> > 3. Fix how set_one_region() decides to split blocks.
> >
> > Today set_one_region() will always split blocks until it reaches the
> > smallest granule size (4k) and then update all of these pages. This
> > appears to be due to a big in how is_aligned() is implemented, since
> > it only evaluates to true if addr and size are both multiples of the
> > current granule size, so a mapping aligned to 2M which is 4M in size
> > will cleanly result in 2 blocks being set, but a mapping aligned to
> > 2M which is 4M + 8k in size will result in blocks being split and 1026
> > individual pages being set.
> >
> > While for the address it is correct to enforce that it is aligned to
> > the current granule size, we only need to check if the region size is
> > greater than the current granule size. This allows us to simplify our
> > second example above to only 4 entries being updated (assuming no blocks
> > have to be split) since we only need to update 2 blocks to map the first
> > 4M, drastically improving the best-case performance.
> >
> > In the case where the address is 4k aligned rather than 2M aligned we
> > will still be restricted to mapping 4k pages until we reach 2M alignment
> > where we could then map a larger 2M granule which previously would never
> > happen.
> >
> > Signed-off-by: Casey Connolly <[email protected]>
> > ---
> >  arch/arm/cpu/armv8/cache_v8.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c
> > index e9ce6335ee9e..bcfc36603adb 100644
> > --- a/arch/arm/cpu/armv8/cache_v8.c
> > +++ b/arch/arm/cpu/armv8/cache_v8.c
> > @@ -162,9 +162,9 @@ u64 get_tcr(u64 *pips, u64 *pva_bits)
> >  #define MAX_PTE_ENTRIES 512
> >
> >  static int pte_type(u64 *pte)
> >  {
> > -       return *pte & PTE_TYPE_MASK;
> > +       return *pte & PTE_TYPE_VALID ? *pte & PTE_TYPE_MASK : 
> > PTE_TYPE_FAULT;
> >  }
> >
> >  /* Returns the LSB number for a PTE on level <level> */
> >  static int level2shift(int level)
> > @@ -980,11 +980,12 @@ u64 *__weak arch_get_page_table(void) {
> >
> >         return NULL;
> >  }
> >
> > +/* Checks if the current PTE is an aligned subset of the region */
> >  static bool is_aligned(u64 addr, u64 size, u64 align)
> >  {
> > -       return !(addr & (align - 1)) && !(size & (align - 1));
> > +       return !(addr & (align - 1)) && size >= align;
> >  }
> >
> >  /* Use flag to indicate if attrs has more than d-cache attributes */
> >  static u64 set_one_region(u64 start, u64 size, u64 attrs, bool flag, int 
> > level)
> > @@ -992,11 +993,16 @@ static u64 set_one_region(u64 start, u64 size, u64 
> > attrs, bool flag, int level)
> >         int levelshift = level2shift(level);
> >         u64 levelsize = 1ULL << levelshift;
> >         u64 *pte = find_pte(start, level);
> >
> > -       /* Can we can just modify the current level block PTE? */
> > +       /* Can we can just modify the current level block/page? */
> >         if (is_aligned(start, size, levelsize)) {
> > -               if (flag) {
> > +               if (attrs == PTE_TYPE_FAULT) {
> > +                       if (pte_type(pte) == PTE_TYPE_TABLE)
>
> The code will still unmap the wrong regions here. One easy to way to
> reprod this is apply this for QEMU, enabme meminfo and have a look at
> the output.
>
> index 38f0ec5f2fbb..c001ccca878d 100644
> --- a/board/emulation/qemu-arm/qemu-arm.c
> +++ b/board/emulation/qemu-arm/qemu-arm.c
> @@ -114,6 +114,7 @@ int board_late_init(void)
>         if (CONFIG_IS_ENABLED(USB_KEYBOARD))
>                 usb_init();
>
> +       mmu_change_region_attr(0x50100000, 0x200000, PTE_TYPE_FAULT);
>         return 0;
>  }
>
> So, what I think is happening is that pte_type() doesn't take the
> level into account. But For L3 entries PTE_TYPE_TABLE (which is really
> PTE_TYPE_PAGE for that level) will alway be true on level 3.
>
> Cheers
> /Ilias
>
> /Ilias
> > +                               *pte &= ~0;
> > +                       else
> > +                               *pte &= ~(PMD_ATTRMASK | PTE_TYPE_MASK | 
> > PTE_BLOCK_INNER_SHARE);
> > +               } else if (flag) {
> >                         *pte &= ~PMD_ATTRMASK;
> >                         *pte |= attrs & PMD_ATTRMASK;
> >                 } else {
> >                         *pte &= ~PMD_ATTRINDX_MASK;
> > @@ -1097,8 +1103,12 @@ void mmu_change_region_attr(phys_addr_t addr, size_t 
> > size, u64 attrs)
> >         flush_dcache_range(gd->arch.tlb_addr,
> >                            gd->arch.tlb_addr + gd->arch.tlb_size);
> >         __asm_invalidate_tlb_all();
> >
> > +       /* If we were unmapping a region then we have nothing to make and 
> > can return. */
> > +       if (attrs == PTE_TYPE_FAULT)
> > +               return;
> > +
> >         mmu_change_region_attr_nobreak(addr, size, attrs);
> >  }
> >
> >  int pgprot_set_attrs(phys_addr_t addr, size_t size, enum pgprot_attrs perm)
> >
> > --
> > 2.53.0
> >

Reply via email to