Replying to myself here, but as we discussed in private the *pte &= ~0; is basically a nop as well
Cheers /Ilias On Wed, 13 May 2026 at 14:57, Ilias Apalodimas <[email protected]> wrote: > > Hi Casey, > > On Tue, 12 May 2026 at 14:13, Casey Connolly <[email protected]> > wrote: > > > > As more platforms start ensuring they explicitly unmap reserved-memory > > regions a few issues have appeared with how the existing dynamic mapping > > code works. Fix these and get a small optimisation as well. > > > > 1. Teach pte_type() to actually respect the PTE_TYPE_VALID bit > > 2. Don't walk the TLB a second time if we call mmu_change_region_attr() > > with PTE_TYPE_FAULT (since it would just be a slow nop) > > 3. Fix how set_one_region() decides to split blocks. > > > > Today set_one_region() will always split blocks until it reaches the > > smallest granule size (4k) and then update all of these pages. This > > appears to be due to a big in how is_aligned() is implemented, since > > it only evaluates to true if addr and size are both multiples of the > > current granule size, so a mapping aligned to 2M which is 4M in size > > will cleanly result in 2 blocks being set, but a mapping aligned to > > 2M which is 4M + 8k in size will result in blocks being split and 1026 > > individual pages being set. > > > > While for the address it is correct to enforce that it is aligned to > > the current granule size, we only need to check if the region size is > > greater than the current granule size. This allows us to simplify our > > second example above to only 4 entries being updated (assuming no blocks > > have to be split) since we only need to update 2 blocks to map the first > > 4M, drastically improving the best-case performance. > > > > In the case where the address is 4k aligned rather than 2M aligned we > > will still be restricted to mapping 4k pages until we reach 2M alignment > > where we could then map a larger 2M granule which previously would never > > happen. > > > > Signed-off-by: Casey Connolly <[email protected]> > > --- > > arch/arm/cpu/armv8/cache_v8.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c > > index e9ce6335ee9e..bcfc36603adb 100644 > > --- a/arch/arm/cpu/armv8/cache_v8.c > > +++ b/arch/arm/cpu/armv8/cache_v8.c > > @@ -162,9 +162,9 @@ u64 get_tcr(u64 *pips, u64 *pva_bits) > > #define MAX_PTE_ENTRIES 512 > > > > static int pte_type(u64 *pte) > > { > > - return *pte & PTE_TYPE_MASK; > > + return *pte & PTE_TYPE_VALID ? *pte & PTE_TYPE_MASK : > > PTE_TYPE_FAULT; > > } > > > > /* Returns the LSB number for a PTE on level <level> */ > > static int level2shift(int level) > > @@ -980,11 +980,12 @@ u64 *__weak arch_get_page_table(void) { > > > > return NULL; > > } > > > > +/* Checks if the current PTE is an aligned subset of the region */ > > static bool is_aligned(u64 addr, u64 size, u64 align) > > { > > - return !(addr & (align - 1)) && !(size & (align - 1)); > > + return !(addr & (align - 1)) && size >= align; > > } > > > > /* Use flag to indicate if attrs has more than d-cache attributes */ > > static u64 set_one_region(u64 start, u64 size, u64 attrs, bool flag, int > > level) > > @@ -992,11 +993,16 @@ static u64 set_one_region(u64 start, u64 size, u64 > > attrs, bool flag, int level) > > int levelshift = level2shift(level); > > u64 levelsize = 1ULL << levelshift; > > u64 *pte = find_pte(start, level); > > > > - /* Can we can just modify the current level block PTE? */ > > + /* Can we can just modify the current level block/page? */ > > if (is_aligned(start, size, levelsize)) { > > - if (flag) { > > + if (attrs == PTE_TYPE_FAULT) { > > + if (pte_type(pte) == PTE_TYPE_TABLE) > > The code will still unmap the wrong regions here. One easy to way to > reprod this is apply this for QEMU, enabme meminfo and have a look at > the output. > > index 38f0ec5f2fbb..c001ccca878d 100644 > --- a/board/emulation/qemu-arm/qemu-arm.c > +++ b/board/emulation/qemu-arm/qemu-arm.c > @@ -114,6 +114,7 @@ int board_late_init(void) > if (CONFIG_IS_ENABLED(USB_KEYBOARD)) > usb_init(); > > + mmu_change_region_attr(0x50100000, 0x200000, PTE_TYPE_FAULT); > return 0; > } > > So, what I think is happening is that pte_type() doesn't take the > level into account. But For L3 entries PTE_TYPE_TABLE (which is really > PTE_TYPE_PAGE for that level) will alway be true on level 3. > > Cheers > /Ilias > > /Ilias > > + *pte &= ~0; > > + else > > + *pte &= ~(PMD_ATTRMASK | PTE_TYPE_MASK | > > PTE_BLOCK_INNER_SHARE); > > + } else if (flag) { > > *pte &= ~PMD_ATTRMASK; > > *pte |= attrs & PMD_ATTRMASK; > > } else { > > *pte &= ~PMD_ATTRINDX_MASK; > > @@ -1097,8 +1103,12 @@ void mmu_change_region_attr(phys_addr_t addr, size_t > > size, u64 attrs) > > flush_dcache_range(gd->arch.tlb_addr, > > gd->arch.tlb_addr + gd->arch.tlb_size); > > __asm_invalidate_tlb_all(); > > > > + /* If we were unmapping a region then we have nothing to make and > > can return. */ > > + if (attrs == PTE_TYPE_FAULT) > > + return; > > + > > mmu_change_region_attr_nobreak(addr, size, attrs); > > } > > > > int pgprot_set_attrs(phys_addr_t addr, size_t size, enum pgprot_attrs perm) > > > > -- > > 2.53.0 > >

