Hi Luca,
On 14/05/2025 15:27, Luca Fancellu wrote:
+.endm
+
+.macro invalidate_dcache_one reg
+ nop
Same here.
+.endm
+#endif
+
/*
* Macro to prepare and set an EL2 MPU memory region.
* We will also create a corresponding MPU memory region entry, which
@@ -56,6 +84,27 @@
isb
WRITE_SYSREG_ASM(\prbar, PRBAR_EL2)
WRITE_SYSREG_ASM(\prlar, PRLAR_EL2)
+
+ /* Load pair into xen_mpumap and invalidate cache */
To confirm my understanding, xen_mpumap is part of the loaded binary, so we
expect the bootloader to have cleaned the cache for us. Therefore, we
only need to invalidate the entries afterwards. Is that correct?
Yes, xen_mpumap is part of the binary. Are you suggesting we should use clean
and invalidate here? Note that boot-wrapper-aarch64, for example, does not
clean the cache.
I am mainly asking for clarification on the expected state of the cache
before the write. From what you wrote, the cache line will be cleaned.
So we only need to invalidate it after the write, and there is no cache
maintenance necessary before writing.
+ adr_l \base, xen_mpumap
+ add \base, \base, \sel, LSL #XEN_MPUMAP_ENTRY_SHIFT
+ store_pair \prbar, \prlar, \base
I think you want a comment on top of pr_t to mention the structure
will not changed and
Can you explain this part better? What should I write on top of the typedef
struct {...} pr_t?
Hmm, my previous sentence made no sense. Let me retry. Today, you are
relying on the layout of pr_t to never change. But this is not obvious
when someone is reading the structure pr_t. So it would be good to have
a comment such as below on top of pr_t:
"
/!\ The assembly code (see ...) relies on the layout of pr_t. If the
structure is modified, then the assembly code most likely needs to be updated
"
+ invalidate_dcache_one \base
This will invalidate a single line in the data cache. The size depends on the
HW, but typically it will be 64-128 bytes. Do we have any check
to confirm the data will fit in a cache line?
You are right. We are going to write 16 bytes at most for Arm64 and (for now) 8
bytes for Arm32, so I think we will be way below 64 bytes.
Shall we have a BUILD_BUG_ON inside build_assertions() in arm/mpu/mm.c to check
sizeof(pr_t) <= 64?
I wrote "typically" because there is no guarantee that the cache line
will be equal to or bigger than 64 bytes. Looking at the Arm Arm
(CTR_EL0.DminLine), if I am not mistaken, the minimum is 16 bytes, so you
could check that sizeof(pr_t) is always smaller than or equal to 16 bytes
(which should be the case today).
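As a sketch, assuming the usual Xen build_assertions() pattern in
arm/mpu/mm.c:

static void __init __maybe_unused build_assertions(void)
{
    /*
     * The assembly invalidates a single D-cache line after writing a
     * pr_t. CTR_EL0.DminLine permits lines as small as 16 bytes, so
     * make sure a pr_t can never span more than one minimal line.
     */
    BUILD_BUG_ON(sizeof(pr_t) > 16);
}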
Alternatively, you could implement another helper in cache.S that would
invalidate the cache by range and then not rely on the size or alignment of
the structure.
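For illustration, a C sketch of the loop such a helper would implement (the
real helper would be assembly in cache.S; dcache_line_bytes is assumed to
hold the line size derived from CTR_EL0.DminLine):

static void invalidate_dcache_range(const void *p, size_t size)
{
    unsigned long va = (unsigned long)p & ~(dcache_line_bytes - 1UL);
    unsigned long end = (unsigned long)p + size;

    /* Invalidate every D-cache line covering [p, p + size). */
    for ( ; va < end; va += dcache_line_bytes )
        asm volatile ( "dc ivac, %0" :: "r" (va) : "memory" );

    asm volatile ( "dsb sy" ::: "memory" );
}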
+
+ /* Set/clear xen_mpumap_mask bitmap */
+ tst \prlar, #PRLAR_ELx_EN
+ bne 2f
+ /* Region is disabled, clear the bit in the bitmap */
+ bitmap_clear_bit xen_mpumap_mask, \sel, \base, \limit, \prbar, \prlar
+ b 3f
+
+2:
+ /* Region is enabled, set the bit in the bitmap */
+ bitmap_set_bit xen_mpumap_mask, \sel, \base, \limit, \prbar, \prlar
+
+3:
+ invalidate_dcache_one \base
You want to add a comment to explain what this invalidate does. AFAICT, this is
for the bitmap. But given the bitmap will typically be small, wouldn't it be
better to do it in one go at the end?
Yes, this invalidate is a bit overkill because it will invalidate 64-128 bytes
starting from the address of the last changed word where the bit was.
Should I invalidate everything outside this macro, inside enable_boot_cpu_mm in
arm64/mpu/head.S, instead?
Yes. I think it will be easier if we end up using a function to clean
the cache, as I suggested above.
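In C terms, the end of enable_boot_cpu_mm would then need just one call over
the whole bitmap (using the hypothetical range helper sketched above):

/* One invalidate covering the whole bitmap, after all bits are updated. */
invalidate_dcache_range(xen_mpumap_mask, sizeof(xen_mpumap_mask));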
Same comment here.
+
dsb sy
isb
diff --git a/xen/arch/arm/mpu/mm.c b/xen/arch/arm/mpu/mm.c
index 07c8959f4ee9..ee035a54b942 100644
--- a/xen/arch/arm/mpu/mm.c
+++ b/xen/arch/arm/mpu/mm.c
@@ -7,9 +7,25 @@
#include <xen/mm.h>
#include <xen/sizes.h>
#include <xen/types.h>
+#include <asm/mpu.h>
struct page_info *frame_table;
+/* Maximum number of MPU memory regions supported by the EL2 MPU. */
+uint8_t __ro_after_init max_mpu_regions;
+
+/*
+ * The bitmap xen_mpumap_mask records the usage of EL2 MPU memory regions.
+ * Bit 0 represents MPU memory region 0, bit 1 represents MPU memory
+ * region 1, ..., and so on.
+ * If an MPU memory region gets enabled, the corresponding bit is set to 1.
+ */
+DECLARE_BITMAP(xen_mpumap_mask, MAX_MPU_REGION_NR) \
+ __section(".data.page_aligned");
Why does this need to be page_aligned?
I see from the linker file that this section is aligned to SMP_CACHE_BYTES
The start of the section is cache aligned. But that may not be the case
for the content within the section itself.
Also, .data.page_aligned is used for data that is page-size aligned
(currently 4KB). So everything within that section needs to be declared
with __aligned(PAGE_SIZE). But AFAIU, the bitmap is only going to be a
few bytes, correct? So I think it would be a bad idea for this variable
to be page aligned, because it would end up being a massive waste of memory.
, which is L1_CACHE_BYTES for Arm. Because we are using the cache
invalidate, I was under the impression that it is better to have it
aligned on a cache line.
To clarify, L1_CACHE_BYTES is fixed to 128 bytes. If I understand correctly,
this is less than the maximum cache line that could exist (256 bytes).
L1_CACHE_BYTES is mainly used as a hint for Xen to try to allocate
structures on separate cache lines. But when cleaning/invalidating a cache
line, the code can't rely on the data not crossing a cache line, so you
need a loop to invalidate each line.
Anyway, I think it would make sense to use __cacheline_aligned for both,
as the two variables are going to be used often. We should also
force the section to be .data, otherwise the compiler will typically put
the variables in .bss, which cannot be used during early boot. This is
because the BSS region is not loaded in memory by the bootloader and the
Image protocol doesn't specify the state of the region, so for
simplicity, it is zeroed after the MMU/MPU and cache are turned on.
+
+/* EL2 Xen MPU memory region mapping table. */
+pr_t __section(".data.page_aligned") xen_mpumap[MAX_MPU_REGION_NR];
I guess for this one this is mandated by the HW?
Same reason as above, no other reason (that I'm aware of).
.data.page_aligned makes a bit more sense here (along with
__aligned(PAGE_SIZE)). However, it seems a bit overkill when we only
need the region to be cache aligned. So I would follow the same approach as
I suggested for the bitmap.
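i.e. something like the following (a sketch, assuming Xen's
__cacheline_aligned attribute and the existing names):

/*
 * Force both into .data (usable before the BSS is zeroed) and align
 * them to a cache line instead of a page.
 */
DECLARE_BITMAP(xen_mpumap_mask, MAX_MPU_REGION_NR)
    __cacheline_aligned __section(".data");

pr_t __cacheline_aligned __section(".data") xen_mpumap[MAX_MPU_REGION_NR];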
Cheers,
--
Julien Grall