Re: [RFC][PATCH 0/6] x86: Fix suspend vs retbleed=stuff

2023-01-12 Thread Joan Bruguera
Hi Peter,

I tried your patches on both QEMU and my two (real) computers where
s2ram with `retbleed=stuff` was failing and they wake up fine now.

However, I think a couple of minor issues need a look:

(1) I got a build error due to a symbol conflict between the
`restore_registers` in `arch/x86/include/asm/suspend_64.h` and the
one in `drivers/gpu/drm/amd/display/dc/gpio/hw_gpio.c`.

(I fixed it by renaming the one in `hw_gpio.c`, but it's worth
 an `allmodconfig` build just in case there's something else)

(2) Tracing with QEMU I still see two `sarq $5, %gs:0x1337B33F` before
`%gs` is restored. Those correspond to the calls from
`secondary_startup_64` in `arch/x86/kernel/head_64.S` to
`verify_cpu` and `sev_verify_cbit`.
Those don't cause a crash but look suspicious; are they correct?

(There are also some `sarq`s in the call to `early_setup_idt` from
`secondary_startup_64`, but `%gs` is restored immediately before)

I'm attaching an annotated QEMU log for those in case it is useful.

Regards,
- Joan

QEMU wakeup log:

# 32-bit code elided. The next line calls `secondary_startup_64` from `startup_64`
0x0009a0d0:  ff 25 2a 2f 00 00    jmpq *0x2f2a(%rip)
# Next line is `call verify_cpu` from `secondary_startup_64`
0x9a800070:  e8 f1 00 00 00   callq 0x9a800166
# This next `sarq` does not have the correct GS set?
# RAX=80050033 RBX=0800 RCX=c080 RDX=
# RSI= RDI=0001 RBP= RSP=0009e018
# R8 = R9 = R10= R11=
# R12= R13= R14= R15=
# RIP=9a800166 RFL=00200097 [--S-APC] CPL=0 II=0 A20=1 SMM=0 HLT=0
# ES =0018   00cf9300 DPL=0 DS   [-WA]
# CS =0010   00af9b00 DPL=0 CS64 [-RA]
# SS =0018   00cf9300 DPL=0 DS   [-WA]
# DS =0018   00cf9300 DPL=0 DS   [-WA]
# FS =0018   00cf9300 DPL=0 DS   [-WA]
# GS =0018   00cf9300 DPL=0 DS   [-WA]
# LDT=   8200 DPL=0 LDT
# TR =   8b00 DPL=0 TSS64-busy
# GDT= 00098030 001f
# IDT=  
# CR0=80050033 CR2= CR3=0009c000 CR4=06b0
# DR0= DR1= DR2= DR3=
# DR6=0ff0 DR7=0400
# CCS=0095 CCD=f6ff CCO=EFLAGS
# EFER=0d01
0x9a800166:  65 48 c1 3c 25 90 29 03 00 05    sarq $5, %gs:0x32990
0x9a800170:  66 0f 1f 00  nopw (%rax)
0x9a800174:  9c   pushfq   
0x9a800175:  6a 00    pushq $0
0x9a800177:  9d   popfq
0x9a800178:  b8 00 00 00 00   movl $0, %eax
0x9a80017d:  0f a2    cpuid
0x9a80017f:  83 f8 01 cmpl $1, %eax
0x9a800182:  0f 82 d2 00 00 00jb   0x9a80025a
0x9a800188:  66 31 ff xorw %di, %di
0x9a80018b:  81 fb 41 75 74 68cmpl $0x68747541, %ebx
0x9a800191:  75 16jne  0x9a8001a9
0x9a800193:  81 fa 65 6e 74 69cmpl $0x69746e65, %edx
0x9a800199:  75 0ejne  0x9a8001a9
0x9a80019b:  81 f9 63 41 4d 44cmpl $0x444d4163, %ecx
0x9a8001a1:  75 06jne  0x9a8001a9
0x9a8001a3:  66 bf 01 00  movw $1, %di
0x9a8001a7:  eb 4djmp  0x9a8001f6
0x9a8001f6:  b8 01 00 00 00   movl $1, %eax
0x9a8001fb:  0f a2    cpuid
0x9a8001fd:  81 e2 61 81 00 07andl $0x7008161, %edx
0x9a800203:  81 f2 61 81 00 07xorl $0x7008161, %edx
0x9a800209:  75 4fjne  0x9a80025a
0x9a80020b:  b8 00 00 00 80   movl $0x8000, %eax
0x9a800210:  0f a2    cpuid
0x9a800212:  3d 01 00 00 80   cmpl $0x8001, %eax
0x9a800217:  72 41jb   0x9a80025a
0x9a800219:  b8 01 00 00 80   movl $0x8001, %eax
0x9a80021e:  0f a2    cpuid
0x9a800220:  81 e2 00 00 00 20andl $0x2000, %edx
0x9a800226:  81 f2 00 00 00 20xorl $0x2000, %edx
0x9a80022c:  75 2cjne  0x9a80025a
0x9a80022e:  b8 01 00 00 00   movl $1, %eax

Re: [PATCH v2 6/8] x86/iommu: call pi_update_irte through an hvm_function callback

2023-01-12 Thread Xenia Ragiadakou



On 1/12/23 14:37, Jan Beulich wrote:

On 12.01.2023 13:16, Jan Beulich wrote:

On 04.01.2023 09:45, Xenia Ragiadakou wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2143,6 +2143,14 @@ static bool cf_check vmx_test_pir(const struct vcpu *v, uint8_t vec)
  return pi_test_pir(vec, &v->arch.hvm.vmx.pi_desc);
  }
  
+static int cf_check vmx_pi_update_irte(const struct vcpu *v,
+   const struct pirq *pirq, uint8_t gvec)
+{
+const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
+
+return pi_update_irte(pi_desc, pirq, gvec);
+}


This being the only caller of pi_update_irte(), I don't see the point in
having the extra wrapper. Adjust pi_update_irte() such that it can be
used as the intended hook directly. Plus perhaps prefix it with vtd_.


Plus move it to vtd/x86/hvm.c (!HVM builds shouldn't need it), albeit I
realize this could be done independently of your work. In principle the
function shouldn't be VT-d specific (and could hence live in x86/hvm.c),
as msi_msg_write_remap_rte() is already available as an IOMMU hook anyway,
provided struct pi_desc turns out to be compatible with what's going to be
needed for AMD.


Since the posted interrupt descriptor is vmx-specific while 
msi_msg_write_remap_rte is iommu-specific, may I propose the following:


- Keep the name as is (i.e. vmx_pi_update_irte) and keep its definition 
in xen/arch/x86/hvm/vmx/vmx.c


- Open-code pi_update_irte() inside the body of vmx_pi_update_irte() but 
replace the Intel-specific msi_msg_write_remap_rte() with the generic 
iommu_update_ire_from_msi().


Does this approach make sense?

--
Xenia



Re: [PATCH v2 6/8] x86/iommu: call pi_update_irte through an hvm_function callback

2023-01-12 Thread Xenia Ragiadakou



On 1/12/23 14:16, Jan Beulich wrote:

On 04.01.2023 09:45, Xenia Ragiadakou wrote:

--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2143,6 +2143,14 @@ static bool cf_check vmx_test_pir(const struct vcpu *v, uint8_t vec)
  return pi_test_pir(vec, &v->arch.hvm.vmx.pi_desc);
  }
  
+static int cf_check vmx_pi_update_irte(const struct vcpu *v,
+   const struct pirq *pirq, uint8_t gvec)
+{
+const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
+
+return pi_update_irte(pi_desc, pirq, gvec);
+}


This being the only caller of pi_update_irte(), I don't see the point in
having the extra wrapper. Adjust pi_update_irte() such that it can be
used as the intended hook directly. Plus perhaps prefix it with vtd_.


Ok I will remove the extra wrapper.




@@ -2591,6 +2599,8 @@ static struct hvm_function_table __initdata_cf_clobber vmx_function_table = {
  .tsc_scaling = {
  .max_ratio = VMX_TSC_MULTIPLIER_MAX,
  },
+
+.pi_update_irte = vmx_pi_update_irte,


You want to install this hook only when iommu_intpost (i.e. the only case
when it can actually be called), and only when INTEL_IOMMU=y (avoiding the
need for an inline stub of pi_update_irte() or whatever its final name is
going to be).


Ok will do.




@@ -250,6 +252,9 @@ struct hvm_function_table {
  /* Architecture function to setup TSC scaling ratio */
  void (*setup)(struct vcpu *v);
  } tsc_scaling;
+
+int (*pi_update_irte)(const struct vcpu *v,
+  const struct pirq *pirq, uint8_t gvec);
  };


Please can this be moved higher up, e.g. next to .


Would right after handle_eoi be ok? Or higher up?




@@ -774,6 +779,16 @@ static inline void hvm_set_nonreg_state(struct vcpu *v,
  alternative_vcall(hvm_funcs.set_nonreg_state, v, nrs);
  }
  
+static inline int hvm_pi_update_irte(const struct vcpu *v,
+ const struct pirq *pirq, uint8_t gvec)
+{
+if ( hvm_funcs.pi_update_irte )
+return alternative_call(hvm_funcs.pi_update_irte, v, pirq, gvec);
+
+return -EOPNOTSUPP;


I don't think the conditional is needed, at least not with the other
suggested adjustments. Plus the way alternative patching works, a NULL
hook will be converted to some equivalent of BUG() anyway, so
ASSERT_UNREACHABLE() should also be unnecessary.


Ok will remove it.




+}
+
+
  #else  /* CONFIG_HVM */


Please don't add double blank lines.


Ok will fix.




--- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
+++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
@@ -146,6 +146,17 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
  clear_bit(POSTED_INTR_SN, &pi_desc->control);
  }
  
+#ifdef CONFIG_INTEL_IOMMU
+int pi_update_irte(const struct pi_desc *pi_desc,
+   const struct pirq *pirq, const uint8_t gvec);
+#else
+static inline int pi_update_irte(const struct pi_desc *pi_desc,
+ const struct pirq *pirq, const uint8_t gvec)
+{
+return -EOPNOTSUPP;
+}
+#endif


This still is a VT-d function, so I think its declaration would better
remain in asm/iommu.h.

Jan


--
Xenia



[PATCH v2 35/40] xen/mpu: destroy boot modules and early FDT mapping in MPU system

2023-01-12 Thread Penny Zheng
In an MMU system, the memory used by boot modules, like the kernel and
initramfs modules, is freed back into the heap after boot. That is not
applicable in an MPU system: the heap must be statically configured in
the Device Tree, so it cannot change. In an MPU system, we instead
destroy the MPU memory regions of the boot modules.

In the MPU version of remove_early_mappings, we destroy the MPU memory
region of the early FDT mapping.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/mm_mpu.c|  4 
 xen/arch/arm/setup.c | 25 -
 xen/arch/arm/setup_mmu.c | 25 +
 xen/arch/arm/setup_mpu.c | 26 ++
 4 files changed, 55 insertions(+), 25 deletions(-)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index de0c7d919a..118bb11d1a 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -854,6 +854,10 @@ void dump_hyp_walk(vaddr_t addr)
 
 void __init remove_early_mappings(void)
 {
+/* Earlier, early FDT is mapped with MAX_FDT_SIZE in early_fdt_map */
+if ( destroy_xen_mappings(round_pgdown(dtb_paddr),
+  round_pgup(dtb_paddr + MAX_FDT_SIZE)) )
+panic("Unable to destroy early Device-Tree mapping.\n");
 }
 
 int init_secondary_pagetables(int cpu)
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 0eac33e68c..49ba998f68 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -412,31 +412,6 @@ const char * __init boot_module_kind_as_string(bootmodule_kind kind)
 }
 }
 
-void __init discard_initial_modules(void)
-{
-struct bootmodules *mi = &bootinfo.modules;
-int i;
-
-for ( i = 0; i < mi->nr_mods; i++ )
-{
-paddr_t s = mi->module[i].start;
-paddr_t e = s + PAGE_ALIGN(mi->module[i].size);
-
-if ( mi->module[i].kind == BOOTMOD_XEN )
-continue;
-
-if ( !mfn_valid(maddr_to_mfn(s)) ||
- !mfn_valid(maddr_to_mfn(e)) )
-continue;
-
-fw_unreserved_regions(s, e, init_domheap_pages, 0);
-}
-
-mi->nr_mods = 0;
-
-remove_early_mappings();
-}
-
 /* Relocate the FDT in Xen heap */
 static void * __init relocate_fdt(paddr_t dtb_paddr, size_t dtb_size)
 {
diff --git a/xen/arch/arm/setup_mmu.c b/xen/arch/arm/setup_mmu.c
index 7e5d87f8bd..611a60633e 100644
--- a/xen/arch/arm/setup_mmu.c
+++ b/xen/arch/arm/setup_mmu.c
@@ -340,6 +340,31 @@ void __init setup_mm(void)
 }
 #endif
 
+void __init discard_initial_modules(void)
+{
+struct bootmodules *mi = &bootinfo.modules;
+int i;
+
+for ( i = 0; i < mi->nr_mods; i++ )
+{
+paddr_t s = mi->module[i].start;
+paddr_t e = s + PAGE_ALIGN(mi->module[i].size);
+
+if ( mi->module[i].kind == BOOTMOD_XEN )
+continue;
+
+if ( !mfn_valid(maddr_to_mfn(s)) ||
+ !mfn_valid(maddr_to_mfn(e)) )
+continue;
+
+fw_unreserved_regions(s, e, init_domheap_pages, 0);
+}
+
+mi->nr_mods = 0;
+
+remove_early_mappings();
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index f7d74ea604..f47f1f39ee 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -152,6 +152,32 @@ bool __init mpu_memory_section_contains(paddr_t s, paddr_t e,
 return false;
 }
 
+void __init discard_initial_modules(void)
+{
+unsigned int i = 0;
+
+/*
+ * The xenheap in an MPU system must be statically configured in the FDT,
+ * so its base address and size cannot change and it cannot accept freed
+ * memory from boot modules.
+ * Disable the MPU memory region of the boot module section instead, since
+ * it is of no use after boot.
+ */
+for ( ; i < mpuinfo.sections[MSINFO_BOOTMODULE].nr_banks; i++ )
+{
+paddr_t start = mpuinfo.sections[MSINFO_BOOTMODULE].bank[i].start;
+paddr_t size = mpuinfo.sections[MSINFO_BOOTMODULE].bank[i].size;
+int rc;
+
+rc = destroy_xen_mappings(start, start + size);
+if ( rc )
+panic("mpu: Unable to destroy boot module section 0x%"PRIpaddr" - 0x%"PRIpaddr"\n",
+  start, start + size);
+}
+
+remove_early_mappings();
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1




[PATCH v2 24/40] xen/mpu: introduce "mpu,xxx-memory-section"

2023-01-12 Thread Penny Zheng
In an MPU system, all kinds of resources, both system and domain
resources, must be statically configured in the Device Tree; e.g. guest
RAM must be statically allocated through the "xen,static-mem" property
under the domain node.

However, due to the limited number of MPU protection regions and the wide
variety of resources, we could easily exhaust all MPU protection regions
very quickly. So we introduce a set of new properties,
"mpu,xxx-memory-section", to mitigate the impact.
Each property limits the available host address range of one kind of
system/domain resource.

This commit also introduces "mpu,guest-memory-section" as an example, for
limiting the scattering of static memory used as guest RAM.
In an MPU system, guest RAM shall not only be statically configured
through the "xen,static-mem" property, but shall also be defined inside
"mpu,guest-memory-section".

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/bootfdt.c   | 13 ---
 xen/arch/arm/include/asm/setup.h | 24 +
 xen/arch/arm/setup_mpu.c | 58 
 3 files changed, 91 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
index 0085c28d74..d7a5dd0ede 100644
--- a/xen/arch/arm/bootfdt.c
+++ b/xen/arch/arm/bootfdt.c
@@ -59,10 +59,10 @@ void __init device_tree_get_reg(const __be32 **cell, u32 address_cells,
 *size = dt_next_cell(size_cells, cell);
 }
 
-static int __init device_tree_get_meminfo(const void *fdt, int node,
-  const char *prop_name,
-  u32 address_cells, u32 size_cells,
-  void *data, enum membank_type type)
+int __init device_tree_get_meminfo(const void *fdt, int node,
+   const char *prop_name,
+   u32 address_cells, u32 size_cells,
+   void *data, enum membank_type type)
 {
 const struct fdt_property *prop;
 unsigned int i, banks;
@@ -315,6 +315,11 @@ static int __init process_chosen_node(const void *fdt, int node,
 bootinfo.static_heap = true;
 }
 
+#ifdef CONFIG_HAS_MPU
+if ( process_mpuinfo(fdt, node, address_cells, size_cells) )
+return -EINVAL;
+#endif
+
 printk("Checking for initrd in /chosen\n");
 
 prop = fdt_get_property(fdt, node, "linux,initrd-start", &len);
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 8f353b67f8..3581f8f990 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -172,6 +172,11 @@ void device_tree_get_reg(const __be32 **cell, u32 address_cells,
 u32 device_tree_get_u32(const void *fdt, int node,
 const char *prop_name, u32 dflt);
 
+int device_tree_get_meminfo(const void *fdt, int node,
+const char *prop_name,
+u32 address_cells, u32 size_cells,
+void *data, enum membank_type type);
+
 int map_range_to_domain(const struct dt_device_node *dev,
 u64 addr, u64 len, void *data);
 
@@ -185,6 +190,25 @@ struct init_info
 unsigned int cpuid;
 };
 
+#ifdef CONFIG_HAS_MPU
+/* Index of MPU memory section */
+enum mpu_section_info {
+MSINFO_GUEST,
+MSINFO_MAX
+};
+
+extern const char *mpu_section_info_str[MSINFO_MAX];
+
+struct mpuinfo {
+struct meminfo sections[MSINFO_MAX];
+};
+
+extern struct mpuinfo mpuinfo;
+
+extern int process_mpuinfo(const void *fdt, int node, uint32_t address_cells,
+   uint32_t size_cells);
+#endif /* CONFIG_HAS_MPU */
+
 #endif
 /*
  * Local variables:
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index ca0d8237d5..09a38a34a4 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -20,12 +20,70 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
+const char *mpu_section_info_str[MSINFO_MAX] = {
+"mpu,guest-memory-section",
+};
+
+/*
+ * mpuinfo stores mpu memory section info, which is configured under
+ * "mpu,xxx-memory-section" in Device Tree.
+ */
+struct mpuinfo __initdata mpuinfo;
+
+/*
+ * Due to limited MPU protection regions and a wide variety of resource,
+ * "#mpu,xxx-memory-section" is introduced to mitigate the impact.
+ * Each property limits the available host address range of one kind of
+ * system/domain resource.
+ *
+ * "mpu,guest-memory-section": guest RAM must be statically allocated
+ * through "xen,static-mem" property in MPU system. "mpu,guest-memory-section"
+ * limits the scattering of "xen,static-mem", as users could not define
+ * a "xen,static-mem" outside "mpu,guest-memory-section".
+ */
+static int __init process_mpu_memory_section(const void *fdt, int node,
+ const char *name, void *data,
+ 

[PATCH v2 28/40] xen/mpu: map boot module section in MPU system

2023-01-12 Thread Penny Zheng
In an MPU system, we cannot afford to map a new MPU memory region for
each new guest boot module; that would exhaust the limited MPU memory
regions very quickly.

So we introduce `mpu,boot-module-section` for users to statically configure
one big memory section, or very few memory sections, for all guests' boot
modules. Users shall make sure that any guest boot module defined in the
Device Tree is within the section, including kernel modules (BOOTMOD_KERNEL),
device tree passthrough modules (BOOTMOD_GUEST_DTB), and ramdisk modules
(BOOTMOD_RAMDISK).

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/setup.h | 1 +
 xen/arch/arm/mm_mpu.c| 2 +-
 xen/arch/arm/setup_mpu.c | 7 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index b7a2225c25..61f24b5848 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -195,6 +195,7 @@ struct init_info
 enum mpu_section_info {
 MSINFO_GUEST,
 MSINFO_DEVICE,
+MSINFO_BOOTMODULE,
 MSINFO_MAX
 };
 
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 1566ba60af..ea64aa38e4 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -74,6 +74,7 @@ struct page_info *frame_table;
 static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
 REGION_HYPERVISOR_SWITCH,
 REGION_HYPERVISOR_NOCACHE,
+REGION_HYPERVISOR_BOOT,
 };
 
 /* Write a MPU protection region */
@@ -686,7 +687,6 @@ void __init setup_static_mappings(void)
 #endif
 map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]);
 }
-/* TODO: boot-module section, etc */
 }
 
 /* Map a frame table to cover physical addresses ps through pe */
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index ec05542f68..160934bf86 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -30,6 +30,7 @@
 const char *mpu_section_info_str[MSINFO_MAX] = {
 "mpu,guest-memory-section",
 "mpu,device-memory-section",
+"mpu,boot-module-section",
 };
 
 /*
@@ -52,6 +53,12 @@ struct mpuinfo __initdata mpuinfo;
  * "mpu,device-memory-section": this section draws the device memory layout
  * with the least number of memory regions for all devices in system that will
  * be used in Xen, like `UART`, `GIC`, etc.
+ *
+ * "mpu,boot-module-section": this property uses one big memory section or
+ * very few memory sections to describe all guests' boot modules. Users shall
+ * make sure that any guest boot module defined in Device Tree is within
+ * the section, including kernel module(BOOTMOD_KERNEL), device tree
+ * passthrough module(BOOTMOD_GUEST_DTB), and ramdisk module(BOOTMOD_RAMDISK).
  */
 static int __init process_mpu_memory_section(const void *fdt, int node,
  const char *name, void *data,
-- 
2.25.1




[PATCH v2 25/40] xen/mpu: map MPU guest memory section before static memory initialization

2023-01-12 Thread Penny Zheng
The previous commit introduced a new device tree property,
"mpu,guest-memory-section", to define the MPU guest memory section, which
mitigates the scattering of statically-configured guest RAM.

We only need to set up an MPU memory region mapping for the MPU guest
memory section to have access to all guest RAM.
And this should happen before static memory
initialization (init_staticmem_pages()).

The MPU memory region for the MPU guest memory section gets switched out
when the idle vcpu leaves hypervisor mode, to avoid region overlap if the
vcpu enters guest mode later. Conversely, it gets switched back in when
the idle vcpu enters hypervisor mode.
We introduce a bit, "region.prlar.sw" (struct pr_t region), to indicate
this kind of behaviour.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/mpu.h | 14 ++---
 xen/arch/arm/mm_mpu.c| 47 +---
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index b85e420a90..0044bbf05d 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -45,22 +45,26 @@
  * [3:4] Execute Never
  * [5:6] Access Permission
  * [7]   Region Present
- * [8]   Boot-only Region
+ * [8:9] 0b00: Fixed Region; 0b01: Boot-only Region;
+ *   0b10: Region needs switching out/in during vcpu context switch;
  */
 #define _REGION_AI_BIT0
 #define _REGION_XN_BIT3
 #define _REGION_AP_BIT5
 #define _REGION_PRESENT_BIT   7
-#define _REGION_BOOTONLY_BIT  8
+#define _REGION_TRANSIENT_BIT 8
 #define _REGION_XN(2U << _REGION_XN_BIT)
 #define _REGION_RO(2U << _REGION_AP_BIT)
 #define _REGION_PRESENT   (1U << _REGION_PRESENT_BIT)
-#define _REGION_BOOTONLY  (1U << _REGION_BOOTONLY_BIT)
+#define _REGION_BOOTONLY  (1U << _REGION_TRANSIENT_BIT)
+#define _REGION_SWITCH(2U << _REGION_TRANSIENT_BIT)
 #define REGION_AI_MASK(x) (((x) >> _REGION_AI_BIT) & 0x7U)
 #define REGION_XN_MASK(x) (((x) >> _REGION_XN_BIT) & 0x3U)
 #define REGION_AP_MASK(x) (((x) >> _REGION_AP_BIT) & 0x3U)
 #define REGION_RO_MASK(x) (((x) >> _REGION_AP_BIT) & 0x2U)
 #define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
+#define REGION_SWITCH_MASK(x) (((x) >> _REGION_TRANSIENT_BIT) & 0x2U)
+#define REGION_TRANSIENT_MASK(x)  (((x) >> _REGION_TRANSIENT_BIT) & 0x3U)
 
 /*
  * _REGION_NORMAL is convenience define. It is not meant to be used
@@ -73,6 +77,7 @@
 
 #define REGION_HYPERVISOR REGION_HYPERVISOR_RW
 #define REGION_HYPERVISOR_BOOT(REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
+#define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
 
 #define INVALID_REGION(~0UL)
 
@@ -98,7 +103,8 @@ typedef union {
 unsigned long ns:1; /* Not-Secure */
 unsigned long res:1;/* Reserved 0 by hardware */
 unsigned long limit:42; /* Limit Address */
-unsigned long pad:16;
+unsigned long pad:15;
+unsigned long sw:1; /* Region gets switched out/in during vcpu context switch? */
 } reg;
 uint64_t bits;
 } prlar_t;
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 7b282be4fb..d2e19e836c 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -71,6 +71,10 @@ static paddr_t dtb_paddr;
 
 struct page_info *frame_table;
 
+static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
+REGION_HYPERVISOR_SWITCH,
+};
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({   \
 uint64_t _sel = sel;\
@@ -414,10 +418,13 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
 if ( system_state <= SYS_STATE_active )
 {
 /*
- * If it is a boot-only region (i.e. region for early FDT),
- * it shall be added from the tail for late init re-organizing
+ * If it is a transient region, including boot-only region
+ * (i.e. region for early FDT), and region which needs switching
+ * in/out during vcpu context switch(i.e. region for guest memory
+ * section), it shall be added from the tail for late init
+ * re-organizing
  */
-if ( REGION_BOOTONLY_MASK(flags) )
+if ( REGION_TRANSIENT_MASK(flags) )
 idx = next_transient_region_idx;
 else
 idx = next_fixed_region_idx;
@@ -427,6 +434,13 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t limit,
 /* Set permission */
 xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
 xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
+/*
+ * Bit sw indicates that region gets switched out when idle vcpu
+ * leaving hypervisor 

[PATCH] xen/mpu: make Xen boot to idle on MPU systems(DNM)

2023-01-12 Thread Penny Zheng
From: Wei Chen 

As we have not implemented guest support in part #1 of the MPU support
series, Xen cannot create any guests at boot time. So in this patch we
make Xen boot to idle on MPU systems, so that reviewers can test the
part #1 series.

THIS PATCH IS ONLY FOR TESTING, NOT FOR REVIEWING.

Signed-off-by: Wei Chen 
---
 xen/arch/arm/mm_mpu.c |  3 +++
 xen/arch/arm/setup.c  | 21 -
 xen/arch/arm/traps.c  |  2 ++
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 434ed872c1..73d5779ab4 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -32,6 +32,9 @@
 #include 
 #include 
 
+/* Non-boot CPUs use this to find the correct pagetables. */
+uint64_t init_ttbr;
+
 #ifdef NDEBUG
 static inline void
 __attribute__ ((__format__ (__printf__, 1, 2)))
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index b21fc4b8e2..d04ad8f838 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -803,16 +803,19 @@ void __init start_xen(unsigned long boot_phys_offset,
 #endif
 enable_cpu_features();
 
-/* Create initial domain 0. */
-if ( !is_dom0less_mode() )
-create_dom0();
-else
-printk(XENLOG_INFO "Xen dom0less mode detected\n");
-
-if ( acpi_disabled )
+if ( !IS_ENABLED(CONFIG_ARM_V8R) )
 {
-create_domUs();
-alloc_static_evtchn();
+/* Create initial domain 0. */
+if ( !is_dom0less_mode() )
+create_dom0();
+else
+printk(XENLOG_INFO "Xen dom0less mode detected\n");
+
+if ( acpi_disabled )
+{
+create_domUs();
+alloc_static_evtchn();
+}
 }
 
 /*
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 061c92acbd..2444f7f6d8 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -963,7 +963,9 @@ void vcpu_show_registers(const struct vcpu *v)
 ctxt.ifsr32_el2 = v->arch.ifsr;
 #endif
 
+#ifndef CONFIG_HAS_MPU
 ctxt.vttbr_el2 = v->domain->arch.p2m.vttbr;
+#endif
 
 _show_registers(>arch.cpu_info->guest_cpu_user_regs, , 1, v);
 }
-- 
2.25.1




[PATCH v2 12/40] xen/mpu: introduce helpers for MPU enablement

2023-01-12 Thread Penny Zheng
We need a new helper for Xen to enable the MPU at boot time.
The new helper is semantically consistent with the original enable_mmu.

If the Background region is enabled, the MPU uses the default memory
map as the Background region for generating the memory attributes when
the MPU is disabled.
Since the default memory map of the Armv8-R AArch64 architecture is
IMPLEMENTATION DEFINED, we always turn off the Background region.

In this patch, we also introduce a neutral name, enable_mm, for
Xen to enable the MMU/MPU. This helps us keep a single code flow
in head.S.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/arm64/head.S |  5 +++--
 xen/arch/arm/arm64/head_mmu.S |  4 ++--
 xen/arch/arm/arm64/head_mpu.S | 19 +++
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 145e3d53dc..7f3f973468 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -258,7 +258,8 @@ real_start_efi:
  * and memory regions for MPU systems.
  */
 blprepare_early_mappings
-blenable_mmu
+/* Turn on MMU or MPU */
+blenable_mm
 
 /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
 ldr   x0, =primary_switched
@@ -316,7 +317,7 @@ GLOBAL(init_secondary)
 blcheck_cpu_mode
 blcpu_init
 blprepare_early_mappings
-blenable_mmu
+blenable_mm
 
 /* We are still in the 1:1 mapping. Jump to the runtime Virtual Address. */
 ldr   x0, =secondary_switched
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index 2346f755df..b59c40495f 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -217,7 +217,7 @@ ENDPROC(prepare_early_mappings)
  *
  * Clobbers x0 - x3
  */
-ENTRY(enable_mmu)
+ENTRY(enable_mm)
 PRINT("- Turning on paging -\r\n")
 
 /*
@@ -239,7 +239,7 @@ ENTRY(enable_mmu)
 msr   SCTLR_EL2, x0  /* now paging is enabled */
 isb  /* Now, flush the icache */
 ret
-ENDPROC(enable_mmu)
+ENDPROC(enable_mm)
 
 /*
  * Remove the 1:1 map from the page-tables. It is not easy to keep track
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
index 0b97ce4646..e2ac69b0cc 100644
--- a/xen/arch/arm/arm64/head_mpu.S
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -315,6 +315,25 @@ ENDPROC(prepare_early_mappings)
 
 GLOBAL(_end_boot)
 
+/*
+ * Enable EL2 MPU and data cache
+ * If the Background region is enabled, then the MPU uses the default memory
+ * map as the Background region for generating the memory
+ * attributes when MPU is disabled.
+ * Since the default memory map of the Armv8-R AArch64 architecture is
+ * IMPLEMENTATION DEFINED, we intend to turn off the Background region here.
+ */
+ENTRY(enable_mm)
+mrs   x0, SCTLR_EL2
+orr   x0, x0, #SCTLR_Axx_ELx_M/* Enable MPU */
+orr   x0, x0, #SCTLR_Axx_ELx_C/* Enable D-cache */
+orr   x0, x0, #SCTLR_Axx_ELx_WXN  /* Enable WXN */
+dsb   sy
+msr   SCTLR_EL2, x0
+isb
+ret
+ENDPROC(enable_mm)
+
 /*
  * Local variables:
  * mode: ASM
-- 
2.25.1




[PATCH v2 30/40] xen/mpu: disable VMAP sub-system for MPU systems

2023-01-12 Thread Penny Zheng
VMAP in an MMU system is used to remap a range of normal memory
or device memory to another virtual address with new attributes,
for a specific purpose like the ALTERNATIVE feature. Since there is
no virtual address translation support in an MPU system, we cannot
support VMAP there.

So in this patch, we disable VMAP for MPU systems, and some
features depending on VMAP also need to be disabled at the same
time, like ALTERNATIVE and CPU errata handling.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/Kconfig   |  3 +-
 xen/arch/arm/Makefile  |  2 +-
 xen/arch/arm/include/asm/alternative.h | 15 +
 xen/arch/arm/include/asm/cpuerrata.h   | 12 
 xen/arch/arm/setup.c   |  7 +++
 xen/arch/x86/Kconfig   |  1 +
 xen/common/Kconfig |  3 +
 xen/common/Makefile|  2 +-
 xen/include/xen/vmap.h | 81 --
 9 files changed, 119 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index c6b6b612d1..9230c8b885 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -11,12 +11,13 @@ config ARM_64
 
 config ARM
def_bool y
-   select HAS_ALTERNATIVE
+   select HAS_ALTERNATIVE if !ARM_V8R
select HAS_DEVICE_TREE
select HAS_PASSTHROUGH
select HAS_PDX
select HAS_PMAP
select IOMMU_FORCE_PT_SHARE
+   select HAS_VMAP if !ARM_V8R
 
 config ARCH_DEFCONFIG
string
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 23dfbc..c949661590 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -10,7 +10,7 @@ obj-$(CONFIG_HAS_VPCI) += vpci.o
 
 obj-$(CONFIG_HAS_ALTERNATIVE) += alternative.o
 obj-y += bootfdt.init.o
-obj-y += cpuerrata.o
+obj-$(CONFIG_HAS_ALTERNATIVE) += cpuerrata.o
 obj-y += cpufeature.o
 obj-y += decode.o
 obj-y += device.o
diff --git a/xen/arch/arm/include/asm/alternative.h b/xen/arch/arm/include/asm/alternative.h
index 1eb4b60fbb..bc23d1d34f 100644
--- a/xen/arch/arm/include/asm/alternative.h
+++ b/xen/arch/arm/include/asm/alternative.h
@@ -8,6 +8,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include 
 #include 
 #include 
 
@@ -28,8 +29,22 @@ typedef void (*alternative_cb_t)(const struct alt_instr *alt,
 const uint32_t *origptr, uint32_t *updptr,
 int nr_inst);
 
+#ifdef CONFIG_HAS_ALTERNATIVE
 void apply_alternatives_all(void);
int apply_alternatives(const struct alt_instr *start, const struct alt_instr *end);
+#else
+static inline void apply_alternatives_all(void)
+{
+ASSERT_UNREACHABLE();
+}
+
+static inline int apply_alternatives(const struct alt_instr *start,
+ const struct alt_instr *end)
+{
+ASSERT_UNREACHABLE();
+return -EINVAL;
+}
+#endif /* !CONFIG_HAS_ALTERNATIVE */
 
 #define ALTINSTR_ENTRY(feature, cb)  \
" .word 661b - .\n" /* label   */ \
diff --git a/xen/arch/arm/include/asm/cpuerrata.h b/xen/arch/arm/include/asm/cpuerrata.h
index 8d7e7b9375..5d97f33763 100644
--- a/xen/arch/arm/include/asm/cpuerrata.h
+++ b/xen/arch/arm/include/asm/cpuerrata.h
@@ -4,8 +4,20 @@
 #include 
 #include 
 
+#ifdef CONFIG_HAS_ALTERNATIVE
 void check_local_cpu_errata(void);
 void enable_errata_workarounds(void);
+#else
+static inline void check_local_cpu_errata(void)
+{
+ASSERT_UNREACHABLE();
+}
+
+static inline void enable_errata_workarounds(void)
+{
+ASSERT_UNREACHABLE();
+}
+#endif /* !CONFIG_HAS_ALTERNATIVE */
 
 #define CHECK_WORKAROUND_HELPER(erratum, feature, arch) \
 static inline bool check_workaround_##erratum(void) \
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3ebf9e9a5c..0eac33e68c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -721,7 +721,9 @@ void __init start_xen(unsigned long boot_phys_offset,
  */
 system_state = SYS_STATE_boot;
 
+#ifdef CONFIG_HAS_VMAP
 vm_init();
+#endif
 
 if ( acpi_disabled )
 {
@@ -753,11 +755,13 @@ void __init start_xen(unsigned long boot_phys_offset,
 nr_cpu_ids = smp_get_max_cpus();
 printk(XENLOG_INFO "SMP: Allowing %u CPUs\n", nr_cpu_ids);
 
+#ifdef CONFIG_HAS_ALTERNATIVE
 /*
  * Some errata relies on SMCCC version which is detected by psci_init()
  * (called from smp_init_cpus()).
  */
 check_local_cpu_errata();
+#endif
 
 check_local_cpu_features();
 
@@ -824,12 +828,15 @@ void __init start_xen(unsigned long boot_phys_offset,
 
 do_initcalls();
 
+
+#ifdef CONFIG_HAS_ALTERNATIVE
 /*
  * It needs to be called after do_initcalls to be able to use
  * stop_machine (tasklets initialized via an initcall).
  */
 apply_alternatives_all();
 enable_errata_workarounds();
+#endif
 enable_cpu_features();
 
 /* Create initial domain 0. */
diff --git a/xen/arch/x86/Kconfig 

[PATCH v2 21/40] xen/arm: move MMU-specific setup_mm to setup_mmu.c

2023-01-12 Thread Penny Zheng
setup_mm is used by Xen to set up the memory management subsystem, i.e. the
boot allocator, direct-mapping, xenheap, frametable and static memory pages.
We can inherit some components seamlessly in an MPU system, like the
boot allocator, whilst we need to implement some components differently,
like the xenheap, and some components cannot be applied in an MPU system
at all, like direct-mapping.

In this commit, we move setup_mm and its related functions and
variables to setup_mmu.c, in preparation for implementing an MPU
version of setup_mm in future commits.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/Makefile|   3 +
 xen/arch/arm/include/asm/setup.h |   5 +
 xen/arch/arm/setup.c | 326 +---
 xen/arch/arm/setup_mmu.c | 350 +++
 4 files changed, 362 insertions(+), 322 deletions(-)
 create mode 100644 xen/arch/arm/setup_mmu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 21188b207f..adeb17b7ab 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -51,6 +51,9 @@ obj-y += physdev.o
 obj-y += processor.o
 obj-y += psci.o
 obj-y += setup.o
+ifneq ($(CONFIG_HAS_MPU), y)
+obj-y += setup_mmu.o
+endif
 obj-y += shutdown.o
 obj-y += smp.o
 obj-y += smpboot.o
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 4f39a1aa0a..8f353b67f8 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -158,6 +158,11 @@ struct bootcmdline *boot_cmdline_find_by_kind(bootmodule_kind kind);
 struct bootcmdline * boot_cmdline_find_by_name(const char *name);
 const char *boot_module_kind_as_string(bootmodule_kind kind);
 
+extern void init_pdx(void);
+extern void init_staticmem_pages(void);
+extern void populate_boot_allocator(void);
+extern void setup_mm(void);
+
 extern uint32_t hyp_traps_vector[];
 void init_traps(void);
 
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index d7d200179c..3ebf9e9a5c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -2,7 +2,7 @@
 /*
  * xen/arch/arm/setup.c
  *
- * Early bringup code for an ARMv7-A with virt extensions.
+ * Early bringup code for an ARMv7-A/ARM64v8R with virt extensions.
  *
  * Tim Deegan 
  * Copyright (c) 2011 Citrix Systems.
@@ -57,11 +57,6 @@ struct cpuinfo_arm __read_mostly system_cpuinfo;
 bool __read_mostly acpi_disabled;
 #endif
 
-#ifdef CONFIG_ARM_32
-static unsigned long opt_xenheap_megabytes __initdata;
-integer_param("xenheap_megabytes", opt_xenheap_megabytes);
-#endif
-
 domid_t __read_mostly max_init_domid;
 
 static __used void init_done(void)
@@ -455,138 +450,6 @@ static void * __init relocate_fdt(paddr_t dtb_paddr, size_t dtb_size)
 return fdt;
 }
 
-#ifdef CONFIG_ARM_32
-/*
- * Returns the end address of the highest region in the range s..e
- * with required size and alignment that does not conflict with the
- * modules from first_mod to nr_modules.
- *
- * For non-recursive callers first_mod should normally be 0 (all
- * modules and Xen itself) or 1 (all modules but not Xen).
- */
-static paddr_t __init consider_modules(paddr_t s, paddr_t e,
-   uint32_t size, paddr_t align,
-   int first_mod)
-{
-const struct bootmodules *mi = &bootinfo.modules;
-int i;
-int nr;
-
-s = (s+align-1) & ~(align-1);
-e = e & ~(align-1);
-
-if ( s > e ||  e - s < size )
-return 0;
-
-/* First check the boot modules */
-for ( i = first_mod; i < mi->nr_mods; i++ )
-{
-paddr_t mod_s = mi->module[i].start;
-paddr_t mod_e = mod_s + mi->module[i].size;
-
-if ( s < mod_e && mod_s < e )
-{
-mod_e = consider_modules(mod_e, e, size, align, i+1);
-if ( mod_e )
-return mod_e;
-
-return consider_modules(s, mod_s, size, align, i+1);
-}
-}
-
-/* Now check any fdt reserved areas. */
-
-nr = fdt_num_mem_rsv(device_tree_flattened);
-
-for ( ; i < mi->nr_mods + nr; i++ )
-{
-paddr_t mod_s, mod_e;
-
-if ( fdt_get_mem_rsv(device_tree_flattened,
- i - mi->nr_mods,
- &mod_s, &mod_e ) < 0 )
-/* If we can't read it, pretend it doesn't exist... */
-continue;
-
-/* fdt_get_mem_rsv returns length */
-mod_e += mod_s;
-
-if ( s < mod_e && mod_s < e )
-{
-mod_e = consider_modules(mod_e, e, size, align, i+1);
-if ( mod_e )
-return mod_e;
-
-return consider_modules(s, mod_s, size, align, i+1);
-}
-}
-
-/*
- * i is the current bootmodule we are evaluating, across all
- * possible kinds of bootmodules.
- *
- * When retrieving the corresponding reserved-memory addresses, we
- * need to index the bootinfo.reserved_mem bank starting from 0, and
- * only counting the reserved-memory modules. 

[PATCH v2 33/40] xen/arm: check mapping status and attributes for MPU copy_from_paddr

2023-01-12 Thread Penny Zheng
From: Wei Chen 

We introduced map_page_to_xen_misc/unmap_page_from_xen_misc to temporarily
map a page into Xen's misc area to gain access to it. However, in an MPU
system, all resources are statically configured in the Device Tree and
already mapped at a very early boot stage.

When enabling map_page_to_xen_misc for copy_from_paddr in an MPU system,
we therefore need to check whether a given paddr is properly mapped.
Signed-off-by: Wei Chen 
Signed-off-by: Penny Zheng 
---
 xen/arch/arm/kernel.c |  2 +-
 xen/arch/arm/mm_mpu.c | 21 +
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index ee7144ec13..ce2b3347d7 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -57,7 +57,7 @@ void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
 s = paddr & (PAGE_SIZE - 1);
 l = min(PAGE_SIZE - s, len);
 
-src = map_page_to_xen_misc(maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+src = map_page_to_xen_misc(maddr_to_mfn(paddr), DEFINE_ATTRIBUTE(HYPERVISOR_WC));
 ASSERT(src != NULL);
 memcpy(dst, src + s, l);
 clean_dcache_va_range(dst, l);
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 7b54c87acf..0b720004ee 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -790,6 +790,27 @@ void *ioremap(paddr_t pa, size_t len)
 return ioremap_attr(pa, len, REGION_HYPERVISOR_NOCACHE);
 }
 
+/*
+ * In an MPU system, due to limited MPU memory regions, all resources are
+ * statically configured in the Device Tree and mapped at a very early stage,
+ * so dynamic temporary page mappings are not allowed.
+ * Hence, in map_page_to_xen_misc, we need to check whether the page is
+ * already properly mapped with #attributes.
+ */
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
+{
+paddr_t pa = mfn_to_maddr(mfn);
+
+if ( !check_region_and_attributes(pa, PAGE_SIZE, attributes, "map_to_misc") )
+return NULL;
+
+return maddr_to_virt(pa);
+}
+
+void unmap_page_from_xen_misc(void)
+{
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
-- 
2.25.1




[PATCH v2 39/40] xen/mpu: re-order xen_mpumap in arch_init_finialize

2023-01-12 Thread Penny Zheng
In the function init_done, we have finished booting and do the final
clean-up work, including marking the section .data.ro_after_init
read-only, freeing the init text and init data sections, etc.

In an MPU system, besides the above operations, we also need to re-order
Xen's MPU memory region mapping table (xen_mpumap).

In xen_mpumap, we have two types of MPU memory regions: fixed memory
regions and switching memory regions.
Fixed memory regions refer to regions which won't change after boot,
like the Xen .text section, while switching regions (i.e. device memory)
are regions that get switched out when the idle vcpu leaves hypervisor
mode, and switched in when the idle vcpu enters hypervisor mode. They
were added at the tail during the boot stage.
To save the trouble of hunting down each switching region in the
time-sensitive context switch path, we re-order xen_mpumap to keep fixed
regions in the front, and switching ones immediately after them.

We define an MPU memory region mapping table (sw_mpumap) to store all
switching regions. After disabling them at their original positions, we
re-enable them at their re-ordered positions.

Signed-off-by: Penny Zheng 
---
 xen/arch/arm/include/asm/arm64/mpu.h |   5 ++
 xen/arch/arm/include/asm/mm_mpu.h|   1 +
 xen/arch/arm/include/asm/setup.h |   2 +
 xen/arch/arm/mm_mpu.c| 110 +++
 xen/arch/arm/setup.c |  13 +---
 xen/arch/arm/setup_mmu.c |  16 
 xen/arch/arm/setup_mpu.c |  20 +
 7 files changed, 155 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h b/xen/arch/arm/include/asm/arm64/mpu.h
index b4e50a9a0e..e058f36435 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -155,6 +155,11 @@ typedef struct {
 (uint64_t)((_pr->prlar.reg.limit << MPU_REGION_SHIFT) | 0x3f); \
 })
 
+#define region_needs_switching_on_ctxt(pr) ({   \
+pr_t *_pr = pr; \
+_pr->prlar.reg.sw;  \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM64_MPU_H__ */
diff --git a/xen/arch/arm/include/asm/mm_mpu.h b/xen/arch/arm/include/asm/mm_mpu.h
index 5aa61c43b6..f8f54eb901 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -10,6 +10,7 @@
  * section by section based on static configuration in Device Tree.
  */
 extern void setup_static_mappings(void);
+extern int reorder_xen_mpumap(void);
 
 extern struct page_info *frame_table;
 
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index d4c1336597..39cd95553d 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -182,6 +182,8 @@ int map_range_to_domain(const struct dt_device_node *dev,
 
 extern const char __ro_after_init_start[], __ro_after_init_end[];
 
+extern void arch_init_finialize(void);
+
 struct init_info
 {
 /* Pointer to the stack, used by head.S when entering in C */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 118bb11d1a..434ed872c1 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -80,6 +80,25 @@ static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
 
 extern char __init_data_begin[], __init_end[];
 
+/*
+ * MPU memory mapping table records regions that need switching in/out
+ * during vcpu context switch
+ */
+static pr_t *sw_mpumap;
+static uint64_t nr_sw_mpumap;
+
+/*
+ * After reordering, nr_xen_mpumap records number of regions for Xen fixed
+ * memory mapping
+ */
+static uint64_t nr_xen_mpumap;
+
+/*
+ * After reordering, nr_cpu_mpumap records number of EL2 valid
+ * MPU memory regions
+ */
+static uint64_t nr_cpu_mpumap;
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({   \
 uint64_t _sel = sel;\
@@ -847,6 +866,97 @@ void unmap_page_from_xen_misc(void)
 {
 }
 
+void dump_hyp_mapping(void)
+{
+uint64_t i = 0;
+pr_t region;
+
+for ( i = 0; i < nr_cpu_mpumap; i++ )
+{
+access_protection_region(true, &region, NULL, i);
+printk(XENLOG_INFO
+   "MPU memory region [%lu]: 0x%"PRIpaddr" - 0x%"PRIpaddr".\n",
+   i, pr_get_base(&region), pr_get_limit(&region));
+}
+}
+
+/* Standard entry to dynamically allocate MPU memory region mapping table. */
+static pr_t *alloc_mpumap(void)
+{
+pr_t *map;
+
+/*
+ * A MPU memory region structure(pr_t) takes 16 bytes, even with maximum
+ * supported MPU protection regions in EL2, 255, MPU table at most takes up
+ * less than 4KB(PAGE_SIZE).
+ */
+map = alloc_xenheap_pages(0, 0);
+if ( map == NULL )
+return NULL;
+
+clear_page(map);
+return map;
+}
+
+/*
+ * Switching region(i.e. device memory) are regions that gets switched out
+ * when idle vcpu leaving hypervisor mode, and gets switched in when idle vcpu
+ * entering 

[PATCH v2 38/40] xen/mpu: implement setup_virt_paging for MPU system

2023-01-12 Thread Penny Zheng
For MMU systems, setup_virt_paging is used to configure stage 2 address
translation, like IPA bits, VMID bits, etc. This function also does the
VMID allocator initialization for later VM creation.

Besides IPA bits and VMID bits, the setup_virt_paging function in an MPU
system should also be responsible for determining the default EL1/EL0
translation regime.
ARMv8-R AArch64 can have the following memory translation regimes:
- PMSAv8-64 at both EL1/EL0 and EL2
- PMSAv8-64 or VMSAv8-64 at EL1/EL0 and PMSAv8-64 at EL2
The default will be VMSAv8-64, unless the platform cannot support it,
which can be checked against the MSA_frac field in the Memory Model
Feature Register 0 (ID_AA64MMFR0_EL1).

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/sysregs.h |  6 ++
 xen/arch/arm/include/asm/cpufeature.h|  7 ++
 xen/arch/arm/include/asm/p2m.h   | 18 +
 xen/arch/arm/include/asm/processor.h | 13 
 xen/arch/arm/p2m.c   | 28 
 xen/arch/arm/p2m_mmu.c   | 38 --
 xen/arch/arm/p2m_mpu.c   | 91 ++--
 7 files changed, 159 insertions(+), 42 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h b/xen/arch/arm/include/asm/arm64/sysregs.h
index 9546e8e3d0..7d4f959dae 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -507,6 +507,12 @@
 /* MPU Protection Region Enable Register encode */
 #define PRENR_EL2 S3_4_C6_C1_1
 
+/* Virtualization Secure Translation Control Register */
+#define VSTCR_EL2  S3_4_C2_C6_2
+#define VSTCR_EL2_RES1_SHIFT 31
+#define VSTCR_EL2_SA_SHIFT   30
+#define VSTCR_EL2_SC_SHIFT   20
+
 #endif
 
 #ifdef CONFIG_ARM_SECURE_STATE
diff --git a/xen/arch/arm/include/asm/cpufeature.h b/xen/arch/arm/include/asm/cpufeature.h
index c62cf6293f..513e5b9918 100644
--- a/xen/arch/arm/include/asm/cpufeature.h
+++ b/xen/arch/arm/include/asm/cpufeature.h
@@ -244,6 +244,12 @@ struct cpuinfo_arm {
 unsigned long tgranule_16K:4;
 unsigned long tgranule_64K:4;
 unsigned long tgranule_4K:4;
+#ifdef CONFIG_ARM_V8R
+unsigned long __res:16;
+unsigned long msa:4;
+unsigned long msa_frac:4;
+unsigned long __res0:8;
+#else
 unsigned long tgranule_16k_2:4;
 unsigned long tgranule_64k_2:4;
 unsigned long tgranule_4k_2:4;
@@ -251,6 +257,7 @@ struct cpuinfo_arm {
 unsigned long __res0:8;
 unsigned long fgt:4;
 unsigned long ecv:4;
+#endif
 
 /* MMFR1 */
 unsigned long hafdbs:4;
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index a430aca232..cd28a9091a 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -14,9 +14,27 @@
 /* Holds the bit size of IPAs in p2m tables.  */
 extern unsigned int p2m_ipa_bits;
 
+#define MAX_VMID_8_BIT  (1UL << 8)
+#define MAX_VMID_16_BIT (1UL << 16)
+
+#define INVALID_VMID 0 /* VMID 0 is reserved */
+
+#ifdef CONFIG_ARM_64
+extern unsigned int max_vmid;
+/* VMID is by default 8 bit width on AArch64 */
+#define MAX_VMID   max_vmid
+#else
+/* VMID is always 8 bit width on AArch32 */
+#define MAX_VMIDMAX_VMID_8_BIT
+#endif
+
+extern spinlock_t vmid_alloc_lock;
+extern unsigned long *vmid_mask;
+
 struct domain;
 
 extern void memory_type_changed(struct domain *);
+extern void p2m_vmid_allocator_init(void);
 
 /* Per-p2m-table state */
 struct p2m_domain {
diff --git a/xen/arch/arm/include/asm/processor.h b/xen/arch/arm/include/asm/processor.h
index 1dd81d7d52..d866421d88 100644
--- a/xen/arch/arm/include/asm/processor.h
+++ b/xen/arch/arm/include/asm/processor.h
@@ -388,6 +388,12 @@
 
 #define VTCR_RES1   (_AC(1,UL)<<31)
 
+#ifdef CONFIG_ARM_V8R
+#define VTCR_MSA_VMSA   (_AC(0x1,UL)<<31)
+#define VTCR_MSA_PMSA   ~(_AC(0x1,UL)<<31)
+#define NSA_SEL2~(_AC(0x1,UL)<<30)
+#endif
+
 /* HCPTR Hyp. Coprocessor Trap Register */
 #define HCPTR_TAM   ((_AC(1,U)<<30))
 #define HCPTR_TTA   ((_AC(1,U)<<20))/* Trap trace registers */
@@ -447,6 +453,13 @@
 #define MM64_VMID_16_BITS_SUPPORT   0x2
 #endif
 
+#ifdef CONFIG_ARM_V8R
+#define MM64_MSA_PMSA_SUPPORT   0xf
+#define MM64_MSA_FRAC_NONE_SUPPORT  0x0
+#define MM64_MSA_FRAC_PMSA_SUPPORT  0x1
+#define MM64_MSA_FRAC_VMSA_SUPPORT  0x2
+#endif
+
 #ifndef __ASSEMBLY__
 
 extern register_t __cpu_logical_map[];
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 42f51051e0..0d0063aa2e 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -4,6 +4,21 @@
 
 #include 
 #include 
+#include 
+
+#ifdef CONFIG_ARM_64
+unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
+#endif
+
+spinlock_t vmid_alloc_lock = SPIN_LOCK_UNLOCKED;
+
+/*
+ * VTTBR_EL2 VMID field is 8 or 16 bits. AArch64 may support 16-bit VMID.
+ * Using a bitmap here limits us to 256 or 65536 (for AArch64) 

[PATCH v2 29/40] xen/mpu: introduce mpu_memory_section_contains for address range check

2023-01-12 Thread Penny Zheng
We have already introduced the "mpu,xxx-memory-section" properties to
limit system/domain configuration, so we shall add checks to verify the
user's configuration.

We shall check that any guest boot module is within the boot module
section, including kernel modules (BOOTMOD_KERNEL), device tree
passthrough modules (BOOTMOD_GUEST_DTB), and ramdisk modules
(BOOTMOD_RAMDISK).

We shall also check that any guest RAM configured through
"xen,static-mem" is within the guest memory section.

The function mpu_memory_section_contains is introduced to do the above
checks.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/domain_build.c  |  4 
 xen/arch/arm/include/asm/setup.h |  2 ++
 xen/arch/arm/kernel.c| 18 ++
 xen/arch/arm/setup_mpu.c | 22 ++
 4 files changed, 46 insertions(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 829cea8de8..f48a3f679f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -546,6 +546,10 @@ static mfn_t __init acquire_static_memory_bank(struct domain *d,
d, *psize);
 return INVALID_MFN;
 }
+#ifdef CONFIG_HAS_MPU
+if ( !mpu_memory_section_contains(*pbase, *pbase + *psize, MSINFO_GUEST) )
+return INVALID_MFN;
+#endif
 
 smfn = maddr_to_mfn(*pbase);
 res = acquire_domstatic_pages(d, smfn, PFN_DOWN(*psize), 0);
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 61f24b5848..d4c1336597 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -209,6 +209,8 @@ extern struct mpuinfo mpuinfo;
 
 extern int process_mpuinfo(const void *fdt, int node, uint32_t address_cells,
uint32_t size_cells);
+extern bool mpu_memory_section_contains(paddr_t s, paddr_t e,
+enum mpu_section_info type);
 #endif /* CONFIG_HAS_MPU */
 
 #endif
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 0475d8fae7..ee7144ec13 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -467,6 +467,12 @@ int __init kernel_probe(struct kernel_info *info,
 mod = boot_module_find_by_addr_and_kind(
 BOOTMOD_KERNEL, kernel_addr);
 info->kernel_bootmodule = mod;
+#ifdef CONFIG_HAS_MPU
+if ( !mpu_memory_section_contains(mod->start,
+  mod->start + mod->size,
+  MSINFO_BOOTMODULE) )
+return -EINVAL;
+#endif
 }
 else if ( dt_device_is_compatible(node, "multiboot,ramdisk") )
 {
@@ -477,6 +483,12 @@ int __init kernel_probe(struct kernel_info *info,
 dt_get_range(, node, _addr, );
 info->initrd_bootmodule = boot_module_find_by_addr_and_kind(
 BOOTMOD_RAMDISK, initrd_addr);
+#ifdef CONFIG_HAS_MPU
+if ( !mpu_memory_section_contains(mod->start,
+  mod->start + mod->size,
+  MSINFO_BOOTMODULE) )
+return -EINVAL;
+#endif
 }
 else if ( dt_device_is_compatible(node, "multiboot,device-tree") )
 {
@@ -489,6 +501,12 @@ int __init kernel_probe(struct kernel_info *info,
 dt_get_range(, node, _addr, );
 info->dtb_bootmodule = boot_module_find_by_addr_and_kind(
 BOOTMOD_GUEST_DTB, dtb_addr);
+#ifdef CONFIG_HAS_MPU
+if ( !mpu_memory_section_contains(mod->start,
+  mod->start + mod->size,
+  MSINFO_BOOTMODULE) )
+return -EINVAL;
+#endif
 }
 else
 continue;
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index 160934bf86..f7d74ea604 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -130,6 +130,28 @@ void __init setup_mm(void)
 init_staticmem_pages();
 }
 
+bool __init mpu_memory_section_contains(paddr_t s, paddr_t e,
+enum mpu_section_info type)
+{
+unsigned int i = 0;
+
+for ( ; i < mpuinfo.sections[type].nr_banks; i++ )
+{
+paddr_t section_start = mpuinfo.sections[type].bank[i].start;
+paddr_t section_size = mpuinfo.sections[type].bank[i].size;
+paddr_t section_end = section_start + section_size;
+
+/* range inclusive */
+if ( s >= section_start && e <= section_end )
+return true;
+}
+
+printk(XENLOG_ERR
+   "mpu: invalid range configuration 0x%"PRIpaddr" - 0x%"PRIpaddr", and it shall be within %s\n",
+   s, e, mpu_section_info_str[type]);
+return false;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.25.1




[PATCH v2 31/40] xen/mpu: disable FIXMAP in MPU system

2023-01-12 Thread Penny Zheng
FIXMAP in an MMU system is used for special-purpose 4K mappings, like
mapping the early UART, or temporarily mapping source code for copying
(copy_from_paddr), etc. As there is no VMSA in an MPU system, we do not
support FIXMAP there.

We define !CONFIG_HAS_FIXMAP to provide empty stubs for MPU systems.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/Kconfig  |  3 ++-
 xen/arch/arm/include/asm/fixmap.h | 28 +---
 xen/common/Kconfig|  3 +++
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 9230c8b885..91491341c4 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -13,9 +13,10 @@ config ARM
def_bool y
select HAS_ALTERNATIVE if !ARM_V8R
select HAS_DEVICE_TREE
+   select HAS_FIXMAP if !ARM_V8R
select HAS_PASSTHROUGH
select HAS_PDX
-   select HAS_PMAP
+   select HAS_PMAP if !ARM_V8R
select IOMMU_FORCE_PT_SHARE
select HAS_VMAP if !ARM_V8R
 
diff --git a/xen/arch/arm/include/asm/fixmap.h b/xen/arch/arm/include/asm/fixmap.h
index d0c9a52c8c..f0f4eb57ac 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -4,9 +4,6 @@
 #ifndef __ASM_FIXMAP_H
 #define __ASM_FIXMAP_H
 
-#include 
-#include 
-
 /* Fixmap slots */
 #define FIXMAP_CONSOLE  0  /* The primary UART */
 #define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
@@ -22,6 +19,11 @@
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_HAS_FIXMAP
+
+#include 
+#include 
+
 /*
  * Direct access to xen_fixmap[] should only happen when {set,
  * clear}_fixmap() is unusable (e.g. where we would end up to
@@ -43,6 +45,26 @@ static inline unsigned int virt_to_fix(vaddr_t vaddr)
 return ((vaddr - FIXADDR_START) >> PAGE_SHIFT);
 }
 
+#else /* !CONFIG_HAS_FIXMAP */
+
+static inline void set_fixmap(unsigned int map, mfn_t mfn,
+  unsigned int attributes)
+{
+ASSERT_UNREACHABLE();
+}
+
+static inline void clear_fixmap(unsigned int map)
+{
+ASSERT_UNREACHABLE();
+}
+
+static inline unsigned int virt_to_fix(vaddr_t vaddr)
+{
+ASSERT_UNREACHABLE();
+return -EINVAL;
+}
+#endif /* !CONFIG_HAS_FIXMAP */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ASM_FIXMAP_H */
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index ba16366a4b..680dc6f59c 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -43,6 +43,9 @@ config HAS_EX_TABLE
 config HAS_FAST_MULTIPLY
bool
 
+config HAS_FIXMAP
+   bool
+
 config HAS_IOPORTS
bool
 
-- 
2.25.1




[PATCH v2 26/40] xen/mpu: destroy an existing entry in Xen MPU memory mapping table

2023-01-12 Thread Penny Zheng
This commit expands xen_mpumap_update/xen_mpumap_update_entry to include
destroying an existing entry.

We define a new helper "control_xen_mpumap_region_from_index" to enable/disable
an MPU region based on its index. If the index is within [0, 31], we can quickly
disable the MPU region through PRENR_EL2, which provides direct access to the
PRLAR_EL2.EN bits of EL2 MPU regions.
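
As a quick illustration of the PRENR_EL2 handling described above (a
plain C model of the enable bits only, not the actual system register
access), each of the first 32 regions is controlled by one bit:

#include <assert.h>
#include <stdint.h>

static uint64_t prenr_model; /* models the low 32 PRLAR_EL2.EN bits */

static void control_region_model(unsigned int index, int enable)
{
    if ( enable )
        prenr_model |= (1ULL << index);   /* set PRLAR_EL2.EN for #index */
    else
        prenr_model &= ~(1ULL << index);  /* clear PRLAR_EL2.EN for #index */
}

int main(void)
{
    control_region_model(5, 1);
    assert(prenr_model & (1ULL << 5));
    control_region_model(5, 0);
    assert(!(prenr_model & (1ULL << 5)));
    return 0;
}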

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/mpu.h | 20 ++
 xen/arch/arm/include/asm/arm64/sysregs.h |  3 +
 xen/arch/arm/mm_mpu.c| 77 ++--
 3 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
b/xen/arch/arm/include/asm/arm64/mpu.h
index 0044bbf05d..c1dea1c8e9 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -16,6 +16,8 @@
  */
 #define ARM_MAX_MPU_MEMORY_REGIONS 255
 
+#define MPU_PRENR_BITS32
+
 /* Access permission attributes. */
 /* Read/Write at EL2, No Access at EL1/EL0. */
 #define AP_RW_EL2 0x0
@@ -132,6 +134,24 @@ typedef struct {
 _pr->prlar.reg.en;  \
 })
 
+/*
+ * Access to get base address of MPU protection region(pr_t).
+ * The base address shall be zero extended.
+ */
+#define pr_get_base(pr) ({  \
+pr_t *_pr = pr; \
+(uint64_t)_pr->prbar.reg.base << MPU_REGION_SHIFT;  \
+})
+
+/*
+ * Access to get limit address of MPU protection region(pr_t).
+ * The limit address shall be concatenated with 0x3f.
+ */
+#define pr_get_limit(pr) ({\
+pr_t *_pr = pr;\
+(uint64_t)((_pr->prlar.reg.limit << MPU_REGION_SHIFT) | 0x3f); \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM64_MPU_H__ */
diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h 
b/xen/arch/arm/include/asm/arm64/sysregs.h
index aca9bca5b1..c46daf6f69 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -505,6 +505,9 @@
 /* MPU Type registers encode */
 #define MPUIR_EL2 S3_4_C0_C0_4
 
+/* MPU Protection Region Enable Register encode */
+#define PRENR_EL2 S3_4_C6_C1_1
+
 #endif
 
 /* Access to system registers */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index d2e19e836c..3a0d110b13 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -385,6 +385,45 @@ static int mpumap_contain_region(pr_t *mpu, uint64_t 
nr_regions,
 return MPUMAP_REGION_FAILED;
 }
 
+/* Disable or enable EL2 MPU memory region at index #index */
+static void control_mpu_region_from_index(uint64_t index, bool enable)
+{
+pr_t region;
+
+access_protection_region(true, &region, NULL, index);
+if ( (region_is_valid(&region) && enable) ||
+ (!region_is_valid(&region) && !enable) )
+{
+printk(XENLOG_WARNING
+   "mpu: MPU memory region[%lu] is already %s\n", index,
+   enable ? "enabled" : "disabled");
+return;
+}
+
+/*
+ * ARM64v8R provides PRENR_EL2 to have direct access to the
+ * PRLAR_EL2.EN bits of EL2 MPU regions from 0 to 31.
+ */
+if ( index < MPU_PRENR_BITS )
+{
+uint64_t orig, after;
+
+orig = READ_SYSREG(PRENR_EL2);
+if ( enable )
+/* Set respective bit */
+after = orig | (1UL << index);
+else
+/* Clear respective bit */
+after = orig & (~(1UL << index));
+WRITE_SYSREG(after, PRENR_EL2);
+}
+else
+{
+region.prlar.reg.en = enable ? 1 : 0;
+access_protection_region(false, NULL, (const pr_t *)&region, index);
+}
+}
+
 /*
  * Update an entry at the index @idx.
  * @base:  base address
@@ -449,6 +488,30 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t 
limit,
 if ( system_state <= SYS_STATE_active )
 update_boot_xen_mpumap_idx(idx);
 }
+else
+{
+/*
+ * Currently, we only support destroying a *WHOLE* MPU memory region;
+ * part-region removal is not supported, as in the worst case it
+ * would leave two fragments after the removal.
+ * Part-region removal will be introduced only when an actual usage
+ * comes.
+ */
+if ( rc == MPUMAP_REGION_INCLUSIVE )
+{
+region_printk("mpu: part-region removing is not supported\n");
+return -EINVAL;
+}
+
+/* We are removing the region */
+if ( rc != MPUMAP_REGION_FOUND )
+return -EINVAL;
+
+control_mpu_region_from_index(idx, false);
+
+/* Clear the corresponding MPU memory region entry. */
+memset(&xen_mpumap[idx], 0, sizeof(pr_t));
+}
 
 return 0;
 }
@@ -589,6 +652,15 @@ static void __init map_mpu_memory_section_on_boot(enum 
mpu_section_info type,
 }
 }
 
+int destroy_xen_mappings(unsigned long 

[PATCH v2 14/40] xen/arm64: head: Jump to the runtime mapping in enable_mm()

2023-01-12 Thread Penny Zheng
At the moment, on MMU systems, enable_mm() returns to an address in the
1:1 mapping, and each path is then responsible for switching to the
virtual runtime mapping, after which remove_identity_mapping() is called
to remove all 1:1 mappings.

Since remove_identity_mapping() is not necessary on MPU systems, and we
also want to avoid creating an empty function for them, we keep a single
code flow in arm64/head.S by moving the mapping switch and the call to
remove_identity_mapping() into enable_mm() on MMU systems.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/arm64/head.S | 28 +---
 xen/arch/arm/arm64/head_mmu.S | 33 ++---
 2 files changed, 43 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index a92883319d..6358305f03 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -258,20 +258,15 @@ real_start_efi:
  * and memory regions for MPU systems.
  */
 blprepare_early_mappings
+/*
+ * Address in the runtime mapping to jump to after the
+ * MMU/MPU is enabled
+ */
+ldr   lr, =primary_switched
 /* Turn on MMU or MPU */
-blenable_mm
+benable_mm
 
-/* We are still in the 1:1 mapping. Jump to the runtime Virtual 
Address. */
-ldr   x0, =primary_switched
-brx0
 primary_switched:
-/*
- * The 1:1 map may clash with other parts of the Xen virtual memory
- * layout. As it is not used anymore, remove it completely to
- * avoid having to worry about replacing existing mapping
- * afterwards.
- */
-blremove_identity_mapping
 blsetup_early_uart
 #ifdef CONFIG_EARLY_PRINTK
 /* Use a virtual address to access the UART. */
@@ -317,11 +312,14 @@ GLOBAL(init_secondary)
 blcheck_cpu_mode
 blcpu_init
 blprepare_early_mappings
-blenable_mm
 
-/* We are still in the 1:1 mapping. Jump to the runtime Virtual 
Address. */
-ldr   x0, =secondary_switched
-brx0
+/*
+ * Address in the runtime mapping to jump to after the
+ * MMU/MPU is enabled
+ */
+ldr   lr, =secondary_switched
+benable_mm
+
 secondary_switched:
 /*
  * Non-boot CPUs need to move on to the proper pagetables, which were
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index a19b7c873d..c9e83bbe2d 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -211,9 +211,11 @@ virtphys_clash:
 ENDPROC(prepare_early_mappings)
 
 /*
- * Turn on the Data Cache and the MMU. The function will return on the 1:1
- * mapping. In other word, the caller is responsible to switch to the runtime
- * mapping.
+ * Turn on the Data Cache and the MMU. The function will return
+ * to the virtual address provided in LR (e.g. the runtime mapping).
+ *
+ * Inputs:
+ * lr(x30): Virtual address to return to
  *
  * Clobbers x0 - x3
  */
@@ -238,6 +240,31 @@ ENTRY(enable_mm)
 dsb   sy /* Flush PTE writes and finish reads */
 msr   SCTLR_EL2, x0  /* now paging is enabled */
 isb  /* Now, flush the icache */
+
+/*
+ * The MMU is turned on and we are in the 1:1 mapping. Switch
+ * to the runtime mapping.
+ */
+ldr   x0, =1f
+brx0
+1:
+/*
+ * The 1:1 map may clash with other parts of the Xen virtual memory
+ * layout. As it is not used anymore, remove it completely to
+ * avoid having to worry about replacing existing mapping
+ * afterwards.
+ *
+ * On return this will jump to the virtual address requested by
+ * the caller
+ */
+b remove_identity_mapping
+
+/*
+ * This point should not be reached, as the "ret" in
+ * remove_identity_mapping will already have used the return
+ * address in LR. But keeping a ret here is safer in case that
+ * "ret" is ever removed.
+ */
 ret
 ENDPROC(enable_mm)
 
-- 
2.25.1




[PATCH v2 18/40] xen/mpu: introduce helper access_protection_region

2023-01-12 Thread Penny Zheng
Each EL2 MPU protection region can be configured using PRBAR_EL2 and
PRLAR_EL2.

This commit introduces a new helper access_protection_region() to access
an EL2 MPU protection region, covering both read and write operations.

As explained in section G1.3.18 of the Armv8-R AArch64 reference manual,
the system registers PRBAR<n>_EL2 and PRLAR<n>_EL2 provide access to the
EL2 MPU region determined by the value of 'n' and PRSELR_EL2.REGION, as
PRSELR_EL2.REGION<7:4>:n (n = 0, 1, 2, ..., 15).
For example to access regions from 16 to 31:
- Set PRSELR_EL2 to 0b1
- Region 16 configuration is accessible through PRBAR0_EL2 and PRLAR0_EL2
- Region 17 configuration is accessible through PRBAR1_EL2 and PRLAR1_EL2
- Region 18 configuration is accessible through PRBAR2_EL2 and PRLAR2_EL2
- ...
- Region 31 configuration is accessible through PRBAR15_EL2 and PRLAR15_EL2
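
A small worked example of the selector arithmetic above (illustrative
only): to reach region 23, PRSELR_EL2 is written with 23, and the
configuration is then accessed through PRBAR7_EL2/PRLAR7_EL2, since
23 & 0xf == 7 and PRSELR_EL2.REGION<7:4> == 23 >> 4 == 1:

#include <stdio.h>

int main(void)
{
    unsigned int sel = 23;
    printf("PRSELR_EL2.REGION<7:4> = %u, register pair index n = %u\n",
           sel >> 4, sel & 0xf);
    return 0;
}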

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/mm_mpu.c | 151 ++
 1 file changed, 151 insertions(+)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index c9e17ab6da..f2b494449c 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -46,6 +46,157 @@ uint64_t __ro_after_init next_transient_region_idx;
 /* Maximum number of supported MPU memory regions by the EL2 MPU. */
 uint64_t __ro_after_init max_xen_mpumap;
 
+/* Write a MPU protection region */
+#define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({   \
+uint64_t _sel = sel;\
+const pr_t *_pr = pr;   \
+asm volatile(   \
+"msr "__stringify(PRSELR_EL2)", %0;" /* Selects the region */   \
+"dsb sy;"   \
+"msr "__stringify(prbar_el2)", %1;" /* Write PRBAR_EL2 */\
+"msr "__stringify(prlar_el2)", %2;" /* Write PRLAR_EL2 */\
+"dsb sy;"   \
+: : "r" (_sel), "r" (_pr->prbar.bits), "r" (_pr->prlar.bits));  \
+})
+
+/* Read a MPU protection region */
+#define READ_PROTECTION_REGION(sel, prbar_el2, prlar_el2) ({\
+uint64_t _sel = sel;\
+pr_t _pr;   \
+asm volatile(   \
+"msr "__stringify(PRSELR_EL2)", %2;" /* Selects the region */   \
+"dsb sy;"   \
+"mrs %0, "__stringify(prbar_el2)";" /* Read PRBAR_EL2 */ \
+"mrs %1, "__stringify(prlar_el2)";" /* Read PRLAR_EL2 */ \
+"dsb sy;"   \
+: "=r" (_pr.prbar.bits), "=r" (_pr.prlar.bits) : "r" (_sel));   \
+_pr;\
+})
+
+/*
+ * Access MPU protection region, including both read/write operations.
+ * Armv8-R AArch64 at most supports 255 MPU protection regions.
+ * See section G1.3.18 of the reference manual for Armv8-R AArch64,
+ * PRBAR_EL2 and PRLAR_EL2 provide access to the EL2 MPU region
+ * determined by the value of 'n' and PRSELR_EL2.REGION as
+ * PRSELR_EL2.REGION<7:4>:n(n = 0, 1, 2, ... , 15)
+ * For example to access regions from 16 to 31 (0b1 to 0b1):
+ * - Set PRSELR_EL2 to 0b1
+ * - Region 16 configuration is accessible through PRBAR0_ELx and PRLAR0_ELx
+ * - Region 17 configuration is accessible through PRBAR1_ELx and PRLAR1_ELx
+ * - Region 18 configuration is accessible through PRBAR2_ELx and PRLAR2_ELx
+ * - ...
+ * - Region 31 configuration is accessible through PRBAR15_ELx and PRLAR15_ELx
+ *
+ * @read: if it is read operation.
+ * @pr_read: mpu protection region returned by read op.
+ * @pr_write: const mpu protection region passed through write op.
+ * @sel: mpu protection region selector
+ */
+static void access_protection_region(bool read, pr_t *pr_read,
+ const pr_t *pr_write, uint64_t sel)
+{
+switch ( sel & 0xf )
+{
+case 0:
+if ( read )
+*pr_read = READ_PROTECTION_REGION(sel, PRBAR0_EL2, PRLAR0_EL2);
+else
+WRITE_PROTECTION_REGION(sel, pr_write, PRBAR0_EL2, PRLAR0_EL2);
+break;
+case 1:
+if ( read )
+*pr_read = READ_PROTECTION_REGION(sel, PRBAR1_EL2, PRLAR1_EL2);
+else
+WRITE_PROTECTION_REGION(sel, pr_write, PRBAR1_EL2, PRLAR1_EL2);
+break;
+case 2:
+if ( read )
+*pr_read = READ_PROTECTION_REGION(sel, PRBAR2_EL2, PRLAR2_EL2);
+else
+WRITE_PROTECTION_REGION(sel, pr_write, PRBAR2_EL2, PRLAR2_EL2);
+break;
+case 3:
+if ( read )
+*pr_read = READ_PROTECTION_REGION(sel, PRBAR3_EL2, PRLAR3_EL2);
+else

[PATCH v2 23/40] xen/mpu: initialize frametable in MPU system

2023-01-12 Thread Penny Zheng
Xen uses the page as the smallest granularity for memory management,
and we want to follow the same concept in the MPU system.
That is, the page_info structure and the frametable used for storing
and managing page_info are also required in the MPU system.

In the MPU system, since there is no virtual address translation (VA == PA),
we cannot use a fixed VA (FRAMETABLE_VIRT_START) to map the frametable
like the MMU system does.
Instead, we define a variable "struct page_info *frame_table" as the
frametable pointer, and ask the boot allocator to allocate memory for it.

Once the frametable is successfully initialized, the conversions between
machine frame number/machine address/"virtual address" and the page-info
structure (mfn_to_page/maddr_to_page/virt_to_page, etc.) are ready too.
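
A minimal model of the MPU-system conversion chain (VA == PA); the flat
pdx computation below is a simplification of the real mfn_to_pdx
machinery, and all names are invented for the sketch:

#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

struct page_info_model { int dummy; };
static struct page_info_model frame_table_model[256];
static unsigned long frametable_base_pdx_model;

static struct page_info_model *virt_to_page_model(const void *v)
{
    /* VA == PA, so the "virtual" address is already a machine address. */
    unsigned long pdx = (uintptr_t)v >> PAGE_SHIFT;
    return frame_table_model + pdx - frametable_base_pdx_model;
}

int main(void)
{
    frametable_base_pdx_model = 0x80000000UL >> PAGE_SHIFT;
    assert(virt_to_page_model((void *)0x80001000UL) == &frame_table_model[1]);
    return 0;
}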

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/mm.h | 15 ---
 xen/arch/arm/include/asm/mm_mmu.h | 16 
 xen/arch/arm/include/asm/mm_mpu.h | 17 +
 xen/arch/arm/mm_mpu.c | 25 +
 4 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index e29158028a..7969ec9f98 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -176,7 +176,6 @@ struct page_info
 
 #define maddr_get_owner(ma)   (page_get_owner(maddr_to_page((ma
 
-#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
 /* PDX of the first page in the frame table. */
 extern unsigned long frametable_base_pdx;
 
@@ -280,20 +279,6 @@ static inline uint64_t gvirt_to_maddr(vaddr_t va, paddr_t 
*pa,
 #define virt_to_mfn(va) __virt_to_mfn(va)
 #define mfn_to_virt(mfn)__mfn_to_virt(mfn)
 
-/* Convert between Xen-heap virtual addresses and page-info structures. */
-static inline struct page_info *virt_to_page(const void *v)
-{
-unsigned long va = (unsigned long)v;
-unsigned long pdx;
-
-ASSERT(va >= XENHEAP_VIRT_START);
-ASSERT(va < directmap_virt_end);
-
-pdx = (va - XENHEAP_VIRT_START) >> PAGE_SHIFT;
-pdx += mfn_to_pdx(directmap_mfn_start);
-return frame_table + pdx - frametable_base_pdx;
-}
-
 static inline void *page_to_virt(const struct page_info *pg)
 {
 return mfn_to_virt(mfn_x(page_to_mfn(pg)));
diff --git a/xen/arch/arm/include/asm/mm_mmu.h 
b/xen/arch/arm/include/asm/mm_mmu.h
index 6d7e5ddde7..bc1b04c4c7 100644
--- a/xen/arch/arm/include/asm/mm_mmu.h
+++ b/xen/arch/arm/include/asm/mm_mmu.h
@@ -23,6 +23,8 @@ extern uint64_t init_ttbr;
 extern void setup_directmap_mappings(unsigned long base_mfn,
  unsigned long nr_mfns);
 
+#define frame_table ((struct page_info *)FRAMETABLE_VIRT_START)
+
 static inline paddr_t __virt_to_maddr(vaddr_t va)
 {
 uint64_t par = va_to_par(va);
@@ -49,6 +51,20 @@ static inline void *maddr_to_virt(paddr_t ma)
 }
 #endif
 
+/* Convert between Xen-heap virtual addresses and page-info structures. */
+static inline struct page_info *virt_to_page(const void *v)
+{
+unsigned long va = (unsigned long)v;
+unsigned long pdx;
+
+ASSERT(va >= XENHEAP_VIRT_START);
+ASSERT(va < directmap_virt_end);
+
+pdx = (va - XENHEAP_VIRT_START) >> PAGE_SHIFT;
+pdx += mfn_to_pdx(directmap_mfn_start);
+return frame_table + pdx - frametable_base_pdx;
+}
+
 #endif /* __ARCH_ARM_MM_MMU__ */
 
 /*
diff --git a/xen/arch/arm/include/asm/mm_mpu.h 
b/xen/arch/arm/include/asm/mm_mpu.h
index fe6a828a50..eebd5b5d35 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -9,6 +9,8 @@
  */
 extern void setup_static_mappings(void);
 
+extern struct page_info *frame_table;
+
 static inline paddr_t __virt_to_maddr(vaddr_t va)
 {
 /* In MPU system, VA == PA. */
@@ -22,6 +24,21 @@ static inline void *maddr_to_virt(paddr_t ma)
 return (void *)ma;
 }
 
+/* Convert a virtual address to a page-info structure. */
+static inline struct page_info *virt_to_page(const void *v)
+{
+unsigned long va = (unsigned long)v;
+unsigned long pdx;
+
+/*
+ * In MPU system, VA == PA, virt_to_maddr() outputs the
+ * exact input address.
+ */
+pdx = mfn_to_pdx(maddr_to_mfn(virt_to_maddr(va)));
+
+return frame_table + pdx - frametable_base_pdx;
+}
+
 #endif /* __ARCH_ARM_MM_MPU__ */
 
 /*
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index f057ee26df..7b282be4fb 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -69,6 +69,8 @@ static DEFINE_SPINLOCK(xen_mpumap_lock);
 
 static paddr_t dtb_paddr;
 
+struct page_info *frame_table;
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({   \
 uint64_t _sel = sel;\
@@ -564,6 +566,29 @@ void __init setup_static_mappings(void)
 /* TODO: guest memory section, device memory section, boot-module section, etc */
 }
 
+/* Map a frame table to cover 

[PATCH v2 34/40] xen/mpu: free init memory in MPU system

2023-01-12 Thread Penny Zheng
This commit implements free_init_memory in the MPU system, keeping the
same strategy as the MMU system.

In order to insert BRK instructions into the init code section, which
aims to provoke a fault on purpose, we first change the init code
section permission to RW.
Function modify_xen_mappings is introduced to modify the permission of
an existing valid MPU memory region.

Then we nuke the instruction cache to remove entries related to init
text.
Finally, we destroy the two MPU memory regions covering init text and
init data using destroy_xen_mappings.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/mm_mpu.c | 85 ++-
 1 file changed, 83 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 0b720004ee..de0c7d919a 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -20,6 +20,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -77,6 +78,8 @@ static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
 REGION_HYPERVISOR_BOOT,
 };
 
+extern char __init_data_begin[], __init_end[];
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({   \
 uint64_t _sel = sel;\
@@ -443,8 +446,41 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t 
limit,
 if ( rc == MPUMAP_REGION_OVERLAP )
 return -EINVAL;
 
+/* We are updating the permission. */
+if ( (flags & _REGION_PRESENT) && (rc == MPUMAP_REGION_FOUND ||
+   rc == MPUMAP_REGION_INCLUSIVE) )
+{
+
+/*
+ * Currently, we only support modifying a *WHOLE* MPU memory region;
+ * part-region modification is not supported, as in the worst case it
+ * would leave three fragments after the modification.
+ * Part-region modification will be introduced only when an actual
+ * usage comes.
+ */
+if ( rc == MPUMAP_REGION_INCLUSIVE )
+{
+region_printk("mpu: part-region modification is not supported\n");
+return -EINVAL;
+}
+
+/* We don't allow changing memory attributes. */
+if ( xen_mpumap[idx].prlar.reg.ai != REGION_AI_MASK(flags) )
+{
+region_printk("Modifying memory attributes is not allowed (0x%x -> 0x%x).\n",
+  xen_mpumap[idx].prlar.reg.ai, REGION_AI_MASK(flags));
+return -EINVAL;
+}
+
+/* Set new permission */
+xen_mpumap[idx].prbar.reg.ap = REGION_AP_MASK(flags);
+xen_mpumap[idx].prbar.reg.xn = REGION_XN_MASK(flags);
+
+access_protection_region(false, NULL, (const pr_t *)(&xen_mpumap[idx]),
+ idx);
+}
 /* We are inserting a mapping => Create new region. */
-if ( flags & _REGION_PRESENT )
+else if ( flags & _REGION_PRESENT )
 {
 if ( rc != MPUMAP_REGION_FAILED )
 return -EINVAL;
@@ -831,11 +867,56 @@ void mmu_init_secondary_cpu(void)
 
 int modify_xen_mappings(unsigned long s, unsigned long e, unsigned int flags)
 {
-return -ENOSYS;
+ASSERT(IS_ALIGNED(s, PAGE_SIZE));
+ASSERT(IS_ALIGNED(e, PAGE_SIZE));
+ASSERT(s <= e);
+return xen_mpumap_update(s, e, flags);
 }
 
 void free_init_memory(void)
 {
+/* Kernel init text section. */
+paddr_t init_text = virt_to_maddr(_sinittext);
+paddr_t init_text_end = round_pgup(virt_to_maddr(_einittext));
+/* Kernel init data. */
+paddr_t init_data = virt_to_maddr(__init_data_begin);
+paddr_t init_data_end = round_pgup(virt_to_maddr(__init_end));
+unsigned long init_section[4] = {(unsigned long)init_text,
+ (unsigned long)init_text_end,
+ (unsigned long)init_data,
+ (unsigned long)init_data_end};
+unsigned int nr_init = 2;
+uint32_t insn = AARCH64_BREAK_FAULT;
+unsigned int i = 0, j = 0;
+
+/* Change kernel init text section to RW. */
+modify_xen_mappings((unsigned long)init_text,
+(unsigned long)init_text_end, REGION_HYPERVISOR_RW);
+
+/*
+ * From now on, init will not be used for execution anymore,
+ * so nuke the instruction cache to remove entries related to init.
+ */
+invalidate_icache_local();
+
+/* Destroy two MPU memory regions referring init text and init data. */
+for ( ; i < nr_init; i++ )
+{
+uint32_t *p;
+unsigned int nr;
+int rc;
+unsigned int base = 2 * i;
+
+p = (uint32_t *)init_section[base];
+nr = (init_section[base + 1] - init_section[base]) / sizeof(uint32_t);
+
+for ( j = 0; j < nr ; j++ )
+*(p + j) = insn;
+
+rc = destroy_xen_mappings(init_section[base], init_section[base + 1]);
+if ( rc < 0 )
+panic("Unable to remove the init section (rc = %d)\n", 

[PATCH v2 19/40] xen/mpu: populate a new region in Xen MPU mapping table

2023-01-12 Thread Penny Zheng
The new helper xen_mpumap_update() is responsible for updating an entry
in the Xen MPU memory mapping table, including creating a new entry,
updating or destroying an existing one.

This commit only covers populating a new entry in the Xen MPU mapping
table (xen_mpumap). The others will be introduced in the following
commits.

In xen_mpumap_update_entry(), we first check that the requested address
range [base, limit) is not already mapped. Then we use pr_of_xenaddr()
to build up the MPU memory region structure (pr_t). Finally, we set the
memory attributes and permissions based on the variable @flags.

To summarize all region attributes in one variable @flags, the layout of
the flags is as follows:
[0:2] Memory attribute Index
[3:4] Execute Never
[5:6] Access Permission
[7]   Region Present
Also, we provide a set of definitions (REGION_HYPERVISOR_RW, etc.) that
combine the memory attribute and permission for common combinations.
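
A self-contained sketch of how the @flags word decomposes (the defines
are local copies of the layout above; MT_NORMAL == 0x7 is assumed here
for the attribute index):

#include <stdio.h>

#define _REGION_AI_BIT      0
#define _REGION_XN_BIT      3
#define _REGION_AP_BIT      5
#define _REGION_PRESENT_BIT 7
#define _REGION_XN          (2U << _REGION_XN_BIT)
#define _REGION_RO          (2U << _REGION_AP_BIT)
#define _REGION_PRESENT     (1U << _REGION_PRESENT_BIT)
#define MT_NORMAL           0x7U

int main(void)
{
    /* Equivalent of REGION_HYPERVISOR_RO: normal memory, present, XN, RO. */
    unsigned int flags = MT_NORMAL | _REGION_PRESENT | _REGION_XN | _REGION_RO;

    printf("ai=%u xn=%u ap=%u present=%u\n",
           (flags >> _REGION_AI_BIT) & 0x7U,
           (flags >> _REGION_XN_BIT) & 0x3U,
           (flags >> _REGION_AP_BIT) & 0x3U,
           (flags >> _REGION_PRESENT_BIT) & 0x1U);
    return 0;
}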

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/mpu.h |  72 +++
 xen/arch/arm/mm_mpu.c| 276 ++-
 2 files changed, 340 insertions(+), 8 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
b/xen/arch/arm/include/asm/arm64/mpu.h
index c945dd53db..fcde6ad0db 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -16,6 +16,61 @@
  */
 #define ARM_MAX_MPU_MEMORY_REGIONS 255
 
+/* Access permission attributes. */
+/* Read/Write at EL2, No Access at EL1/EL0. */
+#define AP_RW_EL2 0x0
+/* Read/Write at EL2/EL1/EL0 all levels. */
+#define AP_RW_ALL 0x1
+/* Read-only at EL2, No Access at EL1/EL0. */
+#define AP_RO_EL2 0x2
+/* Read-only at EL2/EL1/EL0 all levels. */
+#define AP_RO_ALL 0x3
+
+/*
+ * Execute never.
+ * Stage 1 EL2 translation regime.
+ * XN[1] determines whether execution of the instruction fetched from the MPU
+ * memory region is permitted.
+ * Stage 2 EL1/EL0 translation regime.
+ * XN[0] determines whether execution of the instruction fetched from the MPU
+ * memory region is permitted.
+ */
+#define XN_DISABLED0x0
+#define XN_P2M_ENABLED 0x1
+#define XN_ENABLED 0x2
+
+/*
+ * Layout of the flags used for updating Xen MPU region attributes
+ * [0:2] Memory attribute Index
+ * [3:4] Execute Never
+ * [5:6] Access Permission
+ * [7]   Region Present
+ */
+#define _REGION_AI_BIT0
+#define _REGION_XN_BIT3
+#define _REGION_AP_BIT5
+#define _REGION_PRESENT_BIT   7
+#define _REGION_XN(2U << _REGION_XN_BIT)
+#define _REGION_RO(2U << _REGION_AP_BIT)
+#define _REGION_PRESENT   (1U << _REGION_PRESENT_BIT)
+#define REGION_AI_MASK(x) (((x) >> _REGION_AI_BIT) & 0x7U)
+#define REGION_XN_MASK(x) (((x) >> _REGION_XN_BIT) & 0x3U)
+#define REGION_AP_MASK(x) (((x) >> _REGION_AP_BIT) & 0x3U)
+#define REGION_RO_MASK(x) (((x) >> _REGION_AP_BIT) & 0x2U)
+
+/*
+ * _REGION_NORMAL is convenience define. It is not meant to be used
+ * outside of this header.
+ */
+#define _REGION_NORMAL(MT_NORMAL|_REGION_PRESENT)
+
+#define REGION_HYPERVISOR_RW  (_REGION_NORMAL|_REGION_XN)
+#define REGION_HYPERVISOR_RO  (_REGION_NORMAL|_REGION_XN|_REGION_RO)
+
+#define REGION_HYPERVISOR REGION_HYPERVISOR_RW
+
+#define INVALID_REGION(~0UL)
+
 #ifndef __ASSEMBLY__
 
 /* Protection Region Base Address Register */
@@ -49,6 +104,23 @@ typedef struct {
 prlar_t prlar;
 } pr_t;
 
+/* Access to set base address of MPU protection region(pr_t). */
+#define pr_set_base(pr, paddr) ({   \
+pr_t *_pr = pr; \
+_pr->prbar.reg.base = (paddr >> MPU_REGION_SHIFT);  \
+})
+
+/* Access to set limit address of MPU protection region(pr_t). */
+#define pr_set_limit(pr, paddr) ({  \
+pr_t *_pr = pr; \
+_pr->prlar.reg.limit = (paddr >> MPU_REGION_SHIFT); \
+})
+
+#define region_is_valid(pr) ({  \
+pr_t *_pr = pr; \
+_pr->prlar.reg.en;  \
+})
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM64_MPU_H__ */
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index f2b494449c..08720a7c19 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -22,9 +22,23 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
+#ifdef NDEBUG
+static inline void
+__attribute__ ((__format__ (__printf__, 1, 2)))
+region_printk(const char *fmt, ...) {}
+#else
+#define region_printk(fmt, args...) \
+do  \
+{   \
+dprintk(XENLOG_ERR, fmt, ## args);  \
+WARN(); \
+} while (0)
+#endif
+
 /* Xen MPU memory region mapping table. */
 pr_t __aligned(PAGE_SIZE) 

[PATCH v2 13/40] xen/mpu: introduce unified function setup_early_uart to map early UART

2023-01-12 Thread Penny Zheng
In the MMU system, we map the UART in the fixmap (when earlyprintk is used).
However, in the MPU system, we map the UART with a transient MPU memory
region.

So we introduce a new unified function setup_early_uart to replace
the previous setup_fixmap.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/arm64/head.S   |  2 +-
 xen/arch/arm/arm64/head_mmu.S   |  4 +-
 xen/arch/arm/arm64/head_mpu.S   | 52 +
 xen/arch/arm/include/asm/early_printk.h |  1 +
 4 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 7f3f973468..a92883319d 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -272,7 +272,7 @@ primary_switched:
  * afterwards.
  */
 blremove_identity_mapping
-blsetup_fixmap
+blsetup_early_uart
 #ifdef CONFIG_EARLY_PRINTK
 /* Use a virtual address to access the UART. */
 ldr   x23, =EARLY_UART_VIRTUAL_ADDRESS
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index b59c40495f..a19b7c873d 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -312,7 +312,7 @@ ENDPROC(remove_identity_mapping)
  *
  * Clobbers x0 - x3
  */
-ENTRY(setup_fixmap)
+ENTRY(setup_early_uart)
 #ifdef CONFIG_EARLY_PRINTK
 /* Add UART to the fixmap table */
 ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
@@ -325,7 +325,7 @@ ENTRY(setup_fixmap)
 dsb   nshst
 
 ret
-ENDPROC(setup_fixmap)
+ENDPROC(setup_early_uart)
 
 /* Fail-stop */
 fail:   PRINT("- Boot failed -\r\n")
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
index e2ac69b0cc..72d1e0863d 100644
--- a/xen/arch/arm/arm64/head_mpu.S
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -18,8 +18,10 @@
 #define REGION_TEXT_PRBAR   0x38/* SH=11 AP=10 XN=00 */
 #define REGION_RO_PRBAR 0x3A/* SH=11 AP=10 XN=10 */
 #define REGION_DATA_PRBAR   0x32/* SH=11 AP=00 XN=10 */
+#define REGION_DEVICE_PRBAR 0x22/* SH=10 AP=00 XN=10 */
 
 #define REGION_NORMAL_PRLAR 0x0f/* NS=0 ATTR=111 EN=1 */
+#define REGION_DEVICE_PRLAR 0x09/* NS=0 ATTR=100 EN=1 */
 
 /*
  * Macro to round up the section address to be PAGE_SIZE aligned
@@ -334,6 +336,56 @@ ENTRY(enable_mm)
 ret
 ENDPROC(enable_mm)
 
+/*
+ * Map the early UART with a new transient MPU memory region.
+ *
+ * x27: region selector
+ * x28: prbar
+ * x29: prlar
+ *
+ * Clobbers x0 - x4
+ *
+ */
+ENTRY(setup_early_uart)
+#ifdef CONFIG_EARLY_PRINTK
+/* Stash LR, as write_pr will be called later like a nested function */
+mov   x3, lr
+
+/*
+ * MPU region for early UART is a transient region, since it will be
+ * replaced by specific device memory layout when FDT gets parsed.
+ */
+load_paddr x0, next_transient_region_idx
+ldr   x4, [x0]
+
+ldr   x28, =CONFIG_EARLY_UART_BASE_ADDRESS
+and   x28, x28, #MPU_REGION_MASK
+mov   x1, #REGION_DEVICE_PRBAR
+orr   x28, x28, x1
+
+ldr x29, =(CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE)
+roundup_section x29
+/* Limit address is inclusive */
+sub   x29, x29, #1
+and   x29, x29, #MPU_REGION_MASK
+mov   x2, #REGION_DEVICE_PRLAR
+orr   x29, x29, x2
+
+mov   x27, x4
+blwrite_pr
+
+/* Create a new entry in xen_mpumap for early UART */
+create_mpu_entry xen_mpumap, x4, x28, x29, x1, x2
+
+/* Update next_transient_region_idx */
+sub   x4, x4, #1
+str   x4, [x0]
+
+mov   lr, x3
+#endif
+ret
+ENDPROC(setup_early_uart)
+
 /*
  * Local variables:
  * mode: ASM
diff --git a/xen/arch/arm/include/asm/early_printk.h 
b/xen/arch/arm/include/asm/early_printk.h
index 44a230853f..d87623e6d5 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -22,6 +22,7 @@
  * for EARLY_UART_VIRTUAL_ADDRESS.
  */
 #define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
+#define EARLY_UART_SIZE0x1000
 
 #else
 
-- 
2.25.1




[PATCH v2 10/40] xen/arm: split MMU and MPU config files from config.h

2023-01-12 Thread Penny Zheng
From: Wei Chen 

Xen defines some global configuration macros for Arm in
config.h. We still want to use it for Armv8-R systems, but
it contains some address-related macros that are defined for
MMU systems. These macros will not be used by MPU systems,
and adding ifdefery with CONFIG_HAS_MPU to gate them would
result in messy and hard-to-read/maintain code.

So we keep some common definitions still in config.h, but
move virtual address related definitions to a new file -
config_mmu.h. And use a new file config_mpu.h to store
definitions for MPU systems. To avoid spreading #ifdef
everywhere, we keep the same definition names for MPU
systems, like XEN_VIRT_START and HYPERVISOR_VIRT_START,
but the definition contents are MPU specific.

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. Remove duplicated FIXMAP definitions from config_mmu.h
---
 xen/arch/arm/include/asm/config.h | 103 +++
 xen/arch/arm/include/asm/config_mmu.h | 112 ++
 xen/arch/arm/include/asm/config_mpu.h |  25 ++
 3 files changed, 147 insertions(+), 93 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/config_mmu.h
 create mode 100644 xen/arch/arm/include/asm/config_mpu.h

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 25a625ff08..86d8142959 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -48,6 +48,12 @@
 
 #define INVALID_VCPU_ID MAX_VIRT_CPUS
 
+/* Used for calculating PDX */
+#ifdef CONFIG_ARM_64
+#define FRAMETABLE_SIZEGB(32)
+#define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
+#endif
+
 #define __LINUX_ARM_ARCH__ 7
 #define CONFIG_AEABI
 
@@ -71,99 +77,10 @@
 #include 
 #include 
 
-/*
- * Common ARM32 and ARM64 layout:
- *   0  -   2M   Unmapped
- *   2M -   4M   Xen text, data, bss
- *   4M -   6M   Fixmap: special-purpose 4K mapping slots
- *   6M -  10M   Early boot mapping of FDT
- *   10M - 12M   Livepatch vmap (if compiled in)
- *
- * ARM32 layout:
- *   0  -  12M   <COMMON>
- *
- *  32M - 128M   Frametable: 24 bytes per page for 16GB of RAM
- * 256M -   1G   VMAP: ioremap and early_ioremap use this virtual address
- *space
- *
- *   1G -   2G   Xenheap: always-mapped memory
- *   2G -   4G   Domheap: on-demand-mapped
- *
- * ARM64 layout:
- * 0x - 0x007f (512GB, L0 slot [0])
- *   0  -  12M   <COMMON>
- *
- *   1G -   2G   VMAP: ioremap and early_ioremap
- *
- *  32G -  64G   Frametable: 24 bytes per page for 5.3TB of RAM
- *
- * 0x0080 - 0x7fff (127.5TB, L0 slots [1..255])
- *  Unused
- *
- * 0x8000 - 0x84ff (5TB, L0 slots [256..265])
- *  1:1 mapping of RAM
- *
- * 0x8500 - 0x (123TB, L0 slots [266..511])
- *  Unused
- */
-
-#define XEN_VIRT_START _AT(vaddr_t,0x0020)
-#define FIXMAP_ADDR(n)(_AT(vaddr_t,0x0040) + (n) * PAGE_SIZE)
-
-#define BOOT_FDT_VIRT_START_AT(vaddr_t,0x0060)
-#define BOOT_FDT_VIRT_SIZE _AT(vaddr_t, MB(4))
-
-#ifdef CONFIG_LIVEPATCH
-#define LIVEPATCH_VMAP_START   _AT(vaddr_t,0x00a0)
-#define LIVEPATCH_VMAP_SIZE_AT(vaddr_t, MB(2))
-#endif
-
-#define HYPERVISOR_VIRT_START  XEN_VIRT_START
-
-#ifdef CONFIG_ARM_32
-
-#define CONFIG_SEPARATE_XENHEAP 1
-
-#define FRAMETABLE_VIRT_START  _AT(vaddr_t,0x0200)
-#define FRAMETABLE_SIZEMB(128-32)
-#define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
-#define FRAMETABLE_VIRT_END(FRAMETABLE_VIRT_START + FRAMETABLE_SIZE - 1)
-
-#define VMAP_VIRT_START_AT(vaddr_t,0x1000)
-#define VMAP_VIRT_SIZE _AT(vaddr_t, GB(1) - MB(256))
-
-#define XENHEAP_VIRT_START _AT(vaddr_t,0x4000)
-#define XENHEAP_VIRT_SIZE  _AT(vaddr_t, GB(1))
-
-#define DOMHEAP_VIRT_START _AT(vaddr_t,0x8000)
-#define DOMHEAP_VIRT_SIZE  _AT(vaddr_t, GB(2))
-
-#define DOMHEAP_ENTRIES1024  /* 1024 2MB mapping slots */
-
-/* Number of domheap pagetable pages required at the second level (2MB 
mappings) */
-#define DOMHEAP_SECOND_PAGES (DOMHEAP_VIRT_SIZE >> FIRST_SHIFT)
-
-#else /* ARM_64 */
-
-#define SLOT0_ENTRY_BITS  39
-#define SLOT0(slot) (_AT(vaddr_t,slot) << SLOT0_ENTRY_BITS)
-#define SLOT0_ENTRY_SIZE  SLOT0(1)
-
-#define VMAP_VIRT_START  GB(1)
-#define VMAP_VIRT_SIZE   GB(1)
-
-#define FRAMETABLE_VIRT_START  GB(32)
-#define FRAMETABLE_SIZEGB(32)
-#define FRAMETABLE_NR  (FRAMETABLE_SIZE / sizeof(*frame_table))
-
-#define DIRECTMAP_VIRT_START   SLOT0(256)
-#define DIRECTMAP_SIZE (SLOT0_ENTRY_SIZE * (265-256))
-#define DIRECTMAP_VIRT_END (DIRECTMAP_VIRT_START + DIRECTMAP_SIZE - 1)
-
-#define XENHEAP_VIRT_START directmap_virt_start
-
-#define HYPERVISOR_VIRT_ENDDIRECTMAP_VIRT_END
-
+#ifdef CONFIG_HAS_MPU
+#include 
+#else
+#include 
 #endif
 
 #define NR_hypercalls 64
diff --git a/xen/arch/arm/include/asm/config_mmu.h 
b/xen/arch/arm/include/asm/config_mmu.h
new file mode 100644

[PATCH v2 32/40] xen/mpu: implement MPU version of ioremap_xxx

2023-01-12 Thread Penny Zheng
The ioremap_xxx functions are normally used to remap device address
ranges in the MMU system during device driver initialization.

However, in the MPU system, virtual translation is not supported; the
device memory layout is statically configured in the Device Tree and
mapped at a very early stage.
So here we only add a check to verify this assumption.

To tolerate the few cases where the function is called for a temporary
mapping, like ioremap_wc during kernel image loading, a region attribute
mismatch is treated as a warning rather than an error.
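
A rough model of the resulting ioremap behaviour (VA == PA, so
"remapping" only validates the existing static mapping and returns the
address itself; the check is stubbed out and all names are invented):

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

static bool attributes_match_model(paddr_t start, size_t len, unsigned int attr)
{
    (void)start; (void)len; (void)attr;
    return true; /* assume the static Device Tree layout covers the range */
}

static void *ioremap_attr_model(paddr_t start, size_t len, unsigned int attr)
{
    return attributes_match_model(start, len, attr)
           ? (void *)(uintptr_t)start : NULL; /* identity "mapping" */
}

int main(void)
{
    assert(ioremap_attr_model(0x9000000, 0x1000, 0) == (void *)0x9000000);
    return 0;
}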

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/mpu.h |  1 +
 xen/arch/arm/include/asm/mm.h| 16 -
 xen/arch/arm/include/asm/mm_mpu.h|  2 +
 xen/arch/arm/mm_mpu.c| 88 
 xen/include/xen/vmap.h   | 12 
 5 files changed, 106 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
b/xen/arch/arm/include/asm/arm64/mpu.h
index 8e8679bc82..b4e50a9a0e 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -82,6 +82,7 @@
 #define REGION_HYPERVISOR_BOOT(REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
 #define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
 #define REGION_HYPERVISOR_NOCACHE (_REGION_DEVICE|MT_DEVICE_nGnRE|_REGION_SWITCH)
+#define REGION_HYPERVISOR_WC  (_REGION_DEVICE|MT_NORMAL_NC)
 
 #define INVALID_REGION(~0UL)
 
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 7969ec9f98..fa44cfc50d 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -14,6 +14,10 @@
 # error "unknown ARM variant"
 #endif
 
+#if defined(CONFIG_HAS_MPU)
+# include 
+#endif
+
 /* Align Xen to a 2 MiB boundary. */
 #define XEN_PADDR_ALIGN (1 << 21)
 
@@ -198,19 +202,25 @@ extern void setup_frametable_mappings(paddr_t ps, paddr_t 
pe);
 /* map a physical range in virtual memory */
 void __iomem *ioremap_attr(paddr_t start, size_t len, unsigned int attributes);
 
+#ifndef CONFIG_HAS_MPU
+#define DEFINE_ATTRIBUTE(var)   (PAGE_##var)
+#else
+#define DEFINE_ATTRIBUTE(var)   (REGION_##var)
+#endif
+
 static inline void __iomem *ioremap_nocache(paddr_t start, size_t len)
 {
-return ioremap_attr(start, len, PAGE_HYPERVISOR_NOCACHE);
+return ioremap_attr(start, len, DEFINE_ATTRIBUTE(HYPERVISOR_NOCACHE));
 }
 
 static inline void __iomem *ioremap_cache(paddr_t start, size_t len)
 {
-return ioremap_attr(start, len, PAGE_HYPERVISOR);
+return ioremap_attr(start, len, DEFINE_ATTRIBUTE(HYPERVISOR));
 }
 
 static inline void __iomem *ioremap_wc(paddr_t start, size_t len)
 {
-return ioremap_attr(start, len, PAGE_HYPERVISOR_WC);
+return ioremap_attr(start, len, DEFINE_ATTRIBUTE(HYPERVISOR_WC));
 }
 
 /* XXX -- account for base */
diff --git a/xen/arch/arm/include/asm/mm_mpu.h 
b/xen/arch/arm/include/asm/mm_mpu.h
index eebd5b5d35..5aa61c43b6 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -2,6 +2,8 @@
 #ifndef __ARCH_ARM_MM_MPU__
 #define __ARCH_ARM_MM_MPU__
 
+#include 
+
 #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
 /*
  * Function setup_static_mappings() sets up MPU memory region mapping
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index ea64aa38e4..7b54c87acf 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -712,32 +712,100 @@ void __init setup_frametable_mappings(paddr_t ps, 
paddr_t pe)
frametable_size - (nr_pdxs * sizeof(struct page_info)));
 }
 
-/* TODO: Implementation on the first usage */
-void dump_hyp_walk(vaddr_t addr)
+static bool region_attribute_match(pr_t *region, unsigned int attributes)
 {
+if ( region->prbar.reg.ap != REGION_AP_MASK(attributes) )
+{
+printk(XENLOG_ERR "region permission is not matched (0x%x -> 0x%x)\n",
+   region->prbar.reg.ap, REGION_AP_MASK(attributes));
+return false;
+}
+
+if ( region->prbar.reg.xn != REGION_XN_MASK(attributes) )
+{
+printk(XENLOG_ERR "region execution permission is not matched (0x%x -> 
0x%x)\n",
+   region->prbar.reg.xn, REGION_XN_MASK(attributes));
+return false;
+}
+
+if ( region->prlar.reg.ai != REGION_AI_MASK(attributes) )
+{
+printk(XENLOG_ERR "region memory attributes is not matched (0x%x -> 
0x%x)\n",
+   region->prlar.reg.ai, REGION_AI_MASK(attributes));
+return false;
+}
+
+return true;
 }
 
-void __init remove_early_mappings(void)
+static bool check_region_and_attributes(paddr_t pa, size_t len,
+unsigned int attributes,
+const char *prefix)
+{
+pr_t *region;
+int rc;
+uint64_t idx;
+
+rc = mpumap_contain_region(xen_mpumap, max_xen_mpumap, pa, pa + len - 1,
+   );
+if ( rc != 

[PATCH v2 37/40] xen/mpu: move MMU specific P2M code to p2m_mmu.c

2023-01-12 Thread Penny Zheng
The current P2M implementation is designed for the MMU system. Only a
small part of the code can be shared with the MPU system, like the P2M
pool, IPA handling, etc.
We move the MMU-specific code into p2m_mmu.c, place stub functions in
p2m_mpu.c that await implementation on first use, and keep the generic
code in p2m.c.

We also move MMU-specific definitions to p2m_mmu.h, like P2M_ROOT_LEVEL
and the function p2m_tlb_flush_sync.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/Makefile  |5 +
 xen/arch/arm/include/asm/p2m.h |   17 +-
 xen/arch/arm/include/asm/p2m_mmu.h |   28 +
 xen/arch/arm/p2m.c | 2276 +--
 xen/arch/arm/p2m_mmu.c | 2295 
 xen/arch/arm/p2m_mpu.c |  191 +++
 6 files changed, 2528 insertions(+), 2284 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/p2m_mmu.h
 create mode 100644 xen/arch/arm/p2m_mmu.c
 create mode 100644 xen/arch/arm/p2m_mpu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index c949661590..ea650db52b 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -44,6 +44,11 @@ obj-y += mm_mpu.o
 endif
 obj-y += monitor.o
 obj-y += p2m.o
+ifneq ($(CONFIG_HAS_MPU), y)
+obj-y += p2m_mmu.o
+else
+obj-y += p2m_mpu.o
+endif
 obj-y += percpu.o
 obj-y += platform.o
 obj-y += platform_hypercall.o
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index 91df922e1c..a430aca232 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -14,17 +14,6 @@
 /* Holds the bit size of IPAs in p2m tables.  */
 extern unsigned int p2m_ipa_bits;
 
-#ifdef CONFIG_ARM_64
-extern unsigned int p2m_root_order;
-extern unsigned int p2m_root_level;
-#define P2M_ROOT_ORDERp2m_root_order
-#define P2M_ROOT_LEVEL p2m_root_level
-#else
-/* First level P2M is always 2 consecutive pages */
-#define P2M_ROOT_ORDER1
-#define P2M_ROOT_LEVEL 1
-#endif
-
 struct domain;
 
 extern void memory_type_changed(struct domain *);
@@ -162,6 +151,10 @@ typedef enum {
 #endif
 #include 
 
+#ifndef CONFIG_HAS_MPU
+#include 
+#endif
+
 static inline bool arch_acquire_resource_check(struct domain *d)
 {
 /*
@@ -252,8 +245,6 @@ static inline int p2m_is_write_locked(struct p2m_domain 
*p2m)
 return rw_is_write_locked(>lock);
 }
 
-void p2m_tlb_flush_sync(struct p2m_domain *p2m);
-
 /* Look up the MFN corresponding to a domain's GFN. */
 mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t);
 
diff --git a/xen/arch/arm/include/asm/p2m_mmu.h 
b/xen/arch/arm/include/asm/p2m_mmu.h
new file mode 100644
index 00..a0f2440336
--- /dev/null
+++ b/xen/arch/arm/include/asm/p2m_mmu.h
@@ -0,0 +1,28 @@
+#ifndef _XEN_P2M_MMU_H
+#define _XEN_P2M_MMU_H
+
+#ifdef CONFIG_ARM_64
+extern unsigned int p2m_root_order;
+extern unsigned int p2m_root_level;
+#define P2M_ROOT_ORDERp2m_root_order
+#define P2M_ROOT_LEVEL p2m_root_level
+#else
+/* First level P2M is always 2 consecutive pages */
+#define P2M_ROOT_ORDER1
+#define P2M_ROOT_LEVEL 1
+#endif
+
+struct p2m_domain;
+
+void p2m_tlb_flush_sync(struct p2m_domain *p2m);
+
+#endif /* _XEN_P2M_MMU_H */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 948f199d84..42f51051e0 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,36 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-#include 
-#include 
-#include 
-#include 
 #include 
 #include 
-#include 
 
-#include 
 #include 
-#include 
-#include 
 #include 
-#include 
-
-#define MAX_VMID_8_BIT  (1UL << 8)
-#define MAX_VMID_16_BIT (1UL << 16)
-
-#define INVALID_VMID 0 /* VMID 0 is reserved */
-
-#ifdef CONFIG_ARM_64
-unsigned int __read_mostly p2m_root_order;
-unsigned int __read_mostly p2m_root_level;
-static unsigned int __read_mostly max_vmid = MAX_VMID_8_BIT;
-/* VMID is by default 8 bit width on AArch64 */
-#define MAX_VMID   max_vmid
-#else
-/* VMID is always 8 bit width on AArch32 */
-#define MAX_VMIDMAX_VMID_8_BIT
-#endif
-
-#define P2M_ROOT_PAGES(1 << P2M_ROOT_ORDER)
-pg = page_list_remove_head(&d->arch.paging.p2m_freelist);
-spin_unlock(&d->arch.paging.lock);
-}
-
-return pg;
-}
-
-static void p2m_free_page(struct domain *d, struct page_info *pg)
-{
-if ( is_hardware_domain(d) )
-free_domheap_page(pg);
-else
-{
-spin_lock(&d->arch.paging.lock);
-page_list_add_tail(pg, &d->arch.paging.p2m_freelist);
-spin_unlock(&d->arch.paging.lock);
-}
-}
-
 /* Return the size of the pool, in bytes. */
 int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
 {
@@ -186,441 +115,10 @@ int p2m_teardown_allocation(struct domain *d)
 return ret;
 }
 
-/* Unlock the flush and do a P2M TLB flush if necessary */
-void p2m_write_unlock(struct p2m_domain *p2m)
-{
-/*
- * The final flush is done with the P2M write lock taken 

[PATCH v2 22/40] xen/mpu: implement MPU version of setup_mm in setup_mpu.c

2023-01-12 Thread Penny Zheng
In the MPU system, system RAM shall be statically partitioned into
different functional sections in the Device Tree at the very beginning,
including the static xenheap, guest memory section, boot-module section,
etc. So using a virtually contiguous memory region to direct-map the
whole system RAM is not applicable in the MPU system.

Function setup_static_mappings is introduced to set up the MPU memory
region mapping section by section, based on the static configuration in
the Device Tree.
This commit is only responsible for the static xenheap mapping, which is
implemented in setup_staticheap_mappings. All the other static
memory section mappings will be introduced later.
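
The heap bank is page-aligned before being mapped (start rounded up,
size rounded down); a standalone illustration of that rounding, with the
macros redefined locally:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 4096UL
#define round_pgup(p)   (((p) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define round_pgdown(p) ((p) & ~(PAGE_SIZE - 1))

int main(void)
{
    uint64_t start = 0x80000123ULL, size = 0x10000fffULL;

    printf("start 0x%llx -> 0x%llx, size 0x%llx -> 0x%llx\n",
           (unsigned long long)start, (unsigned long long)round_pgup(start),
           (unsigned long long)size, (unsigned long long)round_pgdown(size));
    return 0;
}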

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/Makefile |  2 +
 xen/arch/arm/include/asm/mm_mpu.h |  5 +++
 xen/arch/arm/mm_mpu.c | 41 ++
 xen/arch/arm/setup_mpu.c  | 70 +++
 4 files changed, 118 insertions(+)
 create mode 100644 xen/arch/arm/setup_mpu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index adeb17b7ab..23dfbc 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -53,6 +53,8 @@ obj-y += psci.o
 obj-y += setup.o
 ifneq ($(CONFIG_HAS_MPU), y)
 obj-y += setup_mmu.o
+else
+obj-y += setup_mpu.o
 endif
 obj-y += shutdown.o
 obj-y += smp.o
diff --git a/xen/arch/arm/include/asm/mm_mpu.h 
b/xen/arch/arm/include/asm/mm_mpu.h
index 3a4b07f187..fe6a828a50 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -3,6 +3,11 @@
 #define __ARCH_ARM_MM_MPU__
 
 #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
+/*
+ * Function setup_static_mappings() sets up MPU memory region mapping
+ * section by section based on static configuration in Device Tree.
+ */
+extern void setup_static_mappings(void);
 
 static inline paddr_t __virt_to_maddr(vaddr_t va)
 {
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index b34dbf4515..f057ee26df 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -523,6 +523,47 @@ void * __init early_fdt_map(paddr_t fdt_paddr)
 return fdt_virt;
 }
 
+/*
+ * Heap must be statically configured in Device Tree through
+ * "xen,static-heap" in MPU system.
+ */
+static void __init setup_staticheap_mappings(void)
+{
+unsigned int bank = 0;
+
+for ( ; bank < bootinfo.reserved_mem.nr_banks; bank++ )
+{
+if ( bootinfo.reserved_mem.bank[bank].type == MEMBANK_STATIC_HEAP )
+{
+paddr_t bank_start = round_pgup(
+ bootinfo.reserved_mem.bank[bank].start);
+paddr_t bank_size = round_pgdown(
+bootinfo.reserved_mem.bank[bank].size);
+
+/* Map static heap with fixed MPU memory region */
+
+if ( map_pages_to_xen(bank_start, maddr_to_mfn(bank_start),
+  bank_size >> PAGE_SHIFT,
+  REGION_HYPERVISOR) )
+panic("mpu: failed to map static heap\n");
+}
+}
+}
+
+/*
+ * System RAM is statically partitioned into different functionality
+ * section in Device Tree, including static xenheap, guest memory
+ * section, boot-module section, etc.
+ * Function setup_static_mappings sets up MPU memory region mapping
+ * section by section.
+ */
+void __init setup_static_mappings(void)
+{
+setup_staticheap_mappings();
+
+/* TODO: guest memory section, device memory section, boot-module section, etc */
+}
+
 /* TODO: Implementation on the first usage */
 void dump_hyp_walk(vaddr_t addr)
 {
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
new file mode 100644
index 00..ca0d8237d5
--- /dev/null
+++ b/xen/arch/arm/setup_mpu.c
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * xen/arch/arm/setup_mpu.c
+ *
+ * Early bringup code for an Armv8-R with virt extensions.
+ *
+ * Copyright (C) 2022 Arm Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+void __init setup_mm(void)
+{
+paddr_t ram_start = ~0, ram_end = 0, ram_size = 0;
+unsigned int bank;
+
+if ( !bootinfo.mem.nr_banks )
+panic("No memory bank\n");
+
+init_pdx();
+
+populate_boot_allocator();
+
+total_pages = 0;
+for ( bank = 0 ; bank < bootinfo.mem.nr_banks; bank++ )
+{
+paddr_t 

[PATCH v2 27/40] xen/mpu: map device memory resource in MPU system

2023-01-12 Thread Penny Zheng
In the MPU system, we cannot afford to map a new MPU memory region for
each new device; it would exhaust the limited MPU memory regions
very quickly.

So we introduce `mpu,device-memory-section` for users to statically
configure the whole system device memory in the Device Tree with the
least number of memory regions. This section shall cover all devices
that will be used in Xen, like the `UART`, `GIC`, etc.

Before we map `mpu,device-memory-section` with device memory attributes
and permissions (REGION_HYPERVISOR_NOCACHE), we shall destroy the mapping
for the early UART that was set up in the boot-time assembly, to avoid
MPU memory region overlap.

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/mpu.h |  6 --
 xen/arch/arm/include/asm/setup.h |  1 +
 xen/arch/arm/mm_mpu.c| 14 +-
 xen/arch/arm/setup_mpu.c |  5 +
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
b/xen/arch/arm/include/asm/arm64/mpu.h
index c1dea1c8e9..8e8679bc82 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -69,10 +69,11 @@
 #define REGION_TRANSIENT_MASK(x)  (((x) >> _REGION_TRANSIENT_BIT) & 0x3U)
 
 /*
- * _REGION_NORMAL is convenience define. It is not meant to be used
- * outside of this header.
+ * _REGION_NORMAL and _REGION_DEVICE are convenience defines. They are not
+ * meant to be used outside of this header.
  */
 #define _REGION_NORMAL(MT_NORMAL|_REGION_PRESENT)
+#define _REGION_DEVICE(_REGION_XN|_REGION_PRESENT)
 
 #define REGION_HYPERVISOR_RW  (_REGION_NORMAL|_REGION_XN)
 #define REGION_HYPERVISOR_RO  (_REGION_NORMAL|_REGION_XN|_REGION_RO)
@@ -80,6 +81,7 @@
 #define REGION_HYPERVISOR REGION_HYPERVISOR_RW
 #define REGION_HYPERVISOR_BOOT(REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
 #define REGION_HYPERVISOR_SWITCH  (REGION_HYPERVISOR_RW|_REGION_SWITCH)
+#define REGION_HYPERVISOR_NOCACHE (_REGION_DEVICE|MT_DEVICE_nGnRE|_REGION_SWITCH)
 
 #define INVALID_REGION(~0UL)
 
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 3581f8f990..b7a2225c25 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -194,6 +194,7 @@ struct init_info
 /* Index of MPU memory section */
 enum mpu_section_info {
 MSINFO_GUEST,
+MSINFO_DEVICE,
 MSINFO_MAX
 };
 
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 3a0d110b13..1566ba60af 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -73,6 +73,7 @@ struct page_info *frame_table;
 
 static const unsigned int mpu_section_mattr[MSINFO_MAX] = {
 REGION_HYPERVISOR_SWITCH,
+REGION_HYPERVISOR_NOCACHE,
 };
 
 /* Write a MPU protection region */
@@ -673,8 +674,19 @@ void __init setup_static_mappings(void)
 setup_staticheap_mappings();
 
 for ( uint8_t i = MSINFO_GUEST; i < MSINFO_MAX; i++ )
+{
+#ifdef CONFIG_EARLY_PRINTK
+if ( i == MSINFO_DEVICE )
+/*
+ * Destroy early UART mapping before mapping device memory section.
+ * WARNING: the console will be temporarily inaccessible.
+ */
+destroy_xen_mappings(CONFIG_EARLY_UART_BASE_ADDRESS,
+ CONFIG_EARLY_UART_BASE_ADDRESS + EARLY_UART_SIZE);
+#endif
 map_mpu_memory_section_on_boot(i, mpu_section_mattr[i]);
-/* TODO: device memory section, boot-module section, etc */
+}
+/* TODO: boot-module section, etc */
 }
 
 /* Map a frame table to cover physical addresses ps through pe */
diff --git a/xen/arch/arm/setup_mpu.c b/xen/arch/arm/setup_mpu.c
index 09a38a34a4..ec05542f68 100644
--- a/xen/arch/arm/setup_mpu.c
+++ b/xen/arch/arm/setup_mpu.c
@@ -29,6 +29,7 @@
 
 const char *mpu_section_info_str[MSINFO_MAX] = {
 "mpu,guest-memory-section",
+"mpu,device-memory-section",
 };
 
 /*
@@ -47,6 +48,10 @@ struct mpuinfo __initdata mpuinfo;
  * through "xen,static-mem" property in MPU system. "mpu,guest-memory-section"
  * limits the scattering of "xen,static-mem", as users could not define
  * a "xen,static-mem" outside "mpu,guest-memory-section".
+ *
+ * "mpu,device-memory-section": this section draws the device memory layout
+ * with the least number of memory regions for all devices in system that will
+ * be used in Xen, like `UART`, `GIC`, etc.
  */
 static int __init process_mpu_memory_section(const void *fdt, int node,
  const char *name, void *data,
-- 
2.25.1




[PATCH v2 15/40] xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h

2023-01-12 Thread Penny Zheng
From: Wei Chen 

To make the code readable and maintainable, we move MMU-specific
memory management code from mm.c to mm_mmu.c and move MMU-specific
definitions from mm.h to mm_mmu.h.
Later we will create mm_mpu.h and mm_mpu.c for MPU-specific memory
management code.
This will avoid lots of #ifdef in memory management code and header files.

Signed-off-by: Wei Chen 
Signed-off-by: Penny Zheng 
---
 xen/arch/arm/Makefile |5 +
 xen/arch/arm/include/asm/mm.h |   19 +-
 xen/arch/arm/include/asm/mm_mmu.h |   35 +
 xen/arch/arm/mm.c | 1352 +---
 xen/arch/arm/mm_mmu.c | 1376 +
 xen/arch/arm/mm_mpu.c |   67 ++
 6 files changed, 1488 insertions(+), 1366 deletions(-)
 create mode 100644 xen/arch/arm/include/asm/mm_mmu.h
 create mode 100644 xen/arch/arm/mm_mmu.c

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 4d076b278b..21188b207f 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -37,6 +37,11 @@ obj-y += kernel.init.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += mem_access.o
 obj-y += mm.o
+ifneq ($(CONFIG_HAS_MPU), y)
+obj-y += mm_mmu.o
+else
+obj-y += mm_mpu.o
+endif
 obj-y += monitor.o
 obj-y += p2m.o
 obj-y += percpu.o
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 68adcac9fa..1b9fdb6ff5 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -154,13 +154,6 @@ struct page_info
 #define _PGC_need_scrub   _PGC_allocated
 #define PGC_need_scrubPGC_allocated
 
-extern mfn_t directmap_mfn_start, directmap_mfn_end;
-extern vaddr_t directmap_virt_end;
-#ifdef CONFIG_ARM_64
-extern vaddr_t directmap_virt_start;
-extern unsigned long directmap_base_pdx;
-#endif
-
 #ifdef CONFIG_ARM_32
 #define is_xen_heap_page(page) is_xen_heap_mfn(page_to_mfn(page))
 #define is_xen_heap_mfn(mfn) ({ \
@@ -192,8 +185,6 @@ extern unsigned long total_pages;
 
 #define PDX_GROUP_SHIFT SECOND_SHIFT
 
-/* Boot-time pagetable setup */
-extern void setup_pagetables(unsigned long boot_phys_offset);
 /* Map FDT in boot pagetable */
 extern void *early_fdt_map(paddr_t fdt_paddr);
 /* Remove early mappings */
@@ -203,12 +194,6 @@ extern void remove_early_mappings(void);
 extern int init_secondary_pagetables(int cpu);
 /* Switch secondary CPUS to its own pagetables and finalise MMU setup */
 extern void mmu_init_secondary_cpu(void);
-/*
- * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
- * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
- * For Arm64, map the region in the directmap area.
- */
-extern void setup_directmap_mappings(unsigned long base_mfn, unsigned long 
nr_mfns);
 /* Map a frame table to cover physical addresses ps through pe */
 extern void setup_frametable_mappings(paddr_t ps, paddr_t pe);
 /* map a physical range in virtual memory */
@@ -256,6 +241,10 @@ static inline void __iomem *ioremap_wc(paddr_t start, 
size_t len)
 #define vmap_to_mfn(va) maddr_to_mfn(virt_to_maddr((vaddr_t)va))
 #define vmap_to_page(va)mfn_to_page(vmap_to_mfn(va))
 
+#ifndef CONFIG_HAS_MPU
+#include 
+#endif
+
 /* Page-align address and convert to frame number format */
 #define paddr_to_pfn_aligned(paddr)paddr_to_pfn(PAGE_ALIGN(paddr))
 
diff --git a/xen/arch/arm/include/asm/mm_mmu.h 
b/xen/arch/arm/include/asm/mm_mmu.h
new file mode 100644
index 00..a5e63d8af8
--- /dev/null
+++ b/xen/arch/arm/include/asm/mm_mmu.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARCH_ARM_MM_MMU__
+#define __ARCH_ARM_MM_MMU__
+
+extern mfn_t directmap_mfn_start, directmap_mfn_end;
+extern vaddr_t directmap_virt_end;
+#ifdef CONFIG_ARM_64
+extern vaddr_t directmap_virt_start;
+extern unsigned long directmap_base_pdx;
+#endif
+
+/* Boot-time pagetable setup */
+extern void setup_pagetables(unsigned long boot_phys_offset);
+#define setup_mm_mappings(boot_phys_offset) setup_pagetables(boot_phys_offset)
+
+/* Non-boot CPUs use this to find the correct pagetables. */
+extern uint64_t init_ttbr;
+/*
+ * For Arm32, set up the direct-mapped xenheap: up to 1GB of contiguous,
+ * always-mapped memory. Base must be 32MB aligned and size a multiple of 32MB.
+ * For Arm64, map the region in the directmap area.
+ */
+extern void setup_directmap_mappings(unsigned long base_mfn,
+ unsigned long nr_mfns);
+
+#endif /* __ARCH_ARM_MM_MMU__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 8f15814c5e..e1ce2a62dc 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -2,371 +2,24 @@
 /*
  * xen/arch/arm/mm.c
  *
- * MMU code for an ARMv7-A with virt extensions.
+ * Memory management common code for MMU and MPU system.
  *
  * Tim Deegan 
  * Copyright (c) 2011 Citrix Systems.
  */
 
 

[PATCH v2 40/40] xen/mpu: add Kconfig option to enable Armv8-R AArch64 support

2023-01-12 Thread Penny Zheng
Introduce a Kconfig option to enable Armv8-R64 architecture
support. STATIC_MEMORY and HAS_MPU will be selected by
ARM_V8R by default, because Armv8-R64 only has PMSAv8-64 at secure EL2
and only supports statically configured systems.

Signed-off-by: Wei Chen 
---
 xen/arch/arm/Kconfig | 13 +
 1 file changed, 13 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ee942a33bc..dc93b805a6 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -9,6 +9,15 @@ config ARM_64
select 64BIT
select HAS_FAST_MULTIPLY
 
+config ARM_V8R
+   bool "ARMv8-R AArch64 architecture support (UNSUPPORTED)" if UNSUPPORTED
+   default n
+   select STATIC_MEMORY
+   depends on ARM_64
+   help
+ This option enables the Armv8-R profile for Arm64. Enabling this option
+ results in selecting the MPU.
+
 config ARM
def_bool y
select HAS_ALTERNATIVE if !ARM_V8R
@@ -68,6 +77,10 @@ config HAS_ITS
 bool "GICv3 ITS MSI controller support (UNSUPPORTED)" if UNSUPPORTED
 depends on GICV3 && !NEW_VGIC && !ARM_32
 
+config HAS_MPU
+   bool "Protected Memory System Architecture"
+   depends on ARM_V8R
+
 config HVM
 def_bool y
 
-- 
2.25.1




[PATCH v2 17/40] xen/mpu: plumb virt/maddr/mfn conversion in MPU system

2023-01-12 Thread Penny Zheng
virt_to_maddr and maddr_to_virt are used widely in Xen code. So
even though there is no VMSA in the MPU system, we keep the
interface names so that the code flow stays the same.

We move the existing virt/maddr conversion from mm.h to mm_mmu.h.
The MPU version of the virt/maddr conversion is simple, returning
the input address as the output.

We override virt_to_mfn/mfn_to_virt in the source file mm_mpu.c the
same way as in mm_mmu.c.
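
Not part of the patch, but a quick standalone illustration of the identity
conversion this introduces (a minimal sketch; the typedefs are simplified
stand-ins for Xen's vaddr_t/paddr_t):

#include <assert.h>
#include <stdint.h>

typedef uintptr_t vaddr_t;   /* simplified stand-in for Xen's type */
typedef uint64_t  paddr_t;   /* likewise */

/* MPU flavour: VA == PA, so both conversions are identity casts. */
static inline paddr_t __virt_to_maddr(vaddr_t va) { return (paddr_t)va; }
static inline void *maddr_to_virt(paddr_t ma) { return (void *)(uintptr_t)ma; }

int main(void)
{
    int x;
    vaddr_t va = (vaddr_t)&x;
    /* Round-tripping through the converters yields the original pointer. */
    assert(maddr_to_virt(__virt_to_maddr(va)) == (void *)&x);
    return 0;
}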

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/mm.h | 26 --
 xen/arch/arm/include/asm/mm_mmu.h | 26 ++
 xen/arch/arm/include/asm/mm_mpu.h | 13 +
 xen/arch/arm/mm_mpu.c |  6 ++
 4 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 9b4c07d965..e29158028a 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -250,32 +250,6 @@ static inline void __iomem *ioremap_wc(paddr_t start, 
size_t len)
 /* Page-align address and convert to frame number format */
 #define paddr_to_pfn_aligned(paddr)paddr_to_pfn(PAGE_ALIGN(paddr))
 
-static inline paddr_t __virt_to_maddr(vaddr_t va)
-{
-uint64_t par = va_to_par(va);
-return (par & PADDR_MASK & PAGE_MASK) | (va & ~PAGE_MASK);
-}
-#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
-
-#ifdef CONFIG_ARM_32
-static inline void *maddr_to_virt(paddr_t ma)
-{
-ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
-ma -= mfn_to_maddr(directmap_mfn_start);
-return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
-}
-#else
-static inline void *maddr_to_virt(paddr_t ma)
-{
-ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
-   (DIRECTMAP_SIZE >> PAGE_SHIFT));
-return (void *)(XENHEAP_VIRT_START -
-(directmap_base_pdx << PAGE_SHIFT) +
-((ma & ma_va_bottom_mask) |
- ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
-}
-#endif
-
 /*
  * Translate a guest virtual address to a machine address.
  * Return the fault information if the translation has failed else 0.
diff --git a/xen/arch/arm/include/asm/mm_mmu.h 
b/xen/arch/arm/include/asm/mm_mmu.h
index a5e63d8af8..6d7e5ddde7 100644
--- a/xen/arch/arm/include/asm/mm_mmu.h
+++ b/xen/arch/arm/include/asm/mm_mmu.h
@@ -23,6 +23,32 @@ extern uint64_t init_ttbr;
 extern void setup_directmap_mappings(unsigned long base_mfn,
  unsigned long nr_mfns);
 
+static inline paddr_t __virt_to_maddr(vaddr_t va)
+{
+uint64_t par = va_to_par(va);
+return (par & PADDR_MASK & PAGE_MASK) | (va & ~PAGE_MASK);
+}
+#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
+
+#ifdef CONFIG_ARM_32
+static inline void *maddr_to_virt(paddr_t ma)
+{
+ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
+ma -= mfn_to_maddr(directmap_mfn_start);
+return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
+}
+#else
+static inline void *maddr_to_virt(paddr_t ma)
+{
+ASSERT((mfn_to_pdx(maddr_to_mfn(ma)) - directmap_base_pdx) <
+   (DIRECTMAP_SIZE >> PAGE_SHIFT));
+return (void *)(XENHEAP_VIRT_START -
+(directmap_base_pdx << PAGE_SHIFT) +
+((ma & ma_va_bottom_mask) |
+ ((ma & ma_top_mask) >> pfn_pdx_hole_shift)));
+}
+#endif
+
 #endif /* __ARCH_ARM_MM_MMU__ */
 
 /*
diff --git a/xen/arch/arm/include/asm/mm_mpu.h 
b/xen/arch/arm/include/asm/mm_mpu.h
index 1f3cff7743..3a4b07f187 100644
--- a/xen/arch/arm/include/asm/mm_mpu.h
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -4,6 +4,19 @@
 
 #define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
 
+static inline paddr_t __virt_to_maddr(vaddr_t va)
+{
+/* In MPU system, VA == PA. */
+return (paddr_t)va;
+}
+#define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
+
+static inline void *maddr_to_virt(paddr_t ma)
+{
+/* In MPU system, VA == PA. */
+return (void *)ma;
+}
+
 #endif /* __ARCH_ARM_MM_MPU__ */
 
 /*
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 87a12042cc..c9e17ab6da 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -29,6 +29,12 @@
 pr_t __aligned(PAGE_SIZE) __section(".data.page_aligned")
  xen_mpumap[ARM_MAX_MPU_MEMORY_REGIONS];
 
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef virt_to_mfn
+#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
+
 /* Index into MPU memory region map for fixed regions, ascending from zero. */
 uint64_t __ro_after_init next_fixed_region_idx;
 /*
-- 
2.25.1




[PATCH v2 36/40] xen/mpu: Use secure hypervisor timer for Armv8-R AArch64

2023-01-12 Thread Penny Zheng
As Armv8-R AArch64 only has one security state, we have to use the secure
EL2 hypervisor timer for Xen in secure EL2.

In this patch, we introduce a Kconfig option ARM_SECURE_STATE.
With this new Kconfig option, we can re-define the timer's
system register names for the different security states, but keep the
timer code flow unchanged.
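
Not part of the patch, but the register-name aliasing can be demonstrated
standalone (a sketch: WRITE_SYSREG is a printf stand-in here, the real one
emits an msr instruction; the CNTHPS_*/CNTHP_* names are from the patch):

#include <stdio.h>

#define STR_(x) #x
#define STR(x)  STR_(x)   /* two-level stringify so the alias expands */

#define WRITE_SYSREG(v, name) printf("msr %s, #%d\n", STR(name), (v))

#ifdef CONFIG_ARM_SECURE_STATE
#define CNTHPx_CTL_EL2 CNTHPS_CTL_EL2   /* secure EL2 hyp timer */
#else
#define CNTHPx_CTL_EL2 CNTHP_CTL_EL2    /* non-secure EL2 hyp timer */
#endif

int main(void)
{
    /* Common timer code references only the CNTHPx_* alias; the macro
     * layer picks the secure or non-secure register name per config. */
    WRITE_SYSREG(0, CNTHPx_CTL_EL2);
    return 0;
}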

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/Kconfig |  7 +++
 xen/arch/arm/include/asm/arm64/sysregs.h | 21 -
 xen/arch/arm/include/asm/cpregs.h|  4 ++--
 xen/arch/arm/time.c  | 14 +++---
 4 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 91491341c4..ee942a33bc 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -47,6 +47,13 @@ config ARM_EFI
  be booted as an EFI application. This is only useful for
  Xen that may run on systems that have UEFI firmware.
 
+config ARM_SECURE_STATE
+   bool "Xen will run in Arm Secure State"
+   depends on ARM_V8R
+   help
+ In this state, a Processing Element (PE) can access the secure
+ physical address space, and the secure copy of banked registers.
+
 config GICV3
bool "GICv3 driver"
depends on !NEW_VGIC
diff --git a/xen/arch/arm/include/asm/arm64/sysregs.h 
b/xen/arch/arm/include/asm/arm64/sysregs.h
index c46daf6f69..9546e8e3d0 100644
--- a/xen/arch/arm/include/asm/arm64/sysregs.h
+++ b/xen/arch/arm/include/asm/arm64/sysregs.h
@@ -458,7 +458,6 @@
 #define ZCR_ELx_LEN_SIZE 9
 #define ZCR_ELx_LEN_MASK 0x1ff
 
-/* System registers for Armv8-R AArch64 */
 #ifdef CONFIG_HAS_MPU
 
 /* EL2 MPU Protection Region Base Address Register encode */
@@ -510,6 +509,26 @@
 
 #endif
 
+#ifdef CONFIG_ARM_SECURE_STATE
+/*
+ * The Armv8-R AArch64 architecture always executes code in Secure
+ * state with EL2 as the highest Exception level.
+ *
+ * Hypervisor timer registers for Secure EL2.
+ */
+#define CNTHPS_TVAL_EL2  S3_4_C14_C5_0
+#define CNTHPS_CTL_EL2   S3_4_C14_C5_1
+#define CNTHPS_CVAL_EL2  S3_4_C14_C5_2
+#define CNTHPx_TVAL_EL2  CNTHPS_TVAL_EL2
+#define CNTHPx_CTL_EL2   CNTHPS_CTL_EL2
+#define CNTHPx_CVAL_EL2  CNTHPS_CVAL_EL2
+#else
+/* Hypervisor timer registers for Non-Secure EL2. */
+#define CNTHPx_TVAL_EL2  CNTHP_TVAL_EL2
+#define CNTHPx_CTL_EL2   CNTHP_CTL_EL2
+#define CNTHPx_CVAL_EL2  CNTHP_CVAL_EL2
+#endif /* CONFIG_ARM_SECURE_STATE */
+
 /* Access to system registers */
 
 #define WRITE_SYSREG64(v, name) do {\
diff --git a/xen/arch/arm/include/asm/cpregs.h 
b/xen/arch/arm/include/asm/cpregs.h
index 6b083de204..a704677fbc 100644
--- a/xen/arch/arm/include/asm/cpregs.h
+++ b/xen/arch/arm/include/asm/cpregs.h
@@ -374,8 +374,8 @@
 #define CLIDR_EL1   CLIDR
 #define CNTFRQ_EL0  CNTFRQ
 #define CNTHCTL_EL2 CNTHCTL
-#define CNTHP_CTL_EL2   CNTHP_CTL
-#define CNTHP_CVAL_EL2  CNTHP_CVAL
+#define CNTHPx_CTL_EL2  CNTHP_CTL
+#define CNTHPx_CVAL_EL2 CNTHP_CVAL
 #define CNTKCTL_EL1 CNTKCTL
 #define CNTPCT_EL0  CNTPCT
 #define CNTP_CTL_EL0CNTP_CTL
diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c
index 433d7be909..3bba733b83 100644
--- a/xen/arch/arm/time.c
+++ b/xen/arch/arm/time.c
@@ -196,13 +196,13 @@ int reprogram_timer(s_time_t timeout)
 
 if ( timeout == 0 )
 {
-WRITE_SYSREG(0, CNTHP_CTL_EL2);
+WRITE_SYSREG(0, CNTHPx_CTL_EL2);
 return 1;
 }
 
 deadline = ns_to_ticks(timeout) + boot_count;
-WRITE_SYSREG64(deadline, CNTHP_CVAL_EL2);
-WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHP_CTL_EL2);
+WRITE_SYSREG64(deadline, CNTHPx_CVAL_EL2);
+WRITE_SYSREG(CNTx_CTL_ENABLE, CNTHPx_CTL_EL2);
 isb();
 
 /* No need to check for timers in the past; the Generic Timer fires
@@ -213,7 +213,7 @@ int reprogram_timer(s_time_t timeout)
 /* Handle the firing timer */
 static void htimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
-if ( unlikely(!(READ_SYSREG(CNTHP_CTL_EL2) & CNTx_CTL_PENDING)) )
+if ( unlikely(!(READ_SYSREG(CNTHPx_CTL_EL2) & CNTx_CTL_PENDING)) )
 return;
 
 perfc_incr(hyp_timer_irqs);
@@ -222,7 +222,7 @@ static void htimer_interrupt(int irq, void *dev_id, struct 
cpu_user_regs *regs)
 raise_softirq(TIMER_SOFTIRQ);
 
 /* Disable the timer to avoid more interrupts */
-WRITE_SYSREG(0, CNTHP_CTL_EL2);
+WRITE_SYSREG(0, CNTHPx_CTL_EL2);
 }
 
 static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
@@ -281,7 +281,7 @@ void init_timer_interrupt(void)
 /* Do not let the VMs program the physical timer, only read the physical 
counter */
 WRITE_SYSREG(CNTHCTL_EL2_EL1PCTEN, CNTHCTL_EL2);
 WRITE_SYSREG(0, CNTP_CTL_EL0);/* Physical timer disabled */
-WRITE_SYSREG(0, CNTHP_CTL_EL2);   /* Hypervisor's timer disabled */
+WRITE_SYSREG(0, CNTHPx_CTL_EL2);  

[PATCH v2 20/40] xen/mpu: plumb early_fdt_map in MPU systems

2023-01-12 Thread Penny Zheng
In an MPU system, the device tree binary can be packed into the Xen
image through CONFIG_DTB_FILE, or provided by the bootloader through x0.

In an MPU system, each section in xen.lds.S is PAGE_SIZE aligned.
So in order not to overlap with the preceding BSS section, the dtb
section should be made page-aligned too.
We add . = ALIGN(PAGE_SIZE); at the head of the dtb section to make
that happen.

In this commit, we map the early FDT with a transient MPU memory
region, added from the tail of the MPU map, with REGION_HYPERVISOR_BOOT.
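
Not part of the patch, but the page-rounding arithmetic used to size the
transient region can be checked standalone (a sketch assuming 4K pages and
a 2MB MAX_FDT_SIZE; both values are assumptions of the sketch):

#include <stdio.h>

#define PAGE_SHIFT 12                 /* assuming 4K pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define round_pgdown(a) ((a) & ~(PAGE_SIZE - 1))
#define round_pgup(a)   (((a) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))
#define MAX_FDT_SIZE    (2UL << 20)   /* assumed value for the sketch */

int main(void)
{
    unsigned long fdt_paddr = 0x48000040; /* example, MIN_FDT_ALIGN-ed */
    unsigned long base  = round_pgdown(fdt_paddr);
    unsigned long pages = round_pgup(MAX_FDT_SIZE) >> PAGE_SHIFT;

    printf("map %lu pages starting at %#lx\n", pages, base);
    return 0;
}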

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/arm64/mpu.h |  5 +++
 xen/arch/arm/mm_mpu.c| 63 +---
 xen/arch/arm/xen.lds.S   |  5 ++-
 3 files changed, 67 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/include/asm/arm64/mpu.h 
b/xen/arch/arm/include/asm/arm64/mpu.h
index fcde6ad0db..b85e420a90 100644
--- a/xen/arch/arm/include/asm/arm64/mpu.h
+++ b/xen/arch/arm/include/asm/arm64/mpu.h
@@ -45,18 +45,22 @@
  * [3:4] Execute Never
  * [5:6] Access Permission
  * [7]   Region Present
+ * [8]   Boot-only Region
  */
 #define _REGION_AI_BIT0
 #define _REGION_XN_BIT3
 #define _REGION_AP_BIT5
 #define _REGION_PRESENT_BIT   7
+#define _REGION_BOOTONLY_BIT  8
 #define _REGION_XN(2U << _REGION_XN_BIT)
 #define _REGION_RO(2U << _REGION_AP_BIT)
 #define _REGION_PRESENT   (1U << _REGION_PRESENT_BIT)
+#define _REGION_BOOTONLY  (1U << _REGION_BOOTONLY_BIT)
 #define REGION_AI_MASK(x) (((x) >> _REGION_AI_BIT) & 0x7U)
 #define REGION_XN_MASK(x) (((x) >> _REGION_XN_BIT) & 0x3U)
 #define REGION_AP_MASK(x) (((x) >> _REGION_AP_BIT) & 0x3U)
 #define REGION_RO_MASK(x) (((x) >> _REGION_AP_BIT) & 0x2U)
+#define REGION_BOOTONLY_MASK(x)   (((x) >> _REGION_BOOTONLY_BIT) & 0x1U)
 
 /*
  * _REGION_NORMAL is convenience define. It is not meant to be used
@@ -68,6 +72,7 @@
 #define REGION_HYPERVISOR_RO  (_REGION_NORMAL|_REGION_XN|_REGION_RO)
 
 #define REGION_HYPERVISOR REGION_HYPERVISOR_RW
+#define REGION_HYPERVISOR_BOOT(REGION_HYPERVISOR_RW|_REGION_BOOTONLY)
 
 #define INVALID_REGION(~0UL)
 
diff --git a/xen/arch/arm/mm_mpu.c b/xen/arch/arm/mm_mpu.c
index 08720a7c19..b34dbf4515 100644
--- a/xen/arch/arm/mm_mpu.c
+++ b/xen/arch/arm/mm_mpu.c
@@ -20,11 +20,16 @@
  */
 
 #include 
+#include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #ifdef NDEBUG
 static inline void
@@ -62,6 +67,8 @@ uint64_t __ro_after_init max_xen_mpumap;
 
 static DEFINE_SPINLOCK(xen_mpumap_lock);
 
+static paddr_t dtb_paddr;
+
 /* Write a MPU protection region */
 #define WRITE_PROTECTION_REGION(sel, pr, prbar_el2, prlar_el2) ({   \
 uint64_t _sel = sel;\
@@ -403,7 +410,16 @@ static int xen_mpumap_update_entry(paddr_t base, paddr_t 
limit,
 
 /* During boot time, the default index is next_fixed_region_idx. */
 if ( system_state <= SYS_STATE_active )
-idx = next_fixed_region_idx;
+{
+/*
+ * If it is a boot-only region (i.e. region for early FDT),
+ * it shall be added from the tail for late init re-organizing
+ */
+if ( REGION_BOOTONLY_MASK(flags) )
+idx = next_transient_region_idx;
+else
+idx = next_fixed_region_idx;
+}
 
 xen_mpumap[idx] = pr_of_xenaddr(base, limit - 1, 
REGION_AI_MASK(flags));
 /* Set permission */
@@ -465,14 +481,51 @@ int map_pages_to_xen(unsigned long virt,
  mfn_to_maddr(mfn_add(mfn, nr_mfns)), flags);
 }
 
-/* TODO: Implementation on the first usage */
-void dump_hyp_walk(vaddr_t addr)
+void * __init early_fdt_map(paddr_t fdt_paddr)
 {
+void *fdt_virt;
+uint32_t size;
+
+/*
+ * Check whether the physical FDT address is set and meets the minimum
+ * alignment requirement. Since we are relying on MIN_FDT_ALIGN to be at
+ * least 8 bytes so that we always access the magic and size fields
+ * of the FDT header after mapping the first chunk, double check if
+ * that is indeed the case.
+ */
+ BUILD_BUG_ON(MIN_FDT_ALIGN < 8);
+ if ( !fdt_paddr || fdt_paddr % MIN_FDT_ALIGN )
+ return NULL;
+
+dtb_paddr = fdt_paddr;
+/*
+ * In MPU system, device tree binary can be packed with Xen image
+ * through CONFIG_DTB_FILE, or provided by bootloader through x0.
+ * Map FDT with a transient MPU memory region of MAX_FDT_SIZE.
+ * After that, we can do some magic check.
+ */
+if ( map_pages_to_xen(round_pgdown(fdt_paddr),
+  maddr_to_mfn(round_pgdown(fdt_paddr)),
+  round_pgup(MAX_FDT_SIZE) >> PAGE_SHIFT,
+  REGION_HYPERVISOR_BOOT) )
+panic("Unable to map the device-tree.\n");
+
+/* VA == 

[PATCH v2 16/40] xen/arm: introduce setup_mm_mappings

2023-01-12 Thread Penny Zheng
The function setup_pagetables is responsible for boot-time pagetable
setup on MMU systems.
But on MPU systems, we have already built up the start-of-day Xen MPU
memory region mapping at the very beginning in assembly.

So in order to keep a single code flow in arm/setup.c, the more
generically named setup_mm_mappings is introduced; it acts as an empty
stub on MPU systems.
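
Not part of the patch, but the stub pattern can be sketched standalone
(names mirror the patch; the MMU flavour is shown as a comment):

#include <stdio.h>

/* MMU flavour (mm_mmu.h) forwards to the real pagetable setup:
 *   #define setup_mm_mappings(off) setup_pagetables(off)
 */

/* MPU flavour (mm_mpu.h): regions were already set up in assembly, so
 * the call evaporates while still consuming its argument. */
#define setup_mm_mappings(off) ((void)(off))

int main(void)
{
    unsigned long boot_phys_offset = 0x1000; /* example value */
    setup_mm_mappings(boot_phys_offset);     /* compiles to nothing here */
    puts("common start_xen() flow unchanged");
    return 0;
}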

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/include/asm/mm.h |  2 ++
 xen/arch/arm/include/asm/mm_mpu.h | 16 
 xen/arch/arm/setup.c  |  2 +-
 3 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/include/asm/mm_mpu.h

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 1b9fdb6ff5..9b4c07d965 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -243,6 +243,8 @@ static inline void __iomem *ioremap_wc(paddr_t start, 
size_t len)
 
 #ifndef CONFIG_HAS_MPU
 #include 
+#else
+#include 
 #endif
 
 /* Page-align address and convert to frame number format */
diff --git a/xen/arch/arm/include/asm/mm_mpu.h 
b/xen/arch/arm/include/asm/mm_mpu.h
new file mode 100644
index 00..1f3cff7743
--- /dev/null
+++ b/xen/arch/arm/include/asm/mm_mpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ARCH_ARM_MM_MPU__
+#define __ARCH_ARM_MM_MPU__
+
+#define setup_mm_mappings(boot_phys_offset) ((void)(boot_phys_offset))
+
+#endif /* __ARCH_ARM_MM_MPU__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 1f26f67b90..d7d200179c 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -1003,7 +1003,7 @@ void __init start_xen(unsigned long boot_phys_offset,
 /* Initialize traps early allow us to get backtrace when an error occurred 
*/
 init_traps();
 
-setup_pagetables(boot_phys_offset);
+setup_mm_mappings(boot_phys_offset);
 
 smp_clear_cpu_maps();
 
-- 
2.25.1




[PATCH v2 11/40] xen/mpu: build up start-of-day Xen MPU memory region map

2023-01-12 Thread Penny Zheng
From: Penny Zheng 

The start-of-day Xen MPU memory region layout shall be as follows:

xen_mpumap[0] : Xen text
xen_mpumap[1] : Xen read-only data
xen_mpumap[2] : Xen read-only after init data
xen_mpumap[3] : Xen read-write data
xen_mpumap[4] : Xen BSS
..
xen_mpumap[max_xen_mpumap - 2]: Xen init data
xen_mpumap[max_xen_mpumap - 1]: Xen init text

max_xen_mpumap refers to the number of regions supported by the EL2 MPU.
The layout shall be compliant with what we describe in xen.lds.S, or the
code needs adjustment.

As MMU and MPU systems have different functions to create the boot
memory management data, instead of introducing extra #ifdefs in the
main code flow, we introduce a neutral name, prepare_early_mappings,
for both, which also replaces create_page_tables for MMU.
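
Not part of the patch, but the dual-ended index allocation this layout
implies can be sketched standalone (the variables mirror
next_fixed_region_idx and its tail counterpart from later patches;
MAX_REGIONS is a stand-in for max_xen_mpumap):

#include <stdio.h>

#define MAX_REGIONS 32U   /* stand-in for max_xen_mpumap */

static unsigned int next_fixed;                        /* grows up from 0 */
static unsigned int next_transient = MAX_REGIONS - 1;  /* grows down */

static int alloc_region(int from_tail)
{
    if ( next_fixed > next_transient )
        return -1;                     /* region map exhausted */
    return from_tail ? (int)next_transient-- : (int)next_fixed++;
}

int main(void)
{
    printf("text      -> idx %d\n", alloc_region(0)); /* 0 */
    printf("rodata    -> idx %d\n", alloc_region(0)); /* 1 */
    printf("init text -> idx %d\n", alloc_region(1)); /* MAX_REGIONS - 1 */
    printf("init data -> idx %d\n", alloc_region(1)); /* MAX_REGIONS - 2 */
    return 0;
}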

Signed-off-by: Penny Zheng 
Signed-off-by: Wei Chen 
---
 xen/arch/arm/arm64/Makefile  |   2 +
 xen/arch/arm/arm64/head.S|  17 +-
 xen/arch/arm/arm64/head_mmu.S|   4 +-
 xen/arch/arm/arm64/head_mpu.S| 323 +++
 xen/arch/arm/include/asm/arm64/mpu.h |  63 +
 xen/arch/arm/include/asm/arm64/sysregs.h |  49 
 xen/arch/arm/mm_mpu.c|  48 
 xen/arch/arm/xen.lds.S   |   4 +
 8 files changed, 502 insertions(+), 8 deletions(-)
 create mode 100644 xen/arch/arm/arm64/head_mpu.S
 create mode 100644 xen/arch/arm/include/asm/arm64/mpu.h
 create mode 100644 xen/arch/arm/mm_mpu.c

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 22da2f54b5..438c9737ad 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -10,6 +10,8 @@ obj-y += entry.o
 obj-y += head.o
 ifneq ($(CONFIG_HAS_MPU),y)
 obj-y += head_mmu.o
+else
+obj-y += head_mpu.o
 endif
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 782bd1f94c..145e3d53dc 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -68,9 +68,9 @@
  *  x24 -
  *  x25 -
  *  x26 - skip_zero_bss (boot cpu only)
- *  x27 -
- *  x28 -
- *  x29 -
+ *  x27 - region selector (mpu only)
+ *  x28 - prbar (mpu only)
+ *  x29 - prlar (mpu only)
  *  x30 - lr
  */
 
@@ -82,7 +82,7 @@
  * ---
  *
  * The requirements are:
- *   MMU = off, D-cache = off, I-cache = on or off,
+ *   MMU/MPU = off, D-cache = off, I-cache = on or off,
  *   x0 = physical address to the FDT blob.
  *
  * This must be the very first address in the loaded image.
@@ -252,7 +252,12 @@ real_start_efi:
 
 blcheck_cpu_mode
 blcpu_init
-blcreate_page_tables
+
+/*
+ * Create boot memory management data, pagetable for MMU systems
+ * and memory regions for MPU systems.
+ */
+blprepare_early_mappings
 blenable_mmu
 
 /* We are still in the 1:1 mapping. Jump to the runtime Virtual 
Address. */
@@ -310,7 +315,7 @@ GLOBAL(init_secondary)
 #endif
 blcheck_cpu_mode
 blcpu_init
-blcreate_page_tables
+blprepare_early_mappings
 blenable_mmu
 
 /* We are still in the 1:1 mapping. Jump to the runtime Virtual 
Address. */
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index 6ff13c751c..2346f755df 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -123,7 +123,7 @@
  *
  * Clobbers x0 - x4
  */
-ENTRY(create_page_tables)
+ENTRY(prepare_early_mappings)
 /* Prepare the page-tables for mapping Xen */
 ldr   x0, =XEN_VIRT_START
 create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
@@ -208,7 +208,7 @@ virtphys_clash:
 /* Identity map clashes with boot_third, which we cannot handle yet */
 PRINT("- Unable to build boot page tables - virt and phys addresses 
clash. -\r\n")
 b fail
-ENDPROC(create_page_tables)
+ENDPROC(prepare_early_mappings)
 
 /*
  * Turn on the Data Cache and the MMU. The function will return on the 1:1
diff --git a/xen/arch/arm/arm64/head_mpu.S b/xen/arch/arm/arm64/head_mpu.S
new file mode 100644
index 00..0b97ce4646
--- /dev/null
+++ b/xen/arch/arm/arm64/head_mpu.S
@@ -0,0 +1,323 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Start-of-day code for an Armv8-R AArch64 MPU system.
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ * One entry in the Xen MPU memory region mapping table (xen_mpumap) is a
+ * structure of type pr_t, which is 16 bytes in size, so the entry offset is
+ * of order 4.
+ */
+#define MPU_ENTRY_SHIFT 0x4
+
+#define REGION_SEL_MASK 0xf
+
+#define REGION_TEXT_PRBAR   0x38/* SH=11 AP=10 XN=00 */
+#define REGION_RO_PRBAR 0x3A/* SH=11 AP=10 XN=10 */
+#define REGION_DATA_PRBAR   0x32/* SH=11 AP=00 XN=10 */
+
+#define REGION_NORMAL_PRLAR 0x0f/* NS=0 ATTR=111 EN=1 */
+
+/*
+ * Macro to round up the section address to 

[PATCH v2 09/40] xen/arm: decouple copy_from_paddr from FIXMAP

2023-01-12 Thread Penny Zheng
From: Wei Chen 

copy_from_paddr maps a page to Xen's FIXMAP_MISC area for
temporary access. But systems that do not support VMSA cannot
implement set_fixmap/clear_fixmap, which means they cannot
always use the same virtual address for the source address.

In this case, we introduce two helpers to decouple copy_from_paddr
from set_fixmap/clear_fixmap. map_page_to_xen_misc can still
return the same virtual address as before on VMSA systems, and
it can return a different address on non-VMSA systems.
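
Not part of the patch, but the per-page chunking arithmetic in the loop
(s = offset within the page, l = bytes available in this page) can be
checked standalone:

#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
    unsigned long paddr = 0x1234FF0, len = 40; /* spans a page boundary */

    while ( len )
    {
        unsigned long s = paddr & (PAGE_SIZE - 1);
        unsigned long l = (PAGE_SIZE - s < len) ? PAGE_SIZE - s : len;

        printf("map page %#lx, copy %lu bytes from offset %lu\n",
               paddr & ~(PAGE_SIZE - 1), l, s);
        paddr += l;
        len -= l;
    }
    return 0;
}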

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. New patch
---
 xen/arch/arm/include/asm/setup.h |  4 
 xen/arch/arm/kernel.c| 13 +++--
 xen/arch/arm/mm.c| 12 
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index a926f30a2b..4f39a1aa0a 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -119,6 +119,10 @@ extern struct bootinfo bootinfo;
 
 extern domid_t max_init_domid;
 
+/* Map a page to misc area */
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes);
+/* Unmap the page from misc area */
+void unmap_page_from_xen_misc(void);
 void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
 
 size_t estimate_efi_size(unsigned int mem_nr_banks);
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 23b840ea9e..0475d8fae7 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -49,18 +49,19 @@ struct minimal_dtb_header {
  */
 void __init copy_from_paddr(void *dst, paddr_t paddr, unsigned long len)
 {
-void *src = (void *)FIXMAP_ADDR(FIXMAP_MISC);
-
-while (len) {
+while ( len )
+{
+void *src;
 unsigned long l, s;
 
-s = paddr & (PAGE_SIZE-1);
+s = paddr & (PAGE_SIZE - 1);
 l = min(PAGE_SIZE - s, len);
 
-set_fixmap(FIXMAP_MISC, maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+src = map_page_to_xen_misc(maddr_to_mfn(paddr), PAGE_HYPERVISOR_WC);
+ASSERT(src != NULL);
 memcpy(dst, src + s, l);
 clean_dcache_va_range(dst, l);
-clear_fixmap(FIXMAP_MISC);
+unmap_page_from_xen_misc();
 
 paddr += l;
 dst += l;
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 0fc6f2992d..8f15814c5e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -355,6 +355,18 @@ void clear_fixmap(unsigned int map)
 BUG_ON(res != 0);
 }
 
+void *map_page_to_xen_misc(mfn_t mfn, unsigned int attributes)
+{
+set_fixmap(FIXMAP_MISC, mfn, attributes);
+
+return fix_to_virt(FIXMAP_MISC);
+}
+
+void unmap_page_from_xen_misc(void)
+{
+clear_fixmap(FIXMAP_MISC);
+}
+
 void flush_page_to_ram(unsigned long mfn, bool sync_icache)
 {
 void *v = map_domain_page(_mfn(mfn));
-- 
2.25.1




[PATCH v2 08/40] xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv8-R

2023-01-12 Thread Penny Zheng
From: Wei Chen 

There is no VMSA support on Armv8-R AArch64, so we cannot map the early
UART to FIXMAP_CONSOLE. Instead, we use PA == VA to define
EARLY_UART_VIRTUAL_ADDRESS on Armv8-R AArch64.

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. New patch
---
 xen/arch/arm/include/asm/early_printk.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/xen/arch/arm/include/asm/early_printk.h 
b/xen/arch/arm/include/asm/early_printk.h
index c5149b2976..44a230853f 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -15,10 +15,22 @@
 
 #ifdef CONFIG_EARLY_PRINTK
 
+#ifdef CONFIG_ARM_V8R
+
+/*
+ * For Armv8-R, there is no VMSA support in EL2, so we use VA == PA
+ * for EARLY_UART_VIRTUAL_ADDRESS.
+ */
+#define EARLY_UART_VIRTUAL_ADDRESS CONFIG_EARLY_UART_BASE_ADDRESS
+
+#else
+
 /* need to add the uart address offset in page to the fixmap address */
 #define EARLY_UART_VIRTUAL_ADDRESS \
 (FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
~PAGE_MASK))
 
+#endif /* CONFIG_ARM_V8R */
+
 #endif /* !CONFIG_EARLY_PRINTK */
 
 #endif
-- 
2.25.1




[PATCH v2 06/40] xen/arm64: move MMU related code from head.S to head_mmu.S

2023-01-12 Thread Penny Zheng
From: Wei Chen 

There is a lot of MMU-specific code in head.S. This code will not
be used on MPU systems. If we use #ifdef to gate it, the code
will become messy and hard to maintain. So we move MMU-related
code to head_mmu.S, and keep the common code in head.S.

Some assembly macros that will later be shared by MMU and MPU
are moved to macros.h.

Signed-off-by: Wei Chen 
Signed-off-by: Henry Wang 
---
v1 -> v2:
1. Move macros to macros.h
2. Remove the indention modification
3. Duplicate "fail" instead of exporting it.
---
 xen/arch/arm/arm64/Makefile |   3 +
 xen/arch/arm/arm64/head.S   | 383 
 xen/arch/arm/arm64/head_mmu.S   | 372 +++
 xen/arch/arm/include/asm/arm64/macros.h |  51 
 4 files changed, 426 insertions(+), 383 deletions(-)
 create mode 100644 xen/arch/arm/arm64/head_mmu.S

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index 6d507da0d4..22da2f54b5 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -8,6 +8,9 @@ obj-y += domctl.o
 obj-y += domain.o
 obj-y += entry.o
 obj-y += head.o
+ifneq ($(CONFIG_HAS_MPU),y)
+obj-y += head_mmu.o
+endif
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
 obj-y += smc.o
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index b2214bc5e3..5cfa47279b 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -28,17 +28,6 @@
 #include 
 #endif
 
-#define PT_PT 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_MEM0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
-#define PT_MEM_L3 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
-#define PT_DEV0xe71 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=0 P=1 */
-#define PT_DEV_L3 0xe73 /* nG=1 AF=1 SH=10 AP=01 NS=1 ATTR=100 T=1 P=1 */
-
-/* Convenience defines to get slot used by Xen mapping. */
-#define XEN_ZEROETH_SLOTzeroeth_table_offset(XEN_VIRT_START)
-#define XEN_FIRST_SLOT  first_table_offset(XEN_VIRT_START)
-#define XEN_SECOND_SLOT second_table_offset(XEN_VIRT_START)
-
 #define __HEAD_FLAG_PAGE_SIZE   ((PAGE_SHIFT - 10) / 2)
 
 #define __HEAD_FLAG_PHYS_BASE   1
@@ -85,57 +74,6 @@
  *  x30 - lr
  */
 
-#ifdef CONFIG_EARLY_PRINTK
-/*
- * Macro to print a string to the UART, if there is one.
- *
- * Clobbers x0 - x3
- */
-#define PRINT(_s)  \
-mov   x3, lr ; \
-adr   x0, 98f ;\
-blputs;\
-mov   lr, x3 ; \
-RODATA_STR(98, _s)
-
-/*
- * Macro to print the value of register \xb
- *
- * Clobbers x0 - x4
- */
-.macro print_reg xb
-mov   x0, \xb
-mov   x4, lr
-blputn
-mov   lr, x4
-.endm
-
-#else /* CONFIG_EARLY_PRINTK */
-#define PRINT(s)
-
-.macro print_reg xb
-.endm
-
-#endif /* !CONFIG_EARLY_PRINTK */
-
-/*
- * Pseudo-op for PC relative adr ,  where  is
- * within the range +/- 4GB of the PC.
- *
- * @dst: destination register (64 bit wide)
- * @sym: name of the symbol
- */
-.macro  adr_l, dst, sym
-adrp \dst, \sym
-add  \dst, \dst, :lo12:\sym
-.endm
-
-/* Load the physical address of a symbol into xb */
-.macro load_paddr xb, sym
-ldr \xb, =\sym
-add \xb, \xb, x20
-.endm
-
 .section .text.header, "ax", %progbits
 /*.aarch64*/
 
@@ -500,296 +438,6 @@ cpu_init:
 ret
 ENDPROC(cpu_init)
 
-/*
- * Macro to find the slot number at a given page-table level
- *
- * slot: slot computed
- * virt: virtual address
- * lvl:  page-table level
- */
-.macro get_table_slot, slot, virt, lvl
-ubfx  \slot, \virt, #XEN_PT_LEVEL_SHIFT(\lvl), #XEN_PT_LPAE_SHIFT
-.endm
-
-/*
- * Macro to create a page table entry in \ptbl to \tbl
- *
- * ptbl:table symbol where the entry will be created
- * tbl: table symbol to point to
- * virt:virtual address
- * lvl: page-table level
- * tmp1:scratch register
- * tmp2:scratch register
- * tmp3:scratch register
- *
- * Preserves \virt
- * Clobbers \tmp1, \tmp2, \tmp3
- *
- * Also use x20 for the phys offset.
- *
- * Note that all parameters using registers should be distinct.
- */
-.macro create_table_entry, ptbl, tbl, virt, lvl, tmp1, tmp2, tmp3
-get_table_slot \tmp1, \virt, \lvl   /* \tmp1 := slot in \tlb */
-
-load_paddr \tmp2, \tbl
-mov   \tmp3, #PT_PT /* \tmp3 := right for linear PT */
-orr   \tmp3, \tmp3, \tmp2   /*  + \tlb paddr */
-
-adr_l \tmp2, \ptbl
-
-str   \tmp3, [\tmp2, \tmp1, lsl #3]
-.endm
-
-/*
- * Macro to create a mapping entry in \tbl to \phys. Only mapping in 3rd
- * level table (i.e page granularity) is supported.
- *
- * ptbl: table symbol where the entry will be created
- * virt:virtual address
- * phys:physical address (should be page aligned)
- * tmp1:scratch register
- * tmp2:scratch register
- * tmp3:scratch register
- * type:mapping type. If not 

[PATCH v2 07/40] xen/arm64: add .text.idmap for Xen identity map sections

2023-01-12 Thread Penny Zheng
From: Wei Chen 

Only the first 4KB of the Xen image will be identity mapped
(PA == VA). At the moment, Xen guarantees this by having
everything that needs to be used in the identity mapping
in head.S before _end_boot and checking at link time that this
fits in 4KB.

In the previous patch, we moved the MMU code out of head.S.
Although we added .text.header to the new file to guarantee
that all identity map code stays in the first 4KB, the order
of these two files within this 4KB depends on the build tools.
Currently, we rely on the order of the objects in the Makefile
to ensure that head.S is at the top, but a different set of
build tools may not produce the same result.

In this patch we introduce .text.idmap in head_mmu.S, and
place this section after .text.header to ensure that the code
of head_mmu.S comes after the code of head.S.

After this, we will still include some code that does not
belong to the identity map before _end_boot, because we have
moved _end_boot to head_mmu.S. That means all code in head.S
will be included before _end_boot. In this patch, we also
add a .text directive in place of the original _end_boot in
head.S. All the code after .text in head.S will not be
included in the identity map section.
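
Not part of the patch, but the underlying mechanism (the linker groups
input sections in the order they are named in the script, not by object
order on the link command line) can be illustrated with section
attributes; a sketch, not Xen code:

/* With xen.lds.S listing *(.text.header) before *(.text.idmap), the
 * first function is guaranteed to precede the second in the image,
 * regardless of which object file the linker sees first. */
__attribute__((section(".text.header"))) void boot_entry(void) {}
__attribute__((section(".text.idmap")))  void idmap_code(void) {}

int main(void)
{
    boot_entry();
    idmap_code();
    return 0;
}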

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. New patch.
---
 xen/arch/arm/arm64/head.S | 6 ++
 xen/arch/arm/arm64/head_mmu.S | 2 +-
 xen/arch/arm/xen.lds.S| 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 5cfa47279b..782bd1f94c 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -466,6 +466,12 @@ fail:   PRINT("- Boot failed -\r\n")
 b 1b
 ENDPROC(fail)
 
+/*
+ * For code that does not need to be in the identity map section,
+ * we put it back in the normal .text section.
+ */
+.section .text, "ax", %progbits
+
 #ifdef CONFIG_EARLY_PRINTK
 /*
  * Initialize the UART. Should only be called on the boot CPU.
diff --git a/xen/arch/arm/arm64/head_mmu.S b/xen/arch/arm/arm64/head_mmu.S
index e2c8f07140..6ff13c751c 100644
--- a/xen/arch/arm/arm64/head_mmu.S
+++ b/xen/arch/arm/arm64/head_mmu.S
@@ -105,7 +105,7 @@
 str   \tmp2, [\tmp3, \tmp1, lsl #3]
 .endm
 
-.section .text.header, "ax", %progbits
+.section .text.idmap, "ax", %progbits
 /*.aarch64*/
 
 /*
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 92c2984052..bc45ea2c65 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -33,6 +33,7 @@ SECTIONS
   .text : {
 _stext = .;/* Text section */
*(.text.header)
+   *(.text.idmap)
 
*(.text.cold)
*(.text.unlikely .text.*_unlikely .text.unlikely.*)
-- 
2.25.1




[PATCH v2 05/40] xen/arm64: prepare for moving MMU related code from head.S

2023-01-12 Thread Penny Zheng
From: Wei Chen 

We want to reuse head.S for MPU systems, but some code is
implemented for MMU systems only. We will move such code to
a separate MMU-specific file. But before that, this patch
does some preparation to make the move easier to review:
1. Fix the indentation of code comments.
2. Export some symbols that will be accessed outside of file
   scope.

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. New patch.
---
 xen/arch/arm/arm64/head.S | 40 +++
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 93f9b0b9d5..b2214bc5e3 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -136,22 +136,22 @@
 add \xb, \xb, x20
 .endm
 
-.section .text.header, "ax", %progbits
-/*.aarch64*/
+.section .text.header, "ax", %progbits
+/*.aarch64*/
 
-/*
- * Kernel startup entry point.
- * ---
- *
- * The requirements are:
- *   MMU = off, D-cache = off, I-cache = on or off,
- *   x0 = physical address to the FDT blob.
- *
- * This must be the very first address in the loaded image.
- * It should be linked at XEN_VIRT_START, and loaded at any
- * 4K-aligned address.  All of text+data+bss must fit in 2MB,
- * or the initial pagetable code below will need adjustment.
- */
+/*
+ * Kernel startup entry point.
+ * ---
+ *
+ * The requirements are:
+ *   MMU = off, D-cache = off, I-cache = on or off,
+ *   x0 = physical address to the FDT blob.
+ *
+ * This must be the very first address in the loaded image.
+ * It should be linked at XEN_VIRT_START, and loaded at any
+ * 4K-aligned address.  All of text+data+bss must fit in 2MB,
+ * or the initial pagetable code below will need adjustment.
+ */
 
 GLOBAL(start)
 /*
@@ -586,7 +586,7 @@ ENDPROC(cpu_init)
  *
  * Clobbers x0 - x4
  */
-create_page_tables:
+ENTRY(create_page_tables)
 /* Prepare the page-tables for mapping Xen */
 ldr   x0, =XEN_VIRT_START
 create_table_entry boot_pgtable, boot_first, x0, 0, x1, x2, x3
@@ -680,7 +680,7 @@ ENDPROC(create_page_tables)
  *
  * Clobbers x0 - x3
  */
-enable_mmu:
+ENTRY(enable_mmu)
 PRINT("- Turning on paging -\r\n")
 
 /*
@@ -714,7 +714,7 @@ ENDPROC(enable_mmu)
  *
  * Clobbers x0 - x1
  */
-remove_identity_mapping:
+ENTRY(remove_identity_mapping)
 /*
  * Find the zeroeth slot used. Remove the entry from zeroeth
  * table if the slot is not XEN_ZEROETH_SLOT.
@@ -775,7 +775,7 @@ ENDPROC(remove_identity_mapping)
  *
  * Clobbers x0 - x3
  */
-setup_fixmap:
+ENTRY(setup_fixmap)
 #ifdef CONFIG_EARLY_PRINTK
 /* Add UART to the fixmap table */
 ldr   x0, =EARLY_UART_VIRTUAL_ADDRESS
@@ -871,7 +871,7 @@ ENDPROC(init_uart)
  * x0: Nul-terminated string to print.
  * x23: Early UART base address
  * Clobbers x0-x1 */
-puts:
+ENTRY(puts)
 early_uart_ready x23, 1
 ldrb  w1, [x0], #1   /* Load next char */
 cbz   w1, 1f /* Exit on nul */
-- 
2.25.1




[PATCH v2 04/40] xen/arm: add an option to define Xen start address for Armv8-R

2023-01-12 Thread Penny Zheng
From: Wei Chen 

On Armv8-A, Xen has a fixed virtual start address (link address
too) for all Armv8-A platforms. On an MMU-based system, Xen can
map its load address to this virtual start address. So, on
Armv8-A platforms, the Xen start address does not need to be
configurable. But on Armv8-R platforms, there is no MMU to map
the load address to a fixed virtual address, and different
platforms have very different address space layouts. So Xen
cannot use a fixed physical address on MPU-based systems and
needs to have it configurable.

In this patch we introduce a Kconfig option for users to define
the default Xen start address for Armv8-R. Users can enter the
address at configuration time, or select a tailored platform
config file from arch/arm/configs.

As we introduce Armv8-R platforms to Xen, the existing Arm64
platforms should not be listed in the Armv8-R platform list, so
we add a !ARM_V8R dependency to these platforms.

Signed-off-by: Wei Chen 
Signed-off-by: Jiamei.Xie 
---
v1 -> v2:
1. Remove the platform header fvp_baser.h.
2. Remove the default start address for fvp_baser64.
3. Remove the description of default address from commit log.
4. Change HAS_MPU to ARM_V8R for Xen start address dependency.
   No matter whether an Armv8-R board has an MPU or not, it always
   needs to specify the start address.
---
 xen/arch/arm/Kconfig   |  8 
 xen/arch/arm/platforms/Kconfig | 16 +---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index ace7178c9a..c6b6b612d1 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -145,6 +145,14 @@ config TEE
  This option enables generic TEE mediators support. It allows guests
  to access real TEE via one of TEE mediators implemented in XEN.
 
+config XEN_START_ADDRESS
+   hex "Xen start address: keep default to use platform defined address"
+   default 0
+   depends on ARM_V8R
+   help
+ This option allows setting the customized address at which Xen will be
+ linked on MPU systems. This address must be aligned to the page size.
+
 source "arch/arm/tee/Kconfig"
 
 config STATIC_SHM
diff --git a/xen/arch/arm/platforms/Kconfig b/xen/arch/arm/platforms/Kconfig
index c93a6b2756..0904793a0b 100644
--- a/xen/arch/arm/platforms/Kconfig
+++ b/xen/arch/arm/platforms/Kconfig
@@ -1,6 +1,7 @@
 choice
prompt "Platform Support"
default ALL_PLAT
+   default FVP_BASER if ARM_V8R
---help---
Choose which hardware platform to enable in Xen.
 
@@ -8,13 +9,14 @@ choice
 
 config ALL_PLAT
bool "All Platforms"
+   depends on !ARM_V8R
---help---
Enable support for all available hardware platforms. It doesn't
automatically select any of the related drivers.
 
 config QEMU
bool "QEMU aarch virt machine support"
-   depends on ARM_64
+   depends on ARM_64 && !ARM_V8R
select GICV3
select HAS_PL011
---help---
@@ -23,7 +25,7 @@ config QEMU
 
 config RCAR3
bool "Renesas RCar3 support"
-   depends on ARM_64
+   depends on ARM_64 && !ARM_V8R
select HAS_SCIF
select IPMMU_VMSA
---help---
@@ -31,14 +33,22 @@ config RCAR3
 
 config MPSOC
bool "Xilinx Ultrascale+ MPSoC support"
-   depends on ARM_64
+   depends on ARM_64 && !ARM_V8R
select HAS_CADENCE_UART
select ARM_SMMU
---help---
Enable all the required drivers for Xilinx Ultrascale+ MPSoC
 
+config FVP_BASER
+   bool "Fixed Virtual Platform BaseR support"
+   depends on ARM_V8R
+   help
+ Enable platform specific configurations for Fixed Virtual
+ Platform BaseR
+
 config NO_PLAT
bool "No Platforms"
+   depends on !ARM_V8R
---help---
Do not enable specific support for any platform.
 
-- 
2.25.1




[PATCH v2 03/40] xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA

2023-01-12 Thread Penny Zheng
From: Wei Chen 

From Arm ARM Supplement of Armv8-R AArch64 (DDI 0600A) [1],
section D1.6.2 TLB maintenance instructions, we know that
Armv8-R AArch64 permits an implementation to cache stage 1
VMSAv8-64 and stage 2 PMSAv8-64 attributes as a common entry
for the Secure EL1&0 translation regime. But for Xen itself,
it's running with stage 1 PMSAv8-64 on Armv8-R AArch64. The
EL2 MPU updates for stage 1 PMSAv8-64 will not be cached in
TLB entries. So we don't need any TLB invalidation for Xen
itself in EL2.

So in this patch, we use empty stubs for the Xen TLB helpers on
MPU systems (PMSA), but still keep the guest TLB helpers, because
when a guest runs in EL1 with VMSAv8-64 (MMU), guest TLB
invalidation is still needed. But we need some policy to
distinguish MPU and MMU guests; this will be done in the guest
support of Armv8-R AArch64 later.

[1] https://developer.arm.com/documentation/ddi0600/ac
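
Not part of the patch, but the stubbing pattern (same helper names, empty
bodies selected by the config) can be sketched standalone (puts stands in
for the real TLB invalidation instruction):

#include <stdio.h>

/* Define CONFIG_HAS_MPU to get the stubbed variants. */

#ifndef CONFIG_HAS_MPU
static inline void flush_xen_tlb_local(void)
{
    puts("tlbi alle2");   /* stand-in for the real TLBI */
}
#else
/* Stage 1 PMSAv8-64 attributes are never cached in the TLB, so the
 * helper can safely compile to nothing on MPU systems. */
static inline void flush_xen_tlb_local(void) {}
#endif

int main(void)
{
    flush_xen_tlb_local(); /* common callers stay unchanged */
    return 0;
}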

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. No change.
---
 xen/arch/arm/include/asm/arm64/flushtlb.h | 25 +++
 xen/arch/arm/include/asm/flushtlb.h   | 22 
 2 files changed, 47 insertions(+)

diff --git a/xen/arch/arm/include/asm/arm64/flushtlb.h 
b/xen/arch/arm/include/asm/arm64/flushtlb.h
index 7c54315187..fe445f6831 100644
--- a/xen/arch/arm/include/asm/arm64/flushtlb.h
+++ b/xen/arch/arm/include/asm/arm64/flushtlb.h
@@ -51,6 +51,8 @@ TLB_HELPER(flush_all_guests_tlb_local, alle1);
 /* Flush innershareable TLBs, all VMIDs, non-hypervisor mode */
 TLB_HELPER(flush_all_guests_tlb, alle1is);
 
+#ifndef CONFIG_HAS_MPU
+
 /* Flush all hypervisor mappings from the TLB of the local processor. */
 TLB_HELPER(flush_xen_tlb_local, alle2);
 
@@ -66,6 +68,29 @@ static inline void __flush_xen_tlb_one(vaddr_t va)
 asm volatile("tlbi vae2is, %0;" : : "r" (va>>PAGE_SHIFT) : "memory");
 }
 
+#else
+
+/*
+ * When Xen is running with stage 1 PMSAv8-64 on MPU systems, the EL2 MPU
+ * updates for stage 1 PMSAv8-64 will not be cached in TLB entries, so we
+ * don't need any TLB invalidation for Xen itself in EL2. See Arm ARM
+ * Supplement of Armv8-R AArch64 (DDI 0600A), section D1.6.2 TLB maintenance
+ * instructions for more details.
+ */
+static inline void flush_xen_tlb_local(void)
+{
+}
+
+static inline void __flush_xen_tlb_one_local(vaddr_t va)
+{
+}
+
+static inline void __flush_xen_tlb_one(vaddr_t va)
+{
+}
+
+#endif /* CONFIG_HAS_MPU */
+
 #endif /* __ASM_ARM_ARM64_FLUSHTLB_H__ */
 /*
  * Local variables:
diff --git a/xen/arch/arm/include/asm/flushtlb.h 
b/xen/arch/arm/include/asm/flushtlb.h
index 125a141975..4b8bf65281 100644
--- a/xen/arch/arm/include/asm/flushtlb.h
+++ b/xen/arch/arm/include/asm/flushtlb.h
@@ -28,6 +28,7 @@ static inline void page_set_tlbflush_timestamp(struct 
page_info *page)
 /* Flush specified CPUs' TLBs */
 void arch_flush_tlb_mask(const cpumask_t *mask);
 
+#ifndef CONFIG_HAS_MPU
 /*
  * Flush a range of VA's hypervisor mappings from the TLB of the local
  * processor.
@@ -66,6 +67,27 @@ static inline void flush_xen_tlb_range_va(vaddr_t va,
 isb();
 }
 
+#else
+
+/*
+ * When Xen is running with stage 1 PMSAv8-64 on MPU systems, the EL2 MPU
+ * updates for stage 1 PMSAv8-64 will not be cached in TLB entries, so we
+ * don't need any TLB invalidation for Xen itself in EL2. See Arm ARM
+ * Supplement of Armv8-R AArch64 (DDI 0600A), section D1.6.2 TLB maintenance
+ * instructions for more details.
+ */
+static inline void flush_xen_tlb_range_va_local(vaddr_t va,
+unsigned long size)
+{
+}
+
+static inline void flush_xen_tlb_range_va(vaddr_t va,
+  unsigned long size)
+{
+}
+
+#endif /* CONFIG_HAS_MPU */
+
 #endif /* __ASM_ARM_FLUSHTLB_H__ */
 /*
  * Local variables:
-- 
2.25.1




[PATCH v2 00/41] xen/arm: Add Armv8-R64 MPU support to Xen - Part#1

2023-01-12 Thread Penny Zheng
The Armv8-R architecture profile was designed to support use cases
that have a high sensitivity to deterministic execution (e.g.
fuel injection, brake control, drive trains, motor control, etc.).

Arm announced Armv8-R in 2013; it is the latest generation of the Arm
architecture targeted at the real-time profile. It introduces
virtualization at the highest security level while retaining the
Protected Memory System Architecture (PMSA) based on a Memory
Protection Unit (MPU). In 2020, Arm announced Cortex-R82,
which is the first Arm 64-bit Cortex-R processor based on Armv8-R64.
The latest Armv8-R64 documentation can be found at [1]. The features of
the Armv8-R64 architecture are:
  - An exception model that is compatible with the Armv8-A model
  - Virtualization with support for guest operating systems
  - PMSA virtualization using MPUs in EL2.
  - Adds support for the 64-bit A64 instruction set.
  - Supports up to 48-bit physical addressing.
  - Supports three Exception Levels (ELs)
    - Secure EL2 - The Highest Privilege
    - Secure EL1 - RichOS (MMU) or RTOS (MPU)
    - Secure EL0 - Application Workloads
  - Supports only a single Security state - Secure.
  - MPU in EL1 & EL2 is configurable, MMU in EL1 is configurable.

This patch series implements the Armv8-R64 MPU support
for Xen, based on the discussion of
"Proposal for Porting Xen to Armv8-R64 - DraftC" [2].

We will implement the Armv8-R64 and MPU support in three stages:
1. Boot Xen itself to idle thread, do not create any guests on it.
2. Support to boot MPU and MMU domains on Armv8-R64 Xen.
3. SMP and other advanced features of Xen support on Armv8-R64.

As we have not implemented guest support in this part#1 series of
MPU support, Xen cannot create any guest at boot time. So in this
patch series, we provide an extra DNM commit at the end for users
to test Xen booting to idle on an MPU system.

We have split these patches into several parts; this series is
part#1. v1 is in [3], and the full PoC can be found in [4]. More
software for Armv8-R64 can be found in [5].

[1] https://developer.arm.com/documentation/ddi0600/latest
[2] https://lists.xenproject.org/archives/html/xen-devel/2022-05/msg00643.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2022-11/msg00289.html
[4] https://gitlab.com/xen-project/people/weic/xen/-/tree/integration/mpu_v2
[5] https://armv8r64-refstack.docs.arm.com/en/v5.0/

Penny Zheng (28):
  xen/mpu: build up start-of-day Xen MPU memory region map
  xen/mpu: introduce helpers for MPU enablement
  xen/mpu: introduce unified function setup_early_uart to map early UART
  xen/arm64: head: Jump to the runtime mapping in enable_mm()
  xen/arm: introduce setup_mm_mappings
  xen/mpu: plumb virt/maddr/mfn conversion in MPU system
  xen/mpu: introduce helper access_protection_region
  xen/mpu: populate a new region in Xen MPU mapping table
  xen/mpu: plumb early_fdt_map in MPU systems
  xen/arm: move MMU-specific setup_mm to setup_mmu.c
  xen/mpu: implement MPU version of setup_mm in setup_mpu.c
  xen/mpu: initialize frametable in MPU system
  xen/mpu: introduce "mpu,xxx-memory-section"
  xen/mpu: map MPU guest memory section before static memory
initialization
  xen/mpu: destroy an existing entry in Xen MPU memory mapping table
  xen/mpu: map device memory resource in MPU system
  xen/mpu: map boot module section in MPU system
  xen/mpu: introduce mpu_memory_section_contains for address range check
  xen/mpu: disable VMAP sub-system for MPU systems
  xen/mpu: disable FIXMAP in MPU system
  xen/mpu: implement MPU version of ioremap_xxx
  xen/mpu: free init memory in MPU system
  xen/mpu: destroy boot modules and early FDT mapping in MPU system
  xen/mpu: Use secure hypervisor timer for Armv8-R AArch64
  xen/mpu: move MMU specific P2M code to p2m_mmu.c
  xen/mpu: implement setup_virt_paging for MPU system
  xen/mpu: re-order xen_mpumap in arch_init_finialize
  xen/mpu: add Kconfig option to enable Armv8-R AArch64 support

Wei Chen (13):
  xen/arm: remove xen_phys_start and xenheap_phys_end from config.h
  xen/arm: make ARM_EFI selectable for Arm64
  xen/arm: adjust Xen TLB helpers for Armv8-R64 PMSA
  xen/arm: add an option to define Xen start address for Armv8-R
  xen/arm64: prepare for moving MMU related code from head.S
  xen/arm64: move MMU related code from head.S to head_mmu.S
  xen/arm64: add .text.idmap for Xen identity map sections
  xen/arm: use PA == VA for EARLY_UART_VIRTUAL_ADDRESS on Armv8-R
  xen/arm: decouple copy_from_paddr from FIXMAP
  xen/arm: split MMU and MPU config files from config.h
  xen/arm: move MMU-specific memory management code to mm_mmu.c/mm_mmu.h
  xen/arm: check mapping status and attributes for MPU copy_from_paddr
  xen/mpu: make Xen boot to idle on MPU systems(DNM)

 xen/arch/arm/Kconfig  |   44 +-
 xen/arch/arm/Makefile |   17 +-
 xen/arch/arm/arm64/Makefile   |5 +
 xen/arch/arm/arm64/head.S |  466 +
 

[PATCH v2 01/40] xen/arm: remove xen_phys_start and xenheap_phys_end from config.h

2023-01-12 Thread Penny Zheng
From: Wei Chen 

These two variables are stale: they only have declarations in
config.h, they have no definition, and no code uses them.
So in this patch, we remove them from config.h.

Signed-off-by: Wei Chen 
Acked-by: Julien Grall 
---
v1 -> v2:
1. Add Acked-by.
---
 xen/arch/arm/include/asm/config.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/xen/arch/arm/include/asm/config.h 
b/xen/arch/arm/include/asm/config.h
index 0fefed1b8a..25a625ff08 100644
--- a/xen/arch/arm/include/asm/config.h
+++ b/xen/arch/arm/include/asm/config.h
@@ -172,8 +172,6 @@
 #define STACK_SIZE  (PAGE_SIZE << STACK_ORDER)
 
 #ifndef __ASSEMBLY__
-extern unsigned long xen_phys_start;
-extern unsigned long xenheap_phys_end;
 extern unsigned long frametable_virt_end;
 #endif
 
-- 
2.25.1




[PATCH v2 02/40] xen/arm: make ARM_EFI selectable for Arm64

2023-01-12 Thread Penny Zheng
From: Wei Chen 

Currently, ARM_EFI is mandatorily selected by ARM_64.
Even if the user knows for sure that their images will not
start in an EFI environment, they can't disable the EFI
support for Arm64. This means there will be about 3K lines of
unused code in their images.

So in this patch, we make ARM_EFI selectable for Arm64, and
based on that, we can use CONFIG_ARM_EFI to gate the
EFI-specific code in head.S for those images that will not be
booted in an EFI environment.

Signed-off-by: Wei Chen 
---
v1 -> v2:
1. New patch
---
 xen/arch/arm/Kconfig  | 10 --
 xen/arch/arm/arm64/head.S | 15 +--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 239d3aed3c..ace7178c9a 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -7,7 +7,6 @@ config ARM_64
def_bool y
depends on !ARM_32
select 64BIT
-   select ARM_EFI
select HAS_FAST_MULTIPLY
 
 config ARM
@@ -37,7 +36,14 @@ config ACPI
  an alternative to device tree on ARM64.
 
 config ARM_EFI
-   bool
+   bool "UEFI boot service support"
+   depends on ARM_64
+   default y
+   help
+ This option provides support for boot services through
+ UEFI firmware. A UEFI stub is provided to allow Xen to
+ be booted as an EFI application. This is only useful for
+ Xen that may run on systems that have UEFI firmware.
 
 config GICV3
bool "GICv3 driver"
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index ad014716db..93f9b0b9d5 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -22,8 +22,11 @@
 
 #include 
 #include 
+
+#ifdef CONFIG_ARM_EFI
 #include 
 #include 
+#endif
 
 #define PT_PT 0xf7f /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=1 P=1 */
 #define PT_MEM0xf7d /* nG=1 AF=1 SH=11 AP=01 NS=1 ATTR=111 T=0 P=1 */
@@ -172,8 +175,10 @@ efi_head:
 .byte   0x52
 .byte   0x4d
 .byte   0x64
-.long   pe_header - efi_head/* Offset to the PE header. */
-
+#ifndef CONFIG_ARM_EFI
+.long   0/* 0 means no PE header. */
+#else
+.long   pe_header - efi_head /* Offset to the PE header. */
 /*
  * Add the PE/COFF header to the file.  The address of this header
  * is at offset 0x3c in the file, and is part of Linux "Image"
@@ -279,6 +284,8 @@ section_table:
 .short  0/* NumberOfLineNumbers  (0 for executables) */
 .long   0xe0500020   /* Characteristics (section flags) */
 .align  5
+#endif /* CONFIG_ARM_EFI */
+
 real_start:
 /* BSS should be zeroed when booting without EFI */
 mov   x26, #0/* x26 := skip_zero_bss */
@@ -913,6 +920,8 @@ putn:   ret
 ENTRY(lookup_processor_type)
 mov  x0, #0
 ret
+
+#ifdef CONFIG_ARM_EFI
 /*
  *  Function to transition from EFI loader in C, to Xen entry point.
  *  void noreturn efi_xen_start(void *fdt_ptr, uint32_t fdt_size);
@@ -971,6 +980,8 @@ ENTRY(efi_xen_start)
 b real_start_efi
 ENDPROC(efi_xen_start)
 
+#endif /* CONFIG_ARM_EFI */
+
 /*
  * Local variables:
  * mode: ASM
-- 
2.25.1




Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-12 Thread Chuck Zmudzinski
On 1/12/23 6:03 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow wrote:
>> I think the change Michael suggests is very minimalistic: Move the if
>> condition around xen_igd_reserve_slot() into the function itself and
>> always call it there unconditionally -- basically turning three lines
>> into one. Since xen_igd_reserve_slot() seems very problem specific,
>> Michael further suggests to rename it to something more general. All
>> in all no big changes required.
> 
> yes, exactly.
> 

OK, got it. I can do that along with the other suggestions.

Thanks.



S3 under Xen regression between 6.1.1 and 6.1.3

2023-01-12 Thread Marek Marczykowski-Górecki
Hi,

6.1.3 as PV dom0 crashes when attempting to suspend. 6.1.1 works. The
crash:

[  348.284004] PM: suspend entry (deep)
[  348.289532] Filesystems sync: 0.005 seconds
[  348.291545] Freezing user space processes ... (elapsed 0.000 seconds) 
done.
[  348.292457] OOM killer disabled.
[  348.292462] Freezing remaining freezable tasks ... (elapsed 0.104 
seconds) done.
[  348.396612] printk: Suspending console(s) (use no_console_suspend to 
debug)
[  348.749228] PM: suspend devices took 0.352 seconds
[  348.769713] ACPI: EC: interrupt blocked
[  348.816077] BUG: kernel NULL pointer dereference, address: 
001c
[  348.816080] #PF: supervisor read access in kernel mode
[  348.816081] #PF: error_code(0x) - not-present page
[  348.816083] PGD 0 P4D 0 
[  348.816086] Oops:  [#1] PREEMPT SMP NOPTI
[  348.816089] CPU: 0 PID: 6764 Comm: systemd-sleep Not tainted 
6.1.3-1.fc32.qubes.x86_64 #1
[  348.816092] Hardware name: Star Labs StarBook/StarBook, BIOS 8.01 
07/03/2022
[  348.816093] RIP: e030:acpi_get_wakeup_address+0xc/0x20
[  348.816100] Code: 44 00 00 48 8b 05 04 a3 82 02 c3 cc cc cc cc cc cc cc 
cc cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 48 8b 05 fc 9d 82 02 <8b> 40 
1c c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f
[  348.816103] RSP: e02b:c90042537d08 EFLAGS: 00010246
[  348.816105] RAX:  RBX: 0003 RCX: 
20c49ba5e353f7cf
[  348.816106] RDX: cd19 RSI: 0002ee9a RDI: 
002a051ed42d7694
[  348.816108] RBP: 0003 R08: c90042537ca0 R09: 
82c5e468
[  348.816110] R10: 7ff0 R11:  R12: 

[  348.816111] R13: fff2 R14: 88812206e6c0 R15: 
88812206e6e0
[  348.816121] FS:  7cb49b01eb80() GS:88818940() 
knlGS:
[  348.816123] CS:  e030 DS:  ES:  CR0: 80050033
[  348.816124] CR2: 001c CR3: 00012231a000 CR4: 
00050660
[  348.816131] Call Trace:
[  348.816133]  
[  348.816134]  acpi_pm_prepare+0x1a/0x50
[  348.816141]  suspend_enter+0x94/0x360
[  348.816146]  suspend_devices_and_enter+0x198/0x2b0
[  348.816150]  enter_state+0x18d/0x1f5
[  348.816155]  pm_suspend.cold+0x20/0x6b
[  348.816159]  state_store+0x27/0x60
[  348.816163]  kernfs_fop_write_iter+0x125/0x1c0
[  348.816169]  new_sync_write+0x105/0x190
[  348.816176]  vfs_write+0x211/0x2a0
[  348.816180]  ksys_write+0x67/0xe0
[  348.816183]  do_syscall_64+0x59/0x90
[  348.816188]  ? do_syscall_64+0x69/0x90
[  348.816192]  ? exc_page_fault+0x76/0x170
[  348.816195]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  348.816200] RIP: 0033:0x7cb49c1412f7
[  348.816203] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 
00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 
00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  348.816204] RSP: 002b:7ffc125f63f8 EFLAGS: 0246 ORIG_RAX: 
0001
[  348.816206] RAX: ffda RBX: 0004 RCX: 
7cb49c1412f7
[  348.816208] RDX: 0004 RSI: 7ffc125f64e0 RDI: 
0004
[  348.816209] RBP: 7ffc125f64e0 R08: 5c83d772bca0 R09: 
000d
[  348.816210] R10: 5c83d7727eb0 R11: 0246 R12: 
0004
[  348.816211] R13: 5c83d77272d0 R14: 0004 R15: 
7cb49c213700
[  348.816213]  
[  348.816214] Modules linked in: loop vfat fat snd_hda_codec_hdmi 
snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel 
soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci 
snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core 
snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_hda_codec_realtek 
snd_hda_codec_generic ledtrig_audio snd_soc_core snd_compress ac97_bus 
snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi iTCO_wdt 
intel_pmc_bxt ee1004 iTCO_vendor_support intel_rapl_msr snd_hda_codec 
snd_hda_core snd_hwdep snd_seq snd_seq_device iwlwifi snd_pcm pcspkr joydev 
processor_thermal_device_pci_legacy processor_thermal_device snd_timer snd 
cfg80211 processor_thermal_rfim i2c_i801 processor_thermal_mbox i2c_smbus 
idma64 rfkill processor_thermal_rapl soundcore intel_rapl_common 
int340x_thermal_zone intel_soc_dts_iosf igen6_edac intel_hid intel_pmc_core 
intel_scu_pltdrv sparse_keymap fuse xenfs ip_tables dm_thin_pool
[  348.816259]  dm_persistent_data dm_bio_prison dm_crypt i915 
crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic 
drm_buddy nvme video wmi drm_display_helper nvme_core xhci_pci xhci_pci_renesas 
ghash_clmulni_intel hid_multitouch sha512_ssse3 serio_raw nvme_common cec 
xhci_hcd ttm i2c_hid_acpi i2c_hid pinctrl_tigerlake 

[xen-unstable test] 175739: regressions - trouble: broken/fail/pass

2023-01-12 Thread osstest service owner
flight 175739 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175739/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl-multivcpu broken
 test-armhf-armhf-xl-multivcpu  5 host-install(5)   broken REGR. vs. 175734
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. 
vs. 175734
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 175734

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 175734
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 175734
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 175734
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 175734
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 175734
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175734
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 175734
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 175734
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 175734
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 175734
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175734
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 175734
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 

[xen-unstable-smoke test] 175748: trouble: broken/pass

2023-01-12 Thread osstest service owner
flight 175748 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175748/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-xl  broken
 test-armhf-armhf-xl   5 host-install(5)broken REGR. vs. 175746

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  3edca52ce736297d7fcf293860cd94ef62638052
baseline version:
 xen  6bec713f871f21c6254a5783c1e39867ea828256

Last test of basis   175746  2023-01-12 16:03:41 Z    0 days
Testing same since   175748  2023-01-12 20:01:56 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  broken  
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary

broken-job test-armhf-armhf-xl broken
broken-step test-armhf-armhf-xl host-install(5)

Not pushing.


commit 3edca52ce736297d7fcf293860cd94ef62638052
Author: Andrew Cooper 
Date:   Mon Jan 9 10:58:31 2023 +

x86/vmx: Support for CPUs without model-specific LBR

Ice Lake (server at least) has both architectural LBR and model-specific 
LBR.
Sapphire Rapids does not have model-specific LBR at all.  I.e. On SPR and
later, model_specific_lbr will always be NULL, so we must make changes to
avoid reliably hitting the domain_crash().

The Arch LBR spec states that CPUs without model-specific LBR implement
MSR_DBG_CTL.LBR by discarding writes and always returning 0.

Do this for any CPU for which we lack model-specific LBR information.

Adjust the now-stale comment, now that the Arch LBR spec has created a way 
to
signal "no model specific LBR" to guests.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 
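
(As a rough illustration of the behaviour described in this commit
message, here is a minimal sketch of a write filter: discard the
DEBUGCTL.LBR bit when no model-specific LBR information exists. The
helper and its name are hypothetical, not the actual vmx.c change.)

    #include <stdbool.h>
    #include <stdint.h>

    #define DEBUGCTL_LBR (UINT64_C(1) << 0)  /* IA32_DEBUGCTL bit 0: LBR */

    /* Without model-specific LBR info, writes to the LBR bit are
     * discarded, so the bit always reads back as 0. */
    static uint64_t filter_debugctl_write(uint64_t val, bool have_ms_lbr)
    {
        return have_ms_lbr ? val : (val & ~DEBUGCTL_LBR);
    }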

commit e94af0d58f86c3a914b9cbbf4d9ed3d43b974771
Author: Andrew Cooper 
Date:   Mon Jan 9 11:42:22 2023 +

x86/vmx: Calculate model-specific LBRs once at start of day

There is no point repeating this calculation at runtime, especially as it is
in the fallback path of the WRSMR/RDMSR handlers.

Move the infrastructure higher in vmx.c to avoid forward declarations,
renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
these are model-specific only.

No practical change.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
Reviewed-by: Kevin Tian 

commit e6ee01ad24b6a1c3b922579964deebb119a90a48
Author: Andrew Cooper 
Date:   Tue Jan 3 15:08:56 2023 +

xen/version: Drop compat/kernel.c

kernel.c is mostly in an #ifndef COMPAT guard, because compat/kernel.c
re-includes kernel.c to recompile xen_version() in a compat form.

However, the xen_version hypercall is almost guest-ABI-agnostic; only
XENVER_platform_parameters has a compat split.  Handle this locally, and do
away with the re-include entirely.  Also drop the CHECK_TYPE()'s between 
types
that are simply char-arrays in their native and compat form.

In particular, this removed the final instances of obfuscation via the DO()
macro.

No functional change.  Also saves 2k of .text in the x86 build.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 

commit 73f0696dc1d31a987563184ce1d01cbf5d12d6ab
Author: Andrew Cooper 
Date:   Tue Dec 20 15:51:07 2022 +

public/version: Change xen_feature_info to have a fixed size

 

Re: [GIT PULL] xen: branch for v6.2-rc4

2023-01-12 Thread pr-tracker-bot
The pull request you sent on Wed, 11 Jan 2023 13:25:01 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
> for-linus-6.2-rc4-tag

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/bad8c4a850eaf386df681d951e3afc06bf1c7cf8

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html



[qemu-mainline test] 175743: tolerable FAIL - PUSHED

2023-01-12 Thread osstest service owner
flight 175743 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175743/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemuu-rhel6hvm-amd 7 xen-install fail in 175735 pass in 175743
 test-amd64-i386-pair 11 xen-install/dst_host fail in 175735 pass in 175743
 test-amd64-i386-xl-vhd 21 guest-start/debian.repeat fail in 175735 pass in 
175743
 test-amd64-i386-pair 10 xen-install/src_host   fail pass in 175735
 test-armhf-armhf-xl-rtds  8 xen-boot   fail pass in 175735
 test-armhf-armhf-xl-multivcpu 12 debian-installfail pass in 175735
 test-armhf-armhf-libvirt-qcow2 12 debian-di-installfail pass in 175735
 test-armhf-armhf-libvirt-raw 12 debian-di-install  fail pass in 175735

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail in 175735 
like 175623
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail in 175735 like 
175623
 test-armhf-armhf-xl-rtds15 migrate-support-check fail in 175735 never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail in 175735 never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail in 175735 never 
pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail in 175735 
never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail in 175735 never 
pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail in 175735 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 175623
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 175623
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 175623
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 175623
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175623
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175623
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-check   

Re: [PATCH 06/22] x86: map/unmap pages in restore_all_guests

2023-01-12 Thread Julien Grall

Hi Jan,

On 04/01/2023 10:27, Jan Beulich wrote:

On 23.12.2022 13:22, Julien Grall wrote:

Hi,

On 22/12/2022 11:12, Jan Beulich wrote:

On 16.12.2022 12:48, Julien Grall wrote:

--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -165,7 +165,24 @@ restore_all_guest:
   and   %rsi, %rdi
   and   %r9, %rsi
   add   %rcx, %rdi
-add   %rcx, %rsi
+
+ /*
+  * Without a direct map, we have to map first before copying. We only
+  * need to map the guest root table but not the per-CPU root_pgt,
+  * because the latter is still a xenheap page.
+  */
+pushq %r9
+pushq %rdx
+pushq %rax
+pushq %rdi
+mov   %rsi, %rdi
+shr   $PAGE_SHIFT, %rdi
+callq map_domain_page
+mov   %rax, %rsi
+popq  %rdi
+/* Stash the pointer for unmapping later. */
+pushq %rax
+
   mov   $ROOT_PAGETABLE_FIRST_XEN_SLOT, %ecx
   mov   root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rsi), %r8
   mov   %r8, root_table_offset(SH_LINEAR_PT_VIRT_START)*8(%rdi)
@@ -177,6 +194,14 @@ restore_all_guest:
   sub   $(ROOT_PAGETABLE_FIRST_XEN_SLOT - \
   ROOT_PAGETABLE_LAST_XEN_SLOT - 1) * 8, %rdi
   rep movsq
+
+/* Unmap the page. */
+popq  %rdi
+callq unmap_domain_page
+popq  %rax
+popq  %rdx
+popq  %r9


While the PUSH/POP are part of what I dislike here, I think this wants
doing differently: Establish a mapping when putting in place a new guest
page table, and use the pointer here. This could be a new per-domain
mapping, to limit its visibility.


I have looked at a per-domain approach and this looks way more complex
than the few concise lines here (not to mention the extra memory).


Yes, I do understand that would be a more intrusive change.


I could be persuaded to look at a more intrusive change if there are a 
good reason to do it. To me, at the moment, it mostly seem a matter of 
taste.


So what would we gain from a perdomain mapping?




So I am not convinced this is worth the effort here.

I don't have an other approach in mind. So are you disliking this
approach to the point this will be nacked?


I guess I wouldn't nack it, but I also wouldn't provide an ack.
I'm curious
what Andrew or Roger think here...


Unfortunately Roger is on parental leave for the next couple of months. 
It would be good to make some progress beforehand. Andrew, what do you 
think?


--
Julien Grall



Re: [PATCH 05/22] x86/srat: vmap the pages for acpi_slit

2023-01-12 Thread Julien Grall

Hi,

On 04/01/2023 10:23, Jan Beulich wrote:

On 23.12.2022 12:31, Julien Grall wrote:

On 20/12/2022 15:30, Jan Beulich wrote:

On 16.12.2022 12:48, Julien Grall wrote:

From: Hongyan Xia 

This avoids the assumption that boot pages are in the direct map.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 


Reviewed-by: Jan Beulich 

However, ...


--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -139,7 +139,8 @@ void __init acpi_numa_slit_init(struct acpi_table_slit 
*slit)
return;
}
mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
-   acpi_slit = mfn_to_virt(mfn_x(mfn));
+   acpi_slit = vmap_contig_pages(mfn, PFN_UP(slit->header.length));


... with the increased use of vmap space the VA range used will need
growing. And that's perhaps better done ahead of time than late.


I will have a look at increasing the vmap() area.




+   BUG_ON(!acpi_slit);


Similarly relevant for the earlier patch: It would be nice if boot
failure for optional things like NUMA data could be avoided.


If you can't map (or allocate the memory), then you are probably in a
very bad situation because both should really not fail at boot.

So I think this is correct to crash early because the admin will be able
to look what went wrong. Otherwise, it may be missed in the noise.


Well, I certainly can see one taking this view. However, at least in
principle allocation (or mapping) may fail _because_ of NUMA issues.


Right. I read this as the user will likely want to add "numa=off" on the 
command line.



At which point it would be better to boot with NUMA support turned off.

I have to disagree with "better" here. This may work for a user with a 
handful of hosts. But for a large-scale setup, you will really want an 
early failure rather than a host booting with an expected feature 
disabled (the NUMA issue may be broken HW).


It is better to fail and then ask the user to specify "numa=off". At
least the person made a conscious decision to turn off the feature.

I am curious to hear the opinion from the others.

Cheers,

--
Julien Grall



Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-12 Thread Michael S. Tsirkin
On Thu, Jan 12, 2023 at 10:55:25PM +, Bernhard Beschow wrote:
> I think the change Michael suggests is very minimalistic: Move the if
> condition around xen_igd_reserve_slot() into the function itself and
> always call it there unconditionally -- basically turning three lines
> into one. Since xen_igd_reserve_slot() seems very problem specific,
> Michael further suggests renaming it to something more general. All
> in all, no big changes are required.

yes, exactly.

-- 
MST




Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-12 Thread Bernhard Beschow



On 12 January 2023 20:11:54 UTC, Chuck Zmudzinski wrote:
>On 1/12/23 2:18 PM, Bernhard Beschow wrote:
>> 
>> 
>> On 11 January 2023 15:40:24 UTC, Chuck Zmudzinski wrote:
>>>On 1/10/23 3:16 AM, Michael S. Tsirkin wrote:
 On Tue, Jan 10, 2023 at 02:08:34AM -0500, Chuck Zmudzinski wrote:
> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
> as noted in docs/igd-assign.txt in the Qemu source code.
> 
> Currently, when the xl toolstack is used to configure a Xen HVM guest with
> Intel IGD passthrough to the guest with the Qemu upstream device model,
> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will 
> occupy
> a different slot. This problem often prevents the guest from booting.
> 
> The only available workaround is not good: Configure Xen HVM guests to use
> the old and no longer maintained Qemu traditional device model available
> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
> 
> To implement this feature in the Qemu upstream device model for Xen HVM
> guests, introduce the following new functions, types, and macros:
> 
> * XEN_PT_DEVICE_CLASS declaration, based on the existing 
> TYPE_XEN_PT_DEVICE
> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
> * typedef XenPTQdevRealize function pointer
> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
> * xen_igd_reserve_slot and xen_igd_clear_slot functions
> 
> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
> the xl toolstack with the gfx_passthru option enabled, which sets the
> igd-passthru=on option to Qemu for the Xen HVM machine type.
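> 
> (As a rough illustration of the mechanism described above, a minimal
> sketch assuming QEMU's slot_reserved_mask semantics of one bit per
> slot; this is not the patch's verbatim code:)
> 
>     /* Reserve PCI slot 2 by setting bit 2 of the bus mask.  The macro
>      * name follows the patch's description; the body is illustrative. */
>     #define XEN_PCI_IGD_SLOT_MASK  (1UL << 2)
> 
>     static void reserve_igd_slot(PCIBus *pci_bus)
>     {
>         if (!xen_igd_gfx_pt_enabled()) {
>             return;  /* only when igd-passthru=on was requested */
>         }
>         pci_bus->slot_reserved_mask |= XEN_PCI_IGD_SLOT_MASK;
>     }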
> 
> The new xen_igd_reserve_slot function also needs to be implemented in
> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
> when Qemu is configured with --enable-xen and 
> --disable-xen-pci-passthrough,
> in which case it does nothing.
> 
> The new xen_igd_clear_slot function overrides qdev->realize of the parent
> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
> 
> Move the call to xen_host_pci_device_get, and the associated error
> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
> initialize the device class and vendor values which enables the checks for
> the Intel IGD to succeed. The verification that the host device is an
> Intel IGD to be passed through is done by checking the domain, bus, slot,
> and function values as well as by checking that gfx_passthru is enabled,
> the device class is VGA, and the device vendor is Intel.
> 
> Signed-off-by: Chuck Zmudzinski 
> ---
> Notes that might be helpful to reviewers of patched code in hw/xen:
> 
> The new functions and types are based on recommendations from Qemu docs:
> https://qemu.readthedocs.io/en/latest/devel/qom.html
> 
> Notes that might be helpful to reviewers of patched code in hw/i386:
> 
> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
> not affect builds that do not have CONFIG_XEN defined.
> 
> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
> existing function that is only true when Qemu is built with
> xen-pci-passthrough enabled and the administrator has configured the Xen
> HVM guest with Qemu's igd-passthru=on option.
> 
> v2: Remove From:  tag at top of commit message
> 
> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
> 
> if (is_igd_vga_passthrough(&s->real_device) &&
> (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
> 
> is changed to
> 
> if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
> && (s->hostaddr.function == 0)) {
> 
> I hoped that I could use the test in v2, since it matches the
> other tests for the Intel IGD in Qemu and Xen, but those tests
> do not work because the necessary data structures are not set with
> their values yet. So instead use the test that the administrator
> has enabled gfx_passthru and the device address on the host is
> 02.0. This test does detect the Intel IGD correctly.
> 
> v4: Use brchu...@aol.com instead of brchu...@netscape.net for the author's
> email address to match the address used by the same author in commits
> be9c61da and c0e86b76
> 
> Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
> 
> v5: The patch of xen_pt.c was re-worked to allow a more consistent test
> 

Re: [PATCH v3 15/18] xen/arm64: mm: Introduce helpers to prepare/enable/disable the identity mapping

2023-01-12 Thread Julien Grall

Hi,

On 13/12/2022 01:41, Stefano Stabellini wrote:

diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index fdbf68aadcaa..e7a80fecec14 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -168,6 +168,17 @@ int map_range_to_domain(const struct dt_device_node *dev,
  
  extern const char __ro_after_init_start[], __ro_after_init_end[];
  
+extern DEFINE_BOOT_PAGE_TABLE(boot_pgtable);

+
+#ifdef CONFIG_ARM_64
+extern DEFINE_BOOT_PAGE_TABLE(boot_first_id);
+#endif
+extern DEFINE_BOOT_PAGE_TABLE(boot_second_id);
+extern DEFINE_BOOT_PAGE_TABLE(boot_third_id);


This is more a matter of taste but I would either:
- define extern all BOOT_PAGE_TABLEs here both ARM64 and ARM32 with
   #ifdefs


A grep of BOOT_PAGE_TABLE shows that they are all defined in setup.h.


- or define all the ARM64 only BOOT_PAGE_TABLE in arm64/mm.h and all the
   ARM32 only BOOT_PAGE_TABLE in arm32/mm.h
Right now we have a mix, as we have boot_first_id with a #ifdef here
and we have xen_pgtable in arm64/mm.h
We are talking about two distinct sets of page-tables. One is used at 
runtime (i.e. xen_pgtable) and the others are for boot/SMP bring-up.


So adding the boot_* in setup.h is correct. As I wrote earlier, setup.h 
would need a split. But this is not something I really want to handle 
here...




Also we are missing boot_second and boot_third. We might as well be
consistent and declare them all?


My plan is really to kill boot_second and boot_third. So I don't really 
want to export them right now (even temporarily).


In any case, I don't think such a change belongs in this patch (it is 
already complex enough).



+/* Find where Xen will be residing at runtime and return an PT entry */
+lpae_t pte_of_xenaddr(vaddr_t);
+
  #endif
  /*
   * Local variables:
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 0cf7ad4f0e8c..39e0d9e03c9c 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -93,7 +93,7 @@ DEFINE_BOOT_PAGE_TABLE(boot_third);
  
  #ifdef CONFIG_ARM_64

  #define HYP_PT_ROOT_LEVEL 0
-static DEFINE_PAGE_TABLE(xen_pgtable);
+DEFINE_PAGE_TABLE(xen_pgtable);
  static DEFINE_PAGE_TABLE(xen_first);
  #define THIS_CPU_PGTABLE xen_pgtable
  #else
@@ -388,7 +388,7 @@ void flush_page_to_ram(unsigned long mfn, bool sync_icache)
  invalidate_icache();
  }
  
-static inline lpae_t pte_of_xenaddr(vaddr_t va)

+lpae_t pte_of_xenaddr(vaddr_t va)
  {
  paddr_t ma = va + phys_offset;
  
@@ -495,6 +495,8 @@ void __init setup_pagetables(unsigned long boot_phys_offset)
  
  phys_offset = boot_phys_offset;
  
+arch_setup_page_tables();

+
  #ifdef CONFIG_ARM_64
  pte = pte_of_xenaddr((uintptr_t)xen_first);
  pte.pt.table = 1;
--
2.38.1



Cheers,

--
Julien Grall



Re: [RFC][PATCH 2/6] x86/power: Inline write_cr[04]()

2023-01-12 Thread Kees Cook
On Thu, Jan 12, 2023 at 03:31:43PM +0100, Peter Zijlstra wrote:
> Since we can't do CALL/RET until GS is restored and CR[04] pinning is
> of dubious value in this code path, simply write the stored values.
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Reviewed-by: Kees Cook 

-- 
Kees Cook



[linux-linus test] 175737: regressions - FAIL

2023-01-12 Thread osstest service owner
flight 175737 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175737/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-libvirt-xsm  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-libvirt-qcow2  8 xen-boot   fail REGR. vs. 173462
 test-amd64-amd64-libvirt-raw  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-libvirt-pair 12 xen-boot/src_host   fail REGR. vs. 173462
 test-amd64-amd64-libvirt-pair 13 xen-boot/dst_host   fail REGR. vs. 173462
 test-amd64-coresched-amd64-xl  8 xen-bootfail REGR. vs. 173462
 test-amd64-amd64-qemuu-nested-intel  8 xen-boot  fail REGR. vs. 173462
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 173462
 test-amd64-amd64-xl-qemuu-ws16-amd64  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-pair12 xen-boot/src_hostfail REGR. vs. 173462
 test-amd64-amd64-pair13 xen-boot/dst_hostfail REGR. vs. 173462
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-examine-bios  8 reboot  fail REGR. vs. 173462
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 8 xen-boot fail REGR. 
vs. 173462
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 8 xen-boot fail REGR. 
vs. 173462
 test-amd64-amd64-xl-qemut-win7-amd64  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 173462
 test-amd64-amd64-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-qemuu-nested-amd  8 xen-bootfail REGR. vs. 173462
 test-amd64-amd64-xl-xsm   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-pvhv2-amd  8 xen-bootfail REGR. vs. 173462
 test-amd64-amd64-freebsd12-amd64  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-examine-uefi  8 reboot  fail REGR. vs. 173462
 test-amd64-amd64-xl-qemut-ws16-amd64  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-qemut-debianhvm-amd64  8 xen-bootfail REGR. vs. 173462
 test-amd64-amd64-freebsd11-amd64  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-seattle   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-pygrub   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-multivcpu  8 xen-bootfail REGR. vs. 173462
 test-amd64-amd64-xl-shadow8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-libvirt  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-pvhv2-intel  8 xen-boot  fail REGR. vs. 173462
 test-arm64-arm64-xl-xsm   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit2   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-pvshim8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  8 xen-bootfail REGR. vs. 173462
 test-amd64-amd64-xl   8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 8 xen-boot fail REGR. vs. 
173462
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt-qcow2  8 xen-boot   fail REGR. vs. 173462
 test-amd64-amd64-xl-qemuu-ovmf-amd64  8 xen-boot fail REGR. vs. 173462
 test-amd64-amd64-examine  8 reboot   fail REGR. vs. 173462
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 8 xen-boot fail REGR. vs. 
173462
 test-amd64-amd64-xl-credit2   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 173462
 test-arm64-arm64-libvirt-xsm  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-vhd   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl-credit1   8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-libvirt-raw  8 xen-boot fail REGR. vs. 173462
 test-arm64-arm64-xl   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-multivcpu  8 xen-bootfail REGR. vs. 173462
 test-armhf-armhf-xl-arndale   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-xl-credit2   8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-libvirt  8 xen-boot fail REGR. vs. 173462
 test-armhf-armhf-examine  8 reboot 

Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-12 Thread Chuck Zmudzinski
On 1/12/23 2:18 PM, Bernhard Beschow wrote:
> 
> 
> On 11 January 2023 15:40:24 UTC, Chuck Zmudzinski wrote:
>>On 1/10/23 3:16 AM, Michael S. Tsirkin wrote:
>>> On Tue, Jan 10, 2023 at 02:08:34AM -0500, Chuck Zmudzinski wrote:
 Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
 as noted in docs/igd-assign.txt in the Qemu source code.
 
 Currently, when the xl toolstack is used to configure a Xen HVM guest with
 Intel IGD passthrough to the guest with the Qemu upstream device model,
 a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
 a different slot. This problem often prevents the guest from booting.
 
 The only available workaround is not good: Configure Xen HVM guests to use
 the old and no longer maintained Qemu traditional device model available
 from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
 
 To implement this feature in the Qemu upstream device model for Xen HVM
 guests, introduce the following new functions, types, and macros:
 
 * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
 * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
 * typedef XenPTQdevRealize function pointer
 * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
 * xen_igd_reserve_slot and xen_igd_clear_slot functions
 
 The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
 member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
 the xl toolstack with the gfx_passthru option enabled, which sets the
 igd-passthru=on option to Qemu for the Xen HVM machine type.
 
 The new xen_igd_reserve_slot function also needs to be implemented in
 hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
 when Qemu is configured with --enable-xen and 
 --disable-xen-pci-passthrough,
 in which case it does nothing.
 
 The new xen_igd_clear_slot function overrides qdev->realize of the parent
 PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
 since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
 created in hw/i386/pc_piix.c for the case when igd-passthru=on.
 
 Move the call to xen_host_pci_device_get, and the associated error
 handling, from xen_pt_realize to the new xen_igd_clear_slot function to
 initialize the device class and vendor values which enables the checks for
 the Intel IGD to succeed. The verification that the host device is an
 Intel IGD to be passed through is done by checking the domain, bus, slot,
 and function values as well as by checking that gfx_passthru is enabled,
 the device class is VGA, and the device vendor is Intel.
 
 Signed-off-by: Chuck Zmudzinski 
 ---
 Notes that might be helpful to reviewers of patched code in hw/xen:
 
 The new functions and types are based on recommendations from Qemu docs:
 https://qemu.readthedocs.io/en/latest/devel/qom.html
 
 Notes that might be helpful to reviewers of patched code in hw/i386:
 
 The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
 not affect builds that do not have CONFIG_XEN defined.
 
 xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
 existing function that is only true when Qemu is built with
 xen-pci-passthrough enabled and the administrator has configured the Xen
 HVM guest with Qemu's igd-passthru=on option.
 
 v2: Remove From:  tag at top of commit message
 
 v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
 
 if (is_igd_vga_passthrough(&s->real_device) &&
 (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
 
 is changed to
 
 if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
 && (s->hostaddr.function == 0)) {
 
 I hoped that I could use the test in v2, since it matches the
 other tests for the Intel IGD in Qemu and Xen, but those tests
 do not work because the necessary data structures are not set with
 their values yet. So instead use the test that the administrator
 has enabled gfx_passthru and the device address on the host is
 02.0. This test does detect the Intel IGD correctly.
 
 v4: Use brchu...@aol.com instead of brchu...@netscape.net for the author's
 email address to match the address used by the same author in commits
 be9c61da and c0e86b76
 
 Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
 
 v5: The patch of xen_pt.c was re-worked to allow a more consistent test
 for the Intel IGD that uses the same criteria as in other places.
 This involved moving the call to xen_host_pci_device_get from
 xen_pt_realize to 

Re: [PATCH] x86/paravirt: merge activate_mm and dup_mmap callbacks

2023-01-12 Thread Boris Ostrovsky



On 1/12/23 10:21 AM, Juergen Gross wrote:

The two paravirt callbacks .mmu.activate_mm and .mmu.dup_mmap are
sharing the same implementations in all cases: for Xen PV guests they
are pinning the PGD of the new mm_struct, and for all other cases
they are a NOP.

So merge them to a common callback .mmu.enter_mmap (in contrast to the
corresponding already existing .mmu.exit_mmap).

As the first parameter of the old callbacks isn't used, drop it from
the replacement one.

Signed-off-by: Juergen Gross 
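
(As a rough illustration, the merged hook's shape might look like the
following sketch, with the unused first parameter dropped; hypothetical,
not necessarily the patch's exact code:)

    /* One callback replacing .mmu.activate_mm and .mmu.dup_mmap.  For
     * Xen PV it pins the PGD of the new mm; the default is a no-op. */
    static inline void paravirt_enter_mmap(struct mm_struct *next)
    {
        PVOP_VCALL1(mmu.enter_mmap, next);
    }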



Reviewed-by: Boris Ostrovsky 




Re: [PATCH v3 08/18] xen/arm32: head: Introduce an helper to flush the TLBs

2023-01-12 Thread Julien Grall




On 14/12/2022 14:24, Michal Orzel wrote:

Hi Julien,


Hi Michal,


On 12/12/2022 10:55, Julien Grall wrote:



From: Julien Grall 

The sequence for flushing the TLBs is 4 instruction long and often
requires an explanation how it works.

So create an helper and use it in the boot code (switch_ttbr() is left

Here and in title: s/an helper/a helper/


Done.


alone for now).

Could you explain why?


So we need to decide how we expect switch_ttbr() to work. E.g. if Xen is 
relocated to a different address, should the caller take care of the 
instruction/branch predictor flush?


I have expanded it to "switch_ttbr() is left alone until we decide the 
semantics of the call".




Note that in secondary_switched, we were also flushing the instruction
cache and branch predictor. Neither of them was necessary because:
 * We are only supporting IVIPT cache on arm32, so the instruction
   cache flush is only necessary when executable code is modified.
   None of the boot code is doing that.
 * The instruction cache is not invalidated and misprediction is not
   a problem at boot.

Signed-off-by: Julien Grall 


Apart from that, the patch is good, so:
Reviewed-by: Michal Orzel 

Thanks!





---
 Changes in v3:
 * Fix typo
 * Update the documentation
 * Rename the argument from tmp1 to tmp
---
  xen/arch/arm/arm32/head.S | 30 +-
  1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 40c1d7502007..315abbbaebec 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -66,6 +66,20 @@
  add   \rb, \rb, r10
  .endm

+/*
+ * Flush local TLBs
+ *
+ * @tmp:Scratch register

As you are respinning a series anyway, could you add just one space after @tmp:?


Ok.

Cheers,

--
Julien Grall



[xen-unstable-smoke test] 175746: tolerable all pass - PUSHED

2023-01-12 Thread osstest service owner
flight 175746 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175746/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  6bec713f871f21c6254a5783c1e39867ea828256
baseline version:
 xen  661489874e87c0f6e21ac298b039aab9379f6ee0

Last test of basis   175741  2023-01-12 11:01:58 Z    0 days
Testing same since   175746  2023-01-12 16:03:41 Z    0 days    1 attempts


People who touched revisions under test:
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   661489874e..6bec713f87  6bec713f871f21c6254a5783c1e39867ea828256 -> smoke



Re: [PATCH v8] xen/pt: reserve PCI slot 2 for Intel igd-passthru

2023-01-12 Thread Bernhard Beschow



On 11 January 2023 15:40:24 UTC, Chuck Zmudzinski wrote:
>On 1/10/23 3:16 AM, Michael S. Tsirkin wrote:
>> On Tue, Jan 10, 2023 at 02:08:34AM -0500, Chuck Zmudzinski wrote:
>>> Intel specifies that the Intel IGD must occupy slot 2 on the PCI bus,
>>> as noted in docs/igd-assign.txt in the Qemu source code.
>>> 
>>> Currently, when the xl toolstack is used to configure a Xen HVM guest with
>>> Intel IGD passthrough to the guest with the Qemu upstream device model,
>>> a Qemu emulated PCI device will occupy slot 2 and the Intel IGD will occupy
>>> a different slot. This problem often prevents the guest from booting.
>>> 
>>> The only available workaround is not good: Configure Xen HVM guests to use
>>> the old and no longer maintained Qemu traditional device model available
>>> from xenbits.xen.org which does reserve slot 2 for the Intel IGD.
>>> 
>>> To implement this feature in the Qemu upstream device model for Xen HVM
>>> guests, introduce the following new functions, types, and macros:
>>> 
>>> * XEN_PT_DEVICE_CLASS declaration, based on the existing TYPE_XEN_PT_DEVICE
>>> * XEN_PT_DEVICE_GET_CLASS macro helper function for XEN_PT_DEVICE_CLASS
>>> * typedef XenPTQdevRealize function pointer
>>> * XEN_PCI_IGD_SLOT_MASK, the value of slot_reserved_mask to reserve slot 2
>>> * xen_igd_reserve_slot and xen_igd_clear_slot functions
>>> 
>>> The new xen_igd_reserve_slot function uses the existing slot_reserved_mask
>>> member of PCIBus to reserve PCI slot 2 for Xen HVM guests configured using
>>> the xl toolstack with the gfx_passthru option enabled, which sets the
>>> igd-passthru=on option to Qemu for the Xen HVM machine type.
>>> 
>>> The new xen_igd_reserve_slot function also needs to be implemented in
>>> hw/xen/xen_pt_stub.c to prevent FTBFS during the link stage for the case
>>> when Qemu is configured with --enable-xen and --disable-xen-pci-passthrough,
>>> in which case it does nothing.
>>> 
>>> The new xen_igd_clear_slot function overrides qdev->realize of the parent
>>> PCI device class to enable the Intel IGD to occupy slot 2 on the PCI bus
>>> since slot 2 was reserved by xen_igd_reserve_slot when the PCI bus was
>>> created in hw/i386/pc_piix.c for the case when igd-passthru=on.
>>> 
>>> Move the call to xen_host_pci_device_get, and the associated error
>>> handling, from xen_pt_realize to the new xen_igd_clear_slot function to
>>> initialize the device class and vendor values which enables the checks for
>>> the Intel IGD to succeed. The verification that the host device is an
>>> Intel IGD to be passed through is done by checking the domain, bus, slot,
>>> and function values as well as by checking that gfx_passthru is enabled,
>>> the device class is VGA, and the device vendor is Intel.
>>> 
>>> Signed-off-by: Chuck Zmudzinski 
>>> ---
>>> Notes that might be helpful to reviewers of patched code in hw/xen:
>>> 
>>> The new functions and types are based on recommendations from Qemu docs:
>>> https://qemu.readthedocs.io/en/latest/devel/qom.html
>>> 
>>> Notes that might be helpful to reviewers of patched code in hw/i386:
>>> 
>>> The small patch to hw/i386/pc_piix.c is protected by CONFIG_XEN so it does
>>> not affect builds that do not have CONFIG_XEN defined.
>>> 
>>> xen_igd_gfx_pt_enabled() in the patched hw/i386/pc_piix.c file is an
>>> existing function that is only true when Qemu is built with
>>> xen-pci-passthrough enabled and the administrator has configured the Xen
>>> HVM guest with Qemu's igd-passthru=on option.
>>> 
>>> v2: Remove From:  tag at top of commit message
>>> 
>>> v3: Changed the test for the Intel IGD in xen_igd_clear_slot:
>>> 
>>> if (is_igd_vga_passthrough(&s->real_device) &&
>>> (s->real_device.vendor_id == PCI_VENDOR_ID_INTEL)) {
>>> 
>>> is changed to
>>> 
>>> if (xen_igd_gfx_pt_enabled() && (s->hostaddr.slot == 2)
>>> && (s->hostaddr.function == 0)) {
>>> 
>>> I hoped that I could use the test in v2, since it matches the
>>> other tests for the Intel IGD in Qemu and Xen, but those tests
>>> do not work because the necessary data structures are not set with
>>> their values yet. So instead use the test that the administrator
>>> has enabled gfx_passthru and the device address on the host is
>>> 02.0. This test does detect the Intel IGD correctly.
>>> 
>>> v4: Use brchu...@aol.com instead of brchu...@netscape.net for the author's
>>> email address to match the address used by the same author in commits
>>> be9c61da and c0e86b76
>>> 
>>> Change variable for XEN_PT_DEVICE_CLASS: xptc changed to xpdc
>>> 
>>> v5: The patch of xen_pt.c was re-worked to allow a more consistent test
>>> for the Intel IGD that uses the same criteria as in other places.
>>> This involved moving the call to xen_host_pci_device_get from
>>> xen_pt_realize to xen_igd_clear_slot and updating the checks for the
>>> Intel IGD in xen_igd_clear_slot:
>>> 
>>> if (xen_igd_gfx_pt_enabled() && 

[ovmf test] 175747: all pass - PUSHED

2023-01-12 Thread osstest service owner
flight 175747 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175747/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 9d70d8f20d0feee1d232cbf86fc87147ce92c2cb
baseline version:
 ovmf e5ec3ba409b5baa9cf429cc25fdf3c8d1b8dcef0

Last test of basis   175740  2023-01-12 10:40:46 Z    0 days
Testing same since   175747  2023-01-12 16:10:44 Z    0 days    1 attempts


People who touched revisions under test:
  Dionna Glaze 
  Jiewen Yao 
  Sophia Wolf 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   e5ec3ba409..9d70d8f20d  9d70d8f20d0feee1d232cbf86fc87147ce92c2cb -> 
xen-tested-master



Re: [PATCH v2 3/8] x86/iommu: iommu_igfx, iommu_qinval and iommu_snoop are VT-d specific

2023-01-12 Thread Andrew Cooper
On 12/01/2023 3:43 pm, Xenia Ragiadakou wrote:
>
> On 1/12/23 13:49, Xenia Ragiadakou wrote:
>>
>> On 1/12/23 13:31, Jan Beulich wrote:
>>> On 04.01.2023 09:44, Xenia Ragiadakou wrote:
>>>
 --- a/xen/include/xen/iommu.h
 +++ b/xen/include/xen/iommu.h
 @@ -74,9 +74,13 @@ extern enum __packed iommu_intremap {
  iommu_intremap_restricted,
  iommu_intremap_full,
   } iommu_intremap;
 -extern bool iommu_igfx, iommu_qinval, iommu_snoop;
   #else
   # define iommu_intremap false
 +#endif
 +
 +#ifdef CONFIG_INTEL_IOMMU
 +extern bool iommu_igfx, iommu_qinval, iommu_snoop;
 +#else
   # define iommu_snoop false
   #endif
>>>
>>> Do these declarations really need touching? In patch 2 you didn't move
>>> amd_iommu_perdev_intremap's either.
>>
>> Ok, I will revert this change (as I did in v2 of patch 2) since it is
>> not needed.
>
> Actually, my patch was altering the current behavior by defining
> iommu_snoop as false when !INTEL_IOMMU.
>
> IIUC, there is no control over snoop behavior when using the AMD
> iommu. Hence, iommu_snoop should evaluate to true for AMD iommu.
> However, when using the INTEL iommu the user can disable it via the
> "iommu" param, right?
>
> If that's the case then iommu_snoop needs to be moved from vtd/iommu.c
> to x86/iommu.c and iommu_snoop assignment via iommu param needs to be
> guarded by CONFIG_INTEL_IOMMU.
>

Pretty much everything Xen thinks it knows about iommu_snoop is broken.

AMD IOMMUs have had this capability since the outset, but it's the FC
bit (Force Coherent).  On Intel, the capability is optional, and
typically differs between IOMMUs in the same system.

Treating iommu_snoop as a single global is buggy, because (when
available) it's always a per-SBDF control.  It is used to take a TLP and
force it to be coherent even when the device was trying to issue a
non-coherent access.
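
(For illustration, the per-device view could be modelled as in the sketch
below; the struct and helper are hypothetical, not Xen's actual types:)

    #include <stdbool.h>
    #include <stdint.h>

    /* Track force-coherent (snoop) capability per SBDF instead of as a
     * single system-wide boolean. */
    struct pt_dev {
        uint16_t seg;
        uint8_t  bus, devfn;
        bool     force_coherent;  /* VT-d Snoop Control / AMD FC bit */
    };

    static bool can_force_coherent(const struct pt_dev *dev)
    {
        /* Decided from the IOMMU unit covering this device, not from a
         * levelled-down global. */
        return dev->force_coherent;
    }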

Intel systems typically have a dedicated IOMMU for the IGD, which always
issues coherent accesses (its memory access happens as an adjunct to the
LLC, not as something that communicates with the memory controller
directly), so the IOMMU doesn't offer snoop control, and Xen "levels"
this down to "the system can't do snoop control".


Xen is very confused when it comes to cacheability correctness.  I still
have a pile of post-XSA-402 work pending, and it needs to start with
splitting Xen's idea of "domain can use reduced cacheability" from
"domain has a device", and work incrementally from there.

But in terms of snoop_control, it's strictly necessary for the cases
where the guest kernel thinks it is using reduced cacheability, but it
isn't because of something the hypervisor has done.  But beyond that,
forcing snoop behind the back of a guest which is using reduced
cacheability is just a waste of performance.

~Andrew


Re: [PATCH] include/types: move stdlib.h-kind types to common header

2023-01-12 Thread Julien Grall

Hi Jan,

On 12/01/2023 14:01, Jan Beulich wrote:

size_t, ssize_t, and ptrdiff_t are all expected to be uniformly defined
on any ports Xen might gain. In particular I hope new ports can rely on
__SIZE_TYPE__ and __PTRDIFF_TYPE__ being made available by the compiler.

Signed-off-by: Jan Beulich 
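
(A minimal sketch of what relying on those predefines can look like; the
ssize_t line is an assumption, since compilers provide no __SSIZE_TYPE__:)

    typedef __SIZE_TYPE__    size_t;
    typedef __PTRDIFF_TYPE__ ptrdiff_t;
    /* No predefine exists for ssize_t; a port picks a signed
     * counterpart of size_t, e.g. long on LP64 targets. */
    typedef long ssize_t;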


Acked-by: Julien Grall 

I also don't have any strong opinion either way about continuing to use 
types.h or introducing stddef.h.


Cheers,

--
Julien Grall



Re: [PATCH v2 1/8] x86/boot: Sanitise PKRU on boot

2023-01-12 Thread Andrew Cooper
On 12/01/2023 12:47 pm, Jan Beulich wrote:
> On 10.01.2023 18:18, Andrew Cooper wrote:
>> While the reset value of the register is 0, it might not be after kexec/etc.
>> If PKEY0.{WD,AD} have leaked in from an earlier context, construction of a PV
>> dom0 will explode.
>>
>> Sequencing wise, this must come after setting CR4.PKE, and before we touch 
>> any
>> user mappings.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Roger Pau Monné 
>> CC: Wei Liu 
>>
>> For sequencing, it could also come after setting XCR0.PKRU too, but then we'd
>> need to construct an empty XSAVE area to XRSTOR from, and that would be even
>> more horrible to arrange.
> That would be ugly for other reasons as well, I think.

Yeah - I absolutely don't want to go down this route.

>
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -936,6 +936,9 @@ void cpu_init(void)
>>  write_debugreg(6, X86_DR6_DEFAULT);
>>  write_debugreg(7, X86_DR7_DEFAULT);
>>  
>> +if (cpu_has_pku)
>> +wrpkru(0);
> What about the BSP during S3 resume? Shouldn't we play safe there too, just
> in case?

Out of S3, I think it's reasonable to rely on proper reset values
for pkru, and any issues of it being "wrong" should be fixed when we
reload d0v0's XSAVE state.


That said, I'm wanting to try and merge parts of the boot and S3 paths
because we're finding no end of errors/oversights, not least because we
have no automated testing of S3 suspend/resume.  Servers typically don't
implement it, and fixes either come from code inspection, or Qubes
noticing (which is absolutely better than nothing, but not a great
reflection on Xen).

But to merge these things, I first need to finish the work to make
microcode loading properly early, and then fix up some of the feature
detection paths, and cleanly separate feature detection from applying
the chosen configuration, at which point I hope the latter part will be
reusable on the S3 resume path.

I don't expect this work to happen imminently...

~Andrew


Proposal for consistent Kconfig usage by the hypervisor build system

2023-01-12 Thread Jan Beulich
(re-sending with REST on Cc, as requested at the community call)

At present we use a mix of Makefile and Kconfig driven capability checks for
tool chain components involved in the building of the hypervisor.  What approach
is used where is in some part a result of the relatively late introduction of
Kconfig into the build system, but in other places also simply a result of
different taste of different contributors.  Switching to a uniform model,
however, has drawbacks as well:
 - A uniformly Makefile based model is not in line with Linux, where Kconfig is
   actually coming from (at least as far as we're concerned; there may be
   earlier origins).  This model is also being disliked by some community
   members.
 - A uniformly Kconfig based model suffers from a weakness of Kconfig in that
   dependent options are silently turned off when dependencies aren't met.  This
   has the undesirable effect that a carefully crafted .config may be silently
   converted to one with features turned off which were intended to be on.
   While this could be deemed expected behavior when a dependency is also an
   option which was selected by the person configuring the hypervisor, it
   certainly can be surprising when the dependency is an auto-detected tool
   chain capability.  Furthermore there's no automatic re-running of kconfig if
   any part of the tool chain changed.  (Despite knowing of this in principle,
   I've still been hit by this more than once in the past: If one rebuilds a
   tree which wasn't touched for a while, and if some time has already passed
   since the update to the newer component, one may not immediately make the
   connection.)

Therefore I'd like to propose that we use an intermediate model: Detected tool
chain capabilities (and alike) may only be used to control optimization (i.e.
including their use as dependencies for optimization controls) and to establish
the defaults of options.  They may not be used to control functionality, i.e.
they may in particular not be specified as a dependency of an option controlling
functionality.  This way unless defaults were overridden things will build, and
non-default settings will be honored (albeit potentially resulting in a build
failure).

For example

config AS_VMX
def_bool $(as-instr,vmcall)

would be okay (as long as we have fallback code to deal with the case of too
old an assembler; raising the baseline there is a separate topic), but instead
of what we have currently

config XEN_SHSTK
bool "Supervisor Shadow Stacks"
depends on HAS_AS_CET_SS

something like

config XEN_SHSTK
bool "Supervisor Shadow Stacks"
default HAS_AS_CET_SS

would be the way to go.

It was additionally suggested that, for a better user experience, unmet
dependencies which are known to result in build failures (which at times may be
hard to associate back with the original cause) would be re-checked by Makefile
based logic, leading to an early build failure with a comprehensible error
message.  Personally I'd prefer this to be just warnings (first and foremost to
avoid failing the build just because of a broken or stale check), but I can see
that they might be overlooked when there's a lot of other output.  In any event
we may want to try to figure an approach which would make sufficiently sure that
Makefile and Kconfig checks don't go out of sync.

Jan



Re: [PATCH v2 5/8] x86/hvm: Context switch MSR_PKRS

2023-01-12 Thread Andrew Cooper
On 12/01/2023 1:10 pm, Jan Beulich wrote:
> On 10.01.2023 18:18, Andrew Cooper wrote:
>> +static inline void wrpkrs(uint32_t pkrs)
>> +{
>> +uint32_t *this_pkrs = &this_cpu(pkrs);
>> +
>> +if ( *this_pkrs != pkrs )
>> +{
>> +*this_pkrs = pkrs;
>> +
>> +wrmsr_ns(MSR_PKRS, pkrs, 0);
>> +}
>> +}
>> +
>> +static inline void wrpkrs_and_cache(uint32_t pkrs)
>> +{
>> +this_cpu(pkrs) = pkrs;
>> +wrmsr_ns(MSR_PKRS, pkrs, 0);
>> +}
> Just to confirm - there's no anticipation of uses of this in async
> contexts, i.e. there's no concern about the ordering of cache vs hardware
> writes?

No.  The only thing modifying MSR_PKRS does is change how the pagewalk
works for the current thread (specifically, the determination of Access
Rights).  There is no relevance outside of the core, especially for
Xen's local copy of the register value.

What WRMSRNS does guarantee is that older instructions will complete
before the MSR gets updated, and that subsequent instructions won't
start, so WRMSRNS acts "atomically" with respect to instruction order.

Also remember that not all WRMSRs are serialising.  e.g. the X2APIC MSRs
are explicitly not, and this is an oversight in practice for
MSR_X2APIC_ICR at least.

>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -54,6 +54,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  
>>  /* opt_nosmp: If true, secondary processors are ignored. */
>> @@ -1804,6 +1805,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>  if ( opt_invpcid && cpu_has_invpcid )
>>  use_invpcid = true;
>>  
>> +if ( cpu_has_pks )
>> +wrpkrs_and_cache(0); /* Must be before setting CR4.PKS */
> Same question here as for PKRU wrt the BSP during S3 resume.

I had reasoned not, but it turns out that I'm wrong.

It's important to reset the cache back to 0 here.  (Handling PKRU is
different - I'll follow up on the other email..)
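
To illustrate the hazard with a minimal, self-contained sketch (not Xen
code; the lazy-write logic mirrors wrpkrs() above, while s3_resume() is a
purely hypothetical stand-in for the resume path):

#include <stdint.h>

static uint32_t cached_pkrs;   /* stand-in for this_cpu(pkrs) */
static uint32_t hw_pkrs;       /* stand-in for the MSR_PKRS register */

static void wrpkrs(uint32_t pkrs)
{
    if ( cached_pkrs != pkrs ) /* lazy: skip the write if the cache matches */
    {
        cached_pkrs = pkrs;
        hw_pkrs = pkrs;        /* models the WRMSRNS */
    }
}

static void s3_resume(void)
{
    hw_pkrs = 0;               /* hardware reset cleared the real MSR */
    wrpkrs(cached_pkrs);       /* skipped: the stale cache still "matches" */
    /* hw_pkrs == 0 while the cache claims otherwise, which is why this
     * path needs the unconditional wrpkrs_and_cache(0). */
}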

~Andrew


Re: [PATCH v2 3/8] x86/iommu: iommu_igfx, iommu_qinval and iommu_snoop are VT-d specific

2023-01-12 Thread Jan Beulich
On 12.01.2023 16:43, Xenia Ragiadakou wrote:
> On 1/12/23 13:49, Xenia Ragiadakou wrote:
>> On 1/12/23 13:31, Jan Beulich wrote:
>>> On 04.01.2023 09:44, Xenia Ragiadakou wrote:
 --- a/xen/include/xen/iommu.h
 +++ b/xen/include/xen/iommu.h
 @@ -74,9 +74,13 @@ extern enum __packed iommu_intremap {
  iommu_intremap_restricted,
  iommu_intremap_full,
   } iommu_intremap;
 -extern bool iommu_igfx, iommu_qinval, iommu_snoop;
   #else
   # define iommu_intremap false
 +#endif
 +
 +#ifdef CONFIG_INTEL_IOMMU
 +extern bool iommu_igfx, iommu_qinval, iommu_snoop;
 +#else
   # define iommu_snoop false
   #endif
>>>
>>> Do these declarations really need touching? In patch 2 you didn't move
>>> amd_iommu_perdev_intremap's either.
>>
>> Ok, I will revert this change (as I did in v2 of patch 2) since it is 
>> not needed.
> 
> Actually, my patch was altering the current behavior by defining 
> iommu_snoop as false when !INTEL_IOMMU.
> 
> IIUC, there is no control over snoop behavior when using the AMD iommu. 
> Hence, iommu_snoop should evaluate to true for AMD iommu.
> However, when using the INTEL iommu the user can disable it via the 
> "iommu" param, right?

That's the intended behavior, yes, but right now we allow the option
to also affect behavior on AMD - perhaps wrongly so, as there's one
use outside of VT-x and VT-d code. But of course the option is
documented to be there for VT-d only, so one can view it as user
error if it's used on a non-VT-d system.

> If that's the case then iommu_snoop needs to be moved from vtd/iommu.c 
> to x86/iommu.c and iommu_snoop assignment via iommu param needs to be 
> guarded by CONFIG_INTEL_IOMMU.

Or #define to true when !INTEL_IOMMU and keep the variable where it
is.
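
A minimal sketch of that variant (hypothetical placement in
xen/include/xen/iommu.h; the variable itself would stay in vtd/iommu.c):

#ifdef CONFIG_INTEL_IOMMU
extern bool iommu_snoop;
#else
# define iommu_snoop true /* without VT-d there is no snoop control to disable */
#endif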

Jan



Re: [PATCH v2 3/8] x86/iommu: iommu_igfx, iommu_qinval and iommu_snoop are VT-d specific

2023-01-12 Thread Xenia Ragiadakou



On 1/12/23 13:49, Xenia Ragiadakou wrote:


On 1/12/23 13:31, Jan Beulich wrote:

On 04.01.2023 09:44, Xenia Ragiadakou wrote:

--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -82,11 +82,13 @@ static int __init cf_check parse_iommu_param(const char *s)
       else if ( ss == s + 23 && !strncmp(s, "quarantine=scratch-page", 23) )
           iommu_quarantine = IOMMU_quarantine_scratch_page;
   #endif
-#ifdef CONFIG_X86
+#ifdef CONFIG_INTEL_IOMMU
  else if ( (val = parse_boolean("igfx", s, ss)) >= 0 )
  iommu_igfx = val;
  else if ( (val = parse_boolean("qinval", s, ss)) >= 0 )
  iommu_qinval = val;
+#endif


You want to use no_config_param() here as well then.


Yes. I will fix it.




--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -74,9 +74,13 @@ extern enum __packed iommu_intremap {
 iommu_intremap_restricted,
 iommu_intremap_full,
  } iommu_intremap;
-extern bool iommu_igfx, iommu_qinval, iommu_snoop;
  #else
  # define iommu_intremap false
+#endif
+
+#ifdef CONFIG_INTEL_IOMMU
+extern bool iommu_igfx, iommu_qinval, iommu_snoop;
+#else
  # define iommu_snoop false
  #endif


Do these declarations really need touching? In patch 2 you didn't move
amd_iommu_perdev_intremap's either.


Ok, I will revert this change (as I did in v2 of patch 2) since it is 
not needed.


Actually, my patch was altering the current behavior by defining 
iommu_snoop as false when !INTEL_IOMMU.


IIUC, there is no control over snoop behavior when using the AMD iommu. 
Hence, iommu_snoop should evaluate to true for AMD iommu.
However, when using the INTEL iommu the user can disable it via the 
"iommu" param, right?


If that's the case then iommu_snoop needs to be moved from vtd/iommu.c 
to x86/iommu.c and iommu_snoop assignment via iommu param needs to be 
guarded by CONFIG_INTEL_IOMMU.


--
Xenia



[PATCH] x86/paravirt: merge activate_mm and dup_mmap callbacks

2023-01-12 Thread Juergen Gross
The two paravirt callbacks .mmu.activate_mm and .mmu.dup_mmap are
sharing the same implementations in all cases: for Xen PV guests they
are pinning the PGD of the new mm_struct, and for all other cases
they are a NOP.

So merge them to a common callback .mmu.enter_mmap (in contrast to the
corresponding already existing .mmu.exit_mmap).

As the first parameter of the old callbacks isn't used, drop it from
the replacement one.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/mmu_context.h|  4 ++--
 arch/x86/include/asm/paravirt.h   | 14 +++---
 arch/x86/include/asm/paravirt_types.h |  7 ++-
 arch/x86/kernel/paravirt.c|  3 +--
 arch/x86/xen/mmu_pv.c | 12 ++--
 5 files changed, 10 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index b8d40ddeab00..6a14b6c2165c 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -134,7 +134,7 @@ extern void switch_mm_irqs_off(struct mm_struct *prev, 
struct mm_struct *next,
 
 #define activate_mm(prev, next)\
 do {   \
-   paravirt_activate_mm((prev), (next));   \
+   paravirt_enter_mmap(next);  \
switch_mm((prev), (next), NULL);\
 } while (0);
 
@@ -167,7 +167,7 @@ static inline void arch_dup_pkeys(struct mm_struct *oldmm,
 static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
arch_dup_pkeys(oldmm, mm);
-   paravirt_arch_dup_mmap(oldmm, mm);
+   paravirt_enter_mmap(mm);
return ldt_dup_context(oldmm, mm);
 }
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 73e9522db7c1..07bbdceaf35a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -332,16 +332,9 @@ static inline void tss_update_io_bitmap(void)
 }
 #endif
 
-static inline void paravirt_activate_mm(struct mm_struct *prev,
-   struct mm_struct *next)
+static inline void paravirt_enter_mmap(struct mm_struct *next)
 {
-   PVOP_VCALL2(mmu.activate_mm, prev, next);
-}
-
-static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
-{
-   PVOP_VCALL2(mmu.dup_mmap, oldmm, mm);
+   PVOP_VCALL1(mmu.enter_mmap, next);
 }
 
 static inline int paravirt_pgd_alloc(struct mm_struct *mm)
@@ -787,8 +780,7 @@ extern void default_banner(void);
 
 #ifndef __ASSEMBLY__
 #ifndef CONFIG_PARAVIRT_XXL
-static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
+static inline void paravirt_enter_mmap(struct mm_struct *mm)
 {
 }
 #endif
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 8c1da419260f..71bf64b963df 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -164,11 +164,8 @@ struct pv_mmu_ops {
unsigned long (*read_cr3)(void);
void (*write_cr3)(unsigned long);
 
-   /* Hooks for intercepting the creation/use of an mm_struct. */
-   void (*activate_mm)(struct mm_struct *prev,
-   struct mm_struct *next);
-   void (*dup_mmap)(struct mm_struct *oldmm,
-struct mm_struct *mm);
+   /* Hook for intercepting the creation/use of an mm_struct. */
+   void (*enter_mmap)(struct mm_struct *mm);
 
/* Hooks for allocating and freeing a pagetable top-level */
int  (*pgd_alloc)(struct mm_struct *mm);
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 327757afb027..ff1109b9c6cd 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -352,8 +352,7 @@ struct paravirt_patch_template pv_ops = {
.mmu.make_pte   = PTE_IDENT,
.mmu.make_pgd   = PTE_IDENT,
 
-   .mmu.dup_mmap   = paravirt_nop,
-   .mmu.activate_mm= paravirt_nop,
+   .mmu.enter_mmap = paravirt_nop,
 
.mmu.lazy_mode = {
.enter  = paravirt_nop,
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index ee29fb558f2e..b3b8d289b9ab 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -885,14 +885,7 @@ void xen_mm_unpin_all(void)
	spin_unlock(&pgd_lock);
 }
 
-static void xen_activate_mm(struct mm_struct *prev, struct mm_struct *next)
-{
-   spin_lock(&next->page_table_lock);
-   xen_pgd_pin(next);
-   spin_unlock(&next->page_table_lock);
-}
-
-static void xen_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+static void xen_enter_mmap(struct mm_struct *mm)
 {
	spin_lock(&mm->page_table_lock);
xen_pgd_pin(mm);
@@ -2153,8 +2146,7 @@ static const typeof(pv_ops) xen_mmu_ops __initconst = {
.make_p4d = PV_CALLEE_SAVE(xen_make_p4d),
 #endif
 
-

Re: [PATCH v2 6/8] x86/hvm: Enable guest access to MSR_PKRS

2023-01-12 Thread Jan Beulich
On 12.01.2023 15:16, Andrew Cooper wrote:
> On 12/01/2023 1:26 pm, Jan Beulich wrote:
>> The other thing I'd like to understand (and having an answer to this
>> would have been better before re-applying my R-b to this re-based
>> logic) is towards the lack of feature checks here. hvm_get_reg()
>> can be called from other than guest_rdmsr(), for an example see
>> arch_get_info_guest().
> 
> The point is to separate auditing logic (wants to be implemented only
> once) from data shuffling logic (is the value in a register, or the MSR
> lists, or VMCB/VMCS or struct vcpu, etc).  It is always the caller's
> responsibility to confirm that REG exists, and that VAL is suitable for REG.
> 
> arch_get_info_guest() passes MSR_SHADOW_GS_BASE which exists
> unilaterally (because we don't technically do !LM correctly.)
> 
> 
> But this is all discussed in the comment by the function prototypes. 
> I'm not sure how to make that any clearer than it already is.

Okay, and I'm sorry for having looked at the definitions without finding
any helpful comment, but not at the declarations. Certainly sufficient
to confirm that my R-b can remain as you already had it.

Jan



Re: [PATCH v2 4/8] x86: Initial support for WRMSRNS

2023-01-12 Thread Jan Beulich
On 12.01.2023 14:58, Andrew Cooper wrote:
> On 12/01/2023 12:58 pm, Jan Beulich wrote:
>> Do you have any indications towards a CS prefix being the least risky
>> one to use here (or in general)?
> 
> Yes.
> 
> Remember it's the prefix recommended for, and used by,
> -mbranches-within-32B-boundaries to work around the Skylake jmp errata.
> 
> And based on this justification, its also the prefix we use for padding
> on various jmp/call's for retpoline inlining purposes.

While I'm okay with the reply, I'd like to point out that in those cases
address or operand size prefix simply could not have been used, for the
insns in question having explicit operands which would be affected. Which
is unlike the case here.

Jan



Re: [PATCH] include/types: move stdlib.h-kind types to common header

2023-01-12 Thread Jan Beulich
On 12.01.2023 15:22, Andrew Cooper wrote:
> On 12/01/2023 2:01 pm, Jan Beulich wrote:
>> size_t, ssize_t, and ptrdiff_t are all expected to be uniformly defined
>> on any ports Xen might gain. In particular I hope new ports can rely on
>> __SIZE_TYPE__ and __PTRDIFF_TYPE__ being made available by the compiler.
>>
>> Signed-off-by: Jan Beulich 
> 
> Acked-by: Andrew Cooper 

Thanks.

>> ---
>> This is just to start with some hopefully uncontroversial low hanging fruit.
> 
> However, I'd advocate going one step further and making real a
> xen/stddef.h header to match our existing stdbool and stdarg, now that
> we have fully divorced ourselves from the compiler-provided freestanding
> headers.

Hmm, to be honest I'm not convinced. It'll be interesting to see what
other maintainers think about such further moving away from Linux'es
basic model.

> This way, the types are declared in the usual place in a C environment.
> 
> I was then also going to use this approach to start breaking up
> xen/lib.h which is a dumping ground of far too much stuff.  In
> particular, when we have a stddef.h, I think it is entirely reasonable
> to move things like ARRAY_SIZE/count_args()/etc into it, because they
> are entirely standard in the Xen codebase.

Yet these aren't what people would expect to live there. If we
introduce further std*.h, then I think these would better strictly
conform to what the C standard expects to be put there.

Jan



[ovmf test] 175740: all pass - PUSHED

2023-01-12 Thread osstest service owner
flight 175740 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175740/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf e5ec3ba409b5baa9cf429cc25fdf3c8d1b8dcef0
baseline version:
 ovmf fe405f08a09e9f2306c72aa23d8edfbcfaa23bff

Last test of basis   175711  2023-01-10 21:40:39 Z    1 days
Testing same since   175740  2023-01-12 10:40:46 Z    0 days    1 attempts


People who touched revisions under test:
  Gerd Hoffmann 
  Laszlo Ersek 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   fe405f08a0..e5ec3ba409  e5ec3ba409b5baa9cf429cc25fdf3c8d1b8dcef0 -> 
xen-tested-master



[RFC][PATCH 1/6] x86/power: De-paravirt restore_processor_state()

2023-01-12 Thread Peter Zijlstra
Since Xen PV doesn't use restore_processor_state(), and we're going to
have to avoid CALL/RET until at least GS is restored, de-paravirt the
easy bits.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/power/cpu.c |   24 
 1 file changed, 12 insertions(+), 12 deletions(-)

--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -197,25 +197,25 @@ static void notrace __restore_processor_
struct cpuinfo_x86 *c;
 
if (ctxt->misc_enable_saved)
-   wrmsrl(MSR_IA32_MISC_ENABLE, ctxt->misc_enable);
+   native_wrmsrl(MSR_IA32_MISC_ENABLE, ctxt->misc_enable);
/*
 * control registers
 */
/* cr4 was introduced in the Pentium CPU */
 #ifdef CONFIG_X86_32
if (ctxt->cr4)
-   __write_cr4(ctxt->cr4);
+   native_write_cr4(ctxt->cr4);
 #else
 /* CONFIG X86_64 */
-   wrmsrl(MSR_EFER, ctxt->efer);
-   __write_cr4(ctxt->cr4);
+   native_wrmsrl(MSR_EFER, ctxt->efer);
+   native_write_cr4(ctxt->cr4);
 #endif
-   write_cr3(ctxt->cr3);
-   write_cr2(ctxt->cr2);
-   write_cr0(ctxt->cr0);
+   native_write_cr3(ctxt->cr3);
+   native_write_cr2(ctxt->cr2);
+   native_write_cr0(ctxt->cr0);
 
/* Restore the IDT. */
-   load_idt(&ctxt->idt);
+   native_load_idt(&ctxt->idt);
 
/*
 * Just in case the asm code got us here with the SS, DS, or ES
@@ -230,7 +230,7 @@ static void notrace __restore_processor_
 * handlers or in complicated helpers like load_gs_index().
 */
 #ifdef CONFIG_X86_64
-   wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
+   native_wrmsrl(MSR_GS_BASE, ctxt->kernelmode_gs_base);
 #else
loadsegment(fs, __KERNEL_PERCPU);
 #endif
@@ -246,15 +246,15 @@ static void notrace __restore_processor_
loadsegment(ds, ctxt->es);
loadsegment(es, ctxt->es);
loadsegment(fs, ctxt->fs);
-   load_gs_index(ctxt->gs);
+   native_load_gs_index(ctxt->gs);
 
/*
 * Restore FSBASE and GSBASE after restoring the selectors, since
 * restoring the selectors clobbers the bases.  Keep in mind
 * that MSR_KERNEL_GS_BASE is horribly misnamed.
 */
-   wrmsrl(MSR_FS_BASE, ctxt->fs_base);
-   wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
+   native_wrmsrl(MSR_FS_BASE, ctxt->fs_base);
+   native_wrmsrl(MSR_KERNEL_GS_BASE, ctxt->usermode_gs_base);
 #else
loadsegment(gs, ctxt->gs);
 #endif





[RFC][PATCH 4/6] x86/power: Sprinkle some noinstr

2023-01-12 Thread Peter Zijlstra
Ensure no compiler instrumentation sneaks in while restoring the CPU
state. Specifically we can't handle CALL/RET until GS is restored.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/power/cpu.c |   13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -192,7 +192,7 @@ static void fix_processor_context(void)
  * The asm code that gets us here will have restored a usable GDT, although
  * it will be pointing to the wrong alias.
  */
-static void notrace __restore_processor_state(struct saved_context *ctxt)
+static __always_inline void __restore_processor_state(struct saved_context *ctxt)
 {
struct cpuinfo_x86 *c;
 
@@ -235,6 +235,13 @@ static void notrace __restore_processor_
loadsegment(fs, __KERNEL_PERCPU);
 #endif
 
+   /*
+* Definitely wrong, but at this point we should have at least enough
+* to do CALL/RET (consider SKL callthunks) and this avoids having
+* to deal with the noinstr explosion for now :/
+*/
+   instrumentation_begin();
+
/* Restore the TSS, RO GDT, LDT, and usermode-relevant MSRs. */
fix_processor_context();
 
@@ -276,10 +283,12 @@ static void notrace __restore_processor_
 * because some of the MSRs are "emulated" in microcode.
 */
msr_restore_context(ctxt);
+
+   instrumentation_end();
 }
 
 /* Needed by apm.c */
-void notrace restore_processor_state(void)
+void noinstr restore_processor_state(void)
 {
	__restore_processor_state(&saved_context);
 }





[RFC][PATCH 2/6] x86/power: Inline write_cr[04]()

2023-01-12 Thread Peter Zijlstra
Since we can't do CALL/RET until GS is restored and CR[04] pinning is
of dubious value in this code path, simply write the stored values.

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/power/cpu.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -208,11 +208,11 @@ static void notrace __restore_processor_
 #else
 /* CONFIG X86_64 */
native_wrmsrl(MSR_EFER, ctxt->efer);
-   native_write_cr4(ctxt->cr4);
+   asm volatile("mov %0,%%cr4": "+r" (ctxt->cr4) : : "memory");
 #endif
native_write_cr3(ctxt->cr3);
native_write_cr2(ctxt->cr2);
-   native_write_cr0(ctxt->cr0);
+   asm volatile("mov %0,%%cr0": "+r" (ctxt->cr0) : : "memory");
 
/* Restore the IDT. */
	native_load_idt(&ctxt->idt);





[RFC][PATCH 3/6] x86/callthunk: No callthunk for restore_processor_state()

2023-01-12 Thread Peter Zijlstra
From: Joan Bruguera 

When resuming from suspend we don't have coherent CPU state, trying to
do callthunks here isn't going to work. Specifically GS isn't set yet.

Signed-off-by: Joan Bruguera 
Signed-off-by: Peter Zijlstra (Intel) 
Link: https://lkml.kernel.org/r/20230109040531.7888-1-joanbrugue...@gmail.com
---
 arch/x86/kernel/callthunks.c |5 +
 arch/x86/power/cpu.c |3 +++
 2 files changed, 8 insertions(+)

--- a/arch/x86/kernel/callthunks.c
+++ b/arch/x86/kernel/callthunks.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -151,6 +152,10 @@ static bool skip_addr(void *dest)
dest < (void*)hypercall_page + PAGE_SIZE)
return true;
 #endif
+#ifdef CONFIG_PM_SLEEP
+   if (dest == restore_processor_state)
+   return true;
+#endif
return false;
 }
 





[RFC][PATCH 6/6] x86/power: Seal restore_processor_state()

2023-01-12 Thread Peter Zijlstra
Disallow indirect branches to restore_processor_state().

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/suspend_64.h |4 
 arch/x86/power/cpu.c  |2 +-
 arch/x86/power/hibernate_asm_64.S |4 
 include/linux/suspend.h   |4 
 4 files changed, 13 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/suspend_64.h
+++ b/arch/x86/include/asm/suspend_64.h
@@ -9,6 +9,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Image of the saved processor state, used by the low level ACPI suspend to
@@ -61,4 +62,7 @@ struct saved_context {
 extern char core_restore_code[];
 extern char restore_registers[];
 
+#define restore_processor_state restore_processor_state
+extern __noendbr void restore_processor_state(void);
+
 #endif /* _ASM_X86_SUSPEND_64_H */
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -288,7 +288,7 @@ static __always_inline void __restore_pr
 }
 
 /* Needed by apm.c */
-void noinstr restore_processor_state(void)
+void __noendbr noinstr restore_processor_state(void)
 {
	__restore_processor_state(&saved_context);
 }
--- a/arch/x86/power/hibernate_asm_64.S
+++ b/arch/x86/power/hibernate_asm_64.S
@@ -23,6 +23,10 @@
 #include 
 #include 
 
+.pushsection .discard.noendbr
+.quad  restore_processor_state
+.popsection
+
 /* code below belongs to the image kernel */
.align PAGE_SIZE
 SYM_FUNC_START(restore_registers)
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #ifdef CONFIG_VT
 extern void pm_set_vt_switch(int);
@@ -483,7 +484,10 @@ extern struct mutex system_transition_mutex;
 
 #ifdef CONFIG_PM_SLEEP
 void save_processor_state(void);
+
+#ifndef restore_processor_state
 void restore_processor_state(void);
+#endif
 
 /* kernel/power/main.c */
 extern int register_pm_notifier(struct notifier_block *nb);





[RFC][PATCH 5/6] PM / hibernate: Add minimal noinstr annotations

2023-01-12 Thread Peter Zijlstra
When resuming there must not be any code between swsusp_arch_suspend()
and restore_processor_state() since the CPU state is ill defined at
this point in time.

Signed-off-by: Peter Zijlstra (Intel) 
---
 kernel/power/hibernate.c |   30 +++---
 1 file changed, 27 insertions(+), 3 deletions(-)

--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -280,6 +280,32 @@ __weak int arch_resume_nosmt(void)
return 0;
 }
 
+static noinstr int suspend_and_restore(void)
+{
+   int error;
+
+   /*
+* Strictly speaking swsusp_arch_suspend() should be noinstr too but it
+* is typically written in asm, as such, assume it is good and shut up
+* the validator.
+*/
+   instrumentation_begin();
+   error = swsusp_arch_suspend();
+   instrumentation_end();
+
+   /*
+* Architecture resume code 'returns' from the swsusp_arch_suspend()
+* call and resumes execution here with some very dodgy machine state.
+*
+* Compiler instrumentation between these two calls (or in
+* restore_processor_state() for that matter) will make life *very*
+* interesting indeed.
+*/
+   restore_processor_state();
+
+   return error;
+}
+
 /**
  * create_image - Create a hibernation image.
  * @platform_mode: Whether or not to use the platform driver.
@@ -323,9 +349,7 @@ static int create_image(int platform_mode)
in_suspend = 1;
save_processor_state();
trace_suspend_resume(TPS("machine_suspend"), PM_EVENT_HIBERNATE, true);
-   error = swsusp_arch_suspend();
-   /* Restore control flow magically appears here */
-   restore_processor_state();
+   error = suspend_and_restore();
trace_suspend_resume(TPS("machine_suspend"), PM_EVENT_HIBERNATE, false);
if (error)
pr_err("Error %d creating image\n", error);





[RFC][PATCH 0/6] x86: Fix suspend vs retbleed=stuff

2023-01-12 Thread Peter Zijlstra
Hi,

I'm thinking these few patches should do the trick -- but I've only compiled
them and looked at the resulting asm output, I've not actually ran them.

Joan, could you kindly test?

The last (two) patches are optional fixes and should probably not go into 
/urgent.




Re: [PATCH] include/types: move stdlib.h-kind types to common header

2023-01-12 Thread Andrew Cooper
On 12/01/2023 2:01 pm, Jan Beulich wrote:
> size_t, ssize_t, and ptrdiff_t are all expected to be uniformly defined
> on any ports Xen might gain. In particular I hope new ports can rely on
> __SIZE_TYPE__ and __PTRDIFF_TYPE__ being made available by the compiler.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

Thank you for starting this.

> ---
> This is just to start with some hopefully uncontroversial low hanging fruit.

However, I'd advocate going one step further and making real a
xen/stddef.h header to match our existing stdbool and stdarg, now that
we have fully divorced ourselves from the compiler-provided freestanding
headers.

This way, the types are declared in the usual place in a C environment.

I was then also going to use this approach to start breaking up
xen/lib.h which is a dumping ground of far too much stuff.  In
particular, when we have a stddef.h, I think it is entirely reasonable
to move things like ARRAY_SIZE/count_args()/etc into it, because they
are entirely standard in the Xen codebase.

~Andrew


Re: [PATCH v2 6/8] x86/hvm: Enable guest access to MSR_PKRS

2023-01-12 Thread Andrew Cooper
On 12/01/2023 1:26 pm, Jan Beulich wrote:
> On 10.01.2023 18:18, Andrew Cooper wrote:
>> @@ -2471,6 +2477,9 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, 
>> unsigned int reg)
>>  }
>>  return val;
>>  
>> +case MSR_PKRS:
>> +return (v == curr) ? rdpkrs() : msrs->pkrs;
> Nothing here or ...
>
>> @@ -2514,6 +2525,12 @@ static void cf_check vmx_set_reg(struct vcpu *v, 
>> unsigned int reg, uint64_t val)
>>  domain_crash(d);
>>  }
>>  return;
>> +
>> +case MSR_PKRS:
>> +msrs->pkrs = val;
>> +if ( v == curr )
>> +wrpkrs(val);
>> +return;
> ... here is VMX or (if we were to support it, just as an abstract
> consideration) HVM specific. Which makes me wonder why this needs
> handling in [gs]et_reg() in the first place. I guess I'm still not
> fully in sync with your longer term plans here ...

If (when) AMD implement it, the AMD form will need to be vmcb->pkrs and
not msrs->pkrs, because like all other paging controls, they'll be
swapped automatically by VMRUN.

(I don't know this for certain, but I'm happy to bet on it, given a
decade of consistency in this regard.)

> The other thing I'd like to understand (and having an answer to this
> would have been better before re-applying my R-b to this re-based
> logic) is towards the lack of feature checks here. hvm_get_reg()
> can be called from other than guest_rdmsr(), for an example see
> arch_get_info_guest().

The point is to separate auditing logic (wants to be implemented only
once) from data shuffling logic (is the value in a register, or the MSR
lists, or VMCB/VMCS or struct vcpu, etc).  It is always the caller's
responsibility to confirm that REG exists, and that VAL is suitable for REG.

arch_get_info_guest() passes MSR_SHADOW_GS_BASE which exists
unilaterally (because we don't technically do !LM correctly.)


But this is all discussed in the comment by the function prototypes. 
I'm not sure how to make that any clearer than it already is.
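
As a hedged illustration of that split (read_pkrs() and hvm_pks_enabled()
are made-up names for this example, not part of the series):

    /* The caller audits that REG exists; get_reg() only shuffles data. */
    uint64_t read_pkrs(struct vcpu *v)
    {
        ASSERT(hvm_pks_enabled(v));       /* audit, at the call site */
        return hvm_get_reg(v, MSR_PKRS);  /* data movement only */
    }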

~Andrew


[xen-unstable-smoke test] 175741: tolerable all pass - PUSHED

2023-01-12 Thread osstest service owner
flight 175741 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/175741/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl      15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  661489874e87c0f6e21ac298b039aab9379f6ee0
baseline version:
 xen  83d9679db057d5736c7b5a56db06bb6bb66c3914

Last test of basis   175732  2023-01-12 00:00:28 Z    0 days
Testing same since   175741  2023-01-12 11:01:58 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Xenia Ragiadakou 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   83d9679db0..661489874e  661489874e87c0f6e21ac298b039aab9379f6ee0 -> smoke



[PATCH] include/types: move stdlib.h-kind types to common header

2023-01-12 Thread Jan Beulich
size_t, ssize_t, and ptrdiff_t are all expected to be uniformly defined
on any ports Xen might gain. In particular I hope new ports can rely on
__SIZE_TYPE__ and __PTRDIFF_TYPE__ being made available by the compiler.

Signed-off-by: Jan Beulich 
---
This is just to start with some hopefully uncontroversial low hanging fruit.

--- a/xen/arch/arm/include/asm/types.h
+++ b/xen/arch/arm/include/asm/types.h
@@ -54,19 +54,6 @@ typedef u64 register_t;
 #define PRIregister "016lx"
 #endif
 
-#if defined(__SIZE_TYPE__)
-typedef __SIZE_TYPE__ size_t;
-#else
-typedef unsigned long size_t;
-#endif
-typedef signed long ssize_t;
-
-#if defined(__PTRDIFF_TYPE__)
-typedef __PTRDIFF_TYPE__ ptrdiff_t;
-#else
-typedef signed long ptrdiff_t;
-#endif
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* __ARM_TYPES_H__ */
--- a/xen/arch/x86/include/asm/types.h
+++ b/xen/arch/x86/include/asm/types.h
@@ -32,19 +32,6 @@ typedef unsigned long paddr_t;
 #define INVALID_PADDR (~0UL)
 #define PRIpaddr "016lx"
 
-#if defined(__SIZE_TYPE__)
-typedef __SIZE_TYPE__ size_t;
-#else
-typedef unsigned long size_t;
-#endif
-typedef signed long ssize_t;
-
-#if defined(__PTRDIFF_TYPE__)
-typedef __PTRDIFF_TYPE__ ptrdiff_t;
-#else
-typedef signed long ptrdiff_t;
-#endif
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* __X86_TYPES_H__ */
--- a/xen/include/xen/types.h
+++ b/xen/include/xen/types.h
@@ -5,6 +5,19 @@
 
 #include 
 
+#if defined(__SIZE_TYPE__)
+typedef __SIZE_TYPE__ size_t;
+#else
+typedef unsigned long size_t;
+#endif
+typedef signed long ssize_t;
+
+#if defined(__PTRDIFF_TYPE__)
+typedef __PTRDIFF_TYPE__ ptrdiff_t;
+#else
+typedef signed long ptrdiff_t;
+#endif
+
 #define BITS_TO_LONGS(bits) \
 (((bits)+BITS_PER_LONG-1)/BITS_PER_LONG)
 #define DECLARE_BITMAP(name,bits) \



Re: [PATCH v2 4/8] x86: Initial support for WRMSRNS

2023-01-12 Thread Andrew Cooper
On 12/01/2023 12:58 pm, Jan Beulich wrote:
> On 10.01.2023 18:18, Andrew Cooper wrote:
>> WRMSR Non-Serialising is an optimisation intended for cases where an MSR 
>> needs
>> updating, but architectural serialising properties are not needed.
>>
>> It is anticipated that this will apply to most if not all MSRs modified on
>> context switch paths.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 
>
> This will allow me to drop half of what the respective emulator patch
> consists of, which I'm yet to post (but which in turn is sitting on
> top of many other already posted emulator patches). Comparing with
> that patch, one nit though:

I did wonder if you had some stuff queued up.  I do need to get back to
reviewing.

>
>> --- a/tools/misc/xen-cpuid.c
>> +++ b/tools/misc/xen-cpuid.c
>> @@ -189,6 +189,7 @@ static const char *const str_7a1[32] =
>>  
>>  [10] = "fzrm",  [11] = "fsrs",
>>  [12] = "fsrcs",
>> +/* 18 */[19] = "wrmsrns",
>>  };
> We commonly leave a blank line to indicate dis-contiguous entries.

Oops yes.  Will fix.

>
>> --- a/xen/arch/x86/include/asm/msr.h
>> +++ b/xen/arch/x86/include/asm/msr.h
>> @@ -38,6 +38,18 @@ static inline void wrmsrl(unsigned int msr, __u64 val)
>>  wrmsr(msr, lo, hi);
>>  }
>>  
>> +/* Non-serialising WRMSR, when available.  Falls back to a serialising 
>> WRMSR. */
>> +static inline void wrmsr_ns(uint32_t msr, uint32_t lo, uint32_t hi)
>> +{
>> +/*
>> + * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant CS
>> + * prefix to avoid a trailing NOP.
>> + */
>> +alternative_input(".byte 0x2e; wrmsr",
>> +  ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
>> +  "c" (msr), "a" (lo), "d" (hi));
>> +}
> No wrmsrl_ns() and/or wrmsr_ns_safe() variants right away?

I still have a branch cleaning up MSR handling, which has been pending
since the Nanjing XenSummit, which makes some of those disappear.

But no - I wasn't planning to introduce helpers ahead of them being needed.

> Do you have any indications towards a CS prefix being the least risky
> one to use here (or in general)?

Yes.

Remember it's the prefix recommended for, and used by,
-mbranches-within-32B-boundaries to work around the Skylake jmp errata.

And based on this justification, its also the prefix we use for padding
on various jmp/call's for retpoline inlining purposes.

~Andrew


[qemu-mainline test] 175735: regressions - FAIL

2023-01-12 Thread osstest service owner
flight 175735 qemu-mainline real [real]
flight 175742 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/175735/
http://logs.test-lab.xenproject.org/osstest/logs/175742/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install    fail REGR. vs. 175623
 test-amd64-i386-xl-vhd  21 guest-start/debian.repeat fail REGR. vs. 175623

Tests which are failing intermittently (not blocking):
 test-amd64-i386-pair 10 xen-install/src_host fail in 175742 pass in 175735
 test-amd64-i386-pair 11 xen-install/dst_host fail pass in 175742-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop            fail like 175623
 test-armhf-armhf-libvirt     16 saverestore-support-check    fail like 175623
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop             fail like 175623
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop            fail like 175623
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail like 175623
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check    fail like 175623
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 175623
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 175623
 test-amd64-i386-xl-pvshim    14 guest-start                  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt      15 migrate-support-check        fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl          16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check    fail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check    fail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check        fail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-rtds     15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-rtds     16 saverestore-support-check    fail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check       fail   never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check   fail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-check  fail   never pass
 test-armhf-armhf-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      14 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-vhd      15 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check      fail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl-vhd      14 migrate-support-check        fail   never pass

Re: [PATCH v2 6/8] x86/hvm: Enable guest access to MSR_PKRS

2023-01-12 Thread Jan Beulich
On 10.01.2023 18:18, Andrew Cooper wrote:
> @@ -2471,6 +2477,9 @@ static uint64_t cf_check vmx_get_reg(struct vcpu *v, 
> unsigned int reg)
>  }
>  return val;
>  
> +case MSR_PKRS:
> +return (v == curr) ? rdpkrs() : msrs->pkrs;

Nothing here or ...

> @@ -2514,6 +2525,12 @@ static void cf_check vmx_set_reg(struct vcpu *v, 
> unsigned int reg, uint64_t val)
>  domain_crash(d);
>  }
>  return;
> +
> +case MSR_PKRS:
> +msrs->pkrs = val;
> +if ( v == curr )
> +wrpkrs(val);
> +return;

... here is VMX or (if we were to support it, just as an abstract
consideration) HVM specific. Which makes me wonder why this needs
handling in [gs]et_reg() in the first place. I guess I'm still not
fully in sync with your longer term plans here ...

The other thing I'd like to understand (and having an answer to this
would have been better before re-applying my R-b to this re-based
logic) is towards the lack of feature checks here. hvm_get_reg()
can be called from other than guest_rdmsr(), for an example see
arch_get_info_guest().

Jan



Re: [PATCH v2 5/8] x86/hvm: Context switch MSR_PKRS

2023-01-12 Thread Jan Beulich
On 10.01.2023 18:18, Andrew Cooper wrote:
> +static inline void wrpkrs(uint32_t pkrs)
> +{
> +uint32_t *this_pkrs = &this_cpu(pkrs);
> +
> +if ( *this_pkrs != pkrs )
> +{
> +*this_pkrs = pkrs;
> +
> +wrmsr_ns(MSR_PKRS, pkrs, 0);
> +}
> +}
> +
> +static inline void wrpkrs_and_cache(uint32_t pkrs)
> +{
> +this_cpu(pkrs) = pkrs;
> +wrmsr_ns(MSR_PKRS, pkrs, 0);
> +}

Just to confirm - there's no anticipation of uses of this in async
contexts, i.e. there's no concern about the ordering of cache vs hardware
writes?

> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -54,6 +54,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  /* opt_nosmp: If true, secondary processors are ignored. */
> @@ -1804,6 +1805,9 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>  if ( opt_invpcid && cpu_has_invpcid )
>  use_invpcid = true;
>  
> +if ( cpu_has_pks )
> +wrpkrs_and_cache(0); /* Must be before setting CR4.PKS */

Same question here as for PKRU wrt the BSP during S3 resume.

Jan



Re: [PATCH v2 4/8] x86: Initial support for WRMSRNS

2023-01-12 Thread Jan Beulich
On 10.01.2023 18:18, Andrew Cooper wrote:
> WRMSR Non-Serialising is an optimisation intended for cases where an MSR needs
> updating, but architectural serialising properties are not needed.
> 
> It is anticipated that this will apply to most if not all MSRs modified on
> context switch paths.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

This will allow me to drop half of what the respective emulator patch
consists of, which I'm yet to post (but which in turn is sitting on
top of many other already posted emulator patches). Comparing with
that patch, one nit though:

> --- a/tools/misc/xen-cpuid.c
> +++ b/tools/misc/xen-cpuid.c
> @@ -189,6 +189,7 @@ static const char *const str_7a1[32] =
>  
>  [10] = "fzrm",  [11] = "fsrs",
>  [12] = "fsrcs",
> +/* 18 */[19] = "wrmsrns",
>  };

We commonly leave a blank line to indicate dis-contiguous entries.

> --- a/xen/arch/x86/include/asm/msr.h
> +++ b/xen/arch/x86/include/asm/msr.h
> @@ -38,6 +38,18 @@ static inline void wrmsrl(unsigned int msr, __u64 val)
>  wrmsr(msr, lo, hi);
>  }
>  
> +/* Non-serialising WRMSR, when available.  Falls back to a serialising 
> WRMSR. */
> +static inline void wrmsr_ns(uint32_t msr, uint32_t lo, uint32_t hi)
> +{
> +/*
> + * WRMSR is 2 bytes.  WRMSRNS is 3 bytes.  Pad WRMSR with a redundant CS
> + * prefix to avoid a trailing NOP.
> + */
> +alternative_input(".byte 0x2e; wrmsr",
> +  ".byte 0x0f,0x01,0xc6", X86_FEATURE_WRMSRNS,
> +  "c" (msr), "a" (lo), "d" (hi));
> +}

No wrmsrl_ns() and/or wrmsr_ns_safe() variants right away?

Do you have any indications towards a CS prefix being the least risky
one to use here (or in general)? Recognizing that segment prefixes have
gained alternative meaning in certain contexts, I would otherwise wonder
whether an address or operand size prefix wouldn't be more suitable.

Jan



Re: [PATCH v2 1/8] x86/boot: Sanitise PKRU on boot

2023-01-12 Thread Jan Beulich
On 10.01.2023 18:18, Andrew Cooper wrote:
> While the reset value of the register is 0, it might not be after kexec/etc.
> If PKEY0.{WD,AD} have leaked in from an earlier context, construction of a PV
> dom0 will explode.
> 
> Sequencing wise, this must come after setting CR4.PKE, and before we touch any
> user mappings.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Wei Liu 
> 
> For sequencing, it could also come after setting XCR0.PKRU too, but then we'd
> need to construct an empty XSAVE area to XRSTOR from, and that would be even
> more horrible to arrange.

That would be ugly for other reasons as well, I think.

> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -936,6 +936,9 @@ void cpu_init(void)
>   write_debugreg(6, X86_DR6_DEFAULT);
>   write_debugreg(7, X86_DR7_DEFAULT);
>  
> + if (cpu_has_pku)
> + wrpkru(0);

What about the BSP during S3 resume? Shouldn't we play safe there too, just
in case?

Jan



Re: [PATCH v2 5/8] x86/iommu: the code addressing CVE-2011-1898 is VT-d specific

2023-01-12 Thread Xenia Ragiadakou



On 1/12/23 14:01, Jan Beulich wrote:

On 04.01.2023 09:44, Xenia Ragiadakou wrote:

The variable untrusted_msi indicates whether the system is vulnerable to
CVE-2011-1898. This vulnerability is VT-d specific.


As per the reply by Andrew to v1, this vulnerability is generic to intremap-
incapable or intremap-disabled configurations. You want to say so. In turn
I wonder whether instead of the changes you're making you wouldn't want to
move the definition of the variable to xen/drivers/passthrough/x86/iommu.c.
A useful further step might be to guard its definition (not necessarily
its declaration; see replies to earlier patches) by CONFIG_PV instead (of
course I understand that's largely orthogonal to your series here, yet it
would fit easily with moving the definition).
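
A minimal sketch of the suggested movement (with the CONFIG_PV guard being
the "largely orthogonal" extra step mentioned above):

/* xen/drivers/passthrough/x86/iommu.c */
#ifdef CONFIG_PV
bool __read_mostly untrusted_msi;
#endif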


Sure I can do that.




--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -127,7 +127,9 @@ int iommu_identity_mapping(struct domain *d, p2m_access_t 
p2ma,
 unsigned int flag);
  void iommu_identity_map_teardown(struct domain *d);
  
+#ifdef CONFIG_INTEL_IOMMU

  extern bool untrusted_msi;
+#endif


As per above / earlier comments I don't think this part is needed in any
event.


--- a/xen/arch/x86/pv/hypercall.c
+++ b/xen/arch/x86/pv/hypercall.c
@@ -193,8 +193,10 @@ void pv_ring1_init_hypercall_page(void *p)
  
  void do_entry_int82(struct cpu_user_regs *regs)

  {
+#ifdef CONFIG_INTEL_IOMMU
  if ( unlikely(untrusted_msi) )
  check_for_unexpected_msi((uint8_t)regs->entry_vector);
+#endif
  
  _pv_hypercall(regs, true /* compat */);

  }
diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
index ae01285181..8f2fb36770 100644
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -406,11 +406,13 @@ ENTRY(int80_direct_trap)
  .Lint80_cr3_okay:
  sti
  
+#ifdef CONFIG_INTEL_IOMMU

  cmpb  $0,untrusted_msi(%rip)
  UNLIKELY_START(ne, msi_check)
  movl  $0x80,%edi
  call  check_for_unexpected_msi
  UNLIKELY_END(msi_check)
+#endif
  
  movq  STACK_CPUINFO_FIELD(current_vcpu)(%rbx), %rbx
  




--
Xenia



Re: [PATCH v2 4/8] x86/acpi: separate AMD-Vi and VT-d specific functions

2023-01-12 Thread Jan Beulich
On 12.01.2023 13:08, Xenia Ragiadakou wrote:
> On 1/12/23 13:37, Jan Beulich wrote:
>> On 04.01.2023 09:44, Xenia Ragiadakou wrote:
>>> --- a/xen/arch/x86/include/asm/acpi.h
>>> +++ b/xen/arch/x86/include/asm/acpi.h
>>> @@ -140,8 +140,22 @@ extern u32 pmtmr_ioport;
>>>   extern unsigned int pmtmr_width;
>>>   
>>>   void acpi_iommu_init(void);
>>> +
>>> +#ifdef CONFIG_INTEL_IOMMU
>>>   int acpi_dmar_init(void);
>>> +void acpi_dmar_zap(void);
>>> +void acpi_dmar_reinstate(void);
>>> +#else
>>> +static inline int acpi_dmar_init(void) { return -ENODEV; }
>>> +static inline void acpi_dmar_zap(void) {}
>>> +static inline void acpi_dmar_reinstate(void) {}
>>> +#endif
>>
>> Leaving aside my request to drop that part of patch 3, you've kept
>> declarations for VT-d in the common header there. Which I consider
>> correct, knowing that VT-d was also used on IA-64 at the time. As
>> a result I would suppose movement might better be done in the other
>> direction here.
> 
> I moved it to the x86-specific header because acpi_dmar_init() was 
> declared there.
> I can move all of them to the common header.

I would prefer you doing so, yes, of course unless others object.

Jan



Re: [PATCH v2 8/8] x86/iommu: make AMD-Vi and Intel VT-d support configurable

2023-01-12 Thread Jan Beulich
On 04.01.2023 09:45, Xenia Ragiadakou wrote:
> Provide the user with configuration control over the IOMMU support by making
> AMD_IOMMU and INTEL_IOMMU options user selectable and able to be turned off.
> 
> However, there are cases where the IOMMU support is required, for instance for
> a system with more than 254 CPUs. In order to prevent users from unknowingly
> disabling it and ending up with a broken hypervisor, make the support user
> selectable only if EXPERT is enabled.
> 
> To preserve the current default configuration of an x86 system, both options
> depend on X86 and default to Y.
> 
> Signed-off-by: Xenia Ragiadakou 

Acked-by: Jan Beulich 





Re: [PATCH v2 6/8] x86/iommu: call pi_update_irte through an hvm_function callback

2023-01-12 Thread Jan Beulich
On 12.01.2023 13:16, Jan Beulich wrote:
> On 04.01.2023 09:45, Xenia Ragiadakou wrote:
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2143,6 +2143,14 @@ static bool cf_check vmx_test_pir(const struct vcpu 
>> *v, uint8_t vec)
>>  return pi_test_pir(vec, &v->arch.hvm.vmx.pi_desc);
>>  }
>>  
>> +static int cf_check vmx_pi_update_irte(const struct vcpu *v,
>> +   const struct pirq *pirq, uint8_t gvec)
>> +{
>> +const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
>> +
>> +return pi_update_irte(pi_desc, pirq, gvec);
>> +}
> 
> This being the only caller of pi_update_irte(), I don't see the point in
> having the extra wrapper. Adjust pi_update_irte() such that it can be
> used as the intended hook directly. Plus perhaps prefix it with vtd_.

Plus move it to vtd/x86/hvm.c (!HVM builds shouldn't need it), albeit I
realize this could be done independent of your work. In principle the
function shouldn't be VT-d specific (and could hence live in x86/hvm.c),
as msi_msg_write_remap_rte() is already available as IOMMU hook anyway,
provided struct pi_desc turns out compatible with what's going to be
needed for AMD.

Jan



Re: [PATCH v2 7/8] x86/dpci: move hvm_dpci_isairq_eoi() to generic HVM code

2023-01-12 Thread Jan Beulich
On 04.01.2023 09:45, Xenia Ragiadakou wrote:
> The function hvm_dpci_isairq_eoi() has no dependencies on VT-d driver code
> and can be moved from xen/drivers/passthrough/vtd/x86/hvm.c to
> xen/drivers/passthrough/x86/hvm.c, along with the corresponding copyrights.
> 
> Remove the now empty xen/drivers/passthrough/vtd/x86/hvm.c.
> 
> Since the function is used only in this file, declare it static.
> 
> No functional change intended.
> 
> Signed-off-by: Xenia Ragiadakou 

Reviewed-by: Jan Beulich 
with a couple of cosmetic suggestions since you're touching this code
anyway:

> @@ -924,6 +925,48 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int 
> gsi)
>  hvm_pirq_eoi(pirq);
>  }
>  
> +static int cf_check _hvm_dpci_isairq_eoi(
> +struct domain *d, struct hvm_pirq_dpci *pirq_dpci, void *arg)
> +{
> +struct hvm_irq *hvm_irq = hvm_domain_irq(d);

I think this could become pointer-to-const.

> +unsigned int isairq = (long)arg;
> +const struct dev_intx_gsi_link *digl;
> +
> +list_for_each_entry ( digl, &pirq_dpci->digl_list, list )
> +{
> +unsigned int link = hvm_pci_intx_link(digl->device, digl->intx);
> +
> +if ( hvm_irq->pci_link.route[link] == isairq )
> +{
> +hvm_pci_intx_deassert(d, digl->device, digl->intx);
> +if ( --pirq_dpci->pending == 0 )
> +pirq_guest_eoi(dpci_pirq(pirq_dpci));
> +}
> +}
> +
> +return 0;
> +}
> +
> +static void hvm_dpci_isairq_eoi(struct domain *d, unsigned int isairq)
> +{
> +struct hvm_irq_dpci *dpci = NULL;

And this too.

> +ASSERT(isairq < NR_ISAIRQS);
> +if ( !is_iommu_enabled(d) )

A blank line between the above two would be nice.

> +return;
> +
> +write_lock(&d->event_lock);
> +
> +dpci = domain_get_irq_dpci(d);
> +
> +if ( dpci && test_bit(isairq, dpci->isairq_map) )
> +{
> +/* Multiple mirq may be mapped to one isa irq */
> +pt_pirq_iterate(d, _hvm_dpci_isairq_eoi, (void *)(long)isairq);
> +}
> +write_unlock(&d->event_lock);

For symmetry with the code above this could do with a blank line ahead of it.

Jan



Re: [PATCH v2 6/8] x86/iommu: call pi_update_irte through an hvm_function callback

2023-01-12 Thread Jan Beulich
On 04.01.2023 09:45, Xenia Ragiadakou wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2143,6 +2143,14 @@ static bool cf_check vmx_test_pir(const struct vcpu 
> *v, uint8_t vec)
>  return pi_test_pir(vec, &v->arch.hvm.vmx.pi_desc);
>  }
>  
> +static int cf_check vmx_pi_update_irte(const struct vcpu *v,
> +   const struct pirq *pirq, uint8_t gvec)
> +{
> +const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;
> +
> +return pi_update_irte(pi_desc, pirq, gvec);
> +}

This being the only caller of pi_update_irte(), I don't see the point in
having the extra wrapper. Adjust pi_update_irte() such that it can be
used as the intended hook directly. Plus perhaps prefix it with vtd_.
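
I.e. roughly (a sketch only; the vtd_ name follows the suggestion above,
and the comment stands in for the current function body):

int cf_check vtd_pi_update_irte(const struct vcpu *v,
                                const struct pirq *pirq, uint8_t gvec)
{
    const struct pi_desc *pi_desc = v ? &v->arch.hvm.vmx.pi_desc : NULL;

    /* ... what is today pi_update_irte(pi_desc, pirq, gvec) ... */
    return 0; /* placeholder for the real IRTE update result */
}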

> @@ -2591,6 +2599,8 @@ static struct hvm_function_table __initdata_cf_clobber 
> vmx_function_table = {
>  .tsc_scaling = {
>  .max_ratio = VMX_TSC_MULTIPLIER_MAX,
>  },
> +
> +.pi_update_irte = vmx_pi_update_irte,

You want to install this hook only when iommu_intpost is in effect (i.e.
the only case when it can actually be called), and only when INTEL_IOMMU=y
(avoiding the need for an inline stub of pi_update_irte() or whatever its
final name is going to be).
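
For instance (a sketch only; the exact placement, e.g. VMX start-up code,
is a guess):

    /* Install the hook only when posted interrupts are actually in use. */
    if ( iommu_intpost )
        vmx_function_table.pi_update_irte = vtd_pi_update_irte;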

> @@ -250,6 +252,9 @@ struct hvm_function_table {
>  /* Architecture function to setup TSC scaling ratio */
>  void (*setup)(struct vcpu *v);
>  } tsc_scaling;
> +
> +int (*pi_update_irte)(const struct vcpu *v,
> +  const struct pirq *pirq, uint8_t gvec);
>  };

Please can this be moved higher up, e.g. next to .

> @@ -774,6 +779,16 @@ static inline void hvm_set_nonreg_state(struct vcpu *v,
>  alternative_vcall(hvm_funcs.set_nonreg_state, v, nrs);
>  }
>  
> +static inline int hvm_pi_update_irte(const struct vcpu *v,
> + const struct pirq *pirq, uint8_t gvec)
> +{
> +if ( hvm_funcs.pi_update_irte )
> +return alternative_call(hvm_funcs.pi_update_irte, v, pirq, gvec);
> +
> +return -EOPNOTSUPP;

I don't think the conditional is needed, at least not with the other
suggested adjustments. Plus the way alternative patching works, a NULL
hook will be converted to some equivalent of BUG() anyway, so
ASSERT_UNREACHABLE() should also be unnecessary.
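
With those adjustments the helper could then shrink to something like this
sketch:

static inline int hvm_pi_update_irte(const struct vcpu *v,
                                     const struct pirq *pirq, uint8_t gvec)
{
    /* The hook is installed whenever this path can be reached. */
    return alternative_call(hvm_funcs.pi_update_irte, v, pirq, gvec);
}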

> +}
> +
> +
>  #else  /* CONFIG_HVM */

Please don't add double blank lines.

> --- a/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> +++ b/xen/arch/x86/include/asm/hvm/vmx/vmx.h
> @@ -146,6 +146,17 @@ static inline void pi_clear_sn(struct pi_desc *pi_desc)
>  clear_bit(POSTED_INTR_SN, &pi_desc->control);
>  }
>  
> +#ifdef CONFIG_INTEL_IOMMU
> +int pi_update_irte(const struct pi_desc *pi_desc,
> +   const struct pirq *pirq, const uint8_t gvec);
> +#else
> +static inline int pi_update_irte(const struct pi_desc *pi_desc,
> + const struct pirq *pirq, const uint8_t gvec)
> +{
> +return -EOPNOTSUPP;
> +}
> +#endif

This still is a VT-d function, so I think its declaration would better
remain in asm/iommu.h.

Jan



Re: [PATCH v2 4/8] x86/acpi: separate AMD-Vi and VT-d specific functions

2023-01-12 Thread Xenia Ragiadakou



On 1/12/23 13:37, Jan Beulich wrote:

On 04.01.2023 09:44, Xenia Ragiadakou wrote:

The functions acpi_dmar_init() and acpi_dmar_zap/reinstate() are
VT-d specific while the function acpi_ivrs_init() is AMD-Vi specific.
To eliminate dead code, they need to be guarded under CONFIG_INTEL_IOMMU
and CONFIG_AMD_IOMMU, respectively.

Instead of adding #ifdef guards around the function calls, implement them
as empty static inline functions.

Take the opportunity to move the declarations of acpi_dmar_zap/reinstate() to
the arch specific header.

No functional change intended.

Signed-off-by: Xenia Ragiadakou 


While I'm not opposed to ack the change in this form, I have a question
first:


--- a/xen/arch/x86/include/asm/acpi.h
+++ b/xen/arch/x86/include/asm/acpi.h
@@ -140,8 +140,22 @@ extern u32 pmtmr_ioport;
  extern unsigned int pmtmr_width;
  
  void acpi_iommu_init(void);

+
+#ifdef CONFIG_INTEL_IOMMU
  int acpi_dmar_init(void);
+void acpi_dmar_zap(void);
+void acpi_dmar_reinstate(void);
+#else
+static inline int acpi_dmar_init(void) { return -ENODEV; }
+static inline void acpi_dmar_zap(void) {}
+static inline void acpi_dmar_reinstate(void) {}
+#endif


Leaving aside my request to drop that part of patch 3, you've kept
declarations for VT-d in the common header there. Which I consider
correct, knowing that VT-d was also used on IA-64 at the time. As
a result I would suppose movement might better be done in the other
direction here.


I moved it to the x86-specific header because acpi_dmar_init() was 
declared there.

I can move all of them to the common header.




+#ifdef CONFIG_AMD_IOMMU
  int acpi_ivrs_init(void);
+#else
+static inline int acpi_ivrs_init(void) { return -ENODEV; }
+#endif


For AMD, otoh, without there being a 2nd architecture re-using
their IOMMU, moving into the x86-specific header is certainly fine,
no matter that there's a slim chance that this may need moving the
other direction down the road.


--
Xenia


