Re: E820 memory allocation issue on Threadripper platforms

2024-01-16 Thread Patrick Plenefisch
On Tue, Jan 16, 2024 at 4:33 AM Jan Beulich  wrote:

> On 16.01.2024 01:22, Patrick Plenefisch wrote:
> > I managed to set up serial access and saved the output with the requested
> > flags as the attached logs
>
> Thanks. While you didn't ...
>
>
> ... fiddle with the Linux message,  ...
>

I last built a kernel over a decade ago and was hoping not to have to
relearn the process, but I can look into how to do that again if it would
help.


>
> ... as per
>
> (XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x100 -> 0x4a0
>
> there's an overlap with not exactly a hole, but with an
> EfiACPIMemoryNVS region:
>
> (XEN)  00010-003159fff type=2 attr=000f
> (XEN)  00315a000-003ff type=7 attr=000f
> (XEN)  00400-004045fff type=10 attr=000f
> (XEN)  004046000-009afefff type=7 attr=000f
>
> (the 3rd of the 4 lines). Considering there's another region higher
> up:
>
> (XEN)  0a747f000-0a947efff type=10 attr=000f
>
> I'm inclined to say it is poor firmware (or, far less likely, boot
> loader) behavior to clobber a rather low and entirely arbitrary RAM
>

The bootloader is GRUB 2.06 (EFI platform) as packaged by Debian 12.



> range, rather than consolidating all such regions near the top of
> RAM below 4Gb. There are further such odd regions, btw:
>
> (XEN)  009aff000-009ff type=0 attr=000f
> ...
> (XEN)  00b00-00b020fff type=0 attr=000f
>
> If the kernel image was sufficiently much larger, these could become
> a problem as well. Otoh if the kernel wasn't built with
> CONFIG_PHYSICAL_START=0x100, i.e. to start at 16Mb, but at, say,
> 2Mb, things should apparently work even with this unusual memory
> layout (until the kernel would grow enough to again run into that
> very region).
>

I'm currently talking to the vendor's support team and testing a beta BIOS
for unrelated reasons. Is there something specific I should forward to
them, either as a question or as a request for a fix?

As someone who hasn't built a kernel in over a decade, should I figure out
how to do a kernel build with CONFIG_PHYSICAL_START=0x200 and report
back?


> It remains to be seen in how far it is reasonably possible to work
> around this in the kernel. While (sadly) still unsupported, in the
> meantime you may want to consider running Dom0 in PVH mode.
>

I tried this by adding dom0=pvh, and instead got this boot error:

(XEN) xenoprof: Initialization failed. AMD processor family 25 is not
supported
(XEN) NX (Execute Disable) protection active
(XEN) Dom0 has maximum 1400 PIRQs
(XEN) *** Building a PVH Dom0 ***
(XEN) Failed to load kernel: -1
(XEN) Xen dom0 kernel broken ELF: 
(XEN) Failed to load Dom0 kernel
(XEN)
(XEN) 
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) 
(XEN)
(XEN) Reboot in five seconds...




>
> Jan
>


[linux-5.4 test] 184370: regressions - FAIL

2024-01-16 Thread osstest service owner
flight 184370 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184370/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-pvops 6 kernel-build fail REGR. vs. 184339

Tests which did not succeed, but are not blocking:
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-bios  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-uefi  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-armhf-armhf-xl-credit1  14 guest-start  fail  like 184327
 test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeatfail  like 184327
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeatfail like 184334
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184339
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184339
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184339
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184339
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184339
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184339
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184339
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 

Re: [PATCH v12 14/15] xen/arm: vpci: permit access to guest vpci space

2024-01-16 Thread Stewart Hildebrand
On 1/9/24 16:51, Stewart Hildebrand wrote:
> Move iomem_caps initialization earlier (before arch_domain_create()).
> 
> Signed-off-by: Stewart Hildebrand 

Since the iomem_access_permitted() check over in ("vpci/header: program p2m 
with guest BAR view") was changed to use MFNs (it used GFNs in an earlier rev), 
this whole patch should be dropped. The toolstack already does what this patch 
was trying to do, via XEN_DOMCTL_iomem_permission.
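
For reference, the toolstack side amounts to roughly the sketch below. This is
only an illustration: it uses the libxc wrapper that issues
XEN_DOMCTL_iomem_permission, and the helper name plus the BAR address/size
values are made up for the example.

/*
 * Hypothetical toolstack-side sketch: grant a domain access to the MFN
 * range backing a passed-through BAR.
 */
#include <stdint.h>
#include <xenctrl.h>

static int grant_bar_iomem(xc_interface *xch, uint32_t domid,
                           uint64_t bar_maddr, uint64_t bar_size)
{
    unsigned long first_mfn = bar_maddr >> 12;           /* page frame of the BAR start */
    unsigned long nr_mfns = (bar_size + 0xfffUL) >> 12;  /* size rounded up to 4k pages */

    /* Ends up as XEN_DOMCTL_iomem_permission in the hypervisor. */
    return xc_domain_iomem_permission(xch, domid, first_mfn, nr_mfns,
                                      1 /* allow access */);
}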



Re: [PATCH v12.2 09/15] vpci/header: program p2m with guest BAR view

2024-01-16 Thread Stewart Hildebrand
On 1/15/24 14:44, Stewart Hildebrand wrote:
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index feccd070ddd0..8483404c5e91 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -41,13 +42,24 @@ static int cf_check map_range(
>  unsigned long s, unsigned long e, void *data, unsigned long *c)
>  {
>  const struct map_data *map = data;
> +/* Start address of the BAR as seen by the guest. */
> +unsigned long start_gfn = PFN_DOWN(map->bar->guest_addr);
> +/* Physical start address of the BAR. */
> +unsigned long start_mfn = PFN_DOWN(map->bar->addr);
>  int rc;
>  
>  for ( ; ; )
>  {
>  unsigned long size = e - s + 1;
> +/*
> + * Ranges to be mapped don't always start at the BAR start address, as
> + * there can be holes or partially consumed ranges. Account for the
> + * offset of the current address from the BAR start.
> + */
> +unsigned long map_mfn = start_mfn + s - start_gfn;
> +unsigned long m_end = map_mfn + size - 1;
>  
> -if ( !iomem_access_permitted(map->d, s, e) )
> +if ( !iomem_access_permitted(map->d, map_mfn, m_end) )

Nit: since this check will now use map_mfn and m_end...

>  {
>  printk(XENLOG_G_WARNING
> "%pd denied access to MMIO range [%#lx, %#lx]\n",
> map->d, s, e);

... I'd like to also update the arguments passed to this print statement.
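
Concretely, I'm thinking of something along the lines of the sketch below
(illustrative only, not the exact wording I'd insist on):

    if ( !iomem_access_permitted(map->d, map_mfn, m_end) )
    {
        printk(XENLOG_G_WARNING
               "%pd denied access to MMIO range [%#lx, %#lx]\n",
               map->d, map_mfn, m_end);
        return -EPERM;
    }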



[ovmf test] 184379: all pass - PUSHED

2024-01-16 Thread osstest service owner
flight 184379 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184379/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 59f024c76ee57c2bec84794536302fc770cd6ec2
baseline version:
 ovmf 9971b99461e930008e3d35bc0a4a310b6afa57f6

Last test of basis   184371  2024-01-16 09:42:46 Z0 days
Testing same since   184379  2024-01-16 23:44:09 Z0 days1 attempts


People who touched revisions under test:
  Gua Guo 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   9971b99461..59f024c76e  59f024c76ee57c2bec84794536302fc770cd6ec2 -> 
xen-tested-master



[linux-linus test] 184368: regressions - FAIL

2024-01-16 Thread osstest service owner
flight 184368 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184368/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-xl-credit1  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl-xsm  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl-thunderx 12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-xl-credit2  12 debian-install   fail REGR. vs. 184270
 test-arm64-arm64-libvirt-xsm 12 debian-install   fail REGR. vs. 184270

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl 22 guest-start/debian.repeat fail in 184359 pass in 184368
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 12 debian-hvm-install fail pass 
in 184359

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184270
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184270
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 184270
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184270
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184270
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184270
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184270
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184270
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 linux052d534373b7ed33712a63d5e17b2b6cdbce84fd
baseline version:
 linux0dd3ee31125508cd67f7e7172247f05b7fd1753a

Last test of basis   184270  2024-01-07 20:42:19 Z9 days
Failing since184283  2024-01-08 20:10:43 Z8 days   14 attempts
Testing same since   184338  2024-01-13 05:40:28 Z3 days7 attempts


1701 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 

[PATCH v2 (resend) 21/27] x86/setup: Do not create valid mappings when directmap=no

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Create empty mappings in the second e820 pass. Also, destroy existing
direct map mappings created in the first pass.

To make xenheap pages visible in guests, it is necessary to create empty
L3 tables in the direct map even when directmap=no, since guest CR3s
copy the idle domain's L4 entries, which means they will share mappings in
the direct map if we pre-populate the idle domain's L4 entries and L3
tables. A helper is introduced for this.

Also, after the direct map is actually gone, we need to stop updating
the direct map in update_xen_mappings().

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3b698c8c41..84c496ac4a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -976,6 +976,57 @@ static struct domain *__init create_dom0(const module_t 
*image,
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
+/*
+ * This either populates a valid direct map, or allocates empty L3 tables and
+ * creates the L4 entries for virtual addresses between [start, end) in the
+ * direct map, depending on has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L3 tables in the
+ * direct map region. The reason is that on-demand xenheap mappings are
+ * created in the idle domain's page table but must be seen by
+ * everyone. Since all domains share the direct map L4 entries, they
+ * will share xenheap mappings if we pre-populate the L4 entries and L3
+ * tables in the direct map region for all RAM. We also rely on the fact
+ * that L3 tables are never freed.
+ */
+static void __init populate_directmap(uint64_t pstart, uint64_t pend,
+  unsigned int flags)
+{
+unsigned long vstart = (unsigned long)__va(pstart);
+unsigned long vend = (unsigned long)__va(pend);
+
+if ( pstart >= pend )
+return;
+
+BUG_ON(vstart < DIRECTMAP_VIRT_START);
+BUG_ON(vend > DIRECTMAP_VIRT_END);
+
+if ( has_directmap() )
+/* Populate valid direct map. */
+BUG_ON(map_pages_to_xen(vstart, maddr_to_mfn(pstart),
+PFN_DOWN(pend - pstart), flags));
+else
+{
+/* Create empty L3 tables. */
+unsigned long vaddr = vstart & ~((1UL << L4_PAGETABLE_SHIFT) - 1);
+
+for ( ; vaddr < vend; vaddr += (1UL << L4_PAGETABLE_SHIFT) )
+{
+l4_pgentry_t *pl4e = &idle_pg_table[l4_table_offset(vaddr)];
+
+if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
+{
+mfn_t mfn = alloc_boot_pages(1, 1);
+void *v = map_domain_page(mfn);
+
+clear_page(v);
+UNMAP_DOMAIN_PAGE(v);
+l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));
+}
+}
+}
+}
+
 void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
 {
 const char *memmap_type = NULL, *loader, *cmdline = "";
@@ -1596,8 +1647,17 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 map_e = min_t(uint64_t, e,
   ARRAY_SIZE(l2_directmap) << L2_PAGETABLE_SHIFT);
 
-/* Pass mapped memory to allocator /before/ creating new mappings. */
+/*
+ * Pass mapped memory to allocator /before/ creating new mappings.
+ * The direct map for the bottom 4GiB has been populated in the first
+ * e820 pass. In the second pass, we make sure those existing mappings
+ * are destroyed when directmap=no.
+ */
 init_boot_pages(s, min(map_s, e));
+if ( !has_directmap() )
+destroy_xen_mappings((unsigned long)__va(s),
+ (unsigned long)__va(min(map_s, e)));
+
 s = map_s;
 if ( s < map_e )
 {
@@ -1605,6 +1665,9 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 map_s = (s + mask) & ~mask;
 map_e &= ~mask;
 init_boot_pages(map_s, map_e);
+if ( !has_directmap() )
+destroy_xen_mappings((unsigned long)__va(map_s),
+ (unsigned long)__va(map_e));
 }
 
 if ( map_s > map_e )
@@ -1618,8 +1681,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 
 if ( map_e < end )
 {
-map_pages_to_xen((unsigned long)__va(map_e), maddr_to_mfn(map_e),
- PFN_DOWN(end - map_e), PAGE_HYPERVISOR);
+populate_directmap(map_e, end, PAGE_HYPERVISOR);
 init_boot_pages(map_e, end);
 map_e = end;
 }
@@ -1628,13 +1690,11 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 {
 /* This range must not be passed to the boot allocator and
  * must also not be mapped with _PAGE_GLOBAL. */
-  

[PATCH v2 (resend) 16/27] xen/x86: Add build assertion for fixmap entries

2024-01-16 Thread Elias El Yandouzi
The early fixed addresses must all fit into the static L1 table.
Introduce a build assertion to this end.

Signed-off-by: Elias El Yandouzi 



 Changes in v2:
 * New patch

diff --git a/xen/arch/x86/include/asm/fixmap.h 
b/xen/arch/x86/include/asm/fixmap.h
index a7ac365fc6..904bee0480 100644
--- a/xen/arch/x86/include/asm/fixmap.h
+++ b/xen/arch/x86/include/asm/fixmap.h
@@ -77,6 +77,11 @@ enum fixed_addresses {
 #define FIXADDR_SIZE  (__end_of_fixed_addresses << PAGE_SHIFT)
 #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
 
+static inline void fixaddr_build_assertion(void)
+{
+BUILD_BUG_ON(FIX_PMAP_END > L1_PAGETABLE_ENTRIES - 1);
+}
+
 extern void __set_fixmap(
 enum fixed_addresses idx, unsigned long mfn, unsigned long flags);
 
-- 
2.40.1




[PATCH v2 (resend) 22/27] Rename mfn_to_virt() calls

2024-01-16 Thread Elias El Yandouzi
Until the directmap is completely removed, we still need to
keep some calls to mfn_to_virt() for xenheap pages or when the
directmap is enabled.

Rename the macro to mfn_to_directmap_virt() to flag them and
prevent further use of mfn_to_virt().

Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index cbcf3bf147..9a94d7eaf7 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -336,6 +336,7 @@ static inline uint64_t gvirt_to_maddr(vaddr_t va, paddr_t 
*pa,
  */
 #define virt_to_mfn(va) __virt_to_mfn(va)
 #define mfn_to_virt(mfn)__mfn_to_virt(mfn)
+#define mfn_to_directmap_virt(mfn) mfn_to_virt(mfn)
 
 /* Convert between Xen-heap virtual addresses and page-info structures. */
 static inline struct page_info *virt_to_page(const void *v)
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 89caefc8a2..62d6fee0f4 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -81,14 +81,14 @@ void *map_domain_page(mfn_t mfn)
 
 #ifdef NDEBUG
 if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
-return mfn_to_virt(mfn_x(mfn));
+return mfn_to_directmap_virt(mfn_x(mfn));
 #endif
 
 v = mapcache_current_vcpu();
 if ( !v || !v->domain->arch.mapcache.inuse )
 {
 if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
-return mfn_to_virt(mfn_x(mfn));
+return mfn_to_directmap_virt(mfn_x(mfn));
 else
 {
 BUG_ON(system_state >= SYS_STATE_smp_boot);
@@ -324,7 +324,7 @@ void *map_domain_page_global(mfn_t mfn)
 
 #ifdef NDEBUG
if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
-return mfn_to_virt(mfn_x(mfn));
+return mfn_to_directmap_virt(mfn_x(mfn));
 #endif
 
return vmap(&mfn, 1);
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index e59f6657d9..1b3ebae16f 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -439,7 +439,7 @@ static int __init pvh_populate_p2m(struct domain *d)
  d->arch.e820[i].addr + d->arch.e820[i].size);
 enum hvm_translation_result res =
  hvm_copy_to_guest_phys(mfn_to_maddr(_mfn(addr)),
-mfn_to_virt(addr),
+mfn_to_directmap_virt(addr),
 end - d->arch.e820[i].addr,
 v);
 
@@ -613,7 +613,7 @@ static int __init pvh_load_kernel(struct domain *d, const 
module_t *image,
 
 if ( initrd != NULL )
 {
-rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start),
+rc = hvm_copy_to_guest_phys(last_addr, mfn_to_directmap_virt(initrd->mod_start),
 initrd->mod_end, v);
 if ( rc )
 {
diff --git a/xen/arch/x86/include/asm/page.h b/xen/arch/x86/include/asm/page.h
index 350d1fb110..c6891b52d4 100644
--- a/xen/arch/x86/include/asm/page.h
+++ b/xen/arch/x86/include/asm/page.h
@@ -268,7 +268,7 @@ void copy_page_sse2(void *to, const void *from);
  */
 #define mfn_valid(mfn)  __mfn_valid(mfn_x(mfn))
 #define virt_to_mfn(va) __virt_to_mfn(va)
-#define mfn_to_virt(mfn)__mfn_to_virt(mfn)
+#define mfn_to_directmap_virt(mfn)__mfn_to_virt(mfn)
 #define virt_to_maddr(va)   __virt_to_maddr((unsigned long)(va))
 #define maddr_to_virt(ma)   __maddr_to_virt((unsigned long)(ma))
 #define maddr_to_page(ma)   __maddr_to_page(ma)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a72c32d87c..9530c93b68 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -318,8 +318,8 @@ void __init arch_init_memory(void)
 iostart_pfn = max_t(unsigned long, pfn, 1UL << (20 - PAGE_SHIFT));
 ioend_pfn = min(rstart_pfn, 16UL << (20 - PAGE_SHIFT));
 if ( iostart_pfn < ioend_pfn )
-destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
- (unsigned long)mfn_to_virt(ioend_pfn));
+destroy_xen_mappings((unsigned long)mfn_to_directmap_virt(iostart_pfn),
+ (unsigned long)mfn_to_directmap_virt(ioend_pfn));
 
 /* Mark as I/O up to next RAM region. */
 for ( ; pfn < rstart_pfn; pfn++ )
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 84c496ac4a..de69b7935c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -400,7 +400,7 @@ void *__init bootstrap_map(const module_t *mod)
 void *ret;
 
 if ( system_state != SYS_STATE_early_boot )
-return mod ? mfn_to_virt(mod->mod_start) : NULL;
+return mod ? mfn_to_directmap_virt(mod->mod_start) : NULL;
 
 if ( !mod )
 {
@@ -1703,7 +1703,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 {
 set_pdx_range(mod[i].mod_start,
   mod[i].mod_start + PFN_UP(mod[i].mod_end));
-

[PATCH v2 (resend) 27/27] xen/arm64: Allow the admin to enable/disable the directmap

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

Implement the same command line option as x86 to enable/disable the
directmap. By default this is kept enabled.

Also modify setup_directmap_mappings() to populate the L0 entries
related to the directmap area.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* Rely on the Kconfig option to enable Secret Hiding on Arm64
* Use generic helper instead of arch_has_directmap()

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 63c946f482..df90b1c4c9 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -799,7 +799,7 @@ that enabling this option cannot guarantee anything beyond 
what underlying
 hardware guarantees (with, where available and known to Xen, respective
 tweaks applied).
 
-### directmap (x86)
+### directmap (arm64, x86)
 > `= `
 
 > Default: `true`
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 278243f0d6..7a19826233 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM_64
depends on !ARM_32
select 64BIT
select HAS_FAST_MULTIPLY
+   select HAS_SECRET_HIDING
 
 config ARM
def_bool y
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index f4a81aa705..22e1e5b9f4 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -157,16 +157,27 @@ void __init switch_ttbr(uint64_t ttbr)
 update_identity_mapping(false);
 }
 
-/* Map the region in the directmap area. */
+/*
+ * This either populates a valid direct map, or allocates empty L1 tables
+ * and creates the L0 entries for the given region in the direct map
+ * depending on has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L1 tables in the
+ * directmap region. The reason is that the root page-table (i.e. L0)
+ * is per-CPU and secondary CPUs will initialize their root page-table
+ * based on the pCPU0 one. So L0 entries will be shared if they are
+ * pre-populated. We also rely on the fact that L1 tables are never
+ * freed.
+ */
 static void __init setup_directmap_mappings(unsigned long base_mfn,
 unsigned long nr_mfns)
 {
+unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
 int rc;
 
 /* First call sets the directmap physical and virtual offset. */
 if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
 {
-unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
 
 directmap_mfn_start = _mfn(base_mfn);
 directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
@@ -187,6 +198,24 @@ static void __init setup_directmap_mappings(unsigned long 
base_mfn,
 panic("cannot add directmap mapping at %lx below heap start %lx\n",
   base_mfn, mfn_x(directmap_mfn_start));
 
+if ( !has_directmap() )
+{
+vaddr_t vaddr = (vaddr_t)__mfn_to_virt(base_mfn);
+lpae_t *root = this_cpu(xen_pgtable);
+unsigned int i, slot;
+
+slot = first_table_offset(vaddr);
+nr_mfns += base_mfn - mfn_gb;
+for ( i = 0; i < nr_mfns; i += BIT(XEN_PT_LEVEL_ORDER(0), UL), slot++ )
+{
+lpae_t *entry = [slot];
+
+if ( !lpae_is_valid(*entry) && !create_xen_table(entry) )
+panic("Unable to populate zeroeth slot %u\n", slot);
+}
+return;
+}
+
 rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
   _mfn(base_mfn), nr_mfns,
   PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
diff --git a/xen/arch/arm/include/asm/arm64/mm.h 
b/xen/arch/arm/include/asm/arm64/mm.h
index e0bd23a6ed..5888f29159 100644
--- a/xen/arch/arm/include/asm/arm64/mm.h
+++ b/xen/arch/arm/include/asm/arm64/mm.h
@@ -3,13 +3,10 @@
 
 extern DEFINE_PAGE_TABLE(xen_pgtable);
 
-/*
- * On ARM64, all the RAM is currently direct mapped in Xen.
- * Hence return always true.
- */
+/* On Arm64, the user can choose whether all the RAM is in the directmap. */
 static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 {
-return true;
+return has_directmap();
 }
 
 void arch_setup_page_tables(void);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b15a18a494..7fb75c5c3e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3dec365c57..2bd060d321 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -748,6 +748,7 @@ void asmlinkage __init start_xen(unsigned long 
boot_phys_offset,
 cmdline_parse(cmdline);
 
 setup_mm();
+printk("Booting with directmap %s\n", has_directmap() ? "on" : "off");
 
 vm_init();
 
-- 
2.40.1




[PATCH v2 (resend) 20/27] x86/setup: vmap heap nodes when they are outside the direct map

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When we do not have a direct map, arch_mfns_in_directmap() will always
return false, thus init_node_heap() will allocate xenheap pages from an
existing node for the metadata of a new node. This means that the
metadata of a new node is in a different node, slowing down heap
allocation.

Since we now have early vmap, vmap the metadata locally in the new node.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* vmap_contig_pages() was renamed to vmap_contig()
* Fix indentation and coding style

Changes from Hongyan's version:
* arch_mfn_in_direct_map() was renamed to
  arch_mfns_in_direct_map()
* Use vmap_contig_pages() rather than __vmap(...).
* Add missing include (xen/vmap.h) so it compiles on Arm

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 52934ec5c1..42b9aaae1c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -136,6 +136,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -604,22 +605,44 @@ static unsigned long init_node_heap(int node, unsigned 
long mfn,
 needed = 0;
 }
 else if ( *use_tail && nr >= needed &&
-  arch_mfns_in_directmap(mfn + nr - needed, needed) &&
   (!xenheap_bits ||
-   !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
+  !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
 {
-_heap[node] = mfn_to_virt(mfn + nr - needed);
-avail[node] = mfn_to_virt(mfn + nr - 1) +
-  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
+{
+_heap[node] = mfn_to_virt(mfn + nr - needed);
+avail[node] = mfn_to_virt(mfn + nr - 1) +
+  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+}
+else
+{
+mfn_t needed_start = _mfn(mfn + nr - needed);
+
+_heap[node] = vmap_contig(needed_start, needed);
+BUG_ON(!_heap[node]);
+avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
+  sizeof(**avail) * NR_ZONES;
+}
 }
 else if ( nr >= needed &&
-  arch_mfns_in_directmap(mfn, needed) &&
   (!xenheap_bits ||
-   !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )
+  !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )
 {
-_heap[node] = mfn_to_virt(mfn);
-avail[node] = mfn_to_virt(mfn + needed - 1) +
-  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+if ( arch_mfns_in_directmap(mfn, needed) )
+{
+_heap[node] = mfn_to_virt(mfn);
+avail[node] = mfn_to_virt(mfn + needed - 1) +
+  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+}
+else
+{
+mfn_t needed_start = _mfn(mfn);
+
+_heap[node] = vmap_contig(needed_start, needed);
+BUG_ON(!_heap[node]);
+avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
+  sizeof(**avail) * NR_ZONES;
+}
 *use_tail = false;
 }
 else if ( get_order_from_bytes(sizeof(**_heap)) ==
-- 
2.40.1




[PATCH v2 (resend) 24/27] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

The arm32 version of init_secondary_pagetables() will soon be re-used
for arm64 as well where the root table starts at level 0 rather than level 1.

So rename 'first' to 'root'.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changelog in v2:
* Rebase
* Fix typo

diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index b6fc0aae07..fb5df667ba 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -84,32 +84,30 @@ int prepare_secondary_mm(int cpu)
 #else
 int prepare_secondary_mm(int cpu)
 {
-lpae_t *first;
+lpae_t *root = alloc_xenheap_page();
 
-first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level trie */
-
-if ( !first )
+if ( !root )
 {
-printk("CPU%u: Unable to allocate the first page-table\n", cpu);
+printk("CPU%u: Unable to allocate the root page-table\n", cpu);
 return -ENOMEM;
 }
 
 /* Initialise root pagetable from root of boot tables */
-memcpy(first, per_cpu(xen_pgtable, 0), PAGE_SIZE);
-per_cpu(xen_pgtable, cpu) = first;
+memcpy(root, per_cpu(xen_pgtable, 0), PAGE_SIZE);
+per_cpu(xen_pgtable, cpu) = root;
 
 if ( !init_domheap_mappings(cpu) )
 {
 printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
 per_cpu(xen_pgtable, cpu) = NULL;
-free_xenheap_page(first);
+free_xenheap_page(root);
 return -ENOMEM;
 }
 
 clear_boot_pagetables();
 
 /* Set init_ttbr for this CPU coming up */
-init_ttbr = __pa(first);
+init_ttbr = __pa(root);
 clean_dcache(init_ttbr);
 
 return 0;
-- 
2.40.1




[PATCH v2 (resend) 23/27] Rename maddr_to_virt() calls

2024-01-16 Thread Elias El Yandouzi
Until the directmap is completely removed, we still need to
keep some calls to maddr_to_virt() for xenheap pages or when the
directmap is enabled.

Rename the macro to maddr_to_directmap_virt() to flag them and
prevent further use of maddr_to_virt().

Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/x86/dmi_scan.c b/xen/arch/x86/dmi_scan.c
index 81f80c053a..ac016f3a04 100644
--- a/xen/arch/x86/dmi_scan.c
+++ b/xen/arch/x86/dmi_scan.c
@@ -277,7 +277,7 @@ const char *__init dmi_get_table(paddr_t *base, u32 *len)
return "SMBIOS";
}
} else {
-   char __iomem *p = maddr_to_virt(0xF), *q;
+   char __iomem *p = maddr_to_directmap_virt(0xF), *q;
union {
struct dmi_eps dmi;
struct smbios3_eps smbios3;
@@ -364,7 +364,7 @@ static int __init dmi_iterate(void (*decode)(const struct 
dmi_header *))
dmi.size = 0;
smbios3.length = 0;
 
-   p = maddr_to_virt(0xF);
+   p = maddr_to_directmap_virt(0xF);
for (q = p; q < p + 0x1; q += 16) {
if (!dmi.size) {
memcpy_fromio(, q, sizeof(dmi));
diff --git a/xen/arch/x86/include/asm/mach-default/bios_ebda.h 
b/xen/arch/x86/include/asm/mach-default/bios_ebda.h
index 42de6b2a5b..8cfe53d1f2 100644
--- a/xen/arch/x86/include/asm/mach-default/bios_ebda.h
+++ b/xen/arch/x86/include/asm/mach-default/bios_ebda.h
@@ -7,7 +7,7 @@
  */
 static inline unsigned int get_bios_ebda(void)
 {
-   unsigned int address = *(unsigned short *)maddr_to_virt(0x40E);
+   unsigned int address = *(unsigned short *)maddr_to_directmap_virt(0x40E);
address <<= 4;
return address; /* 0 means none */
 }
diff --git a/xen/arch/x86/include/asm/page.h b/xen/arch/x86/include/asm/page.h
index c6891b52d4..bf7bf08ba4 100644
--- a/xen/arch/x86/include/asm/page.h
+++ b/xen/arch/x86/include/asm/page.h
@@ -240,11 +240,11 @@ void copy_page_sse2(void *to, const void *from);
 
 /* Convert between Xen-heap virtual addresses and machine addresses. */
 #define __pa(x) (virt_to_maddr(x))
-#define __va(x) (maddr_to_virt(x))
+#define __va(x) (maddr_to_directmap_virt(x))
 
 /* Convert between Xen-heap virtual addresses and machine frame numbers. */
 #define __virt_to_mfn(va)   (virt_to_maddr(va) >> PAGE_SHIFT)
-#define __mfn_to_virt(mfn)  (maddr_to_virt((paddr_t)(mfn) << PAGE_SHIFT))
+#define __mfn_to_virt(mfn)  (maddr_to_directmap_virt((paddr_t)(mfn) << PAGE_SHIFT))
 
 /* Convert between machine frame numbers and page-info structures. */
 #define mfn_to_page(mfn)(frame_table + mfn_to_pdx(mfn))
@@ -270,7 +270,7 @@ void copy_page_sse2(void *to, const void *from);
 #define virt_to_mfn(va) __virt_to_mfn(va)
 #define mfn_to_directmap_virt(mfn)__mfn_to_virt(mfn)
 #define virt_to_maddr(va)   __virt_to_maddr((unsigned long)(va))
-#define maddr_to_virt(ma)   __maddr_to_virt((unsigned long)(ma))
+#define maddr_to_directmap_virt(ma)   __maddr_to_directmap_virt((unsigned long)(ma))
 #define maddr_to_page(ma)   __maddr_to_page(ma)
 #define page_to_maddr(pg)   __page_to_maddr(pg)
 #define virt_to_page(va)__virt_to_page(va)
diff --git a/xen/arch/x86/include/asm/x86_64/page.h 
b/xen/arch/x86/include/asm/x86_64/page.h
index f49e10475f..b9e47da46e 100644
--- a/xen/arch/x86/include/asm/x86_64/page.h
+++ b/xen/arch/x86/include/asm/x86_64/page.h
@@ -46,7 +46,7 @@ static inline unsigned long __virt_to_maddr(unsigned long va)
 return xen_phys_start + va - XEN_VIRT_START;
 }
 
-static inline void *__maddr_to_virt(unsigned long ma)
+static inline void *__maddr_to_directmap_virt(unsigned long ma)
 {
 /* Offset in the direct map, accounting for pdx compression */
 unsigned long va_offset = maddr_to_directmapoff(ma);
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449..69181b0abe 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -664,7 +664,7 @@ void __init get_smp_config (void)
 
 static int __init smp_scan_config (unsigned long base, unsigned long length)
 {
-   unsigned int *bp = maddr_to_virt(base);
+   unsigned int *bp = maddr_to_directmap_virt(base);
struct intel_mp_floating *mpf;
 
Dprintk("Scan SMP from %p for %ld bytes.\n", bp,length);
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 39aed5845d..1b02e2b6d5 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1764,7 +1764,7 @@ void __init efi_init_memory(void)
if ( map_pages_to_xen((unsigned long)mfn_to_directmap_virt(smfn),
 _mfn(smfn), emfn - smfn, prot) == 0 )
 desc->VirtualStart =
-(unsigned long)maddr_to_virt(desc->PhysicalStart);
+(unsigned long)maddr_to_directmap_virt(desc->PhysicalStart);
 else
 printk(XENLOG_ERR "Could not 

[PATCH v2 (resend) 18/27] xen/page_alloc: Add a path for xenheap when there is no direct map

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When there is not an always-mapped direct map, xenheap allocations need
to be mapped and unmapped on-demand.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



I have left the call to map_pages_to_xen() and destroy_xen_mappings()
in the split heap for now. I am not entirely convinced this is necessary
because in that setup only the xenheap would always be mapped and
this doesn't contain any guest memory (aside from the grant table).
So map/unmapping for every allocation seems unnecessary.

Changes in v2:
* Fix remaining wrong indentation in alloc_xenheap_pages()

Changes since Hongyan's version:
* Rebase
* Fix indentation in alloc_xenheap_pages()
* Fix build for arm32

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index a3746cfbcf..52934ec5c1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2237,6 +2237,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
 struct page_info *pg;
+void *ret;
 
 ASSERT_ALLOC_CONTEXT();
 
@@ -2245,17 +2246,36 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 if ( unlikely(pg == NULL) )
 return NULL;
 
+ret = page_to_virt(pg);
+
+if ( !has_directmap() &&
+ map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
+  PAGE_HYPERVISOR) )
+{
+/* Failed to map xenheap pages. */
+free_heap_pages(pg, order, false);
+return NULL;
+}
+
 return page_to_virt(pg);
 }
 
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
+unsigned long va = (unsigned long)v & PAGE_MASK;
+
 ASSERT_ALLOC_CONTEXT();
 
 if ( v == NULL )
 return;
 
+if ( !has_directmap() &&
+ destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
+dprintk(XENLOG_WARNING,
+"Error while destroying xenheap mappings at %p, order %u\n",
+v, order);
+
 free_heap_pages(virt_to_page(v), order, false);
 }
 
@@ -2279,6 +2299,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 {
 struct page_info *pg;
 unsigned int i;
+void *ret;
 
 ASSERT_ALLOC_CONTEXT();
 
@@ -2291,16 +2312,28 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 if ( unlikely(pg == NULL) )
 return NULL;
 
+ret = page_to_virt(pg);
+
+if ( !has_directmap() &&
+ map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
+  PAGE_HYPERVISOR) )
+{
+/* Failed to map xenheap pages. */
+free_domheap_pages(pg, order);
+return NULL;
+}
+
 for ( i = 0; i < (1u << order); i++ )
 pg[i].count_info |= PGC_xen_heap;
 
-return page_to_virt(pg);
+return ret;
 }
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
 struct page_info *pg;
 unsigned int i;
+unsigned long va = (unsigned long)v & PAGE_MASK;
 
 ASSERT_ALLOC_CONTEXT();
 
@@ -2312,6 +2345,12 @@ void free_xenheap_pages(void *v, unsigned int order)
 for ( i = 0; i < (1u << order); i++ )
 pg[i].count_info &= ~PGC_xen_heap;
 
+if ( !has_directmap() &&
+ destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
+dprintk(XENLOG_WARNING,
+"Error while destroying xenheap mappings at %p, order %u\n",
+v, order);
+
 free_heap_pages(pg, order, true);
 }
 
-- 
2.40.1




[PATCH v2 (resend) 19/27] x86/setup: Leave early boot slightly earlier

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When we do not have a direct map, memory for metadata of heap nodes in
init_node_heap() is allocated from xenheap, which needs to be mapped and
unmapped on demand. However, we cannot just take memory from the boot
allocator to create the PTEs while we are passing memory to the heap
allocator.

To solve this race, we leave early boot slightly sooner so that Xen PTE
pages are allocated from the heap instead of the boot allocator. We can
do this because the metadata for the 1st node is statically allocated,
and by the time we need memory to create mappings for the 2nd node, we
already have enough memory in the heap allocator in the 1st node.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b813ea75b5..3b698c8c41 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1746,6 +1746,22 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 
 numa_initmem_init(0, raw_max_page);
 
+/*
+ * When we do not have a direct map, memory for metadata of heap nodes in
+ * init_node_heap() is allocated from xenheap, which needs to be mapped and
+ * unmapped on demand. However, we cannot just take memory from the boot
+ * allocator to create the PTEs while we are passing memory to the heap
+ * allocator during end_boot_allocator().
+ *
+ * To solve this race, we need to leave early boot before
+ * end_boot_allocator() so that Xen PTE pages are allocated from the heap
+ * instead of the boot allocator. We can do this because the metadata for
+ * the 1st node is statically allocated, and by the time we need memory to
+ * create mappings for the 2nd node, we already have enough memory in the
+ * heap allocator in the 1st node.
+ */
+system_state = SYS_STATE_boot;
+
 if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
 {
 unsigned long lo = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
@@ -1777,8 +1793,6 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 else
 end_boot_allocator();
 
-system_state = SYS_STATE_boot;
-
 bsp_stack = cpu_alloc_stack(0);
 if ( !bsp_stack )
 panic("No memory for BSP stack\n");
-- 
2.40.1




[PATCH v2 (resend) 26/27] xen/arm64: Implement a mapcache for arm64

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

At the moment, on arm64, map_domain_page() is implemented using
virt_to_mfn(). Therefore it is relying on the directmap.

In a follow-up patch, we will allow the admin to remove the directmap.
Therefore we want to implement a mapcache.

Thankfully there is already one for arm32. So select ARCH_MAP_DOMAIN_PAGE
and add the necessary boilerplate to support 64-bit:
- The page-table start at level 0, so we need to allocate the level
  1 page-table
- map_domain_page() should check if the page is in the directmap. If
  yes, then use virt_to_mfn() to limit the performance impact
  when the directmap is still enabled (this will be selectable
  on the command line).

Take the opportunity to replace first_table_offset(...) with offsets[...].

Note that, so far, arch_mfns_in_directmap() always returns true on
arm64. So the mapcache is not yet used. This will change in a
follow-up patch.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



There are a few TODOs:
- It is becoming more critical to fix the mapcache
  implementation (this is not compliant with the Arm Arm)
- Evaluate the performance

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 50e9bfae1a..278243f0d6 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -1,7 +1,6 @@
 config ARM_32
def_bool y
depends on "$(ARCH)" = "arm32"
-   select ARCH_MAP_DOMAIN_PAGE
 
 config ARM_64
def_bool y
@@ -12,6 +11,7 @@ config ARM_64
 config ARM
def_bool y
select HAS_ALTERNATIVE
+   select ARCH_MAP_DOMAIN_PAGE
select HAS_DEVICE_TREE
select HAS_PASSTHROUGH
select HAS_UBSAN
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index 4f339efb7b..f4a81aa705 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -236,6 +237,14 @@ void __init setup_mm(void)
 setup_frametable_mappings(ram_start, ram_end);
 max_page = PFN_DOWN(ram_end);
 
+/*
+ * The allocators may need to use map_domain_page() (such as for
+ * scrubbing pages). So we need to prepare the domheap area first.
+ */
+if ( !init_domheap_mappings(smp_processor_id()) )
+panic("CPU%u: Unable to prepare the domheap page-tables\n",
+  smp_processor_id());
+
 init_staticmem_pages();
 }
 
diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
index ac2a6d0332..0f6ba48892 100644
--- a/xen/arch/arm/domain_page.c
+++ b/xen/arch/arm/domain_page.c
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
+#include 
 #include 
 #include 
 #include 
@@ -8,6 +9,8 @@
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
 
 /* cpu0's domheap page tables */
 static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES);
@@ -31,13 +34,30 @@ bool init_domheap_mappings(unsigned int cpu)
 {
 unsigned int order = get_order_from_pages(DOMHEAP_SECOND_PAGES);
 lpae_t *root = per_cpu(xen_pgtable, cpu);
+lpae_t *first;
 unsigned int i, first_idx;
 lpae_t *domheap;
 mfn_t mfn;
 
+/* Convenience aliases */
+DECLARE_OFFSETS(offsets, DOMHEAP_VIRT_START);
+
 ASSERT(root);
 ASSERT(!per_cpu(xen_dommap, cpu));
 
+/*
+ * On Arm64, the root is at level 0. Therefore we need an extra step
+ * to allocate the first level page-table.
+ */
+#ifdef CONFIG_ARM_64
+if ( create_xen_table(&root[offsets[0]]) )
+return false;
+
+first = xen_map_table(lpae_get_mfn(root[offsets[0]]));
+#else
+first = root;
+#endif
+
 /*
  * The domheap for cpu0 is initialized before the heap is initialized.
  * So we need to use pre-allocated pages.
@@ -58,16 +78,20 @@ bool init_domheap_mappings(unsigned int cpu)
  * domheap mapping pages.
  */
 mfn = virt_to_mfn(domheap);
-first_idx = first_table_offset(DOMHEAP_VIRT_START);
+first_idx = offsets[1];
 for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
 {
 lpae_t pte = mfn_to_xen_entry(mfn_add(mfn, i), MT_NORMAL);
 pte.pt.table = 1;
-write_pte(&root[first_idx + i], pte);
+write_pte(&first[first_idx + i], pte);
 }
 
 per_cpu(xen_dommap, cpu) = domheap;
 
+#ifdef CONFIG_ARM_64
+xen_unmap_table(first);
+#endif
+
 return true;
 }
 
@@ -91,6 +115,10 @@ void *map_domain_page(mfn_t mfn)
 lpae_t pte;
 int i, slot;
 
+/* Bypass the mapcache if the page is in the directmap */
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
+return mfn_to_virt(mfn);
+
 local_irq_save(flags);
 
 /* The map is laid out as an open-addressed hash table where each
@@ -153,13 +181,25 @@ void *map_domain_page(mfn_t mfn)
 /* Release a mapping taken with map_domain_page() */
 void 

[PATCH v2 (resend) 25/27] xen/arm64: mm: Use per-pCPU page-tables

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

At the moment, on Arm64, every pCPU is sharing the same page-tables.

In a follow-up patch, we will allow the possibility to remove the
direct map and therefore it will be necessary to have a mapcache.

While we have plenty of spare virtual address space to reserve part
for each pCPU, it means that temporary mappings (e.g. guest memory)
could be accessible by every pCPU.

In order to increase our security posture, it would be better if
those mappings are only accessible by the pCPU doing the temporary
mapping.

In addition to that, a per-pCPU page-tables opens the way to have
per-domain mapping area.

Arm32 is already using per-pCPU page-tables so most of the code
can be re-used. Arm64 doesn't yet have support for the mapcache,
so a stub is provided (moved to its own header asm/domain_page.h).

Take the opportunity to fix a typo in a comment that is modified.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changelog since v1:
* Rebase
* Fix typos

diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index d2651c9486..4f339efb7b 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -75,6 +75,7 @@ static void __init prepare_runtime_identity_mapping(void)
 paddr_t id_addr = virt_to_maddr(_start);
 lpae_t pte;
 DECLARE_OFFSETS(id_offsets, id_addr);
+lpae_t *root = this_cpu(xen_pgtable);
 
 if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
 panic("Cannot handle ID mapping above %uTB\n",
@@ -85,7 +86,7 @@ static void __init prepare_runtime_identity_mapping(void)
 pte.pt.table = 1;
 pte.pt.xn = 0;
 
-write_pte(&xen_pgtable[id_offsets[0]], pte);
+write_pte(&root[id_offsets[0]], pte);
 
 /* Link second ID table */
 pte = pte_of_xenaddr((vaddr_t)xen_second_id);
diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
index 3a43601623..ac2a6d0332 100644
--- a/xen/arch/arm/domain_page.c
+++ b/xen/arch/arm/domain_page.c
@@ -3,6 +3,8 @@
 #include 
 #include 
 
+#include 
+
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
diff --git a/xen/arch/arm/include/asm/arm32/mm.h 
b/xen/arch/arm/include/asm/arm32/mm.h
index 856f2dbec4..87a315db01 100644
--- a/xen/arch/arm/include/asm/arm32/mm.h
+++ b/xen/arch/arm/include/asm/arm32/mm.h
@@ -1,12 +1,6 @@
 #ifndef __ARM_ARM32_MM_H__
 #define __ARM_ARM32_MM_H__
 
-#include 
-
-#include 
-
-DECLARE_PER_CPU(lpae_t *, xen_pgtable);
-
 /*
  * Only a limited amount of RAM, called xenheap, is always mapped on ARM32.
  * For convenience always return false.
@@ -16,8 +10,6 @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, 
unsigned long nr)
 return false;
 }
 
-bool init_domheap_mappings(unsigned int cpu);
-
 static inline void arch_setup_page_tables(void)
 {
 }
diff --git a/xen/arch/arm/include/asm/domain_page.h 
b/xen/arch/arm/include/asm/domain_page.h
new file mode 100644
index 00..e9f52685e2
--- /dev/null
+++ b/xen/arch/arm/include/asm/domain_page.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_ARM_DOMAIN_PAGE_H__
+#define __ASM_ARM_DOMAIN_PAGE_H__
+
+#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
+bool init_domheap_mappings(unsigned int cpu);
+#else
+static inline bool init_domheap_mappings(unsigned int cpu)
+{
+return true;
+}
+#endif
+
+#endif /* __ASM_ARM_DOMAIN_PAGE_H__ */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 9a94d7eaf7..a76578a16f 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -2,6 +2,9 @@
 #define __ARCH_ARM_MM__
 
 #include 
+#include 
+
+#include 
 #include 
 #include 
 #include 
diff --git a/xen/arch/arm/include/asm/mmu/mm.h 
b/xen/arch/arm/include/asm/mmu/mm.h
index c5e03a66bf..c03c3a51e4 100644
--- a/xen/arch/arm/include/asm/mmu/mm.h
+++ b/xen/arch/arm/include/asm/mmu/mm.h
@@ -2,6 +2,8 @@
 #ifndef __ARM_MMU_MM_H__
 #define __ARM_MMU_MM_H__
 
+DECLARE_PER_CPU(lpae_t *, xen_pgtable);
+
 /* Non-boot CPUs use this to find the correct pagetables. */
 extern uint64_t init_ttbr;
 
diff --git a/xen/arch/arm/mmu/pt.c b/xen/arch/arm/mmu/pt.c
index a7755728ae..e772ab4e66 100644
--- a/xen/arch/arm/mmu/pt.c
+++ b/xen/arch/arm/mmu/pt.c
@@ -606,9 +606,9 @@ static int xen_pt_update(unsigned long virt,
 unsigned long left = nr_mfns;
 
 /*
- * For arm32, page-tables are different on each CPUs. Yet, they share
- * some common mappings. It is assumed that only common mappings
- * will be modified with this function.
+ * Page-tables are different on each CPU. Yet, they share some common
+ * mappings. It is assumed that only common mappings will be modified
+ * with this function.
  *
  * XXX: Add a check.
  */
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index 57f1b46499..8c81e26da3 100644
--- a/xen/arch/arm/mmu/setup.c
+++ b/xen/arch/arm/mmu/setup.c
@@ -26,17 +26,15 @@
  * PCPUs.
  */
 
-#ifdef 

[PATCH v2 (resend) 17/27] x86/domain_page: Remove the fast paths when mfn is not in the directmap

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When mfn is not in direct map, never use mfn_to_virt for any mappings.

We replace mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) with
arch_mfns_in_direct_map(mfn, 1) because these two are equivalent. The
extra comparison in arch_mfns_in_direct_map() looks different but because
DIRECTMAP_VIRT_END is always higher, it does not make any difference.

Lastly, domain_page_map_to_mfn() needs to gain a special case for
the PMAP.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 



Changes since Hongyan's version:
* arch_mfn_in_direct_map() was renamed to arch_mfns_in_directmap()
* add a special case for the PMAP in domain_page_map_to_mfn()

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 55e337aaf7..89caefc8a2 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -14,8 +14,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 
 static DEFINE_PER_CPU(struct vcpu *, override);
@@ -35,10 +37,11 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 /*
  * When using efi runtime page tables, we have the equivalent of the idle
  * domain's page tables but current may point at another domain's VCPU.
- * Return NULL as though current is not properly set up yet.
+ * Return the idle domain's vcpu on that core because the efi per-domain
+ * region (where the mapcache is) is in-sync with the idle domain.
  */
 if ( efi_rs_using_pgtables() )
-return NULL;
+return idle_vcpu[smp_processor_id()];
 
 /*
  * If guest_table is NULL, and we are running a paravirtualised guest,
@@ -77,18 +80,24 @@ void *map_domain_page(mfn_t mfn)
 struct vcpu_maphash_entry *hashent;
 
 #ifdef NDEBUG
-if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
 return mfn_to_virt(mfn_x(mfn));
 #endif
 
 v = mapcache_current_vcpu();
-if ( !v )
-return mfn_to_virt(mfn_x(mfn));
+if ( !v || !v->domain->arch.mapcache.inuse )
+{
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
+return mfn_to_virt(mfn_x(mfn));
+else
+{
+BUG_ON(system_state >= SYS_STATE_smp_boot);
+return pmap_map(mfn);
+}
+}
 
dcache = &v->domain->arch.mapcache;
vcache = &v->arch.mapcache;
-if ( !dcache->inuse )
-return mfn_to_virt(mfn_x(mfn));
 
 perfc_incr(map_domain_page_count);
 
@@ -184,6 +193,12 @@ void unmap_domain_page(const void *ptr)
 if ( !va || va >= DIRECTMAP_VIRT_START )
 return;
 
+if ( va >= FIXADDR_START && va < FIXADDR_TOP )
+{
+pmap_unmap((void *)ptr);
+return;
+}
+
 ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
 v = mapcache_current_vcpu();
@@ -237,7 +252,7 @@ int mapcache_domain_init(struct domain *d)
 unsigned int bitmap_pages;
 
 #ifdef NDEBUG
-if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+if ( !mem_hotplug && arch_mfns_in_directmap(0, max_page) )
 return 0;
 #endif
 
@@ -308,7 +323,7 @@ void *map_domain_page_global(mfn_t mfn)
 local_irq_is_enabled()));
 
 #ifdef NDEBUG
-if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
 return mfn_to_virt(mfn_x(mfn));
 #endif
 
@@ -335,6 +350,23 @@ mfn_t domain_page_map_to_mfn(const void *ptr)
 if ( va >= DIRECTMAP_VIRT_START )
 return _mfn(virt_to_mfn(ptr));
 
+/*
+ * The fixmap is stealing the top-end of the VMAP. So the check for
+ * the PMAP *must* happen first.
+ *
+ * Also, the fixmap translates a slot to an address backwards. The
+ * logic will rely on it to avoid any complexity. So check at
+ * compile time this will always hold.
+ */
+BUILD_BUG_ON(fix_to_virt(FIX_PMAP_BEGIN) < fix_to_virt(FIX_PMAP_END));
+
+if ( ((unsigned long)fix_to_virt(FIX_PMAP_END) <= va) &&
+ ((va & PAGE_MASK) <= (unsigned long)fix_to_virt(FIX_PMAP_BEGIN)) )
+{
+BUG_ON(system_state >= SYS_STATE_smp_boot);
+return l1e_get_mfn(l1_fixmap[l1_table_offset(va)]);
+}
+
 if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
 return vmap_to_mfn(va);
 
-- 
2.40.1




[PATCH v2 (resend) 13/27] x86: Add a boot option to enable and disable the direct map

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Also add a helper function to retrieve it. Change arch_mfns_in_directmap()
to check this option before returning.

This is added as a Kconfig option as well as a boot command line option.
While being generic, the Kconfig option is only usable for x86 at the moment.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 



Changes in V2:
* Introduce a Kconfig option
* Reword the commit message
* Make opt_directmap and helper generic

Changes since Hongyan's version:
* Reword the commit message
* opt_directmap is only modified during boot so mark it as
  __ro_after_init
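
As an editorial illustration (not part of the patch): once the series is
applied, disabling the direct map is just a matter of appending the new
option to Xen's command line, e.g. in a GRUB2 entry (file names and the
other options are hypothetical, only "directmap=no" is relevant here):

menuentry 'Xen (no directmap)' {
    multiboot2 /boot/xen.gz directmap=no console=vga,com1
    module2 /boot/vmlinuz root=/dev/sda1 ro
    module2 /boot/initrd.img
}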

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 8e65f8bd18..63c946f482 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -799,6 +799,18 @@ that enabling this option cannot guarantee anything beyond 
what underlying
 hardware guarantees (with, where available and known to Xen, respective
 tweaks applied).
 
+### directmap (x86)
+> `= <boolean>`
+
+> Default: `true`
+
+Enable or disable the direct map region in Xen.
+
+By default, Xen creates the direct map region which maps physical memory
+in that region. Setting this to no will remove the direct map, blocking
+exploits that leak secrets via speculative memory access in the direct
+map.
+
 ### dma_bits
 > `= `
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 1acdffc51c..350f41b832 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -29,6 +29,7 @@ config X86
select HAS_UBSAN
select HAS_VPCI if HVM
select NEEDS_LIBELF
+   select HAS_SECRET_HIDING
 
 config ARCH_DEFCONFIG
string
diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index 7d26d9cd2f..4aae270a78 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -620,10 +620,18 @@ void write_32bit_pse_identmap(uint32_t *l2);
 /*
  * x86 maps part of physical memory via the directmap region.
  * Return whether the range of MFN falls in the directmap region.
+ *
+ * When boot command line sets directmap=no, we will not have a direct map at
+ * all so this will always return false.
  */
 static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 {
-unsigned long eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
+unsigned long eva;
+
+if ( !has_directmap() )
+return false;
+
+eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
 
 return (mfn + nr) <= (virt_to_mfn(eva - 1) + 1);
 }
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4d0c90b7a0..b813ea75b5 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1512,6 +1512,8 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 if ( highmem_start )
 xenheap_max_mfn(PFN_DOWN(highmem_start - 1));
 
+printk("Booting with directmap %s\n", has_directmap() ? "on" : "off");
+
 /*
  * Walk every RAM region and map it in its entirety (on x86/64, at least)
  * and notify it to the boot allocator.
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 310ad4229c..9a24c89ac5 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -83,6 +83,23 @@ config HAS_UBSAN
 config MEM_ACCESS_ALWAYS_ON
bool
 
+config HAS_SECRET_HIDING
+   bool
+
+config SECRET_HIDING
+bool "Secret hiding"
+depends on HAS_SECRET_HIDING
+---help---
+The directmap contains mappings for most of the RAM, which makes domain
+memory easily accessible. While this improves performance, it also makes
+the hypervisor more vulnerable to speculation attacks.
+
+Enabling this feature will allow the user to decide whether the memory
+is always mapped at boot or mapped only on demand (see the command line
+option "directmap").
+
+If unsure, say N.
+
 config MEM_ACCESS
def_bool MEM_ACCESS_ALWAYS_ON
prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 740b6f0ff7..a3746cfbcf 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -173,6 +173,11 @@ paddr_t __ro_after_init mem_hotplug;
 static char __initdata opt_badpage[100] = "";
 string_param("badpage", opt_badpage);
 
+bool __ro_after_init opt_directmap = true;
+#ifdef CONFIG_HAS_SECRET_HIDING
+boolean_param("directmap", opt_directmap);
+#endif
+
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3d9b2d05a5..f860e98ee4 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -165,6 +165,13 @@ extern unsigned long max_page;
 extern unsigned long total_pages;
 extern paddr_t mem_hotplug;
 
+extern bool opt_directmap;
+
+static inline bool has_directmap(void)
+{
+return opt_directmap;
+}
+
 /*
  * Extra fault info types which are used to further describe
  * the source of an 

[PATCH v2 (resend) 11/27] x86: Lift mapcache variable to the arch level

2024-01-16 Thread Elias El Yandouzi
From: Wei Liu 

It is going to be needed by HVM and idle domain as well, because without
the direct map, both need a mapcache to map pages.

This only lifts the mapcache variable up. Whether we populate the
mapcache for a domain is unchanged in this patch.

Signed-off-by: Wei Liu 
Signed-off-by: Wei Wang 
Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8a31d18f69..8ef3f7746f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -843,6 +843,8 @@ int arch_domain_create(struct domain *d,
 
 psr_domain_init(d);
 
+mapcache_domain_init(d);
+
 if ( is_hvm_domain(d) )
 {
 if ( (rc = hvm_domain_initialise(d, config)) != 0 )
@@ -850,8 +852,6 @@ int arch_domain_create(struct domain *d,
 }
 else if ( is_pv_domain(d) )
 {
-mapcache_domain_init(d);
-
 if ( (rc = pv_domain_initialise(d)) != 0 )
 goto fail;
 }
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index eac5e3304f..55e337aaf7 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -82,11 +82,11 @@ void *map_domain_page(mfn_t mfn)
 #endif
 
 v = mapcache_current_vcpu();
-if ( !v || !is_pv_vcpu(v) )
+if ( !v )
 return mfn_to_virt(mfn_x(mfn));
 
-dcache = &v->domain->arch.pv.mapcache;
-vcache = &v->arch.pv.mapcache;
+dcache = &v->domain->arch.mapcache;
+vcache = &v->arch.mapcache;
 if ( !dcache->inuse )
 return mfn_to_virt(mfn_x(mfn));
 
@@ -187,14 +187,14 @@ void unmap_domain_page(const void *ptr)
 ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
 v = mapcache_current_vcpu();
-ASSERT(v && is_pv_vcpu(v));
+ASSERT(v);
 
-dcache = &v->domain->arch.pv.mapcache;
+dcache = &v->domain->arch.mapcache;
 ASSERT(dcache->inuse);
 
 idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
 mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
-hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];
+hashent = &v->arch.mapcache.hash[MAPHASH_HASHFN(mfn)];
 
 local_irq_save(flags);
 
@@ -233,11 +233,9 @@ void unmap_domain_page(const void *ptr)
 
 int mapcache_domain_init(struct domain *d)
 {
-struct mapcache_domain *dcache = &d->arch.pv.mapcache;
+struct mapcache_domain *dcache = &d->arch.mapcache;
 unsigned int bitmap_pages;
 
-ASSERT(is_pv_domain(d));
-
 #ifdef NDEBUG
 if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
 return 0;
@@ -261,12 +259,12 @@ int mapcache_domain_init(struct domain *d)
 int mapcache_vcpu_init(struct vcpu *v)
 {
 struct domain *d = v->domain;
-struct mapcache_domain *dcache = &d->arch.pv.mapcache;
+struct mapcache_domain *dcache = &d->arch.mapcache;
 unsigned long i;
 unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
 unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-if ( !is_pv_vcpu(v) || !dcache->inuse )
+if ( !dcache->inuse )
 return 0;
 
 if ( ents > dcache->entries )
@@ -293,7 +291,7 @@ int mapcache_vcpu_init(struct vcpu *v)
 BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
 for ( i = 0; i < MAPHASH_ENTRIES; i++ )
 {
-struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
+struct vcpu_maphash_entry *hashent = &v->arch.mapcache.hash[i];
 
 hashent->mfn = ~0UL; /* never valid to map */
 hashent->idx = MAPHASHENT_NOTINUSE;
diff --git a/xen/arch/x86/include/asm/domain.h 
b/xen/arch/x86/include/asm/domain.h
index 4d97c68028..85b890b2cb 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -286,9 +286,6 @@ struct pv_domain
 /* Mitigate L1TF with shadow/crashing? */
 bool check_l1tf;
 
-/* map_domain_page() mapping cache. */
-struct mapcache_domain mapcache;
-
 struct cpuidmasks *cpuidmasks;
 };
 
@@ -327,6 +324,9 @@ struct arch_domain
 
 uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
+/* map_domain_page() mapping cache. */
+struct mapcache_domain mapcache;
+
 union {
 struct pv_domain pv;
 struct hvm_domain hvm;
@@ -517,9 +517,6 @@ struct arch_domain
 
 struct pv_vcpu
 {
-/* map_domain_page() mapping cache. */
-struct mapcache_vcpu mapcache;
-
 unsigned int vgc_flags;
 
 struct trap_info *trap_ctxt;
@@ -619,6 +616,9 @@ struct arch_vcpu
 #define async_exception_state(t) async_exception_state[(t)-1]
 uint8_t async_exception_mask;
 
+/* map_domain_page() mapping cache. */
+struct mapcache_vcpu mapcache;
+
 /* Virtual Machine Extensions */
 union {
 struct pv_vcpu pv;
-- 
2.40.1




[PATCH v2 (resend) 12/27] x86/mapcache: Initialise the mapcache for the idle domain

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

In order to use the mapcache in the idle domain, we also have to
populate its page tables in the PERDOMAIN region, and we need to move
mapcache_domain_init() earlier in arch_domain_create().

Note, commit 'x86: lift mapcache variable to the arch level' has
initialised the mapcache for HVM domains. With this patch, PV, HVM,
idle domains now all initialise the mapcache.

Signed-off-by: Wei Wang 
Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
  * Free resources if mapcache initialisation fails
  * Remove `is_idle_domain()` check from `create_perdomain_mappings()`

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8ef3f7746f..d4c125bc14 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -750,9 +750,16 @@ int arch_domain_create(struct domain *d,
 
spin_lock_init(&d->arch.e820_lock);
 
+if ( (rc = mapcache_domain_init(d)) != 0)
+{
+free_perdomain_mappings(d);
+return rc;
+}
+
 /* Minimal initialisation for the idle domain. */
 if ( unlikely(is_idle_domain(d)) )
 {
+struct page_info *pg = d->arch.perdomain_l3_pg;
 static const struct arch_csw idle_csw = {
 .from = paravirt_ctxt_switch_from,
 .to   = paravirt_ctxt_switch_to,
@@ -763,6 +770,9 @@ int arch_domain_create(struct domain *d,
 
 d->arch.cpu_policy = ZERO_BLOCK_PTR; /* Catch stray misuses. */
 
+idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
+l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
+
 return 0;
 }
 
@@ -843,8 +853,6 @@ int arch_domain_create(struct domain *d,
 
 psr_domain_init(d);
 
-mapcache_domain_init(d);
-
 if ( is_hvm_domain(d) )
 {
 if ( (rc = hvm_domain_initialise(d, config)) != 0 )
-- 
2.40.1




[PATCH v2 (resend) 14/27] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

At the moment the fixmap slots are prefixed differently between arm and
x86.

Some of them (e.g. the PMAP slots) are used in common code. So it would
be better if they are named the same way to avoid having to create
aliases.

I have decided to use the x86 naming because it requires fewer changes. So
all the Arm fixmap slots will now be prefixed with FIX rather than
FIXMAP.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 

Reviewed-by: Henry Wang 
Reviewed-by: Jan Beulich 
Reviewed-by: Stefano Stabellini 



Note that potentially more renaming that could be done to share
more code in future. I have decided to not do that to avoid going
down a rabbit hole.

diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
index 41d521f720..736cf09eca 100644
--- a/xen/arch/arm/acpi/lib.c
+++ b/xen/arch/arm/acpi/lib.c
@@ -40,10 +40,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 return NULL;
 
 offset = phys & (PAGE_SIZE - 1);
-base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN) + offset;
+base = FIXMAP_ADDR(FIX_ACPI_BEGIN) + offset;
 
 /* Check the fixmap is big enough to map the region */
-if ( (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - base) < size )
+if ( (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - base) < size )
 return NULL;
 
 /* With the fixmap, we can only map one region at the time */
@@ -54,7 +54,7 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 
 size += offset;
 mfn = maddr_to_mfn(phys);
-idx = FIXMAP_ACPI_BEGIN;
+idx = FIX_ACPI_BEGIN;
 
 do {
 set_fixmap(idx, mfn, PAGE_HYPERVISOR);
@@ -72,8 +72,8 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
 unsigned int idx;
 
 /* We are only handling fixmap address in the arch code */
-if ( (vaddr < FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) ||
- (vaddr >= (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)) )
+if ( (vaddr < FIXMAP_ADDR(FIX_ACPI_BEGIN)) ||
+ (vaddr >= (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE)) )
 return false;
 
 /*
@@ -81,16 +81,16 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
  * for the ACPI fixmap region. The caller is expected to free with
  * the same address.
  */
-ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIXMAP_ACPI_BEGIN));
+ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIX_ACPI_BEGIN));
 
 /* The region allocated fit in the ACPI fixmap region. */
-ASSERT(size < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - vaddr));
+ASSERT(size < (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - vaddr));
 ASSERT(fixmap_inuse);
 
 fixmap_inuse = false;
 
-size += vaddr - FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
-idx = FIXMAP_ACPI_BEGIN;
+size += vaddr - FIXMAP_ADDR(FIX_ACPI_BEGIN);
+idx = FIX_ACPI_BEGIN;
 
 do
 {
diff --git a/xen/arch/arm/include/asm/early_printk.h 
b/xen/arch/arm/include/asm/early_printk.h
index c1e84f8b00..f444e89a86 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -17,7 +17,7 @@
 
 /* need to add the uart address offset in page to the fixmap address */
 #define EARLY_UART_VIRTUAL_ADDRESS \
-(FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
~PAGE_MASK))
+(FIXMAP_ADDR(FIX_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
 
 #define TEMPORARY_EARLY_UART_VIRTUAL_ADDRESS \
 (TEMPORARY_FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
~PAGE_MASK))
diff --git a/xen/arch/arm/include/asm/fixmap.h 
b/xen/arch/arm/include/asm/fixmap.h
index 734eb9b1d4..a823456ecb 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -8,17 +8,17 @@
 #include 
 
 /* Fixmap slots */
-#define FIXMAP_CONSOLE  0  /* The primary UART */
-#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
-#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
-#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* 
End mappings of ACPI tables */
-#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
-#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP 
*/
+#define FIX_CONSOLE  0  /* The primary UART */
+#define FIX_MISC 1  /* Ephemeral mappings of hardware */
+#define FIX_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
+#define FIX_ACPI_END(FIX_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End 
mappings of ACPI tables */
+#define FIX_PMAP_BEGIN (FIX_ACPI_END + 1) /* Start of PMAP */
+#define FIX_PMAP_END (FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
 
-#define FIXMAP_LAST FIXMAP_PMAP_END
+#define FIX_LAST FIX_PMAP_END
 
 #define FIXADDR_START FIXMAP_ADDR(0)
-#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST)
+#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index 72725840b6..57f1b46499 100644
--- a/xen/arch/arm/mmu/setup.c
+++ 

[PATCH v2 (resend) 04/27] acpi: vmap pages in acpi_os_alloc_memory

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Also, introduce a wrapper around vmap that maps a contiguous range for
boot allocations. Unfortunately, the new helper cannot be a static inline
because the dependencies are a mess. We would need to re-include
asm/page.h (was removed in aa4b9d1ee653 "include: don't use asm/page.h
from common headers") and it doesn't look to be enough anymore
because bits from asm/cpufeature.h are used in the definition of PAGE_NX.

Lastly, with the move to vmap(), it is now easier to find the size
of the mapping. So pass the whole area to init_boot_pages() rather than
just the first page.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* Rename vmap_contig_pages() to vmap_contig()
* Rename nr_pages to nr to be consistent with vmap() parameters
* Pass the whole region to init_boot_pages()

Changes since Hongyan's version:
* Rename vmap_boot_pages() to vmap_contig_pages()
* Move the new helper in vmap.c to avoid compilation issue
* Don't use __pa() to translate the virtual address
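
A minimal editorial sketch of the intended early-boot pairing, mirroring
what acpi_os_alloc_memory() does below (the helper name is hypothetical
and the caller is assumed to run while system_state is
SYS_STATE_early_boot):

#include <xen/init.h>
#include <xen/mm.h>
#include <xen/vmap.h>

static void *__init early_pages_alloc(unsigned int nr)
{
    mfn_t mfn = alloc_boot_pages(nr, 1);

    return vmap_contig(mfn, nr); /* NULL if the vmap space is exhausted */
}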

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 171271fae3..966a7e763f 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -245,6 +245,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
 return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
+void *vmap_contig(mfn_t mfn, unsigned int nr)
+{
+return __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
+}
+
 unsigned int vmap_size(const void *va)
 {
 unsigned int pages = vm_size(va, VMAP_DEFAULT);
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 389505f786..ab80d6b2a9 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
void *ptr;
 
if (system_state == SYS_STATE_early_boot)
-   return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
+   {
+   mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
+
+   return vmap_contig(mfn, PFN_UP(sz));
+   }
 
ptr = xmalloc_bytes(sz);
ASSERT(!ptr || is_xmalloc_memory(ptr));
@@ -246,5 +250,11 @@ void __init acpi_os_free_memory(void *ptr)
if (is_xmalloc_memory(ptr))
xfree(ptr);
else if (ptr && system_state == SYS_STATE_early_boot)
-   init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
+   {
+   paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
+   unsigned int nr = vmap_size(ptr);
+
+   vunmap(ptr);
+   init_boot_pages(addr, addr + nr * PAGE_SIZE);
+   }
 }
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 24c85de490..0c16baa85f 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -15,6 +15,7 @@ void vm_init_type(enum vmap_region type, void *start, void 
*end);
 void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
  unsigned int align, unsigned int flags, enum vmap_region type);
 void *vmap(const mfn_t *mfn, unsigned int nr);
+void *vmap_contig(mfn_t mfn, unsigned int nr);
 void vunmap(const void *va);
 
 void *vmalloc(size_t size);
-- 
2.40.1




[PATCH v2 (resend) 06/27] x86/srat: vmap the pages for acpi_slit

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

This avoids the assumption that boot pages are in the direct map.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



There was a discussion with Jan regarding early failure vs
disabling NUMA. I am strongly in favor of the former because
it is more obvious that something went wrong.

From my understanding, Jan seems to be in favor of turning off NUMA
and then continue to boot. But then implied that a panic() would be
fine.

So I went with the panic() version. I am happy to rework it to another
approach if there is a consensus.

Changes in v2:
* vmap_contig_pages() was renamed to vmap_contig()
* Use a panic() rather than BUG_ON()

Changes since Hongyan's version:
* vmap_boot_pages() was renamed to vmap_contig_pages()

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 3f70338e6e..688f410287 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -135,7 +135,9 @@ void __init acpi_numa_slit_init(struct acpi_table_slit 
*slit)
return;
}
mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
-   acpi_slit = mfn_to_virt(mfn_x(mfn));
+   acpi_slit = vmap_contig(mfn, PFN_UP(slit->header.length));
+   if ( !acpi_slit )
+   panic("Unable to map the ACPI SLIT. Retry with numa=off");
memcpy(acpi_slit, slit, slit->header.length);
 }
 
-- 
2.40.1




[PATCH v2 (resend) 09/27] x86/pv: Rewrite how building PV dom0 handles domheap mappings

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Building a PV dom0 allocates from the domheap but uses the pages as if
they were xenheap pages. Use the pages as they should be used.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
* Clarify the commit message
* Break the patch in two parts

Changes since Hongyan's version:
* Rebase
* Remove spurious newline
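
The core of the rewrite is the UNMAP_MAP_AND_ADVANCE() helper added
below. As an editorial note on the idiom: each invocation first unmaps
the page currently behind the virtual-address variable, converts the
allocation cursor to an MFN, advances the cursor by PAGE_SIZE and maps
the new MFN, so at any point at most one domheap mapping is held per
page-table level being built.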

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5659814e0c..dc5e9fe117 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -382,6 +382,10 @@ int __init dom0_construct_pv(struct domain *d,
 l3_pgentry_t *l3tab = NULL, *l3start = NULL;
 l2_pgentry_t *l2tab = NULL, *l2start = NULL;
 l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+mfn_t l4start_mfn = INVALID_MFN;
+mfn_t l3start_mfn = INVALID_MFN;
+mfn_t l2start_mfn = INVALID_MFN;
+mfn_t l1start_mfn = INVALID_MFN;
 
 /*
  * This fully describes the memory layout of the initial domain. All
@@ -708,22 +712,32 @@ int __init dom0_construct_pv(struct domain *d,
 v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
 }
 
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
+do {\
+unmap_domain_page(virt_var);\
+mfn_var = maddr_to_mfn(maddr);  \
+maddr += PAGE_SIZE; \
+virt_var = map_domain_page(mfn_var);\
+} while ( false )
+
 if ( !compat )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
-l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
+l4tab = l4start;
 clear_page(l4tab);
-init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
-  d, INVALID_MFN, true);
-v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
+init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
+v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
 }
 else
 {
 /* Monitor table already created by switch_compat(). */
-l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
+l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
+l4start = l4tab = map_domain_page(l4start_mfn);
 /* See public/xen.h on why the following is needed. */
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
 l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
 }
 
 l4tab += l4_table_offset(v_start);
@@ -733,14 +747,16 @@ int __init dom0_construct_pv(struct domain *d,
 if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
-l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
+l1tab = l1start;
 clear_page(l1tab);
 if ( count == 0 )
 l1tab += l1_table_offset(v_start);
 if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = 
PGT_l2_page_table;
-l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+l2tab = l2start;
 clear_page(l2tab);
 if ( count == 0 )
 l2tab += l2_table_offset(v_start);
@@ -750,19 +766,19 @@ int __init dom0_construct_pv(struct domain *d,
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info =
 PGT_l3_page_table;
-l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
 }
 l3tab = l3start;
 clear_page(l3tab);
 if ( count == 0 )
 l3tab += l3_table_offset(v_start);
-*l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
+*l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
 l4tab++;
 }
-*l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
+*l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
 l3tab++;
 }
-*l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
+*l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
 l2tab++;
 }
 if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
@@ -781,30 +797,34 @@ int __init dom0_construct_pv(struct domain *d,
 
 if ( compat )
 {
-

[PATCH v2 (resend) 05/27] xen/numa: vmap the pages for memnodemap

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

This avoids the assumption that there is a direct map and boot pages
fall inside the direct map.

Clean up the variables so that mfn actually stores a type-safe mfn.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



See the discussion in the next patch about using panic().

Changes in v2:
* vmap_contig_pages() was renamed to vmap_contig()
* Replace the BUG_ON() with a panic()

Changes compare to Hongyan's version:
* The function modified was moved to common code. So rebase it
* vmap_boot_pages() was renamed to vmap_contig_pages()

diff --git a/xen/common/numa.c b/xen/common/numa.c
index f454c4d894..ef13ec2255 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -424,13 +424,14 @@ static int __init populate_memnodemap(const struct node 
*nodes,
 static int __init allocate_cachealigned_memnodemap(void)
 {
 unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
-unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
+mfn_t mfn = alloc_boot_pages(size, 1);
 
-memnodemap = mfn_to_virt(mfn);
-mfn <<= PAGE_SHIFT;
+memnodemap = vmap_contig(mfn, size);
+if ( !memnodemap )
+panic("Unable to map the memnodemap. Retry with numa=off");
 size <<= PAGE_SHIFT;
 printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
-   mfn, mfn + size);
+   mfn_to_maddr(mfn), mfn_to_maddr(mfn) + size);
 memnodemapsize = size / sizeof(*memnodemap);
 
 return 0;
-- 
2.40.1




[PATCH v2 (resend) 15/27] xen/x86: Add support for the PMAP

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

PMAP will be used in a follow-up patch to bootstrap the map domain
page infrastructure -- we need some way to map pages to set up the
mapcache without a direct map.

The functions pmap_{map, unmap} open code {set, clear}_fixmap to break
the loop.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



The PMAP infrastructure was upstream separately for Arm since
Hongyan sent the secret-free hypervisor series. So this is a new
patch to plumb the feature on x86.

Changes in v2:
* Declare PMAP entries earlier in fixed_addresses
* Reword the commit message
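
For illustration (editorial, not part of the patch): the common PMAP
interface being plumbed in here is only meant for the narrow window
before the mapcache exists, a constraint callers must respect (the
helper below is hypothetical):

#include <xen/init.h>
#include <xen/mm.h>
#include <xen/pmap.h>

static void __init early_inspect(mfn_t mfn)
{
    void *p = pmap_map(mfn); /* only legal before SYS_STATE_smp_boot */

    /* ... read the freshly allocated page ... */
    pmap_unmap(p);
}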

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 350f41b832..16b2a32469 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -25,6 +25,7 @@ config X86
select HAS_PASSTHROUGH
select HAS_PCI
select HAS_PCI_MSI
+   select HAS_PMAP
select HAS_SCHED_GRANULARITY
select HAS_UBSAN
select HAS_VPCI if HVM
diff --git a/xen/arch/x86/include/asm/fixmap.h 
b/xen/arch/x86/include/asm/fixmap.h
index 516ec3fa6c..a7ac365fc6 100644
--- a/xen/arch/x86/include/asm/fixmap.h
+++ b/xen/arch/x86/include/asm/fixmap.h
@@ -21,6 +21,8 @@
 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -53,6 +55,8 @@ enum fixed_addresses {
 FIX_PV_CONSOLE,
 FIX_XEN_SHARED_INFO,
 #endif /* CONFIG_XEN_GUEST */
+FIX_PMAP_BEGIN,
+FIX_PMAP_END = FIX_PMAP_BEGIN + NUM_FIX_PMAP,
 /* Everything else should go further down. */
 FIX_APIC_BASE,
 FIX_IO_APIC_BASE_0,
diff --git a/xen/arch/x86/include/asm/pmap.h b/xen/arch/x86/include/asm/pmap.h
new file mode 100644
index 00..62746e191d
--- /dev/null
+++ b/xen/arch/x86/include/asm/pmap.h
@@ -0,0 +1,25 @@
+#ifndef __ASM_PMAP_H__
+#define __ASM_PMAP_H__
+
+#include 
+
+static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
+{
+unsigned long linear = (unsigned long)fix_to_virt(slot);
+l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
+
+ASSERT(!(l1e_get_flags(*pl1e) & _PAGE_PRESENT));
+
+l1e_write_atomic(pl1e, l1e_from_mfn(mfn, PAGE_HYPERVISOR));
+}
+
+static inline void arch_pmap_unmap(unsigned int slot)
+{
+unsigned long linear = (unsigned long)fix_to_virt(slot);
+l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
+
+l1e_write_atomic(pl1e, l1e_empty());
+flush_tlb_one_local(linear);
+}
+
+#endif /* __ASM_PMAP_H__ */
-- 
2.40.1




[PATCH v2 (resend) 08/27] x86/pv: Domheap pages should be mapped while relocating initrd

2024-01-16 Thread Elias El Yandouzi
From: Wei Liu 

Xen shouldn't use domheap pages as if they were xenheap pages. Map and
unmap pages accordingly.

Signed-off-by: Wei Liu 
Signed-off-by: Wei Wang 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
* Get rid of mfn_to_virt
* Don't open code copy_domain_page()

Changes since Hongyan's version:
* Add missing newline after the variable declaration
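
For illustration (editorial, not part of the patch): copy_domain_page()
copies one page through two ephemeral mappings, so neither the source
nor the destination needs to sit in the directmap. A hypothetical
wrapper showing the MFN-based interface:

#include <xen/domain_page.h>
#include <xen/mm.h>

static void duplicate_page(mfn_t dst, mfn_t src)
{
    copy_domain_page(dst, src); /* maps and unmaps both pages internally */
}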

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5bbed3a36a..5659814e0c 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -615,18 +615,25 @@ int __init dom0_construct_pv(struct domain *d,
 if ( d->arch.physaddr_bitsize &&
  ((mfn + count - 1) >> (d->arch.physaddr_bitsize - PAGE_SHIFT)) )
 {
+unsigned long nr_pages;
+
 order = get_order_from_pages(count);
 page = alloc_domheap_pages(d, order, MEMF_no_scrub);
 if ( !page )
 panic("Not enough RAM for domain 0 initrd\n");
+
+nr_pages = 1UL << order;
 for ( count = -count; order--; )
 if ( count & (1UL << order) )
 {
 free_domheap_pages(page, order);
 page += 1UL << order;
+nr_pages -= 1UL << order;
 }
-memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start),
-   initrd_len);
+
+for ( i = 0; i < nr_pages; i++ )
+copy_domain_page(page_to_mfn(page + i), _mfn(initrd_mfn + i));
+
 mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT;
 init_domheap_pages(mpt_alloc,
mpt_alloc + PAGE_ALIGN(initrd_len));
-- 
2.40.1




[PATCH v2 (resend) 00/27] Remove the directmap

2024-01-16 Thread Elias El Yandouzi
Hi all,

A few years ago, Wei Liu implemented a PoC to remove the directmap
from Xen. The last version was sent by Hongyan Xia [1].

I will start with thanking both Wei and Hongyan for the initial work
to upstream the feature. A lot of patches already went in and this is
the last few patches missing to effectively enable the feature.

=== What is the directmap? ===

At the moment, on both arm64 and x86, most of the RAM is mapped
in Xen address space. This means that domain memory is easily
accessible in Xen.

=== Why do we want to remove the directmap? ===

(Summarizing my understanding of the previous discussion)

Speculation attacks (like Spectre SP1) rely on loading pieces of memory
into the cache. If the region is not mapped then it can't be loaded.

So reducing the amount of memory mapped in Xen will also
reduce the attack surface.

=== What's the performance impact? ===

As the guest memory is not always mapped, the cost of mapping
will increase. I haven't done the numbers with this new version, but
some measurements were provided in the previous version for x86.

=== Improvement possible ===

The known areas to improve on x86 are:
   * Mapcache: There was a patch sent by Hongyan:
 
https://lore.kernel.org/xen-devel/4058e92ce21627731c49b588a95809dc0affd83a.1581015491.git.hongy...@amazon.com/
   * EPT: At the moment a guest page-table walk requires about 20 map/unmaps.
 This will have a very high impact on performance. We need to decide
 whether keeping the EPT always mapped is a problem.

The original series didn't have support for Arm64. But as there was
some interest, I have provided a PoC.

There is more work to do for Arm64:
   * The mapcache is quite simple. We would need to investigate the performance.
   * The mapcache should be made compliant with the Arm Arm (this is now
 more critical).
   * We will likely have the same problem as for the EPT.
   * We have no support for merging tables into a superpage, nor for
 freeing empty page-tables. (See more below)

=== Implementation ===

The subject is probably a misnomer. The directmap is still present but
the RAM is not mapped by default. Instead, the region will still be used
to map pages allocated via alloc_xenheap_pages().

The advantage is that the solution is simple (so IMHO good enough to be
merged as a tech preview). The disadvantage is that the page allocator
does not try to keep all the xenheap pages together, so we may end up
with an increase in page-table usage.
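
To make the split concrete, here is a small editorial sketch (not taken
from the series) of the two access patterns after the change: xenheap
allocations keep returning a pointer that is already mapped, while
domheap pages have to be mapped on demand:

#include <xen/domain_page.h>
#include <xen/mm.h>

void *xen_private_page(void)
{
    /* Still mapped for Xen's own use, via the (ex-)directmap region. */
    return alloc_xenheap_page();
}

void inspect_domheap_page(struct page_info *pg)
{
    void *p = __map_domain_page(pg); /* ephemeral mapping */

    /* ... read or write the guest page here ... */
    unmap_domain_page(p);
}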

In the longer term, we should consider removing the direct map
completely and switching to vmap(). The main problem with this approach
is that mfn_to_virt() is used frequently in the code, so we would need
to cache the mapping (maybe in the struct page_info).

=== Why arm32 is not covered? ===

On Arm32, the domheap and xenheap are always separate. So by design
the guest memory is not mapped by default.

At this stage, it seems unnecessary to have to map/unmap xenheap pages
every time they are allocated.

=== Why not using a separate domheap and xenheap? ===

While a separate xenheap/domheap reduces the page-table usage (all
xenheap pages are contiguous and could always be mapped), it is also
currently less scalable because the split is fixed at boot time (XXX:
Can this be dynamic?).

=== Future of secret-free hypervisor ===

There is some information in an e-mail from Andrew a few years ago:

https://lore.kernel.org/xen-devel/e3219697-0759-39fc-2486-715cdec1c...@citrix.com/

Cheers,

[1] https://lore.kernel.org/xen-devel/cover.1588278317.git.hongy...@amazon.com/

*** BLURB HERE ***

Elias El Yandouzi (3):
  xen/x86: Add build assertion for fixmap entries
  Rename mfn_to_virt() calls
  Rename maddr_to_virt() calls

Hongyan Xia (13):
  acpi: vmap pages in acpi_os_alloc_memory
  xen/numa: vmap the pages for memnodemap
  x86/srat: vmap the pages for acpi_slit
  x86: Map/unmap pages in restore_all_guests
  x86/pv: Rewrite how building PV dom0 handles domheap mappings
  x86/pv: Map L4 page table for shim domain
  x86/mapcache: Initialise the mapcache for the idle domain
  x86: Add a boot option to enable and disable the direct map
  x86/domain_page: Remove the fast paths when mfn is not in the
directmap
  xen/page_alloc: Add a path for xenheap when there is no direct map
  x86/setup: Leave early boot slightly earlier
  x86/setup: vmap heap nodes when they are outside the direct map
  x86/setup: Do not create valid mappings when directmap=no

Julien Grall (8):
  xen/vmap: Check the page has been mapped in vm_init_type()
  xen/vmap: Introduce vmap_size() and use it
  xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  xen/x86: Add support for the PMAP
  xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
  xen/arm64: mm: Use per-pCPU page-tables
  xen/arm64: Implement a mapcache for arm64
  xen/arm64: Allow the admin to enable/disable the directmap

Wei Liu (3):
  x86/setup: Move vm_init() before acpi calls
  x86/pv: Domheap pages 

[PATCH v2 (resend) 07/27] x86: Map/unmap pages in restore_all_guests

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Before, it assumed the pv cr3 could be accessed via a direct map. This
is no longer true.

Note that we do not map and unmap root_pgt for now since it is still a
xenheap page.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
* Rework the shadow perdomain mapping solution in the follow-up patches

Changes since Hongyan's version:
* Remove the final dot in the commit title
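
As a worked example of the layout introduced by the macros below
(editorial note): with 4KiB pages, vCPU 5 of a domain gets the virtual
slot SHADOW_ROOT_PT_VIRT_START + 5 * PAGE_SIZE; shadow_root_pt_idx()
yields 5 >> PAGETABLE_ORDER = 0, so its PTE lives in the first L1 table
of shadow_root_pt_l1tab at index 5 & (L1_PAGETABLE_ENTRIES - 1) = 5,
which is exactly the entry write_ptbase() refreshes on context switch.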

diff --git a/xen/arch/x86/include/asm/config.h 
b/xen/arch/x86/include/asm/config.h
index bbced338be..7cf1f33dc0 100644
--- a/xen/arch/x86/include/asm/config.h
+++ b/xen/arch/x86/include/asm/config.h
@@ -202,7 +202,7 @@ extern unsigned char boot_edid_info[128];
 /* Slot 260: per-domain mappings (including map cache). */
 #define PERDOMAIN_VIRT_START(PML4_ADDR(260))
 #define PERDOMAIN_SLOT_MBYTES   (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS 3
+#define PERDOMAIN_SLOTS 4
 #define PERDOMAIN_VIRT_SLOT(s)  (PERDOMAIN_VIRT_START + (s) * \
  (PERDOMAIN_SLOT_MBYTES << 20))
 /* Slot 4: mirror of per-domain mappings (for compat xlat area accesses). */
@@ -316,6 +316,16 @@ extern unsigned long xen_phys_start;
 #define ARG_XLAT_START(v)\
 (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))
 
+/* root_pt shadow mapping area. The fourth per-domain-mapping sub-area */
+#define SHADOW_ROOT_PT_VIRT_START   PERDOMAIN_VIRT_SLOT(3)
+#define SHADOW_ROOT_PT_ENTRIES  MAX_VIRT_CPUS
+#define SHADOW_ROOT_PT_VIRT_END (SHADOW_ROOT_PT_VIRT_START +\
+ (SHADOW_ROOT_PT_ENTRIES * PAGE_SIZE))
+
+/* The address of a particular VCPU's ROOT_PT */
+#define SHADOW_ROOT_PT_VCPU_VIRT_START(v) \
+(SHADOW_ROOT_PT_VIRT_START + ((v)->vcpu_id * PAGE_SIZE))
+
 #define ELFSIZE 64
 
 #define ARCH_CRASH_SAVE_VMCOREINFO
diff --git a/xen/arch/x86/include/asm/domain.h 
b/xen/arch/x86/include/asm/domain.h
index 622d22bef2..4d97c68028 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -273,6 +273,7 @@ struct time_scale {
 struct pv_domain
 {
 l1_pgentry_t **gdt_ldt_l1tab;
+l1_pgentry_t **shadow_root_pt_l1tab;
 
 atomic_t nr_l4_pages;
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b56e0d8065..a72c32d87c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -505,6 +505,13 @@ void share_xen_page_with_guest(struct page_info *page, 
struct domain *d,
spin_unlock(&d->page_alloc_lock);
 }
 
+#define shadow_root_pt_idx(v) \
+((v)->vcpu_id >> PAGETABLE_ORDER)
+
+#define pv_shadow_root_pt_pte(v) \
+((v)->domain->arch.pv.shadow_root_pt_l1tab[shadow_root_pt_idx(v)] + \
+ ((v)->vcpu_id & (L1_PAGETABLE_ENTRIES - 1)))
+
 void make_cr3(struct vcpu *v, mfn_t mfn)
 {
 struct domain *d = v->domain;
@@ -524,6 +531,13 @@ void write_ptbase(struct vcpu *v)
 
 if ( is_pv_vcpu(v) && v->domain->arch.pv.xpti )
 {
+mfn_t guest_root_pt = _mfn(v->arch.cr3 >> PAGE_SHIFT);
+l1_pgentry_t *pte = pv_shadow_root_pt_pte(v);
+
+ASSERT(v == current);
+
+l1e_write(pte, l1e_from_mfn(guest_root_pt, __PAGE_HYPERVISOR_RW));
+
 cpu_info->root_pgt_changed = true;
 cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
 if ( new_cr4 & X86_CR4_PCIDE )
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 2a445bb17b..fef9ae2352 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -288,6 +288,19 @@ static void pv_destroy_gdt_ldt_l1tab(struct vcpu *v)
   1U << GDT_LDT_VCPU_SHIFT);
 }
 
+static int pv_create_shadow_root_pt_l1tab(struct vcpu *v)
+{
+return create_perdomain_mapping(v->domain, 
SHADOW_ROOT_PT_VCPU_VIRT_START(v),
+1, v->domain->arch.pv.shadow_root_pt_l1tab,
+NULL);
+}
+
+static void pv_destroy_shadow_root_pt_l1tab(struct vcpu *v)
+
+{
+destroy_perdomain_mapping(v->domain, SHADOW_ROOT_PT_VCPU_VIRT_START(v), 1);
+}
+
 void pv_vcpu_destroy(struct vcpu *v)
 {
 if ( is_pv_32bit_vcpu(v) )
@@ -297,6 +310,7 @@ void pv_vcpu_destroy(struct vcpu *v)
 }
 
 pv_destroy_gdt_ldt_l1tab(v);
+pv_destroy_shadow_root_pt_l1tab(v);
 XFREE(v->arch.pv.trap_ctxt);
 }
 
@@ -311,6 +325,13 @@ int pv_vcpu_initialise(struct vcpu *v)
 if ( rc )
 return rc;
 
+if ( v->domain->arch.pv.xpti )
+{
+rc = pv_create_shadow_root_pt_l1tab(v);
+if ( rc )
+goto done;
+}
+
 BUILD_BUG_ON(X86_NR_VECTORS * sizeof(*v->arch.pv.trap_ctxt) >
  PAGE_SIZE);
 v->arch.pv.trap_ctxt = xzalloc_array(struct trap_info, X86_NR_VECTORS);
@@ -346,10 +367,12 @@ void pv_domain_destroy(struct domain *d)
 
 destroy_perdomain_mapping(d, GDT_LDT_VIRT_START,
   GDT_LDT_MBYTES << (20 - PAGE_SHIFT));
+destroy_perdomain_mapping(d, 

[PATCH v2 (resend) 03/27] xen/vmap: Introduce vmap_size() and use it

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

vunmap() and vfree() currently duplicate the (small) logic to find the
size of a vmap area. In a follow-up patch, we will want to introduce
another one (this time externally).

So introduce a new helper vmap_size() that will return the number of
pages in the area starting at the given address. Take the opportunity
to replace the open-coded version.

Note that vfree() was storing the type of the area in a local variable.
But this seems to have never been used (even when it was introduced).

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* Patch added
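
For illustration (editorial, not part of the patch), a hypothetical
debugging helper built on the new accessor:

#include <xen/lib.h>
#include <xen/vmap.h>

static void dump_vmap_area(const void *va)
{
    unsigned int pages = vmap_size(va); /* 0 if va is not a vmap address */

    printk("vmap area %p covers %u page(s)\n", va, pages);
}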

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index fc5c70da4d..171271fae3 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -245,14 +245,21 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
 return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
-void vunmap(const void *va)
+unsigned int vmap_size(const void *va)
 {
-unsigned long addr = (unsigned long)va;
 unsigned int pages = vm_size(va, VMAP_DEFAULT);
 
 if ( !pages )
 pages = vm_size(va, VMAP_XEN);
 
+return pages;
+}
+
+void vunmap(const void *va)
+{
+unsigned long addr = (unsigned long)va;
+unsigned pages = vmap_size(va);
+
 #ifndef _PAGE_NONE
 destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
 #else /* Avoid tearing down intermediate page tables. */
@@ -328,17 +335,11 @@ void vfree(void *va)
 unsigned int i, pages;
 struct page_info *pg;
 PAGE_LIST_HEAD(pg_list);
-enum vmap_region type = VMAP_DEFAULT;
 
 if ( !va )
 return;
 
-pages = vm_size(va, type);
-if ( !pages )
-{
-type = VMAP_XEN;
-pages = vm_size(va, type);
-}
+pages = vmap_size(va);
 ASSERT(pages);
 
 for ( i = 0; i < pages; i++ )
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 2b7369e062..24c85de490 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -25,6 +25,9 @@ void vfree(void *va);
 
 void __iomem *ioremap(paddr_t pa, size_t len);
 
+/* Return the number of pages in the mapping starting at address 'va' */
+unsigned int vmap_size(const void *va);
+
 static inline void iounmap(void __iomem *va)
 {
 unsigned long addr = (unsigned long)(void __force *)va;
-- 
2.40.1




[PATCH v2 (resend) 02/27] x86/setup: Move vm_init() before acpi calls

2024-01-16 Thread Elias El Yandouzi
From: Wei Liu 

After the direct map removal, pages from the boot allocator are not
going to be mapped in the direct map. Although we have map_domain_page(),
its mappings are ephemeral and less helpful for mappings larger than a
page, so we want a mechanism to globally map a range of pages, which is
what vmap is for. Therefore, we bring vm_init() into the early boot stage.

To allow vmap to be initialised and used in early boot, we need to
modify vmap to receive pages from the boot allocator during early boot
stage.

Signed-off-by: Wei Liu 
Signed-off-by: David Woodhouse 
Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
- The return of map_pages_to_xen() is now checked in a separate
  patch
- Clarify the commit message
- Group the new boolean with the others
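
The ordering constraint this establishes can be summarised as follows
(editorial sketch; the function names are real, the sequence is
simplified from the Arm flow below, and x86 follows the same rule
interleaved with the E820 walk):

/*
 *   setup_mm();              boot allocator can hand out pages
 *   vm_init();               vmap ranges exist, backed by boot pages
 *   acpi_boot_table_init();  may call acpi_os_alloc_memory() -> vmap
 */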

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 59dd9bb25a..7e28f62d09 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -748,6 +748,8 @@ void asmlinkage __init start_xen(unsigned long 
boot_phys_offset,
 
 setup_mm();
 
+vm_init();
+
 /* Parse the ACPI tables for possible boot-time configuration */
 acpi_boot_table_init();
 
@@ -759,8 +761,6 @@ void asmlinkage __init start_xen(unsigned long 
boot_phys_offset,
  */
 system_state = SYS_STATE_boot;
 
-vm_init();
-
 if ( acpi_disabled )
 {
 printk("Booting using Device Tree\n");
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 897b7e9208..4d0c90b7a0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -989,6 +989,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 int i, j, e820_warn = 0, bytes = 0;
 unsigned long eb_start, eb_end;
 bool acpi_boot_table_init_done = false, relocated = false;
+bool vm_init_done = false;
 int ret;
 struct ns16550_defaults ns16550 = {
 .data_bits = 8,
@@ -1531,12 +1532,23 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 continue;
 
 if ( !acpi_boot_table_init_done &&
- s >= (1ULL << 32) &&
- !acpi_boot_table_init() )
+ s >= (1ULL << 32) )
 {
-acpi_boot_table_init_done = true;
-srat_parse_regions(s);
-setup_max_pdx(raw_max_page);
+/*
+ * We only initialise vmap and acpi after going through the bottom
+ * 4GiB, so that we have enough pages in the boot allocator.
+ */
+if ( !vm_init_done )
+{
+vm_init();
+vm_init_done = true;
+}
+if ( !acpi_boot_table_init() )
+{
+acpi_boot_table_init_done = true;
+srat_parse_regions(s);
+setup_max_pdx(raw_max_page);
+}
 }
 
 if ( pfn_to_pdx((e - 1) >> PAGE_SHIFT) >= max_pdx )
@@ -1722,6 +1734,9 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 
 init_frametable();
 
+if ( !vm_init_done )
+vm_init();
+
 if ( !acpi_boot_table_init_done )
 acpi_boot_table_init();
 
@@ -1761,12 +1776,6 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 end_boot_allocator();
 
 system_state = SYS_STATE_boot;
-/*
- * No calls involving ACPI code should go between the setting of
- * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
- * will break).
- */
-vm_init();
 
 bsp_stack = cpu_alloc_stack(0);
 if ( !bsp_stack )
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 830f64c5ef..fc5c70da4d 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -34,10 +34,19 @@ void __init vm_init_type(enum vmap_region type, void 
*start, void *end)
 
 for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += 
PAGE_SIZE )
 {
-struct page_info *pg = alloc_domheap_page(NULL, 0);
+mfn_t mfn;
 int rc;
 
-rc = map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+if ( system_state == SYS_STATE_early_boot )
+mfn = alloc_boot_pages(1, 1);
+else
+{
+struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+BUG_ON(!pg);
+mfn = page_to_mfn(pg);
+}
+rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
 BUG_ON(rc);
 
 clear_page((void *)va);
@@ -65,7 +74,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
spin_lock(&vm_lock);
 for ( ; ; )
 {
-struct page_info *pg;
+mfn_t mfn;
 
 ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
 for ( start = vm_low[t]; start < vm_top[t]; )
@@ -100,9 +109,16 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
 if ( vm_top[t] >= vm_end[t] )
 return NULL;
 
-pg = alloc_domheap_page(NULL, 0);
- 

[PATCH v2 (resend) 10/27] x86/pv: Map L4 page table for shim domain

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

The root page table is allocated from the domheap and isn't
mapped by default. Map it on demand to build the PV shim domain.

Signed-off-by: Hongyan Xia 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* New patch

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc5e9fe117..fc51c7d362 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -991,8 +991,12 @@ do {\
  * !CONFIG_VIDEO case so the logic here can be simplified.
  */
 if ( pv_shim )
+{
+l4start = map_domain_page(l4start_mfn);
 pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
   vphysmap_start, si);
+UNMAP_DOMAIN_PAGE(l4start);
+}
 
 #ifdef CONFIG_COMPAT
 if ( compat )
-- 
2.40.1




[PATCH v2 (resend) 01/27] xen/vmap: Check the page has been mapped in vm_init_type()

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

The function map_pages_to_xen() could fail if it can't allocate the
underlying page tables or (at least on Arm) if the area was already
mapped.

The first error is caught by clear_page() because it would fault.
However, the second error, while very unlikely, is not caught at all.

As this is boot code, use BUG_ON() to check if map_pages_to_xen() has
succeeded.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
- New patch

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 330e2ba897..830f64c5ef 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -35,8 +35,11 @@ void __init vm_init_type(enum vmap_region type, void *start, 
void *end)
 for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += 
PAGE_SIZE )
 {
 struct page_info *pg = alloc_domheap_page(NULL, 0);
+int rc;
+
+rc = map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+BUG_ON(rc);
 
-map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
 clear_page((void *)va);
 }
 bitmap_fill(vm_bitmap(type), vm_low[type]);
-- 
2.40.1




Re: [PATCH v2] Remove the directmap

2024-01-16 Thread Elias El Yandouzi

Hi,

Newbie mistake, I didn't number the patches, I'll resend the series.

Sorry for the noise.

On 16/01/2024 18:50, Elias El Yandouzi wrote:

Hi all,

A few years ago, Wei Liu implemented a PoC to remove the directmap
from Xen. The last version was sent by Hongyan Xia [1].

I will start with thanking both Wei and Hongyan for the initial work
to upstream the feature. A lot of patches already went in and this is
the last few patches missing to effectively enable the feature.

=== What is the directmap? ===

At the moment, on both arm64 and x86, most of the RAM is mapped
in Xen address space. This means that domain memory is easily
accessible in Xen.

=== Why do we want to remove the directmap? ===

(Summarizing my understanding of the previous discussion)

Speculation attacks (like Spectre SP1) rely on loading pieces of memory
into the cache. If the region is not mapped then it can't be loaded.

So reducing the amount of memory mapped in Xen will also
reduce the attack surface.

=== What's the performance impact? ===

As the guest memory is not always mapped, the cost of mapping
will increase. I haven't done the numbers with this new version, but
some measurements were provided in the previous version for x86.

=== Improvement possible ===

The known areas to improve on x86 are:
* Mapcache: There was a patch sent by Hongyan:
  
https://lore.kernel.org/xen-devel/4058e92ce21627731c49b588a95809dc0affd83a.1581015491.git.hongy...@amazon.com/
* EPT: At the moment a guest page-table walk requires about 20 map/unmaps.
  This will have a very high impact on performance. We need to decide
  whether keeping the EPT always mapped is a problem.

The original series didn't have support for Arm64. But as there was
some interest, I have provided a PoC.

There is more work to do for Arm64:
* The mapcache is quite simple. We would need to investigate the performance.
* The mapcache should be made compliant with the Arm Arm (this is now
  more critical).
* We will likely have the same problem as for the EPT.
* We have no support for merging tables into a superpage, nor for
  freeing empty page-tables. (See more below)

=== Implementation ===

The subject is probably a misnomer. The directmap is still present but
the RAM is not mapped by default. Instead, the region will still be used
to map pages allocated via alloc_xenheap_pages().

The advantage is that the solution is simple (so IMHO good enough to be
merged as a tech preview). The disadvantage is that the page allocator
does not try to keep all the xenheap pages together, so we may end up
with an increase in page-table usage.

In the longer term, we should consider removing the direct map
completely and switching to vmap(). The main problem with this approach
is that mfn_to_virt() is used frequently in the code, so we would need
to cache the mapping (maybe in the struct page_info).

=== Why arm32 is not covered? ===

On Arm32, the domheap and xenheap are always separate. So by design
the guest memory is not mapped by default.

At this stage, it seems unnecessary to have to map/unmap xenheap pages
every time they are allocated.

=== Why not using a separate domheap and xenheap? ===

While a separate xenheap/domheap reduces the page-table usage (all
xenheap pages are contiguous and could always be mapped), it is also
currently less scalable because the split is fixed at boot time (XXX:
Can this be dynamic?).

=== Future of secret-free hypervisor ===

There is some information in an e-mail from Andrew a few years ago:

https://lore.kernel.org/xen-devel/e3219697-0759-39fc-2486-715cdec1c...@citrix.com/

Cheers,

[1] https://lore.kernel.org/xen-devel/cover.1588278317.git.hongy...@amazon.com/

*** BLURB HERE ***

Elias El Yandouzi (3):
   xen/x86: Add build assertion for fixmap entries
   Rename mfn_to_virt() calls
   Rename maddr_to_virt() calls

Hongyan Xia (13):
   acpi: vmap pages in acpi_os_alloc_memory
   xen/numa: vmap the pages for memnodemap
   x86/srat: vmap the pages for acpi_slit
   x86: Map/unmap pages in restore_all_guests
   x86/pv: Rewrite how building PV dom0 handles domheap mappings
   x86/pv: Map L4 page table for shim domain
   x86/mapcache: Initialise the mapcache for the idle domain
   x86: Add a boot option to enable and disable the direct map
   x86/domain_page: Remove the fast paths when mfn is not in the
 directmap
   xen/page_alloc: Add a path for xenheap when there is no direct map
   x86/setup: Leave early boot slightly earlier
   x86/setup: vmap heap nodes when they are outside the direct map
   x86/setup: Do not create valid mappings when directmap=no

Julien Grall (8):
   xen/vmap: Check the page has been mapped in vm_init_type()
   xen/vmap: Introduce vmap_size() and use it
   xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
   xen/x86: Add support for the PMAP
   xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
   xen/arm64: mm: Use per-pCPU page-tables
   

[PATCH v2] x86/domain_page: Remove the fast paths when mfn is not in the directmap

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When mfn is not in direct map, never use mfn_to_virt for any mappings.

We replace mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) with
arch_mfns_in_direct_map(mfn, 1) because these two are equivalent. The
extra comparison in arch_mfns_in_direct_map() looks different but because
DIRECTMAP_VIRT_END is always higher, it does not make any difference.

Lastly, domain_page_map_to_mfn() needs to gain a special case for
the PMAP.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 



Changes since Hongyan's version:
* arch_mfn_in_direct_map() was renamed to arch_mfns_in_directmap()
* add a special case for the PMAP in domain_page_map_to_mfn()

diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 55e337aaf7..89caefc8a2 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -14,8 +14,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 
 static DEFINE_PER_CPU(struct vcpu *, override);
@@ -35,10 +37,11 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 /*
  * When using efi runtime page tables, we have the equivalent of the idle
  * domain's page tables but current may point at another domain's VCPU.
- * Return NULL as though current is not properly set up yet.
+ * Return the idle domain's vcpu on that core because the efi per-domain
+ * region (where the mapcache is) is in-sync with the idle domain.
  */
 if ( efi_rs_using_pgtables() )
-return NULL;
+return idle_vcpu[smp_processor_id()];
 
 /*
  * If guest_table is NULL, and we are running a paravirtualised guest,
@@ -77,18 +80,24 @@ void *map_domain_page(mfn_t mfn)
 struct vcpu_maphash_entry *hashent;
 
 #ifdef NDEBUG
-if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
 return mfn_to_virt(mfn_x(mfn));
 #endif
 
 v = mapcache_current_vcpu();
-if ( !v )
-return mfn_to_virt(mfn_x(mfn));
+if ( !v || !v->domain->arch.mapcache.inuse )
+{
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
+return mfn_to_virt(mfn_x(mfn));
+else
+{
+BUG_ON(system_state >= SYS_STATE_smp_boot);
+return pmap_map(mfn);
+}
+}
 
dcache = &v->domain->arch.mapcache;
vcache = &v->arch.mapcache;
-if ( !dcache->inuse )
-return mfn_to_virt(mfn_x(mfn));
 
 perfc_incr(map_domain_page_count);
 
@@ -184,6 +193,12 @@ void unmap_domain_page(const void *ptr)
 if ( !va || va >= DIRECTMAP_VIRT_START )
 return;
 
+if ( va >= FIXADDR_START && va < FIXADDR_TOP )
+{
+pmap_unmap((void *)ptr);
+return;
+}
+
 ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
 v = mapcache_current_vcpu();
@@ -237,7 +252,7 @@ int mapcache_domain_init(struct domain *d)
 unsigned int bitmap_pages;
 
 #ifdef NDEBUG
-if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+if ( !mem_hotplug && arch_mfns_in_directmap(0, max_page) )
 return 0;
 #endif
 
@@ -308,7 +323,7 @@ void *map_domain_page_global(mfn_t mfn)
 local_irq_is_enabled()));
 
 #ifdef NDEBUG
-if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
 return mfn_to_virt(mfn_x(mfn));
 #endif
 
@@ -335,6 +350,23 @@ mfn_t domain_page_map_to_mfn(const void *ptr)
 if ( va >= DIRECTMAP_VIRT_START )
 return _mfn(virt_to_mfn(ptr));
 
+/*
+ * The fixmap is stealing the top-end of the VMAP. So the check for
+ * the PMAP *must* happen first.
+ *
+ * Also, the fixmap translates a slot to an address backwards. The
+ * logic will rely on it to avoid any complexity. So check at
+ * compile time this will always hold.
+ */
+BUILD_BUG_ON(fix_to_virt(FIX_PMAP_BEGIN) < fix_to_virt(FIX_PMAP_END));
+
+if ( ((unsigned long)fix_to_virt(FIX_PMAP_END) <= va) &&
+ ((va & PAGE_MASK) <= (unsigned long)fix_to_virt(FIX_PMAP_BEGIN)) )
+{
+BUG_ON(system_state >= SYS_STATE_smp_boot);
+return l1e_get_mfn(l1_fixmap[l1_table_offset(va)]);
+}
+
 if ( va >= VMAP_VIRT_START && va < VMAP_VIRT_END )
 return vmap_to_mfn(va);
 
-- 
2.40.1




[PATCH v2] x86/setup: Leave early boot slightly earlier

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When we do not have a direct map, memory for metadata of heap nodes in
init_node_heap() is allocated from xenheap, which needs to be mapped and
unmapped on demand. However, we cannot just take memory from the boot
allocator to create the PTEs while we are passing memory to the heap
allocator.

To solve this race, we leave early boot slightly sooner so that Xen PTE
pages are allocated from the heap instead of the boot allocator. We can
do this because the metadata for the 1st node is statically allocated,
and by the time we need memory to create mappings for the 2nd node, we
already have enough memory in the heap allocator in the 1st node.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b813ea75b5..3b698c8c41 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1746,6 +1746,22 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 
 numa_initmem_init(0, raw_max_page);
 
+/*
+ * When we do not have a direct map, memory for metadata of heap nodes in
+ * init_node_heap() is allocated from xenheap, which needs to be mapped and
+ * unmapped on demand. However, we cannot just take memory from the boot
+ * allocator to create the PTEs while we are passing memory to the heap
+ * allocator during end_boot_allocator().
+ *
+ * To solve this race, we need to leave early boot before
+ * end_boot_allocator() so that Xen PTE pages are allocated from the heap
+ * instead of the boot allocator. We can do this because the metadata for
+ * the 1st node is statically allocated, and by the time we need memory to
+ * create mappings for the 2nd node, we already have enough memory in the
+ * heap allocator in the 1st node.
+ */
+system_state = SYS_STATE_boot;
+
 if ( max_page - 1 > virt_to_mfn(HYPERVISOR_VIRT_END - 1) )
 {
 unsigned long lo = virt_to_mfn(HYPERVISOR_VIRT_END - 1);
@@ -1777,8 +1793,6 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 else
 end_boot_allocator();
 
-system_state = SYS_STATE_boot;
-
 bsp_stack = cpu_alloc_stack(0);
 if ( !bsp_stack )
 panic("No memory for BSP stack\n");
-- 
2.40.1




[PATCH v2] xen/arm64: mm: Use per-pCPU page-tables

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

At the moment, on Arm64, every pCPU is sharing the same page-tables.

In a follow-up patch, we will allow the possibility to remove the
direct map and therefore it will be necessary to have a mapcache.

While we have plenty of spare virtual address space to reserve part
for each pCPU, it means that temporary mappings (e.g. guest memory)
could be accessible by every pCPU.

In order to increase our security posture, it would be better if
those mappings are only accessible by the pCPU doing the temporary
mapping.

In addition to that, a per-pCPU page-tables opens the way to have
per-domain mapping area.

Arm32 is already using per-pCPU page-tables so most of the code
can be re-used. Arm64 doesn't yet have support for the mapcache,
so a stub is provided (moved to its own header asm/domain_page.h).

Take the opportunity to fix a typo in a comment that is modified.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changelog since v1:
* Rebase
* Fix typos
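
A stand-alone toy model of the per-pCPU root table setup (not Xen code; names
and sizes below are invented stand-ins): each secondary CPU gets its own root
table initialised from a copy of CPU0's, so per-CPU regions can later diverge.

    #include <stdlib.h>
    #include <string.h>

    #define NR_CPUS    4
    #define PAGE_SIZE  4096

    /* One root page-table per pCPU, analogous to this_cpu(xen_pgtable). */
    static void *percpu_root[NR_CPUS];

    static int prepare_cpu_root(unsigned int cpu)
    {
        void *root = aligned_alloc(PAGE_SIZE, PAGE_SIZE);

        if ( !root )
            return -1;

        /* Secondary CPUs start from a copy of CPU0's root and then diverge for
         * per-CPU areas such as the domheap/mapcache region. */
        memcpy(root, percpu_root[0], PAGE_SIZE);
        percpu_root[cpu] = root;

        return 0;
    }

    int main(void)
    {
        percpu_root[0] = aligned_alloc(PAGE_SIZE, PAGE_SIZE);  /* boot CPU's root */
        if ( !percpu_root[0] )
            return 1;
        memset(percpu_root[0], 0, PAGE_SIZE);

        for ( unsigned int cpu = 1; cpu < NR_CPUS; cpu++ )
            if ( prepare_cpu_root(cpu) )
                return 1;

        return 0;
    }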

diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index d2651c9486..4f339efb7b 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -75,6 +75,7 @@ static void __init prepare_runtime_identity_mapping(void)
 paddr_t id_addr = virt_to_maddr(_start);
 lpae_t pte;
 DECLARE_OFFSETS(id_offsets, id_addr);
+lpae_t *root = this_cpu(xen_pgtable);
 
 if ( id_offsets[0] >= IDENTITY_MAPPING_AREA_NR_L0 )
 panic("Cannot handle ID mapping above %uTB\n",
@@ -85,7 +86,7 @@ static void __init prepare_runtime_identity_mapping(void)
 pte.pt.table = 1;
 pte.pt.xn = 0;
 
-write_pte(&xen_pgtable[id_offsets[0]], pte);
+write_pte(&root[id_offsets[0]], pte);
 
 /* Link second ID table */
 pte = pte_of_xenaddr((vaddr_t)xen_second_id);
diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
index 3a43601623..ac2a6d0332 100644
--- a/xen/arch/arm/domain_page.c
+++ b/xen/arch/arm/domain_page.c
@@ -3,6 +3,8 @@
 #include 
 #include 
 
+#include 
+
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
diff --git a/xen/arch/arm/include/asm/arm32/mm.h 
b/xen/arch/arm/include/asm/arm32/mm.h
index 856f2dbec4..87a315db01 100644
--- a/xen/arch/arm/include/asm/arm32/mm.h
+++ b/xen/arch/arm/include/asm/arm32/mm.h
@@ -1,12 +1,6 @@
 #ifndef __ARM_ARM32_MM_H__
 #define __ARM_ARM32_MM_H__
 
-#include 
-
-#include 
-
-DECLARE_PER_CPU(lpae_t *, xen_pgtable);
-
 /*
  * Only a limited amount of RAM, called xenheap, is always mapped on ARM32.
  * For convenience always return false.
@@ -16,8 +10,6 @@ static inline bool arch_mfns_in_directmap(unsigned long mfn, 
unsigned long nr)
 return false;
 }
 
-bool init_domheap_mappings(unsigned int cpu);
-
 static inline void arch_setup_page_tables(void)
 {
 }
diff --git a/xen/arch/arm/include/asm/domain_page.h 
b/xen/arch/arm/include/asm/domain_page.h
new file mode 100644
index 00..e9f52685e2
--- /dev/null
+++ b/xen/arch/arm/include/asm/domain_page.h
@@ -0,0 +1,13 @@
+#ifndef __ASM_ARM_DOMAIN_PAGE_H__
+#define __ASM_ARM_DOMAIN_PAGE_H__
+
+#ifdef CONFIG_ARCH_MAP_DOMAIN_PAGE
+bool init_domheap_mappings(unsigned int cpu);
+#else
+static inline bool init_domheap_mappings(unsigned int cpu)
+{
+return true;
+}
+#endif
+
+#endif /* __ASM_ARM_DOMAIN_PAGE_H__ */
diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 9a94d7eaf7..a76578a16f 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -2,6 +2,9 @@
 #define __ARCH_ARM_MM__
 
 #include 
+#include 
+
+#include 
 #include 
 #include 
 #include 
diff --git a/xen/arch/arm/include/asm/mmu/mm.h 
b/xen/arch/arm/include/asm/mmu/mm.h
index c5e03a66bf..c03c3a51e4 100644
--- a/xen/arch/arm/include/asm/mmu/mm.h
+++ b/xen/arch/arm/include/asm/mmu/mm.h
@@ -2,6 +2,8 @@
 #ifndef __ARM_MMU_MM_H__
 #define __ARM_MMU_MM_H__
 
+DECLARE_PER_CPU(lpae_t *, xen_pgtable);
+
 /* Non-boot CPUs use this to find the correct pagetables. */
 extern uint64_t init_ttbr;
 
diff --git a/xen/arch/arm/mmu/pt.c b/xen/arch/arm/mmu/pt.c
index a7755728ae..e772ab4e66 100644
--- a/xen/arch/arm/mmu/pt.c
+++ b/xen/arch/arm/mmu/pt.c
@@ -606,9 +606,9 @@ static int xen_pt_update(unsigned long virt,
 unsigned long left = nr_mfns;
 
 /*
- * For arm32, page-tables are different on each CPUs. Yet, they share
- * some common mappings. It is assumed that only common mappings
- * will be modified with this function.
+ * Page-tables are different on each CPU. Yet, they share some common
+ * mappings. It is assumed that only common mappings will be modified
+ * with this function.
  *
  * XXX: Add a check.
  */
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index 57f1b46499..8c81e26da3 100644
--- a/xen/arch/arm/mmu/setup.c
+++ b/xen/arch/arm/mmu/setup.c
@@ -26,17 +26,15 @@
  * PCPUs.
  */
 
-#ifdef 

[PATCH v2] x86/setup: Do not create valid mappings when directmap=no

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Create empty mappings in the second e820 pass. Also, destroy existing
direct map mappings created in the first pass.

To make xenheap pages visible in guests, it is necessary to create empty
L3 tables in the direct map even when directmap=no, since guest cr3s
copy idle domain's L4 entries, which means they will share mappings in
the direct map if we pre-populate idle domain's L4 entries and L3
tables. A helper is introduced for this.

Also, after the direct map is actually gone, we need to stop updating
the direct map in update_xen_mappings().

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 
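
As an illustration of the idea behind populate_directmap() (a stand-alone toy
model, not Xen code; the table layout and allocator below are invented
stand-ins), pre-creating one zeroed L3 table per missing L4 slot looks
roughly like this:

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    #define L4_SHIFT    39                      /* each L4 slot covers 512GiB */
    #define L4_ENTRIES  512
    #define PAGE_SIZE   4096

    static void *fake_l4[L4_ENTRIES];           /* stand-in for the idle domain's L4 */

    static unsigned int l4_slot(uint64_t va)
    {
        return (va >> L4_SHIFT) & (L4_ENTRIES - 1);
    }

    /* Pre-create one zeroed "L3 table" per missing L4 slot in [vstart, vend), so
     * every guest sharing these L4 entries later sees on-demand xenheap mappings. */
    static int populate_empty_l3(uint64_t vstart, uint64_t vend)
    {
        uint64_t va = vstart & ~((1ULL << L4_SHIFT) - 1);

        for ( ; va < vend; va += 1ULL << L4_SHIFT )
        {
            unsigned int slot = l4_slot(va);

            if ( !fake_l4[slot] )
            {
                void *l3 = aligned_alloc(PAGE_SIZE, PAGE_SIZE); /* models alloc_boot_pages() */

                if ( !l3 )
                    return -1;
                memset(l3, 0, PAGE_SIZE);        /* all entries empty / not present */
                fake_l4[slot] = l3;
            }
        }

        return 0;
    }

    int main(void)
    {
        return populate_empty_l3(0, 3ULL << L4_SHIFT) ? 1 : 0;
    }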

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3b698c8c41..84c496ac4a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -976,6 +976,57 @@ static struct domain *__init create_dom0(const module_t 
*image,
 /* How much of the directmap is prebuilt at compile time. */
 #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
 
+/*
+ * This either populates a valid direct map, or allocates empty L3 tables and
+ * creates the L4 entries for virtual addresses between [start, end) in the
+ * direct map, depending on has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L3 tables in the
+ * direct map region. The reason is that on-demand xenheap mappings are
+ * created in the idle domain's page table but must be seen by
+ * everyone. Since all domains share the direct map L4 entries, they
+ * will share xenheap mappings if we pre-populate the L4 entries and L3
+ * tables in the direct map region for all RAM. We also rely on the fact
+ * that L3 tables are never freed.
+ */
+static void __init populate_directmap(uint64_t pstart, uint64_t pend,
+  unsigned int flags)
+{
+unsigned long vstart = (unsigned long)__va(pstart);
+unsigned long vend = (unsigned long)__va(pend);
+
+if ( pstart >= pend )
+return;
+
+BUG_ON(vstart < DIRECTMAP_VIRT_START);
+BUG_ON(vend > DIRECTMAP_VIRT_END);
+
+if ( has_directmap() )
+/* Populate valid direct map. */
+BUG_ON(map_pages_to_xen(vstart, maddr_to_mfn(pstart),
+PFN_DOWN(pend - pstart), flags));
+else
+{
+/* Create empty L3 tables. */
+unsigned long vaddr = vstart & ~((1UL << L4_PAGETABLE_SHIFT) - 1);
+
+for ( ; vaddr < vend; vaddr += (1UL << L4_PAGETABLE_SHIFT) )
+{
+l4_pgentry_t *pl4e = &idle_pg_table[l4_table_offset(vaddr)];
+
+if ( !(l4e_get_flags(*pl4e) & _PAGE_PRESENT) )
+{
+mfn_t mfn = alloc_boot_pages(1, 1);
+void *v = map_domain_page(mfn);
+
+clear_page(v);
+UNMAP_DOMAIN_PAGE(v);
+l4e_write(pl4e, l4e_from_mfn(mfn, __PAGE_HYPERVISOR));
+}
+}
+}
+}
+
 void asmlinkage __init noreturn __start_xen(unsigned long mbi_p)
 {
 const char *memmap_type = NULL, *loader, *cmdline = "";
@@ -1596,8 +1647,17 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 map_e = min_t(uint64_t, e,
   ARRAY_SIZE(l2_directmap) << L2_PAGETABLE_SHIFT);
 
-/* Pass mapped memory to allocator /before/ creating new mappings. */
+/*
+ * Pass mapped memory to allocator /before/ creating new mappings.
+ * The direct map for the bottom 4GiB has been populated in the first
+ * e820 pass. In the second pass, we make sure those existing mappings
+ * are destroyed when directmap=no.
+ */
 init_boot_pages(s, min(map_s, e));
+if ( !has_directmap() )
+destroy_xen_mappings((unsigned long)__va(s),
+ (unsigned long)__va(min(map_s, e)));
+
 s = map_s;
 if ( s < map_e )
 {
@@ -1605,6 +1665,9 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 map_s = (s + mask) & ~mask;
 map_e &= ~mask;
 init_boot_pages(map_s, map_e);
+if ( !has_directmap() )
+destroy_xen_mappings((unsigned long)__va(map_s),
+ (unsigned long)__va(map_e));
 }
 
 if ( map_s > map_e )
@@ -1618,8 +1681,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 
 if ( map_e < end )
 {
-map_pages_to_xen((unsigned long)__va(map_e), 
maddr_to_mfn(map_e),
- PFN_DOWN(end - map_e), PAGE_HYPERVISOR);
+populate_directmap(map_e, end, PAGE_HYPERVISOR);
 init_boot_pages(map_e, end);
 map_e = end;
 }
@@ -1628,13 +1690,11 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 {
 /* This range must not be passed to the boot allocator and
  * must also not be mapped with _PAGE_GLOBAL. */
-  

[PATCH v2] xen/arm64: Implement a mapcache for arm64

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

At the moment, on arm64, map_domain_page() is implemented using
virt_to_mfn(). Therefore it is relying on the directmap.

In a follow-up patch, we will allow the admin to remove the directmap.
Therefore we want to implement a mapcache.

Thankfully there is already one for arm32. So select ARCH_MAP_DOMAIN_PAGE
and add the necessary boilerplate to support 64-bit:
- The page-table start at level 0, so we need to allocate the level
  1 page-table
- map_domain_page() should check if the page is in the directmap. If
  yes, then use virt_to_mfn() to limit the performance impact
  when the directmap is still enabled (this will be selectable
  on the command line).

Take the opportunity to replace first_table_offset(...) with offsets[...].

Note that, so far, arch_mfns_in_directmap() always return true on
arm64. So the mapcache is not yet used. This will change in a
follow-up patch.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



There are a few TODOs:
- It is becoming more critical to fix the mapcache
  implementation (this is not compliant with the Arm Arm)
- Evaluate the performance
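
For readers unfamiliar with the arm32 mapcache being reused here, a
stand-alone toy sketch of an open-addressed slot cache (not Xen code; slot
count and names are invented stand-ins) conveys the general shape:

    #include <stdbool.h>
    #include <stdio.h>

    #define NR_SLOTS  8                         /* toy number of mapcache slots */

    typedef unsigned long mfn_t;

    static mfn_t slot_mfn[NR_SLOTS];
    static bool  slot_used[NR_SLOTS];

    /* Open-addressed lookup: start at a hash of the MFN and probe linearly.
     * This is the general shape of the arm32 mapcache the patch reuses; the real
     * code also keeps reference counts and writes actual page-table entries. */
    static int mapcache_slot(mfn_t mfn)
    {
        unsigned int i, slot = mfn % NR_SLOTS;

        for ( i = 0; i < NR_SLOTS; i++, slot = (slot + 1) % NR_SLOTS )
        {
            if ( slot_used[slot] && slot_mfn[slot] == mfn )
                return slot;                    /* already mapped: reuse the slot */
            if ( !slot_used[slot] )
            {
                slot_used[slot] = true;         /* claim a free slot for this MFN */
                slot_mfn[slot] = mfn;
                return slot;
            }
        }

        return -1;                              /* cache full */
    }

    int main(void)
    {
        printf("mfn 42 -> slot %d\n", mapcache_slot(42));
        printf("mfn 42 -> slot %d (reused)\n", mapcache_slot(42));
        return 0;
    }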

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 50e9bfae1a..278243f0d6 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -1,7 +1,6 @@
 config ARM_32
def_bool y
depends on "$(ARCH)" = "arm32"
-   select ARCH_MAP_DOMAIN_PAGE
 
 config ARM_64
def_bool y
@@ -12,6 +11,7 @@ config ARM_64
 config ARM
def_bool y
select HAS_ALTERNATIVE
+   select ARCH_MAP_DOMAIN_PAGE
select HAS_DEVICE_TREE
select HAS_PASSTHROUGH
select HAS_UBSAN
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index 4f339efb7b..f4a81aa705 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -4,6 +4,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
@@ -236,6 +237,14 @@ void __init setup_mm(void)
 setup_frametable_mappings(ram_start, ram_end);
 max_page = PFN_DOWN(ram_end);
 
+/*
+ * The allocators may need to use map_domain_page() (such as for
+ * scrubbing pages). So we need to prepare the domheap area first.
+ */
+if ( !init_domheap_mappings(smp_processor_id()) )
+panic("CPU%u: Unable to prepare the domheap page-tables\n",
+  smp_processor_id());
+
 init_staticmem_pages();
 }
 
diff --git a/xen/arch/arm/domain_page.c b/xen/arch/arm/domain_page.c
index ac2a6d0332..0f6ba48892 100644
--- a/xen/arch/arm/domain_page.c
+++ b/xen/arch/arm/domain_page.c
@@ -1,4 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
+#include 
 #include 
 #include 
 #include 
@@ -8,6 +9,8 @@
 /* Override macros from asm/page.h to make them work with mfn_t */
 #undef virt_to_mfn
 #define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+#undef mfn_to_virt
+#define mfn_to_virt(mfn) __mfn_to_virt(mfn_x(mfn))
 
 /* cpu0's domheap page tables */
 static DEFINE_PAGE_TABLES(cpu0_dommap, DOMHEAP_SECOND_PAGES);
@@ -31,13 +34,30 @@ bool init_domheap_mappings(unsigned int cpu)
 {
 unsigned int order = get_order_from_pages(DOMHEAP_SECOND_PAGES);
 lpae_t *root = per_cpu(xen_pgtable, cpu);
+lpae_t *first;
 unsigned int i, first_idx;
 lpae_t *domheap;
 mfn_t mfn;
 
+/* Convenience aliases */
+DECLARE_OFFSETS(offsets, DOMHEAP_VIRT_START);
+
 ASSERT(root);
 ASSERT(!per_cpu(xen_dommap, cpu));
 
+/*
+ * On Arm64, the root is at level 0. Therefore we need an extra step
+ * to allocate the first level page-table.
+ */
+#ifdef CONFIG_ARM_64
+if ( create_xen_table(&root[offsets[0]]) )
+return false;
+
+first = xen_map_table(lpae_get_mfn(root[offsets[0]]));
+#else
+first = root;
+#endif
+
 /*
  * The domheap for cpu0 is initialized before the heap is initialized.
  * So we need to use pre-allocated pages.
@@ -58,16 +78,20 @@ bool init_domheap_mappings(unsigned int cpu)
  * domheap mapping pages.
  */
 mfn = virt_to_mfn(domheap);
-first_idx = first_table_offset(DOMHEAP_VIRT_START);
+first_idx = offsets[1];
 for ( i = 0; i < DOMHEAP_SECOND_PAGES; i++ )
 {
 lpae_t pte = mfn_to_xen_entry(mfn_add(mfn, i), MT_NORMAL);
 pte.pt.table = 1;
-write_pte(&root[first_idx + i], pte);
+write_pte(&first[first_idx + i], pte);
 }
 
 per_cpu(xen_dommap, cpu) = domheap;
 
+#ifdef CONFIG_ARM_64
+xen_unmap_table(first);
+#endif
+
 return true;
 }
 
@@ -91,6 +115,10 @@ void *map_domain_page(mfn_t mfn)
 lpae_t pte;
 int i, slot;
 
+/* Bypass the mapcache if the page is in the directmap */
+if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
+return mfn_to_virt(mfn);
+
 local_irq_save(flags);
 
 /* The map is laid out as an open-addressed hash table where each
@@ -153,13 +181,25 @@ void *map_domain_page(mfn_t mfn)
 /* Release a mapping taken with map_domain_page() */
 void 

[PATCH v2] x86/mapcache: Initialise the mapcache for the idle domain

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

In order to use the mapcache in the idle domain, we also have to
populate its page tables in the PERDOMAIN region, and we need to move
mapcache_domain_init() earlier in arch_domain_create().

Note, commit 'x86: lift mapcache variable to the arch level' has
initialised the mapcache for HVM domains. With this patch, PV, HVM,
idle domains now all initialise the mapcache.

Signed-off-by: Wei Wang 
Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
  * Free resources if mapcache initialisation fails
  * Remove `is_idle_domain()` check from `create_perdomain_mappings()`

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8ef3f7746f..d4c125bc14 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -750,9 +750,16 @@ int arch_domain_create(struct domain *d,
 
 spin_lock_init(>arch.e820_lock);
 
+if ( (rc = mapcache_domain_init(d)) != 0 )
+{
+free_perdomain_mappings(d);
+return rc;
+}
+
 /* Minimal initialisation for the idle domain. */
 if ( unlikely(is_idle_domain(d)) )
 {
+struct page_info *pg = d->arch.perdomain_l3_pg;
 static const struct arch_csw idle_csw = {
 .from = paravirt_ctxt_switch_from,
 .to   = paravirt_ctxt_switch_to,
@@ -763,6 +770,9 @@ int arch_domain_create(struct domain *d,
 
 d->arch.cpu_policy = ZERO_BLOCK_PTR; /* Catch stray misuses. */
 
+idle_pg_table[l4_table_offset(PERDOMAIN_VIRT_START)] =
+l4e_from_page(pg, __PAGE_HYPERVISOR_RW);
+
 return 0;
 }
 
@@ -843,8 +853,6 @@ int arch_domain_create(struct domain *d,
 
 psr_domain_init(d);
 
-mapcache_domain_init(d);
-
 if ( is_hvm_domain(d) )
 {
 if ( (rc = hvm_domain_initialise(d, config)) != 0 )
-- 
2.40.1




[PATCH v2] xen/arm64: Allow the admin to enable/disable the directmap

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

Implement the same command line option as x86 to enable/disable the
directmap. By default this is kept enabled.

Also modify setup_directmap_mappings() to populate the L0 entries
related to the directmap area.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* Rely on the Kconfig option to enable Secret Hiding on Arm64
* Use generic helper instead of arch_has_directmap()

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 63c946f482..df90b1c4c9 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -799,7 +799,7 @@ that enabling this option cannot guarantee anything beyond 
what underlying
 hardware guarantees (with, where available and known to Xen, respective
 tweaks applied).
 
-### directmap (x86)
+### directmap (arm64, x86)
 > `= `
 
 > Default: `true`
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 278243f0d6..7a19826233 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM_64
depends on !ARM_32
select 64BIT
select HAS_FAST_MULTIPLY
+   select HAS_SECRET_HIDING
 
 config ARM
def_bool y
diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index f4a81aa705..22e1e5b9f4 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -157,16 +157,27 @@ void __init switch_ttbr(uint64_t ttbr)
 update_identity_mapping(false);
 }
 
-/* Map the region in the directmap area. */
+/*
+ * This either populates a valid direct map, or allocates empty L1 tables
+ * and creates the L0 entries for the given region in the direct map
+ * depending on has_directmap().
+ *
+ * When directmap=no, we still need to populate empty L1 tables in the
+ * directmap region. The reason is that the root page-table (i.e. L0)
+ * is per-CPU and secondary CPUs will initialize their root page-table
+ * based on the pCPU0 one. So L0 entries will be shared if they are
+ * pre-populated. We also rely on the fact that L1 tables are never
+ * freed.
+ */
 static void __init setup_directmap_mappings(unsigned long base_mfn,
 unsigned long nr_mfns)
 {
+unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
 int rc;
 
 /* First call sets the directmap physical and virtual offset. */
 if ( mfn_eq(directmap_mfn_start, INVALID_MFN) )
 {
-unsigned long mfn_gb = base_mfn & ~((FIRST_SIZE >> PAGE_SHIFT) - 1);
 
 directmap_mfn_start = _mfn(base_mfn);
 directmap_base_pdx = mfn_to_pdx(_mfn(base_mfn));
@@ -187,6 +198,24 @@ static void __init setup_directmap_mappings(unsigned long 
base_mfn,
 panic("cannot add directmap mapping at %lx below heap start %lx\n",
   base_mfn, mfn_x(directmap_mfn_start));
 
+if ( !has_directmap() )
+{
+vaddr_t vaddr = (vaddr_t)__mfn_to_virt(base_mfn);
+lpae_t *root = this_cpu(xen_pgtable);
+unsigned int i, slot;
+
+slot = first_table_offset(vaddr);
+nr_mfns += base_mfn - mfn_gb;
+for ( i = 0; i < nr_mfns; i += BIT(XEN_PT_LEVEL_ORDER(0), UL), slot++ )
+{
+lpae_t *entry = &root[slot];
+
+if ( !lpae_is_valid(*entry) && !create_xen_table(entry) )
+panic("Unable to populate zeroeth slot %u\n", slot);
+}
+return;
+}
+
 rc = map_pages_to_xen((vaddr_t)__mfn_to_virt(base_mfn),
   _mfn(base_mfn), nr_mfns,
   PAGE_HYPERVISOR_RW | _PAGE_BLOCK);
diff --git a/xen/arch/arm/include/asm/arm64/mm.h 
b/xen/arch/arm/include/asm/arm64/mm.h
index e0bd23a6ed..5888f29159 100644
--- a/xen/arch/arm/include/asm/arm64/mm.h
+++ b/xen/arch/arm/include/asm/arm64/mm.h
@@ -3,13 +3,10 @@
 
 extern DEFINE_PAGE_TABLE(xen_pgtable);
 
-/*
- * On ARM64, all the RAM is currently direct mapped in Xen.
- * Hence return always true.
- */
+/* On Arm64, the user can choose whether all the RAM is in the directmap. */
 static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 {
-return true;
+return has_directmap();
 }
 
 void arch_setup_page_tables(void);
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index b15a18a494..7fb75c5c3e 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3dec365c57..2bd060d321 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -748,6 +748,7 @@ void asmlinkage __init start_xen(unsigned long 
boot_phys_offset,
 cmdline_parse(cmdline);
 
 setup_mm();
+printk("Booting with directmap %s\n", has_directmap() ? "on" : "off");
 
 vm_init();
 
-- 
2.40.1




[PATCH v2] xen/page_alloc: Add a path for xenheap when there is no direct map

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When there is not an always-mapped direct map, xenheap allocations need
to be mapped and unmapped on-demand.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



I have left the call to map_pages_to_xen() and destroy_xen_mappings()
in the split heap for now. I am not entirely convinced this is necessary
because in that setup only the xenheap would be always mapped and
this doesn't contain any guest memory (aside from the grant-table).
So map/unmapping for every allocation seems unnecessary.

Changes in v2:
* Fix remaining wrong indentation in alloc_xenheap_pages()

Changes since Hongyan's version:
* Rebase
* Fix indentation in alloc_xenheap_pages()
* Fix build for arm32
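
The shape of the change can be summarised with a stand-alone toy sketch (not
Xen code; the allocator and mapping helpers below are invented stand-ins):
map on allocation and unmap on free only when there is no directmap.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    #define PAGE_SIZE 4096

    static bool directmap_enabled;              /* models has_directmap() */

    static int  map_range(void *va, size_t pages)   { (void)va; (void)pages; return 0; }
    static void unmap_range(void *va, size_t pages) { (void)va; (void)pages; }

    /* Allocate the pages and, only when there is no directmap, create (and later
     * destroy) the virtual mapping explicitly -- the shape of this patch. */
    static void *xenheap_alloc(size_t pages)
    {
        void *pg = calloc(pages, PAGE_SIZE);    /* stands in for alloc_heap_pages() */

        if ( !pg )
            return NULL;

        if ( !directmap_enabled && map_range(pg, pages) )
        {
            free(pg);                           /* mapping failed: undo allocation */
            return NULL;
        }

        return pg;
    }

    static void xenheap_free(void *va, size_t pages)
    {
        if ( !va )
            return;
        if ( !directmap_enabled )
            unmap_range(va, pages);             /* tear down the on-demand mapping */
        free(va);
    }

    int main(void)
    {
        void *p = xenheap_alloc(2);

        xenheap_free(p, 2);
        return 0;
    }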

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index a3746cfbcf..52934ec5c1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2237,6 +2237,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe)
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
 {
 struct page_info *pg;
+void *ret;
 
 ASSERT_ALLOC_CONTEXT();
 
@@ -2245,17 +2246,36 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 if ( unlikely(pg == NULL) )
 return NULL;
 
+ret = page_to_virt(pg);
+
+if ( !has_directmap() &&
+ map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
+  PAGE_HYPERVISOR) )
+{
+/* Failed to map xenheap pages. */
+free_heap_pages(pg, order, false);
+return NULL;
+}
+
 return page_to_virt(pg);
 }
 
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
+unsigned long va = (unsigned long)v & PAGE_MASK;
+
 ASSERT_ALLOC_CONTEXT();
 
 if ( v == NULL )
 return;
 
+if ( !has_directmap() &&
+ destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
+dprintk(XENLOG_WARNING,
+"Error while destroying xenheap mappings at %p, order %u\n",
+v, order);
+
 free_heap_pages(virt_to_page(v), order, false);
 }
 
@@ -2279,6 +2299,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 {
 struct page_info *pg;
 unsigned int i;
+void *ret;
 
 ASSERT_ALLOC_CONTEXT();
 
@@ -2291,16 +2312,28 @@ void *alloc_xenheap_pages(unsigned int order, unsigned 
int memflags)
 if ( unlikely(pg == NULL) )
 return NULL;
 
+ret = page_to_virt(pg);
+
+if ( !has_directmap() &&
+ map_pages_to_xen((unsigned long)ret, page_to_mfn(pg), 1UL << order,
+  PAGE_HYPERVISOR) )
+{
+/* Failed to map xenheap pages. */
+free_domheap_pages(pg, order);
+return NULL;
+}
+
 for ( i = 0; i < (1u << order); i++ )
 pg[i].count_info |= PGC_xen_heap;
 
-return page_to_virt(pg);
+return ret;
 }
 
 void free_xenheap_pages(void *v, unsigned int order)
 {
 struct page_info *pg;
 unsigned int i;
+unsigned long va = (unsigned long)v & PAGE_MASK;
 
 ASSERT_ALLOC_CONTEXT();
 
@@ -2312,6 +2345,12 @@ void free_xenheap_pages(void *v, unsigned int order)
 for ( i = 0; i < (1u << order); i++ )
 pg[i].count_info &= ~PGC_xen_heap;
 
+if ( !has_directmap() &&
+ destroy_xen_mappings(va, va + (1UL << (order + PAGE_SHIFT))) )
+dprintk(XENLOG_WARNING,
+"Error while destroying xenheap mappings at %p, order %u\n",
+v, order);
+
 free_heap_pages(pg, order, true);
 }
 
-- 
2.40.1




[PATCH v2] x86/setup: vmap heap nodes when they are outside the direct map

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

When we do not have a direct map, arch_mfns_in_directmap() will always
return false, thus init_node_heap() will allocate xenheap pages from an
existing node for the metadata of a new node. This means that the
metadata of a new node is in a different node, slowing down heap
allocation.

Since we now have early vmap, vmap the metadata locally in the new node.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* vmap_contig_pages() was renamed to vmap_contig()
* Fix indentation and coding style

Changes from Hongyan's version:
* arch_mfn_in_direct_map() was renamed to
  arch_mfns_in_directmap()
* Use vmap_contig_pages() rather than __vmap(...).
* Add missing include (xen/vmap.h) so it compiles on Arm
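
A stand-alone toy sketch of the decision this patch adds to init_node_heap()
(not Xen code; the helpers below are invented stand-ins): keep using a
directmap pointer when the range is covered, otherwise vmap the node metadata
so it stays local to the new node.

    #include <stdbool.h>
    #include <stdlib.h>

    typedef unsigned long mfn_t;

    #define PAGE_SIZE       4096UL
    #define DIRECTMAP_PAGES 16UL

    static unsigned char fake_directmap[DIRECTMAP_PAGES * PAGE_SIZE];

    static bool in_directmap(mfn_t mfn, unsigned long nr) { return mfn + nr <= DIRECTMAP_PAGES; }
    static void *directmap_of(mfn_t mfn)                  { return fake_directmap + mfn * PAGE_SIZE; }
    static void *vmap_pages(mfn_t mfn, unsigned long nr)  { (void)mfn; return calloc(nr, PAGE_SIZE); }

    /* Choose where the heap-node metadata lives: directmap if possible,
     * otherwise a fresh on-demand mapping of the node's own pages. */
    static void *map_node_metadata(mfn_t mfn, unsigned long pages)
    {
        if ( in_directmap(mfn, pages) )
            return directmap_of(mfn);

        return vmap_pages(mfn, pages);
    }

    int main(void)
    {
        return map_node_metadata(0x100000, 2) ? 0 : 1;
    }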

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 52934ec5c1..42b9aaae1c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -136,6 +136,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -604,22 +605,44 @@ static unsigned long init_node_heap(int node, unsigned 
long mfn,
 needed = 0;
 }
 else if ( *use_tail && nr >= needed &&
-  arch_mfns_in_directmap(mfn + nr - needed, needed) &&
   (!xenheap_bits ||
-   !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
+  !((mfn + nr - 1) >> (xenheap_bits - PAGE_SHIFT))) )
 {
-_heap[node] = mfn_to_virt(mfn + nr - needed);
-avail[node] = mfn_to_virt(mfn + nr - 1) +
-  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+if ( arch_mfns_in_directmap(mfn + nr - needed, needed) )
+{
+_heap[node] = mfn_to_virt(mfn + nr - needed);
+avail[node] = mfn_to_virt(mfn + nr - 1) +
+  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+}
+else
+{
+mfn_t needed_start = _mfn(mfn + nr - needed);
+
+_heap[node] = vmap_contig(needed_start, needed);
+BUG_ON(!_heap[node]);
+avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
+  sizeof(**avail) * NR_ZONES;
+}
 }
 else if ( nr >= needed &&
-  arch_mfns_in_directmap(mfn, needed) &&
   (!xenheap_bits ||
-   !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )
+  !((mfn + needed - 1) >> (xenheap_bits - PAGE_SHIFT))) )
 {
-_heap[node] = mfn_to_virt(mfn);
-avail[node] = mfn_to_virt(mfn + needed - 1) +
-  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+if ( arch_mfns_in_directmap(mfn, needed) )
+{
+_heap[node] = mfn_to_virt(mfn);
+avail[node] = mfn_to_virt(mfn + needed - 1) +
+  PAGE_SIZE - sizeof(**avail) * NR_ZONES;
+}
+else
+{
+mfn_t needed_start = _mfn(mfn);
+
+_heap[node] = vmap_contig(needed_start, needed);
+BUG_ON(!_heap[node]);
+avail[node] = (void *)(_heap[node]) + (needed << PAGE_SHIFT) -
+  sizeof(**avail) * NR_ZONES;
+}
 *use_tail = false;
 }
 else if ( get_order_from_bytes(sizeof(**_heap)) ==
-- 
2.40.1




[PATCH v2] Rename maddr_to_virt() calls

2024-01-16 Thread Elias El Yandouzi
Until directmap gets completely removed, we'd still need to
keep some calls to maddr_to_virt() for xenheap pages or when the
directmap is enabled.

Rename the macro to maddr_to_directmap_virt() to flag them and
prevent further use of maddr_to_virt().

Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/x86/dmi_scan.c b/xen/arch/x86/dmi_scan.c
index 81f80c053a..ac016f3a04 100644
--- a/xen/arch/x86/dmi_scan.c
+++ b/xen/arch/x86/dmi_scan.c
@@ -277,7 +277,7 @@ const char *__init dmi_get_table(paddr_t *base, u32 *len)
return "SMBIOS";
}
} else {
-   char __iomem *p = maddr_to_virt(0xF0000), *q;
+   char __iomem *p = maddr_to_directmap_virt(0xF0000), *q;
union {
struct dmi_eps dmi;
struct smbios3_eps smbios3;
@@ -364,7 +364,7 @@ static int __init dmi_iterate(void (*decode)(const struct 
dmi_header *))
dmi.size = 0;
smbios3.length = 0;
 
-   p = maddr_to_virt(0xF0000);
+   p = maddr_to_directmap_virt(0xF0000);
	for (q = p; q < p + 0x10000; q += 16) {
if (!dmi.size) {
memcpy_fromio(, q, sizeof(dmi));
diff --git a/xen/arch/x86/include/asm/mach-default/bios_ebda.h 
b/xen/arch/x86/include/asm/mach-default/bios_ebda.h
index 42de6b2a5b..8cfe53d1f2 100644
--- a/xen/arch/x86/include/asm/mach-default/bios_ebda.h
+++ b/xen/arch/x86/include/asm/mach-default/bios_ebda.h
@@ -7,7 +7,7 @@
  */
 static inline unsigned int get_bios_ebda(void)
 {
-   unsigned int address = *(unsigned short *)maddr_to_virt(0x40E);
+   unsigned int address = *(unsigned short 
*)maddr_to_directmap_virt(0x40E);
address <<= 4;
return address; /* 0 means none */
 }
diff --git a/xen/arch/x86/include/asm/page.h b/xen/arch/x86/include/asm/page.h
index c6891b52d4..bf7bf08ba4 100644
--- a/xen/arch/x86/include/asm/page.h
+++ b/xen/arch/x86/include/asm/page.h
@@ -240,11 +240,11 @@ void copy_page_sse2(void *to, const void *from);
 
 /* Convert between Xen-heap virtual addresses and machine addresses. */
 #define __pa(x) (virt_to_maddr(x))
-#define __va(x) (maddr_to_virt(x))
+#define __va(x) (maddr_to_directmap_virt(x))
 
 /* Convert between Xen-heap virtual addresses and machine frame numbers. */
 #define __virt_to_mfn(va)   (virt_to_maddr(va) >> PAGE_SHIFT)
-#define __mfn_to_virt(mfn)  (maddr_to_virt((paddr_t)(mfn) << PAGE_SHIFT))
+#define __mfn_to_virt(mfn)  (maddr_to_directmap_virt((paddr_t)(mfn) << 
PAGE_SHIFT))
 
 /* Convert between machine frame numbers and page-info structures. */
 #define mfn_to_page(mfn)(frame_table + mfn_to_pdx(mfn))
@@ -270,7 +270,7 @@ void copy_page_sse2(void *to, const void *from);
 #define virt_to_mfn(va) __virt_to_mfn(va)
 #define mfn_to_directmap_virt(mfn)__mfn_to_virt(mfn)
 #define virt_to_maddr(va)   __virt_to_maddr((unsigned long)(va))
-#define maddr_to_virt(ma)   __maddr_to_virt((unsigned long)(ma))
+#define maddr_to_directmap_virt(ma)   __maddr_to_directmap_virt((unsigned 
long)(ma))
 #define maddr_to_page(ma)   __maddr_to_page(ma)
 #define page_to_maddr(pg)   __page_to_maddr(pg)
 #define virt_to_page(va)__virt_to_page(va)
diff --git a/xen/arch/x86/include/asm/x86_64/page.h 
b/xen/arch/x86/include/asm/x86_64/page.h
index f49e10475f..b9e47da46e 100644
--- a/xen/arch/x86/include/asm/x86_64/page.h
+++ b/xen/arch/x86/include/asm/x86_64/page.h
@@ -46,7 +46,7 @@ static inline unsigned long __virt_to_maddr(unsigned long va)
 return xen_phys_start + va - XEN_VIRT_START;
 }
 
-static inline void *__maddr_to_virt(unsigned long ma)
+static inline void *__maddr_to_directmap_virt(unsigned long ma)
 {
 /* Offset in the direct map, accounting for pdx compression */
 unsigned long va_offset = maddr_to_directmapoff(ma);
diff --git a/xen/arch/x86/mpparse.c b/xen/arch/x86/mpparse.c
index d8ccab2449..69181b0abe 100644
--- a/xen/arch/x86/mpparse.c
+++ b/xen/arch/x86/mpparse.c
@@ -664,7 +664,7 @@ void __init get_smp_config (void)
 
 static int __init smp_scan_config (unsigned long base, unsigned long length)
 {
-   unsigned int *bp = maddr_to_virt(base);
+   unsigned int *bp = maddr_to_directmap_virt(base);
struct intel_mp_floating *mpf;
 
Dprintk("Scan SMP from %p for %ld bytes.\n", bp,length);
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 39aed5845d..1b02e2b6d5 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -1764,7 +1764,7 @@ void __init efi_init_memory(void)
 if ( map_pages_to_xen((unsigned 
long)mfn_to_directmap_virt(smfn),
 _mfn(smfn), emfn - smfn, prot) == 0 )
 desc->VirtualStart =
-(unsigned long)maddr_to_virt(desc->PhysicalStart);
+(unsigned 
long)maddr_to_directmap_virt(desc->PhysicalStart);
 else
 printk(XENLOG_ERR "Could not 

[PATCH v2] x86: Lift mapcache variable to the arch level

2024-01-16 Thread Elias El Yandouzi
From: Wei Liu 

It is going to be needed by HVM and idle domain as well, because without
the direct map, both need a mapcache to map pages.

This only lifts the mapcache variable up. Whether we populate the
mapcache for a domain is unchanged in this patch.

Signed-off-by: Wei Liu 
Signed-off-by: Wei Wang 
Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 8a31d18f69..8ef3f7746f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -843,6 +843,8 @@ int arch_domain_create(struct domain *d,
 
 psr_domain_init(d);
 
+mapcache_domain_init(d);
+
 if ( is_hvm_domain(d) )
 {
 if ( (rc = hvm_domain_initialise(d, config)) != 0 )
@@ -850,8 +852,6 @@ int arch_domain_create(struct domain *d,
 }
 else if ( is_pv_domain(d) )
 {
-mapcache_domain_init(d);
-
 if ( (rc = pv_domain_initialise(d)) != 0 )
 goto fail;
 }
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index eac5e3304f..55e337aaf7 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -82,11 +82,11 @@ void *map_domain_page(mfn_t mfn)
 #endif
 
 v = mapcache_current_vcpu();
-if ( !v || !is_pv_vcpu(v) )
+if ( !v )
 return mfn_to_virt(mfn_x(mfn));
 
-dcache = &v->domain->arch.pv.mapcache;
-vcache = &v->arch.pv.mapcache;
+dcache = &v->domain->arch.mapcache;
+vcache = &v->arch.mapcache;
 if ( !dcache->inuse )
 return mfn_to_virt(mfn_x(mfn));
 
@@ -187,14 +187,14 @@ void unmap_domain_page(const void *ptr)
 ASSERT(va >= MAPCACHE_VIRT_START && va < MAPCACHE_VIRT_END);
 
 v = mapcache_current_vcpu();
-ASSERT(v && is_pv_vcpu(v));
+ASSERT(v);
 
-dcache = &v->domain->arch.pv.mapcache;
+dcache = &v->domain->arch.mapcache;
 ASSERT(dcache->inuse);
 
 idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
 mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
-hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];
+hashent = &v->arch.mapcache.hash[MAPHASH_HASHFN(mfn)];
 
 local_irq_save(flags);
 
@@ -233,11 +233,9 @@ void unmap_domain_page(const void *ptr)
 
 int mapcache_domain_init(struct domain *d)
 {
-struct mapcache_domain *dcache = &d->arch.pv.mapcache;
+struct mapcache_domain *dcache = &d->arch.mapcache;
 unsigned int bitmap_pages;
 
-ASSERT(is_pv_domain(d));
-
 #ifdef NDEBUG
 if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
 return 0;
@@ -261,12 +259,12 @@ int mapcache_domain_init(struct domain *d)
 int mapcache_vcpu_init(struct vcpu *v)
 {
 struct domain *d = v->domain;
-struct mapcache_domain *dcache = &d->arch.pv.mapcache;
+struct mapcache_domain *dcache = &d->arch.mapcache;
 unsigned long i;
 unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
 unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
 
-if ( !is_pv_vcpu(v) || !dcache->inuse )
+if ( !dcache->inuse )
 return 0;
 
 if ( ents > dcache->entries )
@@ -293,7 +291,7 @@ int mapcache_vcpu_init(struct vcpu *v)
 BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
 for ( i = 0; i < MAPHASH_ENTRIES; i++ )
 {
-struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
+struct vcpu_maphash_entry *hashent = &v->arch.mapcache.hash[i];
 
 hashent->mfn = ~0UL; /* never valid to map */
 hashent->idx = MAPHASHENT_NOTINUSE;
diff --git a/xen/arch/x86/include/asm/domain.h 
b/xen/arch/x86/include/asm/domain.h
index 4d97c68028..85b890b2cb 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -286,9 +286,6 @@ struct pv_domain
 /* Mitigate L1TF with shadow/crashing? */
 bool check_l1tf;
 
-/* map_domain_page() mapping cache. */
-struct mapcache_domain mapcache;
-
 struct cpuidmasks *cpuidmasks;
 };
 
@@ -327,6 +324,9 @@ struct arch_domain
 
 uint8_t spec_ctrl_flags; /* See SCF_DOM_MASK */
 
+/* map_domain_page() mapping cache. */
+struct mapcache_domain mapcache;
+
 union {
 struct pv_domain pv;
 struct hvm_domain hvm;
@@ -517,9 +517,6 @@ struct arch_domain
 
 struct pv_vcpu
 {
-/* map_domain_page() mapping cache. */
-struct mapcache_vcpu mapcache;
-
 unsigned int vgc_flags;
 
 struct trap_info *trap_ctxt;
@@ -619,6 +616,9 @@ struct arch_vcpu
 #define async_exception_state(t) async_exception_state[(t)-1]
 uint8_t async_exception_mask;
 
+/* map_domain_page() mapping cache. */
+struct mapcache_vcpu mapcache;
+
 /* Virtual Machine Extensions */
 union {
 struct pv_vcpu pv;
-- 
2.40.1




[PATCH v2] xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

The arm32 version of init_secondary_pagetables() will soon be re-used
for arm64 as well where the root table starts at level 0 rather than level 1.

So rename 'first' to 'root'.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changelog in v2:
* Rebase
* Fix typo

diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index b6fc0aae07..fb5df667ba 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -84,32 +84,30 @@ int prepare_secondary_mm(int cpu)
 #else
 int prepare_secondary_mm(int cpu)
 {
-lpae_t *first;
+lpae_t *root = alloc_xenheap_page();
 
-first = alloc_xenheap_page(); /* root == first level on 32-bit 3-level 
trie */
-
-if ( !first )
+if ( !root )
 {
-printk("CPU%u: Unable to allocate the first page-table\n", cpu);
+printk("CPU%u: Unable to allocate the root page-table\n", cpu);
 return -ENOMEM;
 }
 
 /* Initialise root pagetable from root of boot tables */
-memcpy(first, per_cpu(xen_pgtable, 0), PAGE_SIZE);
-per_cpu(xen_pgtable, cpu) = first;
+memcpy(root, per_cpu(xen_pgtable, 0), PAGE_SIZE);
+per_cpu(xen_pgtable, cpu) = root;
 
 if ( !init_domheap_mappings(cpu) )
 {
 printk("CPU%u: Unable to prepare the domheap page-tables\n", cpu);
 per_cpu(xen_pgtable, cpu) = NULL;
-free_xenheap_page(first);
+free_xenheap_page(root);
 return -ENOMEM;
 }
 
 clear_boot_pagetables();
 
 /* Set init_ttbr for this CPU coming up */
-init_ttbr = __pa(first);
+init_ttbr = __pa(root);
 clean_dcache(init_ttbr);
 
 return 0;
-- 
2.40.1




[PATCH v2] Rename mfn_to_virt() calls

2024-01-16 Thread Elias El Yandouzi
Until directmap gets completely removed, we'd still need to
keep some calls to mfn_to_virt() for xenheap pages or when the
directmap is enabled.

Rename the macro to mfn_to_directmap_virt() to flag them and
prevent further use of mfn_to_virt().

Signed-off-by: Elias El Yandouzi 

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index cbcf3bf147..9a94d7eaf7 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -336,6 +336,7 @@ static inline uint64_t gvirt_to_maddr(vaddr_t va, paddr_t 
*pa,
  */
 #define virt_to_mfn(va) __virt_to_mfn(va)
 #define mfn_to_virt(mfn)__mfn_to_virt(mfn)
+#define mfn_to_directmap_virt(mfn) mfn_to_virt(mfn)
 
 /* Convert between Xen-heap virtual addresses and page-info structures. */
 static inline struct page_info *virt_to_page(const void *v)
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 89caefc8a2..62d6fee0f4 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -81,14 +81,14 @@ void *map_domain_page(mfn_t mfn)
 
 #ifdef NDEBUG
 if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
-return mfn_to_virt(mfn_x(mfn));
+return mfn_to_directmap_virt(mfn_x(mfn));
 #endif
 
 v = mapcache_current_vcpu();
 if ( !v || !v->domain->arch.mapcache.inuse )
 {
 if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
-return mfn_to_virt(mfn_x(mfn));
+return mfn_to_directmap_virt(mfn_x(mfn));
 else
 {
 BUG_ON(system_state >= SYS_STATE_smp_boot);
@@ -324,7 +324,7 @@ void *map_domain_page_global(mfn_t mfn)
 
 #ifdef NDEBUG
 if ( arch_mfns_in_directmap(mfn_x(mfn), 1) )
-return mfn_to_virt(mfn_x(mfn));
+return mfn_to_directmap_virt(mfn_x(mfn));
 #endif
 
 return vmap(, 1);
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index e59f6657d9..1b3ebae16f 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -439,7 +439,7 @@ static int __init pvh_populate_p2m(struct domain *d)
  d->arch.e820[i].addr + d->arch.e820[i].size);
 enum hvm_translation_result res =
  hvm_copy_to_guest_phys(mfn_to_maddr(_mfn(addr)),
-mfn_to_virt(addr),
+mfn_to_directmap_virt(addr),
 end - d->arch.e820[i].addr,
 v);
 
@@ -613,7 +613,7 @@ static int __init pvh_load_kernel(struct domain *d, const 
module_t *image,
 
 if ( initrd != NULL )
 {
-rc = hvm_copy_to_guest_phys(last_addr, mfn_to_virt(initrd->mod_start),
+rc = hvm_copy_to_guest_phys(last_addr, 
mfn_to_directmap_virt(initrd->mod_start),
 initrd->mod_end, v);
 if ( rc )
 {
diff --git a/xen/arch/x86/include/asm/page.h b/xen/arch/x86/include/asm/page.h
index 350d1fb110..c6891b52d4 100644
--- a/xen/arch/x86/include/asm/page.h
+++ b/xen/arch/x86/include/asm/page.h
@@ -268,7 +268,7 @@ void copy_page_sse2(void *to, const void *from);
  */
 #define mfn_valid(mfn)  __mfn_valid(mfn_x(mfn))
 #define virt_to_mfn(va) __virt_to_mfn(va)
-#define mfn_to_virt(mfn)__mfn_to_virt(mfn)
+#define mfn_to_directmap_virt(mfn)__mfn_to_virt(mfn)
 #define virt_to_maddr(va)   __virt_to_maddr((unsigned long)(va))
 #define maddr_to_virt(ma)   __maddr_to_virt((unsigned long)(ma))
 #define maddr_to_page(ma)   __maddr_to_page(ma)
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a72c32d87c..9530c93b68 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -318,8 +318,8 @@ void __init arch_init_memory(void)
 iostart_pfn = max_t(unsigned long, pfn, 1UL << (20 - PAGE_SHIFT));
 ioend_pfn = min(rstart_pfn, 16UL << (20 - PAGE_SHIFT));
 if ( iostart_pfn < ioend_pfn )
-destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
- (unsigned long)mfn_to_virt(ioend_pfn));
+destroy_xen_mappings((unsigned 
long)mfn_to_directmap_virt(iostart_pfn),
+ (unsigned 
long)mfn_to_directmap_virt(ioend_pfn));
 
 /* Mark as I/O up to next RAM region. */
 for ( ; pfn < rstart_pfn; pfn++ )
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 84c496ac4a..de69b7935c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -400,7 +400,7 @@ void *__init bootstrap_map(const module_t *mod)
 void *ret;
 
 if ( system_state != SYS_STATE_early_boot )
-return mod ? mfn_to_virt(mod->mod_start) : NULL;
+return mod ? mfn_to_directmap_virt(mod->mod_start) : NULL;
 
 if ( !mod )
 {
@@ -1703,7 +1703,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 {
 set_pdx_range(mod[i].mod_start,
   mod[i].mod_start + PFN_UP(mod[i].mod_end));
-

[PATCH v2] x86: Add a boot option to enable and disable the direct map

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Also add a helper function to retrieve it. Change arch_mfns_in_directmap()
to check this option before returning.

This is added as a Kconfig option as well as a boot command line option.
While being generic, the Kconfig option is only usable for x86 at the moment.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 



Changes in V2:
* Introduce a Kconfig option
* Reword the commit message
* Make opt_directmap and helper generic

Changes since Hongyan's version:
* Reword the commit message
* opt_directmap is only modified during boot so mark it as
  __ro_after_init
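
A stand-alone toy model of the option plumbing (not Xen code; the parser
below is an invented stand-in for boolean_param()): a default-on boolean plus
one helper, has_directmap(), that the rest of the code queries.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Default is on; only flipped while parsing the command line. */
    static bool opt_directmap = true;

    /* Stand-in for boolean_param("directmap", opt_directmap). */
    static void parse_directmap(const char *arg)
    {
        if ( !strcmp(arg, "no") || !strcmp(arg, "false") || !strcmp(arg, "0") )
            opt_directmap = false;
        else if ( !strcmp(arg, "yes") || !strcmp(arg, "true") || !strcmp(arg, "1") )
            opt_directmap = true;
    }

    static bool has_directmap(void)
    {
        return opt_directmap;
    }

    int main(void)
    {
        parse_directmap("no");
        printf("Booting with directmap %s\n", has_directmap() ? "on" : "off");
        return 0;
    }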

diff --git a/docs/misc/xen-command-line.pandoc 
b/docs/misc/xen-command-line.pandoc
index 8e65f8bd18..63c946f482 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -799,6 +799,18 @@ that enabling this option cannot guarantee anything beyond 
what underlying
 hardware guarantees (with, where available and known to Xen, respective
 tweaks applied).
 
+### directmap (x86)
+> `= `
+
+> Default: `true`
+
+Enable or disable the direct map region in Xen.
+
+By default, Xen creates the direct map region which maps physical memory
+in that region. Setting this to no will remove the direct map, blocking
+exploits that leak secrets via speculative memory access in the direct
+map.
+
 ### dma_bits
 > `= `
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 1acdffc51c..350f41b832 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -29,6 +29,7 @@ config X86
select HAS_UBSAN
select HAS_VPCI if HVM
select NEEDS_LIBELF
+   select HAS_SECRET_HIDING
 
 config ARCH_DEFCONFIG
string
diff --git a/xen/arch/x86/include/asm/mm.h b/xen/arch/x86/include/asm/mm.h
index 7d26d9cd2f..4aae270a78 100644
--- a/xen/arch/x86/include/asm/mm.h
+++ b/xen/arch/x86/include/asm/mm.h
@@ -620,10 +620,18 @@ void write_32bit_pse_identmap(uint32_t *l2);
 /*
  * x86 maps part of physical memory via the directmap region.
  * Return whether the range of MFN falls in the directmap region.
+ *
+ * When boot command line sets directmap=no, we will not have a direct map at
+ * all so this will always return false.
  */
 static inline bool arch_mfns_in_directmap(unsigned long mfn, unsigned long nr)
 {
-unsigned long eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
+unsigned long eva;
+
+if ( !has_directmap() )
+return false;
+
+eva = min(DIRECTMAP_VIRT_END, HYPERVISOR_VIRT_END);
 
 return (mfn + nr) <= (virt_to_mfn(eva - 1) + 1);
 }
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 4d0c90b7a0..b813ea75b5 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1512,6 +1512,8 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 if ( highmem_start )
 xenheap_max_mfn(PFN_DOWN(highmem_start - 1));
 
+printk("Booting with directmap %s\n", has_directmap() ? "on" : "off");
+
 /*
  * Walk every RAM region and map it in its entirety (on x86/64, at least)
  * and notify it to the boot allocator.
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 310ad4229c..9a24c89ac5 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -83,6 +83,23 @@ config HAS_UBSAN
 config MEM_ACCESS_ALWAYS_ON
bool
 
+config HAS_SECRET_HIDING
+   bool
+
+config SECRET_HIDING
+bool "Secret hiding"
+depends on HAS_SECRET_HIDING
+---help---
+The directmap contains mappings for most of the RAM, which makes domain
+memory easily accessible. While improving performance, it also makes
+the hypervisor more vulnerable to speculation attacks.
+
+Enabling this feature will allow the user to decide whether the memory
+is always mapped at boot or mapped only on demand (see the command line
+option "directmap").
+
+If unsure, say N.
+
 config MEM_ACCESS
def_bool MEM_ACCESS_ALWAYS_ON
prompt "Memory Access and VM events" if !MEM_ACCESS_ALWAYS_ON
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 740b6f0ff7..a3746cfbcf 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -173,6 +173,11 @@ paddr_t __ro_after_init mem_hotplug;
 static char __initdata opt_badpage[100] = "";
 string_param("badpage", opt_badpage);
 
+bool __ro_after_init opt_directmap = true;
+#ifdef CONFIG_HAS_SECRET_HIDING
+boolean_param("directmap", opt_directmap);
+#endif
+
 /*
  * no-bootscrub -> Free pages are not zeroed during boot.
  */
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3d9b2d05a5..f860e98ee4 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -165,6 +165,13 @@ extern unsigned long max_page;
 extern unsigned long total_pages;
 extern paddr_t mem_hotplug;
 
+extern bool opt_directmap;
+
+static inline bool has_directmap(void)
+{
+return opt_directmap;
+}
+
 /*
  * Extra fault info types which are used to further describe
  * the source of an 

[PATCH v2] x86/pv: Map L4 page table for shim domain

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

The root page table is allocated from the domheap and isn't
mapped by default. Map it on demand to build the PV shim domain.

Signed-off-by: Hongyan Xia 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* New patch

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index dc5e9fe117..fc51c7d362 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -991,8 +991,12 @@ do {\
  * !CONFIG_VIDEO case so the logic here can be simplified.
  */
 if ( pv_shim )
+{
+l4start = map_domain_page(l4start_mfn);
 pv_shim_setup_dom(d, l4start, v_start, vxenstore_start, vconsole_start,
   vphysmap_start, si);
+UNMAP_DOMAIN_PAGE(l4start);
+}
 
 #ifdef CONFIG_COMPAT
 if ( compat )
-- 
2.40.1




[PATCH v2] xen/x86: Add support for the PMAP

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

PMAP will be used in a follow-up patch to bootstrap map domain
page infrastructure -- we need some way to map pages to setup the
mapcache without a direct map.

The functions pmap_{map, unmap} open code {set, clear}_fixmap to break
the loop.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



The PMAP infrastructure was upstream separately for Arm since
Hongyan sent the secret-free hypervisor series. So this is a new
patch to plumb the feature on x86.

Changes in v2:
* Declare PMAP entries earlier in fixed_addresses
* Reword the commit message
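
A stand-alone toy model of the PMAP idea (not Xen code; the slot bookkeeping
below is an invented stand-in for the real fixmap PTE writes): a small, fixed
pool of early-boot mapping slots, claimed on map and released on unmap.

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_FIX_PMAP  8                     /* small fixed pool of slots */

    typedef unsigned long mfn_t;

    static bool  slot_busy[NUM_FIX_PMAP];
    static mfn_t slot_mfn[NUM_FIX_PMAP];

    /* Models pmap_map(): claim a free fixmap-like slot and record the MFN there.
     * The real arch_pmap_map() writes an L1 fixmap entry instead. */
    static int pmap_map_slot(mfn_t mfn)
    {
        for ( unsigned int slot = 0; slot < NUM_FIX_PMAP; slot++ )
            if ( !slot_busy[slot] )
            {
                slot_busy[slot] = true;
                slot_mfn[slot] = mfn;
                return slot;
            }

        return -1;                              /* all early-boot slots in use */
    }

    /* Models pmap_unmap(); the real code also clears the PTE and flushes the TLB. */
    static void pmap_unmap_slot(int slot)
    {
        if ( slot >= 0 )
            slot_busy[slot] = false;
    }

    int main(void)
    {
        int s = pmap_map_slot(0x1234);

        printf("mapped in slot %d\n", s);
        pmap_unmap_slot(s);
        return 0;
    }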

diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index 350f41b832..16b2a32469 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -25,6 +25,7 @@ config X86
select HAS_PASSTHROUGH
select HAS_PCI
select HAS_PCI_MSI
+   select HAS_PMAP
select HAS_SCHED_GRANULARITY
select HAS_UBSAN
select HAS_VPCI if HVM
diff --git a/xen/arch/x86/include/asm/fixmap.h 
b/xen/arch/x86/include/asm/fixmap.h
index 516ec3fa6c..a7ac365fc6 100644
--- a/xen/arch/x86/include/asm/fixmap.h
+++ b/xen/arch/x86/include/asm/fixmap.h
@@ -21,6 +21,8 @@
 
 #include 
 #include 
+#include 
+
 #include 
 #include 
 #include 
@@ -53,6 +55,8 @@ enum fixed_addresses {
 FIX_PV_CONSOLE,
 FIX_XEN_SHARED_INFO,
 #endif /* CONFIG_XEN_GUEST */
+FIX_PMAP_BEGIN,
+FIX_PMAP_END = FIX_PMAP_BEGIN + NUM_FIX_PMAP,
 /* Everything else should go further down. */
 FIX_APIC_BASE,
 FIX_IO_APIC_BASE_0,
diff --git a/xen/arch/x86/include/asm/pmap.h b/xen/arch/x86/include/asm/pmap.h
new file mode 100644
index 00..62746e191d
--- /dev/null
+++ b/xen/arch/x86/include/asm/pmap.h
@@ -0,0 +1,25 @@
+#ifndef __ASM_PMAP_H__
+#define __ASM_PMAP_H__
+
+#include 
+
+static inline void arch_pmap_map(unsigned int slot, mfn_t mfn)
+{
+unsigned long linear = (unsigned long)fix_to_virt(slot);
+l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
+
+ASSERT(!(l1e_get_flags(*pl1e) & _PAGE_PRESENT));
+
+l1e_write_atomic(pl1e, l1e_from_mfn(mfn, PAGE_HYPERVISOR));
+}
+
+static inline void arch_pmap_unmap(unsigned int slot)
+{
+unsigned long linear = (unsigned long)fix_to_virt(slot);
+l1_pgentry_t *pl1e = &l1_fixmap[l1_table_offset(linear)];
+
+l1e_write_atomic(pl1e, l1e_empty());
+flush_tlb_one_local(linear);
+}
+
+#endif /* __ASM_PMAP_H__ */
-- 
2.40.1




[PATCH v2] x86: Map/unmap pages in restore_all_guests

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Before, it assumed the pv cr3 could be accessed via a direct map. This
is no longer true.

Note that we do not map and unmap root_pgt for now since it is still a
xenheap page.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
* Rework the shadow perdomain mapping solution in the follow-up patches

Changes since Hongyan's version:
* Remove the final dot in the commit title
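
A stand-alone toy sketch of the per-vCPU slot arithmetic introduced by
SHADOW_ROOT_PT_VCPU_VIRT_START() (not Xen code; the base address below is
purely illustrative and a 64-bit build is assumed): one page-sized slot per
vCPU in the new fourth per-domain sub-area, indexed by vcpu_id.

    #include <stdio.h>

    #define PAGE_SIZE                  4096UL
    #define SHADOW_ROOT_PT_VIRT_START  0xffff820000000000UL   /* illustrative base only */

    /* Mirrors the macro: base of the sub-area plus one page per vCPU. */
    static unsigned long shadow_root_pt_va(unsigned int vcpu_id)
    {
        return SHADOW_ROOT_PT_VIRT_START + (unsigned long)vcpu_id * PAGE_SIZE;
    }

    int main(void)
    {
        for ( unsigned int id = 0; id < 4; id++ )
            printf("vcpu%u shadow root PT slot at %#lx\n", id, shadow_root_pt_va(id));

        return 0;
    }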

diff --git a/xen/arch/x86/include/asm/config.h 
b/xen/arch/x86/include/asm/config.h
index bbced338be..7cf1f33dc0 100644
--- a/xen/arch/x86/include/asm/config.h
+++ b/xen/arch/x86/include/asm/config.h
@@ -202,7 +202,7 @@ extern unsigned char boot_edid_info[128];
 /* Slot 260: per-domain mappings (including map cache). */
 #define PERDOMAIN_VIRT_START(PML4_ADDR(260))
 #define PERDOMAIN_SLOT_MBYTES   (PML4_ENTRY_BYTES >> (20 + PAGETABLE_ORDER))
-#define PERDOMAIN_SLOTS 3
+#define PERDOMAIN_SLOTS 4
 #define PERDOMAIN_VIRT_SLOT(s)  (PERDOMAIN_VIRT_START + (s) * \
  (PERDOMAIN_SLOT_MBYTES << 20))
 /* Slot 4: mirror of per-domain mappings (for compat xlat area accesses). */
@@ -316,6 +316,16 @@ extern unsigned long xen_phys_start;
 #define ARG_XLAT_START(v)\
 (ARG_XLAT_VIRT_START + ((v)->vcpu_id << ARG_XLAT_VA_SHIFT))
 
+/* root_pt shadow mapping area. The fourth per-domain-mapping sub-area */
+#define SHADOW_ROOT_PT_VIRT_START   PERDOMAIN_VIRT_SLOT(3)
+#define SHADOW_ROOT_PT_ENTRIES  MAX_VIRT_CPUS
+#define SHADOW_ROOT_PT_VIRT_END (SHADOW_ROOT_PT_VIRT_START +\
+ (SHADOW_ROOT_PT_ENTRIES * PAGE_SIZE))
+
+/* The address of a particular VCPU's ROOT_PT */
+#define SHADOW_ROOT_PT_VCPU_VIRT_START(v) \
+(SHADOW_ROOT_PT_VIRT_START + ((v)->vcpu_id * PAGE_SIZE))
+
 #define ELFSIZE 64
 
 #define ARCH_CRASH_SAVE_VMCOREINFO
diff --git a/xen/arch/x86/include/asm/domain.h 
b/xen/arch/x86/include/asm/domain.h
index 622d22bef2..4d97c68028 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -273,6 +273,7 @@ struct time_scale {
 struct pv_domain
 {
 l1_pgentry_t **gdt_ldt_l1tab;
+l1_pgentry_t **shadow_root_pt_l1tab;
 
 atomic_t nr_l4_pages;
 
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b56e0d8065..a72c32d87c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -505,6 +505,13 @@ void share_xen_page_with_guest(struct page_info *page, 
struct domain *d,
 spin_unlock(>page_alloc_lock);
 }
 
+#define shadow_root_pt_idx(v) \
+((v)->vcpu_id >> PAGETABLE_ORDER)
+
+#define pv_shadow_root_pt_pte(v) \
+((v)->domain->arch.pv.shadow_root_pt_l1tab[shadow_root_pt_idx(v)] + \
+ ((v)->vcpu_id & (L1_PAGETABLE_ENTRIES - 1)))
+
 void make_cr3(struct vcpu *v, mfn_t mfn)
 {
 struct domain *d = v->domain;
@@ -524,6 +531,13 @@ void write_ptbase(struct vcpu *v)
 
 if ( is_pv_vcpu(v) && v->domain->arch.pv.xpti )
 {
+mfn_t guest_root_pt = _mfn(v->arch.cr3 >> PAGE_SHIFT);
+l1_pgentry_t *pte = pv_shadow_root_pt_pte(v);
+
+ASSERT(v == current);
+
+l1e_write(pte, l1e_from_mfn(guest_root_pt, __PAGE_HYPERVISOR_RW));
+
 cpu_info->root_pgt_changed = true;
 cpu_info->pv_cr3 = __pa(this_cpu(root_pgt));
 if ( new_cr4 & X86_CR4_PCIDE )
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 2a445bb17b..fef9ae2352 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -288,6 +288,19 @@ static void pv_destroy_gdt_ldt_l1tab(struct vcpu *v)
   1U << GDT_LDT_VCPU_SHIFT);
 }
 
+static int pv_create_shadow_root_pt_l1tab(struct vcpu *v)
+{
+return create_perdomain_mapping(v->domain, 
SHADOW_ROOT_PT_VCPU_VIRT_START(v),
+1, v->domain->arch.pv.shadow_root_pt_l1tab,
+NULL);
+}
+
+static void pv_destroy_shadow_root_pt_l1tab(struct vcpu *v)
+
+{
+destroy_perdomain_mapping(v->domain, SHADOW_ROOT_PT_VCPU_VIRT_START(v), 1);
+}
+
 void pv_vcpu_destroy(struct vcpu *v)
 {
 if ( is_pv_32bit_vcpu(v) )
@@ -297,6 +310,7 @@ void pv_vcpu_destroy(struct vcpu *v)
 }
 
 pv_destroy_gdt_ldt_l1tab(v);
+pv_destroy_shadow_root_pt_l1tab(v);
 XFREE(v->arch.pv.trap_ctxt);
 }
 
@@ -311,6 +325,13 @@ int pv_vcpu_initialise(struct vcpu *v)
 if ( rc )
 return rc;
 
+if ( v->domain->arch.pv.xpti )
+{
+rc = pv_create_shadow_root_pt_l1tab(v);
+if ( rc )
+goto done;
+}
+
 BUILD_BUG_ON(X86_NR_VECTORS * sizeof(*v->arch.pv.trap_ctxt) >
  PAGE_SIZE);
 v->arch.pv.trap_ctxt = xzalloc_array(struct trap_info, X86_NR_VECTORS);
@@ -346,10 +367,12 @@ void pv_domain_destroy(struct domain *d)
 
 destroy_perdomain_mapping(d, GDT_LDT_VIRT_START,
   GDT_LDT_MBYTES << (20 - PAGE_SHIFT));
+destroy_perdomain_mapping(d, 

[PATCH v2] xen/x86: Add build assertion for fixmap entries

2024-01-16 Thread Elias El Yandouzi
The early fixed addresses must all fit into the static L1 table.
Introduce a build assertion to this end.

Signed-off-by: Elias El Yandouzi 



 Changes in v2:
 * New patch

diff --git a/xen/arch/x86/include/asm/fixmap.h 
b/xen/arch/x86/include/asm/fixmap.h
index a7ac365fc6..904bee0480 100644
--- a/xen/arch/x86/include/asm/fixmap.h
+++ b/xen/arch/x86/include/asm/fixmap.h
@@ -77,6 +77,11 @@ enum fixed_addresses {
 #define FIXADDR_SIZE  (__end_of_fixed_addresses << PAGE_SHIFT)
 #define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE)
 
+static inline void fixaddr_build_assertion(void)
+{
+BUILD_BUG_ON(FIX_PMAP_END > L1_PAGETABLE_ENTRIES - 1);
+}
+
 extern void __set_fixmap(
 enum fixed_addresses idx, unsigned long mfn, unsigned long flags);
 
-- 
2.40.1




[PATCH v2] x86/pv: Rewrite how building PV dom0 handles domheap mappings

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Building a PV dom0 allocates pages from the domheap but uses them as if
they were xenheap pages. Access the pages the way domheap pages should be
accessed, i.e. by mapping and unmapping them.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
* Clarify the commit message
* Break the patch in two parts

Changes since Hongyan's version:
* Rebase
* Remove spurious newline
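
As context for the diff below, here is a minimal sketch (not part of the
patch; the helper name is made up) of the access pattern the series moves
to: instead of assuming a directmap virtual address exists for an arbitrary
domheap MFN, the page is mapped transiently around the access.

/* Illustrative only -- uses Xen's map_domain_page()/unmap_domain_page(). */
static void zero_domheap_page(mfn_t mfn)
{
    void *p = map_domain_page(mfn);   /* transient, per-CPU mapping */

    clear_page(p);
    unmap_domain_page(p);             /* tear the mapping down again */
}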

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5659814e0c..dc5e9fe117 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -382,6 +382,10 @@ int __init dom0_construct_pv(struct domain *d,
 l3_pgentry_t *l3tab = NULL, *l3start = NULL;
 l2_pgentry_t *l2tab = NULL, *l2start = NULL;
 l1_pgentry_t *l1tab = NULL, *l1start = NULL;
+mfn_t l4start_mfn = INVALID_MFN;
+mfn_t l3start_mfn = INVALID_MFN;
+mfn_t l2start_mfn = INVALID_MFN;
+mfn_t l1start_mfn = INVALID_MFN;
 
 /*
  * This fully describes the memory layout of the initial domain. All
@@ -708,22 +712,32 @@ int __init dom0_construct_pv(struct domain *d,
 v->arch.pv.event_callback_cs= FLAT_COMPAT_KERNEL_CS;
 }
 
+#define UNMAP_MAP_AND_ADVANCE(mfn_var, virt_var, maddr) \
+do {\
+unmap_domain_page(virt_var);\
+mfn_var = maddr_to_mfn(maddr);  \
+maddr += PAGE_SIZE; \
+virt_var = map_domain_page(mfn_var);\
+} while ( false )
+
 if ( !compat )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l4_page_table;
-l4start = l4tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l4start_mfn, l4start, mpt_alloc);
+l4tab = l4start;
 clear_page(l4tab);
-init_xen_l4_slots(l4tab, _mfn(virt_to_mfn(l4start)),
-  d, INVALID_MFN, true);
-v->arch.guest_table = pagetable_from_paddr(__pa(l4start));
+init_xen_l4_slots(l4tab, l4start_mfn, d, INVALID_MFN, true);
+v->arch.guest_table = pagetable_from_mfn(l4start_mfn);
 }
 else
 {
 /* Monitor table already created by switch_compat(). */
-l4start = l4tab = __va(pagetable_get_paddr(v->arch.guest_table));
+l4start_mfn = pagetable_get_mfn(v->arch.guest_table);
+l4start = l4tab = map_domain_page(l4start_mfn);
 /* See public/xen.h on why the following is needed. */
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l3_page_table;
 l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
 }
 
 l4tab += l4_table_offset(v_start);
@@ -733,14 +747,16 @@ int __init dom0_construct_pv(struct domain *d,
 if ( !((unsigned long)l1tab & (PAGE_SIZE-1)) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = PGT_l1_page_table;
-l1start = l1tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l1start_mfn, l1start, mpt_alloc);
+l1tab = l1start;
 clear_page(l1tab);
 if ( count == 0 )
 l1tab += l1_table_offset(v_start);
 if ( !((unsigned long)l2tab & (PAGE_SIZE-1)) )
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info = 
PGT_l2_page_table;
-l2start = l2tab = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l2start_mfn, l2start, mpt_alloc);
+l2tab = l2start;
 clear_page(l2tab);
 if ( count == 0 )
 l2tab += l2_table_offset(v_start);
@@ -750,19 +766,19 @@ int __init dom0_construct_pv(struct domain *d,
 {
 maddr_to_page(mpt_alloc)->u.inuse.type_info =
 PGT_l3_page_table;
-l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE;
+UNMAP_MAP_AND_ADVANCE(l3start_mfn, l3start, mpt_alloc);
 }
 l3tab = l3start;
 clear_page(l3tab);
 if ( count == 0 )
 l3tab += l3_table_offset(v_start);
-*l4tab = l4e_from_paddr(__pa(l3start), L4_PROT);
+*l4tab = l4e_from_mfn(l3start_mfn, L4_PROT);
 l4tab++;
 }
-*l3tab = l3e_from_paddr(__pa(l2start), L3_PROT);
+*l3tab = l3e_from_mfn(l2start_mfn, L3_PROT);
 l3tab++;
 }
-*l2tab = l2e_from_paddr(__pa(l1start), L2_PROT);
+*l2tab = l2e_from_mfn(l1start_mfn, L2_PROT);
 l2tab++;
 }
 if ( count < initrd_pfn || count >= initrd_pfn + PFN_UP(initrd_len) )
@@ -781,30 +797,34 @@ int __init dom0_construct_pv(struct domain *d,
 
 if ( compat )
 {
-

[PATCH v2] xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

At the moment the fixmap slots are prefixed differently between arm and
x86.

Some of them (e.g. the PMAP slots) are used in common code. So it would
be better if they are named the same way to avoid having to create
aliases.

I have decided to use the x86 naming because it requires fewer changes. So
all the Arm fixmap slots will now be prefixed with FIX rather than
FIXMAP.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 

Reviewed-by: Henry Wang 
Reviewed-by: Jan Beulich 
Reviewed-by: Stefano Stabellini 



Note that more renaming could potentially be done to share
more code in the future. I have decided not to do that to avoid going
down a rabbit hole.

diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
index 41d521f720..736cf09eca 100644
--- a/xen/arch/arm/acpi/lib.c
+++ b/xen/arch/arm/acpi/lib.c
@@ -40,10 +40,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 return NULL;
 
 offset = phys & (PAGE_SIZE - 1);
-base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN) + offset;
+base = FIXMAP_ADDR(FIX_ACPI_BEGIN) + offset;
 
 /* Check the fixmap is big enough to map the region */
-if ( (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - base) < size )
+if ( (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - base) < size )
 return NULL;
 
 /* With the fixmap, we can only map one region at the time */
@@ -54,7 +54,7 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 
 size += offset;
 mfn = maddr_to_mfn(phys);
-idx = FIXMAP_ACPI_BEGIN;
+idx = FIX_ACPI_BEGIN;
 
 do {
 set_fixmap(idx, mfn, PAGE_HYPERVISOR);
@@ -72,8 +72,8 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
 unsigned int idx;
 
 /* We are only handling fixmap address in the arch code */
-if ( (vaddr < FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) ||
- (vaddr >= (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)) )
+if ( (vaddr < FIXMAP_ADDR(FIX_ACPI_BEGIN)) ||
+ (vaddr >= (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE)) )
 return false;
 
 /*
@@ -81,16 +81,16 @@ bool __acpi_unmap_table(const void *ptr, unsigned long size)
  * for the ACPI fixmap region. The caller is expected to free with
  * the same address.
  */
-ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIXMAP_ACPI_BEGIN));
+ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIX_ACPI_BEGIN));
 
 /* The region allocated fit in the ACPI fixmap region. */
-ASSERT(size < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - vaddr));
+ASSERT(size < (FIXMAP_ADDR(FIX_ACPI_END) + PAGE_SIZE - vaddr));
 ASSERT(fixmap_inuse);
 
 fixmap_inuse = false;
 
-size += vaddr - FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
-idx = FIXMAP_ACPI_BEGIN;
+size += vaddr - FIXMAP_ADDR(FIX_ACPI_BEGIN);
+idx = FIX_ACPI_BEGIN;
 
 do
 {
diff --git a/xen/arch/arm/include/asm/early_printk.h 
b/xen/arch/arm/include/asm/early_printk.h
index c1e84f8b00..f444e89a86 100644
--- a/xen/arch/arm/include/asm/early_printk.h
+++ b/xen/arch/arm/include/asm/early_printk.h
@@ -17,7 +17,7 @@
 
 /* need to add the uart address offset in page to the fixmap address */
 #define EARLY_UART_VIRTUAL_ADDRESS \
-(FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
~PAGE_MASK))
+(FIXMAP_ADDR(FIX_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & ~PAGE_MASK))
 
 #define TEMPORARY_EARLY_UART_VIRTUAL_ADDRESS \
 (TEMPORARY_FIXMAP_ADDR(FIXMAP_CONSOLE) + (CONFIG_EARLY_UART_BASE_ADDRESS & 
~PAGE_MASK))
diff --git a/xen/arch/arm/include/asm/fixmap.h 
b/xen/arch/arm/include/asm/fixmap.h
index 734eb9b1d4..a823456ecb 100644
--- a/xen/arch/arm/include/asm/fixmap.h
+++ b/xen/arch/arm/include/asm/fixmap.h
@@ -8,17 +8,17 @@
 #include 
 
 /* Fixmap slots */
-#define FIXMAP_CONSOLE  0  /* The primary UART */
-#define FIXMAP_MISC 1  /* Ephemeral mappings of hardware */
-#define FIXMAP_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
-#define FIXMAP_ACPI_END(FIXMAP_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* 
End mappings of ACPI tables */
-#define FIXMAP_PMAP_BEGIN (FIXMAP_ACPI_END + 1) /* Start of PMAP */
-#define FIXMAP_PMAP_END (FIXMAP_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP 
*/
+#define FIX_CONSOLE  0  /* The primary UART */
+#define FIX_MISC 1  /* Ephemeral mappings of hardware */
+#define FIX_ACPI_BEGIN  2  /* Start mappings of ACPI tables */
+#define FIX_ACPI_END(FIX_ACPI_BEGIN + NUM_FIXMAP_ACPI_PAGES - 1)  /* End 
mappings of ACPI tables */
+#define FIX_PMAP_BEGIN (FIX_ACPI_END + 1) /* Start of PMAP */
+#define FIX_PMAP_END (FIX_PMAP_BEGIN + NUM_FIX_PMAP - 1) /* End of PMAP */
 
-#define FIXMAP_LAST FIXMAP_PMAP_END
+#define FIX_LAST FIX_PMAP_END
 
 #define FIXADDR_START FIXMAP_ADDR(0)
-#define FIXADDR_TOP FIXMAP_ADDR(FIXMAP_LAST)
+#define FIXADDR_TOP FIXMAP_ADDR(FIX_LAST)
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/arch/arm/mmu/setup.c b/xen/arch/arm/mmu/setup.c
index 72725840b6..57f1b46499 100644
--- a/xen/arch/arm/mmu/setup.c
+++ 

[PATCH v2] x86/pv: Domheap pages should be mapped while relocating initrd

2024-01-16 Thread Elias El Yandouzi
From: Wei Liu 

Xen shouldn't use domheap pages as if they were xenheap pages. Map and
unmap the pages accordingly.

Signed-off-by: Wei Liu 
Signed-off-by: Wei Wang 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in V2:
* Get rid of mfn_to_virt
* Don't open code copy_domain_page()

Changes since Hongyan's version:
* Add missing newline after the variable declaration

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 5bbed3a36a..5659814e0c 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -615,18 +615,25 @@ int __init dom0_construct_pv(struct domain *d,
 if ( d->arch.physaddr_bitsize &&
  ((mfn + count - 1) >> (d->arch.physaddr_bitsize - PAGE_SHIFT)) )
 {
+unsigned long nr_pages;
+
 order = get_order_from_pages(count);
 page = alloc_domheap_pages(d, order, MEMF_no_scrub);
 if ( !page )
 panic("Not enough RAM for domain 0 initrd\n");
+
+nr_pages = 1UL << order;
 for ( count = -count; order--; )
 if ( count & (1UL << order) )
 {
 free_domheap_pages(page, order);
 page += 1UL << order;
+nr_pages -= 1UL << order;
 }
-memcpy(page_to_virt(page), mfn_to_virt(initrd->mod_start),
-   initrd_len);
+
+for ( i = 0; i < nr_pages; i++ )
+copy_domain_page(page_to_mfn(page + i), _mfn(initrd_mfn + i));
+
 mpt_alloc = (paddr_t)initrd->mod_start << PAGE_SHIFT;
 init_domheap_pages(mpt_alloc,
mpt_alloc + PAGE_ALIGN(initrd_len));
-- 
2.40.1




[PATCH v2] xen/numa: vmap the pages for memnodemap

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

This avoids the assumption that there is a direct map and boot pages
fall inside the direct map.

Clean up the variables so that mfn actually stores a type-safe mfn.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



See the discussion in the next patch about using panic().

Changes in v2:
* vmap_contig_pages() was renamed to vmap_contig()
* Replace the BUG_ON() with a panic()

Changes compared to Hongyan's version:
* The function modified was moved to common code. So rebase it
* vmap_boot_pages() was renamed to vmap_contig_pages()

diff --git a/xen/common/numa.c b/xen/common/numa.c
index f454c4d894..ef13ec2255 100644
--- a/xen/common/numa.c
+++ b/xen/common/numa.c
@@ -424,13 +424,14 @@ static int __init populate_memnodemap(const struct node 
*nodes,
 static int __init allocate_cachealigned_memnodemap(void)
 {
 unsigned long size = PFN_UP(memnodemapsize * sizeof(*memnodemap));
-unsigned long mfn = mfn_x(alloc_boot_pages(size, 1));
+mfn_t mfn = alloc_boot_pages(size, 1);
 
-memnodemap = mfn_to_virt(mfn);
-mfn <<= PAGE_SHIFT;
+memnodemap = vmap_contig(mfn, size);
+if ( !memnodemap )
+panic("Unable to map the ACPI SLIT. Retry with numa=off");
 size <<= PAGE_SHIFT;
 printk(KERN_DEBUG "NUMA: Allocated memnodemap from %lx - %lx\n",
-   mfn, mfn + size);
+   mfn_to_maddr(mfn), mfn_to_maddr(mfn) + size);
 memnodemapsize = size / sizeof(*memnodemap);
 
 return 0;
-- 
2.40.1




[PATCH v2] x86/srat: vmap the pages for acpi_slit

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

This avoids the assumption that boot pages are in the direct map.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



There was a discussion with Jan regarding early failure vs
disabling NUMA. I am strongly in favor of the former because
it makes it more obvious that something went wrong.

From my understanding, Jan seems to be in favor of turning off NUMA
and then continuing to boot, but he then implied that a panic() would be
fine.

So I went with the panic() version. I am happy to rework it to another
approach if there is a consensus.

Changes in v2:
* vmap_contig_pages() was renamed to vmap_contig()
* Use a panic() rather than BUG_ON()

Changes since Hongyan's version:
* vmap_boot_pages() was renamed to vmap_contig_pages()

diff --git a/xen/arch/x86/srat.c b/xen/arch/x86/srat.c
index 3f70338e6e..688f410287 100644
--- a/xen/arch/x86/srat.c
+++ b/xen/arch/x86/srat.c
@@ -135,7 +135,9 @@ void __init acpi_numa_slit_init(struct acpi_table_slit 
*slit)
return;
}
mfn = alloc_boot_pages(PFN_UP(slit->header.length), 1);
-   acpi_slit = mfn_to_virt(mfn_x(mfn));
+   acpi_slit = vmap_contig(mfn, PFN_UP(slit->header.length));
+   if ( !acpi_slit )
+   panic("Unable to map the ACPI SLIT. Retry with numa=off");
memcpy(acpi_slit, slit, slit->header.length);
 }
 
-- 
2.40.1




[PATCH v2] x86/setup: Move vm_init() before acpi calls

2024-01-16 Thread Elias El Yandouzi
From: Wei Liu 

After the direct map removal, pages from the boot allocator are not
going to be mapped in the direct map. Although we have map_domain_page(),
its mappings are ephemeral and less helpful for mappings larger than a
page, so we want a mechanism to globally map a range of pages, which is
what vmap is for. Therefore, we bring vm_init() into the early boot stage.

To allow vmap to be initialised and used in early boot, we need to
modify vmap to receive pages from the boot allocator during the early
boot stage.

Signed-off-by: Wei Liu 
Signed-off-by: David Woodhouse 
Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
- The return of map_pages_to_xen() is now checked in a separate
  patch
- Clarify the commit message
- Group the new boolean with the others

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 59dd9bb25a..7e28f62d09 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -748,6 +748,8 @@ void asmlinkage __init start_xen(unsigned long 
boot_phys_offset,
 
 setup_mm();
 
+vm_init();
+
 /* Parse the ACPI tables for possible boot-time configuration */
 acpi_boot_table_init();
 
@@ -759,8 +761,6 @@ void asmlinkage __init start_xen(unsigned long 
boot_phys_offset,
  */
 system_state = SYS_STATE_boot;
 
-vm_init();
-
 if ( acpi_disabled )
 {
 printk("Booting using Device Tree\n");
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 897b7e9208..4d0c90b7a0 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -989,6 +989,7 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 int i, j, e820_warn = 0, bytes = 0;
 unsigned long eb_start, eb_end;
 bool acpi_boot_table_init_done = false, relocated = false;
+bool vm_init_done = false;
 int ret;
 struct ns16550_defaults ns16550 = {
 .data_bits = 8,
@@ -1531,12 +1532,23 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 continue;
 
 if ( !acpi_boot_table_init_done &&
- s >= (1ULL << 32) &&
- !acpi_boot_table_init() )
+ s >= (1ULL << 32) )
 {
-acpi_boot_table_init_done = true;
-srat_parse_regions(s);
-setup_max_pdx(raw_max_page);
+/*
+ * We only initialise vmap and acpi after going through the bottom
+ * 4GiB, so that we have enough pages in the boot allocator.
+ */
+if ( !vm_init_done )
+{
+vm_init();
+vm_init_done = true;
+}
+if ( !acpi_boot_table_init() )
+{
+acpi_boot_table_init_done = true;
+srat_parse_regions(s);
+setup_max_pdx(raw_max_page);
+}
 }
 
 if ( pfn_to_pdx((e - 1) >> PAGE_SHIFT) >= max_pdx )
@@ -1722,6 +1734,9 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 
 init_frametable();
 
+if ( !vm_init_done )
+vm_init();
+
 if ( !acpi_boot_table_init_done )
 acpi_boot_table_init();
 
@@ -1761,12 +1776,6 @@ void asmlinkage __init noreturn __start_xen(unsigned 
long mbi_p)
 end_boot_allocator();
 
 system_state = SYS_STATE_boot;
-/*
- * No calls involving ACPI code should go between the setting of
- * SYS_STATE_boot and vm_init() (or else acpi_os_{,un}map_memory()
- * will break).
- */
-vm_init();
 
 bsp_stack = cpu_alloc_stack(0);
 if ( !bsp_stack )
diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 830f64c5ef..fc5c70da4d 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -34,10 +34,19 @@ void __init vm_init_type(enum vmap_region type, void 
*start, void *end)
 
 for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += 
PAGE_SIZE )
 {
-struct page_info *pg = alloc_domheap_page(NULL, 0);
+mfn_t mfn;
 int rc;
 
-rc = map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+if ( system_state == SYS_STATE_early_boot )
+mfn = alloc_boot_pages(1, 1);
+else
+{
+struct page_info *pg = alloc_domheap_page(NULL, 0);
+
+BUG_ON(!pg);
+mfn = page_to_mfn(pg);
+}
+rc = map_pages_to_xen(va, mfn, 1, PAGE_HYPERVISOR);
 BUG_ON(rc);
 
 clear_page((void *)va);
@@ -65,7 +74,7 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
 spin_lock(&vm_lock);
 for ( ; ; )
 {
-struct page_info *pg;
+mfn_t mfn;
 
 ASSERT(vm_low[t] == vm_top[t] || !test_bit(vm_low[t], vm_bitmap(t)));
 for ( start = vm_low[t]; start < vm_top[t]; )
@@ -100,9 +109,16 @@ static void *vm_alloc(unsigned int nr, unsigned int align,
 if ( vm_top[t] >= vm_end[t] )
 return NULL;
 
-pg = alloc_domheap_page(NULL, 0);
- 

[PATCH v2] acpi: vmap pages in acpi_os_alloc_memory

2024-01-16 Thread Elias El Yandouzi
From: Hongyan Xia 

Also, introduce a wrapper around vmap that maps a contiguous range for
boot allocations. Unfortunately, the new helper cannot be a static inline
because the dependencies are a mess: we would need to re-include
asm/page.h (which was removed in aa4b9d1ee653 "include: don't use asm/page.h
from common headers"), and even that no longer looks to be enough
because bits from asm/cpufeature.h are used in the definition of PAGE_NX.

Lastly, with the move to vmap(), it is now easier to find the size
of the mapping. So pass the whole area to init_boot_pages() rather than
just the first page.

Signed-off-by: Hongyan Xia 
Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* Rename vmap_contig_pages() to vmap_contig()
* Rename nr_pages to nr to be consistent with vmap() parameters
* Pass the whole region to init_boot_pages()

Changes since Hongyan's version:
* Rename vmap_boot_pages() to vmap_contig_pages()
* Move the new helper in vmap.c to avoid compilation issue
* Don't use __pa() to translate the virtual address

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 171271fae3..966a7e763f 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -245,6 +245,11 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
 return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
+void *vmap_contig(mfn_t mfn, unsigned int nr)
+{
+return __vmap(&mfn, nr, 1, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
+}
+
 unsigned int vmap_size(const void *va)
 {
 unsigned int pages = vm_size(va, VMAP_DEFAULT);
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 389505f786..ab80d6b2a9 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -221,7 +221,11 @@ void *__init acpi_os_alloc_memory(size_t sz)
void *ptr;
 
if (system_state == SYS_STATE_early_boot)
-   return mfn_to_virt(mfn_x(alloc_boot_pages(PFN_UP(sz), 1)));
+   {
+   mfn_t mfn = alloc_boot_pages(PFN_UP(sz), 1);
+
+   return vmap_contig(mfn, PFN_UP(sz));
+   }
 
ptr = xmalloc_bytes(sz);
ASSERT(!ptr || is_xmalloc_memory(ptr));
@@ -246,5 +250,11 @@ void __init acpi_os_free_memory(void *ptr)
if (is_xmalloc_memory(ptr))
xfree(ptr);
else if (ptr && system_state == SYS_STATE_early_boot)
-   init_boot_pages(__pa(ptr), __pa(ptr) + PAGE_SIZE);
+   {
+   paddr_t addr = mfn_to_maddr(vmap_to_mfn(ptr));
+   unsigned int nr = vmap_size(ptr);
+
+   vunmap(ptr);
+   init_boot_pages(addr, addr + nr * PAGE_SIZE);
+   }
 }
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 24c85de490..0c16baa85f 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -15,6 +15,7 @@ void vm_init_type(enum vmap_region type, void *start, void 
*end);
 void *__vmap(const mfn_t *mfn, unsigned int granularity, unsigned int nr,
  unsigned int align, unsigned int flags, enum vmap_region type);
 void *vmap(const mfn_t *mfn, unsigned int nr);
+void *vmap_contig(mfn_t mfn, unsigned int nr);
 void vunmap(const void *va);
 
 void *vmalloc(size_t size);
-- 
2.40.1




[PATCH v2] xen/vmap: Check the page has been mapped in vm_init_type()

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

The function map_pages_to_xen() could fail if it can't allocate the
underlying page tables or (at least on Arm) if the area was already
mapped.

The first error is caught by clear_page() because it would fault.
However, the second error, while very unlikely, is not caught at all.

As this is boot code, use BUG_ON() to check if map_pages_to_xen() has
succeeded.

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
- New patch

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index 330e2ba897..830f64c5ef 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -35,8 +35,11 @@ void __init vm_init_type(enum vmap_region type, void *start, 
void *end)
 for ( i = 0, va = (unsigned long)vm_bitmap(type); i < nr; ++i, va += 
PAGE_SIZE )
 {
 struct page_info *pg = alloc_domheap_page(NULL, 0);
+int rc;
+
+rc = map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
+BUG_ON(rc);
 
-map_pages_to_xen(va, page_to_mfn(pg), 1, PAGE_HYPERVISOR);
 clear_page((void *)va);
 }
 bitmap_fill(vm_bitmap(type), vm_low[type]);
-- 
2.40.1




[PATCH v2] xen/vmap: Introduce vmap_size() and use it

2024-01-16 Thread Elias El Yandouzi
From: Julien Grall 

vunmap() and vfree() currently duplicate the (small) logic to find the
size of a vmap area. In a follow-up patch, we will want to introduce
another user of it (this time outside of vmap.c).

So introduce a new helper vmap_size() that will return the number of
pages in the area starting at the given address. Take the opportunity
to replace the open-coded version.

Note that vfree() was storing the type of the area in a local variable.
But this seems to have never been used (even when it was introduced).

Signed-off-by: Julien Grall 
Signed-off-by: Elias El Yandouzi 



Changes in v2:
* Patch added

diff --git a/xen/common/vmap.c b/xen/common/vmap.c
index fc5c70da4d..171271fae3 100644
--- a/xen/common/vmap.c
+++ b/xen/common/vmap.c
@@ -245,14 +245,21 @@ void *vmap(const mfn_t *mfn, unsigned int nr)
 return __vmap(mfn, 1, nr, 1, PAGE_HYPERVISOR, VMAP_DEFAULT);
 }
 
-void vunmap(const void *va)
+unsigned int vmap_size(const void *va)
 {
-unsigned long addr = (unsigned long)va;
 unsigned int pages = vm_size(va, VMAP_DEFAULT);
 
 if ( !pages )
 pages = vm_size(va, VMAP_XEN);
 
+return pages;
+}
+
+void vunmap(const void *va)
+{
+unsigned long addr = (unsigned long)va;
+unsigned pages = vmap_size(va);
+
 #ifndef _PAGE_NONE
 destroy_xen_mappings(addr, addr + PAGE_SIZE * pages);
 #else /* Avoid tearing down intermediate page tables. */
@@ -328,17 +335,11 @@ void vfree(void *va)
 unsigned int i, pages;
 struct page_info *pg;
 PAGE_LIST_HEAD(pg_list);
-enum vmap_region type = VMAP_DEFAULT;
 
 if ( !va )
 return;
 
-pages = vm_size(va, type);
-if ( !pages )
-{
-type = VMAP_XEN;
-pages = vm_size(va, type);
-}
+pages = vmap_size(va);
 ASSERT(pages);
 
 for ( i = 0; i < pages; i++ )
diff --git a/xen/include/xen/vmap.h b/xen/include/xen/vmap.h
index 2b7369e062..24c85de490 100644
--- a/xen/include/xen/vmap.h
+++ b/xen/include/xen/vmap.h
@@ -25,6 +25,9 @@ void vfree(void *va);
 
 void __iomem *ioremap(paddr_t pa, size_t len);
 
+/* Return the number of pages in the mapping starting at address 'va' */
+unsigned int vmap_size(const void *va);
+
 static inline void iounmap(void __iomem *va)
 {
 unsigned long addr = (unsigned long)(void __force *)va;
-- 
2.40.1




[PATCH v2] Remove the directmap

2024-01-16 Thread Elias El Yandouzi
Hi all,

A few years ago, Wei Liu implemented a PoC to remove the directmap
from Xen. The last version was sent by Hongyan Xia [1].

I will start with thanking both Wei and Hongyan for the initial work
to upstream the feature. A lot of patches already went in and this is
the last few patches missing to effectively enable the feature.

=== What is the directmap? ===

At the moment, on both arm64 and x86, most of the RAM is mapped
in Xen address space. This means that domain memory is easily
accessible in Xen.
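
Roughly speaking, the directmap is what makes a translation such as the
one sketched below possible without any explicit mapping step (simplified;
the real macros live in the per-architecture headers, and the base address
shown is only an example for x86):

/* Simplified sketch of the idea behind mfn_to_virt() on the directmap. */
#define DIRECTMAP_VIRT_START  0xffff830000000000UL  /* example x86 base */

static inline void *directmap_virt(unsigned long mfn)
{
    /* Any RAM page is reachable at a fixed offset in Xen's address space. */
    return (void *)(DIRECTMAP_VIRT_START + (mfn << PAGE_SHIFT));
}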

=== Why do we want to remove the directmap? ===

(Summarizing my understanding of the previous discussion)

Speculation attacks (like Spectre SP1) rely on loading a piece of memory
into the cache. If the region is not mapped then it can't be loaded.

So reducing the amount of memory mapped in Xen will also
reduce the attack surface.

=== What's the performance impact? ===

As the guest memory is no longer always mapped, the cost of mapping
will increase. I haven't done the numbers with this new version, but
some measurements were provided in the previous version for x86.

=== Improvement possible ===

The known areas to improve on x86 are:
   * Mapcache: There was a patch sent by Hongyan:
 https://lore.kernel.org/xen-devel/4058e92ce21627731c49b588a95809dc0affd83a.1581015491.git.hongy...@amazon.com/
   * EPT: At the moment a guest page-table walk requires about 20 map/unmap
 operations. This will have a very high impact on performance. We need to
 decide whether keeping the EPT always mapped is a problem.

The original series didn't have support for Arm64. But as there was
some interest, I have provided a PoC.

There is more work to do for Arm64:
   * The mapcache is quite simple. We would need to investigate its
 performance.
   * The mapcache should be made compliant with the Arm Arm (this is now
 more critical).
   * We will likely have the same problem as for the EPT.
   * We have no support for merging tables into a superpage, nor for
 freeing empty page-tables. (See more below)

=== Implementation ===

The subject is probably a misnomer. The directmap region is still present,
but the RAM is not mapped in it by default. Instead, the region will still
be used to map pages allocated via alloc_xenheap_pages().

The advantage is that the solution is simple (so IMHO good enough to be
merged as a tech preview). The disadvantage is that the page allocator does
not try to keep all the xenheap pages together, so we may end up with an
increase in page-table usage.

In the longer term, we should consider removing the direct map
completely and switching to vmap(). The main problem with this approach
is that mfn_to_virt() is used frequently in the code, so we would need
to cache the mapping (maybe in struct page_info), as sketched below.
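
To make that last idea concrete, a very rough sketch of what such caching
could look like (entirely hypothetical field and helper names; no locking or
lifetime handling shown):

/* Hypothetical: cache a vmap()'d address per page so that hot users of
 * mfn_to_virt() don't pay the mapping cost on every access. */
struct page_ext {
    void *virt_cached;                       /* NULL until first mapped */
};

static void *page_to_virt_cached(struct page_ext *ext, mfn_t mfn)
{
    if ( !ext->virt_cached )
        /* vmap_contig() is introduced earlier in this series. */
        ext->virt_cached = vmap_contig(mfn, 1);

    return ext->virt_cached;
}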

=== Why arm32 is not covered? ===

On Arm32, the domheap and xenheap are always separate. So by design
the guest memory is not mapped by default.

At this stage, it seems unnecessary to have to map/unmap xenheap pages
every time they are allocated.

=== Why not using a separate domheap and xenheap? ===

While a separate xenheap/domheap reduces page-table usage (all
xenheap pages are contiguous and could always be mapped), it is also
currently less scalable because the split is fixed at boot time (XXX:
Can this be dynamic?).

=== Future of secret-free hypervisor ===

There is some information in an e-mail from Andrew a few years ago:

https://lore.kernel.org/xen-devel/e3219697-0759-39fc-2486-715cdec1c...@citrix.com/

Cheers,

[1] https://lore.kernel.org/xen-devel/cover.1588278317.git.hongy...@amazon.com/


Elias El Yandouzi (3):
  xen/x86: Add build assertion for fixmap entries
  Rename mfn_to_virt() calls
  Rename maddr_to_virt() calls

Hongyan Xia (13):
  acpi: vmap pages in acpi_os_alloc_memory
  xen/numa: vmap the pages for memnodemap
  x86/srat: vmap the pages for acpi_slit
  x86: Map/unmap pages in restore_all_guests
  x86/pv: Rewrite how building PV dom0 handles domheap mappings
  x86/pv: Map L4 page table for shim domain
  x86/mapcache: Initialise the mapcache for the idle domain
  x86: Add a boot option to enable and disable the direct map
  x86/domain_page: Remove the fast paths when mfn is not in the
directmap
  xen/page_alloc: Add a path for xenheap when there is no direct map
  x86/setup: Leave early boot slightly earlier
  x86/setup: vmap heap nodes when they are outside the direct map
  x86/setup: Do not create valid mappings when directmap=no

Julien Grall (8):
  xen/vmap: Check the page has been mapped in vm_init_type()
  xen/vmap: Introduce vmap_size() and use it
  xen/arm: fixmap: Rename the fixmap slots to follow the x86 convention
  xen/x86: Add support for the PMAP
  xen/arm32: mm: Rename 'first' to 'root' in init_secondary_pagetables()
  xen/arm64: mm: Use per-pCPU page-tables
  xen/arm64: Implement a mapcache for arm64
  xen/arm64: Allow the admin to enable/disable the directmap

Wei Liu (3):
  x86/setup: Move vm_init() before acpi calls
  x86/pv: Domheap pages 

Re: Governance change proposal on small updates

2024-01-16 Thread Kelly Choi
Hi all,

I have not had any further feedback opposing this proposal and will go
ahead with the changes.

Many thanks,
Kelly Choi

Community Manager
Xen Project


On Thu, Dec 21, 2023 at 1:00 PM Kelly Choi  wrote:

> Hi all,
>
> I am proposing a small change in how we update non-trivial matters in our
> governance.
>
> Currently, any governance change requires a formal vote. However, there
> will be times when this is unnecessary and would hinder progress in the
> project. For example, my previous email proposal on changes to clarity and
> inclusivity language. As it stands, I have not received any pushback or
> feedback from the community.
>
> To help us progress faster, I would suggest the following:
> - Small changes will still need to be proposed to xen-devel and the
> community
> - The community is welcome to give as much feedback as necessary before
> any changes are made
> - Proposals can be changed/updated as needed, then resubmitted to the
> community
> - Anyone can object to these changes or call a vote within 30 days of the
> proposal if deemed necessary
> - A committer must ack the change for it to go ahead
> - If the community manager does not hear any feedback within 30 days, the
> changes can be acked by a committer and put into the governance
> - All important matters and policy changes to the community will still go
> through a formal voting process. This change only applies to small matters
> within the governance.
>
> Examples:
> - Wording or spelling changes
> - Updating small sentences or clarity changes
> - Adding examples to existing code of conduct policies
>
> I welcome your thoughts on the above proposal.
> Please reply by 14th January 2024 should you have any objections to this.
> If by lazy consensus I do not hear back from this date, I will assume I
> have your agreement on this.
>
> Many thanks,
> Kelly Choi
>
> Community Manager
> Xen Project
>
>
> On Fri, Nov 24, 2023 at 10:57 AM Kelly Choi  wrote:
>
>> Hi all,
>>
>> Please see an updated Governance PR on GitLab here:
>> https://gitlab.com/xen-project/governance/governance/-/merge_requests/1
>>
>> Comments:
>>
>> Revise code of conduct for enhanced clarity, inclusivity, and
>> accountability
>>
>> In response to valuable feedback from community members and in alignment
>> with our ongoing commitment to creating a safe and welcoming space for
>> collaboration, this commit refines the code of conduct. The changes focus
>> on:
>>
>>- *Clarity:* Rewording sections to eliminate ambiguity and ensure
>>that expectations are clearly communicated.
>>- *Inclusivity:* Adding language to emphasize our dedication to
>>diversity and inclusion, and providing examples to illustrate the types of
>>behavior we encourage.
>>
>> These updates aim to foster a more positive and collaborative atmosphere
>> within our community. Please review the changes and don't hesitate to
>> provide further input or suggestions.
>>
>> Note that the patches should be read as a whole; I'm still learning git
>> and using the gitlab UI, which doesn't have a way to do history editing.
>> Many thanks,
>> Kelly Choi
>>
>> Open Source Community Manager
>> XenServer, Cloud Software Group
>>
>


Re: [PATCH] x86/PV: use altcall for I/O emulation quirk hook

2024-01-16 Thread Andrew Cooper
On 16/01/2024 4:58 pm, Jan Beulich wrote:
> This way we can arrange for ioemul_handle_proliant_quirk()'s ENDBR to
> also be zapped. Utilize existing data rather than introducing another
> otherwise unused static variable (array); eventually (if any new quirk
> was in need of adding) we may want to use .callback and .driver_data
> anyway.
>
> For the decision to be taken before the 2nd alternative patching pass,
> the initcall needs to become a pre-SMP one.
>
> While touching this code, also arrange for it to not be built at all
> when !PV - that way the respective ENDBR won't be there from the
> beginning.
>
> Signed-off-by: Jan Beulich 
> ---
> Obviously the file may want moving to pv/ then. I wasn't sure whether
> to also fold doing so right into here.

For PVH dom0, we allow almost blanket IO port access.  We could do the
same for PV dom0 by setting up a suitable TSS IO port bitmap.

That said, x86-S is soon to revoke the ability to do that, so maybe we
just save ourselves the work...


I'm confused about "rather than introducing another otherwise unused
static variable (array)".  Why an array?

In this instance, you could use the same trick as the ctxt switch mask.
Whether we match DMI or not, it's safe to clobber the ENDBR.  We could
also consider __{read_mostly,ro_after_init}_cf_clobber sections.


However, it's probably better still to have a `bool proliant_quirk` and
a direct call.  No extra vendor hooks have been added since this was
introduced in 2007, and I really don't foresee this changing in the near
future.  Let's just simplify it and drop all the alternatives/clobbering
games entirely.
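
For illustration, the direct-call shape described above might look roughly
like this (a sketch only: identifier names are assumed and error handling is
omitted; the quirk hook's signature is taken from the patch context):

/* Sketch: a plain boolean set from the DMI quirk table, no altcall site. */
static bool __ro_after_init proliant_quirk;

static int __init cf_check proliant_quirk_init(void)
{
    if ( dmi_check_system(ioport_quirks_tbl) )
        proliant_quirk = true;

    return 0;
}

/* ... and in io_emul_stub_setup(), call the quirk handler directly: */
    if ( unlikely(proliant_quirk) )
        p += ioemul_handle_proliant_quirk(opcode, p, ctxt->ctxt.regs);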

~Andrew



[libvirt test] 184367: tolerable all pass - PUSHED

2024-01-16 Thread osstest service owner
flight 184367 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184367/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184337
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184337
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184337
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  3a4548d74b2cb351c74cab472ac654f54708
baseline version:
 libvirt  10042f0253b98c8ce4626b271a00339e9128eda1

Last test of basis   184337  2024-01-13 04:21:08 Z3 days
Testing same since   184367  2024-01-16 04:20:44 Z0 days1 attempts


People who touched revisions under test:
  Andrea Bolognani 
  Göran Uddeborg 
  Jiri Denemark 
  Jonathon Jongsma 
  Laine Stump 
  Michal Privoznik 
  Peter Krempa 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at

Re: [PATCH v3 21/34] xen/riscv: introduce p2m.h

2024-01-16 Thread Julien Grall

Hi Oleksii,

On 16/01/2024 09:44, Oleksii wrote:

On Mon, 2024-01-15 at 12:01 +0100, Jan Beulich wrote:

On 15.01.2024 11:35, Oleksii wrote:

Hi Julien,

On Fri, 2024-01-12 at 10:39 +, Julien Grall wrote:

Hi Oleksii,

On 22/12/2023 15:13, Oleksii Kurochko wrote:

Signed-off-by: Oleksii Kurochko 
---
Changes in V3:
   - add SPDX
   - drop unneeded for now p2m types.
   - return false in all functions implemented with BUG()
inside.
   - update the commit message
---
Changes in V2:
   - Nothing changed. Only rebase.
---
   xen/arch/ppc/include/asm/p2m.h   |   3 +-
   xen/arch/riscv/include/asm/p2m.h | 102
+++
   2 files changed, 103 insertions(+), 2 deletions(-)
   create mode 100644 xen/arch/riscv/include/asm/p2m.h

diff --git a/xen/arch/ppc/include/asm/p2m.h
b/xen/arch/ppc/include/asm/p2m.h
index 25ba054668..3bc05b7c05 100644
--- a/xen/arch/ppc/include/asm/p2m.h
+++ b/xen/arch/ppc/include/asm/p2m.h
@@ -50,8 +50,7 @@ static inline void memory_type_changed(struct
domain *d)
   static inline int
guest_physmap_mark_populate_on_demand(struct
domain *d, unsigned long gfn,
  
unsigned

int order)
   {
-    BUG_ON("unimplemented");
-    return 1;
+    return -EOPNOTSUPP;
   }
   
   static inline int guest_physmap_add_entry(struct domain *d,

diff --git a/xen/arch/riscv/include/asm/p2m.h
b/xen/arch/riscv/include/asm/p2m.h
new file mode 100644
index 00..d270ef6635
--- /dev/null
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -0,0 +1,102 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ASM_RISCV_P2M_H__
+#define __ASM_RISCV_P2M_H__
+
+#include 
+
+#define paddr_bits PADDR_BITS
+
+/*
+ * List of possible type for each page in the p2m entry.
+ * The number of available bit per page in the pte for this
purpose is 4 bits.
+ * So it's possible to only have 16 fields. If we run out of
value
in the
+ * future, it's possible to use higher value for pseudo-type
and
don't store
+ * them in the p2m entry.
+ */


This looks like a verbatim copy from Arm. Did you actually check
RISC-V
has 4 bits available in the PTE to store this value?

Thanks for noticing that; in RISC-V only 2 bits (bits 8 and 9) are
available, so I'll update the comment:

 53                    10  9    8  7  6  5  4  3  2  1  0
 [ Physical Page Number ] [ RSV ] D  A  G  U  X  W  R  V


It's RSW (Reserved for Supervisor softWare use), not RSV, which is
pretty
important in this context.

Yes, you are right it is RSW. Thanks for the correction.




It seems that I missed something in the Arm code/architecture. As
far as I recall, on Arm, bits 5-8 are ignored by the MMU and are
expected to be used by the hypervisor for its own purposes.
However, in the code, I notice that these bits are utilized for
storing a reference counter.


Why "however"? Hardware still is going to ignore these bits.

Sure, these bits are ignored by hardware. What I meant is that,
according to the code, these bits are used for storing a reference
counter, not p2m_type_t. I guess I am missing something...


I can only guess where you saw the field used for reference counting.
This was the domain map page infrastructure, right?

If so, this is for the stage-1 page-tables (aka hypervisor page-tables)
and not the stage-2 ones (e.g. the P2M). For the latter, we would use
the p2m_type_t.
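
To tie this back to the RISC-V question, a minimal sketch of how a 2-bit
type could be packed into the RSW bits (8 and 9) of a PTE is shown below.
This is purely illustrative; the macro names and the final encoding are not
settled by this discussion.

/* Illustrative only: keep a (small) p2m type in the two RSW bits. */
#define PTE_RSW_SHIFT   8
#define PTE_RSW_MASK    (0x3UL << PTE_RSW_SHIFT)

static inline unsigned long pte_set_p2m_type(unsigned long pte,
                                             unsigned long type)
{
    /* Only 4 values fit; more types would need storage outside the PTE. */
    return (pte & ~PTE_RSW_MASK) | ((type & 0x3UL) << PTE_RSW_SHIFT);
}

static inline unsigned long pte_get_p2m_type(unsigned long pte)
{
    return (pte & PTE_RSW_MASK) >> PTE_RSW_SHIFT;
}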


Cheers,

--
Julien Grall



Re: [PATCH] x86/HPET: avoid an indirect call

2024-01-16 Thread Andrew Cooper
On 16/01/2024 4:56 pm, Jan Beulich wrote:
> When this code was written, indirect branches still weren't considered
> much of a problem (besides being a little slower). Instead of a function
> pointer, pass a boolean to _disable_pit_irq(), thus allowing to
> eliminate two ENDBR (one of them in .text).
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 



[PATCH] x86/PV: use altcall for I/O emulation quirk hook

2024-01-16 Thread Jan Beulich
This way we can arrange for ioemul_handle_proliant_quirk()'s ENDBR to
also be zapped. Utilize existing data rather than introducing another
otherwise unused static variable (array); eventually (if any new quirk
was in need of adding) we may want to use .callback and .driver_data
anyway.

For the decision to be taken before the 2nd alternative patching pass,
the initcall needs to become a pre-SMP one.

While touching this code, also arrange for it to not be built at all
when !PV - that way the respective ENDBR won't be there from the
beginning.

Signed-off-by: Jan Beulich 
---
Obviously the file may want moving to pv/ then. I wasn't sure whether
to also fold doing so right into here.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -45,7 +45,7 @@ obj-$(CONFIG_LIVEPATCH) += alternative.o
 obj-y += msi.o
 obj-y += msr.o
 obj-$(CONFIG_INDIRECT_THUNK) += indirect-thunk.o
-obj-y += ioport_emulate.o
+obj-$(CONFIG_PV) += ioport_emulate.o
 obj-y += irq.o
 obj-$(CONFIG_KEXEC) += machine_kexec.o
 obj-y += mm.o x86_64/mm.o
--- a/xen/arch/x86/ioport_emulate.c
+++ b/xen/arch/x86/ioport_emulate.c
@@ -36,7 +36,7 @@ static unsigned int cf_check ioemul_hand
 }
 
 /* This table is the set of system-specific I/O emulation hooks. */
-static const struct dmi_system_id __initconstrel ioport_quirks_tbl[] = {
+static const struct dmi_system_id __initconst_cf_clobber ioport_quirks_tbl[] = 
{
 /*
  * I/O emulation hook for certain HP ProLiant servers with
  * 'special' SMM goodness.
@@ -46,6 +46,8 @@ static const struct dmi_system_id __init
 DMI_MATCH2(
 DMI_MATCH(DMI_BIOS_VENDOR, "HP"),
 DMI_MATCH(DMI_PRODUCT_NAME, "ProLiant DL3")),
+/* Need in one entry only as long as .callback isn't also used. */
+.driver_data = ioemul_handle_proliant_quirk,
 },
 {
 .ident = "HP ProLiant DL5xx",
@@ -99,7 +101,7 @@ static int __init cf_check ioport_quirks
 
 return 0;
 }
-__initcall(ioport_quirks_init);
+presmp_initcall(ioport_quirks_init);
 
 /*
  * Local variables:
--- a/xen/arch/x86/pv/emul-priv-op.c
+++ b/xen/arch/x86/pv/emul-priv-op.c
@@ -112,7 +112,8 @@ static io_emul_stub_t *io_emul_stub_setu
 /* Some platforms might need to quirk the stub for specific inputs. */
 if ( unlikely(ioemul_handle_quirk) )
 {
-quirk_bytes = ioemul_handle_quirk(opcode, p, ctxt->ctxt.regs);
+quirk_bytes = alternative_call(ioemul_handle_quirk, opcode, p,
+   ctxt->ctxt.regs);
 p += quirk_bytes;
 }
 



[PATCH] x86/HPET: avoid an indirect call

2024-01-16 Thread Jan Beulich
When this code was written, indirect branches still weren't considered
much of a problem (besides being a little slower). Instead of a function
pointer, pass a boolean to _disable_pit_irq(), thus allowing to
eliminate two ENDBR (one of them in .text).

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -563,7 +563,7 @@ static void cf_check handle_rtc_once(uin
 }
 }
 
-void __init cf_check hpet_broadcast_init(void)
+void __init hpet_broadcast_init(void)
 {
 u64 hpet_rate = hpet_setup();
 u32 hpet_id, cfg;
@@ -634,7 +634,7 @@ void __init cf_check hpet_broadcast_init
 hpet_events->flags = HPET_EVT_LEGACY;
 }
 
-void cf_check hpet_broadcast_resume(void)
+void hpet_broadcast_resume(void)
 {
 u32 cfg;
 unsigned int i, n;
--- a/xen/arch/x86/include/asm/hpet.h
+++ b/xen/arch/x86/include/asm/hpet.h
@@ -89,8 +89,8 @@ void hpet_disable_legacy_replacement_mod
  * Temporarily use an HPET event counter for timer interrupt handling,
  * rather than using the LAPIC timer. Used for Cx state entry.
  */
-void cf_check hpet_broadcast_init(void);
-void cf_check hpet_broadcast_resume(void);
+void hpet_broadcast_init(void);
+void hpet_broadcast_resume(void);
 void cf_check hpet_broadcast_enter(void);
 void cf_check hpet_broadcast_exit(void);
 int hpet_broadcast_is_available(void);
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -2427,7 +2427,7 @@ void __init early_time_init(void)
 }
 
 /* keep pit enabled for pit_broadcast working while cpuidle enabled */
-static int _disable_pit_irq(void(*hpet_broadcast_setup)(void))
+static int _disable_pit_irq(bool init)
 {
 int ret = 1;
 
@@ -2442,13 +2442,13 @@ static int _disable_pit_irq(void(*hpet_b
  */
 if ( cpuidle_using_deep_cstate() && !boot_cpu_has(X86_FEATURE_ARAT) )
 {
-hpet_broadcast_setup();
+init ? hpet_broadcast_init() : hpet_broadcast_resume();
 if ( !hpet_broadcast_is_available() )
 {
 if ( xen_cpuidle > 0 )
 {
-printk("%ps() failed, turning to PIT broadcast\n",
-   hpet_broadcast_setup);
+printk("hpet_broadcast_%s() failed, turning to PIT 
broadcast\n",
+   init ? "init" : "resume");
 return -1;
 }
 ret = 0;
@@ -2465,7 +2465,7 @@ static int _disable_pit_irq(void(*hpet_b
 
 static int __init cf_check disable_pit_irq(void)
 {
-if ( !_disable_pit_irq(hpet_broadcast_init) )
+if ( !_disable_pit_irq(true) )
 {
 xen_cpuidle = 0;
 printk("CPUIDLE: disabled due to no HPET. "
@@ -2526,7 +2526,7 @@ int time_resume(void)
 
 resume_platform_timer();
 
-if ( !_disable_pit_irq(hpet_broadcast_resume) )
+if ( !_disable_pit_irq(false) )
 BUG();
 
 init_percpu_time();



[PATCH] x86: arrange for ENDBR zapping from _ctxt_switch_masking()

2024-01-16 Thread Jan Beulich
While altcall is already used for them, the functions want announcing in
.init.rodata.cf_clobber, even if the resulting static variables aren't
otherwise used.

While doing this also move ctxt_switch_masking to .data.ro_after_init.

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -258,6 +258,11 @@ static void cf_check amd_ctxt_switch_mas
 #undef LAZY
 }
 
+#ifdef CONFIG_XEN_IBT /* Announce the function to ENDBR clobbering logic. */
+static const typeof(ctxt_switch_masking) __initconst_cf_clobber __used csm =
+amd_ctxt_switch_masking;
+#endif
+
 /*
  * Mask the features and extended features returned by CPUID.  Parameters are
  * set from the boot line via two methods:
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -119,7 +119,7 @@ static const struct cpu_dev __initconst_
 static const struct cpu_dev *this_cpu = &default_cpu;
 
 static DEFINE_PER_CPU(uint64_t, msr_misc_features);
-void (* __read_mostly ctxt_switch_masking)(const struct vcpu *next);
+void (* __ro_after_init ctxt_switch_masking)(const struct vcpu *next);
 
 bool __init probe_cpuid_faulting(void)
 {
--- a/xen/arch/x86/cpu/intel.c
+++ b/xen/arch/x86/cpu/intel.c
@@ -220,6 +220,11 @@ static void cf_check intel_ctxt_switch_m
 #undef LAZY
 }
 
+#ifdef CONFIG_XEN_IBT /* Announce the function to ENDBR clobbering logic. */
+static const typeof(ctxt_switch_masking) __initconst_cf_clobber __used csm =
+intel_ctxt_switch_masking;
+#endif
+
 /*
  * opt_cpuid_mask_ecx/edx: cpuid.1[ecx, edx] feature mask.
  * For example, E8400[Intel Core 2 Duo Processor series] ecx = 0x0008E3FD,



Re: Xen 4.19 release management plan

2024-01-16 Thread Jan Beulich
(reducing Cc list)

On 16.01.2024 17:32, Oleksii wrote:
> Please reply with items you would like to see in 4.19 so that people
> know what is happening and prioritize accordingly.
> You're welcome to provide a description and use cases of the feature
> you're working on.

The "annotate entry points with type and size" series including as many
as possible follow-ups on the x86 and Arm side, ideally bringing both
architectures fully in shape for the new model.

On x86,
- among smaller scope ISA extension work we probably want to make
  sure AVX10.1 is going to be usable by guests (patches already posted),
- "x86: memcpy() / memset() (non-)ERMS flavors plus fallout"

There's likely more, but let's go with this for now.

Jan



Xen 4.19 release management plan

2024-01-16 Thread Oleksii
Hello everyone,

I would like to start tracking which features and changes are expected
in Xen 4.19 for each architecture.

Please reply with items you would like to see in 4.19 so that people
know what is happening and prioritize accordingly.
You're welcome to provide a description and use cases of the feature
you're working on.

On my side:

 RISC-V:
  - full Xen build
  - Dom0 boot (let me be optimistic; I probably have to divide this
into small parts and track them separately).

 x86:
  - Finish and start sending AMD SEV patches ( note: this is being
developed by my colleagues at Vates, not me ).
We are currently on phase one:
https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg156388.html
.

Thanks in advance for your cooperation.

I hope you have a nice week.

Best regards,
  Oleksii




Re: Xen Project Annual Survey

2024-01-16 Thread Kelly Choi
Hi all,

A reminder to fill out the Xen Project Annual survey!

Many thanks,
Kelly Choi

Community Manager
Xen Project


On Tue, Jan 9, 2024 at 1:21 PM Kelly Choi  wrote:

> Happy New Year Xen Community,
>
> As we start the New Year, I'd like to ask you to reflect on how the
> project went in 2023. This will help us track the health of the community
> and also give you a chance to express your ideas and feedback.
>
> The survey can be answered anonymously and should take less than 10
> minutes.
>
> Link:
> https://cryptpad.fr/form/#/2/form/view/uG22fJfe8UILyP9+jJ-YesXsINKMZRpuWh2c58bhBYI/
> Deadline: 31st January 2024.
>
> Many thanks,
> Kelly Choi
>
> Community Manager
> Xen Project
>


Re: [BUG]i2c_hid_acpi broken with 4.17.2 on Framework Laptop 13 AMD

2024-01-16 Thread Jan Beulich
On 16.01.2024 16:52, Sébastien Chaumat wrote:
> On Tue, 2 Jan 2024 at 21:23, Sébastien Chaumat  wrote:
> 
>>
>>  output of gpioinfo
>>>
>>> kernel alone :
>>>
>>> line   5: unnamed input active-low consumer=interrupt
>>> line  84: unnamed input active-low consumer=interrupt
>>>
>>> xen:
>>>
>>> line   5: unnamed input active-low
>>> line  84: unnamed input active-low
>>>
>>> xen with skipping IRQ7 double init :
>>>
>>> line   5: unnamed input active-low consumer=interrupt
>>> line  84: unnamed input active-low
>>>
>>>
>>> So definitely progressing.
>>>
>>
>> Checking /sys/kernel/irq/7
>>
>> kernel alone :
>>  actions: pinctrl_amd
>>  chip_name: IR-IO-APIC
>>  hwirq: 7
>>  name: fasteoi
>>  per_cpu_count: 0,0,0,0,0,20,0,0,0,0,0,0,0,0,0,0
>>  type: level
>>  wakeup: enabled
>>
>> xen skipping IRQ7 double init :
>>
>> actions: pinctrl_amd
>>  chip_name: xen-pirq
>>  hwirq:
>>  name: ioapic-level
>>  per_cpu_count: 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
>>  type: edge
>>  wakeup: disabled
>>
>> So the skip of IRQ7 in pci_xen_initial_domain() sets the correct handler
>>  (IIUC xen uses the ioapic-level and handles the eoi separately), but not
>> the correct type (still edge).
>> I guess this may explains the results above.
>>
>>
>  Mario (in CC) patched pinctrl_amd to flush pending interrupts before
> starting the GPIO driver.
> 
> This helped in the sense that there's no more pending interrupt on IRQ7
> (whatever the handler is, level or edge), but then the touchpad is not
> detected by i2c-hid.
> 
> Is there any work in progress related to the incorrect IRQ configuration?

I'm not aware of any. As per my recollection it's still not entirely
clear where in the kernel things go astray. And to be honest I don't
feel comfortable trying to half-blindly address this, e.g. by trying
to circumvent / defer the early setting up of the low 16 IRQs.

Jan



Re: [PATCH v3 14/34] xen/riscv: introduce io.h

2024-01-16 Thread Jan Beulich
On 16.01.2024 16:20, Oleksii wrote:
> On Mon, 2024-01-15 at 17:57 +0100, Jan Beulich wrote:
>> On 22.12.2023 16:12, Oleksii Kurochko wrote:
>>> +/*
>>> + * Unordered I/O memory access primitives.  These are even more
>>> relaxed than
>>> + * the relaxed versions, as they don't even order accesses between
>>> successive
>>> + * operations to the I/O regions.
>>> + */
>>> +#define readb_cpu(c)   ({ u8  __r = __raw_readb(c); __r;
>>> })
>>> +#define readw_cpu(c)   ({ u16 __r = le16_to_cpu((__force
>>> __le16)__raw_readw(c)); __r; })
>>> +#define readl_cpu(c)   ({ u32 __r = le32_to_cpu((__force
>>> __le32)__raw_readl(c)); __r; })
>>> +
>>> +#define
>>> writeb_cpu(v,c) ((void)__raw_writeb((v),(c)))
>>> +#define
>>> writew_cpu(v,c) ((void)__raw_writew((__force 
>>> u16)cpu_to_le16(v),(c)))
>>> +#define
>>> writel_cpu(v,c) ((void)__raw_writel((__force 
>>> u32)cpu_to_le32(v),(c)))
>>> +
>>> +#ifdef CONFIG_64BIT
>>> +#define readq_cpu(c)   ({ u64 __r = le64_to_cpu((__force
>>> __le64)__raw_readq(c)); __r; })
>>> +#define
>>> writeq_cpu(v,c) ((void)__raw_writeq((__force 
>>> u64)cpu_to_le64(v),(c)))
>>> +#endif
>>
>> How come there are endianness assumptions here on the MMIO accessed?
> It is a long story.
> 
> As you might expect, it was copied from the Linux kernel, where it was
> decided to follow only the LE way:
> https://patchwork.kernel.org/project/linux-riscv/patch/2019045623.5749-3-...@lst.de/
> One of the answers from the author of the commit:
> And we don't know if Linux will be around if that ever changes.
> The point is:
>  a) the current RISC-V spec is LE only
>  b) the current linux port is LE only except for this little bit
> There is no point in leaving just this bitrotting code around.  It
> just confuses developers, (very very slightly) slows down compiles
> and will bitrot.  It also won't be any significant help to a future
> developer down the road doing a hypothetical BE RISC-V Linux port.

Reads to me like a justification to _omit_ the cpu_to_le().
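
Presumably something along these lines, i.e. the unordered accessors simply
aliasing the raw MMIO helpers (a sketch of what dropping the conversions
would look like; whether to actually do so is the open question here):

/* Sketch: with RISC-V currently being LE-only, no byte-swapping needed. */
#define readb_cpu(c)        __raw_readb(c)
#define readw_cpu(c)        __raw_readw(c)
#define readl_cpu(c)        __raw_readl(c)

#define writeb_cpu(v, c)    __raw_writeb(v, c)
#define writew_cpu(v, c)    __raw_writew(v, c)
#define writel_cpu(v, c)    __raw_writel(v, c)

#ifdef CONFIG_64BIT
#define readq_cpu(c)        __raw_readq(c)
#define writeq_cpu(v, c)    __raw_writeq(v, c)
#endif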

Jan



Re: [BUG]i2c_hid_acpi broken with 4.17.2 on Framework Laptop 13 AMD

2024-01-16 Thread Sébastien Chaumat
On Tue, Jan 2, 2024 at 9:23 PM Sébastien Chaumat  wrote:

>
>  output of gpioinfo
>>
>> kernel alone :
>>
>> line   5: unnamed input active-low consumer=interrupt
>> line  84: unnamed input active-low consumer=interrupt
>>
>> xen:
>>
>> line   5: unnamed input active-low
>> line  84: unnamed input active-low
>>
>> xen with skipping IRQ7 double init :
>>
>> line   5: unnamed input active-low consumer=interrupt
>> line  84: unnamed input active-low
>>
>>
>> So definitely progressing.
>>
>
> Checking /sys/kernel/irq/7
>
> kernel alone :
>  actions: pinctrl_amd
>  chip_name: IR-IO-APIC
>  hwirq: 7
>  name: fasteoi
>  per_cpu_count: 0,0,0,0,0,20,0,0,0,0,0,0,0,0,0,0
>  type: level
>  wakeup: enabled
>
> xen skipping IRQ7 double init :
>
> actions: pinctrl_amd
>  chip_name: xen-pirq
>  hwirq:
>  name: ioapic-level
>  per_cpu_count: 0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
>  type: edge
>  wakeup: disabled
>
> So the skip of IRQ7 in pci_xen_initial_domain() sets the correct handler
>  (IIUC xen uses the ioapic-level and handles the eoi separately), but not
> the correct type (still edge).
> I guess this may explain the results above.
>
>
 Mario (in CC) patched pinctrl_amd to flush pending interrupts before
starting the GPIO driver.

This helped in the sense that there's no longer a pending interrupt on IRQ7
(whatever the handler is, level or edge), but then the touchpad is not
detected by i2c-hid.

Is there any work in progress related to the incorrect IRQ configuration ?

Thanks,
Sébastien


Re: [PATCH v3 14/34] xen/riscv: introduce io.h

2024-01-16 Thread Oleksii
On Mon, 2024-01-15 at 17:57 +0100, Jan Beulich wrote:
> On 22.12.2023 16:12, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/io.h
> > @@ -0,0 +1,142 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * {read,write}{b,w,l,q} based on arch/arm64/include/asm/io.h
> > + *   which was based on arch/arm/include/io.h
> > + *
> > + * Copyright (C) 1996-2000 Russell King
> > + * Copyright (C) 2012 ARM Ltd.
> > + * Copyright (C) 2014 Regents of the University of California
> > + */
> > +
> > +
> > +#ifndef _ASM_RISCV_IO_H
> > +#define _ASM_RISCV_IO_H
> > +
> > +#include 
> > +
> > +/*
> > + * The RISC-V ISA doesn't yet specify how to query or modify PMAs,
> > so we can't
> > + * change the properties of memory regions.  This should be fixed
> > by the
> > + * upcoming platform spec.
> > + */
> > +#define ioremap_nocache(addr, size) ioremap((addr), (size))
> > +#define ioremap_wc(addr, size) ioremap((addr), (size))
> > +#define ioremap_wt(addr, size) ioremap((addr), (size))
> 
> Nit: No need for the inner parentheses.
Thanks. I'll update that place.

> 
> > +/* Generic IO read/write.  These perform native-endian accesses.
> > */
> > +#define __raw_writeb __raw_writeb
> > +static inline void __raw_writeb(u8 val, volatile void __iomem
> > *addr)
> > +{
> > +   asm volatile("sb %0, 0(%1)" : : "r" (val), "r" (addr));
> > +}
> > +
> > +#define __raw_writew __raw_writew
> > +static inline void __raw_writew(u16 val, volatile void __iomem
> > *addr)
> > +{
> > +   asm volatile("sh %0, 0(%1)" : : "r" (val), "r" (addr));
> > +}
> > +
> > +#define __raw_writel __raw_writel
> > +static inline void __raw_writel(u32 val, volatile void __iomem
> > *addr)
> > +{
> > +   asm volatile("sw %0, 0(%1)" : : "r" (val), "r" (addr));
> > +}
> > +
> > +#ifdef CONFIG_64BIT
> > +#define __raw_writeq __raw_writeq
> > +static inline void __raw_writeq(u64 val, volatile void __iomem
> > *addr)
> > +{
> > +   asm volatile("sd %0, 0(%1)" : : "r" (val), "r" (addr));
> > +}
> > +#endif
> > +
> > +#define __raw_readb __raw_readb
> > +static inline u8 __raw_readb(const volatile void __iomem *addr)
> > +{
> > +   u8 val;
> > +
> > +   asm volatile("lb %0, 0(%1)" : "=r" (val) : "r" (addr));
> > +   return val;
> > +}
> > +
> > +#define __raw_readw __raw_readw
> > +static inline u16 __raw_readw(const volatile void __iomem *addr)
> > +{
> > +   u16 val;
> > +
> > +   asm volatile("lh %0, 0(%1)" : "=r" (val) : "r" (addr));
> > +   return val;
> > +}
> > +
> > +#define __raw_readl __raw_readl
> > +static inline u32 __raw_readl(const volatile void __iomem *addr)
> > +{
> > +   u32 val;
> > +
> > +   asm volatile("lw %0, 0(%1)" : "=r" (val) : "r" (addr));
> > +   return val;
> > +}
> > +
> > +#ifdef CONFIG_64BIT
> > +#define __raw_readq __raw_readq
> > +static inline u64 __raw_readq(const volatile void __iomem *addr)
> > +{
> > +   u64 val;
> > +
> > +   asm volatile("ld %0, 0(%1)" : "=r" (val) : "r" (addr));
> > +   return val;
> > +}
> > +#endif
> > +
> > +/*
> > + * Unordered I/O memory access primitives.  These are even more
> > relaxed than
> > + * the relaxed versions, as they don't even order accesses between
> > successive
> > + * operations to the I/O regions.
> > + */
> > +#define readb_cpu(c)   ({ u8  __r = __raw_readb(c); __r;
> > })
> > +#define readw_cpu(c)   ({ u16 __r = le16_to_cpu((__force
> > __le16)__raw_readw(c)); __r; })
> > +#define readl_cpu(c)   ({ u32 __r = le32_to_cpu((__force
> > __le32)__raw_readl(c)); __r; })
> > +
> > +#define
> > writeb_cpu(v,c) ((void)__raw_writeb((v),(c)))
> > +#define
> > writew_cpu(v,c) ((void)__raw_writew((__force 
> > u16)cpu_to_le16(v),(c)))
> > +#define
> > writel_cpu(v,c) ((void)__raw_writel((__force 
> > u32)cpu_to_le32(v),(c)))
> > +
> > +#ifdef CONFIG_64BIT
> > +#define readq_cpu(c)   ({ u64 __r = le64_to_cpu((__force
> > __le64)__raw_readq(c)); __r; })
> > +#define
> > writeq_cpu(v,c) ((void)__raw_writeq((__force 
> > u64)cpu_to_le64(v),(c)))
> > +#endif
> 
> How come there are endianness assumptions here on the MMIO accessed?
It is a hard story.

As you might expect it was copied from the Linux kernel, where it was
decided to follow only the LE way:
https://patchwork.kernel.org/project/linux-riscv/patch/2019045623.5749-3-...@lst.de/
One of the answers of the author of the commit:
And we don't know if Linux will be around if that ever changes.
The point is:
 a) the current RISC-V spec is LE only
 b) the current linux port is LE only except for this little bit
There is no point in leaving just this bitrotting code around.  It
just confuses developers, (very very slightly) slows down compiles
and will bitrot.  It also won't be any significant help to a future
developer down the road doing a hypothetical BE RISC-V Linux port.

From the specs [1, p.5] it is mentioned that:
   The base ISA has been defined to have a little-endian memory 

[xen-unstable test] 184365: tolerable FAIL - PUSHED

2024-01-16 Thread osstest service owner
flight 184365 xen-unstable real [real]
flight 184373 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/184365/
http://logs.test-lab.xenproject.org/osstest/logs/184373/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64 12 windows-install fail pass in 
184373-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail in 184373 like 184358
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 184358
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 184358
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 184358
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 184358
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 184358
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 184358
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 184358
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 184358
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 184358
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 184358
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 184358
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass

version targeted for testing:
 xen  f3f6c500e2dbd23af77c207e2cf4b496fffa1b0d
baseline version:
 xen  c2ce3466472e9c9eda79f5dc98eb701bc6fdba20

Last test of basis   184358  2024-01-15 14:41:38 

[PATCH v1 repost 1/4] arm/mmu: Move init_ttbr to a new section .data.idmap

2024-01-16 Thread Julien Grall
From: Julien Grall 

With the upcoming work to color Xen, the binary will not be anymore
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen reside.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, init_ttbr is used by secondary CPUs to find their page-tables
before the MMU is on. Yet it is currently in .data which is unlikely
to be within the same page as the rest of the idmap.

Create a new section .data.idmap that will be used for variables
accessed by the early boot code. The first one is init_ttbr.

The idmap is currently part of the text section and therefore will
be mapped read-only executable. This means that we need to temporarily
remap init_ttbr in order to update it.

Introduce a new function set_init_ttbr() for this purpose so the code
is not duplicated between arm64 and arm32.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/mmu/smpboot.c | 34 +-
 xen/arch/arm/xen.lds.S |  1 +
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index b6fc0aae07f1..f1cf9252710c 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -9,6 +9,10 @@
 
 #include 
 
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef virt_to_mfn
+#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+
 /*
  * Static start-of-day pagetables that we use before the allocators
  * are up. These are used by all CPUs during bringup before switching
@@ -44,7 +48,7 @@ DEFINE_BOOT_PAGE_TABLE(boot_second);
 DEFINE_BOOT_PAGE_TABLES(boot_third, XEN_NR_ENTRIES(2));
 
 /* Non-boot CPUs use this to find the correct pagetables. */
-uint64_t init_ttbr;
+uint64_t __section(".data.idmap") init_ttbr;
 
 /* Clear a translation table and clean & invalidate the cache */
 static void clear_table(void *table)
@@ -68,6 +72,27 @@ static void clear_boot_pagetables(void)
 clear_table(boot_third);
 }
 
+static void set_init_ttbr(lpae_t *root)
+{
+/*
+ * init_ttbr is part of the identity mapping which is read-only. So
+ * We need to re-map the region so it can be updated
+ */
+void *ptr = map_domain_page(virt_to_mfn(&init_ttbr));
+
+ptr += PAGE_OFFSET(&init_ttbr);
+
+*(uint64_t *)ptr = virt_to_maddr(root);
+
+/*
+ * init_ttbr will be accessed with the MMU off, so ensure the update
+ * is visible by cleaning the cache.
+ */
+clean_dcache(ptr);
+
+unmap_domain_page(ptr);
+}
+
 #ifdef CONFIG_ARM_64
 int prepare_secondary_mm(int cpu)
 {
@@ -77,8 +102,8 @@ int prepare_secondary_mm(int cpu)
   * Set init_ttbr for this CPU coming up. All CPUs share a single set of
  * pagetables, but rewrite it each time for consistency with 32 bit.
  */
-init_ttbr = virt_to_maddr(xen_pgtable);
-clean_dcache(init_ttbr);
+set_init_ttbr(xen_pgtable);
+
 return 0;
 }
 #else
@@ -109,8 +134,7 @@ int prepare_secondary_mm(int cpu)
 clear_boot_pagetables();
 
 /* Set init_ttbr for this CPU coming up */
-init_ttbr = __pa(first);
-clean_dcache(init_ttbr);
+set_init_ttbr(first);
 
 return 0;
 }
diff --git a/xen/arch/arm/xen.lds.S b/xen/arch/arm/xen.lds.S
index 20598c6963ce..470c8f22084f 100644
--- a/xen/arch/arm/xen.lds.S
+++ b/xen/arch/arm/xen.lds.S
@@ -36,6 +36,7 @@ SECTIONS
*(.text.header)
*(.text.idmap)
*(.rodata.idmap)
+   *(.data.idmap)
_idmap_end = .;
 
*(.text.cold)
-- 
2.40.1




[PATCH v1 repost 3/4] xen/arm64: head: Use PRINT_ID() for secondary CPU MMU-off boot code

2024-01-16 Thread Julien Grall
From: Julien Grall 

With the upcoming work to color Xen, the binary will not be anymore
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen reside.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, most of the early printk messages are using PRINT() which
will add the message in .rodata. This is unlikely to be within the
same page as the rest of the idmap.

So replace all the PRINT() uses that are reachable by a secondary
CPU with the MMU off with PRINT_ID().

Signed-off-by: Julien Grall 
---
 xen/arch/arm/arm64/head.S   | 14 +++---
 xen/arch/arm/arm64/mmu/head.S   |  2 +-
 xen/arch/arm/include/asm/arm64/macros.h |  9 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index cfc04c755400..fa8b00b6f1db 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -289,9 +289,9 @@ GLOBAL(init_secondary)
 
 #ifdef CONFIG_EARLY_PRINTK
 ldr   x23, =CONFIG_EARLY_UART_BASE_ADDRESS /* x23 := UART base address 
*/
-PRINT("- CPU ")
+PRINT_ID("- CPU ")
 print_reg x24
-PRINT(" booting -\r\n")
+PRINT_ID(" booting -\r\n")
 #endif
 blcheck_cpu_mode
 blcpu_init
@@ -314,10 +314,10 @@ ENDPROC(init_secondary)
  * Clobbers x0 - x5
  */
 check_cpu_mode:
-PRINT("- Current EL ")
+PRINT_ID("- Current EL ")
 mrs   x5, CurrentEL
 print_reg x5
-PRINT(" -\r\n")
+PRINT_ID(" -\r\n")
 
 /* Are we in EL2 */
 cmp   x5, #PSR_MODE_EL2t
@@ -326,8 +326,8 @@ check_cpu_mode:
 ret
 1:
 /* OK, we're boned. */
-PRINT("- Xen must be entered in NS EL2 mode -\r\n")
-PRINT("- Please update the bootloader -\r\n")
+PRINT_ID("- Xen must be entered in NS EL2 mode -\r\n")
+PRINT_ID("- Please update the bootloader -\r\n")
 b fail
 ENDPROC(check_cpu_mode)
 
@@ -361,7 +361,7 @@ ENDPROC(zero_bss)
  * Clobbers x0 - x3
  */
 cpu_init:
-PRINT("- Initialize CPU -\r\n")
+PRINT_ID("- Initialize CPU -\r\n")
 
 /* Set up memory attribute type tables */
 ldr   x0, =MAIRVAL
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index 92b62ae94ce5..fa40b696ddc8 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -276,7 +276,7 @@ ENDPROC(create_page_tables)
 enable_mmu:
 mov   x4, x0
 mov   x5, x1
-PRINT("- Turning on paging -\r\n")
+PRINT_ID("- Turning on paging -\r\n")
 
 /*
  * The state of the TLBs is unknown before turning on the MMU.
diff --git a/xen/arch/arm/include/asm/arm64/macros.h 
b/xen/arch/arm/include/asm/arm64/macros.h
index 10e652041f57..6a0108f778a2 100644
--- a/xen/arch/arm/include/asm/arm64/macros.h
+++ b/xen/arch/arm/include/asm/arm64/macros.h
@@ -39,9 +39,12 @@
  * There are multiple flavors:
  *  - PRINT_SECT(section, string): The @string will be located in @section
  *  - PRINT(): The string will be located in .rodata.str.
- *  - PRINT_ID(): When Xen is running on the Identity Mapping, it is
- *only possible to have a limited amount of Xen. This will create
- *the string in .rodata.idmap which will always be mapped.
+ *  - PRINT_ID(): This will create the string in .rodata.idmap which
+ *will always be accessible. This is used when:
+ *  - Xen is running on the identity mapping because not all of Xen is 
mapped
+ *  - Running with the MMU-off on secondary boots as Xen may not be
+ *physically contiguous in memory (e.g. in the case of cache
+ *coloring).
  *
  * Clobbers x0 - x3
  */
-- 
2.40.1




[PATCH v1 repost 4/4] [DO NOT COMMIT] xen/arm: Create a trampoline for secondary boot CPUs

2024-01-16 Thread Julien Grall
From: Julien Grall 

In order to confirm the early boot code is self-contained, allocate a
separate trampoline region for secondary CPUs to boot from.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/arm64/mmu/mm.c |  7 +++
 xen/arch/arm/mmu/smpboot.c  |  4 +++-
 xen/arch/arm/psci.c |  5 -
 xen/arch/arm/smpboot.c  | 15 ++-
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index d2651c948698..3c4988dc75d1 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -110,11 +110,18 @@ void __init arch_setup_page_tables(void)
 prepare_runtime_identity_mapping();
 }
 
+extern mfn_t trampoline_start;
+
 void update_identity_mapping(bool enable)
 {
 paddr_t id_addr = virt_to_maddr(_start);
 int rc;
 
+if ( !mfn_eq(trampoline_start, INVALID_MFN) )
+{
+id_addr = mfn_to_maddr(trampoline_start);
+}
+
 if ( enable )
 rc = map_pages_to_xen(id_addr, maddr_to_mfn(id_addr), 1,
   PAGE_HYPERVISOR_RX);
diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index f1cf9252710c..d768dfe065a5 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -72,13 +72,15 @@ static void clear_boot_pagetables(void)
 clear_table(boot_third);
 }
 
+extern mfn_t trampoline_start;
+
 static void set_init_ttbr(lpae_t *root)
 {
 /*
  * init_ttbr is part of the identity mapping which is read-only. So
  * We need to re-map the region so it can be updated
  */
-void *ptr = map_domain_page(virt_to_mfn(&init_ttbr));
+void *ptr = map_domain_page(trampoline_start);
 
 ptr += PAGE_OFFSET(&init_ttbr);
 
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 695d2fa1f1b5..a00574d559d6 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -36,11 +36,14 @@ static uint32_t psci_cpu_on_nr;
 
 #define PSCI_RET(res)   ((int32_t)(res).a0)
 
+extern mfn_t trampoline_start;
+
 int call_psci_cpu_on(int cpu)
 {
 struct arm_smccc_res res;
 
-arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
+arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu),
+  mfn_to_maddr(trampoline_start) + PAGE_OFFSET(init_secondary),
  &res);
 
 return PSCI_RET(res);
diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 8d508a1bb258..ef84b73ebd46 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -293,10 +293,13 @@ unsigned int __init smp_get_max_cpus(void)
 return cpus;
 }
 
+mfn_t trampoline_start = INVALID_MFN_INITIALIZER;
+
 void __init
 smp_prepare_cpus(void)
 {
 int rc;
+void *trampoline;
 
 cpumask_copy(&cpu_present_map, &cpu_possible_map);
 
@@ -304,6 +307,16 @@ smp_prepare_cpus(void)
 if ( rc )
 panic("Unable to allocate CPU sibling/core maps\n");
 
+/* Create a trampoline to confirm early boot code is self-contained */
+trampoline = alloc_xenheap_page();
+BUG_ON(!trampoline);
+
+memcpy(trampoline, _start, PAGE_SIZE);
+clean_dcache_va_range(trampoline, PAGE_SIZE);
+invalidate_icache();
+
+printk("Trampoline 0x%lx\n", virt_to_maddr(trampoline));
+trampoline_start = virt_to_mfn(trampoline);
 }
 
 /* Boot the current CPU */
@@ -439,7 +452,7 @@ static void set_smp_up_cpu(unsigned long mpidr)
  * smp_up_cpu is part of the identity mapping which is read-only. So
  * We need to re-map the region so it can be updated.
  */
-void *ptr = map_domain_page(virt_to_mfn(&smp_up_cpu));
+void *ptr = map_domain_page(trampoline_start);
 
 ptr += PAGE_OFFSET(&smp_up_cpu);
 
-- 
2.40.1




[PATCH v1 repost 0/4] xen/arm64: Rework the MMU-off code (idmap) so it is self-contained

2024-01-16 Thread Julien Grall
From: Julien Grall 

Hi all,

Right now, the MMU-off code may access data that are either
in .rodata or .data. With the enablement of cache coloring, Xen may
not be physically contiguous anymore when secondary CPUs are brought up.

There are multiple ways to solve this problem. The first is to keep
a copy of Xen physically contiguous in memory. The downside is this
means we are using up to 8MB (maximum size of Xen) when only a few
KBs is necessary.

This series is reworking the logic so all the MMU-off code is now
self-contained for secondary boot CPUs on arm64.

On arm32, this is not yet possible because secondary CPUs need to
rebuild boot page-tables.

Cheers,

Julien Grall (4):
  arm/mmu: Move init_ttbr to a new section .data.idmap
  arm/smpboot: Move smp_up_cpu to a new section .data.idmap
  xen/arm64: head: Use PRINT_ID() for secondary CPU MMU-off boot code
  [DO NOT COMMIT] xen/arm: Create a trampoline for secondary boot CPUs

 xen/arch/arm/arm64/head.S   | 14 +++
 xen/arch/arm/arm64/mmu/head.S   |  2 +-
 xen/arch/arm/arm64/mmu/mm.c |  7 
 xen/arch/arm/include/asm/arm64/macros.h |  9 +++--
 xen/arch/arm/mmu/smpboot.c  | 36 +++---
 xen/arch/arm/psci.c |  5 ++-
 xen/arch/arm/smpboot.c  | 49 ++---
 xen/arch/arm/xen.lds.S  |  1 +
 8 files changed, 101 insertions(+), 22 deletions(-)

-- 
2.40.1




[PATCH v1 repost 2/4] arm/smpboot: Move smp_up_cpu to a new section .data.idmap

2024-01-16 Thread Julien Grall
From: Julien Grall 

With the upcoming work to color Xen, the binary will not be anymore
physically contiguous. This will be a problem during boot as the
assembly code will need to work out where each piece of Xen reside.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, smp_up_cpu is used by secondary CPUs to wait their turn for
booting before the MMU is on. Yet it is currently in .data which is
unlikely to be within the same page as the rest of the idmap.

Move smp_up_cpu to the recently created section .data.idmap. The idmap is
currently part of the text section and therefore will be mapped read-only
executable. This means that we need to temporarily remap
smp_up_cpu in order to update it.

Introduce a new function set_smp_up_cpu() for this purpose so the code
is not duplicated between opening and closing the gate.

Signed-off-by: Julien Grall 
---
 xen/arch/arm/smpboot.c | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
index 7110bc11fc05..8d508a1bb258 100644
--- a/xen/arch/arm/smpboot.c
+++ b/xen/arch/arm/smpboot.c
@@ -29,6 +29,10 @@
 #include 
 #include 
 
+/* Override macros from asm/page.h to make them work with mfn_t */
+#undef virt_to_mfn
+#define virt_to_mfn(va) _mfn(__virt_to_mfn(va))
+
 cpumask_t cpu_online_map;
 cpumask_t cpu_present_map;
 cpumask_t cpu_possible_map;
@@ -56,7 +60,7 @@ struct init_info init_data =
 };
 
 /* Shared state for coordinating CPU bringup */
-unsigned long smp_up_cpu = MPIDR_INVALID;
+unsigned long __section(".data.idmap") smp_up_cpu = MPIDR_INVALID;
 /* Shared state for coordinating CPU teardown */
 static bool cpu_is_dead;
 
@@ -429,6 +433,28 @@ void stop_cpu(void)
 wfi();
 }
 
+static void set_smp_up_cpu(unsigned long mpidr)
+{
+/*
+ * smp_up_cpu is part of the identity mapping which is read-only. So
+ * We need to re-map the region so it can be updated.
+ */
+void *ptr = map_domain_page(virt_to_mfn(&smp_up_cpu));
+
+ptr += PAGE_OFFSET(&smp_up_cpu);
+
+*(unsigned long *)ptr = mpidr;
+
+/*
+ * smp_up_cpu will be accessed with the MMU off, so ensure the update
+ * is visible by cleaning the cache.
+ */
+clean_dcache(ptr);
+
+unmap_domain_page(ptr);
+
+}
+
 int __init cpu_up_send_sgi(int cpu)
 {
 /* We don't know the GIC ID of the CPU until it has woken up, so just
@@ -460,8 +486,7 @@ int __cpu_up(unsigned int cpu)
 init_data.cpuid = cpu;
 
 /* Open the gate for this CPU */
-smp_up_cpu = cpu_logical_map(cpu);
-clean_dcache(smp_up_cpu);
+set_smp_up_cpu(cpu_logical_map(cpu));
 
 rc = arch_cpu_up(cpu);
 
@@ -497,8 +522,9 @@ int __cpu_up(unsigned int cpu)
  */
 init_data.stack = NULL;
 init_data.cpuid = ~0;
-smp_up_cpu = MPIDR_INVALID;
-clean_dcache(smp_up_cpu);
+
+set_smp_up_cpu(MPIDR_INVALID);
+
 arch_cpu_up_finish();
 
 if ( !cpu_online(cpu) )
-- 
2.40.1




Re: [PATCH 3/3] [DO NOT COMMIT] xen/arm: Create a trampoline for secondary boot CPUs

2024-01-16 Thread Julien Grall




On 16/01/2024 14:24, Carlo Nonato wrote:

Hi Julien,

On Tue, Jan 16, 2024 at 12:55 PM Julien Grall  wrote:


From: Julien Grall 

In order to confirm the early boot code is self-contained, allocate a
separate trampoline region for secondary CPUs to boot from.

Signed-off-by: Julien Grall 
---
  xen/arch/arm/arm64/mmu/mm.c |  7 +++
  xen/arch/arm/mmu/smpboot.c  |  4 +++-
  xen/arch/arm/psci.c |  5 -
  xen/arch/arm/smpboot.c  | 15 ++-
  4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
index d2651c948698..3c4988dc75d1 100644
--- a/xen/arch/arm/arm64/mmu/mm.c
+++ b/xen/arch/arm/arm64/mmu/mm.c
@@ -110,11 +110,18 @@ void __init arch_setup_page_tables(void)
  prepare_runtime_identity_mapping();
  }

+extern mfn_t trampoline_start;
+
  void update_identity_mapping(bool enable)
  {
  paddr_t id_addr = virt_to_maddr(_start);
  int rc;

+if ( !mfn_eq(trampoline_start, INVALID_MFN) )
+{
+id_addr = mfn_to_maddr(trampoline_start);
+}
+
  if ( enable )
  rc = map_pages_to_xen(id_addr, maddr_to_mfn(id_addr), 1,
PAGE_HYPERVISOR_RX);
diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
index f1cf9252710c..d768dfe065a5 100644
--- a/xen/arch/arm/mmu/smpboot.c
+++ b/xen/arch/arm/mmu/smpboot.c
@@ -72,13 +72,15 @@ static void clear_boot_pagetables(void)
  clear_table(boot_third);
  }

+extern mfn_t trampoline_start;
+
  static void set_init_ttbr(lpae_t *root)


Isn't this function not present in the patch series?


Oh. It looks like I forgot to post one patch. Let me resend it.

Cheers,

--
Julien Grall



Re: [PATCH v2 3/3] x86/vmx: Disallow the use of inactivity states

2024-01-16 Thread Tamas K Lengyel
On Thu, Jan 11, 2024 at 6:13 PM Andrew Cooper  wrote:
>
> Right now, vvmx will blindly copy L12's ACTIVITY_STATE into the L02 VMCS and
> enter the vCPU.  Luckily for us, nested-virt is explicitly unsupported for
> security bugs.
>
> The inactivity states are HLT, SHUTDOWN and WAIT-FOR-SIPI, and as noted by the
> SDM in Vol3 27.7 "Special Features of VM Entry":
>
>   If VM entry ends with the logical processor in an inactive activity state,
>   the VM entry generates any special bus cycle that is normally generated when
>   that activity state is entered from the active state.
>
> Also,
>
>   Some activity states unconditionally block certain events.
>
> I.e. A VMEntry with ACTIVITY=SHUTDOWN will initiate a platform reset, while a
> VMEntry with ACTIVITY=WAIT-FOR-SIPI will really block everything other than
> SIPIs.
>
> Both of these activity states are for the TXT ACM to use, not for regular
> hypervisors, and Xen doesn't support dropping the HLT intercept either.
>
> There are two paths in Xen which operate on ACTIVITY_STATE.
>
> 1) The vmx_{get,set}_nonreg_state() helpers for VM-Fork.
>
>As regular VMs can't use any inactivity states, this is just duplicating
>the 0 from construct_vmcs().  Retain the ability to query activity_state,
>but crash the domain on any attempt to set an inactivity state.
>
> 2) Nested virt, because of ACTIVITY_STATE in vmcs_gstate_field[].
>
>Explicitly hide the inactivity states in the guest's view of MSR_VMX_MISC,
>and remove ACTIVITY_STATE from vmcs_gstate_field[].
>
>In virtual_vmentry(), we should trigger a VMEntry failure for the use of
>any inactivity states, but there's no support for that in the code at all
>so leave a TODO for when we finally start working on nested-virt in
>earnest.
>
> Reported-by: Reima Ishii 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tamas K Lengyel 



Re: [PATCH 3/3] [DO NOT COMMIT] xen/arm: Create a trampoline for secondary boot CPUs

2024-01-16 Thread Carlo Nonato
Hi Julien,

On Tue, Jan 16, 2024 at 12:55 PM Julien Grall  wrote:
>
> From: Julien Grall 
>
> In order to confirm the early boot code is self-contained, allocate a
> separate trampoline region for secondary CPUs to boot from.
>
> Signed-off-by: Julien Grall 
> ---
>  xen/arch/arm/arm64/mmu/mm.c |  7 +++
>  xen/arch/arm/mmu/smpboot.c  |  4 +++-
>  xen/arch/arm/psci.c |  5 -
>  xen/arch/arm/smpboot.c  | 15 ++-
>  4 files changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/xen/arch/arm/arm64/mmu/mm.c b/xen/arch/arm/arm64/mmu/mm.c
> index d2651c948698..3c4988dc75d1 100644
> --- a/xen/arch/arm/arm64/mmu/mm.c
> +++ b/xen/arch/arm/arm64/mmu/mm.c
> @@ -110,11 +110,18 @@ void __init arch_setup_page_tables(void)
>  prepare_runtime_identity_mapping();
>  }
>
> +extern mfn_t trampoline_start;
> +
>  void update_identity_mapping(bool enable)
>  {
>  paddr_t id_addr = virt_to_maddr(_start);
>  int rc;
>
> +if ( !mfn_eq(trampoline_start, INVALID_MFN) )
> +{
> +id_addr = mfn_to_maddr(trampoline_start);
> +}
> +
>  if ( enable )
>  rc = map_pages_to_xen(id_addr, maddr_to_mfn(id_addr), 1,
>PAGE_HYPERVISOR_RX);
> diff --git a/xen/arch/arm/mmu/smpboot.c b/xen/arch/arm/mmu/smpboot.c
> index f1cf9252710c..d768dfe065a5 100644
> --- a/xen/arch/arm/mmu/smpboot.c
> +++ b/xen/arch/arm/mmu/smpboot.c
> @@ -72,13 +72,15 @@ static void clear_boot_pagetables(void)
>  clear_table(boot_third);
>  }
>
> +extern mfn_t trampoline_start;
> +
>  static void set_init_ttbr(lpae_t *root)

Isn't this function not present in the patch series?

>  {
>  /*
>   * init_ttbr is part of the identity mapping which is read-only. So
>   * We need to re-map the region so it can be updated
>   */
> -void *ptr = map_domain_page(virt_to_mfn(&init_ttbr));
> +void *ptr = map_domain_page(trampoline_start);
>
>  ptr += PAGE_OFFSET(&init_ttbr);
>
> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> index 695d2fa1f1b5..a00574d559d6 100644
> --- a/xen/arch/arm/psci.c
> +++ b/xen/arch/arm/psci.c
> @@ -36,11 +36,14 @@ static uint32_t psci_cpu_on_nr;
>
>  #define PSCI_RET(res)   ((int32_t)(res).a0)
>
> +extern mfn_t trampoline_start;
> +
>  int call_psci_cpu_on(int cpu)
>  {
>  struct arm_smccc_res res;
>
> -arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
> +arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu),
> +  mfn_to_maddr(trampoline_start) + 
> PAGE_OFFSET(init_secondary),
>    &res);
>
>  return PSCI_RET(res);
> diff --git a/xen/arch/arm/smpboot.c b/xen/arch/arm/smpboot.c
> index 8d508a1bb258..ef84b73ebd46 100644
> --- a/xen/arch/arm/smpboot.c
> +++ b/xen/arch/arm/smpboot.c
> @@ -293,10 +293,13 @@ unsigned int __init smp_get_max_cpus(void)
>  return cpus;
>  }
>
> +mfn_t trampoline_start = INVALID_MFN_INITIALIZER;
> +
>  void __init
>  smp_prepare_cpus(void)
>  {
>  int rc;
> +void *trampoline;
>
>  cpumask_copy(&cpu_present_map, &cpu_possible_map);
>
> @@ -304,6 +307,16 @@ smp_prepare_cpus(void)
>  if ( rc )
>  panic("Unable to allocate CPU sibling/core maps\n");
>
> +/* Create a trampoline to confirm early boot code is self-contained */
> +trampoline = alloc_xenheap_page();
> +BUG_ON(!trampoline);
> +
> +memcpy(trampoline, _start, PAGE_SIZE);
> +clean_dcache_va_range(trampoline, PAGE_SIZE);
> +invalidate_icache();
> +
> +printk("Trampoline 0x%lx\n", virt_to_maddr(trampoline));
> +trampoline_start = virt_to_mfn(trampoline);
>  }
>
>  /* Boot the current CPU */
> @@ -439,7 +452,7 @@ static void set_smp_up_cpu(unsigned long mpidr)
>   * smp_up_cpu is part of the identity mapping which is read-only. So
>   * We need to re-map the region so it can be updated.
>   */
> -void *ptr = map_domain_page(virt_to_mfn(&smp_up_cpu));
> +void *ptr = map_domain_page(trampoline_start);
>
>  ptr += PAGE_OFFSET(&smp_up_cpu);
>
> --
> 2.40.1
>

Thank you very much for your help.



Re: [PATCH] x86/cpuid: Change cpuid() from a macro to a static inline

2024-01-16 Thread Federico Serafini

On 16/01/24 14:02, Jan Beulich wrote:

On 16.01.2024 12:58, Andrew Cooper wrote:

Fixes MISRA XXX


Rule 5.5 if I'm not mistaken; had to look it up for the patch sent
earlier in the day. As to "fixes" - when it's not an actual bug, I had
(successfully) asked the bugseng guys to avoid that term, and instead
use "addresses" or "eliminates a ... violation" or some such.


No functional change.

Signed-off-by: Andrew Cooper 


Reviewed-by: Jan Beulich 

Jan


I confirm that it is Rule 5.5.

I would like to point out that although the patch fixes violations of
Rule 5.5, it introduces new violations of Rule 5.3 "An identifier 
declared in an inner scope shall not hide an identifier declared in an 
outer scope": cpuid is used also as an identifier for some formal 
arguments (the pipeline does not fail because Rule 5.3 is not tagged

as "clean" and the introduction of new violations does not cause
a failure).
A solution could be to rename the function adding a prefix or a suffix
to its name.
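
For illustration (hypothetical code, not taken from the patch), the kind of
hiding Rule 5.3 complains about:

    /* A formal argument named 'cpuid' hides the outer-scope function
     * cpuid() introduced by the patch. */
    void show_leaf(unsigned int cpuid)
    {
        /* Any use of 'cpuid' here names the argument, not the function. */
    }

Renaming the function (for example to something like x86_cpuid()) would
remove the hiding; the names used above are purely illustrative.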

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



Re: [PATCH] x86/cpuid: Change cpuid() from a macro to a static inline

2024-01-16 Thread Andrew Cooper
On 16/01/2024 1:02 pm, Jan Beulich wrote:
> On 16.01.2024 12:58, Andrew Cooper wrote:
>> Fixes MISRA XXX
> Rule 5.5 if I'm not mistaken; had to look it up for the patch sent
> earlier in the day. As to "fixes" - when it's not an actual bug, I had
> (successfully) asked the bugseng guys to avoid that term, and instead
> use "addresses" or "eliminates a ... violation" or some such.

Ok.

>
>> No functional change.
>>
>> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jan Beulich 

Thanks.

~Andrew



[PATCH] xen: Drop out of coroutine context xen_invalidate_map_cache_entry

2024-01-16 Thread Peng Fan (OSS)
From: Peng Fan 

xen_invalidate_map_cache_entry is not expected to run in a
coroutine. Without this patch, there is a crash:

signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
threadid=) at pthread_kill.c:78
at /usr/src/debug/glibc/2.38+git-r0/sysdeps/posix/raise.c:26
fmt=0x9e1ca8a8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0xe0d25740 "!qemu_in_coroutine()",
file=file@entry=0xe0d301a8 "../qemu-xen-dir-remote/block/graph-lock.c", 
line=line@entry=260,
function=function@entry=0xe0e522c0 <__PRETTY_FUNCTION__.3> 
"bdrv_graph_rdlock_main_loop") at assert.c:92
assertion=assertion@entry=0xe0d25740 "!qemu_in_coroutine()",
file=file@entry=0xe0d301a8 "../qemu-xen-dir-remote/block/graph-lock.c", 
line=line@entry=260,
function=function@entry=0xe0e522c0 <__PRETTY_FUNCTION__.3> 
"bdrv_graph_rdlock_main_loop") at assert.c:101
at ../qemu-xen-dir-remote/block/graph-lock.c:260
at 
/home/Freenix/work/sw-stash/xen/upstream/tools/qemu-xen-dir-remote/include/block/graph-lock.h:259
host=host@entry=0x742c8000, size=size@entry=2097152)
at ../qemu-xen-dir-remote/block/io.c:3362
host=0x742c8000, size=2097152)
at ../qemu-xen-dir-remote/block/block-backend.c:2859
host=, size=, max_size=)
at ../qemu-xen-dir-remote/block/block-ram-registrar.c:33
size=2097152, max_size=2097152)
at ../qemu-xen-dir-remote/hw/core/numa.c:883
buffer=buffer@entry=0x743c5000 "")
at ../qemu-xen-dir-remote/hw/xen/xen-mapcache.c:475
buffer=buffer@entry=0x743c5000 "")
at ../qemu-xen-dir-remote/hw/xen/xen-mapcache.c:487
as=as@entry=0xe1ca3ae8 , buffer=0x743c5000,
len=, is_write=is_write@entry=true,
access_len=access_len@entry=32768)
at ../qemu-xen-dir-remote/system/physmem.c:3199
dir=DMA_DIRECTION_FROM_DEVICE, len=,
buffer=, as=0xe1ca3ae8 )
at 
/home/Freenix/work/sw-stash/xen/upstream/tools/qemu-xen-dir-remote/include/sysemu/dma.h:236
elem=elem@entry=0xf620aa30, len=len@entry=32769)
at ../qemu-xen-dir-remote/hw/virtio/virtio.c:758
elem=elem@entry=0xf620aa30, len=len@entry=32769, idx=idx@entry=0)
at ../qemu-xen-dir-remote/hw/virtio/virtio.c:919
elem=elem@entry=0xf620aa30, len=32769)
at ../qemu-xen-dir-remote/hw/virtio/virtio.c:994
req=req@entry=0xf620aa30, status=status@entry=0 '\000')
at ../qemu-xen-dir-remote/hw/block/virtio-blk.c:67
ret=0) at ../qemu-xen-dir-remote/hw/block/virtio-blk.c:136
at ../qemu-xen-dir-remote/block/block-backend.c:1559
--Type  for more, q to quit, c to continue without paging--
at ../qemu-xen-dir-remote/block/block-backend.c:1614
i1=) at ../qemu-xen-dir-remote/util/coroutine-ucontext.c:177
at ../sysdeps/unix/sysv/linux/aarch64/setcontext.S:123

Signed-off-by: Peng Fan 
---
 hw/xen/xen-mapcache.c | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
index f7d974677d..4e1bb665ee 100644
--- a/hw/xen/xen-mapcache.c
+++ b/hw/xen/xen-mapcache.c
@@ -481,11 +481,38 @@ static void 
xen_invalidate_map_cache_entry_unlocked(uint8_t *buffer)
 g_free(entry);
 }
 
-void xen_invalidate_map_cache_entry(uint8_t *buffer)
+typedef struct XenMapCacheData {
+Coroutine *co;
+uint8_t *buffer;
+int ret;
+} XenMapCacheData;
+
+static void xen_invalidate_map_cache_entry_bh(void *opaque)
 {
+XenMapCacheData *data = opaque;
+
 mapcache_lock();
-xen_invalidate_map_cache_entry_unlocked(buffer);
+xen_invalidate_map_cache_entry_unlocked(data->buffer);
 mapcache_unlock();
+
+aio_co_wake(data->co);
+}
+
+void coroutine_mixed_fn xen_invalidate_map_cache_entry(uint8_t *buffer)
+{
+if (qemu_in_coroutine()) {
+XenMapCacheData data = {
+.co = qemu_coroutine_self(),
+.buffer = buffer,
+};
+aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
+xen_invalidate_map_cache_entry_bh, &data);
+qemu_coroutine_yield();
+} else {
+mapcache_lock();
+xen_invalidate_map_cache_entry_unlocked(buffer);
+mapcache_unlock();
+}
 }
 
 void xen_invalidate_map_cache(void)
-- 
2.35.3




Re: [PATCH v3 22/46] hw/arm/aspeed: use qemu_configure_nic_device()

2024-01-16 Thread Cédric Le Goater

On 1/8/24 21:26, David Woodhouse wrote:

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
  hw/arm/aspeed.c | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
index cc59176563..bed5e4f40b 100644
--- a/hw/arm/aspeed.c
+++ b/hw/arm/aspeed.c
@@ -356,7 +356,6 @@ static void aspeed_machine_init(MachineState *machine)
  AspeedMachineClass *amc = ASPEED_MACHINE_GET_CLASS(machine);
  AspeedSoCClass *sc;
  int i;
-NICInfo *nd = &nd_table[0];
  
  bmc->soc = ASPEED_SOC(object_new(amc->soc_name));

  object_property_add_child(OBJECT(machine), "soc", OBJECT(bmc->soc));
@@ -371,10 +370,10 @@ static void aspeed_machine_init(MachineState *machine)
  &error_fatal);
  
  for (i = 0; i < sc->macs_num; i++) {

-if ((amc->macs_mask & (1 << i)) && nd->used) {
-qemu_check_nic_model(nd, TYPE_FTGMAC100);
-qdev_set_nic_properties(DEVICE(&bmc->soc->ftgmac100[i]), nd);
-nd++;
+if ((amc->macs_mask & (1 << i)) &&
+!qemu_configure_nic_device(DEVICE(&bmc->soc->ftgmac100[i]),
+   true, NULL)) {
+break; /* No configs left; stop asking */
  }
  }
  


Acked-by: Cédric Le Goater 

Thanks,

C.





Re: [PATCH v3 40/46] hw/s390x/s390-virtio-ccw: use qemu_create_nic_device()

2024-01-16 Thread Thomas Huth

On 08/01/2024 21.27, David Woodhouse wrote:

From: David Woodhouse 

Signed-off-by: David Woodhouse 
---
  hw/s390x/s390-virtio-ccw.c | 11 ++-
  1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 1169e20b94..202c378131 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -221,16 +221,9 @@ static void s390_init_ipl_dev(const char *kernel_filename,
  
  static void s390_create_virtio_net(BusState *bus, const char *name)

  {
-int i;
-
-for (i = 0; i < nb_nics; i++) {
-NICInfo *nd = &nd_table[i];
-DeviceState *dev;
-
-qemu_check_nic_model(nd, "virtio");
+DeviceState *dev;
  
-dev = qdev_new(name);

-qdev_set_nic_properties(dev, nd);
+while ((dev = qemu_create_nic_device(name, true, "virtio"))) {
  qdev_realize_and_unref(dev, bus, &error_fatal);
  }
  }


Acked-by: Thomas Huth 




Re: [PATCH v3 10/34] xen/riscv: introduce bitops.h

2024-01-16 Thread Jan Beulich
On 16.01.2024 14:06, Oleksii wrote:
> On Mon, 2024-01-15 at 17:44 +0100, Jan Beulich wrote:
>> On 22.12.2023 16:12, Oleksii Kurochko wrote:
>>> +#define test_and_set_bit   __test_and_set_bit
>>> +#define test_and_clear_bit __test_and_clear_bit
>>
>> I realize test-and-change have no present users, despite being made
>> available by Arm and x86, but I think they would better be provided
>> right away, rather than someone introducing a use then needing to
>> fiddle with RISC-V (and apparently also PPC) code.
> Sure, it makes sense. I'll add test-and-change too.
> 
>> I'm also puzzled by this aliasing: Aren't there cheaper non-atomic
>> insn forms that could be used for the double-underscore-prefixed
>> variants?
> It will be cheaper, but I assume that this API should be safe in the
> case of SMP where different CPUs can access the same variable or
> similar cases with simultaneous access to the variable.

Of course, that's what test_and_...() are for. __test_and_...() are
for cases where there's no concurrency, when hence the cheaper forms
can be used. Thus my asking about the aliasing done above.
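
For reference, a cheaper non-atomic form could look roughly like the below
(only a sketch re-using the BIT_MASK()/BIT_WORD() helpers from the patch;
it is not what the series currently does):

static inline int __test_and_set_bit(int nr, volatile void *p)
{
    volatile unsigned long *addr = p;
    unsigned long mask = BIT_MASK(nr);
    unsigned long old = addr[BIT_WORD(nr)];

    /* Plain read-modify-write: only valid without concurrent accesses. */
    addr[BIT_WORD(nr)] = old | mask;

    return (old & mask) != 0;
}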

>>> +#if BITS_PER_LONG == 64
>>> +    if ((word & 0x) == 0) {
>>> +    num += 32;
>>> +    word >>= 32;
>>> +    }
>>
>> You're ending up with neither Xen nor Linux style this way. May I
>> suggest to settle on either?
> 
> Will it be fine to rework the header from Linux to Xen style? Does it make
> sense?
> I think this file can be reworked to Xen style as I don't expect it
> to change once it is merged.

You may keep Linux style or fully switch to Xen style - which one is
largely up to you. All I'm asking is to avoid introducing further
mixed-style source files.

>>> --- /dev/null
>>> +++ b/xen/include/asm-generic/bitops/bitops-bits.h
>>> @@ -0,0 +1,10 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef _ASM_GENERIC_BITOPS_BITS_H_
>>> +#define _ASM_GENERIC_BITOPS_BITS_H_
>>> +
>>> +#define BITOP_BITS_PER_WORD 32
>>> +#define BITOP_MASK(nr)  (1UL << ((nr) %
>>> BITOP_BITS_PER_WORD))
>>
>> Why 1UL and not just 1U, when bits per word is 32?
> There is no specific reason, it should be 1U. (I originally used
> BITOPS_BITS_PER_LONG) and with the introduction of asm-generic bitops
> decided to follow what other archs provide.
> 
> Regarding the second part of the question, I don't understand it
> fully. Considering BITOP_BIT_PER_WORD definition for other archs ( ARM
> and PPC ) it is expected that word is 32 bits.

The 2nd part was explaining why I'm asking. It wasn't another question.

>>> --- /dev/null
>>> +++ b/xen/include/asm-generic/bitops/test-bit.h
>>> @@ -0,0 +1,16 @@
>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>> +#ifndef _ASM_GENERIC_BITOPS_TESTBIT_H_
>>> +#define _ASM_GENERIC_BITOPS_TESTBIT_H_
>>> +
>>> +/**
>>> + * test_bit - Determine whether a bit is set
>>> + * @nr: bit number to test
>>> + * @addr: Address to start counting from
>>> + */
>>> +static inline int test_bit(int nr, const volatile void *addr)
>>> +{
>>> +    const volatile unsigned int *p = addr;
>>
>> With BITOP_BITS_PER_WORD I think you really mean uint32_t here.
> Isn't it the same: 'unsigned int' and 'uint32_t'?

No, or else there wouldn't have been a need to introduce uint32_t (and
others) in C99. It just so happens that right now all architectures Xen
can be built for have sizeof(int) == 4 and CHAR_BITS == 8. In an arch-
specific header I would see this as less of an issue, but in a generic
header we'd better avoid encoding wrong assumptions. The one assumption
we generally make is that sizeof(int) >= 4 and CHAR_BITS >= 8 (albeit I
bet really in various places we assume CHAR_BITS == 8).

>> Also you want to make sure asm-generic/bitops/bitops-bits.h is
>> really in use here, or else an arch overriding / not using that
>> header may end up screwed.
> I don't really understand what you mean. Could you please explain a
> little bit more.

Whichever type you use here, it needs to be in sync with
BITOP_BITS_PER_WORD. Hence you want to include the _local_ bitops-bits.h
here, such that in case of an inconsistent override by an arch the
compiler would complain about the two differring #define-s. (IOW an
arch overriding BITOP_BITS_PER_WORD cannot re-use this header as-is.)

The same may, btw, be true for others of the new headers you add - the
same #include would therefore be needed there as well.
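
Roughly, for test-bit.h (just a sketch; the include path is assumed to
match wherever the new generic headers end up):

#include <asm-generic/bitops/bitops-bits.h>

static inline int test_bit(int nr, const volatile void *addr)
{
    /* uint32_t matches BITOP_BITS_PER_WORD == 32 from bitops-bits.h. */
    const volatile uint32_t *p = addr;

    return 1U & (p[nr / BITOP_BITS_PER_WORD] >> (nr % BITOP_BITS_PER_WORD));
}

With the #include in place, an arch overriding BITOP_BITS_PER_WORD with a
different value would get a "macro redefined" diagnostic instead of silently
pairing the generic helpers with a mismatched word type.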

Jan



Re: [PATCH v3 10/34] xen/riscv: introduce bitops.h

2024-01-16 Thread Oleksii
On Mon, 2024-01-15 at 17:44 +0100, Jan Beulich wrote:
> On 22.12.2023 16:12, Oleksii Kurochko wrote:
> > --- /dev/null
> > +++ b/xen/arch/riscv/include/asm/bitops.h
> > @@ -0,0 +1,267 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/* Copyright (C) 2012 Regents of the University of California */
> > +
> > +#ifndef _ASM_RISCV_BITOPS_H
> > +#define _ASM_RISCV_BITOPS_H
> > +
> > +#include 
> > +
> > +#include 
> > +
> > +/* Based on linux/arch/include/linux/bits.h */
> > +
> > +#define BIT_MASK(nr)    (1UL << ((nr) % BITS_PER_LONG))
> > +#define BIT_WORD(nr)    ((nr) / BITS_PER_LONG)
> > +
> > +#define __set_bit(n,p)  set_bit(n,p)
> > +#define __clear_bit(n,p)    clear_bit(n,p)
> > +
> > +/* Based on linux/arch/include/asm/bitops.h */
> > +
> > +#if ( BITS_PER_LONG == 64 )
> > +#define __AMO(op)   "amo" #op ".d"
> > +#elif ( BITS_PER_LONG == 32 )
> > +#define __AMO(op)   "amo" #op ".w"
> > +#else
> > +#error "Unexpected BITS_PER_LONG"
> > +#endif
> > +
> > +#define __test_and_op_bit_ord(op, mod, nr, addr, ord)  \
> > +({ \
> > +    unsigned long __res, __mask;   \
> > +    __mask = BIT_MASK(nr); \
> > +    __asm__ __volatile__ ( \
> > +    __AMO(op) #ord " %0, %2, %1"   \
> > +    : "=r" (__res), "+A" (addr[BIT_WORD(nr)])  \
> > +    : "r" (mod(__mask))\
> > +    : "memory");   \
> > +    ((__res & __mask) != 0);   \
> > +})
> 
> Despite the taking from Linux I think the overall result wants to be
> consistent formatting-wise: You switched to blank indentation (which
> is fine), but you left tabs as padding for the line continuation
> characters.
I think it is because of my IDE. I will be consistent in the next patch
version.

Thanks.

> 
> > +#define __op_bit_ord(op, mod, nr, addr, ord)   \
> > +    __asm__ __volatile__ ( \
> > +    __AMO(op) #ord " zero, %1, %0" \
> > +    : "+A" (addr[BIT_WORD(nr)])\
> > +    : "r" (mod(BIT_MASK(nr)))  \
> > +    : "memory");
> > +
> > +#define __test_and_op_bit(op, mod, nr, addr)   \
> 
> (At least) here you even use a mix.
> 
> > +    __test_and_op_bit_ord(op, mod, nr, addr, .aqrl)
> > +#define __op_bit(op, mod, nr, addr)\
> > +    __op_bit_ord(op, mod, nr, addr, )
> > +
> > +/* Bitmask modifiers */
> > +#define __NOP(x)   (x)
> > +#define __NOT(x)   (~(x))
> > +
> > +/**
> > + * __test_and_set_bit - Set a bit and return its old value
> > + * @nr: Bit to set
> > + * @addr: Address to count from
> > + *
> > + * This operation may be reordered on other architectures than
> > x86.
> > + */
> > +static inline int __test_and_set_bit(int nr, volatile void *p)
> > +{
> > +    volatile uint32_t *addr = p;
> > +
> > +    return __test_and_op_bit(or, __NOP, nr, addr);
> > +}
> > +
> > +/**
> > + * __test_and_clear_bit - Clear a bit and return its old value
> > + * @nr: Bit to clear
> > + * @addr: Address to count from
> > + *
> > + * This operation can be reordered on other architectures other
> > than x86.
> > + */
> > +static inline int __test_and_clear_bit(int nr, volatile void *p)
> > +{
> > +    volatile uint32_t *addr = p;
> > +
> > +    return __test_and_op_bit(and, __NOT, nr, addr);
> > +}
> > +
> > +/**
> > + * set_bit - Atomically set a bit in memory
> > + * @nr: the bit to set
> > + * @addr: the address to start counting from
> > + *
> > + * Note: there are no guarantees that this function will not be
> > reordered
> > + * on non x86 architectures, so if you are writing portable code,
> > + * make sure not to rely on its reordering guarantees.
> > + *
> > + * Note that @nr may be almost arbitrarily large; this function is
> > not
> > + * restricted to acting on a single-word quantity.
> > + */
> > +static inline void set_bit(int nr, volatile void *p)
> > +{
> > +    volatile uint32_t *addr = p;
> > +
> > +    __op_bit(or, __NOP, nr, addr);
> > +}
> > +
> > +/**
> > + * clear_bit - Clears a bit in memory
> > + * @nr: Bit to clear
> > + * @addr: Address to start counting from
> > + *
> > + * Note: there are no guarantees that this function will not be
> > reordered
> > + * on non x86 architectures, so if you are writing portable code,
> > + * make sure not to rely on its reordering guarantees.
> > + */
> > +static inline void clear_bit(int nr, volatile void *p)
> > +{
> > +    volatile uint32_t *addr = p;
> > +
> > +    __op_bit(and, __NOT, nr, addr);
> > +}
> > +
> > +#undef __test_and_op_bit
> > +#undef __op_bit
> > +#undef __NOP
> > +#undef __NOT
> > +#undef __AMO
> > +
> > +#define test_and_set_bit   __test_and_set_bit
> > +#define test_and_clear_bit __test_and_clear_bit
> 
> I realize 

Re: [PATCH] x86/cpuid: Change cpuid() from a macro to a static inline

2024-01-16 Thread Jan Beulich
On 16.01.2024 12:58, Andrew Cooper wrote:
> Fixes MISRA XXX

Rule 5.5 if I'm not mistaken; had to look it up for the patch sent
earlier in the day. As to "fixes" - when it's not an actual bug, I had
(successfully) asked the bugseng guys to avoid that term, and instead
use "addresses" or "eliminates a ... violation" or some such.

> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

Jan



[ovmf test] 184371: all pass - PUSHED

2024-01-16 Thread osstest service owner
flight 184371 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/184371/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 9971b99461e930008e3d35bc0a4a310b6afa57f6
baseline version:
 ovmf a4b8944e27f497b0f4dbfb6aa412decab2874b58

Last test of basis   184369  2024-01-16 07:14:43 Z0 days
Testing same since   184371  2024-01-16 09:42:46 Z0 days1 attempts


People who touched revisions under test:
  Abner Chang 
  Doug Flick [MSFT] 
  Douglas Flick [MSFT] 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   a4b8944e27..9971b99461  9971b99461e930008e3d35bc0a4a310b6afa57f6 -> 
xen-tested-master



Re: [PATCH v5 5/8] RISC-V: annotate entry points with type and size

2024-01-16 Thread Oleksii
On Mon, 2024-01-15 at 15:37 +0100, Jan Beulich wrote:
> Use the generic framework in xen/linkage.h. No change in generated
> code
> except of course the converted symbols change to be hidden ones and
> gain
> a valid size.
> 
> Signed-off-by: Jan Beulich 
> ---
> Probably count_args_exp() should move to macros.h, but I first wanted
> to
> see whether anyone can suggest any better approach for checking
> whether
> a defined macro expands to nothing.
> ---
The current approach looks good to me.
Reviewed-by: Oleksii Kurochko 

~ Oleksii

> v5: Re-base.
> v4: Also drop #undef-s from linker script.
> v3: New.
> 
> --- a/xen/arch/riscv/entry.S
> +++ b/xen/arch/riscv/entry.S
> @@ -5,7 +5,7 @@
>  #include 
>  
>  /* WIP: only works while interrupting Xen context */
> -ENTRY(handle_trap)
> +FUNC(handle_trap)
>  
>  /* Exceptions from xen */
>  save_to_stack:
> @@ -92,3 +92,4 @@ restore_registers:
>  REG_L   sp, CPU_USER_REGS_SP(sp)
>  
>  sret
> +END(handle_trap)
> --- a/xen/arch/riscv/include/asm/asm.h
> +++ b/xen/arch/riscv/include/asm/asm.h
> @@ -7,6 +7,7 @@
>  #define _ASM_RISCV_ASM_H
>  
>  #ifdef __ASSEMBLY__
> +#include 
>  #define __ASM_STR(x) x
>  #else
>  #define __ASM_STR(x) #x
> --- a/xen/arch/riscv/include/asm/config.h
> +++ b/xen/arch/riscv/include/asm/config.h
> @@ -69,12 +69,8 @@
>  
>  /* Linkage for RISCV */
>  #ifdef __ASSEMBLY__
> -#define ALIGN .align 4
> -
> -#define ENTRY(name)    \
> -  .globl name; \
> -  ALIGN;   \
> -  name:
> +#define CODE_ALIGN 16
> +#define CODE_FILL /* empty */
>  #endif
>  
>  #ifdef CONFIG_RISCV_64
> --- a/xen/arch/riscv/riscv64/head.S
> +++ b/xen/arch/riscv/riscv64/head.S
> @@ -8,7 +8,7 @@
>   *   a0 -> hart_id ( bootcpu_id )
>   *   a1 -> dtb_base 
>   */
> -ENTRY(start)
> +FUNC(start)
>  /* Mask all interrupts */
>  csrw    CSR_SIE, zero
>  
> @@ -60,19 +60,21 @@ ENTRY(start)
>  mv  a1, s1
>  
>  tail    start_xen
> +END(start)
>  
>  .section .text, "ax", %progbits
>  
> -ENTRY(reset_stack)
> +FUNC(reset_stack)
>  la  sp, cpu0_boot_stack
>  li  t0, STACK_SIZE
>  add sp, sp, t0
>  
>  ret
> +END(reset_stack)
>  
>  .section .text.ident, "ax", %progbits
>  
> -ENTRY(turn_on_mmu)
> +FUNC(turn_on_mmu)
>  sfence.vma
>  
>  li  t0, RV_STAGE1_MODE
> @@ -84,3 +86,4 @@ ENTRY(turn_on_mmu)
>  csrw    CSR_SATP, t1
>  
>  jr  a0
> +END(turn_on_mmu)
> --- a/xen/arch/riscv/xen.lds.S
> +++ b/xen/arch/riscv/xen.lds.S
> @@ -1,9 +1,6 @@
>  #include 
>  #include 
>  
> -#undef ENTRY
> -#undef ALIGN
> -
>  OUTPUT_ARCH(riscv)
>  ENTRY(start)
>  
> --- a/xen/include/xen/linkage.h
> +++ b/xen/include/xen/linkage.h
> @@ -35,17 +35,28 @@
>  
>  #define END(name) .size name, . - name
>  
> +/*
> + * CODE_FILL in particular may need to expand to nothing (e.g. for RISC-V), in
> + * which case we also need to get rid of the comma in the .balign directive.
> + */
> +#define count_args_exp(args...) count_args(args)
> +#if count_args_exp(CODE_FILL)
> +# define DO_CODE_ALIGN(align...) LASTARG(CODE_ALIGN, ## align), CODE_FILL
> +#else
> +# define DO_CODE_ALIGN(align...) LASTARG(CODE_ALIGN, ## align)
> +#endif
> +
>  #define FUNC(name, align...) \
> -    SYM(name, FUNC, GLOBAL, LASTARG(CODE_ALIGN, ## align), CODE_FILL)
> +    SYM(name, FUNC, GLOBAL, DO_CODE_ALIGN(align))
>  #define LABEL(name, align...) \
> -    SYM(name, NONE, GLOBAL, LASTARG(CODE_ALIGN, ## align), CODE_FILL)
> +    SYM(name, NONE, GLOBAL, DO_CODE_ALIGN(align))
>  #define DATA(name, align...) \
>      SYM(name, DATA, GLOBAL, LASTARG(DATA_ALIGN, ## align), DATA_FILL)
>  
>  #define FUNC_LOCAL(name, align...) \
> -    SYM(name, FUNC, LOCAL, LASTARG(CODE_ALIGN, ## align), CODE_FILL)
> +    SYM(name, FUNC, LOCAL, DO_CODE_ALIGN(align))
>  #define LABEL_LOCAL(name, align...) \
> -    SYM(name, NONE, LOCAL, LASTARG(CODE_ALIGN, ## align), CODE_FILL)
> +    SYM(name, NONE, LOCAL, DO_CODE_ALIGN(align))
>  #define DATA_LOCAL(name, align...) \
>      SYM(name, DATA, LOCAL, LASTARG(DATA_ALIGN, ## align), DATA_FILL)
>  
> 
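For reference, a minimal standalone sketch of the empty-expansion check above, with a stand-in COUNT_ARGS in place of Xen's count_args() (the stand-in and the demo macros EMPTY_FILL/NONEMPTY_FILL are illustrative only; like the real code, this relies on GNU cpp's ", ## args" comma deletion):

#include <stdio.h>

/* Stand-in for Xen's count_args(): returns the number of arguments (0-3). */
#define COUNT_ARGS_(x0, x1, x2, x3, n, args...) n
#define COUNT_ARGS(args...) COUNT_ARGS_(., ## args, 3, 2, 1, 0)

/*
 * Extra expansion step, mirroring count_args_exp() in the patch: using
 * 'args' plainly lets an argument such as EMPTY_FILL be macro-expanded
 * (possibly to nothing) before it is counted.
 */
#define COUNT_ARGS_EXP(args...) COUNT_ARGS(args)

#define EMPTY_FILL              /* expands to nothing, like RISC-V's CODE_FILL */
#define NONEMPTY_FILL 0x90      /* expands to one token */

int main(void)
{
    printf("%d\n", COUNT_ARGS(EMPTY_FILL));        /* 1: '##' blocks pre-expansion */
    printf("%d\n", COUNT_ARGS_EXP(EMPTY_FILL));    /* 0: expands away before counting */
    printf("%d\n", COUNT_ARGS_EXP(NONEMPTY_FILL)); /* 1 */
    return 0;
}

The difference between the first two results is why the extra count_args_exp() indirection is needed before the #if test.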



[PATCH] x86/cpuid: Change cpuid() from a macro to a static inline

2024-01-16 Thread Andrew Cooper
Fixes MISRA XXX

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Volodymyr Babchuk 
CC: Bertrand Marquis 
CC: Michal Orzel 
CC: Roberto Bagnara 
CC: Federico Serafini 
CC: consult...@bugseng.com 

Can someone please remind me which MISRA rule is the one about macros aliasing
identifiers?
---
 xen/arch/x86/include/asm/processor.h | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/include/asm/processor.h b/xen/arch/x86/include/asm/processor.h
index ff62b080afbf..b227cdee8ef3 100644
--- a/xen/arch/x86/include/asm/processor.h
+++ b/xen/arch/x86/include/asm/processor.h
@@ -126,14 +126,6 @@ static inline int cpu_nr_siblings(unsigned int cpu)
 return cpu_data[cpu].x86_num_siblings;
 }
 
-/*
- * Generic CPUID function
- * clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx
- * resulting in stale register contents being returned.
- */
-#define cpuid(leaf, eax, ebx, ecx, edx)  \
-cpuid_count(leaf, 0, eax, ebx, ecx, edx)
-
 /* Some CPUID calls want 'count' to be placed in ecx */
 static inline void cpuid_count(
 unsigned int op,
@@ -148,6 +140,21 @@ static inline void cpuid_count(
   : "0" (op), "c" (count) );
 }
 
+/*
+ * Generic CPUID function
+ * clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx
+ * resulting in stale register contents being returned.
+ */
+static inline void cpuid(
+unsigned int leaf,
+unsigned int *eax,
+unsigned int *ebx,
+unsigned int *ecx,
+unsigned int *edx)
+{
+cpuid_count(leaf, 0, eax, ebx, ecx, edx);
+}
+
 /*
  * CPUID functions returning a single datum
  */

base-commit: f3f6c500e2dbd23af77c207e2cf4b496fffa1b0d
-- 
2.30.2
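For comparison, a standalone version of the two helpers behaves the same way as the old macro for the common case; the main() harness and the printf formatting are illustrative only, not part of the patch:

#include <stdio.h>

/* Standalone equivalents of the helpers touched by the patch. */
static inline void cpuid_count(unsigned int op, unsigned int count,
                               unsigned int *eax, unsigned int *ebx,
                               unsigned int *ecx, unsigned int *edx)
{
    __asm__ volatile ( "cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "0" (op), "c" (count) );
}

/* Clear %ecx: some CPUs (Cyrix MII) neither set nor clear it, which would
 * otherwise leave stale register contents in the output. */
static inline void cpuid(unsigned int leaf, unsigned int *eax,
                         unsigned int *ebx, unsigned int *ecx,
                         unsigned int *edx)
{
    cpuid_count(leaf, 0, eax, ebx, ecx, edx);
}

int main(void)
{
    unsigned int a, b, c, d;

    cpuid(0, &a, &b, &c, &d);   /* leaf 0: max basic leaf + vendor string */
    printf("max leaf %u, vendor %.4s%.4s%.4s\n",
           a, (const char *)&b, (const char *)&d, (const char *)&c);
    return 0;
}

Unlike the old macro, the static inline is a real identifier, so another declaration named cpuid can no longer be silently rewritten by the preprocessor, which is the aliasing concern mentioned above.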




Re: [PATCH v5 13/13] xen/arm: add cache coloring support for Xen

2024-01-16 Thread Julien Grall

Hi,

On 15/01/2024 16:16, Julien Grall wrote:

On 15/01/2024 15:43, Carlo Nonato wrote:

Hi Julien,


Hi Carlo,


On Mon, Jan 15, 2024 at 12:18 PM Julien Grall  wrote:

On 15/01/2024 10:11, Carlo Nonato wrote:
I understand what you're talking about, and it seems reasonable to get rid of
xen_colored_temp[] and create_llc_coloring_mappings() since in the end they
serve the purpose of mapping the physically colored space that is already
mapped using xen_xenmap[] pagetables.
What I don't understand is then how to copy/relocate Xen since I don't have a
destination virtual space anymore to use in relocate_xen().


You will need to link xen_xenmap[] in boot_second[...] as well. With
that, you will be able to access the new Xen through the temporary area.


Wouldn't it result in overwriting the current virtual space mapping?
boot_second is the live page table and if I link xen_xenmap[] then
XEN_VIRT_START would point to the new colored space which is still empty at
this stage...


If you link at XEN_VIRT_START then yes. But you could link at 
BOOT_RELOC_VIRT_START like you already do today.





[...]


Note that this means the init_ttbr cannot be written directly. But you
can solve this problem by re-mapping the address.


How to remap a single address?


You should be able to use map_domain_page() to map the page where
init_ttbr is.
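A rough sketch of that suggestion (not code from the series; the helper names follow Xen's usual API, but the exact mfn typing, the variable's type, and whether cache maintenance is needed here are assumptions):

/* Write init_ttbr via a temporary mapping of its page instead of through
 * its normal (identity-mapped, possibly read-only) linkage address. */
static void write_init_ttbr(uint64_t ttbr)
{
    unsigned long offs = (unsigned long)&init_ttbr & ~PAGE_MASK;
    void *page = map_domain_page(virt_to_mfn(&init_ttbr));
    uint64_t *ptr = (uint64_t *)((char *)page + offs);

    *ptr = ttbr;
    clean_dcache(*ptr);         /* secondary CPUs may read it with caches off */
    unmap_domain_page(page);
}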

And if moving init_ttbr into the identity-mapped area means that it's no longer
writable, so that I need to remap it, why move it into that area in the first
place? Again, I think I'm missing something.


The goal is to have everything used (code, data) before the MMU is
turned on residing in a single page. So secondary CPUs can directly jump
to the colored Xen without any trouble.


This is what confuses me. Why does having everything on a single page make
secondary CPUs able to jump directly to colored Xen? (also see below)


Because the code running with the MMU off can easily access everything.






3) To access the identity mapping area I would need some accessor that takes
an address and returns it + phys_offset, or is there a better way to do it?


I am not sure I understand what you mean. Can you clarify?


In my idea, I would use the identity mapping to access the "old" variables,
where "old" means not physically colored. init_ttbr is an example. When
Xen is copied to the new physical space, init_ttbr is copied with it, and
if the boot CPU modifies this variable, it's actually touching the colored
one and not the old one. This means that secondary CPUs that still haven't
jumped to the new space won't be able to see the new value and will never
go online.
So to access this "old" init_ttbr variable I need its identity address,
which is its current virtual address + some physical offset. I was asking
you if this is the right approach to use the identity mapping.
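For concreteness, the accessor described here would be something along these lines (purely illustrative; phys_offset stands for the boot-time physical-minus-virtual offset, and the reply below questions whether such an accessor is needed at all):

/* Translate a link-time virtual address into its identity-map alias. */
static inline void *virt_to_idmap(const void *va)
{
    return (void *)((unsigned long)va + phys_offset);
}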


Secondary CPUs would directly start on the colored Xen. So they will be
able to access the "new" init_ttbr & co.


How can this be true? I mean, in call_psci_cpu_on() I can start those CPUs in
the colored space, but they still use the boot_* pagetables


Are you looking at the 64-bit or 32-bit code? For 64-bit, staging is not 
using boot_* pagetable anymore for secondary CPUs. Instead, they 
directly jump to the runtime page-tables.



and there I can't
easily link the new colored space, or, at least, I'm not succeeding in doing
that. What I tried at the moment is to link xen_xenmap in boot_second after
switch_ttbr because of the problem I described above. But then secondary
CPUs never go online...


It would be helpful if you share some code.




[...]

... as I wrote earlier, your current approach seems to have a flaw. As you
overwrite xen_bootmodule->{start, size}, setup_mm() will end up adding
the old Xen region to the boot allocator. This is before any secondary
CPUs are booted up.

IOW, the allocator may provide some memory from the old Xen and nothing
good will happen from that.

The only way to solve it is to add another module. So the memory is
skipped by setup_mm(). However see below.



Yes, that should be memory that in the end would not be needed, so it must
return to the boot allocator (if that's what you mean). But how to do that?


You can't really discard the old temporary Xen. This may work today
because we don't support CPU hotplug or suspend/resume. But there was
some series on the ML to enable it and I don't see any reason why
someone would not want to use the features with cache coloring.

So the old temporary Xen would have to be kept around forever. This is
up to 8MB of memory wasted.

The right approach is to have the secondary CPU boot code (including the
variables it is using) fitting in the same page (or possibly multiple pages,
so long as this is small and physically contiguous). With that it doesn't
matter where the trampoline is; it could stay at the old place, but we
would only waste a few pages rather than up to 8MB as it is today.


So what are you 

[PATCH 2/3] xen/arm64: head: Use PRINT_ID() for secondary CPU MMU-off boot code

2024-01-16 Thread Julien Grall
From: Julien Grall 

With the upcoming work to color Xen, the binary will no longer be
physically contiguous. This will be a problem during boot, as the
assembly code will need to work out where each piece of Xen resides.

An easy way to solve the issue is to have all code/data accessed
by the secondary CPUs while the MMU is off within a single page.

Right now, most of the early printk messages use PRINT(), which places
the message in .rodata. This is unlikely to be within the
same page as the rest of the idmap.

So replace all the PRINT() calls that can be reached by a secondary
CPU with the MMU off with PRINT_ID().

Signed-off-by: Julien Grall 
---
 xen/arch/arm/arm64/head.S   | 14 +++---
 xen/arch/arm/arm64/mmu/head.S   |  2 +-
 xen/arch/arm/include/asm/arm64/macros.h |  9 ++---
 3 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index cfc04c755400..fa8b00b6f1db 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -289,9 +289,9 @@ GLOBAL(init_secondary)
 
 #ifdef CONFIG_EARLY_PRINTK
ldr   x23, =CONFIG_EARLY_UART_BASE_ADDRESS /* x23 := UART base address */
-PRINT("- CPU ")
+PRINT_ID("- CPU ")
 print_reg x24
-PRINT(" booting -\r\n")
+PRINT_ID(" booting -\r\n")
 #endif
bl    check_cpu_mode
bl    cpu_init
@@ -314,10 +314,10 @@ ENDPROC(init_secondary)
  * Clobbers x0 - x5
  */
 check_cpu_mode:
-PRINT("- Current EL ")
+PRINT_ID("- Current EL ")
 mrs   x5, CurrentEL
 print_reg x5
-PRINT(" -\r\n")
+PRINT_ID(" -\r\n")
 
 /* Are we in EL2 */
 cmp   x5, #PSR_MODE_EL2t
@@ -326,8 +326,8 @@ check_cpu_mode:
 ret
 1:
 /* OK, we're boned. */
-PRINT("- Xen must be entered in NS EL2 mode -\r\n")
-PRINT("- Please update the bootloader -\r\n")
+PRINT_ID("- Xen must be entered in NS EL2 mode -\r\n")
+PRINT_ID("- Please update the bootloader -\r\n")
 b fail
 ENDPROC(check_cpu_mode)
 
@@ -361,7 +361,7 @@ ENDPROC(zero_bss)
  * Clobbers x0 - x3
  */
 cpu_init:
-PRINT("- Initialize CPU -\r\n")
+PRINT_ID("- Initialize CPU -\r\n")
 
 /* Set up memory attribute type tables */
 ldr   x0, =MAIRVAL
diff --git a/xen/arch/arm/arm64/mmu/head.S b/xen/arch/arm/arm64/mmu/head.S
index 92b62ae94ce5..fa40b696ddc8 100644
--- a/xen/arch/arm/arm64/mmu/head.S
+++ b/xen/arch/arm/arm64/mmu/head.S
@@ -276,7 +276,7 @@ ENDPROC(create_page_tables)
 enable_mmu:
 mov   x4, x0
 mov   x5, x1
-PRINT("- Turning on paging -\r\n")
+PRINT_ID("- Turning on paging -\r\n")
 
 /*
  * The state of the TLBs is unknown before turning on the MMU.
diff --git a/xen/arch/arm/include/asm/arm64/macros.h b/xen/arch/arm/include/asm/arm64/macros.h
index 10e652041f57..6a0108f778a2 100644
--- a/xen/arch/arm/include/asm/arm64/macros.h
+++ b/xen/arch/arm/include/asm/arm64/macros.h
@@ -39,9 +39,12 @@
  * There are multiple flavors:
  *  - PRINT_SECT(section, string): The @string will be located in @section
  *  - PRINT(): The string will be located in .rodata.str.
- *  - PRINT_ID(): When Xen is running on the Identity Mapping, it is
- *only possible to have a limited amount of Xen. This will create
- *the string in .rodata.idmap which will always be mapped.
+ *  - PRINT_ID(): This will create the string in .rodata.idmap which
+ *will always be accessible. This is used when:
+ *  - Xen is running on the identity mapping because not all of Xen is mapped
+ *  - Running with the MMU-off on secondary boots as Xen may not be
+ *    physically contiguous in memory (e.g. in the case of cache
+ *    coloring).
  *
  * Clobbers x0 - x3
  */
-- 
2.40.1
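The same section-placement idea can be sketched in C (the section name matches the one used by PRINT_ID(); everything else is illustrative):

/* Direct a string into .rodata.idmap so the linker keeps it in the small
 * region that stays reachable while the MMU is off. */
static const char boot_msg[]
    __attribute__((__used__, __section__(".rodata.idmap"))) =
    "- Turning on paging -\r\n";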



