[xen-unstable test] 186105: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186105 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186105/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf                   6 xen-build              fail REGR. vs. 186078

Tests which did not succeed, but are not blocking:
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-raw   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop  fail like 186078
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop  fail like 186078
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail like 186078
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop  fail like 186078
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186078
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check  fail never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check  fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-check  fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check  fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check  fail never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check  fail never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check  fail never pass

version targeted for testing:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186105  2024-05-23 09:38:07 Z    0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  fail
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  blocked 
 build-i386-libvirt   pass
 build-amd64-prev pass
 build-i386-prev  pass
 build-amd64-pvops    pass
 build-arm64-pvops    pass
 build-armhf-pvops    pass
 build-i386-pvops 

Re: [PATCH v4 3/9] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-23 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Henry,
> 
> On 23/05/2024 08:40, Henry Wang wrote:
> > Currently, the number of SPIs allocated to the domain is only
> > configurable for Dom0less DomUs. Xen domains are supposed to be
> > platform agnostic and therefore the number of SPIs for libxl
> > guests should not be based on the hardware.
> > 
> > Introduce a new xl config entry for Arm to provide a method for the
> > user to decide the number of SPIs. This would help to avoid
> > bumping the `config->arch.nr_spis` in libxl every time there is a
> > new platform with increased SPI numbers.
> > 
> > Update the doc and the golang bindings accordingly.
> > 
> > Signed-off-by: Henry Wang 
> > Reviewed-by: Jason Andryuk 
> > ---
> > v4:
> > - Add Jason's Reviewed-by tag.
> > v3:
> > - Reword documentation to avoid ambiguity.
> > v2:
> > - New patch to replace the original patch in v1:
> >"[PATCH 05/15] tools/libs/light: Increase nr_spi to 160"
> > ---
> >   docs/man/xl.cfg.5.pod.in | 14 ++
> >   tools/golang/xenlight/helpers.gen.go |  2 ++
> >   tools/golang/xenlight/types.gen.go   |  1 +
> >   tools/libs/light/libxl_arm.c |  4 ++--
> >   tools/libs/light/libxl_types.idl |  1 +
> >   tools/xl/xl_parse.c  |  3 +++
> >   6 files changed, 23 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index 8f2b375ce9..416d582844 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -3072,6 +3072,20 @@ raised.
> > =back
> >   +=over 4
> > +
> > +=item B
> > +
> > +An optional 32-bit integer parameter specifying the number of SPIs (Shared
> 
> We can't support that many SPIs :). The limit would be 991 SPIs.

I changed it


> > +Peripheral Interrupts) to allocate for the domain. If the value specified by
> > +the `nr_spis` parameter is smaller than the number of SPIs calculated by the
> > +toolstack based on the devices allocated for the domain, or the `nr_spis`
> > +parameter is not specified, the value calculated by the toolstack will be used
> > +for the domain. Otherwise, the value specified by the `nr_spis` parameter will
> > +be used.
> 
> I think it would be worth mentioning that the number of SPIs should match the
> highest interrupt ID that will be assigned to the domain (rather than the
> number of SPIs planned to be assigned).

I added it


> > +
> > +=back
> > +
> >   =head3 x86
> > =over 4
> > diff --git a/tools/golang/xenlight/helpers.gen.go
> > b/tools/golang/xenlight/helpers.gen.go
> > index b9cb5b33c7..fe5110474d 100644
> > --- a/tools/golang/xenlight/helpers.gen.go
> > +++ b/tools/golang/xenlight/helpers.gen.go
> > @@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
> >   x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
> >   x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
> >   x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
> > +x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
> >   if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
> >   return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
> >   }
> > @@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
> >   xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
> >   xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
> >   xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
> > +xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
> >   if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
> >   return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
> >   }
> > diff --git a/tools/golang/xenlight/types.gen.go
> > b/tools/golang/xenlight/types.gen.go
> > index 5b293755d7..c9e45b306f 100644
> > --- a/tools/golang/xenlight/types.gen.go
> > +++ b/tools/golang/xenlight/types.gen.go
> > @@ -597,6 +597,7 @@ ArchArm struct {
> >   GicVersion GicVersion
> >   Vuart VuartType
> >   SveVl SveType
> > +NrSpis uint32
> >   }
> >   ArchX86 struct {
> >   MsrRelaxed Defbool
> > diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
> > index 1cb89fa584..a4029e3ac8 100644
> > --- a/tools/libs/light/libxl_arm.c
> > +++ b/tools/libs/light/libxl_arm.c
> > @@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
> > LOG(DEBUG, "Configure the domain");
> >   -config->arch.nr_spis = nr_spis;
> > -LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
> > +config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
> 
> I am not entirely sure about using max(). To me if the user specifies a lower
> limit, then we should throw an error because this is likely an indication that
> the SPIs they will want to assign will clash with the emulated ones.
> 
> So it would be better to warn at domain creation rather than waiting until the
> IRQs are assigned.
> 
> I would like Anthony's opinion on this one. Given he is away this month, I
> guess 
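
As a concrete illustration of that alternative, the check could look roughly
like the sketch below in libxl__arch_domain_prepare_config() (hedged: the
error code, message and exact placement are illustrative only, not part of the
posted patch, which uses max() instead):

    uint32_t user_spis = d_config->b_info.arch_arm.nr_spis;

    if (user_spis && user_spis < nr_spis) {
        /* The user asked for fewer SPIs than the assigned devices need. */
        LOG(ERROR, "nr_spis %u is below the %u SPIs required by the assigned devices",
            user_spis, nr_spis);
        return ERROR_INVAL;
    }
    config->arch.nr_spis = user_spis ? user_spis : nr_spis;

This keeps the toolstack-computed value as the default while turning a
too-small user value into a hard error at domain creation time, which is the
behaviour being asked about above.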

Re: [PATCH v4 5/9] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-23 Thread Stefano Stabellini
On Thu, 23 May 2024, Julien Grall wrote:
> Hi Henry,
> 
> On 23/05/2024 08:40, Henry Wang wrote:
> > In order to support the dynamic dtbo device assignment to a running
> > VM, the add/remove of the DT overlay and the attach/detach of the
> > device from the DT overlay should happen separately. Therefore,
> > repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
> > overlay to Xen device tree
> 
> I think it would be worth mentioning in the commit message why changing the
> sysctl behavior is fine. The feature is experimental and therefore breaking
> compatibility is ok.

Added


> > , instead of assigning the device to the
> > hardware domain at the same time. Add the XEN_DOMCTL_dt_overlay with
> > operations XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment
> > to the domain.
> > 
> > The hypervisor firstly checks the DT overlay passed from the toolstack
> > is valid. Then the device nodes are retrieved from the overlay tracker
> > based on the DT overlay. The attach of the device is implemented by
> > mapping the IRQ and IOMMU resources.
> 
> So, the expectation is the user will always want to attach all the devices in
> the overlay to a single domain. Is that correct?

Yes, also added to the commit message

> > 
> > Signed-off-by: Henry Wang 
> > Signed-off-by: Vikram Garhwal 
> > ---
> > v4:
> > - Split the original patch, only do the device attachment.
> > v3:
> > - Style fixes for arch-selection #ifdefs.
> > - Do not include public/domctl.h, only add a forward declaration of
> >struct xen_domctl_dt_overlay.
> > - Extract the overlay track entry finding logic to a function, drop
> >the unused variables.
> > - Use op code 1&2 for XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH}.
> > v2:
> > - New patch.
> > ---
> >   xen/arch/arm/domctl.c|   3 +
> >   xen/common/dt-overlay.c  | 199 ++-
> >   xen/include/public/domctl.h  |  14 +++
> >   xen/include/public/sysctl.h  |  11 +-
> >   xen/include/xen/dt-overlay.h |   7 ++
> >   5 files changed, 176 insertions(+), 58 deletions(-)
> > 
> > diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
> > index ad56efb0f5..12a12ee781 100644
> > --- a/xen/arch/arm/domctl.c
> > +++ b/xen/arch/arm/domctl.c
> > @@ -5,6 +5,7 @@
> >* Copyright (c) 2012, Citrix Systems
> >*/
> >   +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct
> > domain *d,
> > return rc;
> >   }
> > +case XEN_DOMCTL_dt_overlay:
> > +return dt_overlay_domctl(d, &domctl->u.dt_overlay);
> >   default:
> >   return subarch_do_domctl(domctl, d, u_domctl);
> >   }
> > diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
> > index 9cece79067..1087f9b502 100644
> > --- a/xen/common/dt-overlay.c
> > +++ b/xen/common/dt-overlay.c
> > @@ -356,6 +356,42 @@ static int overlay_get_nodes_info(const void *fdto,
> > char **nodes_full_path)
> >   return 0;
> >   }
> >   +/* This function should be called with the overlay_lock taken */
> > +static struct overlay_track *
> > +find_track_entry_from_tracker(const void *overlay_fdt,
> > +  uint32_t overlay_fdt_size)
> > +{
> > +struct overlay_track *entry, *temp;
> > +bool found_entry = false;
> > +
> > +ASSERT(spin_is_locked(&overlay_lock));
> > +
> > +/*
> > + * First check if dtbo is correct i.e. it should one of the dtbo which
> > was
> > + * used when dynamically adding the node.
> > + * Limitation: Cases with same node names but different property are
> > not
> > + * supported currently. We are relying on user to provide the same dtbo
> > + * as it was used when adding the nodes.
> > + */
> > +list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
> > +{
> > +if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0
> > )
> > +{
> > +found_entry = true;
> > +break;
> > +}
> > +}
> > +
> > +if ( !found_entry )
> > +{
> > +printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
> > +   " Operation is supported only for prior added dtbo.\n");
> > +return NULL;
> > +}
> > +
> > +return entry;
> > +}
> > +
> >   /* Check if node itself can be removed and remove node from IOMMU. */
> >   static int remove_node_resources(struct dt_device_node *device_node)
> >   {
> > @@ -485,8 +521,7 @@ static long handle_remove_overlay_nodes(const void
> > *overlay_fdt,
> >   uint32_t overlay_fdt_size)
> >   {
> >   int rc;
> > -struct overlay_track *entry, *temp, *track;
> > -bool found_entry = false;
> > +struct overlay_track *entry;
> > rc = check_overlay_fdt(overlay_fdt, overlay_fdt_size);
> >   if ( rc )
> > @@ -494,29 +529,10 @@ static long handle_remove_overlay_nodes(const void
> > *overlay_fdt,
> > 

Re: [PATCH v4 9/9] docs: Add device tree overlay documentation

2024-05-23 Thread Stefano Stabellini
On Thu, 23 May 2024, Julien Grall wrote:
> Hi Henry,
> 
> On 23/05/2024 08:40, Henry Wang wrote:
> > From: Vikram Garhwal 
> > 
> > Signed-off-by: Vikram Garhwal 
> > Signed-off-by: Stefano Stabellini 
> > Signed-off-by: Henry Wang 
> > ---
> > v4:
> > - No change.
> > v3:
> > - No change.
> > v2:
> > - Update the content based on the changes in this version.
> > ---
> >   docs/misc/arm/overlay.txt | 99 +++
> >   1 file changed, 99 insertions(+)
> >   create mode 100644 docs/misc/arm/overlay.txt
> > 
> > diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
> > new file mode 100644
> > index 00..811a6de369
> > --- /dev/null
> > +++ b/docs/misc/arm/overlay.txt
> > @@ -0,0 +1,99 @@
> > +# Device Tree Overlays support in Xen
> > +
> > +Xen now supports dynamic device assignment to running domains,
> 
> This reads as we "support" the feature. I would prefer if we write "Xen
> experimentally supports..." or similar.

Done


> > +i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
> > +attaching/detaching them to/from a running domain with given $domid.
> > +
> > +Dynamic node assignment works in two steps:
> > +
> > +## Add/Remove device tree overlay to/from Xen device tree
> > +
> > +1. Xen tools check the dtbo given and parse all other user provided
> > arguments
> > +2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
> > +3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
> > +
> > +## Attach/Detach device from the DT overlay to/from domain
> > +
> > +1. Xen tools check the dtbo given and parse all other user provided
> > arguments
> > +2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
> > +3. Xen hypervisor attach/detach the device to/from the user-provided $domid
> > by
> > +   mapping/unmapping node resources in the DT overlay.
> > +
> > +# Examples
> > +
> > +Here are a few examples on how to use it.
> > +
> > +## Dom0 device add
> > +
> > +For assigning a device tree overlay to Dom0, user should firstly properly
> > +prepare the DT overlay. More information about device tree overlays can be
> > +found in [1]. Then, in Dom0, enter the following:
> > +
> > +(dom0) xl dt-overlay add overlay.dtbo
> > +
> > +This will allocate the devices mentioned in overlay.dtbo to Xen device
> > tree.
> > +
> > +To assign the newly added device from the dtbo to Dom0:
> > +
> > +(dom0) xl dt-overlay attach overlay.dtbo 0
> > +
> > +Next, if the user wants to add the same device tree overlay to dom0
> > +Linux, execute the following:
> > +
> > +(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
> > +(dom0) cat overlay.dtbo >
> > /sys/kernel/config/device-tree/overlays/new_overlay/dtbo
> > +
> > +Finally, if needed, the relevant Linux kernel driver can be loaded using:
> > +
> > +(dom0) modprobe module_name.ko
> > +
> > +## Dom0 device remove
> > +
> > +For removing the device from Dom0, first detach the device from Dom0:
> > +
> > +(dom0) xl dt-overlay detach overlay.dtbo 0
> > +
> > +NOTE: The user is expected to unload any Linux kernel modules which
> > +might be accessing the devices in overlay.dtbo before detaching the device.
> > +Detaching devices without unloading the modules might result in a crash.
> > +
> > +Then remove the overlay from Xen device tree:
> > +
> > +(dom0) xl dt-overlay remove overlay.dtbo
> > +
> > +## DomU device add/remove
> > +
> > +All the nodes in dtbo will be assigned to a domain; the user will need
> > +to prepare the dtb for the domU. For example, the `interrupt-parent`
> > property
> > +of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
> > +Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
> > +
> > +User will need to create the DomU with below properties properly configured
> > +in the xl config file:
> > +- `iomem`
> 
> I don't quite understand how the user can specify the MMIO region if the
> device is attached after the domain is created.

I think this was meant for a domain about to be created (not already
running). I clarified.


> 
> > +- `passthrough` (if IOMMU is needed)
> > +
> > +User will also need to modprobe the relevant drivers.
> > +
> > +Example for domU device add:
> > +
> > +(dom0) xl dt-overlay add overlay.dtbo# If not executed
> > before
> > +(dom0) xl dt-overlay attach overlay.dtbo $domid
> 
> Can you clarify how the MMIO will be mapped? Is it direct mapped? If so,
> couldn't this result to clash with other part of the address space (e.g.
> RAM?).

Yes, it is reusing the same code as dom0, which makes the code nice but
it doesn't support non-1:1 mappings. I think those should be done via
the xen,reg property. My suggestion would be this:

- if xen,reg is present, use it
- if xen,reg is not present, fall back to 1:1 mapping based on reg

For the next version of the series, I'd just document the current
limitation of the implementation. I added this to patch 
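
To make that fallback concrete, the attach path could decide along these
lines (a rough sketch only; the two mapping helpers are placeholders, not
existing Xen functions, and d/node/rc are assumed from the surrounding code):

    /* Prefer an explicit guest mapping from "xen,reg", else fall back to 1:1. */
    if ( dt_get_property(node, "xen,reg", NULL) != NULL )
        rc = map_node_using_xen_reg(d, node);   /* gfn taken from xen,reg triplets */
    else
        rc = map_node_identity(d, node);        /* gfn == mfn, based on "reg" */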

Re: [PATCH v4 8/9] tools: Introduce the "xl dt-overlay {attach,detach}" commands

2024-05-23 Thread Stefano Stabellini
On Fri, 24 May 2024, Julien Grall wrote:
> Hi Henry,
> 
> On 23/05/2024 08:40, Henry Wang wrote:
> > With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
> > attach/detach devices from the provided DT overlay to domains.
> > Support this by introducing a new set of "xl dt-overlay" commands and
> > related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly
> > rework the command option parsing logic.
> > 
> > Signed-off-by: Henry Wang 
> > Reviewed-by: Jason Andryuk 

Reviewed-by: Stefano Stabellini 


> > ---
> > v4:
> > - Add Jason's Reviewed-by tag.
> > v3:
> > - Introduce new API libxl_dt_overlay_domain() and co., instead of
> >reusing existing API libxl_dt_overlay().
> > - Add in-code comments for the LIBXL_DT_OVERLAY_* macros.
> > - Use find_domain() to avoid getting domain_id from strtol().
> > v2:
> > - New patch.
> > ---
> >   tools/include/libxl.h   | 10 +++
> >   tools/include/xenctrl.h |  3 +++
> >   tools/libs/ctrl/xc_dt_overlay.c | 31 +
> >   tools/libs/light/libxl_dt_overlay.c | 28 +++
> >   tools/xl/xl_cmdtable.c  |  4 +--
> >   tools/xl/xl_vmcontrol.c | 42 -
> >   6 files changed, 104 insertions(+), 14 deletions(-)
> > 
> > diff --git a/tools/include/libxl.h b/tools/include/libxl.h
> > index 62cb07dea6..6cc6d6bf6a 100644
> > --- a/tools/include/libxl.h
> > +++ b/tools/include/libxl.h
> 
> I think you also need to introduce LIBXL_HAVE_...

Added

I have removed the LIBXL_DT_OVERLAY_DOMAIN_DETACH and the related
mentions. I kept Jason's ack.



[PATCH v5 7/7] docs: Add device tree overlay documentation

2024-05-23 Thread Stefano Stabellini
From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
 docs/misc/arm/overlay.txt | 82 +++
 1 file changed, 82 insertions(+)
 create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..0a2dee951a
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,82 @@
+# Device Tree Overlays support in Xen
+
+Xen experimentally supports dynamic device assignment to running
+domains, i.e. adding/removing nodes (using .dtbo) to/from Xen device
+tree, and attaching them to a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach device from the DT overlay to domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches the device to the user-provided $domid by
+   mapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, the user should first properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will allocate the devices mentioned in overlay.dtbo to Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > 
/sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent`
+property of the DomU overlay should be changed to the Xen hardcoded
+value `0xfde8`, and the xen,reg property should be added to specify the
+address mappings. If xen,reg is not present, a 1:1 mapping is assumed.
+Below assumes the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+For new domains to be created, the user will need to create the DomU
+with below properties properly configured in the xl config file:
+- `iomem`
+- `passthrough` (if IOMMU is needed)
+
+User will also need to modprobe the relevant drivers. For already
+running domains, the user can use the xl dt-overlay attach command,
+example:
+
+(dom0) xl dt-overlay add overlay.dtbo# If not executed before
+(dom0) xl dt-overlay attach overlay.dtbo $domid
+(dom0) xl console $domid # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > 
/sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+[1] https://www.kernel.org/doc/Documentation/devicetree/overlay-notes.txt
-- 
2.25.1




[PATCH v5 1/7] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-23 Thread Stefano Stabellini
From: Henry Wang 

Fix the name mismatch in the xl dt-overlay command, the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.

Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree 
overlay support")
Suggested-by: Anthony PERARD 
Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
Reviewed-by: Stefano Stabellini 
---
 tools/xl/xl_cmdtable.c  | 2 +-
 tools/xl/xl_vmcontrol.c | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 62bdb2aeaa..1f3c6b5897 100644
--- a/tools/xl/xl_cmdtable.c
+++ b/tools/xl/xl_cmdtable.c
@@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
 { "dt-overlay",
      &main_dt_overlay, 0, 1,
   "Add/Remove a device tree overlay",
-  "add/remove <.dtbo>"
+  "add/remove <.dtbo>",
   "-h print this help\n"
 },
 #endif
diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
index 98f6bd2e76..02575d5d36 100644
--- a/tools/xl/xl_vmcontrol.c
+++ b/tools/xl/xl_vmcontrol.c
@@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
 const int overlay_remove_op = 2;
 
 if (argc < 2) {
-help("dt_overlay");
+help("dt-overlay");
 return EXIT_FAILURE;
 }
 
@@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
 fprintf(stderr, "failed to read the overlay device tree file %s\n",
 overlay_config_file);
 free(overlay_dtb);
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 } else {
 fprintf(stderr, "overlay dtbo file not provided\n");
-return ERROR_FAIL;
+return EXIT_FAILURE;
 }
 
 rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
-- 
2.25.1




[PATCH v5 5/7] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-23 Thread Stefano Stabellini
From: Henry Wang 

In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. Changing the sysctl behavior and breaking
compatibility is acceptable because this feature is still experimental.

Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.

The hypervisor firstly checks the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.

Also take the opportunity to make one coding style fix in sysctl.h.

xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
---
 xen/arch/arm/domctl.c|   3 +
 xen/common/dt-overlay.c  | 207 ++-
 xen/include/public/domctl.h  |  16 ++-
 xen/include/public/sysctl.h  |  11 +-
 xen/include/xen/dt-overlay.h |   8 ++
 5 files changed, 186 insertions(+), 59 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2012, Citrix Systems
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct 
domain *d,
 
 return rc;
 }
+case XEN_DOMCTL_dt_overlay:
+return dt_overlay_domctl(d, &domctl->u.dt_overlay);
 default:
 return subarch_do_domctl(domctl, d, u_domctl);
 }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..c2b03865a7 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -356,6 +356,42 @@ static int overlay_get_nodes_info(const void *fdto, char 
**nodes_full_path)
 return 0;
 }
 
+/* This function should be called with the overlay_lock taken */
+static struct overlay_track *
+find_track_entry_from_tracker(const void *overlay_fdt,
+  uint32_t overlay_fdt_size)
+{
+struct overlay_track *entry, *temp;
+bool found_entry = false;
+
+ASSERT(spin_is_locked(&overlay_lock));
+
+/*
+ * First check if dtbo is correct i.e. it should one of the dtbo which was
+ * used when dynamically adding the node.
+ * Limitation: Cases with same node names but different property are not
+ * supported currently. We are relying on user to provide the same dtbo
+ * as it was used when adding the nodes.
+ */
+list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
+{
+if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+{
+found_entry = true;
+break;
+}
+}
+
+if ( !found_entry )
+{
+printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
+   " Operation is supported only for prior added dtbo.\n");
+return NULL;
+}
+
+return entry;
+}
+
 /* Check if node itself can be removed and remove node from IOMMU. */
 static int remove_node_resources(struct dt_device_node *device_node)
 {
@@ -485,8 +521,7 @@ static long handle_remove_overlay_nodes(const void 
*overlay_fdt,
 uint32_t overlay_fdt_size)
 {
 int rc;
-struct overlay_track *entry, *temp, *track;
-bool found_entry = false;
+struct overlay_track *entry;
 
 rc = check_overlay_fdt(overlay_fdt, overlay_fdt_size);
 if ( rc )
@@ -494,29 +529,10 @@ static long handle_remove_overlay_nodes(const void 
*overlay_fdt,
 
 spin_lock(&overlay_lock);
 
-/*
- * First check if dtbo is correct i.e. it should one of the dtbo which was
- * used when dynamically adding the node.
- * Limitation: Cases with same node names but different property are not
- * supported currently. We are relying on user to provide the same dtbo
- * as it was used when adding the nodes.
- */
-list_for_each_entry_safe( entry, temp, &overlay_tracker, entry )
-{
-if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
-{
-track = entry;
-found_entry = true;
-break;
-}
-}
-
-if ( !found_entry )
+entry = find_track_entry_from_tracker(overlay_fdt, overlay_fdt_size);
+if ( entry == NULL )
 {
 rc = -EINVAL;
-
-printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
-   " Removing nodes is supported only for prior added dtbo.\n");
 goto out;
 
 }
@@ 

[PATCH v5 4/7] xen/arm/gic: Allow adding interrupt to running VMs

2024-05-23 Thread Stefano Stabellini
From: Henry Wang 

Currently, adding physical interrupts is only allowed at
the domain creation time. For use cases such as dynamic device
tree overlay addition, the adding of physical IRQ to
running domains should be allowed.

Drop the above-mentioned domain creation check. Since this can leave
the interrupt state out of sync when the interrupt is active or
pending in the guest, simply reject the operation in these cases. Do
it for both the new and old vGIC implementations.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Julien Grall 
---
 xen/arch/arm/gic-vgic.c  | 9 +++--
 xen/arch/arm/gic.c   | 8 
 xen/arch/arm/vgic/vgic.c | 7 +--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/gic-vgic.c b/xen/arch/arm/gic-vgic.c
index 56490dbc43..b99e287224 100644
--- a/xen/arch/arm/gic-vgic.c
+++ b/xen/arch/arm/gic-vgic.c
@@ -442,9 +442,14 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu *v, 
unsigned int virq,
 
 if ( connect )
 {
-/* The VIRQ should not be already enabled by the guest */
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest.
+ */
 if ( !p->desc &&
- !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) )
+ !test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) &&
+ !test_bit(GIC_IRQ_GUEST_ACTIVE, &p->status) )
 p->desc = desc;
 else
 ret = -EBUSY;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 44c40e86de..b3467a76ae 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -135,14 +135,6 @@ int gic_route_irq_to_guest(struct domain *d, unsigned int 
virq,
 ASSERT(virq < vgic_num_irqs(d));
 ASSERT(!is_lpi(virq));
 
-/*
- * When routing an IRQ to guest, the virtual state is not synced
- * back to the physical IRQ. To prevent get unsync, restrict the
- * routing to when the Domain is been created.
- */
-if ( d->creation_finished )
-return -EBUSY;
-
 ret = vgic_connect_hw_irq(d, NULL, virq, desc, true);
 if ( ret )
 return ret;
diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..6cabd0496d 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu 
*vcpu,
 
 if ( connect )  /* assign a mapped IRQ */
 {
-/* The VIRQ should not be already enabled by the guest */
-if ( !irq->hw && !irq->enabled )
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest.
+ */
+if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
 {
 irq->hw = true;
 irq->hwintid = desc->irq;
-- 
2.25.1




[PATCH v5 6/7] tools: Introduce the "xl dt-overlay attach" command

2024-05-23 Thread Stefano Stabellini
From: Henry Wang 

With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach". Slightly rework
the command option parsing logic.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Jason Andryuk 
Reviewed-by: Stefano Stabellini 
---
 tools/include/libxl.h   | 15 +++
 tools/include/xenctrl.h |  3 +++
 tools/libs/ctrl/xc_dt_overlay.c | 31 +++
 tools/libs/light/libxl_dt_overlay.c | 28 +
 tools/xl/xl_cmdtable.c  |  4 +--
 tools/xl/xl_vmcontrol.c | 39 -
 6 files changed, 106 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 3b5c18b48b..f2e19ec592 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -643,6 +643,12 @@
  */
 #define LIBXL_HAVE_NR_SPIS 1
 
+/*
+ * LIBXL_HAVE_OVERLAY_DOMAIN indicates the presence of
+ * libxl_dt_overlay_domain.
+ */
+#define LIBXL_HAVE_OVERLAY_DOMAIN 1
+
 /*
  * libxl memory management
  *
@@ -2556,8 +2562,17 @@ libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, 
uint32_t domid,
 void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
 #if defined(__arm__) || defined(__aarch64__)
+/* Values should keep consistent with the op from XEN_SYSCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_ADD   1
+#define LIBXL_DT_OVERLAY_REMOVE2
 int libxl_dt_overlay(libxl_ctx *ctx, void *overlay,
  uint32_t overlay_size, uint8_t overlay_op);
+
+/* Values should keep consistent with the op from XEN_DOMCTL_dt_overlay */
+#define LIBXL_DT_OVERLAY_DOMAIN_ATTACH 1
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op);
 #endif
 
 /*
diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 4996855944..9ceca0cffc 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -2657,6 +2657,9 @@ int xc_domain_cacheflush(xc_interface *xch, uint32_t 
domid,
 #if defined(__arm__) || defined(__aarch64__)
 int xc_dt_overlay(xc_interface *xch, void *overlay_fdt,
   uint32_t overlay_fdt_size, uint8_t overlay_op);
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id);
 #endif
 
 /* Compat shims */
diff --git a/tools/libs/ctrl/xc_dt_overlay.c b/tools/libs/ctrl/xc_dt_overlay.c
index c2224c4d15..ea1da522d1 100644
--- a/tools/libs/ctrl/xc_dt_overlay.c
+++ b/tools/libs/ctrl/xc_dt_overlay.c
@@ -48,3 +48,34 @@ err:
 
 return err;
 }
+
+int xc_dt_overlay_domain(xc_interface *xch, void *overlay_fdt,
+ uint32_t overlay_fdt_size, uint8_t overlay_op,
+ uint32_t domain_id)
+{
+int err;
+struct xen_domctl domctl = {
+.cmd = XEN_DOMCTL_dt_overlay,
+.domain = domain_id,
+.u.dt_overlay = {
+.overlay_op = overlay_op,
+.overlay_fdt_size = overlay_fdt_size,
+}
+};
+
+DECLARE_HYPERCALL_BOUNCE(overlay_fdt, overlay_fdt_size,
+ XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( (err = xc_hypercall_bounce_pre(xch, overlay_fdt)) )
+goto err;
+
+set_xen_guest_handle(domctl.u.dt_overlay.overlay_fdt, overlay_fdt);
+
+if ( (err = do_domctl(xch, &domctl)) != 0 )
+PERROR("%s failed", __func__);
+
+err:
+xc_hypercall_bounce_post(xch, overlay_fdt);
+
+return err;
+}
diff --git a/tools/libs/light/libxl_dt_overlay.c 
b/tools/libs/light/libxl_dt_overlay.c
index a6c709a6dc..00503b76bd 100644
--- a/tools/libs/light/libxl_dt_overlay.c
+++ b/tools/libs/light/libxl_dt_overlay.c
@@ -69,3 +69,31 @@ out:
 return rc;
 }
 
+int libxl_dt_overlay_domain(libxl_ctx *ctx, uint32_t domain_id,
+void *overlay_dt, uint32_t overlay_dt_size,
+uint8_t overlay_op)
+{
+int rc;
+int r;
+GC_INIT(ctx);
+
+if (check_overlay_fdt(gc, overlay_dt, overlay_dt_size)) {
+LOG(ERROR, "Overlay DTB check failed");
+rc = ERROR_FAIL;
+goto out;
+} else {
+LOG(DEBUG, "Overlay DTB check passed");
+rc = 0;
+}
+
+r = xc_dt_overlay_domain(ctx->xch, overlay_dt, overlay_dt_size, overlay_op,
+ domain_id);
+if (r) {
+LOG(ERROR, "%s: Attaching/Detaching overlay dtb failed.", __func__);
+rc = ERROR_FAIL;
+}
+
+out:
+GC_FREE;
+return rc;
+}
diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
index 1f3c6b5897..42751228c1 100644
--- a/tools/xl/xl_cmdtable.c
+++ 
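
For context, a minimal caller of the new libxl API added above might look like
this (a sketch only: reading overlay.dtbo into memory and the usual xl option
plumbing are omitted, and <stdio.h>/<libxl.h> plus an initialised libxl_ctx
are assumed):

    uint8_t *overlay_dtb;       /* buffer holding the contents of overlay.dtbo */
    uint32_t overlay_dtb_size;  /* size of that buffer in bytes */
    uint32_t domid;             /* target domain, e.g. resolved via find_domain() */
    int rc;

    rc = libxl_dt_overlay_domain(ctx, domid, overlay_dtb, overlay_dtb_size,
                                 LIBXL_DT_OVERLAY_DOMAIN_ATTACH);
    if (rc)
        fprintf(stderr, "attaching overlay to domain %u failed\n", domid);

This mirrors what xl's main_dt_overlay() handling is expected to do for the
"attach" sub-command.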

[PATCH v5 3/7] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-23 Thread Stefano Stabellini
From: Henry Wang 

Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for the
user to decide the number of SPIs. This would help to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang 
Signed-off-by: Stefano Stabellini 
Reviewed-by: Jason Andryuk 
---
 docs/man/xl.cfg.5.pod.in | 16 
 tools/golang/xenlight/helpers.gen.go |  2 ++
 tools/golang/xenlight/types.gen.go   |  1 +
 tools/include/libxl.h|  7 +++
 tools/libs/light/libxl_arm.c |  4 ++--
 tools/libs/light/libxl_types.idl |  1 +
 tools/xl/xl_parse.c  |  3 +++
 7 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..ac3f88fd57 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,22 @@ raised.
 
 =back
 
+=over 4
+
+=item B
+
+An optional integer parameter specifying the number of SPIs (Shared
+Peripheral Interrupts) to allocate for the domain. Max is 991 SPIs. If
+the value specified by the `nr_spis` parameter is smaller than the
+number of SPIs calculated by the toolstack based on the devices
+allocated for the domain, or the `nr_spis` parameter is not specified,
+the value calculated by the toolstack will be used for the domain.
+Otherwise, the value specified by the `nr_spis` parameter will be used.
+The number of SPIs should match the highest interrupt ID that will be
+assigned to the domain.
+
+=back
+
 =head3 x86
 
 =over 4
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index b9cb5b33c7..fe5110474d 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
if err := x.ArchX86.MsrRelaxed.fromC(&xc.arch_x86.msr_relaxed);err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
if err := x.ArchX86.MsrRelaxed.toC(&xc.arch_x86.msr_relaxed); err != nil {
 return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 5b293755d7..c9e45b306f 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
 GicVersion GicVersion
 Vuart VuartType
 SveVl SveType
+NrSpis uint32
 }
 ArchX86 struct {
 MsrRelaxed Defbool
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..3b5c18b48b 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -636,6 +636,13 @@
  */
 #define LIBXL_HAVE_XEN_9PFS 1
 
+/*
+ * LIBXL_HAVE_NR_SPIS indicates the presence of the nr_spis field in
+ * libxl_domain_build_info that specifies the number of SPIs interrupts
+ * for the guest.
+ */
+#define LIBXL_HAVE_NR_SPIS 1
+
 /*
  * libxl memory management
  *
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
 
 LOG(DEBUG, "Configure the domain");
 
-config->arch.nr_spis = nr_spis;
-LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);
+LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
 
 switch (d_config->b_info.arch_arm.gic_version) {
 case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 79e9c656cc..4e65e6fda5 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("arch_arm", Struct(None, [("gic_version", libxl_gic_version),
("vuart", libxl_vuart_type),
("sve_vl", libxl_sve_type),
+   ("nr_spis", uint32),
   ])),
 ("arch_x86", Struct(None, [("msr_relaxed", libxl_defbool),
  

[PATCH v5 2/7] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-23 Thread Stefano Stabellini
From: Henry Wang 

There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.

Signed-off-by: Henry Wang 
Reviewed-by: Julien Grall 
---
 docs/misc/arm/device-tree/booting.txt | 16 
 xen/arch/arm/dom0less-build.c | 11 +--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/docs/misc/arm/device-tree/booting.txt 
b/docs/misc/arm/device-tree/booting.txt
index bbd955e9c2..f1fd069c87 100644
--- a/docs/misc/arm/device-tree/booting.txt
+++ b/docs/misc/arm/device-tree/booting.txt
@@ -260,6 +260,22 @@ with the following properties:
 value specified by Xen command line parameter gnttab_max_maptrack_frames
 (or its default value if unspecified, i.e. 1024) is used.
 
+- passthrough
+
+A string property specifying whether IOMMU mappings are enabled for the
+domain and hence whether it will be enabled for passthrough hardware.
+Possible property values are:
+
+- "enabled"
+IOMMU mappings are enabled for the domain. Note that this option is the
+default if the user provides a partial passthrough device tree
+for the domain.
+
+- "disabled"
+IOMMU mappings are disabled for the domain and so hardware may not be
+passed through. This option is the default if this property is missing
+and the user does not provide a partial device tree for the domain.
+
 Under the "xen,domain" compatible node, one or more sub-nodes are present
 for the DomU kernel and ramdisk.
 
diff --git a/xen/arch/arm/dom0less-build.c b/xen/arch/arm/dom0less-build.c
index 74f053c242..5830a7051d 100644
--- a/xen/arch/arm/dom0less-build.c
+++ b/xen/arch/arm/dom0less-build.c
@@ -848,6 +848,8 @@ static int __init construct_domU(struct domain *d,
 void __init create_domUs(void)
 {
 struct dt_device_node *node;
+const char *dom0less_iommu;
+bool iommu = false;
 const struct dt_device_node *cpupool_node,
 *chosen = dt_find_node_by_path("/chosen");
 
@@ -895,8 +897,13 @@ void __init create_domUs(void)
 panic("Missing property 'cpus' for domain %s\n",
   dt_node_name(node));
 
-if ( dt_find_compatible_node(node, NULL, "multiboot,device-tree") &&
- iommu_enabled )
+if ( !dt_property_read_string(node, "passthrough", &dom0less_iommu) &&
+ !strcmp(dom0less_iommu, "enabled") )
+iommu = true;
+
+if ( iommu_enabled &&
+ (iommu || dt_find_compatible_node(node, NULL,
+   "multiboot,device-tree")) )
 d_cfg.flags |= XEN_DOMCTL_CDF_iommu;
 
 if ( !dt_property_read_u32(node, "nr_spis", &d_cfg.arch.nr_spis) )
-- 
2.25.1




[PATCH v5 0/7] Remaining patches for dynamic node programming using overlay dtbo

2024-05-23 Thread Stefano Stabellini
Hi all,

This is the remaining series for the full functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series, the first part has already made
Xen aware of new device tree node which means updating the dt_host
with overlay node information, and in this series, the goal is to
map IRQ and IOMMU during runtime, where we will do the actual IOMMU
and IRQ mapping and unmapping to a running domain. Also, documentation
of the "dynamic node programming using overlay dtbo" feature is added.

During the discussion in v3, I was recommended to split the overlay
devices attach/detach to/from running domains to separated patches [3].
But I decided to only expose the xl user interfaces together to the
users after device attach/detach is fully functional, so I didn't
split the toolstack patch (#8).

Patch 1 is a fix of the existing code which is noticed during my local
tests, details please see the commit message.

Gitlab CI for this series can be found in [2].

[1] 
https://lore.kernel.org/xen-devel/20230906011631.30310-1-vikram.garh...@amd.com/
[2] https://gitlab.com/xen-project/people/henryw/xen/-/pipelines/1301720278
[3] 
https://lore.kernel.org/xen-devel/e743d3d2-5884-4e55-8627-85985ba33...@amd.com/


Changes in v5:
- address Julien's comments
- remove patches and mentions of the "detach" operation
- add a check for xen,reg and return error if present

- Stefano




Re: [PATCH v4 1/9] tools/xl: Correct the help information and exit code of the dt-overlay command

2024-05-23 Thread Stefano Stabellini
On Thu, 23 May 2024, Henry Wang wrote:
> Fix the name mismatch in the xl dt-overlay command, the
> command name should be "dt-overlay" instead of "dt_overlay".
> Add the missing "," in the cmdtable.
> 
> Fix the exit code of the dt-overlay command, use EXIT_FAILURE
> instead of ERROR_FAIL.
> 
> Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree 
> overlay support")
> Suggested-by: Anthony PERARD 
> Signed-off-by: Henry Wang 
> Reviewed-by: Jason Andryuk 

Reviewed-by: Stefano Stabellini 


> ---
> v4:
> - No change.
> v3:
> - Add Jason's Reviewed-by tag.
> v2:
> - New patch
> ---
>  tools/xl/xl_cmdtable.c  | 2 +-
>  tools/xl/xl_vmcontrol.c | 6 +++---
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/xl/xl_cmdtable.c b/tools/xl/xl_cmdtable.c
> index 62bdb2aeaa..1f3c6b5897 100644
> --- a/tools/xl/xl_cmdtable.c
> +++ b/tools/xl/xl_cmdtable.c
> @@ -635,7 +635,7 @@ const struct cmd_spec cmd_table[] = {
>  { "dt-overlay",
>    &main_dt_overlay, 0, 1,
>"Add/Remove a device tree overlay",
> -  "add/remove <.dtbo>"
> +  "add/remove <.dtbo>",
>"-h print this help\n"
>  },
>  #endif
> diff --git a/tools/xl/xl_vmcontrol.c b/tools/xl/xl_vmcontrol.c
> index 98f6bd2e76..02575d5d36 100644
> --- a/tools/xl/xl_vmcontrol.c
> +++ b/tools/xl/xl_vmcontrol.c
> @@ -1278,7 +1278,7 @@ int main_dt_overlay(int argc, char **argv)
>  const int overlay_remove_op = 2;
>  
>  if (argc < 2) {
> -help("dt_overlay");
> +help("dt-overlay");
>  return EXIT_FAILURE;
>  }
>  
> @@ -1302,11 +1302,11 @@ int main_dt_overlay(int argc, char **argv)
>  fprintf(stderr, "failed to read the overlay device tree file 
> %s\n",
>  overlay_config_file);
>  free(overlay_dtb);
> -return ERROR_FAIL;
> +return EXIT_FAILURE;
>  }
>  } else {
>  fprintf(stderr, "overlay dtbo file not provided\n");
> -return ERROR_FAIL;
> +return EXIT_FAILURE;
>  }
>  
>  rc = libxl_dt_overlay(ctx, overlay_dtb, overlay_dtb_size, op);
> -- 
> 2.34.1
> 
> 



[xen-4.17-testing test] 186109: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186109 xen-4.17-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186109/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-arm64-pvops 6 kernel-build fail REGR. vs. 185864
 build-amd64   6 xen-build  fail in 186087 REGR. vs. 185864
 build-amd64-xsm   6 xen-build  fail in 186087 REGR. vs. 185864
 build-i386    6 xen-build  fail in 186087 REGR. vs. 185864
 build-amd64-prev  6 xen-build  fail in 186087 REGR. vs. 185864
 build-i386-xsm6 xen-build  fail in 186087 REGR. vs. 185864
 build-i386-prev   6 xen-build  fail in 186087 REGR. vs. 185864

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-credit1  10 host-ping-check-xen  fail pass in 186087
 test-armhf-armhf-xl   8 xen-boot   fail pass in 186087

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked in 186087 n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked in 186087 n/a
 test-xtf-amd64-amd64-5   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-migrupgrade  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1)   blocked in 186087 n/a
 build-amd64-libvirt   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 1 build-check(1) blocked in 186087 n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)  blocked in 186087 n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)  blocked in 186087 n/a
 build-i386-libvirt   1 build-check(1)   blocked in 186087 n/a
 test-xtf-amd64-amd64-1   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-pvshim   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64 1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)  blocked in 186087 n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)  blocked in 186087 n/a
 test-amd64-amd64-xl-raw   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm 1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)  blocked in 186087 n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)  blocked in 186087 n/a
 test-xtf-amd64-amd64-4   1 build-check(1)   blocked in 186087 n/a
 test-xtf-amd64-amd64-3   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-xl-shadow   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)  blocked in 186087 n/a
 test-xtf-amd64-amd64-2   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-livepatch   1 build-check(1)   blocked in 186087 n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked in 186087 n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   

Re: [XEN PATCH v2 07/15] x86: guard cpu_has_{svm/vmx} macros with CONFIG_{SVM/VMX}

2024-05-23 Thread Stefano Stabellini
On Thu, 23 May 2024, Jan Beulich wrote:
> On 23.05.2024 15:07, Sergiy Kibrik wrote:
> > 16.05.24 14:12, Jan Beulich:
> >> On 15.05.2024 11:12, Sergiy Kibrik wrote:
> >>> --- a/xen/arch/x86/include/asm/cpufeature.h
> >>> +++ b/xen/arch/x86/include/asm/cpufeature.h
> >>> @@ -81,7 +81,8 @@ static inline bool boot_cpu_has(unsigned int feat)
> >>>   #define cpu_has_sse3    boot_cpu_has(X86_FEATURE_SSE3)
> >>>   #define cpu_has_pclmulqdq   boot_cpu_has(X86_FEATURE_PCLMULQDQ)
> >>>   #define cpu_has_monitor boot_cpu_has(X86_FEATURE_MONITOR)
> >>> -#define cpu_has_vmx boot_cpu_has(X86_FEATURE_VMX)
> >>> +#define cpu_has_vmx ( IS_ENABLED(CONFIG_VMX) && \
> >>> +  boot_cpu_has(X86_FEATURE_VMX))
> >>>   #define cpu_has_eist    boot_cpu_has(X86_FEATURE_EIST)
> >>>   #define cpu_has_ssse3   boot_cpu_has(X86_FEATURE_SSSE3)
> >>>   #define cpu_has_fma boot_cpu_has(X86_FEATURE_FMA)
> >>> @@ -109,7 +110,8 @@ static inline bool boot_cpu_has(unsigned int feat)
> >>>   
> >>>   /* CPUID level 0x8001.ecx */
> >>>   #define cpu_has_cmp_legacy  boot_cpu_has(X86_FEATURE_CMP_LEGACY)
> >>> -#define cpu_has_svm boot_cpu_has(X86_FEATURE_SVM)
> >>> +#define cpu_has_svm ( IS_ENABLED(CONFIG_SVM) && \
> >>> +  boot_cpu_has(X86_FEATURE_SVM))
> >>>   #define cpu_has_sse4a   boot_cpu_has(X86_FEATURE_SSE4A)
> >>>   #define cpu_has_xop boot_cpu_has(X86_FEATURE_XOP)
> >>>   #define cpu_has_skinit  boot_cpu_has(X86_FEATURE_SKINIT)
> >>
> >> Hmm, leaving aside the style issue (stray blanks after opening parentheses,
> >> and as a result one-off indentation on the wrapped lines) I'm not really
> >> certain we can do this. The description goes into detail why we would want
> >> this, but it doesn't cover at all why it is safe for all present (and
> >> ideally also future) uses. I wouldn't be surprised if we had VMX/SVM checks
> >> just to derive further knowledge from that, without them being directly
> >> related to the use of VMX/SVM. Take a look at calculate_hvm_max_policy(),
> >> for example. While it looks to be okay there, it may give you an idea of
> >> what I mean.
> >>
> >> Things might become better separated if instead for such checks we used
> >> host and raw CPU policies instead of cpuinfo_x86.x86_capability[]. But
> >> that's still pretty far out, I'm afraid.
> > 
> > I've followed a suggestion you made for patch in previous series:
> > 
> > https://lore.kernel.org/xen-devel/8fbd604e-5e5d-410c-880f-2ad257bbe...@suse.com/
> 
> See the "If not, ..." that I had put there. Doing the change just mechanically
> isn't enough, you also need to make clear (in the description) that you
> verified it's safe to have this way.

What does it mean to "verified it's safe to have this way"? "Safe" in
what way?


> > yet if this approach can potentially be unsafe (I'm not completely sure 
> > it's safe), should we instead fallback to the way it was done in v1 
> > series? I.e. guard calls to vmx/svm-specific calls where needed, like in 
> > these 3 patches:
> > 
> > 1) 
> > https://lore.kernel.org/xen-devel/20240416063328.3469386-1-sergiy_kib...@epam.com/
> > 
> > 2) 
> > https://lore.kernel.org/xen-devel/20240416063740.3469592-1-sergiy_kib...@epam.com/
> > 
> > 3) 
> > https://lore.kernel.org/xen-devel/20240416063947.3469718-1-sergiy_kib...@epam.com/
> 
> I don't like this sprinkling around of IS_ENABLED() very much. Maybe we want
> to have two new helpers (say using_svm() and using_vmx()), to be used in place
> of most but possibly not all cpu_has_{svm,vmx}? Doing such a transformation
> would then kind of implicitly answer the safety question above, as at every
> use site you'd need to judge whether the replacement is correct. If it's
> correct everywhere, the construct(s) as proposed in this version could then be
> considered to be used in this very shape (instead of introducing the two new
> helpers). But of course the transition could also be done gradually then,
> touching only those uses that previously you touched in 1), 2), and 3).
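
For illustration, such helpers could look roughly like the sketch below. The
names come from the suggestion above; their exact shape and placement are
assumptions, not committed code.

    /* Sketch only: gate use sites on both Kconfig and the CPU feature. */
    static inline bool using_vmx(void)
    {
        return IS_ENABLED(CONFIG_VMX) && boot_cpu_has(X86_FEATURE_VMX);
    }

    static inline bool using_svm(void)
    {
        return IS_ENABLED(CONFIG_SVM) && boot_cpu_has(X86_FEATURE_SVM);
    }

With that, cpu_has_{vmx,svm} could stay as pure hardware-capability checks,
while call sites that actually exercise VMX/SVM code paths switch to
using_vmx()/using_svm() one by one, forcing the per-site safety judgement
asked for above.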




Re: [PATCH v4 8/9] tools: Introduce the "xl dt-overlay {attach,detach}" commands

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach/detach devices from the provided DT overlay to domains.
Support this by introducing a new set of "xl dt-overlay" commands and
related documentation, i.e. "xl dt-overlay {attach,detach}". Slightly
rework the command option parsing logic.

Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
---
v4:
- Add Jason's Reviewed-by tag.
v3:
- Introduce new API libxl_dt_overlay_domain() and co., instead of
   reusing existing API libxl_dt_overlay().
- Add in-code comments for the LIBXL_DT_OVERLAY_* macros.
- Use find_domain() to avoid getting domain_id from strtol().
v2:
- New patch.
---
  tools/include/libxl.h   | 10 +++
  tools/include/xenctrl.h |  3 +++
  tools/libs/ctrl/xc_dt_overlay.c | 31 +
  tools/libs/light/libxl_dt_overlay.c | 28 +++
  tools/xl/xl_cmdtable.c  |  4 +--
  tools/xl/xl_vmcontrol.c | 42 -
  6 files changed, 104 insertions(+), 14 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 62cb07dea6..6cc6d6bf6a 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h


I think you also need to introduce LIBXL_HAVE_...
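
For reference, new libxl APIs are usually advertised to applications via a
LIBXL_HAVE_* guard in libxl.h. A sketch of what that could look like here,
with a hypothetical macro name:

    /*
     * LIBXL_HAVE_DT_OVERLAY_DOMAIN  (name for illustration only)
     *
     * If this is defined, libxl provides libxl_dt_overlay_domain() for
     * attaching/detaching a device tree overlay to/from a domain.
     */
    #define LIBXL_HAVE_DT_OVERLAY_DOMAIN 1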

Cheers,

--
Julien Grall



Re: [PATCH v4 3/9] tools/arm: Introduce the "nr_spis" xl config entry

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This would help to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang 
Reviewed-by: Jason Andryuk 
---
v4:
- Add Jason's Reviewed-by tag.
v3:
- Reword documentation to avoid ambiguity.
v2:
- New patch to replace the original patch in v1:
   "[PATCH 05/15] tools/libs/light: Increase nr_spi to 160"
---
  docs/man/xl.cfg.5.pod.in | 14 ++
  tools/golang/xenlight/helpers.gen.go |  2 ++
  tools/golang/xenlight/types.gen.go   |  1 +
  tools/libs/light/libxl_arm.c |  4 ++--
  tools/libs/light/libxl_types.idl |  1 +
  tools/xl/xl_parse.c  |  3 +++
  6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 8f2b375ce9..416d582844 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -3072,6 +3072,20 @@ raised.
  
  =back
  
+=over 4

+
+=item B
+
+An optional 32-bit integer parameter specifying the number of SPIs (Shared


We can't support that many SPIs :). The limit would be 991 SPIs.


+Peripheral Interrupts) to allocate for the domain. If the value specified by
+the `nr_spis` parameter is smaller than the number of SPIs calculated by the
+toolstack based on the devices allocated for the domain, or the `nr_spis`
+parameter is not specified, the value calculated by the toolstack will be used
+for the domain. Otherwise, the value specified by the `nr_spis` parameter will
+be used.


I think it would be worth mentioning that the number of SPIs should 
match the highest interrupt ID that will be assigned to the domain 
(rather than the number of SPIs planned to be assigned).



+
+=back
+
  =head3 x86
  
  =over 4

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index b9cb5b33c7..fe5110474d 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1154,6 +1154,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
  x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
  x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
  x.ArchArm.SveVl = SveType(xc.arch_arm.sve_vl)
+x.ArchArm.NrSpis = uint32(xc.arch_arm.nr_spis)
  if err := x.ArchX86.MsrRelaxed.fromC(_x86.msr_relaxed);err != nil {
  return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
  }
@@ -1670,6 +1671,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
  xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
  xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
  xc.arch_arm.sve_vl = C.libxl_sve_type(x.ArchArm.SveVl)
+xc.arch_arm.nr_spis = C.uint32_t(x.ArchArm.NrSpis)
  if err := x.ArchX86.MsrRelaxed.toC(_x86.msr_relaxed); err != nil {
  return fmt.Errorf("converting field ArchX86.MsrRelaxed: %v", err)
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 5b293755d7..c9e45b306f 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -597,6 +597,7 @@ ArchArm struct {
  GicVersion GicVersion
  Vuart VuartType
  SveVl SveType
+NrSpis uint32
  }
  ArchX86 struct {
  MsrRelaxed Defbool
diff --git a/tools/libs/light/libxl_arm.c b/tools/libs/light/libxl_arm.c
index 1cb89fa584..a4029e3ac8 100644
--- a/tools/libs/light/libxl_arm.c
+++ b/tools/libs/light/libxl_arm.c
@@ -181,8 +181,8 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
  
  LOG(DEBUG, "Configure the domain");
  
-config->arch.nr_spis = nr_spis;

-LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+config->arch.nr_spis = max(nr_spis, d_config->b_info.arch_arm.nr_spis);


I am not entirely sure about using max(). To me if the user specifies a 
lower limit, then we should throw an error because this is likely an 
indication that the SPIs they will want to assign will clash with the 
emulated ones.


So it would be better to warn at domain creation rather than waiting 
until the IRQs are assigned.


I would like Anthony's opinion on this one. Given he is away this month, 
I guess we could get this patch merged (with other comments addressed) 
and have a follow-up if wanted before 4.19.
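
A sketch of the stricter alternative described above, rejecting a user value
below what the toolstack computed instead of silently taking the maximum
(illustrative only; treating 0 as "not specified" is an assumption):

    if ( d_config->b_info.arch_arm.nr_spis == 0 )
        config->arch.nr_spis = nr_spis;
    else if ( d_config->b_info.arch_arm.nr_spis < nr_spis )
    {
        LOG(ERROR, "nr_spis (%u) is below the %u SPIs needed by the assigned devices",
            d_config->b_info.arch_arm.nr_spis, nr_spis);
        return ERROR_INVAL;
    }
    else
        config->arch.nr_spis = d_config->b_info.arch_arm.nr_spis;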



+LOG(DEBUG, " - Allocate %u SPIs", config->arch.nr_spis);
  
  switch (d_config->b_info.arch_arm.gic_version) {

  case LIBXL_GIC_VERSION_DEFAULT:
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 79e9c656cc..4e65e6fda5 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -722,6 +722,7 

[linux-linus test] 186103: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186103 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186103/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 186052
 build-amd64   6 xen-buildfail REGR. vs. 186052
 build-i3866 xen-buildfail REGR. vs. 186052
 build-i386-xsm6 xen-buildfail REGR. vs. 186052
 build-armhf   6 xen-buildfail REGR. vs. 186052

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-raw   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-bios  1 build-check(1)   blocked  n/a
 test-amd64-amd64-examine-uefi  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-armhf-armhf-examine  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-raw   

Re: [PATCH v4 9/9] docs: Add device tree overlay documentation

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

From: Vikram Garhwal 

Signed-off-by: Vikram Garhwal 
Signed-off-by: Stefano Stabellini 
Signed-off-by: Henry Wang 
---
v4:
- No change.
v3:
- No change.
v2:
- Update the content based on the changes in this version.
---
  docs/misc/arm/overlay.txt | 99 +++
  1 file changed, 99 insertions(+)
  create mode 100644 docs/misc/arm/overlay.txt

diff --git a/docs/misc/arm/overlay.txt b/docs/misc/arm/overlay.txt
new file mode 100644
index 00..811a6de369
--- /dev/null
+++ b/docs/misc/arm/overlay.txt
@@ -0,0 +1,99 @@
+# Device Tree Overlays support in Xen
+
+Xen now supports dynamic device assignment to running domains,


This reads as we "support" the feature. I would prefer if we write "Xen 
expirementally supports..." or similar.



+i.e. adding/removing nodes (using .dtbo) to/from Xen device tree, and
+attaching/detaching them to/from a running domain with given $domid.
+
+Dynamic node assignment works in two steps:
+
+## Add/Remove device tree overlay to/from Xen device tree
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor applies/removes the dtbo to/from Xen device tree.
+
+## Attach/Detach device from the DT overlay to/from domain
+
+1. Xen tools check the dtbo given and parse all other user provided arguments
+2. Xen tools pass the dtbo to Xen hypervisor via hypercall.
+3. Xen hypervisor attaches/detaches the device to/from the user-provided $domid by
+   mapping/unmapping node resources in the DT overlay.
+
+# Examples
+
+Here are a few examples on how to use it.
+
+## Dom0 device add
+
+For assigning a device tree overlay to Dom0, the user should first properly
+prepare the DT overlay. More information about device tree overlays can be
+found in [1]. Then, in Dom0, enter the following:
+
+(dom0) xl dt-overlay add overlay.dtbo
+
+This will add the devices mentioned in overlay.dtbo to the Xen device tree.
+
+To assign the newly added device from the dtbo to Dom0:
+
+(dom0) xl dt-overlay attach overlay.dtbo 0
+
+Next, if the user wants to add the same device tree overlay to dom0
+Linux, execute the following:
+
+(dom0) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(dom0) cat overlay.dtbo > 
/sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be loaded using:
+
+(dom0) modprobe module_name.ko
+
+## Dom0 device remove
+
+For removing the device from Dom0, first detach the device from Dom0:
+
+(dom0) xl dt-overlay detach overlay.dtbo 0
+
+NOTE: The user is expected to unload any Linux kernel modules which
+might be accessing the devices in overlay.dtbo before detaching the device.
+Detaching devices without unloading the modules might result in a crash.
+
+Then remove the overlay from Xen device tree:
+
+(dom0) xl dt-overlay remove overlay.dtbo
+
+## DomU device add/remove
+
+All the nodes in dtbo will be assigned to a domain; the user will need
+to prepare the dtb for the domU. For example, the `interrupt-parent` property
+of the DomU overlay should be changed to the Xen hardcoded value `0xfde8`.
+The examples below assume the properly written DomU dtbo is `overlay_domu.dtbo`.
+
+The user will need to create the DomU with the following properties properly
+configured in the xl config file:
+- `iomem`


I don't quite understand how the user can specify the MMIO region if the 
device is attached after the domain is created.



+- `passthrough` (if IOMMU is needed)
+
+The user will also need to modprobe the relevant drivers.
+
+Example for domU device add:
+
+(dom0) xl dt-overlay add overlay.dtbo# If not executed before
+(dom0) xl dt-overlay attach overlay.dtbo $domid


Can you clarify how the MMIO will be mapped? Is it direct mapped? If so, 
couldn't this result in a clash with another part of the address space (e.g. 
RAM)?



+(dom0) xl console $domid # To access $domid console
+
+Next, if the user needs to modify/prepare the overlay.dtbo suitable for
+the domU:
+
+(domU) mkdir -p /sys/kernel/config/device-tree/overlays/new_overlay
+(domU) cat overlay_domu.dtbo > 
/sys/kernel/config/device-tree/overlays/new_overlay/dtbo
+
+Finally, if needed, the relevant Linux kernel driver can be probed:
+
+(domU) modprobe module_name.ko
+
+Example for domU overlay remove:
+
+(dom0) xl dt-overlay detach overlay.dtbo $domid
+(dom0) xl dt-overlay remove overlay.dtbo


I assume we have safety checks in place to ensure we can't remove the 
device if it is already attached. Is that correct?


Cheers,

--
Julien Grall



Re: [PATCH v4 7/9] xen/arm: Support device detachment from domains

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

Similarly to the device attachment from the DT overlay to a domain, this
commit implements the device detachment from the domain. The DOMCTL
XEN_DOMCTL_dt_overlay op is extended to have the operation
XEN_DOMCTL_DT_OVERLAY_DETACH. The detachment of the device is
implemented by unmapping the IRQ and IOMMU resources. Note that with
these changes, the device de-registration from the IOMMU driver should
only happen at the time when the DT overlay is removed from the Xen
device tree.

Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
---
v4:
- Split the original patch, only do device detachment from domain.
---
  xen/common/dt-overlay.c | 243 
  xen/include/public/domctl.h |   3 +-
  2 files changed, 194 insertions(+), 52 deletions(-)

diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 1087f9b502..693b6e4777 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -392,24 +392,100 @@ find_track_entry_from_tracker(const void *overlay_fdt,
  return entry;
  }
  
+static int remove_irq(unsigned long s, unsigned long e, void *data)

+{
+struct domain *d = data;
+int rc = 0;
+
+/*
+ * IRQ should always have access unless there are duplication of
+ * of irqs in device tree. There are few cases of xen device tree
+ * where there are duplicate interrupts for the same node.
+ */
+if (!irq_access_permitted(d, s))


Because of this check, it means that ...


+return 0;
+/*
+ * TODO: We don't handle shared IRQs for now. So, it is assumed that
+ * the IRQs was not shared with another domain.
+ */
+rc = irq_deny_access(d, s);
+if ( rc )
+{
+printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s);
+return rc;
+}
+
+rc = release_guest_irq(d, s);


... release_guest_irq() fails on the next retry it will pass. I don't 
think this is what we want.


Instead, we probably want to re-order the call.
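
A sketch of the re-ordering being suggested, so that a failure in
release_guest_irq() does not leave the IRQ access already revoked and the
operation unretryable (illustrative only, not the actual fix):

    /* Release the IRQ from the guest first... */
    rc = release_guest_irq(d, s);
    if ( rc )
    {
        printk(XENLOG_ERR "unable to release irq %ld\n", s);
        return rc;
    }

    /* ...and only then revoke the domain's permission to use it. */
    rc = irq_deny_access(d, s);
    if ( rc )
        printk(XENLOG_ERR "unable to revoke access for irq %ld\n", s);

    return rc;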


+if ( rc )
+{
+printk(XENLOG_ERR "unable to release irq %ld\n", s);
+return rc;
+}
+
+return rc;
+}
+
+static int remove_all_irqs(struct rangeset *irq_ranges, struct domain *d)
+{
+return rangeset_report_ranges(irq_ranges, 0, ~0UL, remove_irq, d);
+}
+
+static int remove_iomem(unsigned long s, unsigned long e, void *data)
+{
+struct domain *d = data;
+int rc = 0;
+p2m_type_t t;
+mfn_t mfn;
+
+mfn = p2m_lookup(d, _gfn(s), );


What are you trying to address with this check? For instance, the fact 
that the first MFN is mapped doesn't guarantee the rest is.



+if ( mfn_x(mfn) == 0 || mfn_x(mfn) == ~0UL )


I don't understand why we are checking for 0 here. In theory, it is 
a valid MFN. Also, the second part wants to be INVALID_MFN.
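
i.e., roughly (sketch only, using the usual mfn_eq()/INVALID_MFN helpers):

    /* Bail out only if the GFN is not mapped at all. */
    if ( mfn_eq(mfn, INVALID_MFN) )
        return -EINVAL;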



+return -EINVAL;
+
+rc = iomem_deny_access(d, s, e);


iomem_deny_access() works on MFNs but here you pass a GFN. Are you 
assuming the GFN == MFN? How would that work for domains that are not 
direct mapped?



+if ( rc )
+{
+printk(XENLOG_ERR "Unable to remove %pd access to %#lx - %#lx\n",
+   d, s, e);
+return rc;
+}
+
+rc = unmap_mmio_regions(d, _gfn(s), e - s, _mfn(s));
+if ( rc )
+return rc;
+
+return rc;
+}
+
+static int remove_all_iomems(struct rangeset *iomem_ranges, struct domain *d)
+{
+return rangeset_report_ranges(iomem_ranges, 0, ~0UL, remove_iomem, d);
+}
+
  /* Check if node itself can be removed and remove node from IOMMU. */
-static int remove_node_resources(struct dt_device_node *device_node)
+static int remove_node_resources(struct dt_device_node *device_node,
+ struct domain *d)
  {
  int rc = 0;
  unsigned int len;
  domid_t domid;
  
-domid = dt_device_used_by(device_node);

+if ( !d )


I looked at the code, and I am a bit unsure how "d" can be NULL. Do you have 
any pointers?



+{
+domid = dt_device_used_by(device_node);
  
-dt_dprintk("Checking if node %s is used by any domain\n",

-   device_node->full_name);
+dt_dprintk("Checking if node %s is used by any domain\n",
+   device_node->full_name);
  
-/* Remove the node if only it's assigned to hardware domain or domain io. */

-if ( domid != hardware_domain->domain_id && domid != DOMID_IO )
-{
-printk(XENLOG_ERR "Device %s is being used by domain %u. Removing nodes 
failed\n",
-   device_node->full_name, domid);
-return -EINVAL;
+/*
+ * We also check if device is assigned to DOMID_IO as when a domain
+ * is destroyed device is assigned to DOMID_IO.
+ */
+if ( domid != DOMID_IO )
+{
+printk(XENLOG_ERR "Device %s is being assigned to %u. Device is 
assigned to %d\n",
+   device_node->full_name, DOMID_IO, domid);
+return -EINVAL;
+}
  }
  
  

Re: [PATCH v4 5/9] xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree


I think it would be worth mentioning in the commit message why changing 
the sysctl behavior is fine. The feature is experimental and therefore 
breaking compatibility is ok.



, instead of assigning the device to the
hardware domain at the same time. Add the XEN_DOMCTL_dt_overlay with
operations XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment
to the domain.

The hypervisor firstly checks the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources.


So, the expectation is the user will always want to attach all the 
devices in the overlay to a single domain. Is that correct?




Signed-off-by: Henry Wang 
Signed-off-by: Vikram Garhwal 
---
v4:
- Split the original patch, only do the device attachment.
v3:
- Style fixes for arch-selection #ifdefs.
- Do not include public/domctl.h, only add a forward declaration of
   struct xen_domctl_dt_overlay.
- Extract the overlay track entry finding logic to a function, drop
   the unused variables.
- Use op code 1&2 for XEN_DOMCTL_DT_OVERLAY_{ATTACH,DETACH}.
v2:
- New patch.
---
  xen/arch/arm/domctl.c|   3 +
  xen/common/dt-overlay.c  | 199 ++-
  xen/include/public/domctl.h  |  14 +++
  xen/include/public/sysctl.h  |  11 +-
  xen/include/xen/dt-overlay.h |   7 ++
  5 files changed, 176 insertions(+), 58 deletions(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..12a12ee781 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -5,6 +5,7 @@
   * Copyright (c) 2012, Citrix Systems
   */
  
+#include 

  #include 
  #include 
  #include 
@@ -176,6 +177,8 @@ long arch_do_domctl(struct xen_domctl *domctl, struct 
domain *d,
  
  return rc;

  }
+case XEN_DOMCTL_dt_overlay:
+return dt_overlay_domctl(d, &domctl->u.dt_overlay);
  default:
  return subarch_do_domctl(domctl, d, u_domctl);
  }
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c
index 9cece79067..1087f9b502 100644
--- a/xen/common/dt-overlay.c
+++ b/xen/common/dt-overlay.c
@@ -356,6 +356,42 @@ static int overlay_get_nodes_info(const void *fdto, char 
**nodes_full_path)
  return 0;
  }
  
+/* This function should be called with the overlay_lock taken */

+static struct overlay_track *
+find_track_entry_from_tracker(const void *overlay_fdt,
+  uint32_t overlay_fdt_size)
+{
+struct overlay_track *entry, *temp;
+bool found_entry = false;
+
+ASSERT(spin_is_locked(_lock));
+
+/*
+ * First check if dtbo is correct i.e. it should one of the dtbo which was
+ * used when dynamically adding the node.
+ * Limitation: Cases with same node names but different property are not
+ * supported currently. We are relying on user to provide the same dtbo
+ * as it was used when adding the nodes.
+ */
+list_for_each_entry_safe( entry, temp, _tracker, entry )
+{
+if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
+{
+found_entry = true;
+break;
+}
+}
+
+if ( !found_entry )
+{
+printk(XENLOG_ERR "Cannot find any matching tracker with input dtbo."
+   " Operation is supported only for prior added dtbo.\n");
+return NULL;
+}
+
+return entry;
+}
+
  /* Check if node itself can be removed and remove node from IOMMU. */
  static int remove_node_resources(struct dt_device_node *device_node)
  {
@@ -485,8 +521,7 @@ static long handle_remove_overlay_nodes(const void 
*overlay_fdt,
  uint32_t overlay_fdt_size)
  {
  int rc;
-struct overlay_track *entry, *temp, *track;
-bool found_entry = false;
+struct overlay_track *entry;
  
  rc = check_overlay_fdt(overlay_fdt, overlay_fdt_size);

  if ( rc )
@@ -494,29 +529,10 @@ static long handle_remove_overlay_nodes(const void 
*overlay_fdt,
  
  spin_lock(_lock);
  
-/*

- * First check if dtbo is correct i.e. it should one of the dtbo which was
- * used when dynamically adding the node.
- * Limitation: Cases with same node names but different property are not
- * supported currently. We are relying on user to provide the same dtbo
- * as it was used when adding the nodes.
- */
-list_for_each_entry_safe( entry, temp, _tracker, entry )
-{
-if ( memcmp(entry->overlay_fdt, overlay_fdt, overlay_fdt_size) == 0 )
-{
-track = entry;
-found_entry = 

Re: [PATCH v4 0/9] Remaining patches for dynamic node programming using overlay dtbo

2024-05-23 Thread Julien Grall




On 23/05/2024 08:40, Henry Wang wrote:

Hi all,


Hi Henry,


This is the remaining series for the full functional "dynamic node
programming using overlay dtbo" feature. The first part [1] has
already been merged.

Quoting from the original series, the first part has already made
Xen aware of new device tree node which means updating the dt_host
with overlay node information, and in this series, the goal is to
map IRQ and IOMMU during runtime, where we will do the actual IOMMU
and IRQ mapping and unmapping to a running domain. Also, documentation
of the "dynamic node programming using overlay dtbo" feature is added.

During the discussion in v3, I was recommended to split the overlay
devices attach/detach to/from running domains into separate patches [3].
But I decided to only expose the xl user interfaces together to the
users after device attach/detach is fully functional, so I didn't
split the toolstack patch (#8).


So I was asking to split so we can get some of the work merged for 4.19. 
Can you clarify, whether the intention is to merge only patches #1-5?


Cheers,

--
Julien Grall



Re: [PATCH v4 4/9] xen/arm/gic: Allow adding interrupt to running VMs

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

Currently, adding physical interrupts is only allowed at
domain creation time. For use cases such as dynamic device
tree overlay addition, adding physical IRQs to
running domains should be allowed.

Drop the above-mentioned domain creation check. Since this
will introduce interrupt state synchronisation issues when the
interrupt is active or pending in the guest, simply reject the
operation in those cases. Do it for both the new and old
vGIC implementations.

Signed-off-by: Henry Wang 


With one remark below:

Reviewed-by: Julien Grall 


diff --git a/xen/arch/arm/vgic/vgic.c b/xen/arch/arm/vgic/vgic.c
index b9463a5f27..048e12c562 100644
--- a/xen/arch/arm/vgic/vgic.c
+++ b/xen/arch/arm/vgic/vgic.c
@@ -876,8 +876,11 @@ int vgic_connect_hw_irq(struct domain *d, struct vcpu 
*vcpu,
  
  if ( connect )  /* assign a mapped IRQ */

  {
-/* The VIRQ should not be already enabled by the guest */
-if ( !irq->hw && !irq->enabled )
+/*
+ * The VIRQ should not be already enabled by the guest nor
+ * active/pending in the guest


Typo: Missing full stop.

It can be fixed on commit.


+ */
+if ( !irq->hw && !irq->enabled && !irq->active && !irq->pending_latch )
  {
  irq->hw = true;
  irq->hwintid = desc->irq;


Cheers,

--
Julien Grall



Re: [PATCH v4 2/9] xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

2024-05-23 Thread Julien Grall

Hi Henry,

On 23/05/2024 08:40, Henry Wang wrote:

There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.

Signed-off-by: Henry Wang 


Reviewed-by: Julien Grall 

Cheers,

--
Julien Grall



[xen-unstable-smoke test] 186117: tolerable all pass - PUSHED

2024-05-23 Thread osstest service owner
flight 186117 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186117/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  2a40b106e92aaa7ce808c8608dd6473edc67f608
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186064  2024-05-21 15:04:02 Z2 days
Failing since186104  2024-05-23 09:00:22 Z0 days4 attempts
Testing same since   186117  2024-05-23 17:02:09 Z0 days1 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Alessandro Zucchelli 
  Andrew Cooper 
  Bobby Eshleman 
  Christian Lindig 
  George Dunlap 
  Jan Beulich 
  Julien Grall 
  Olaf Hering 
  Oleksandr Andrushchenko 
  Oleksii Kurochko 
  Roger Pau Monné 
  Stewart Hildebrand 
  Tamas K Lengyel 
  Volodymyr Babchuk 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   ced21fbb28..2a40b106e9  2a40b106e92aaa7ce808c8608dd6473edc67f608 -> smoke



[libvirt test] 186099: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186099 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186099/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-buildfail REGR. vs. 186070
 build-amd64   6 xen-buildfail REGR. vs. 186070
 build-i386-xsm6 xen-buildfail REGR. vs. 186070
 build-i3866 xen-buildfail REGR. vs. 186070
 build-armhf   6 xen-buildfail REGR. vs. 186070

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass

version targeted for testing:
 libvirt  66b052263d6ff046c60f4fce263e07c0d9bdd059
baseline version:
 libvirt  7dda4a03ac77bbe14b12b7b8f3a509a0e09f3129

Last test of basis   186070  2024-05-22 04:20:52 Z1 days
Testing same since   186099  2024-05-23 04:18:41 Z0 days1 attempts


People who touched revisions under test:
  Michal Privoznik 

jobs:
 build-amd64-xsm  fail
 build-arm64-xsm  pass
 build-i386-xsm   fail
 build-amd64  fail
 build-arm64  pass
 build-armhf  fail
 build-i386   fail
 build-amd64-libvirt  blocked 
 build-arm64-libvirt  pass
 build-armhf-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-amd64-libvirt blocked 
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-amd64-libvirt-qcow2   blocked 
 test-arm64-arm64-libvirt-qcow2   pass
 test-amd64-amd64-libvirt-raw blocked 
 test-arm64-arm64-libvirt-raw pass
 test-amd64-amd64-libvirt-vhd blocked 
 test-armhf-armhf-libvirt-vhd blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at

Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook

2024-05-23 Thread Andrew Cooper
On 08/05/2024 1:39 pm, Alejandro Vallejo wrote:
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 8a24419c..2f06bff1b2cc 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
>  }
>  
> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t 
> *h)
> +static int cf_check lapic_check_hidden(const struct domain *d,
> +   hvm_domain_context_t *h)
>  {
>  unsigned int vcpuid = hvm_load_instance(h);
> -struct vcpu *v;
> -struct vlapic *s;
> +struct hvm_hw_lapic s;
>  
>  if ( !has_vlapic(d) )
>  return -ENODEV;
>  
>  /* Which vlapic to load? */
> -if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )

As you're editing this anyway, swap for

    if ( !domain_vcpu(d, vcpuid) )

please.

>  {
>  dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
>  d->domain_id, vcpuid);
>  return -EINVAL;
>  }
> -s = vcpu_vlapic(v);
>  
> -if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> +if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )
> +return -ENODATA;
> +
> +/* EN=0 with EXTD=1 is illegal */
> +if ( (s.apic_base_msr & (APIC_BASE_ENABLE | APIC_BASE_EXTD)) ==
> + APIC_BASE_EXTD )
> +return -EINVAL;

This is very insufficient auditing for the incoming value, but it turns
out that there's no nice logic for this at all.

As it's just a less obfuscated form of the logic from
lapic_load_hidden(), it's probably fine to stay as it is for now.

The major changes since this logic was written originally are that the
CPU policy is now correct (so we can reject EXTD on VMs which can't see
x2apic), and that we now prohibit VMs moving the xAPIC MMIO window away
from its default location (as this would require per-vCPU P2Ms in order
to virtualise properly.)

~Andrew



Re: [PATCH v6 7/8] xen: mapcache: Add support for grant mappings

2024-05-23 Thread Stefano Stabellini
On Thu, 23 May 2024, Edgar E. Iglesias wrote:
> On Thu, May 23, 2024 at 9:47 AM Manos Pitsidianakis 
>  wrote:
>   On Thu, 16 May 2024 18:48, "Edgar E. Iglesias" 
>  wrote:
>   >From: "Edgar E. Iglesias" 
>   >
>   >Add a second mapcache for grant mappings. The mapcache for
>   >grants needs to work with XC_PAGE_SIZE granularity since
>   >we can't map larger ranges than what has been granted to us.
>   >
>   >Like with foreign mappings (xen_memory), machines using grants
>   >are expected to initialize the xen_grants MR and map it
>   >into their address-map accordingly.
>   >
>   >Signed-off-by: Edgar E. Iglesias 
>   >Reviewed-by: Stefano Stabellini 
>   >---
>   > hw/xen/xen-hvm-common.c         |  12 ++-
>   > hw/xen/xen-mapcache.c           | 163 ++--
>   > include/hw/xen/xen-hvm-common.h |   3 +
>   > include/sysemu/xen.h            |   7 ++
>   > 4 files changed, 152 insertions(+), 33 deletions(-)
>   >
>   >diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
>   >index a0a0252da0..b8ace1c368 100644
>   >--- a/hw/xen/xen-hvm-common.c
>   >+++ b/hw/xen/xen-hvm-common.c
>   >@@ -10,12 +10,18 @@
>   > #include "hw/boards.h"
>   > #include "hw/xen/arch_hvm.h"
>   >
>   >-MemoryRegion xen_memory;
>   >+MemoryRegion xen_memory, xen_grants;
>   >
>   >-/* Check for xen memory.  */
>   >+/* Check for any kind of xen memory, foreign mappings or grants.  */
>   > bool xen_mr_is_memory(MemoryRegion *mr)
>   > {
>   >-    return mr == &xen_memory;
>   >+    return mr == &xen_memory || mr == &xen_grants;
>   >+}
>   >+
>   >+/* Check specifically for grants.  */
>   >+bool xen_mr_is_grants(MemoryRegion *mr)
>   >+{
>   >+    return mr == &xen_grants;
>   > }
>   >
>   > void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion 
> *mr,
>   >diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
>   >index a07c47b0b1..1cbc2aeaa9 100644
>   >--- a/hw/xen/xen-mapcache.c
>   >+++ b/hw/xen/xen-mapcache.c
>   >@@ -14,6 +14,7 @@
>   >
>   > #include 
>   >
>   >+#include "hw/xen/xen-hvm-common.h"
>   > #include "hw/xen/xen_native.h"
>   > #include "qemu/bitmap.h"
>   >
>   >@@ -21,6 +22,8 @@
>   > #include "sysemu/xen-mapcache.h"
>   > #include "trace.h"
>   >
>   >+#include 
>   >+#include 
>   >
>   > #if HOST_LONG_BITS == 32
>   > #  define MCACHE_MAX_SIZE     (1UL<<31) /* 2GB Cap */
>   >@@ -41,6 +44,7 @@ typedef struct MapCacheEntry {
>   >     unsigned long *valid_mapping;
>   >     uint32_t lock;
>   > #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
>   >+#define XEN_MAPCACHE_ENTRY_GRANT (1 << 1)
> 
>   Might we get more entry kinds in the future? (for example foreign maps).
>   Maybe this could be an enum.
> 
> 
> Perhaps. Foreign mappings are already supported, this flag separates ordinary 
> foreign mappings from grant foreign mappings.
> IMO, since this is not an external interface it's probably better to change 
> it once we have a concrete use-case at hand.
> 
>  
>   >     uint8_t flags;
>   >     hwaddr size;
>   >     struct MapCacheEntry *next;
>   >@@ -71,6 +75,8 @@ typedef struct MapCache {
>   > } MapCache;
>   >
>   > static MapCache *mapcache;
>   >+static MapCache *mapcache_grants;
>   >+static xengnttab_handle *xen_region_gnttabdev;
>   >
>   > static inline void mapcache_lock(MapCache *mc)
>   > {
>   >@@ -131,6 +137,12 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, 
> void *opaque)
>   >     unsigned long max_mcache_size;
>   >     unsigned int bucket_shift;
>   >
>   >+    xen_region_gnttabdev = xengnttab_open(NULL, 0);
>   >+    if (xen_region_gnttabdev == NULL) {
>   >+        error_report("mapcache: Failed to open gnttab device");
>   >+        exit(EXIT_FAILURE);
>   >+    }
>   >+
>   >     if (HOST_LONG_BITS == 32) {
>   >         bucket_shift = 16;
>   >     } else {
>   >@@ -159,6 +171,15 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f, 
> void *opaque)
>   >     mapcache = xen_map_cache_init_single(f, opaque,
>   >                                          bucket_shift,
>   >                                          max_mcache_size);
>   >+
>   >+    /*
>   >+     * Grant mappings must use XC_PAGE_SIZE granularity since we can't
>   >+     * map anything beyond the number of pages granted to us.
>   >+     */
>   >+    mapcache_grants = xen_map_cache_init_single(f, opaque,
>   >+                                                XC_PAGE_SHIFT,
>   >+                                                max_mcache_size);
>   >+
>   >     setrlimit(RLIMIT_AS, _as);
>   > }
>   >
>   >@@ -168,17 

Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-23 Thread Stefano Stabellini
On Thu, 23 May 2024, Roger Pau Monné wrote:
> On Fri, May 17, 2024 at 03:33:51PM +0200, Roger Pau Monne wrote:
> > Enabling it using an HVM param is fragile, and complicates the logic when
> > deciding whether options that interact with altp2m can also be enabled.
> > 
> > Leave the HVM param value for consumption by the guest, but prevent it from
> > being set.  Enabling is now done using an additional altp2m specific field 
> > in
> > xen_domctl_createdomain.
> > 
> > Note that albeit only currently implemented in x86, altp2m could be 
> > implemented
> > in other architectures, hence why the field is added to 
> > xen_domctl_createdomain
> > instead of xen_arch_domainconfig.
> > 
> > Signed-off-by: Roger Pau Monné 
> > ---
> > Changes since v2:
> >  - Introduce a new altp2m field in xen_domctl_createdomain.
> > 
> > Changes since v1:
> >  - New in this version.
> > ---
> >  tools/libs/light/libxl_create.c | 23 ++-
> >  tools/libs/light/libxl_x86.c| 26 --
> >  tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
> >  xen/arch/arm/domain.c   |  6 ++
> 
> Could I get an Ack from one of the Arm maintainers for the trivial Arm
> change?

Acked-by: Stefano Stabellini 

[PATCH v1 0/1] xen/arm: smmuv3: Mark more init-only functions with __init

2024-05-23 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

I was scanning for code that we could potentially move from the
.text section into .init.text and found a few candidates.

I'm not sure if this makes sense; perhaps we don't want to mark
these functions for other reasons, but my scripts found this chain
of SMMUv3 init functions as only reachable from .init.text code.
Perhaps it's a little late in the release cycle to consider this...

Best regards,
Edgar


Edgar E. Iglesias (1):
  xen/arm: smmuv3: Mark more init-only functions with __init

 xen/drivers/passthrough/arm/smmu-v3.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

-- 
2.40.1




[PATCH v1 1/1] xen/arm: smmuv3: Mark more init-only functions with __init

2024-05-23 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Move more functions that are only called at init to
the .init.text section.

Signed-off-by: Edgar E. Iglesias 
---
 xen/drivers/passthrough/arm/smmu-v3.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/arm/smmu-v3.c 
b/xen/drivers/passthrough/arm/smmu-v3.c
index 6904962467..cee5724022 100644
--- a/xen/drivers/passthrough/arm/smmu-v3.c
+++ b/xen/drivers/passthrough/arm/smmu-v3.c
@@ -1545,7 +1545,7 @@ static int arm_smmu_dt_xlate(struct device *dev,
 }
 
 /* Probing and initialisation functions */
-static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
+static int __init arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
   struct arm_smmu_queue *q,
   void __iomem *page,
   unsigned long prod_off,
@@ -1588,7 +1588,7 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device 
*smmu,
return 0;
 }
 
-static int arm_smmu_init_queues(struct arm_smmu_device *smmu)
+static int __init arm_smmu_init_queues(struct arm_smmu_device *smmu)
 {
int ret;
 
@@ -1724,7 +1724,7 @@ static int arm_smmu_init_strtab(struct arm_smmu_device 
*smmu)
return 0;
 }
 
-static int arm_smmu_init_structures(struct arm_smmu_device *smmu)
+static int __init arm_smmu_init_structures(struct arm_smmu_device *smmu)
 {
int ret;
 
@@ -1746,7 +1746,8 @@ static int arm_smmu_write_reg_sync(struct arm_smmu_device 
*smmu, u32 val,
 }
 
 /* GBPA is "special" */
-static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr)
+static int __init arm_smmu_update_gbpa(struct arm_smmu_device *smmu,
+   u32 set, u32 clr)
 {
int ret;
u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA;
@@ -1842,7 +1843,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device 
*smmu)
 static inline void arm_smmu_setup_msis(struct arm_smmu_device *smmu) { }
 #endif /* CONFIG_MSI */
 
-static void arm_smmu_free_irqs(struct arm_smmu_device *smmu)
+static void __init arm_smmu_free_irqs(struct arm_smmu_device *smmu)
 {
int irq;
 
@@ -1926,7 +1927,7 @@ err_free_evtq_irq:
return ret;
 }
 
-static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
+static int __init arm_smmu_setup_irqs(struct arm_smmu_device *smmu)
 {
int ret, irq;
u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN;
@@ -1988,7 +1989,7 @@ static int arm_smmu_device_disable(struct arm_smmu_device 
*smmu)
return ret;
 }
 
-static int arm_smmu_device_reset(struct arm_smmu_device *smmu)
+static int __init arm_smmu_device_reset(struct arm_smmu_device *smmu)
 {
int ret;
u32 reg, enables;
@@ -2405,7 +2406,7 @@ static void arm_smmu_free_structures(struct 
arm_smmu_device *smmu)
xfree(smmu->strtab_cfg.l1_desc);
 }
 
-static int arm_smmu_device_probe(struct platform_device *pdev)
+static int __init arm_smmu_device_probe(struct platform_device *pdev)
 {
int irq, ret;
paddr_t ioaddr, iosize;
-- 
2.40.1




Re: [PATCH v2 6/8] xen/lib: Add topology generator for x86

2024-05-23 Thread Roger Pau Monné
On Wed, May 08, 2024 at 01:39:25PM +0100, Alejandro Vallejo wrote:
> Add a helper to populate topology leaves in the cpu policy from
> threads/core and cores/package counts.
> 
> No functional change, as it's not connected to anything yet.

There is a functional change in test-cpu-policy.c.

Maybe the commit message needs to be updated to reflect the added
testing to test-cpu-policy.c using the newly introduced helper to
generate topologies?

> 
> Signed-off-by: Alejandro Vallejo 
> ---
> v2:
>   * New patch. Extracted from v1/patch6
> ---
>  tools/tests/cpu-policy/test-cpu-policy.c | 128 +++
>  xen/include/xen/lib/x86/cpu-policy.h |  16 +++
>  xen/lib/x86/policy.c |  86 +++
>  3 files changed, 230 insertions(+)
> 
> diff --git a/tools/tests/cpu-policy/test-cpu-policy.c 
> b/tools/tests/cpu-policy/test-cpu-policy.c
> index 301df2c00285..0ba8c418b1b3 100644
> --- a/tools/tests/cpu-policy/test-cpu-policy.c
> +++ b/tools/tests/cpu-policy/test-cpu-policy.c
> @@ -650,6 +650,132 @@ static void test_is_compatible_failure(void)
>  }
>  }
>  
> +static void test_topo_from_parts(void)
> +{
> +static const struct test {
> +unsigned int threads_per_core;
> +unsigned int cores_per_pkg;
> +struct cpu_policy policy;
> +} tests[] = {
> +{
> +.threads_per_core = 3, .cores_per_pkg = 1,
> +.policy = {
> +.x86_vendor = X86_VENDOR_AMD,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 3, .level = 0, .type = 1, 
> .id_shift = 2, },
> +[1] = { .nr_logical = 1, .level = 1, .type = 2, 
> .id_shift = 2, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 1, .cores_per_pkg = 3,
> +.policy = {
> +.x86_vendor = X86_VENDOR_AMD,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 1, .level = 0, .type = 1, 
> .id_shift = 0, },
> +[1] = { .nr_logical = 3, .level = 1, .type = 2, 
> .id_shift = 2, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 7, .cores_per_pkg = 5,
> +.policy = {
> +.x86_vendor = X86_VENDOR_AMD,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 7, .level = 0, .type = 1, 
> .id_shift = 3, },
> +[1] = { .nr_logical = 5, .level = 1, .type = 2, 
> .id_shift = 6, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 2, .cores_per_pkg = 128,
> +.policy = {
> +.x86_vendor = X86_VENDOR_AMD,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 2, .level = 0, .type = 1, 
> .id_shift = 1, },
> +[1] = { .nr_logical = 128, .level = 1, .type = 2, 
> .id_shift = 8, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 3, .cores_per_pkg = 1,
> +.policy = {
> +.x86_vendor = X86_VENDOR_INTEL,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 3, .level = 0, .type = 1, 
> .id_shift = 2, },
> +[1] = { .nr_logical = 3, .level = 1, .type = 2, 
> .id_shift = 2, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 1, .cores_per_pkg = 3,
> +.policy = {
> +.x86_vendor = X86_VENDOR_INTEL,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 1, .level = 0, .type = 1, 
> .id_shift = 0, },
> +[1] = { .nr_logical = 3, .level = 1, .type = 2, 
> .id_shift = 2, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 7, .cores_per_pkg = 5,
> +.policy = {
> +.x86_vendor = X86_VENDOR_INTEL,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 7, .level = 0, .type = 1, 
> .id_shift = 3, },
> +[1] = { .nr_logical = 35, .level = 1, .type = 2, 
> .id_shift = 6, },
> +},
> +},
> +},
> +{
> +.threads_per_core = 2, .cores_per_pkg = 128,
> +.policy = {
> +.x86_vendor = X86_VENDOR_INTEL,
> +.topo.subleaf = {
> +[0] = { .nr_logical = 2, .level = 0, .type = 1, 
> .id_shift = 1, },
> +[1] = { .nr_logical = 256, .level = 1, .type = 2, 
> .id_shift = 8, },

You don't need the array index in the initialization:

.topo.subleaf = {
{ .nr_logical = 2, .level = 0, .type = 1, .id_shift = 1, },
{ .nr_logical = 256, .level = 1, .type = 2,
  .id_shift = 8, },
}

And lines should be limited to 80 

Re: [PATCH v10 02/14] xen: introduce generic non-atomic test_*bit()

2024-05-23 Thread Oleksii K.
On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
> 
> 
> On 23/05/2024 15:11, Oleksii K. wrote:
> > On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> > > Hi Oleksii,
> > Hi Julien,
> > 
> > > 
> > > On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > > > diff --git a/xen/arch/arm/arm64/livepatch.c
> > > > b/xen/arch/arm/arm64/livepatch.c
> > > > index df2cebedde..4bc8ed9be5 100644
> > > > --- a/xen/arch/arm/arm64/livepatch.c
> > > > +++ b/xen/arch/arm/arm64/livepatch.c
> > > > @@ -10,7 +10,6 @@
> > > >    #include 
> > > >    #include 
> > > >    
> > > > -#include 
> > > 
> > > It is a bit unclear how this change is related to the patch. Can
> > > you
> > > explain in the commit message?
> > Probably it isn't needed anymore. I will double check and if this
> > change is not needed, I will just drop it in the next patch
> > version.
> > 
> > > 
> > > >    #include 
> > > >    #include 
> > > >    #include 
> > > > diff --git a/xen/arch/arm/include/asm/bitops.h
> > > > b/xen/arch/arm/include/asm/bitops.h
> > > > index 5104334e48..8e16335e76 100644
> > > > --- a/xen/arch/arm/include/asm/bitops.h
> > > > +++ b/xen/arch/arm/include/asm/bitops.h
> > > > @@ -22,9 +22,6 @@
> > > >    #define __set_bit(n,p)    set_bit(n,p)
> > > >    #define __clear_bit(n,p)  clear_bit(n,p)
> > > >    
> > > > -#define BITOP_BITS_PER_WORD 32
> > > > -#define BITOP_MASK(nr)  (1UL << ((nr) %
> > > > BITOP_BITS_PER_WORD))
> > > > -#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > >    #define BITS_PER_BYTE   8
> > > 
> > > OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
> > > expect
> > > the value to change across arch.
> > I can move it to generic one header too in the next patch version.
> > 
> > > 
> > > [...]
> > > 
> > > > diff --git a/xen/include/xen/bitops.h
> > > > b/xen/include/xen/bitops.h
> > > > index f14ad0d33a..6eeeff0117 100644
> > > > --- a/xen/include/xen/bitops.h
> > > > +++ b/xen/include/xen/bitops.h
> > > > @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned
> > > > long
> > > > x)
> > > >     * scope
> > > >     */
> > > >    
> > > > +#define BITOP_BITS_PER_WORD 32
> > > > +typedef uint32_t bitop_uint_t;
> > > > +
> > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > > > BITOP_BITS_PER_WORD))
> > > > +
> > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > +
> > > > +extern void __bitop_bad_size(void);
> > > > +
> > > > +#define bitop_bad_size(addr) (sizeof(*(addr)) <
> > > > sizeof(bitop_uint_t))
> > > > +
> > > >    /* - Please tidy above here 
> > > > -
> > > >  */
> > > >    
> > > >    #include 
> > > >    
> > > > +/**
> > > > + * generic__test_and_set_bit - Set a bit and return its old
> > > > value
> > > > + * @nr: Bit to set
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> > > 
> > > Sorry for only mentioning this on v10. I think this comment
> > > should be
> > > duplicated (or moved to) on top of test_bit() because this is
> > > what
> > > everyone will use. This will avoid the developper to follow the
> > > function
> > > calls and only notice the x86 version which says "This function
> > > is
> > > atomic and may not be reordered." and would be wrong for all the
> > > other arch.
> > It makes sense to add this comment on top of test_bit(), but I am
> > curious if it is needed to mention that for x86 arch_test_bit() "is
> > atomic and may not be reordered":
> 
> I would say no because any developer modifying common code can't 
> rely on it.
> 
> > 
> >   * This operation is non-atomic and can be reordered. ( Exception:
> > for
> > * x86 arch_test_bit() is atomic and may not be reordered )
> >   * If two examples of this operation race, one can appear to
> > succeed
> >   * but actually fail.  You must protect multiple accesses with a
> > lock.
> >   */
> > 
> > > 
> > > > +static always_inline bool
> > > > +generic__test_and_set_bit(int nr, volatile void *addr)
> > > > +{
> > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > > > BITOP_WORD(nr);
> > > > +    bitop_uint_t old = *p;
> > > > +
> > > > +    *p = old | mask;
> > > > +    return (old & mask);
> > > > +}
> > > > +
> > > > +/**
> > > > + * generic__test_and_clear_bit - Clear a bit and return its
> > > > old
> > > > value
> > > > + * @nr: Bit to clear
> > > > + * @addr: Address to count from
> > > > + *
> > > > + * This operation is non-atomic and can be reordered.
> > > > + * If two examples of this operation race, one can appear to
> > > > succeed
> > > > + * but actually fail.  You must protect multiple accesses with
> > > > a
> > > > lock.
> > > > + */
> 

[xen-unstable-smoke test] 186108: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186108 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186108/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186064

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  9e58da32cc844b3fb7612fc35ece3a96f8cbf744
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186064  2024-05-21 15:04:02 Z2 days
Failing since186104  2024-05-23 09:00:22 Z0 days3 attempts
Testing same since   186108  2024-05-23 14:00:21 Z0 days1 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Alessandro Zucchelli 
  Bobby Eshleman 
  Jan Beulich 
  Julien Grall 
  Oleksandr Andrushchenko 
  Oleksii Kurochko 
  Roger Pau Monné 
  Stewart Hildebrand 
  Tamas K Lengyel 
  Volodymyr Babchuk 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 406 lines long.)



Re: [PATCH 6/7] x86/cpuid: Fix handling of XSAVE dynamic leaves

2024-05-23 Thread Jan Beulich
On 23.05.2024 13:16, Andrew Cooper wrote:
> First, if XSAVE is available in hardware but not visible to the guest, the
> dynamic leaves shouldn't be filled in.
> 
> Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
> host/guest state automatically, but there is provision for "host only" bits to
> be set, so the implications are still accurate.
> 
> Introduce xstate_compressed_size() to mirror the uncompressed one.  Cross
> check it at boot.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 
Irrespective ...

> v3:
>  * Adjust commit message about !XSAVE guests
>  * Rebase over boot time cross check
>  * Use raw policy

... it should probably have occurred to me earlier on to ask: Why raw policy?
Isn't the host one the more appropriate one to use for any kind of internal
decisions?

Jan



Re: [PATCH v2 5/8] tools/hvmloader: Retrieve (x2)APIC IDs from the APs themselves

2024-05-23 Thread Roger Pau Monné
On Wed, May 08, 2024 at 01:39:24PM +0100, Alejandro Vallejo wrote:
> Make it so the APs expose their own APIC IDs in a LUT. We can use that LUT to
> populate the MADT, decoupling the algorithm that relates CPU IDs and APIC IDs
> from hvmloader.
> 
> While at it, also remove ap_callin, as writing the APIC ID may serve the same
> purpose.
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> v2:
>   * New patch. Replaces adding cpu policy to hvmloader in v1.
> ---
>  tools/firmware/hvmloader/config.h|  6 -
>  tools/firmware/hvmloader/hvmloader.c |  4 +--
>  tools/firmware/hvmloader/smp.c   | 40 +++-
>  tools/firmware/hvmloader/util.h  |  5 
>  xen/arch/x86/include/asm/hvm/hvm.h   |  1 +
>  5 files changed, 47 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/firmware/hvmloader/config.h 
> b/tools/firmware/hvmloader/config.h
> index c82adf6dc508..edf6fa9c908c 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -4,6 +4,8 @@
>  #include 
>  #include 
>  
> +#include 
> +
>  enum virtual_vga { VGA_none, VGA_std, VGA_cirrus, VGA_pt };
>  extern enum virtual_vga virtual_vga;
>  
> @@ -49,8 +51,10 @@ extern uint8_t ioapic_version;
>  
>  #define IOAPIC_ID   0x01
>  
> +extern uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> +
>  #define LAPIC_BASE_ADDRESS  0xfee00000
> -#define LAPIC_ID(vcpu_id)   ((vcpu_id) * 2)
> +#define LAPIC_ID(vcpu_id)   (CPU_TO_X2APICID[(vcpu_id)])
>  
>  #define PCI_ISA_DEVFN   0x08/* dev 1, fn 0 */
>  #define PCI_ISA_IRQ_MASK0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> diff --git a/tools/firmware/hvmloader/hvmloader.c 
> b/tools/firmware/hvmloader/hvmloader.c
> index c58841e5b556..1eba92229925 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -342,11 +342,11 @@ int main(void)
>  
>  printf("CPU speed is %u MHz\n", get_cpu_mhz());
>  
> +smp_initialise();
> +
>  apic_setup();
>  pci_setup();
>  
> -smp_initialise();
> -
>  perform_tests();
>  
>  if ( bios->bios_info_setup )
> diff --git a/tools/firmware/hvmloader/smp.c b/tools/firmware/hvmloader/smp.c
> index a668f15d7e1f..4d75f239c2f5 100644
> --- a/tools/firmware/hvmloader/smp.c
> +++ b/tools/firmware/hvmloader/smp.c
> @@ -29,7 +29,34 @@
>  
>  #include 
>  
> -static int ap_callin, ap_cpuid;
> +static int ap_cpuid;
> +
> +/**
> + * Lookup table of x2APIC IDs.
> + *
> + * Each entry is populated by its respective CPU as it comes online. This is 
> required
> + * for generating the MADT with minimal assumptions about ID relationships.
> + */
> +uint32_t CPU_TO_X2APICID[HVM_MAX_VCPUS];
> +
> +static uint32_t read_apic_id(void)
> +{
> +uint32_t apic_id;
> +
> +cpuid(1, NULL, &apic_id, NULL, NULL);
> +apic_id >>= 24;
> +
> +/*
> + * APIC IDs over 255 are represented by 255 in leaf 1 and are meant to be
> + * read from topology leaves instead. Xen exposes x2APIC IDs in leaf 0xb,
> + * but only if the x2APIC feature is present. If there are that many CPUs
> + * it's guaranteed to be there so we can avoid checking for it 
> specifically
> + */

Maybe I'm missing something, but given the current code won't Xen just
return the low 8 bits from the x2APIC ID?  I don't see any code in
guest_cpuid() that adjusts the IDs to be 255 when > 255.

> +if ( apic_id == 255 )
> +cpuid(0xb, NULL, NULL, NULL, &apic_id);

Won't the correct logic be to check if x2APIC is set in CPUID, and
then fetch the APIC ID from leaf 0xb, otherwise fall back to fetching
the APIC ID from leaf 1?
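
For illustration, that suggested flow might look something like the below
(untested sketch only; it assumes the cpuid() helper from hvmloader's util.h
as used in the hunk above, while the feature bit and leaves are the
architectural ones):

static uint32_t read_apic_id(void)
{
    uint32_t eax, ebx, ecx, edx;

    cpuid(1, &eax, &ebx, &ecx, &edx);

    /* CPUID.1:ECX bit 21 advertises x2APIC support. */
    if ( ecx & (1u << 21) )
    {
        /* Leaf 0xb reports the full 32-bit x2APIC ID in EDX. */
        cpuid(0xb, &eax, &ebx, &ecx, &edx);
        return edx;
    }

    /* No x2APIC: use the legacy 8-bit APIC ID from leaf 1 EBX[31:24]. */
    return ebx >> 24;
}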

> +
> +return apic_id;
> +}
>  
>  static void ap_start(void)
>  {
> @@ -37,12 +64,12 @@ static void ap_start(void)
>  cacheattr_init();
>  printf("done.\n");
>  
> +wmb();
> +ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = read_apic_id();

A comment would be helpful here, that CPU_TO_X2APICID[ap_cpuid] is
used as synchronization that the AP has started.

You probably want to assert that read_apic_id() doesn't return 0,
otherwise we are skewed.
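
Something along these lines, perhaps (untested sketch only; BUG() is what the
surrounding hvmloader code already uses, and treating APIC ID 0 as legal for
the BSP but not for APs is an assumption on my part):

    uint32_t apic_id = read_apic_id();

    /*
     * CPU_TO_X2APICID[ap_cpuid] doubles as the "AP is up" flag polled by
     * boot_cpu(), so it must only become non-zero once initialisation is
     * complete, and an AP must never report ID 0.
     */
    if ( ap_cpuid && !apic_id )
        BUG();

    wmb();
    ACCESS_ONCE(CPU_TO_X2APICID[ap_cpuid]) = apic_id;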

> +
>  if ( !ap_cpuid )
>  return;
>  
> -wmb();
> -ap_callin = 1;
> -
>  while ( 1 )
>  asm volatile ( "hlt" );
>  }
> @@ -86,10 +113,11 @@ static void boot_cpu(unsigned int cpu)
>  BUG();
>  
>  /*
> - * Wait for the secondary processor to complete initialisation.
> + * Wait for the secondary processor to complete initialisation,
> + * which is signaled by its x2APIC ID being written to the LUT.
>   * Do not touch shared resources meanwhile.
>   */
> -while ( !ap_callin )
> +while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
>  cpu_relax();

As a further improvement, we could launch all APs in parallel, and use
a for loop to wait until all positions of the CPU_TO_X2APICID array
are set.
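
I.e. roughly (sketch only -- boot_cpu_async() stands in for a hypothetical
boot_cpu() variant that doesn't wait, and the loop bounds assume
hvm_info->nr_vcpus as elsewhere in smp.c):

    unsigned int cpu;

    for ( cpu = 1; cpu < hvm_info->nr_vcpus; cpu++ )
        boot_cpu_async(cpu);             /* hypothetical non-waiting variant */

    for ( cpu = 1; cpu < hvm_info->nr_vcpus; cpu++ )
        while ( !ACCESS_ONCE(CPU_TO_X2APICID[cpu]) )
            cpu_relax();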

>  
>  /* Take the secondary processor offline. */
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 

Re: [PATCH 4/7] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()

2024-05-23 Thread Jan Beulich
On 23.05.2024 13:16, Andrew Cooper wrote:
> @@ -611,6 +587,40 @@ static bool valid_xcr0(uint64_t xcr0)
>  return true;
>  }
>  
> +unsigned int xstate_uncompressed_size(uint64_t xcr0)
> +{
> +unsigned int size = XSTATE_AREA_MIN_SIZE, i;
> +
> +ASSERT((xcr0 & ~X86_XCR0_STATES) == 0);

I'm puzzled by the combination of this assertion and ...

> +if ( xcr0 == xfeature_mask )
> +return xsave_cntxt_size;

... this conditional return. Yes, right now we don't support/use any XSS
components, but without any comment the assertion looks overly restrictive
to me.

> @@ -818,14 +834,14 @@ void xstate_init(struct cpuinfo_x86 *c)
>   * xsave_cntxt_size is the max size required by enabled features.
>   * We know FP/SSE and YMM about eax, and nothing about edx at 
> present.
>   */
> -xsave_cntxt_size = hw_uncompressed_size(feature_mask);
> +xsave_cntxt_size = cpuid_count_ebx(0xd, 0);
>  printk("xstate: size: %#x and states: %#"PRIx64"\n",
> xsave_cntxt_size, xfeature_mask);
>  }
>  else
>  {
>  BUG_ON(xfeature_mask != feature_mask);
> -BUG_ON(xsave_cntxt_size != hw_uncompressed_size(feature_mask));
> +BUG_ON(xsave_cntxt_size != cpuid_count_ebx(0xd, 0));
>  }

Hmm, this may make re-basing of said earlier patch touching this code yet
more interesting. Or maybe it actually simplifies things, will need to see
... The overall comment remains though: Patches pending for so long should
really take priority over creating yet more new ones. But what do I do - I
can't enforce this, unless I was now going to block your work the same way.
Which I don't mean to do.

Jan



Re: [PATCH 7/7] x86/defns: Clean up X86_{XCR0,XSS}_* constants

2024-05-23 Thread Jan Beulich
On 23.05.2024 13:16, Andrew Cooper wrote:
> With the exception of one case in read_bndcfgu() which can use ilog2(),
> the *_POS defines are unused.
> 
> X86_XCR0_X87 is the name used by both the SDM and APM, rather than
> X86_XCR0_FP.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 





Re: [PATCH 3/7] x86/boot: Collect the Raw CPU Policy earlier on boot

2024-05-23 Thread Jan Beulich
On 23.05.2024 13:16, Andrew Cooper wrote:
> This is a tangle, but it's a small step in the right direction.
> 
> xstate_init() is shortly going to want data from the Raw policy.
> calculate_raw_cpu_policy() is sufficiently separate from the other policies to
> be safe to do.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Would you mind taking a look at
https://lists.xen.org/archives/html/xen-devel/2021-04/msg01335.html
to make clear (to me at least) in how far we can perhaps find common grounds
on what wants doing when? (Of course the local version I have has been
constantly re-based, so some of the function names would have changed from
what's visible there.)

> --- a/xen/arch/x86/cpu-policy.c
> +++ b/xen/arch/x86/cpu-policy.c
> @@ -845,7 +845,6 @@ static void __init calculate_hvm_def_policy(void)
>  
>  void __init init_guest_cpu_policies(void)
>  {
> -calculate_raw_cpu_policy();
>  calculate_host_policy();
>  
>  if ( IS_ENABLED(CONFIG_PV) )
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1888,7 +1888,9 @@ void asmlinkage __init noreturn __start_xen(unsigned 
> long mbi_p)
>  
>  tsx_init(); /* Needs microcode.  May change HLE/RTM feature bits. */
>  
> -identify_cpu(_cpu_data);
> +calculate_raw_cpu_policy(); /* Needs microcode.  No other dependencies. 
> */
> +
> +identify_cpu(_cpu_data); /* Needs microcode and raw policy. */

You don't introduce any dependency on raw policy here, and there cannot possibly
have been such a dependency before (unless there was a bug somewhere). Therefore
I consider this latter comment misleading at this point.

Jan



[ANNOUNCE] Postpone June Community call

2024-05-23 Thread Kelly Choi
Hi all,

The next community call is on Thursday 6th June 2024, which clashes with
Xen Summit in Lisbon.

I propose we move the call a week later to *Thursday 13th June 2024, 4-5pm
(UK time). *

Many thanks,
Kelly Choi

Community Manager
Xen Project


Re: [PATCH 2/7] x86/xstate: Cross-check dynamic XSTATE sizes at boot

2024-05-23 Thread Jan Beulich
On 23.05.2024 13:16, Andrew Cooper wrote:
> Right now, xstate_ctxt_size() performs a cross-check of size with CPUID for
> every call.  This is expensive, being used for domain create/migrate, as well
> as to service certain guest CPUID instructions.
> 
> Instead, arrange to check the sizes once at boot.  See the code comments for
> details.  Right now, it just checks hardware against the algorithm
> expectations.  Later patches will add further cross-checking.
> 
> Introduce the missing X86_XCR0_* and X86_XSS_* constants, and a couple of
> missing CPUID bits.  This is to maximise coverage in the sanity check, even if
> we don't expect to use/virtualise some of these features any time soon.  Leave
> HDC and HWP alone for now.  We don't have CPUID bits from them stored nicely.

Since you say "the missing", ...

> --- a/xen/arch/x86/include/asm/x86-defns.h
> +++ b/xen/arch/x86/include/asm/x86-defns.h
> @@ -77,7 +77,7 @@
>  #define X86_CR4_PKS0x0100 /* Protection Key Supervisor */
>  
>  /*
> - * XSTATE component flags in XCR0
> + * XSTATE component flags in XCR0 | MSR_XSS
>   */
>  #define X86_XCR0_FP_POS   0
>  #define X86_XCR0_FP   (1ULL << X86_XCR0_FP_POS)
> @@ -95,11 +95,34 @@
>  #define X86_XCR0_ZMM  (1ULL << X86_XCR0_ZMM_POS)
>  #define X86_XCR0_HI_ZMM_POS   7
>  #define X86_XCR0_HI_ZMM   (1ULL << X86_XCR0_HI_ZMM_POS)
> +#define X86_XSS_PROC_TRACE(_AC(1, ULL) <<  8)
>  #define X86_XCR0_PKRU_POS 9
>  #define X86_XCR0_PKRU (1ULL << X86_XCR0_PKRU_POS)
> +#define X86_XSS_PASID (_AC(1, ULL) << 10)
> +#define X86_XSS_CET_U (_AC(1, ULL) << 11)
> +#define X86_XSS_CET_S (_AC(1, ULL) << 12)
> +#define X86_XSS_HDC   (_AC(1, ULL) << 13)
> +#define X86_XSS_UINTR (_AC(1, ULL) << 14)
> +#define X86_XSS_LBR   (_AC(1, ULL) << 15)
> +#define X86_XSS_HWP   (_AC(1, ULL) << 16)
> +#define X86_XCR0_TILE_CFG (_AC(1, ULL) << 17)
> +#define X86_XCR0_TILE_DATA(_AC(1, ULL) << 18)

... I'm wondering if you deliberately left out APX (bit 19).
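
(If it were to be added, presumably something along the lines of the below --
the name is a guess on my part:)

#define X86_XCR0_APX      (_AC(1, ULL) << 19)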

Since you're re-doing some of what I have long had in patches already,
I'd also like to ask whether the last underscores each in the two AMX
names really are useful in your opinion. While rebasing isn't going
to be difficult either way, it would be yet simpler with
X86_XCR0_TILECFG and X86_XCR0_TILEDATA, as I've had it in my patches
for over 3 years.

> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -604,9 +604,156 @@ static bool valid_xcr0(uint64_t xcr0)
>  if ( !(xcr0 & X86_XCR0_BNDREGS) != !(xcr0 & X86_XCR0_BNDCSR) )
>  return false;
>  
> +/* TILE_CFG and TILE_DATA must be the same. */
> +if ( !(xcr0 & X86_XCR0_TILE_CFG) != !(xcr0 & X86_XCR0_TILE_DATA) )
> +return false;
> +
>  return true;
>  }
>  
> +struct xcheck_state {
> +uint64_t states;
> +uint32_t uncomp_size;
> +uint32_t comp_size;
> +};
> +
> +static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
> +{
> +uint32_t hw_size;
> +
> +BUILD_BUG_ON(X86_XCR0_STATES & X86_XSS_STATES);
> +
> +BUG_ON(s->states & new); /* States only increase. */
> +BUG_ON(!valid_xcr0(s->states | new)); /* Xen thinks it's a good value. */
> +BUG_ON(new & ~(X86_XCR0_STATES | X86_XSS_STATES)); /* Known state. */
> +BUG_ON((new & X86_XCR0_STATES) &&
> +   (new & X86_XSS_STATES)); /* User or supervisor, not both. */
> +
> +s->states |= new;
> +if ( new & X86_XCR0_STATES )
> +{
> +if ( !set_xcr0(s->states & X86_XCR0_STATES) )
> +BUG();
> +}
> +else
> +set_msr_xss(s->states & X86_XSS_STATES);
> +
> +/*
> + * Check the uncompressed size.  Some XSTATEs are out-of-order and fill 
> in
> + * prior holes in the state area, so we check that the size doesn't
> + * decrease.
> + */
> +hw_size = cpuid_count_ebx(0xd, 0);
> +
> +if ( hw_size < s->uncomp_size )
> +panic("XSTATE 0x%016"PRIx64", new bits {%63pbl}, uncompressed hw 
> size %#x < prev size %#x\n",
> +  s->states, &s->states, hw_size, s->uncomp_size);
> +
> +s->uncomp_size = hw_size;
> +
> +/*
> + * Check the compressed size, if available.  All components strictly
> + * appear in index order.  In principle there are no holes, but some
> + * components have their base address 64-byte aligned for efficiency
> + * reasons (e.g. AMX-TILE) and there are other components small enough to
> + * fit in the gap (e.g. PKRU) without increasing the overall length.
> + */
> +hw_size = cpuid_count_ebx(0xd, 1);
> +
> +if ( cpu_has_xsavec )
> +{
> +if ( hw_size < s->comp_size )
> +panic("XSTATE 0x%016"PRIx64", new bits {%63pbl}, compressed hw 
> size %#x < prev size %#x\n",
> +  s->states, &s->states, hw_size, s->comp_size);
> +
> +s->comp_size = hw_size;
> +}
> +else

Re: [PATCH 1/7] x86/xstate: Fix initialisation of XSS cache

2024-05-23 Thread Jan Beulich
On 23.05.2024 13:16, Andrew Cooper wrote:
> The clobbering of this_cpu(xcr0) and this_cpu(xss) to architecturally invalid
> values is to force the subsequent set_xcr0() and set_msr_xss() to reload the
> hardware register.
> 
> While XCR0 is reloaded in xstate_init(), MSR_XSS isn't.  This causes
> get_msr_xss() to return the invalid value, and logic of the form:
> 
>   old = get_msr_xss();
>   set_msr_xss(new);
>   ...
>   set_msr_xss(old);
> 
> to try and restore the architecturally invalid value.
> 
> The architecturally invalid value must be purged from the cache, meaning the
> hardware register must be written at least once.  This in turn highlights that
> the invalid value must only be used in the case that the hardware register is
> available.
> 
> Fixes: f7f4a523927f ("x86/xstate: reset cached register values on resume")
> Signed-off-by: Andrew Cooper 

Reviewed-by: Jan Beulich 

However, I view it as pretty unfair that now I will need to re-base
https://lists.xen.org/archives/html/xen-devel/2021-04/msg01336.html
over ...

> --- a/xen/arch/x86/xstate.c
> +++ b/xen/arch/x86/xstate.c
> @@ -641,13 +641,6 @@ void xstate_init(struct cpuinfo_x86 *c)
>  return;
>  }
>  
> -/*
> - * Zap the cached values to make set_xcr0() and set_msr_xss() really
> - * write it.
> - */
> -this_cpu(xcr0) = 0;
> -this_cpu(xss) = ~0;
> -
>  cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx);
>  feature_mask = (((u64)edx << 32) | eax) & XCNTXT_MASK;
>  BUG_ON(!valid_xcr0(feature_mask));
> @@ -657,8 +650,19 @@ void xstate_init(struct cpuinfo_x86 *c)
>   * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
>   */
>  set_in_cr4(X86_CR4_OSXSAVE);
> +
> +/*
> + * Zap the cached values to make set_xcr0() and set_msr_xss() really 
> write
> + * the hardware register.
> + */
> +this_cpu(xcr0) = 0;
>  if ( !set_xcr0(feature_mask) )
>  BUG();
> +if ( cpu_has_xsaves )
> +{
> +this_cpu(xss) = ~0;
> +set_msr_xss(0);
> +}

... this change, kind of breaking again your nice arrangement. Seeing for how
long that change has been pending, it _really_ should have gone in ahead of
this one, with you then sorting how you'd like things to be arranged in the
combined result, rather than me re-posting and then either again not getting
any feedback for years, or you disliking what I've done. Oh well ...

Jan



Re: [PATCH 4.5/8] tools/hvmloader: Further simplify SMP setup

2024-05-23 Thread Roger Pau Monné
On Thu, May 09, 2024 at 06:50:57PM +0100, Andrew Cooper wrote:
> Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
> global with a regular function parameter.  This requires telling the compiler
> that we'd like the parameter in a register rather than on the stack.
> 
> While adjusting, rename to cpu_setup().  It's always been used on the BSP,
> making the name ap_start() specifically misleading.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Roger Pau Monné 

Thanks, Roger.



Re: [PATCH v2 1/2] x86/hvm/trace: Use a different trace type for AMD processors

2024-05-23 Thread Andrew Cooper
On 23/05/2024 3:10 pm, George Dunlap wrote:
> A long-standing usability sub-optimality with xenalyze is the
> necessity to specify `--svm-mode` when analyzing AMD processors.  This
> fundamentally comes about because the same trace event ID is used for
> both VMX and SVM, but the contents of the trace must be interpreted
> differently.
>
> Instead, allocate separate trace events for VMX and SVM vmexits in
> Xen; this will allow all readers to properly interpret the meaning of
> the vmexit reason.
>
> In xenalyze, first remove the redundant call to init_hvm_data();
> there's no way to get to hvm_vmexit_process() without it being already
> initialized by the set_vcpu_type call in hvm_process().
>
Replace this with set_hvm_exit_reason_data(), and move setting of
> hvm->exit_reason_* into that function.
>
> Modify hvm_process and hvm_vmexit_process to handle all four potential
> values appropriately.
>
> If SVM entries are encountered, set opt.svm_mode so that other
> SVM-specific functionality is triggered.
>
> Remove the `--svm-mode` command-line option, since it's now redundant.
>
> Signed-off-by: George Dunlap 

Acked-by: Andrew Cooper 



Re: [xen-4.17-testing test] 186087: regressions - FAIL

2024-05-23 Thread Jan Beulich
On 23.05.2024 16:40, osstest service owner wrote:
> flight 186087 xen-4.17-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/186087/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  build-amd64   6 xen-buildfail REGR. vs. 
> 185864
>  build-amd64-xsm   6 xen-buildfail REGR. vs. 
> 185864
>  build-i386-xsm6 xen-buildfail REGR. vs. 
> 185864
>  build-i3866 xen-buildfail REGR. vs. 
> 185864
>  build-amd64-prev  6 xen-buildfail REGR. vs. 
> 185864
>  build-i386-prev   6 xen-buildfail REGR. vs. 
> 185864

These failures look to be recurring, yet at the same time they look like
infrastructure issues. Since this isn't happening for the first time, I'm not
sure we can simply wait and hope for the problem to clear itself.

Jan



Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm

2024-05-23 Thread Roger Pau Monné
On Thu, May 23, 2024 at 04:40:06PM +0200, Jan Beulich wrote:
> On 23.05.2024 16:37, Roger Pau Monné wrote:
> > On Wed, May 08, 2024 at 01:39:21PM +0100, Alejandro Vallejo wrote:
> >> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> >> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> >> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu 
> >> *v)
> >>  alternative_vcall(hvm_funcs.update_vlapic_mode, v);
> >>  }
> >>  
> >> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
> >> +{
> >> +if ( hvm_funcs.sync_pir_to_irr )
> >> +alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
> > 
> > Nit: for consistency the wrappers are usually named hvm_,
> > so in this case it would be hvm_sync_pir_to_irr(), or the hvm_funcs
> > field should be renamed to vlapic_sync_pir_to_irr.
> 
> Funny you should mention that: See my earlier comment as well as what
> was committed.

Oh, sorry, didn't realize you already replied, adjusted and committed.

Thanks, Roger.



Re: [PATCH v2 3/8] x86/vlapic: Move lapic_load_hidden migration checks to the check hook

2024-05-23 Thread Roger Pau Monné
On Wed, May 08, 2024 at 01:39:22PM +0100, Alejandro Vallejo wrote:
> While at it, add a check for the reserved field in the hidden save area.
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> v2:
>   * New patch. Addresses the missing check for rsvd_zero in v1.

Oh, it would be better if this was done at the time when rsvd_zero is
introduced.  I think this should be moved ahead of the series, so that
the patch that introduces rsvd_zero can add the check in
lapic_check_hidden().
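
I.e. once rsvd_zero exists, lapic_check_hidden() could simply gain something
like (sketch only; the exact error value is a guess):

    if ( s.rsvd_zero )
        return -EINVAL;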

> ---
>  xen/arch/x86/hvm/vlapic.c | 41 ---
>  1 file changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 8a24419c..2f06bff1b2cc 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1573,35 +1573,54 @@ static void lapic_load_fixup(struct vlapic *vlapic)
> v, vlapic->loaded.id, vlapic->loaded.ldr, good_ldr);
>  }
>  
> -static int cf_check lapic_load_hidden(struct domain *d, hvm_domain_context_t 
> *h)
> +static int cf_check lapic_check_hidden(const struct domain *d,
> +   hvm_domain_context_t *h)
>  {
>  unsigned int vcpuid = hvm_load_instance(h);
> -struct vcpu *v;
> -struct vlapic *s;
> +struct hvm_hw_lapic s;
>  
>  if ( !has_vlapic(d) )
>  return -ENODEV;
>  
>  /* Which vlapic to load? */
> -if ( vcpuid >= d->max_vcpus || (v = d->vcpu[vcpuid]) == NULL )
> +if ( vcpuid >= d->max_vcpus || d->vcpu[vcpuid] == NULL )
>  {
>  dprintk(XENLOG_G_ERR, "HVM restore: dom%d has no apic%u\n",
>  d->domain_id, vcpuid);
>  return -EINVAL;
>  }
> -s = vcpu_vlapic(v);
>  
> -if ( hvm_load_entry_zeroextend(LAPIC, h, &s->hw) != 0 )
> +if ( hvm_load_entry_zeroextend(LAPIC, h, &s) )

Can't you use hvm_get_entry() to perform the sanity checks:

const struct hvm_hw_lapic *s = hvm_get_entry(LAPIC, h);

Thanks, Roger.



Re: [XEN PATCH v2 07/15] x86: guard cpu_has_{svm/vmx} macros with CONFIG_{SVM/VMX}

2024-05-23 Thread Jan Beulich
On 23.05.2024 15:07, Sergiy Kibrik wrote:
> 16.05.24 14:12, Jan Beulich:
>> On 15.05.2024 11:12, Sergiy Kibrik wrote:
>>> --- a/xen/arch/x86/include/asm/cpufeature.h
>>> +++ b/xen/arch/x86/include/asm/cpufeature.h
>>> @@ -81,7 +81,8 @@ static inline bool boot_cpu_has(unsigned int feat)
>>>   #define cpu_has_sse3boot_cpu_has(X86_FEATURE_SSE3)
>>>   #define cpu_has_pclmulqdq   boot_cpu_has(X86_FEATURE_PCLMULQDQ)
>>>   #define cpu_has_monitor boot_cpu_has(X86_FEATURE_MONITOR)
>>> -#define cpu_has_vmx boot_cpu_has(X86_FEATURE_VMX)
>>> +#define cpu_has_vmx ( IS_ENABLED(CONFIG_VMX) && \
>>> +  boot_cpu_has(X86_FEATURE_VMX))
>>>   #define cpu_has_eistboot_cpu_has(X86_FEATURE_EIST)
>>>   #define cpu_has_ssse3   boot_cpu_has(X86_FEATURE_SSSE3)
>>>   #define cpu_has_fma boot_cpu_has(X86_FEATURE_FMA)
>>> @@ -109,7 +110,8 @@ static inline bool boot_cpu_has(unsigned int feat)
>>>   
>>>   /* CPUID level 0x8001.ecx */
>>>   #define cpu_has_cmp_legacy  boot_cpu_has(X86_FEATURE_CMP_LEGACY)
>>> -#define cpu_has_svm boot_cpu_has(X86_FEATURE_SVM)
>>> +#define cpu_has_svm ( IS_ENABLED(CONFIG_SVM) && \
>>> +  boot_cpu_has(X86_FEATURE_SVM))
>>>   #define cpu_has_sse4a   boot_cpu_has(X86_FEATURE_SSE4A)
>>>   #define cpu_has_xop boot_cpu_has(X86_FEATURE_XOP)
>>>   #define cpu_has_skinit  boot_cpu_has(X86_FEATURE_SKINIT)
>>
>> Hmm, leaving aside the style issue (stray blanks after opening parentheses,
>> and as a result one-off indentation on the wrapped lines) I'm not really
>> certain we can do this. The description goes into detail why we would want
>> this, but it doesn't cover at all why it is safe for all present (and
>> ideally also future) uses. I wouldn't be surprised if we had VMX/SVM checks
>> just to derive further knowledge from that, without them being directly
>> related to the use of VMX/SVM. Take a look at calculate_hvm_max_policy(),
>> for example. While it looks to be okay there, it may give you an idea of
>> what I mean.
>>
>> Things might become better separated if instead for such checks we used
>> host and raw CPU policies instead of cpuinfo_x86.x86_capability[]. But
>> that's still pretty far out, I'm afraid.
> 
> I've followed a suggestion you made for patch in previous series:
> 
> https://lore.kernel.org/xen-devel/8fbd604e-5e5d-410c-880f-2ad257bbe...@suse.com/

See the "If not, ..." that I had put there. Doing the change just mechanically
isn't enough, you also need to make clear (in the description) that you
verified it's safe to have it this way.

> yet if this approach can potentially be unsafe (I'm not completely sure 
> it's safe), should we instead fallback to the way it was done in v1 
> series? I.e. guard calls to vmx/svm-specific calls where needed, like in 
> these 3 patches:
> 
> 1) 
> https://lore.kernel.org/xen-devel/20240416063328.3469386-1-sergiy_kib...@epam.com/
> 
> 2) 
> https://lore.kernel.org/xen-devel/20240416063740.3469592-1-sergiy_kib...@epam.com/
> 
> 3) 
> https://lore.kernel.org/xen-devel/20240416063947.3469718-1-sergiy_kib...@epam.com/

I don't like this sprinkling around of IS_ENABLED() very much. Maybe we want
to have two new helpers (say using_svm() and using_vmx()), to be used in place
of most but possibly not all cpu_has_{svm,vmx}? Doing such a transformation
would then kind of implicitly answer the safety question above, as at every
use site you'd need to judge whether the replacement is correct. If it's
correct everywhere, the construct(s) as proposed in this version could then be
considered to be used in this very shape (instead of introducing the two new
helpers). But of course the transition could also be done gradually then,
touching only those uses that previously you touched in 1), 2), and 3).
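
Concretely, such helpers might look something like this (untested sketch;
where exactly they'd live and their final names are of course open):

/* "Xen can (and may) use VMX/SVM", as opposed to "the CPU supports it". */
static inline bool using_vmx(void)
{
    return IS_ENABLED(CONFIG_VMX) && boot_cpu_has(X86_FEATURE_VMX);
}

static inline bool using_svm(void)
{
    return IS_ENABLED(CONFIG_SVM) && boot_cpu_has(X86_FEATURE_SVM);
}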

Jan



Re: [XEN PATCH v2 06/15] x86/p2m: guard altp2m code with CONFIG_ALTP2M option

2024-05-23 Thread Jan Beulich
On 23.05.2024 12:44, Sergiy Kibrik wrote:
> 16.05.24 14:01, Jan Beulich:
>> On 15.05.2024 11:10, Sergiy Kibrik wrote:
>>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>>> @@ -670,7 +670,7 @@ static inline bool hvm_hap_supported(void)
>>>   /* returns true if hardware supports alternate p2m's */
>>>   static inline bool hvm_altp2m_supported(void)
>>>   {
>>> -return hvm_funcs.caps.altp2m;
>>> +return IS_ENABLED(CONFIG_ALTP2M) && hvm_funcs.caps.altp2m;
>>
>> Which in turn raises the question whether the altp2m struct field shouldn't
>> become conditional upon CONFIG_ALTP2M too (or rather: instead, as the change
>> here then would need to be done differently). Yet maybe that would entail
>> further changes elsewhere, so may well better be left for later.
> 
>   but hvm_funcs.caps.altp2m is only a capability bit -- is it worth
> making it conditional?

Well, the comment was more based on the overall principle than the actual
space savings that might result. Plus as said - likely that would not work
anyway without further changes elsewhere. So perhaps okay to leave as you
have it.

>>> --- a/xen/arch/x86/mm/Makefile
>>> +++ b/xen/arch/x86/mm/Makefile
>>> @@ -1,7 +1,7 @@
>>>   obj-y += shadow/
>>>   obj-$(CONFIG_HVM) += hap/
>>>   
>>> -obj-$(CONFIG_HVM) += altp2m.o
>>> +obj-$(CONFIG_ALTP2M) += altp2m.o
>>
>> This change I think wants to move to patch 5.
>>
> 
> If this moves to patch 5 then HVM=y && ALTP2M=n configuration 
> combination will break the build in between patch 5 and 6, so I've 
> decided to put it together with fixes of these build failures in patch 6.

Hmm, yes, I think I see what you mean.

> Maybe I can merge patches 5 & 6 together then?

Perhaps more consistent that way, yes.

Jan



[xen-4.17-testing test] 186087: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186087 xen-4.17-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186087/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64   6 xen-buildfail REGR. vs. 185864
 build-amd64-xsm   6 xen-buildfail REGR. vs. 185864
 build-i386-xsm6 xen-buildfail REGR. vs. 185864
 build-i3866 xen-buildfail REGR. vs. 185864
 build-amd64-prev  6 xen-buildfail REGR. vs. 185864
 build-i386-prev   6 xen-buildfail REGR. vs. 185864

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-raw   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked 
n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qemut-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-win7-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked 
n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-livepatch1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-migrupgrade  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-51 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-11 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-21 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-31 build-check(1)   blocked  n/a
 test-xtf-amd64-amd64-41 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 185864
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass

Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm

2024-05-23 Thread Jan Beulich
On 23.05.2024 16:37, Roger Pau Monné wrote:
> On Wed, May 08, 2024 at 01:39:21PM +0100, Alejandro Vallejo wrote:
>> --- a/xen/arch/x86/include/asm/hvm/hvm.h
>> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
>> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu 
>> *v)
>>  alternative_vcall(hvm_funcs.update_vlapic_mode, v);
>>  }
>>  
>> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
>> +{
>> +if ( hvm_funcs.sync_pir_to_irr )
>> +alternative_vcall(hvm_funcs.sync_pir_to_irr, v);
> 
> Nit: for consistency the wrappers are usually named hvm_,
> so in this case it would be hvm_sync_pir_to_irr(), or the hvm_funcs
> field should be renamed to vlapic_sync_pir_to_irr.

Funny you should mention that: See my earlier comment as well as what
was committed.

Jan




Re: [PATCH v2 2/8] xen/x86: Simplify header dependencies in x86/hvm

2024-05-23 Thread Roger Pau Monné
On Wed, May 08, 2024 at 01:39:21PM +0100, Alejandro Vallejo wrote:
> Otherwise it's not possible to call functions described in hvm/vlapic.h from 
> the
> inline functions of hvm/hvm.h.
> 
> This is because a static inline in vlapic.h depends on hvm.h, and pulls it
> transitively through vpt.h. The ultimate cause is having hvm.h included in any
> of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
> 
> No functional change.
> 
> Signed-off-by: Alejandro Vallejo 

Acked-by: Roger Pau Monné 

One cosmetic comment below.

> ---
> v2:
>   * New patch. Prereq to moving vlapic_cpu_policy_changed() onto hvm.h
> ---
>  xen/arch/x86/hvm/irq.c| 6 +++---
>  xen/arch/x86/hvm/vlapic.c | 4 ++--
>  xen/arch/x86/include/asm/hvm/hvm.h| 6 ++
>  xen/arch/x86/include/asm/hvm/vlapic.h | 6 --
>  xen/arch/x86/include/asm/hvm/vpt.h| 1 -
>  5 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index 4a9fe82cbd8d..4f5479b12c98 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -512,13 +512,13 @@ struct hvm_intack hvm_vcpu_has_pending_irq(struct vcpu 
> *v)
>  int vector;
>  
>  /*
> - * Always call vlapic_sync_pir_to_irr so that PIR is synced into IRR when
> - * using posted interrupts. Note this is also done by
> + * Always call hvm_vlapic_sync_pir_to_irr so that PIR is synced into IRR
> + * when using posted interrupts. Note this is also done by
>   * vlapic_has_pending_irq but depending on which interrupts are pending
>   * hvm_vcpu_has_pending_irq will return early without calling
>   * vlapic_has_pending_irq.
>   */
> -vlapic_sync_pir_to_irr(v);
> +hvm_vlapic_sync_pir_to_irr(v);
>  
>  if ( unlikely(v->arch.nmi_pending) )
>  return hvm_intack_nmi;
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 61a96474006b..8a24419c 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -98,7 +98,7 @@ static void vlapic_clear_irr(int vector, struct vlapic 
> *vlapic)
>  
>  static int vlapic_find_highest_irr(struct vlapic *vlapic)
>  {
> -vlapic_sync_pir_to_irr(vlapic_vcpu(vlapic));
> +hvm_vlapic_sync_pir_to_irr(vlapic_vcpu(vlapic));
>  
>  return vlapic_find_highest_vector(&vlapic->regs->data[APIC_IRR]);
>  }
> @@ -1516,7 +1516,7 @@ static int cf_check lapic_save_regs(struct vcpu *v, 
> hvm_domain_context_t *h)
>  if ( !has_vlapic(v->domain) )
>  return 0;
>  
> -vlapic_sync_pir_to_irr(v);
> +hvm_vlapic_sync_pir_to_irr(v);
>  
>  return hvm_save_entry(LAPIC_REGS, v->vcpu_id, h, vcpu_vlapic(v)->regs);
>  }
> diff --git a/xen/arch/x86/include/asm/hvm/hvm.h 
> b/xen/arch/x86/include/asm/hvm/hvm.h
> index e1f0585d75a9..84911f3ebcb4 100644
> --- a/xen/arch/x86/include/asm/hvm/hvm.h
> +++ b/xen/arch/x86/include/asm/hvm/hvm.h
> @@ -798,6 +798,12 @@ static inline void hvm_update_vlapic_mode(struct vcpu *v)
>  alternative_vcall(hvm_funcs.update_vlapic_mode, v);
>  }
>  
> +static inline void hvm_vlapic_sync_pir_to_irr(struct vcpu *v)
> +{
> +if ( hvm_funcs.sync_pir_to_irr )
> +alternative_vcall(hvm_funcs.sync_pir_to_irr, v);

Nit: for consistency the wrappers are usually named hvm_,
so in this case it would be hvm_sync_pir_to_irr(), or the hvm_funcs
field should be renamed to vlapic_sync_pir_to_irr.

Thanks, Roger.



Re: [xen-unstable-smoke test] 186107: regressions - FAIL

2024-05-23 Thread Jan Beulich
On 23.05.2024 15:45, osstest service owner wrote:
> flight 186107 xen-unstable-smoke real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/186107/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  build-armhf   6 xen-buildfail REGR. vs. 
> 186064

Found ninja-1.11.1 at /usr/bin/ninja

ERROR: Clock skew detected. File /usr/bin/bash has a time stamp 
1682259478.4465s in the future.

A full log can be found at 
/home/osstest/build.186107.build-armhf/xen/tools/qemu-xen-build/meson-logs/meson-log.txt

ERROR: meson setup failed

make: Entering directory 
'/home/osstest/build.186107.build-armhf/xen/tools/qemu-xen-build'
config-host.mak is out-of-date, running configure
  GIT ui/keycodemapdb meson tests/fp/berkeley-testfloat-3 
tests/fp/berkeley-softfloat-3 dtc
bash: line 4: ./config.status: No such file or directory
make: *** No rule to make target 'config-host.mak', needed by 
'Makefile.prereqs'.  Stop.
make: *** Waiting for unfinished jobs
make: Leaving directory 
'/home/osstest/build.186107.build-armhf/xen/tools/qemu-xen-build'
make[2]: *** [Makefile:212: subdir-all-qemu-xen-dir] Error 2
make[2]: Leaving directory '/home/osstest/build.186107.build-armhf/xen/tools'
make[1]: *** 
[/home/osstest/build.186107.build-armhf/xen/tools/../tools/Rules.mk:199: 
subdirs-all] Error 2
make[1]: Leaving directory '/home/osstest/build.186107.build-armhf/xen/tools'
make: *** [Makefile:63: build-tools] Error 2

Suggest to me that there's some issue with the build host.

Jan




Re: [PATCH v10 02/14] xen: introduce generic non-atomic test_*bit()

2024-05-23 Thread Julien Grall




On 23/05/2024 15:11, Oleksii K. wrote:

On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:

Hi Oleksii,

Hi Julien,



On 17/05/2024 14:54, Oleksii Kurochko wrote:

diff --git a/xen/arch/arm/arm64/livepatch.c
b/xen/arch/arm/arm64/livepatch.c
index df2cebedde..4bc8ed9be5 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -10,7 +10,6 @@
   #include 
   #include 
   
-#include 


It is a bit unclear how this change is related to the patch. Can you
explain in the commit message?

Probably it doesn't need anymore. I will double check and if this
change is not needed, I will just drop it in the next patch version.




   #include 
   #include 
   #include 
diff --git a/xen/arch/arm/include/asm/bitops.h
b/xen/arch/arm/include/asm/bitops.h
index 5104334e48..8e16335e76 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -22,9 +22,6 @@
   #define __set_bit(n,p)    set_bit(n,p)
   #define __clear_bit(n,p)  clear_bit(n,p)
   
-#define BITOP_BITS_PER_WORD 32

-#define BITOP_MASK(nr)  (1UL << ((nr) %
BITOP_BITS_PER_WORD))
-#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
   #define BITS_PER_BYTE   8


OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
expect
the value to change across arch.

I can move it to generic one header too in the next patch version.



[...]


diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index f14ad0d33a..6eeeff0117 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long
x)
    * scope
    */
   
+#define BITOP_BITS_PER_WORD 32

+typedef uint32_t bitop_uint_t;
+
+#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
BITOP_BITS_PER_WORD))
+
+#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
+
+extern void __bitop_bad_size(void);
+
+#define bitop_bad_size(addr) (sizeof(*(addr)) <
sizeof(bitop_uint_t))
+
   /* - Please tidy above here -
 */
   
   #include 
   
+/**

+ * generic__test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to
succeed
+ * but actually fail.  You must protect multiple accesses with a
lock.
+ */


Sorry for only mentioning this on v10. I think this comment should be
duplicated (or moved) to the top of test_bit() because this is what
everyone will use. This will save the developer from having to follow the
function calls and only notice the x86 version which says "This function
is atomic and may not be reordered." and would be wrong for all the
other arch.

It makes sense to add this comment on top of test_bit(), but I am
curious if it is needed to mention that for x86 arch_test_bit() "is
atomic and may not be reordered":


I would say no because any developer modifying common code can't
rely on it.




  * This operation is non-atomic and can be reordered. ( Exception: for
* x86 arch_test_bit() is atomic and may not be reordered )
  * If two examples of this operation race, one can appear to succeed
  * but actually fail.  You must protect multiple accesses with a lock.
  */




+static always_inline bool
+generic__test_and_set_bit(int nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old | mask;
+    return (old & mask);
+}
+
+/**
+ * generic__test_and_clear_bit - Clear a bit and return its old
value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to
succeed
+ * but actually fail.  You must protect multiple accesses with a
lock.
+ */


Same applies here and ...


+static always_inline bool
+generic__test_and_clear_bit(int nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old & ~mask;
+    return (old & mask);
+}
+
+/* WARNING: non atomic and it can be reordered! */


... here.


+static always_inline bool
+generic__test_and_change_bit(int nr, volatile void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
BITOP_WORD(nr);
+    bitop_uint_t old = *p;
+
+    *p = old ^ mask;
+    return (old & mask);
+}
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static always_inline bool generic_test_bit(int nr, const volatile
void *addr)
+{
+    bitop_uint_t mask = BITOP_MASK(nr);
+    const volatile bitop_uint_t *p =
+    (const volatile bitop_uint_t *)addr +
BITOP_WORD(nr);
+
+    

Re: [PATCH v2 1/8] xen/x86: Add initial x2APIC ID to the per-vLAPIC save area

2024-05-23 Thread Roger Pau Monné
On Wed, May 08, 2024 at 01:39:20PM +0100, Alejandro Vallejo wrote:
> This allows the initial x2APIC ID to be sent on the migration stream. The
> hardcoded mapping x2apic_id=2*vcpu_id is maintained for the time being.
> Given the vlapic data is zero-extended on restore, fix up migrations from
> hosts without the field by setting it to the old convention if zero.
> 
> x2APIC IDs are calculated from the CPU policy where the guest topology is
> defined. For the time being, the function simply returns the old
> relationship, but will eventually return results consistent with the
> topology.
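
(For reference, going by the commit message and the call site further down,
the interim helper presumably boils down to something like the below; this is
a sketch, not the actual hunk from xen/lib/x86/policy.c:)

uint32_t x86_x2apic_id_from_vcpu_id(const struct cpu_policy *p, uint32_t vcpu_id)
{
    /*
     * Keep the historical x2apic_id = 2 * vcpu_id mapping until IDs are
     * derived from the topology encoded in the policy.
     */
    return vcpu_id * 2;
}
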
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> v2:
>   * Removed usage of SET_xAPIC_ID().
>   * Restored previous logic when exposing leaf 0xb, and gate it for HVM only.
>   * Rewrote comment in lapic_load_fixup, including the implicit assumption.
>   * Moved vlapic_cpu_policy_changed() into hvm_cpuid_policy_changed()
>   * const-ified policy in vlapic_cpu_policy_changed()
> ---
>  xen/arch/x86/cpuid.c   | 15 -
>  xen/arch/x86/hvm/vlapic.c  | 30 --
>  xen/arch/x86/include/asm/hvm/hvm.h |  1 +
>  xen/arch/x86/include/asm/hvm/vlapic.h  |  2 ++
>  xen/include/public/arch-x86/hvm/save.h |  2 ++
>  xen/include/xen/lib/x86/cpu-policy.h   |  9 
>  xen/lib/x86/policy.c   | 11 ++
>  7 files changed, 57 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
> index 7a38e032146a..242c21ec5bb6 100644
> --- a/xen/arch/x86/cpuid.c
> +++ b/xen/arch/x86/cpuid.c
> @@ -139,10 +139,9 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>  const struct cpu_user_regs *regs;
>  
>  case 0x1:
> -/* TODO: Rework topology logic. */
>  res->b &= 0x00ffu;
>  if ( is_hvm_domain(d) )
> -res->b |= (v->vcpu_id * 2) << 24;
> +res->b |= vlapic_x2apic_id(vcpu_vlapic(v)) << 24;
>  
>  /* TODO: Rework vPMU control in terms of toolstack choices. */
>  if ( vpmu_available(v) &&
> @@ -311,19 +310,13 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
>  break;
>  
>  case 0xb:
> -/*
> - * In principle, this leaf is Intel-only.  In practice, it is tightly
> - * coupled with x2apic, and we offer an x2apic-capable APIC emulation
> - * to guests on AMD hardware as well.
> - *
> - * TODO: Rework topology logic.
> - */
> -if ( p->basic.x2apic )
> +/* Don't expose topology information to PV guests */

Not sure whether we want to keep part of the comment about exposing
x2APIC to guests even when x2APIC is not present in the host.  I think
this code has changed and the comment is kind of stale now.

> +if ( is_hvm_domain(d) && p->basic.x2apic )
>  {
>  *(uint8_t *)&res->c = subleaf;
>  
>  /* Fix the x2APIC identifier. */
> -res->d = v->vcpu_id * 2;
> +res->d = vlapic_x2apic_id(vcpu_vlapic(v));
>  }
>  break;
>  
> diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
> index 05072a21bf38..61a96474006b 100644
> --- a/xen/arch/x86/hvm/vlapic.c
> +++ b/xen/arch/x86/hvm/vlapic.c
> @@ -1069,7 +1069,7 @@ static uint32_t x2apic_ldr_from_id(uint32_t id)
>  static void set_x2apic_id(struct vlapic *vlapic)
>  {
>  const struct vcpu *v = vlapic_vcpu(vlapic);
> -uint32_t apic_id = v->vcpu_id * 2;
> +uint32_t apic_id = vlapic->hw.x2apic_id;
>  uint32_t apic_ldr = x2apic_ldr_from_id(apic_id);
>  
>  /*
> @@ -1083,6 +1083,22 @@ static void set_x2apic_id(struct vlapic *vlapic)
>  vlapic_set_reg(vlapic, APIC_LDR, apic_ldr);
>  }
>  
> +void vlapic_cpu_policy_changed(struct vcpu *v)
> +{
> +struct vlapic *vlapic = vcpu_vlapic(v);
> +const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> +
> +/*
> + * Don't override the initial x2APIC ID if we have migrated it or
> + * if the domain doesn't have vLAPIC at all.
> + */
> +if ( !has_vlapic(v->domain) || vlapic->loaded.hw )
> +return;
> +
> +vlapic->hw.x2apic_id = x86_x2apic_id_from_vcpu_id(cp, v->vcpu_id);
> +vlapic_set_reg(vlapic, APIC_ID, SET_xAPIC_ID(vlapic->hw.x2apic_id));

Nit: in case we decide to start APICs in x2APIC mode, might be good to
take this into account here and use vlapic_x2apic_mode(vlapic) to
select whether SET_xAPIC_ID() needs to be used or not:

vlapic_set_reg(vlapic, APIC_ID,
vlapic_x2apic_mode(vlapic) ? vlapic->hw.x2apic_id
   : SET_xAPIC_ID(vlapic->hw.x2apic_id));

Or similar.

> +}
> +
>  int guest_wrmsr_apic_base(struct vcpu *v, uint64_t val)
>  {
>  const struct cpu_policy *cp = v->domain->arch.cpu_policy;
> @@ -1449,7 +1465,7 @@ void vlapic_reset(struct vlapic *vlapic)
>  if ( v->vcpu_id == 0 )
>  vlapic->hw.apic_base_msr |= APIC_BASE_BSP;
>  
> -vlapic_set_reg(vlapic, APIC_ID, 

Re: [PATCH v4 0/2] Add API for making parts of a MMIO page R/O and use it in XHCI console

2024-05-23 Thread Jan Beulich
On 23.05.2024 16:22, Marek Marczykowski-Górecki wrote:
> On Wed, May 22, 2024 at 05:39:02PM +0200, Marek Marczykowski-Górecki wrote:
>> On older systems, XHCI xcap had a layout that no other (interesting) 
>> registers
>> were placed on the same page as the debug capability, so Linux was fine with
>> making the whole page R/O. But at least on Tiger Lake and Alder Lake, Linux
>> needs to write to some other registers on the same page too.
>>
>> Add a generic API for making just parts of an MMIO page R/O and use it to fix
>> USB3 console with share=yes or share=hwdom options. More details in commit
>> messages.
>>
>> Marek Marczykowski-Górecki (2):
>>   x86/mm: add API for marking only part of a MMIO page read only
>>   drivers/char: Use sub-page ro API to make just xhci dbc cap RO
> 
> Does any other x86 maintainer feel comfortable ack-ing this series? Jan
> already reviewed 2/2 here (but not 1/2 in this version),

Which, btw, isn't to mean I'm not going to look at it. But 2/2 was the
lower hanging fruit ...

Jan

> but also said
> he is not comfortable with letting this in without a second maintainer
> approval: 
> https://lore.kernel.org/xen-devel/7655e401-b927-4250-ae63-05361a5ee...@suse.com/




[PATCH v2 1/2] x86/hvm/trace: Use a different trace type for AMD processors

2024-05-23 Thread George Dunlap
A long-standing usability sub-optimality with xenalyze is the
necessity to specify `--svm-mode` when analyzing AMD processors.  This
fundamentally comes about because the same trace event ID is used for
both VMX and SVM, but the contents of the trace must be interpreted
differently.

Instead, allocate separate trace events for VMX and SVM vmexits in
Xen; this will allow all readers to properly interpret the meaning of
the vmexit reason.

In xenalyze, first remove the redundant call to init_hvm_data();
there's no way to get to hvm_vmexit_process() without it being already
initialized by the set_vcpu_type call in hvm_process().

Replace this with set_hvm_exit_reason_data(), and move setting of
hvm->exit_reason_* into that function.

Modify hvm_process and hvm_vmexit_process to handle all four potential
values appropriately.

If SVM entries are encountered, set opt.svm_mode so that other
SVM-specific functionality is triggered.

Remove the `--svm-mode` command-line option, since it's now redundant.

Signed-off-by: George Dunlap 
---
v2:
- Rebase to tip of staging
- Rebase over xentrace_format removal
- Fix typo in commit message
- Remove --svm-mode command-line flag

CC: Andrew Cooper 
CC: Jan Beulich 
CC: Roger Pau Monne 
CC: Anthony Perard 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Olaf Hering 
---
 tools/xentrace/xenalyze.c  | 37 +++--
 xen/arch/x86/hvm/svm/svm.c |  4 ++--
 xen/arch/x86/hvm/vmx/vmx.c |  4 ++--
 xen/include/public/trace.h |  6 --
 4 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index ce6a85d50b..9c4463b0e8 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -1437,14 +1437,6 @@ void init_hvm_data(struct hvm_data *h, struct vcpu_data 
*v) {
 
 h->init = 1;
 
-if(opt.svm_mode) {
-h->exit_reason_max = HVM_SVM_EXIT_REASON_MAX;
-h->exit_reason_name = hvm_svm_exit_reason_name;
-} else {
-h->exit_reason_max = HVM_VMX_EXIT_REASON_MAX;
-h->exit_reason_name = hvm_vmx_exit_reason_name;
-}
-
 if(opt.histogram_interrupt_eip) {
 int count = 
((1ULLexit_reason_max = HVM_SVM_EXIT_REASON_MAX;
+h->exit_reason_name = hvm_svm_exit_reason_name;
+} else {
+h->exit_reason_max = HVM_VMX_EXIT_REASON_MAX;
+h->exit_reason_name = hvm_vmx_exit_reason_name;
+}
+}
+
 /* PV data */
 enum {
 PV_HYPERCALL=1,
@@ -5088,13 +5092,13 @@ void hvm_vmexit_process(struct record_info *ri, struct 
hvm_data *h,
 
 r = (typeof(r))ri->d;
 
-if(!h->init)
-init_hvm_data(h, v);
+if(!h->exit_reason_name)
+set_hvm_exit_reason_data(h, ri->event);
 
 h->vmexit_valid=1;
>  bzero(&h->inflight, sizeof(h->inflight));
 
-if(ri->event == TRC_HVM_VMEXIT64) {
+if(ri->event & TRC_64_FLAG) {
 if(v->guest_paging_levels != 4)
 {
 if ( verbosity >= 6 )
@@ -5316,8 +5320,10 @@ void hvm_process(struct pcpu_info *p)
 break;
 default:
 switch(ri->event) {
-case TRC_HVM_VMEXIT:
-case TRC_HVM_VMEXIT64:
+case TRC_HVM_VMX_EXIT:
+case TRC_HVM_VMX_EXIT64:
+case TRC_HVM_SVM_EXIT:
+case TRC_HVM_SVM_EXIT64:
 UPDATE_VOLUME(p, hvm[HVM_VOL_VMEXIT], ri->size);
 hvm_vmexit_process(ri, h, v);
 break;
@@ -10884,11 +10890,6 @@ const struct argp_option cmd_opts[] =  {
   .arg = "HZ",
   .doc = "Cpu speed of the tracing host, used to convert tsc into 
seconds.", },
 
-{ .name = "svm-mode",
-  .key = OPT_SVM_MODE,
-  .group = OPT_GROUP_HARDWARE,
-  .doc = "Assume AMD SVM-style vmexit error codes.  (Default is Intel 
VMX.)", },
-
 { .name = "progress",
   .key = OPT_PROGRESS,
   .doc = "Progress dialog.  Requires the zenity (GTK+) executable.", },
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index db530d55f2..988250dbc1 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2571,10 +2571,10 @@ void asmlinkage svm_vmexit_handler(void)
 exit_reason = vmcb->exitcode;
 
 if ( hvm_long_mode_active(v) )
-TRACE_TIME(TRC_HVM_VMEXIT64 | (vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 
0),
+TRACE_TIME(TRC_HVM_SVM_EXIT64 | (vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 
0),
exit_reason, regs->rip, regs->rip >> 32);
 else
-TRACE_TIME(TRC_HVM_VMEXIT | (vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 0),
+TRACE_TIME(TRC_HVM_SVM_EXIT | (vcpu_guestmode ? TRC_HVM_NESTEDFLAG : 
0),
exit_reason, regs->eip);
 
 if ( vcpu_guestmode )
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8ba996546f..f16faa6a61 100644
--- 

[PATCH v2 2/2] tools/xenalyze: Ignore HVM_EMUL events harder

2024-05-23 Thread George Dunlap
To unify certain common sanity checks, checks are done very early in
processing based only on the top-level type.

Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
assumptions about how the top-level types worked.  Namely, traces of
this type will show up outside of HVM contexts: in idle domains and in
PV domains.

Make an explicit exception for TRC_HVM_EMUL types in a number of places:

 - Pass the record info pointer to toplevel_assert_check, so that it
   can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
   checks

 - Don't attempt to set the vcpu data_type in hvm_process for
   TRC_HVM_EMUL records.

Signed-off-by: George Dunlap 
Acked-by: Andrew Cooper 
---
CC: Andrew Cooper 
CC: Anthony Perard 
CC: Olaf Hering 
---
 tools/xentrace/xenalyze.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/tools/xentrace/xenalyze.c b/tools/xentrace/xenalyze.c
index 9c4463b0e8..d95e52695f 100644
--- a/tools/xentrace/xenalyze.c
+++ b/tools/xentrace/xenalyze.c
@@ -21,6 +21,7 @@
 #define _XOPEN_SOURCE 600
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -5305,8 +5306,11 @@ void hvm_process(struct pcpu_info *p)
 
 assert(p->current);
 
-if(vcpu_set_data_type(p->current, VCPU_DATA_HVM))
-return;
+/* HVM_EMUL types show up in all contexts */
+if(ri->evt.sub != 0x4) {
+if(vcpu_set_data_type(p->current, VCPU_DATA_HVM))
+return;
+}
 
 switch ( ri->evt.sub ) {
 case 2: /* HVM_HANDLER */
@@ -9447,9 +9451,10 @@ static struct tl_assert_mask 
tl_assert_checks[TOPLEVEL_MAX] = {
 /* There are a lot of common assumptions for the various processing
  * routines.  Check them all in one place, doing something else if
  * they don't pass. */
-int toplevel_assert_check(int toplevel, struct pcpu_info *p)
+int toplevel_assert_check(int toplevel, struct record_info *ri, struct 
pcpu_info *p)
 {
 struct tl_assert_mask mask;
+bool is_hvm_emul = (toplevel == TOPLEVEL_HVM) && (ri->evt.sub == 0x4);
 
 mask = tl_assert_checks[toplevel];
 
@@ -9459,7 +9464,7 @@ int toplevel_assert_check(int toplevel, struct pcpu_info 
*p)
 goto fail;
 }
 
-if( mask.not_idle_domain )
+if( mask.not_idle_domain && !is_hvm_emul)
 {
 /* Can't do this check w/o first doing above check */
 assert(mask.p_current);
@@ -9478,7 +9483,8 @@ int toplevel_assert_check(int toplevel, struct pcpu_info 
*p)
 v = p->current;
 
 if ( ! (v->data_type == VCPU_DATA_NONE
-|| v->data_type == mask.vcpu_data_mode) )
+|| v->data_type == mask.vcpu_data_mode
+|| is_hvm_emul) )
 {
 /* This may happen for track_dirty_vram, which causes a 
SHADOW_WRMAP_BF trace f/ dom0 */
 fprintf(warn, "WARNING: Unexpected vcpu data type for d%dv%d on 
proc %d! Expected %d got %d. Not processing\n",
@@ -9525,7 +9531,7 @@ void process_record(struct pcpu_info *p) {
 return;
 
 /* Unify toplevel assertions */
-if ( toplevel_assert_check(toplevel, p) )
+if ( toplevel_assert_check(toplevel, ri, p) )
 {
 switch(toplevel) {
 case TRC_GEN_MAIN:
-- 
2.25.1




Re: [PATCH v4 0/2] Add API for making parts of a MMIO page R/O and use it in XHCI console

2024-05-23 Thread Marek Marczykowski-Górecki
On Wed, May 22, 2024 at 05:39:02PM +0200, Marek Marczykowski-Górecki wrote:
> On older systems, the XHCI xcap had a layout such that no other (interesting) registers
> were placed on the same page as the debug capability, so Linux was fine with
> making the whole page R/O. But at least on Tiger Lake and Alder Lake, Linux
> needs to write to some other registers on the same page too.
> 
> Add a generic API for making just parts of an MMIO page R/O and use it to fix
> USB3 console with share=yes or share=hwdom options. More details in commit
> messages.
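
As a rough illustration of how a caller might use the kind of sub-page API
being described (the function name and signature below are assumptions, not
necessarily the interface added by this series):

    /* Hypothetical usage sketch: make only the xHCI DbC register window
     * read-only, leaving the rest of the containing MMIO page writable. */
    static int __init dbc_subpage_protect(paddr_t dbc_regs, size_t dbc_size)
    {
        return subpage_mmio_ro_add(dbc_regs, dbc_size);
    }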
> 
> Marek Marczykowski-Górecki (2):
>   x86/mm: add API for marking only part of a MMIO page read only
>   drivers/char: Use sub-page ro API to make just xhci dbc cap RO

Does any other x86 maintainer feel comfortable ack-ing this series? Jan
already reviewed 2/2 here (but not 1/2 in this version), but also said
he is not comfortable with letting this in without a second maintainer
approval: 
https://lore.kernel.org/xen-devel/7655e401-b927-4250-ae63-05361a5ee...@suse.com/

> 
>  xen/arch/x86/hvm/emulate.c  |   2 +-
>  xen/arch/x86/hvm/hvm.c  |   4 +-
>  xen/arch/x86/include/asm/mm.h   |  25 +++-
>  xen/arch/x86/mm.c   | 273 +-
>  xen/arch/x86/pv/ro-page-fault.c |   6 +-
>  xen/drivers/char/xhci-dbc.c |  36 ++--
>  6 files changed, 327 insertions(+), 19 deletions(-)
> 
> base-commit: b0082b908391b29b7c4dd5e6c389ebd6481926f8
> -- 
> git-series 0.9.1

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH v10 02/14] xen: introduce generic non-atomic test_*bit()

2024-05-23 Thread Oleksii K.
On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> Hi Oleksii,
Hi Julien,

> 
> On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > diff --git a/xen/arch/arm/arm64/livepatch.c
> > b/xen/arch/arm/arm64/livepatch.c
> > index df2cebedde..4bc8ed9be5 100644
> > --- a/xen/arch/arm/arm64/livepatch.c
> > +++ b/xen/arch/arm/arm64/livepatch.c
> > @@ -10,7 +10,6 @@
> >   #include 
> >   #include 
> >   
> > -#include 
> 
> It is a bit unclear how this change is related to the patch. Can you 
> explain in the commit message?
Probably it isn't needed anymore. I will double check, and if this
change is not needed, I will just drop it in the next patch version.

> 
> >   #include 
> >   #include 
> >   #include 
> > diff --git a/xen/arch/arm/include/asm/bitops.h
> > b/xen/arch/arm/include/asm/bitops.h
> > index 5104334e48..8e16335e76 100644
> > --- a/xen/arch/arm/include/asm/bitops.h
> > +++ b/xen/arch/arm/include/asm/bitops.h
> > @@ -22,9 +22,6 @@
> >   #define __set_bit(n,p)    set_bit(n,p)
> >   #define __clear_bit(n,p)  clear_bit(n,p)
> >   
> > -#define BITOP_BITS_PER_WORD 32
> > -#define BITOP_MASK(nr)  (1UL << ((nr) %
> > BITOP_BITS_PER_WORD))
> > -#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> >   #define BITS_PER_BYTE   8
> 
> OOI, any reason BITS_PER_BYTE has not been moved as well? I don't
> expect 
> the value to change across arch.
I can move it to the generic header too in the next patch version.

> 
> [...]
> 
> > diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
> > index f14ad0d33a..6eeeff0117 100644
> > --- a/xen/include/xen/bitops.h
> > +++ b/xen/include/xen/bitops.h
> > @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long
> > x)
> >    * scope
> >    */
> >   
> > +#define BITOP_BITS_PER_WORD 32
> > +typedef uint32_t bitop_uint_t;
> > +
> > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) %
> > BITOP_BITS_PER_WORD))
> > +
> > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > +
> > +extern void __bitop_bad_size(void);
> > +
> > +#define bitop_bad_size(addr) (sizeof(*(addr)) <
> > sizeof(bitop_uint_t))
> > +
> >   /* - Please tidy above here -
> >  */
> >   
> >   #include 
> >   
> > +/**
> > + * generic__test_and_set_bit - Set a bit and return its old value
> > + * @nr: Bit to set
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> 
> Sorry for only mentioning this on v10. I think this comment should be
> duplicated on (or moved to) top of test_bit() because this is what
> everyone will use. This will avoid the developer having to follow the
> function calls and only notice the x86 version which says "This function
> is atomic and may not be reordered.", which would be wrong for all the
> other arches.
It makes sense to add this comment on top of test_bit(), but I am
curious whether it is necessary to mention that for x86 arch_test_bit() "is
atomic and may not be reordered":

 * This operation is non-atomic and can be reordered. (Exception: for
 * x86, arch_test_bit() is atomic and may not be reordered.)
 * If two examples of this operation race, one can appear to succeed
 * but actually fail.  You must protect multiple accesses with a lock.
 */

> 
> > +static always_inline bool
> > +generic__test_and_set_bit(int nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old | mask;
> > +    return (old & mask);
> > +}
> > +
> > +/**
> > + * generic__test_and_clear_bit - Clear a bit and return its old
> > value
> > + * @nr: Bit to clear
> > + * @addr: Address to count from
> > + *
> > + * This operation is non-atomic and can be reordered.
> > + * If two examples of this operation race, one can appear to
> > succeed
> > + * but actually fail.  You must protect multiple accesses with a
> > lock.
> > + */
> 
> Same applies here and ...
> 
> > +static always_inline bool
> > +generic__test_and_clear_bit(int nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old & ~mask;
> > +    return (old & mask);
> > +}
> > +
> > +/* WARNING: non atomic and it can be reordered! */
> 
> ... here.
> 
> > +static always_inline bool
> > +generic__test_and_change_bit(int nr, volatile void *addr)
> > +{
> > +    bitop_uint_t mask = BITOP_MASK(nr);
> > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr +
> > BITOP_WORD(nr);
> > +    bitop_uint_t old = *p;
> > +
> > +    *p = old ^ mask;
> > +    return (old & mask);
> > +}
> > +/**
> > + * generic_test_bit - 

Re: [PATCH 4/5] x86/kernel: Move page table macros to new header

2024-05-23 Thread Borislav Petkov
On Thu, May 23, 2024 at 03:59:43PM +0200, Thomas Gleixner wrote:
> On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> > ---
> >  arch/x86/kernel/head_64.S| 22 ++
> >  arch/x86/kernel/pgtable_64_helpers.h | 28 
> 
> That's the wrong place as you want to include it from arch/x86/platform.
> 
> arch/x86/include/asm/

... and there already is a header waiting:

arch/x86/include/asm/pgtable_64.h

so no need for a new one.

Thx.

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette



Re: [PATCH 4/5] x86/kernel: Move page table macros to new header

2024-05-23 Thread Thomas Gleixner
On Wed, Apr 10 2024 at 15:48, Jason Andryuk wrote:
> ---
>  arch/x86/kernel/head_64.S| 22 ++
>  arch/x86/kernel/pgtable_64_helpers.h | 28 

That's the wrong place as you want to include it from arch/x86/platform.

arch/x86/include/asm/

Thanks,

tglx



[xen-unstable-smoke test] 186107: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186107 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186107/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186064

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  d6a7fd83039af36c28bd0ae2174f12c3888ce993
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186064  2024-05-21 15:04:02 Z1 days
Testing same since   186104  2024-05-23 09:00:22 Z0 days2 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Bobby Eshleman 
  Jan Beulich 
  Julien Grall 
  Oleksandr Andrushchenko 
  Oleksii Kurochko 
  Roger Pau Monné 
  Stewart Hildebrand 
  Volodymyr Babchuk 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 387 lines long.)



Re: [PATCH v10 03/14] xen/bitops: implement fls{l}() in common logic

2024-05-23 Thread Julien Grall

Hi,

On 22/05/2024 09:15, Jan Beulich wrote:

On 22.05.2024 09:37, Oleksii K. wrote:

On Tue, 2024-05-21 at 13:18 +0200, Jan Beulich wrote:

On 17.05.2024 15:54, Oleksii Kurochko wrote:

To avoid the compilation error below, it is necessary to update the
places in common/page_alloc.c where flsl() is used, as flsl() now
returns unsigned int:

./include/xen/kernel.h:18:21: error: comparison of distinct pointer types lacks a cast [-Werror]
   18 |     (void) (&_x == &_y);    \
      |                 ^~
common/page_alloc.c:1843:34: note: in expansion of macro 'min'
 1843 |     unsigned int inc_order = min(MAX_ORDER, flsl(e - s) - 1);

generic_fls{l} was used instead of __builtin_clz{l}(x) because if x is 0,
the result is undefined.

The prototype of the per-architecture fls{l}() functions was changed to
return 'unsigned int' to align with the generic implementation of these
functions and avoid introducing signed/unsigned mismatches.
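
As a rough, standalone illustration of why this trips -Werror (the min()
definition is paraphrased from xen/include/xen/kernel.h; MAX_ORDER's value
here is made up):

    #define min(x, y)                                       \
        ({                                                  \
            const typeof(x) _x = (x);                       \
            const typeof(y) _y = (y);                       \
            (void) (&_x == &_y); /* type check */           \
            _x < _y ? _x : _y;                              \
        })

    unsigned int flsl(unsigned long x);  /* new prototype: returns unsigned int */
    #define MAX_ORDER 18                 /* plain int constant, illustrative */

    unsigned int example(unsigned long s, unsigned long e)
    {
        /* &_x is int * while &_y is unsigned int *, hence
         * "comparison of distinct pointer types lacks a cast". */
        return min(MAX_ORDER, flsl(e - s) - 1);
    }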

Signed-off-by: Oleksii Kurochko 
---
  The patch is almost independent from Andrew's patch series
  (
https://lore.kernel.org/xen-devel/20240313172716.2325427-1-andrew.coop...@citrix.com/T/#t
)
  except test_fls() function which IMO can be merged as a separate
patch after Andrew's patch
  will be fully ready.


If there wasn't this dependency (I don't think it's "almost
independent"),
I'd be offering R-b with again one nit below.


Aren't all changes, except those in xen/common/bitops.c, independent? I
could move these changes in xen/common/bitops.c to a separate commit. I
think it is safe to commit them ( an introduction of common logic for
fls{l}() and tests ) separately since the CI tests have passed.


Technically they might be, but contextually there are further conflicts.
Just try "patch --dry-run" on top of a plain staging tree. You really
need to settle, perhaps consulting Andrew, whether you want to go on top
of his change, or ahead of it. I'm not willing to approve a patch that's
presented one way but then is (kind of) expected to go in the other way.


I agree with what Jan wrote. I don't have any strong opinion on which 
order they should be merged. But, if your series is intended to be 
merged before Andrew's one then please rebase to vanilla staging.


I looked at the rest of the patch and it LGTM.

Cheers,

--
Julien Grall



Re: [XEN PATCH v2 07/15] x86: guard cpu_has_{svm/vmx} macros with CONFIG_{SVM/VMX}

2024-05-23 Thread Sergiy Kibrik

16.05.24 14:12, Jan Beulich:

On 15.05.2024 11:12, Sergiy Kibrik wrote:

--- a/xen/arch/x86/include/asm/cpufeature.h
+++ b/xen/arch/x86/include/asm/cpufeature.h
@@ -81,7 +81,8 @@ static inline bool boot_cpu_has(unsigned int feat)
  #define cpu_has_sse3boot_cpu_has(X86_FEATURE_SSE3)
  #define cpu_has_pclmulqdq   boot_cpu_has(X86_FEATURE_PCLMULQDQ)
  #define cpu_has_monitor boot_cpu_has(X86_FEATURE_MONITOR)
-#define cpu_has_vmx boot_cpu_has(X86_FEATURE_VMX)
+#define cpu_has_vmx ( IS_ENABLED(CONFIG_VMX) && \
+  boot_cpu_has(X86_FEATURE_VMX))
  #define cpu_has_eistboot_cpu_has(X86_FEATURE_EIST)
  #define cpu_has_ssse3   boot_cpu_has(X86_FEATURE_SSSE3)
  #define cpu_has_fma boot_cpu_has(X86_FEATURE_FMA)
@@ -109,7 +110,8 @@ static inline bool boot_cpu_has(unsigned int feat)
  
  /* CPUID level 0x8001.ecx */

  #define cpu_has_cmp_legacy  boot_cpu_has(X86_FEATURE_CMP_LEGACY)
-#define cpu_has_svm boot_cpu_has(X86_FEATURE_SVM)
+#define cpu_has_svm ( IS_ENABLED(CONFIG_SVM) && \
+  boot_cpu_has(X86_FEATURE_SVM))
  #define cpu_has_sse4a   boot_cpu_has(X86_FEATURE_SSE4A)
  #define cpu_has_xop boot_cpu_has(X86_FEATURE_XOP)
  #define cpu_has_skinit  boot_cpu_has(X86_FEATURE_SKINIT)


Hmm, leaving aside the style issue (stray blanks after opening parentheses,
and as a result one-off indentation on the wrapped lines) I'm not really
certain we can do this. The description goes into detail why we would want
this, but it doesn't cover at all why it is safe for all present (and
ideally also future) uses. I wouldn't be surprised if we had VMX/SVM checks
just to derive further knowledge from that, without them being directly
related to the use of VMX/SVM. Take a look at calculate_hvm_max_policy(),
for example. While it looks to be okay there, it may give you an idea of
what I mean.

Things might become better separated if instead for such checks we used
host and raw CPU policies instead of cpuinfo_x86.x86_capability[]. But
that's still pretty far out, I'm afraid.
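
As a purely illustrative sketch of that direction (global and field names
assumed, not a proposal from this thread):

    /* "Does the hardware have SVM?" and "is Xen set up to use SVM?" become
     * two separate questions once derived from the CPU policies. */
    #define hw_has_svm   raw_cpu_policy.extd.svm
    #define cpu_has_svm  (IS_ENABLED(CONFIG_SVM) && host_cpu_policy.extd.svm)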



I've followed a suggestion you made for a patch in a previous series:

https://lore.kernel.org/xen-devel/8fbd604e-5e5d-410c-880f-2ad257bbe...@suse.com/

yet if this approach can potentially be unsafe (I'm not completely sure
it's safe), should we instead fall back to the way it was done in the v1
series? I.e. guard the vmx/svm-specific calls where needed, like in
these 3 patches:


1) 
https://lore.kernel.org/xen-devel/20240416063328.3469386-1-sergiy_kib...@epam.com/


2) 
https://lore.kernel.org/xen-devel/20240416063740.3469592-1-sergiy_kib...@epam.com/


3) 
https://lore.kernel.org/xen-devel/20240416063947.3469718-1-sergiy_kib...@epam.com/



  -Sergiy



Re: [PATCH v10 02/14] xen: introduce generic non-atomic test_*bit()

2024-05-23 Thread Julien Grall

Hi Oleksii,

On 17/05/2024 14:54, Oleksii Kurochko wrote:

diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
index df2cebedde..4bc8ed9be5 100644
--- a/xen/arch/arm/arm64/livepatch.c
+++ b/xen/arch/arm/arm64/livepatch.c
@@ -10,7 +10,6 @@
  #include 
  #include 
  
-#include 


It is a bit unclear how this change is related to the patch. Can you 
explain in the commit message?



  #include 
  #include 
  #include 
diff --git a/xen/arch/arm/include/asm/bitops.h 
b/xen/arch/arm/include/asm/bitops.h
index 5104334e48..8e16335e76 100644
--- a/xen/arch/arm/include/asm/bitops.h
+++ b/xen/arch/arm/include/asm/bitops.h
@@ -22,9 +22,6 @@
  #define __set_bit(n,p)set_bit(n,p)
  #define __clear_bit(n,p)  clear_bit(n,p)
  
-#define BITOP_BITS_PER_WORD 32

-#define BITOP_MASK(nr)  (1UL << ((nr) % BITOP_BITS_PER_WORD))
-#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
  #define BITS_PER_BYTE   8


OOI, any reason BITS_PER_BYTE has not been moved as well? I don't expect 
the value to change across arch.


[...]


diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
index f14ad0d33a..6eeeff0117 100644
--- a/xen/include/xen/bitops.h
+++ b/xen/include/xen/bitops.h
@@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long x)
   * scope
   */
  
+#define BITOP_BITS_PER_WORD 32

+typedef uint32_t bitop_uint_t;
+
+#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
+
+#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
+
+extern void __bitop_bad_size(void);
+
+#define bitop_bad_size(addr) (sizeof(*(addr)) < sizeof(bitop_uint_t))
+
  /* - Please tidy above here - */
  
  #include 
  
+/**

+ * generic__test_and_set_bit - Set a bit and return its old value
+ * @nr: Bit to set
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */


Sorry for only mentioning this on v10. I think this comment should be
duplicated on (or moved to) top of test_bit() because this is what
everyone will use. This will avoid the developer having to follow the function
calls and only notice the x86 version which says "This function is
atomic and may not be reordered.", which would be wrong for all the other arches.



+static always_inline bool
+generic__test_and_set_bit(int nr, volatile void *addr)
+{
+bitop_uint_t mask = BITOP_MASK(nr);
+volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+bitop_uint_t old = *p;
+
+*p = old | mask;
+return (old & mask);
+}
+
+/**
+ * generic__test_and_clear_bit - Clear a bit and return its old value
+ * @nr: Bit to clear
+ * @addr: Address to count from
+ *
+ * This operation is non-atomic and can be reordered.
+ * If two examples of this operation race, one can appear to succeed
+ * but actually fail.  You must protect multiple accesses with a lock.
+ */


Same applies here and ...


+static always_inline bool
+generic__test_and_clear_bit(int nr, volatile void *addr)
+{
+bitop_uint_t mask = BITOP_MASK(nr);
+volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+bitop_uint_t old = *p;
+
+*p = old & ~mask;
+return (old & mask);
+}
+
+/* WARNING: non atomic and it can be reordered! */


... here.


+static always_inline bool
+generic__test_and_change_bit(int nr, volatile void *addr)
+{
+bitop_uint_t mask = BITOP_MASK(nr);
+volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+bitop_uint_t old = *p;
+
+*p = old ^ mask;
+return (old & mask);
+}
+/**
+ * generic_test_bit - Determine whether a bit is set
+ * @nr: bit number to test
+ * @addr: Address to start counting from
+ */
+static always_inline bool generic_test_bit(int nr, const volatile void *addr)
+{
+bitop_uint_t mask = BITOP_MASK(nr);
+const volatile bitop_uint_t *p =
+(const volatile bitop_uint_t *)addr + BITOP_WORD(nr);
+
+return (*p & mask);
+}
+
+static always_inline bool
+__test_and_set_bit(int nr, volatile void *addr)
+{
+#ifndef arch__test_and_set_bit
+#define arch__test_and_set_bit generic__test_and_set_bit
+#endif
+
+return arch__test_and_set_bit(nr, addr);
+}


NIT: It is a bit too late to change this one. But I have to admit, I 
don't understand the purpose of the static inline when you could have 
simply called...



+#define __test_and_set_bit(nr, addr) ({ \
+if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
+__test_and_set_bit(nr, addr);   \


... __arch__test_and_set_bit here.
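
For clarity, a sketch of the alternative being described (illustrative only,
using the arch__test_and_set_bit hook name from the quoted patch):

    #ifndef arch__test_and_set_bit
    #define arch__test_and_set_bit generic__test_and_set_bit
    #endif

    #define __test_and_set_bit(nr, addr) ({                 \
        if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
        arch__test_and_set_bit(nr, addr);                   \
    })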


The only two reasons I am not providing an ack is the:
 * Explanation for the removal of asm/bitops.h in livepatch.c
 * The placement of the comments

There are not too important for me.

Cheers,

--
Julien Grall



Re: [for-4.19] Re: [XEN PATCH v3] arm/mem_access: add conditional build of mem_access.c

2024-05-23 Thread Julien Grall

Hi Oleksii,

On 23/05/2024 09:04, Oleksii K. wrote:

On Wed, 2024-05-22 at 21:50 +0100, Julien Grall wrote:

Hi,

Adding Oleksii as the release manager.

On 22/05/2024 19:27, Tamas K Lengyel wrote:

On Fri, May 10, 2024 at 8:32 AM Alessandro Zucchelli
 wrote:


In order to comply to MISRA C:2012 Rule 8.4 for ARM the following
changes are done:
revert preprocessor conditional changes to xen/mem_access.h which
had it build unconditionally, add conditional build for
xen/mem_access.c
as well and provide stubs in asm/mem_access.h for the users of
this
header.

Signed-off-by: Alessandro Zucchelli



Acked-by: Tamas K Lengyel 


Oleksii, would you be happy if this patch is committed for 4.19?

Sure:
  Release-acked-by: Oleksii Kurochko 


Thanks. It is now committed.





BTW, do you want to release-ack every bug fix until the hard code
freeze?
Or would you be fine with leaving the decision to the maintainers?

I would prefer to leave the decision to the maintainers.


Ok. I will keep it in mind for the bug fixes until the hard code freeze.

Cheers,

--
Julien Grall



Re: [PATCH 5/5] x86/pvh: Add 64bit relocation page tables

2024-05-23 Thread Juergen Gross

On 10.04.24 21:48, Jason Andryuk wrote:

The PVH entry point is 32bit.  For a 64bit kernel, the entry point must
switch to 64bit mode, which requires a set of page tables.  In the past,
PVH used init_top_pgt.

This works fine when the kernel is loaded at LOAD_PHYSICAL_ADDR, as the
page tables are prebuilt for this address.  If the kernel is loaded at a
different address, they need to be adjusted.

__startup_64() adjusts the prebuilt page tables for the physical load
address, but it is 64bit code.  The 32bit PVH entry code can't call it
to adjust the page tables, so it can't readily be re-used.

64bit PVH entry needs page tables set up for identity map, the kernel
high map and the direct map.  pvh_start_xen() enters identity mapped.
Inside xen_prepare_pvh(), it jumps through a pv_ops function pointer
into the highmap.  The direct map is used for __va() on the initramfs
and other guest physical addresses.

Add a dedicated set of prebuild page tables for PVH entry.  They are
adjusted in assembly before loading.

Add XEN_ELFNOTE_PHYS32_RELOC to indicate support for relocation
along with the kernel's loading constraints.  The maximum load address,
KERNEL_IMAGE_SIZE - 1, is determined by a single pvh_level2_ident_pgt
page.  It could be larger with more pages.

Signed-off-by: Jason Andryuk 
---
Instead of adding 5 pages of prebuilt page tables, they could be
constructed dynamically in the .bss area.  They are then only used for
PVH entry and until transitioning to init_top_pgt.  The .bss is later
cleared.  It's safer to add the dedicated pages, so that is done here.
---
  arch/x86/platform/pvh/head.S | 105 ++-
  1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index c08d08d8cc92..4af3cfbcf2f8 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -21,6 +21,8 @@
  #include 
  #include 
  
+#include "../kernel/pgtable_64_helpers.h"

+
__HEAD
  
  /*

@@ -102,8 +104,47 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
btsl $_EFER_LME, %eax
wrmsr
  
+	mov %ebp, %ebx

+   subl $LOAD_PHYSICAL_ADDR, %ebx /* offset */
+   jz .Lpagetable_done
+
+   /* Fixup page-tables for relocation. */
+   leal rva(pvh_init_top_pgt)(%ebp), %edi
+   movl $512, %ecx


Please use PTRS_PER_PGD instead of the literal 512. Similar issue below.


+2:
+   testl $_PAGE_PRESENT, 0x00(%edi)
+   jz 1f
+   addl %ebx, 0x00(%edi)
+1:
+   addl $8, %edi
+   decl %ecx
+   jnz 2b
+
+   /* L3 ident has a single entry. */
+   leal rva(pvh_level3_ident_pgt)(%ebp), %edi
+   addl %ebx, 0x00(%edi)
+
+   leal rva(pvh_level3_kernel_pgt)(%ebp), %edi
+   addl %ebx, (4096 - 16)(%edi)
+   addl %ebx, (4096 - 8)(%edi)


PAGE_SIZE instead of 4096, please.


+
+   /* pvh_level2_ident_pgt is fine - large pages */
+
+   /* pvh_level2_kernel_pgt needs adjustment - large pages */
+   leal rva(pvh_level2_kernel_pgt)(%ebp), %edi
+   movl $512, %ecx
+2:
+   testl $_PAGE_PRESENT, 0x00(%edi)
+   jz 1f
+   addl %ebx, 0x00(%edi)
+1:
+   addl $8, %edi
+   decl %ecx
+   jnz 2b
+
+.Lpagetable_done:
/* Enable pre-constructed page tables. */
-   leal rva(init_top_pgt)(%ebp), %eax
+   leal rva(pvh_init_top_pgt)(%ebp), %eax
mov %eax, %cr3
mov $(X86_CR0_PG | X86_CR0_PE), %eax
mov %eax, %cr0
@@ -197,5 +238,67 @@ SYM_DATA_START_LOCAL(early_stack)
.fill BOOT_STACK_SIZE, 1, 0
  SYM_DATA_END_LABEL(early_stack, SYM_L_LOCAL, early_stack_end)
  
+#ifdef CONFIG_X86_64

+/*
+ * Xen PVH needs a set of identity mapped and kernel high mapping
+ * page tables.  pvh_start_xen starts running on the identity mapped
+ * page tables, but xen_prepare_pvh calls into the high mapping.
+ * These page tables need to be relocatable and are only used until
+ * startup_64 transitions to init_top_pgt.
+ */
+SYM_DATA_START_PAGE_ALIGNED(pvh_init_top_pgt)
+   .quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+   .orgpvh_init_top_pgt + L4_PAGE_OFFSET*8, 0


Please add a space before and after the '*'.


+   .quad   pvh_level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+   .orgpvh_init_top_pgt + L4_START_KERNEL*8, 0
+   /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
+   .quad   pvh_level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC
+SYM_DATA_END(pvh_init_top_pgt)
+
+SYM_DATA_START_PAGE_ALIGNED(pvh_level3_ident_pgt)
+   .quad   pvh_level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC
+   .fill   511, 8, 0
+SYM_DATA_END(pvh_level3_ident_pgt)
+SYM_DATA_START_PAGE_ALIGNED(pvh_level2_ident_pgt)
+   /*
+* Since I easily can, map the first 1G.
+* Don't set NX because code runs from these pages.
+*
+* Note: This sets _PAGE_GLOBAL despite whether
+* the CPU supports it or it is enabled.  But,
+* the 

Re: [XEN PATCH] x86/iommu: Conditionally compile platform-specific union entries

2024-05-23 Thread Teddy Astie
On 23/05/2024 11:52, Roger Pau Monné wrote:
> The #ifdef and #endif processor directives shouldn't be indented.
>
> Would you mind adding /* CONFIG_{AMD,INTEL}_IOMMU */ comments in the
> #endif directives?
>

Sure, will change it for v2.

> I wonder if we could move the definitions of those structures to the
> vendor specific headers, but that's more convoluted, and would require
> including the iommu headers in pci.h

Do you mean moving the vtd/amd union entries to separate structures (e.g.
vtd_arch_iommu) and putting them into another file? (I don't see any
vendor-specific headers for this; perhaps we should create them?)
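
For illustration, the style being asked for in the first quoted remark would
look roughly like the sketch below (the struct and member names are
placeholders, not the actual layout from the patch):

    struct arch_pci_dev_sketch {
        union {
    #ifdef CONFIG_INTEL_IOMMU
            struct {
                /* VT-d specific per-device state */
            } vtd;
    #endif /* CONFIG_INTEL_IOMMU */
    #ifdef CONFIG_AMD_IOMMU
            struct {
                /* AMD-Vi specific per-device state */
            } amd;
    #endif /* CONFIG_AMD_IOMMU */
        };
    };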

>
> Thanks, Roger.

Teddy


Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech




Re: [PATCH 4/5] x86/kernel: Move page table macros to new header

2024-05-23 Thread Juergen Gross

On 10.04.24 21:48, Jason Andryuk wrote:

The PVH entry point will need an additional set of prebuild page tables.
Move the macros and defines to a new header so they can be re-used.

Signed-off-by: Jason Andryuk 


With the one nit below addressed:

Reviewed-by: Juergen Gross 

...


diff --git a/arch/x86/kernel/pgtable_64_helpers.h 
b/arch/x86/kernel/pgtable_64_helpers.h
new file mode 100644
index ..0ae87d768ce2
--- /dev/null
+++ b/arch/x86/kernel/pgtable_64_helpers.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __PGTABLES_64_H__
+#define __PGTABLES_64_H__
+
+#ifdef __ASSEMBLY__
+
+#define l4_index(x)(((x) >> 39) & 511)
+#define pud_index(x)   (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1))


Please fix the minor style issue in this line by s/-/ - /


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH for-4.19 v3 2/3] xen: enable altp2m at create domain domctl

2024-05-23 Thread Roger Pau Monné
On Fri, May 17, 2024 at 03:33:51PM +0200, Roger Pau Monne wrote:
> Enabling it using an HVM param is fragile, and complicates the logic when
> deciding whether options that interact with altp2m can also be enabled.
> 
> Leave the HVM param value for consumption by the guest, but prevent it from
> being set.  Enabling is now done using an additional altp2m specific field in
> xen_domctl_createdomain.
> 
> Note that albeit only currently implemented in x86, altp2m could be 
> implemented
> in other architectures, hence why the field is added to 
> xen_domctl_createdomain
> instead of xen_arch_domainconfig.
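
For readers who have not seen the patch itself, the shape of the change is
roughly as sketched below; the field name is an assumption for illustration,
see the actual patch for the real layout:

    struct xen_domctl_createdomain {
        /* ... existing fields ... */
        /* Requested altp2m mode, replacing the later HVM_PARAM_ALTP2M write. */
        uint32_t altp2m_opts;
    };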
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v2:
>  - Introduce a new altp2m field in xen_domctl_createdomain.
> 
> Changes since v1:
>  - New in this version.
> ---
>  tools/libs/light/libxl_create.c | 23 ++-
>  tools/libs/light/libxl_x86.c| 26 --
>  tools/ocaml/libs/xc/xenctrl_stubs.c |  2 +-
>  xen/arch/arm/domain.c   |  6 ++

Could I get an Ack from one of the Arm maintainers for the trivial Arm
change?

Thanks, Roger.



[PATCH 7/7] x86/defns: Clean up X86_{XCR0,XSS}_* constants

2024-05-23 Thread Andrew Cooper
With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused.
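
The read_bndcfgu() adjustment referred to is of this rough shape
(illustrative, not the exact line from the file):

    /* Index the component by ilog2() of its flag rather than a *_POS define: */
    bndcsr = (void *)xstate + xstate_offsets[ilog2(X86_XCR0_BNDCSR)];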

X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v3:
 * New
---
 xen/arch/x86/i387.c  |  2 +-
 xen/arch/x86/include/asm/x86-defns.h | 32 ++--
 xen/arch/x86/include/asm/xstate.h|  4 ++--
 xen/arch/x86/xstate.c| 18 
 4 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/xen/arch/x86/i387.c b/xen/arch/x86/i387.c
index 7a4297cc921e..fcdee10a6e69 100644
--- a/xen/arch/x86/i387.c
+++ b/xen/arch/x86/i387.c
@@ -369,7 +369,7 @@ void vcpu_setup_fpu(struct vcpu *v, struct xsave_struct 
*xsave_area,
 {
 v->arch.xsave_area->xsave_hdr.xstate_bv &= ~XSTATE_FP_SSE;
 if ( fcw_default != FCW_DEFAULT )
-v->arch.xsave_area->xsave_hdr.xstate_bv |= X86_XCR0_FP;
+v->arch.xsave_area->xsave_hdr.xstate_bv |= X86_XCR0_X87;
 }
 }
 
diff --git a/xen/arch/x86/include/asm/x86-defns.h 
b/xen/arch/x86/include/asm/x86-defns.h
index d7602ab225c4..3bcdbaccd3aa 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -79,25 +79,16 @@
 /*
  * XSTATE component flags in XCR0 | MSR_XSS
  */
-#define X86_XCR0_FP_POS   0
-#define X86_XCR0_FP   (1ULL << X86_XCR0_FP_POS)
-#define X86_XCR0_SSE_POS  1
-#define X86_XCR0_SSE  (1ULL << X86_XCR0_SSE_POS)
-#define X86_XCR0_YMM_POS  2
-#define X86_XCR0_YMM  (1ULL << X86_XCR0_YMM_POS)
-#define X86_XCR0_BNDREGS_POS  3
-#define X86_XCR0_BNDREGS  (1ULL << X86_XCR0_BNDREGS_POS)
-#define X86_XCR0_BNDCSR_POS   4
-#define X86_XCR0_BNDCSR   (1ULL << X86_XCR0_BNDCSR_POS)
-#define X86_XCR0_OPMASK_POS   5
-#define X86_XCR0_OPMASK   (1ULL << X86_XCR0_OPMASK_POS)
-#define X86_XCR0_ZMM_POS  6
-#define X86_XCR0_ZMM  (1ULL << X86_XCR0_ZMM_POS)
-#define X86_XCR0_HI_ZMM_POS   7
-#define X86_XCR0_HI_ZMM   (1ULL << X86_XCR0_HI_ZMM_POS)
+#define X86_XCR0_X87  (_AC(1, ULL) <<  0)
+#define X86_XCR0_SSE  (_AC(1, ULL) <<  1)
+#define X86_XCR0_YMM  (_AC(1, ULL) <<  2)
+#define X86_XCR0_BNDREGS  (_AC(1, ULL) <<  3)
+#define X86_XCR0_BNDCSR   (_AC(1, ULL) <<  4)
+#define X86_XCR0_OPMASK   (_AC(1, ULL) <<  5)
+#define X86_XCR0_ZMM  (_AC(1, ULL) <<  6)
+#define X86_XCR0_HI_ZMM   (_AC(1, ULL) <<  7)
 #define X86_XSS_PROC_TRACE(_AC(1, ULL) <<  8)
-#define X86_XCR0_PKRU_POS 9
-#define X86_XCR0_PKRU (1ULL << X86_XCR0_PKRU_POS)
+#define X86_XCR0_PKRU (_AC(1, ULL) <<  9)
 #define X86_XSS_PASID (_AC(1, ULL) << 10)
 #define X86_XSS_CET_U (_AC(1, ULL) << 11)
 #define X86_XSS_CET_S (_AC(1, ULL) << 12)
@@ -107,11 +98,10 @@
 #define X86_XSS_HWP   (_AC(1, ULL) << 16)
 #define X86_XCR0_TILE_CFG (_AC(1, ULL) << 17)
 #define X86_XCR0_TILE_DATA(_AC(1, ULL) << 18)
-#define X86_XCR0_LWP_POS  62
-#define X86_XCR0_LWP  (1ULL << X86_XCR0_LWP_POS)
+#define X86_XCR0_LWP  (_AC(1, ULL) << 62)
 
 #define X86_XCR0_STATES \
-(X86_XCR0_FP | X86_XCR0_SSE | X86_XCR0_YMM | X86_XCR0_BNDREGS | \
+(X86_XCR0_X87 | X86_XCR0_SSE | X86_XCR0_YMM | X86_XCR0_BNDREGS |\
  X86_XCR0_BNDCSR | X86_XCR0_OPMASK | X86_XCR0_ZMM | \
  X86_XCR0_HI_ZMM | X86_XCR0_PKRU | X86_XCR0_TILE_CFG |  \
  X86_XCR0_TILE_DATA |   \
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index da1d89d2f416..f4a8e5f814a0 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -29,8 +29,8 @@ extern uint32_t mxcsr_mask;
 #define XSAVE_HDR_OFFSET  FXSAVE_SIZE
 #define XSTATE_AREA_MIN_SIZE  (FXSAVE_SIZE + XSAVE_HDR_SIZE)
 
-#define XSTATE_FP_SSE  (X86_XCR0_FP | X86_XCR0_SSE)
-#define XCNTXT_MASK(X86_XCR0_FP | X86_XCR0_SSE | X86_XCR0_YMM | \
+#define XSTATE_FP_SSE  (X86_XCR0_X87 | X86_XCR0_SSE)
+#define XCNTXT_MASK(X86_XCR0_X87 | X86_XCR0_SSE | X86_XCR0_YMM | \
 X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM | \
 XSTATE_NONLAZY)
 
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 7b7f2dcaf651..0ed2541665b3 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -313,7 +313,7 @@ void xsave(struct vcpu *v, uint64_t mask)
"=m" (*ptr), \
"a" (lmask), "d" (hmask), "D" (ptr))
 
-if ( fip_width == 8 || !(mask & X86_XCR0_FP) )
+if ( fip_width == 8 || !(mask & X86_XCR0_X87) )
 {
 

[PATCH 3/7] x86/boot: Collect the Raw CPU Policy earlier on boot

2024-05-23 Thread Andrew Cooper
This is a tangle, but it's a small step in the right direction.

xstate_init() is shortly going to want data from the Raw policy.
calculate_raw_cpu_policy() is sufficiently separate from the other policies to
be safe to do.

No functional change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

This is necessary for the forthcoming xstate_{un,}compressed_size() to perform
boot-time sanity checks on state components which aren't fully enabled yet.  I
decided that doing this was better than extending the xstate_{offsets,sizes}[]
logic that we're intending to retire in due course.

v3:
 * New.
---
 xen/arch/x86/cpu-policy.c | 1 -
 xen/arch/x86/setup.c  | 4 +++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index b96f4ee55cc4..5b66f002df05 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -845,7 +845,6 @@ static void __init calculate_hvm_def_policy(void)
 
 void __init init_guest_cpu_policies(void)
 {
-calculate_raw_cpu_policy();
 calculate_host_policy();
 
 if ( IS_ENABLED(CONFIG_PV) )
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index b50c9c84af6d..8850e5637a98 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1888,7 +1888,9 @@ void asmlinkage __init noreturn __start_xen(unsigned long 
mbi_p)
 
 tsx_init(); /* Needs microcode.  May change HLE/RTM feature bits. */
 
-identify_cpu(_cpu_data);
+calculate_raw_cpu_policy(); /* Needs microcode.  No other dependenices. */
+
+identify_cpu(_cpu_data); /* Needs microcode and raw policy. */
 
 set_in_cr4(X86_CR4_OSFXSR | X86_CR4_OSXMMEXCPT);
 
-- 
2.30.2




[PATCH 5/7] x86/cpu-policy: Simplify recalculate_xstate()

2024-05-23 Thread Andrew Cooper
Make use of xstate_uncompressed_size() helper rather than maintaining the
running calculation while accumulating feature components.

The rest of the CPUID data can come direct from the raw cpu policy.  All
per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
instructions.

Use for_each_set_bit() rather than opencoding a slightly awkward version of
it.  Mask the attributes in ecx down based on the visible features.  This
isn't actually necessary for any components or attributes defined at the time
of writing (up to AMX), but is added out of an abundance of caution.

Signed-off-by: Andrew Cooper 
Reviewed-by: Jan Beulich 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v2:
 * Tie ALIGN64 to xsavec rather than xsaves.
v3:
 * Tweak commit message.
---
 xen/arch/x86/cpu-policy.c | 55 +++
 xen/arch/x86/include/asm/xstate.h |  1 +
 2 files changed, 21 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/cpu-policy.c b/xen/arch/x86/cpu-policy.c
index 5b66f002df05..304dc20cfab8 100644
--- a/xen/arch/x86/cpu-policy.c
+++ b/xen/arch/x86/cpu-policy.c
@@ -193,8 +193,7 @@ static void sanitise_featureset(uint32_t *fs)
 static void recalculate_xstate(struct cpu_policy *p)
 {
 uint64_t xstates = XSTATE_FP_SSE;
-uint32_t xstate_size = XSTATE_AREA_MIN_SIZE;
-unsigned int i, Da1 = p->xstate.Da1;
+unsigned int i, ecx_mask = 0, Da1 = p->xstate.Da1;
 
 /*
  * The Da1 leaf is the only piece of information preserved in the common
@@ -206,61 +205,47 @@ static void recalculate_xstate(struct cpu_policy *p)
 return;
 
 if ( p->basic.avx )
-{
 xstates |= X86_XCR0_YMM;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_YMM_POS] +
-  xstate_sizes[X86_XCR0_YMM_POS]);
-}
 
 if ( p->feat.mpx )
-{
 xstates |= X86_XCR0_BNDREGS | X86_XCR0_BNDCSR;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_BNDCSR_POS] +
-  xstate_sizes[X86_XCR0_BNDCSR_POS]);
-}
 
 if ( p->feat.avx512f )
-{
 xstates |= X86_XCR0_OPMASK | X86_XCR0_ZMM | X86_XCR0_HI_ZMM;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_HI_ZMM_POS] +
-  xstate_sizes[X86_XCR0_HI_ZMM_POS]);
-}
 
 if ( p->feat.pku )
-{
 xstates |= X86_XCR0_PKRU;
-xstate_size = max(xstate_size,
-  xstate_offsets[X86_XCR0_PKRU_POS] +
-  xstate_sizes[X86_XCR0_PKRU_POS]);
-}
 
-p->xstate.max_size  =  xstate_size;
+/* Subleaf 0 */
+p->xstate.max_size =
+xstate_uncompressed_size(xstates & ~XSTATE_XSAVES_ONLY);
 p->xstate.xcr0_low  =  xstates & ~XSTATE_XSAVES_ONLY;
 p->xstate.xcr0_high = (xstates & ~XSTATE_XSAVES_ONLY) >> 32;
 
+/* Subleaf 1 */
 p->xstate.Da1 = Da1;
+if ( p->xstate.xsavec )
+ecx_mask |= XSTATE_ALIGN64;
+
 if ( p->xstate.xsaves )
 {
+ecx_mask |= XSTATE_XSS;
 p->xstate.xss_low   =  xstates & XSTATE_XSAVES_ONLY;
 p->xstate.xss_high  = (xstates & XSTATE_XSAVES_ONLY) >> 32;
 }
-else
-xstates &= ~XSTATE_XSAVES_ONLY;
 
-for ( i = 2; i < min(63UL, ARRAY_SIZE(p->xstate.comp)); ++i )
+/* Subleafs 2+ */
+xstates &= ~XSTATE_FP_SSE;
+BUILD_BUG_ON(ARRAY_SIZE(p->xstate.comp) < 63);
+for_each_set_bit ( i, , 63 )
 {
-uint64_t curr_xstate = 1UL << i;
-
-if ( !(xstates & curr_xstate) )
-continue;
-
-p->xstate.comp[i].size   = xstate_sizes[i];
-p->xstate.comp[i].offset = xstate_offsets[i];
-p->xstate.comp[i].xss= curr_xstate & XSTATE_XSAVES_ONLY;
-p->xstate.comp[i].align  = curr_xstate & xstate_align;
+/*
+ * Pass through size (eax) and offset (ebx) directly.  Visbility of
+ * attributes in ecx limited by visible features in Da1.
+ */
+p->xstate.raw[i].a = raw_cpu_policy.xstate.raw[i].a;
+p->xstate.raw[i].b = raw_cpu_policy.xstate.raw[i].b;
+p->xstate.raw[i].c = raw_cpu_policy.xstate.raw[i].c & ecx_mask;
 }
 }
 
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index f5115199d4f9..bfb66dd766b6 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -40,6 +40,7 @@ extern uint32_t mxcsr_mask;
 #define XSTATE_XSAVES_ONLY 0
 #define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 
+#define XSTATE_XSS (1U << 0)
 #define XSTATE_ALIGN64 (1U << 1)
 
 extern u64 xfeature_mask;
-- 
2.30.2




[PATCH 1/7] x86/xstate: Fix initialisation of XSS cache

2024-05-23 Thread Andrew Cooper
The clobbering of this_cpu(xcr0) and this_cpu(xss) to architecturally invalid
values is to force the subsequent set_xcr0() and set_msr_xss() to reload the
hardware register.

While XCR0 is reloaded in xstate_init(), MSR_XSS isn't.  This causes
get_msr_xss() to return the invalid value, and logic of the form:

  old = get_msr_xss();
  set_msr_xss(new);
  ...
  set_msr_xss(old);

to try and restore the architecturally invalid value.

The architecturally invalid value must be purged from the cache, meaning the
hardware register must be written at least once.  This in turn highlights that
the invalid value must only be used in the case that the hardware register is
available.

Fixes: f7f4a523927f ("x86/xstate: reset cached register values on resume")
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v3:
 * Split out of later patch
---
 xen/arch/x86/xstate.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 99cedb4f5e24..75788147966a 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -641,13 +641,6 @@ void xstate_init(struct cpuinfo_x86 *c)
 return;
 }
 
-/*
- * Zap the cached values to make set_xcr0() and set_msr_xss() really
- * write it.
- */
-this_cpu(xcr0) = 0;
-this_cpu(xss) = ~0;
-
 cpuid_count(XSTATE_CPUID, 0, , , , );
 feature_mask = (((u64)edx << 32) | eax) & XCNTXT_MASK;
 BUG_ON(!valid_xcr0(feature_mask));
@@ -657,8 +650,19 @@ void xstate_init(struct cpuinfo_x86 *c)
  * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size.
  */
 set_in_cr4(X86_CR4_OSXSAVE);
+
+/*
+ * Zap the cached values to make set_xcr0() and set_msr_xss() really write
+ * the hardware register.
+ */
+this_cpu(xcr0) = 0;
 if ( !set_xcr0(feature_mask) )
 BUG();
+if ( cpu_has_xsaves )
+{
+this_cpu(xss) = ~0;
+set_msr_xss(0);
+}
 
 if ( bsp )
 {
-- 
2.30.2




[PATCH 6/7] x86/cpuid: Fix handling of XSAVE dynamic leaves

2024-05-23 Thread Andrew Cooper
First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.

Second, the comment concerning XSS state is wrong.  VT-x doesn't manage
host/guest state automatically, but there is provision for "host only" bits to
be set, so the implications are still accurate.

Introduce xstate_compressed_size() to mirror the uncompressed one.  Cross
check it at boot.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 

v3:
 * Adjust commit message about !XSAVE guests
 * Rebase over boot time cross check
 * Use raw policy
---
 xen/arch/x86/cpuid.c  | 24 --
 xen/arch/x86/include/asm/xstate.h |  1 +
 xen/arch/x86/xstate.c | 34 +++
 3 files changed, 43 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/cpuid.c b/xen/arch/x86/cpuid.c
index 7a38e032146a..a822e80c7ea7 100644
--- a/xen/arch/x86/cpuid.c
+++ b/xen/arch/x86/cpuid.c
@@ -330,23 +330,15 @@ void guest_cpuid(const struct vcpu *v, uint32_t leaf,
 case XSTATE_CPUID:
 switch ( subleaf )
 {
-case 1:
-if ( !p->xstate.xsavec && !p->xstate.xsaves )
-break;
-
-/*
- * TODO: Figure out what to do for XSS state.  VT-x manages host
- * vs guest MSR_XSS automatically, so as soon as we start
- * supporting any XSS states, the wrong XSS will be in context.
- */
-BUILD_BUG_ON(XSTATE_XSAVES_ONLY != 0);
-fallthrough;
 case 0:
-/*
- * Read CPUID[0xD,0/1].EBX from hardware.  They vary with enabled
- * XSTATE, and appropriate XCR0|XSS are in context.
- */
-res->b = cpuid_count_ebx(leaf, subleaf);
+if ( p->basic.xsave )
+res->b = xstate_uncompressed_size(v->arch.xcr0);
+break;
+
+case 1:
+if ( p->xstate.xsavec )
+res->b = xstate_compressed_size(v->arch.xcr0 |
+v->arch.msrs->xss.raw);
 break;
 }
 break;
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index bfb66dd766b6..da1d89d2f416 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -109,6 +109,7 @@ void xstate_free_save_area(struct vcpu *v);
 int xstate_alloc_save_area(struct vcpu *v);
 void xstate_init(struct cpuinfo_x86 *c);
 unsigned int xstate_uncompressed_size(uint64_t xcr0);
+unsigned int xstate_compressed_size(uint64_t xstates);
 
 static inline uint64_t xgetbv(unsigned int index)
 {
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 1b3153600d9c..7b7f2dcaf651 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -621,6 +621,34 @@ unsigned int xstate_uncompressed_size(uint64_t xcr0)
 return size;
 }
 
+unsigned int xstate_compressed_size(uint64_t xstates)
+{
+unsigned int i, size = XSTATE_AREA_MIN_SIZE;
+
+if ( xstates == 0 ) /* TODO: clean up paths passing 0 in here. */
+return 0;
+
+if ( xstates <= (X86_XCR0_SSE | X86_XCR0_FP) )
+return size;
+
+/*
+ * For the compressed size, every component matters.  Some componenets are
+ * rounded up to 64 first.
+ */
+xstates &= ~(X86_XCR0_SSE | X86_XCR0_FP);
+for_each_set_bit ( i, , 63 )
+{
+const struct xstate_component *c = _cpu_policy.xstate.comp[i];
+
+if ( c->align )
+size = ROUNDUP(size, 64);
+
+size += c->size;
+}
+
+return size;
+}
+
 struct xcheck_state {
 uint64_t states;
 uint32_t uncomp_size;
@@ -683,6 +711,12 @@ static void __init check_new_xstate(struct xcheck_state 
*s, uint64_t new)
   s->states, , hw_size, s->comp_size);
 
 s->comp_size = hw_size;
+
+xen_size = xstate_compressed_size(s->states);
+
+if ( xen_size != hw_size )
+panic("XSTATE 0x%016"PRIx64", compressed hw size %#x != xen size 
%#x\n",
+  s->states, hw_size, xen_size);
 }
 else
 BUG_ON(hw_size); /* Compressed size reported, but no XSAVEC ? */
-- 
2.30.2




[PATCH 4/7] x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()

2024-05-23 Thread Andrew Cooper
We're soon going to need a compressed helper of the same form.

The size of the uncompressed image depends on the single element with the
largest offset + size.  Sadly this isn't always the element with the largest
index.
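
A minimal sketch of the calculation being described, using the per-component
policy data this patch names (not the literal function body):

    unsigned int sketch_uncompressed_size(uint64_t xcr0)
    {
        unsigned int i, size = XSTATE_AREA_MIN_SIZE;

        xcr0 &= ~XSTATE_FP_SSE;
        for_each_set_bit ( i, &xcr0, 63 )
        {
            const struct xstate_component *c = &raw_cpu_policy.xstate.comp[i];

            /* The uncompressed layout is fixed, so the total size is set by
             * whichever enabled component reaches furthest -- not necessarily
             * the highest-numbered one (e.g. LWP at bit 62). */
            size = max(size, c->offset + c->size);
        }

        return size;
    }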

Name the per-xstate-component cpu_policy structure, for legibility of the logic
in xstate_uncompressed_size().  Cross-check with hardware during boot, and
remove hw_uncompressed_size().  This means that the migration paths don't need
to mess with XCR0 just to sanity check the buffer size.

The users of hw_uncompressed_size() in xstate_init() can (and indeed need to)
be replaced with CPUID instructions.  They run with feature_mask in XCR0, and
prior to setup_xstate_features() on the BSP.

No practical change.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v2:
 * Scan all features.  LWP/APX_F are out-of-order.
v3:
 * Rebase over boot time check.
 * Use the raw CPU policy.
---
 xen/arch/x86/domctl.c|  2 +-
 xen/arch/x86/hvm/hvm.c   |  2 +-
 xen/arch/x86/include/asm/xstate.h|  2 +-
 xen/arch/x86/xstate.c| 78 +---
 xen/include/xen/lib/x86/cpu-policy.h |  2 +-
 5 files changed, 51 insertions(+), 35 deletions(-)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index 9a72d57333e9..c2f2016ed45a 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -833,7 +833,7 @@ long arch_do_domctl(
 uint32_t offset = 0;
 
 #define PV_XSAVE_HDR_SIZE (2 * sizeof(uint64_t))
-#define PV_XSAVE_SIZE(xcr0) (PV_XSAVE_HDR_SIZE + xstate_ctxt_size(xcr0))
+#define PV_XSAVE_SIZE(xcr0) (PV_XSAVE_HDR_SIZE + 
xstate_uncompressed_size(xcr0))
 
 ret = -ESRCH;
 if ( (evc->vcpu >= d->max_vcpus) ||
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2c66fe0f7a16..b84f4d2387d1 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1190,7 +1190,7 @@ HVM_REGISTER_SAVE_RESTORE(CPU, hvm_save_cpu_ctxt, NULL, 
hvm_load_cpu_ctxt, 1,
 
 #define HVM_CPU_XSAVE_SIZE(xcr0) (offsetof(struct hvm_hw_cpu_xsave, \
save_area) + \
-  xstate_ctxt_size(xcr0))
+  xstate_uncompressed_size(xcr0))
 
 static int cf_check hvm_save_cpu_xsave_states(
 struct vcpu *v, hvm_domain_context_t *h)
diff --git a/xen/arch/x86/include/asm/xstate.h 
b/xen/arch/x86/include/asm/xstate.h
index c08c267884f0..f5115199d4f9 100644
--- a/xen/arch/x86/include/asm/xstate.h
+++ b/xen/arch/x86/include/asm/xstate.h
@@ -107,7 +107,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, 
unsigned int size);
 void xstate_free_save_area(struct vcpu *v);
 int xstate_alloc_save_area(struct vcpu *v);
 void xstate_init(struct cpuinfo_x86 *c);
-unsigned int xstate_ctxt_size(u64 xcr0);
+unsigned int xstate_uncompressed_size(uint64_t xcr0);
 
 static inline uint64_t xgetbv(unsigned int index)
 {
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 33a5a89719ef..1b3153600d9c 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -8,6 +8,8 @@
 #include 
 #include 
 #include 
+
+#include 
 #include 
 #include 
 #include 
@@ -183,7 +185,7 @@ void expand_xsave_states(const struct vcpu *v, void *dest, 
unsigned int size)
 /* Check there is state to serialise (i.e. at least an XSAVE_HDR) */
 BUG_ON(!v->arch.xcr0_accum);
 /* Check there is the correct room to decompress into. */
-BUG_ON(size != xstate_ctxt_size(v->arch.xcr0_accum));
+BUG_ON(size != xstate_uncompressed_size(v->arch.xcr0_accum));
 
 if ( !(xstate->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED) )
 {
@@ -245,7 +247,7 @@ void compress_xsave_states(struct vcpu *v, const void *src, 
unsigned int size)
 u64 xstate_bv, valid;
 
 BUG_ON(!v->arch.xcr0_accum);
-BUG_ON(size != xstate_ctxt_size(v->arch.xcr0_accum));
+BUG_ON(size != xstate_uncompressed_size(v->arch.xcr0_accum));
 ASSERT(!xsave_area_compressed(src));
 
 xstate_bv = ((const struct xsave_struct *)src)->xsave_hdr.xstate_bv;
@@ -553,32 +555,6 @@ void xstate_free_save_area(struct vcpu *v)
 v->arch.xsave_area = NULL;
 }
 
-static unsigned int hw_uncompressed_size(uint64_t xcr0)
-{
-u64 act_xcr0 = get_xcr0();
-unsigned int size;
-bool ok = set_xcr0(xcr0);
-
-ASSERT(ok);
-size = cpuid_count_ebx(XSTATE_CPUID, 0);
-ok = set_xcr0(act_xcr0);
-ASSERT(ok);
-
-return size;
-}
-
-/* Fastpath for common xstate size requests, avoiding reloads of xcr0. */
-unsigned int xstate_ctxt_size(u64 xcr0)
-{
-if ( xcr0 == xfeature_mask )
-return xsave_cntxt_size;
-
-if ( xcr0 == 0 ) /* TODO: clean up paths passing 0 in here. */
-return 0;
-
-return hw_uncompressed_size(xcr0);
-}
-
 static bool valid_xcr0(uint64_t xcr0)
 {
 /* FP must be unconditionally set. */
@@ -611,6 +587,40 @@ static bool valid_xcr0(uint64_t xcr0)
 return true;
 }
 
+unsigned int 

[PATCH 2/7] x86/xstate: Cross-check dynamic XSTATE sizes at boot

2024-05-23 Thread Andrew Cooper
Right now, xstate_ctxt_size() performs a cross-check of size with CPUID for
every call.  This is expensive, being used for domain create/migrate, as well
as to service certain guest CPUID instructions.

Instead, arrange to check the sizes once at boot.  See the code comments for
details.  Right now, it just checks hardware against the algorithm
expectations.  Later patches will add further cross-checking.

Introduce the missing X86_XCR0_* and X86_XSS_* constants, and a couple of
missing CPUID bits.  This is to maximise coverage in the sanity check, even if
we don't expect to use/virtualise some of these features any time soon.  Leave
HDC and HWP alone for now.  We don't have CPUID bits from them stored nicely.

Only perform the cross-checks in debug builds.  It's only developers or new
hardware liable to trip these checks, and Xen at least tracks "maximum value
ever seen in xcr0" for the lifetime of the VM, which we don't want to be
tickling in the general case.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 

v3:
 * New

On Sapphire Rapids with the whole series inc diagnostics, we get this pattern:

  (XEN) *** check_new_xstate(, 0x0003)
  (XEN) *** check_new_xstate(, 0x0004)
  (XEN) *** check_new_xstate(, 0x00e0)
  (XEN) *** check_new_xstate(, 0x0200)
  (XEN) *** check_new_xstate(, 0x0006)
  (XEN) *** check_new_xstate(, 0x0100)
  (XEN) *** check_new_xstate(, 0x0400)
  (XEN) *** check_new_xstate(, 0x0800)
  (XEN) *** check_new_xstate(, 0x1000)
  (XEN) *** check_new_xstate(, 0x4000)
  (XEN) *** check_new_xstate(, 0x8000)

and on Genoa, this pattern:

  (XEN) *** check_new_xstate(, 0x0003)
  (XEN) *** check_new_xstate(, 0x0004)
  (XEN) *** check_new_xstate(, 0x00e0)
  (XEN) *** check_new_xstate(, 0x0200)
  (XEN) *** check_new_xstate(, 0x0800)
  (XEN) *** check_new_xstate(, 0x1000)
---
 xen/arch/x86/include/asm/x86-defns.h|  25 +++-
 xen/arch/x86/xstate.c   | 150 
 xen/include/public/arch-x86/cpufeatureset.h |   3 +
 3 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/include/asm/x86-defns.h 
b/xen/arch/x86/include/asm/x86-defns.h
index 48d7a3b7af45..d7602ab225c4 100644
--- a/xen/arch/x86/include/asm/x86-defns.h
+++ b/xen/arch/x86/include/asm/x86-defns.h
@@ -77,7 +77,7 @@
 #define X86_CR4_PKS0x0100 /* Protection Key Supervisor */
 
 /*
- * XSTATE component flags in XCR0
+ * XSTATE component flags in XCR0 | MSR_XSS
  */
 #define X86_XCR0_FP_POS   0
 #define X86_XCR0_FP   (1ULL << X86_XCR0_FP_POS)
@@ -95,11 +95,34 @@
 #define X86_XCR0_ZMM  (1ULL << X86_XCR0_ZMM_POS)
 #define X86_XCR0_HI_ZMM_POS   7
 #define X86_XCR0_HI_ZMM   (1ULL << X86_XCR0_HI_ZMM_POS)
+#define X86_XSS_PROC_TRACE(_AC(1, ULL) <<  8)
 #define X86_XCR0_PKRU_POS 9
 #define X86_XCR0_PKRU (1ULL << X86_XCR0_PKRU_POS)
+#define X86_XSS_PASID (_AC(1, ULL) << 10)
+#define X86_XSS_CET_U (_AC(1, ULL) << 11)
+#define X86_XSS_CET_S (_AC(1, ULL) << 12)
+#define X86_XSS_HDC   (_AC(1, ULL) << 13)
+#define X86_XSS_UINTR (_AC(1, ULL) << 14)
+#define X86_XSS_LBR   (_AC(1, ULL) << 15)
+#define X86_XSS_HWP   (_AC(1, ULL) << 16)
+#define X86_XCR0_TILE_CFG (_AC(1, ULL) << 17)
+#define X86_XCR0_TILE_DATA(_AC(1, ULL) << 18)
 #define X86_XCR0_LWP_POS  62
 #define X86_XCR0_LWP  (1ULL << X86_XCR0_LWP_POS)
 
+#define X86_XCR0_STATES \
+(X86_XCR0_FP | X86_XCR0_SSE | X86_XCR0_YMM | X86_XCR0_BNDREGS | \
+ X86_XCR0_BNDCSR | X86_XCR0_OPMASK | X86_XCR0_ZMM | \
+ X86_XCR0_HI_ZMM | X86_XCR0_PKRU | X86_XCR0_TILE_CFG |  \
+ X86_XCR0_TILE_DATA |   \
+ X86_XCR0_LWP)
+
+#define X86_XSS_STATES  \
+(X86_XSS_PROC_TRACE | X86_XSS_PASID | X86_XSS_CET_U |   \
+ X86_XSS_CET_S | X86_XSS_HDC | X86_XSS_UINTR | X86_XSS_LBR |\
+ X86_XSS_HWP |  \
+ 0)
+
 /*
  * Debug status flags in DR6.
  *
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 75788147966a..33a5a89719ef 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -604,9 +604,156 @@ static bool valid_xcr0(uint64_t xcr0)
 if ( !(xcr0 & X86_XCR0_BNDREGS) != !(xcr0 & X86_XCR0_BNDCSR) )
 return false;
 
+/* TILE_CFG and TILE_DATA must be the same. */
+if ( !(xcr0 & X86_XCR0_TILE_CFG) != !(xcr0 & X86_XCR0_TILE_DATA) )
+return false;
+
 return true;
 }
 
+struct xcheck_state {
+uint64_t states;
+uint32_t uncomp_size;
+uint32_t comp_size;
+};
+
+static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
+{
+uint32_t 

[PATCH for-4.19 v3 0/7] x86/xstate: Fixes to size calculations

2024-05-23 Thread Andrew Cooper
This has grown somewhat from v2, but is better for it IMO.

The headline change is patch 2 performing all the cross-checking at boot time.
This turned into needing prepare the Raw CPU policy earlier on boot (to avoid
further-adding to scheme we're already looking to retire).

The end result has been tested across the entire XenServer hardware lab.  This
found several false assumptions about how the dynamic sizes behave.

Patches 1 and 6 are the main bugfixes from this series.  There's still lots more
work to do in order to get AMX and/or CET working, but this is at least a 4-yo
series finally off my plate.

Andrew Cooper (7):
  x86/xstate: Fix initialisation of XSS cache
  x86/xstate: Cross-check dynamic XSTATE sizes at boot
  x86/boot: Collect the Raw CPU Policy earlier on boot
  x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
  x86/cpu-policy: Simplify recalculate_xstate()
  x86/cpuid: Fix handling of XSAVE dynamic leaves
  x86/defns: Clean up X86_{XCR0,XSS}_* constants

 xen/arch/x86/cpu-policy.c   |  56 ++--
 xen/arch/x86/cpuid.c|  24 +-
 xen/arch/x86/domctl.c   |   2 +-
 xen/arch/x86/hvm/hvm.c  |   2 +-
 xen/arch/x86/i387.c |   2 +-
 xen/arch/x86/include/asm/x86-defns.h|  55 ++--
 xen/arch/x86/include/asm/xstate.h   |   8 +-
 xen/arch/x86/setup.c|   4 +-
 xen/arch/x86/xstate.c   | 286 +---
 xen/include/public/arch-x86/cpufeatureset.h |   3 +
 xen/include/xen/lib/x86/cpu-policy.h|   2 +-
 11 files changed, 322 insertions(+), 122 deletions(-)

-- 
2.30.2




Re: [PATCH 3/5] x86/pvh: Set phys_base when calling xen_prepare_pvh()

2024-05-23 Thread Jürgen Groß

On 10.04.24 21:48, Jason Andryuk wrote:

phys_base needs to be set for __pa() to work in xen_pvh_init() when
finding the hypercall page.  Set it before calling into
xen_prepare_pvh(), which calls xen_pvh_init().  Clear it afterward to
avoid __startup_64() adding to it and creating an incorrect value.

Signed-off-by: Jason Andryuk 
---
Instead of setting and clearing phys_base, a dedicated variable could be
used just for the hypercall page.  Having phys_base set properly may
avoid further issues if the use of phys_base or __pa() grows.
---
  arch/x86/platform/pvh/head.S | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index bb1e582e32b1..c08d08d8cc92 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -125,7 +125,17 @@ SYM_CODE_START_LOCAL(pvh_start_xen)
xor %edx, %edx
wrmsr
  
+	/* Calculate load offset from LOAD_PHYSICAL_ADDR and store in

+* phys_base.  __pa() needs phys_base set to calculate the
+* hypercall page in xen_pvh_init(). */


Please use the correct style for multi-line comments:

/*
 * comment lines
 * comment lines
 */


+   movq %rbp, %rbx
+   subq $LOAD_PHYSICAL_ADDR, %rbx
+   movq %rbx, phys_base(%rip)
call xen_prepare_pvh
+   /* Clear phys_base.  __startup_64 will *add* to its value,
+* so reset to 0. */


Comment style again.


+   xor  %rbx, %rbx
+   movq %rbx, phys_base(%rip)
  
  	/* startup_64 expects boot_params in %rsi. */

lea rva(pvh_bootparams)(%ebp), %rsi


With above fixed:

Reviewed-by: Juergen Gross 


Juergen



[xen-unstable-smoke test] 186104: regressions - FAIL

2024-05-23 Thread osstest service owner
flight 186104 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186104/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   6 xen-buildfail REGR. vs. 186064

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  d6a7fd83039af36c28bd0ae2174f12c3888ce993
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186064  2024-05-21 15:04:02 Z1 days
Testing same since   186104  2024-05-23 09:00:22 Z0 days1 attempts


People who touched revisions under test:
  Alejandro Vallejo 
  Bobby Eshleman 
  Jan Beulich 
  Julien Grall 
  Oleksandr Andrushchenko 
  Oleksii Kurochko 
  Roger Pau Monné 
  Stewart Hildebrand 
  Volodymyr Babchuk 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  fail
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 387 lines long.)



Re: [PATCH] xen-hvm: Avoid livelock while handling buffered ioreqs

2024-05-23 Thread Ross Lagerwall
On Tue, Apr 9, 2024 at 3:19 PM Ross Lagerwall  wrote:
>
> On Tue, Apr 9, 2024 at 11:20 AM Anthony PERARD  
> wrote:
> >
> > On Thu, Apr 04, 2024 at 03:08:33PM +0100, Ross Lagerwall wrote:
> > > diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> > > index 1627da739822..1116b3978938 100644
> > > --- a/hw/xen/xen-hvm-common.c
> > > +++ b/hw/xen/xen-hvm-common.c
> > > @@ -521,22 +521,30 @@ static bool handle_buffered_iopage(XenIOState 
> > > *state)
> > [...]
> > >
> > >  static void handle_buffered_io(void *opaque)
> > >  {
> > > +unsigned int handled;
> > >  XenIOState *state = opaque;
> > >
> > > -if (handle_buffered_iopage(state)) {
> > > +handled = handle_buffered_iopage(state);
> > > +if (handled >= IOREQ_BUFFER_SLOT_NUM) {
> > > +/* We handled a full page of ioreqs. Schedule a timer to continue
> > > + * processing while giving other stuff a chance to run.
> > > + */
> >
> > ./scripts/checkpatch.pl report a style issue here:
> > WARNING: Block comments use a leading /* on a separate line
> >
> > I can try to remember to fix that on commit.
>
> I copied the comment style from a few lines above but I guess it was
> wrong.
>
> >
> > >  timer_mod(state->buffered_io_timer,
> > > -BUFFER_IO_MAX_DELAY + 
> > > qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
> > > -} else {
> > > +qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
> > > +} else if (handled == 0) {
> >
> > Just curious, why did you check for `handled == 0` here instead of
> > `handled != 0`? That would have avoided inverting the last 2 cases, and
> > the patch would just have introduced a new case without changing the
> > order of the existing ones. But not that important I guess.
> >
>
> In general I try to use conditionals with the least amount of negation
> since I think it is easier to read. I can change it if you would prefer?

It looks like this hasn't been committed anywhere. Were you expecting
an updated version with the style issue fixed or has it fallen through
the cracks?

Thanks,
Ross
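
For reference, the resulting dispatch in handle_buffered_io() reads roughly as
below.  The bodies of the last two branches are an assumption on my part, since
the remainder of the hunk isn't quoted above:

    handled = handle_buffered_iopage(state);
    if (handled >= IOREQ_BUFFER_SLOT_NUM) {
        /* A full page was processed: re-arm immediately, but give other work a turn. */
        timer_mod(state->buffered_io_timer,
                  qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
    } else if (handled == 0) {
        /* Nothing pending: stop polling. */
        timer_del(state->buffered_io_timer);
    } else {
        /* Partially filled page: poll again after the usual delay. */
        timer_mod(state->buffered_io_timer,
                  BUFFER_IO_MAX_DELAY + qemu_clock_get_ms(QEMU_CLOCK_REALTIME));
    }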



Re: [PATCH v3 2/2] tools/xg: Clean up xend-style overrides for CPU policies

2024-05-23 Thread Roger Pau Monné
On Thu, May 23, 2024 at 10:41:30AM +0100, Alejandro Vallejo wrote:
> Factor out policy getters/setters from both (CPUID and MSR) policy override
> functions. Additionally, use host policy rather than featureset when
> preparing the cur policy, saving one hypercall and several lines of
> boilerplate.
> 
> No functional change intended.
> 
> Signed-off-by: Alejandro Vallejo 
> ---
> v3:
>   * Restored overscoped loop indices
>   * Split long line in conditional
> ---
>  tools/libs/guest/xg_cpuid_x86.c | 438 ++--
>  1 file changed, 131 insertions(+), 307 deletions(-)
> 
> diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
> index 4f4b86b59470..1e631fd46d2f 100644
> --- a/tools/libs/guest/xg_cpuid_x86.c
> +++ b/tools/libs/guest/xg_cpuid_x86.c
> @@ -36,6 +36,34 @@ enum {
>  #define bitmaskof(idx)  (1u << ((idx) & 31))
>  #define featureword_of(idx) ((idx) >> 5)
>  
> +static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy)
> +{
> +uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
> +int rc;
> +
> +rc = x86_cpuid_copy_from_buffer(&policy->policy, policy->leaves,
> +policy->nr_leaves, &err_leaf, &err_subleaf);
> +if ( rc )
> +{
> +if ( err_leaf != -1 )
> +ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
> +  err_leaf, err_subleaf, -rc, strerror(-rc));
> +return rc;
> +}
> +
> +rc = x86_msr_copy_from_buffer(&policy->policy, policy->msrs,
> +  policy->nr_msrs, &err_msr);
> +if ( rc )
> +{
> +if ( err_msr != -1 )
> +ERROR("Failed to deserialise MSR (err MSR %#x) (%d = %s)",
> +  err_msr, -rc, strerror(-rc));
> +return rc;
> +}
> +
> +return 0;
> +}
> +
>  int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps)
>  {
>  struct xen_sysctl sysctl = {};
> @@ -260,102 +288,37 @@ static int compare_leaves(const void *l, const void *r)
>  return 0;
>  }
>  
> -static xen_cpuid_leaf_t *find_leaf(
> -xen_cpuid_leaf_t *leaves, unsigned int nr_leaves,
> -const struct xc_xend_cpuid *xend)
> +static xen_cpuid_leaf_t *find_leaf(xc_cpu_policy_t *p,
> +   const struct xc_xend_cpuid *xend)
>  {
>  const xen_cpuid_leaf_t key = { xend->leaf, xend->subleaf };
>  
> -return bsearch(&key, leaves, nr_leaves, sizeof(*leaves), compare_leaves);
> +return bsearch(&key, p->leaves, ARRAY_SIZE(p->leaves),

Don't you need to use p->nr_leaves here, as otherwise we could check
against possibly uninitialized leaves (or leaves with stale data)?
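
A minimal sketch of the bounded lookup being suggested, assuming p->nr_leaves
holds the number of deserialised leaves:

    return bsearch(&key, p->leaves, p->nr_leaves,
                   sizeof(*p->leaves), compare_leaves);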

> +   sizeof(*p->leaves), compare_leaves);
>  }
>  
> -static int xc_cpuid_xend_policy(
> -xc_interface *xch, uint32_t domid, const struct xc_xend_cpuid *xend)
> +static int xc_cpuid_xend_policy(xc_interface *xch, uint32_t domid,
> +const struct xc_xend_cpuid *xend,
> +xc_cpu_policy_t *host,
> +xc_cpu_policy_t *def,
> +xc_cpu_policy_t *cur)
>  {
> -int rc;
> -bool hvm;
> -xc_domaininfo_t di;
> -unsigned int nr_leaves, nr_msrs;
> -uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
> -/*
> - * Three full policies.  The host, default for the domain type,
> - * and domain current.
> - */
> -xen_cpuid_leaf_t *host = NULL, *def = NULL, *cur = NULL;
> -unsigned int nr_host, nr_def, nr_cur;
> -
> -if ( (rc = xc_domain_getinfo_single(xch, domid, &di)) < 0 )
> -{
> -PERROR("Failed to obtain d%d info", domid);
> -rc = -errno;
> -goto fail;
> -}
> -hvm = di.flags & XEN_DOMINF_hvm_guest;
> -
> -rc = xc_cpu_policy_get_size(xch, &nr_leaves, &nr_msrs);
> -if ( rc )
> -{
> -PERROR("Failed to obtain policy info size");
> -rc = -errno;
> -goto fail;
> -}
> -
> -rc = -ENOMEM;
> -if ( (host = calloc(nr_leaves, sizeof(*host))) == NULL ||
> - (def  = calloc(nr_leaves, sizeof(*def)))  == NULL ||
> - (cur  = calloc(nr_leaves, sizeof(*cur)))  == NULL )
> -{
> -ERROR("Unable to allocate memory for %u CPUID leaves", nr_leaves);
> -goto fail;
> -}
> -
> -/* Get the domain's current policy. */
> -nr_msrs = 0;
> -nr_cur = nr_leaves;
> -rc = get_domain_cpu_policy(xch, domid, &nr_cur, cur, &nr_msrs, NULL);
> -if ( rc )
> -{
> -PERROR("Failed to obtain d%d current policy", domid);
> -rc = -errno;
> -goto fail;
> -}
> +if ( !xend )
> +return 0;
>  
> -/* Get the domain type's default policy. */
> -nr_msrs = 0;
> -nr_def = nr_leaves;
> -rc = get_system_cpu_policy(xch, hvm ? XEN_SYSCTL_cpu_policy_hvm_default
> -: XEN_SYSCTL_cpu_policy_pv_default,
> -   

Re: [XEN PATCH v2 06/15] x86/p2m: guard altp2m code with CONFIG_ALTP2M option

2024-05-23 Thread Sergiy Kibrik

16.05.24 14:01, Jan Beulich:

On 15.05.2024 11:10, Sergiy Kibrik wrote:

@@ -38,7 +38,10 @@ static inline bool altp2m_active(const struct domain *d)
  }
  
  /* Only declaration is needed. DCE will optimise it out when linking. */

+void altp2m_vcpu_initialise(struct vcpu *v);
+void altp2m_vcpu_destroy(struct vcpu *v);
  uint16_t altp2m_vcpu_idx(const struct vcpu *v);
+int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn);
  void altp2m_vcpu_disable_ve(struct vcpu *v);


These additions look unrelated, as long as the description says nothing in
this regard.


Agreed, I'll update the description to explain why these declarations are added.




--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -670,7 +670,7 @@ static inline bool hvm_hap_supported(void)
  /* returns true if hardware supports alternate p2m's */
  static inline bool hvm_altp2m_supported(void)
  {
-return hvm_funcs.caps.altp2m;
+return IS_ENABLED(CONFIG_ALTP2M) && hvm_funcs.caps.altp2m;


Which in turn raises the question whether the altp2m struct field shouldn't
become conditional upon CONFIG_ALTP2M too (or rather: instead, as the change
here then would need to be done differently). Yet maybe that would entail
further changes elsewhere, so may well better be left for later.



But hvm_funcs.caps.altp2m is only a capability bit -- is it worth making
it conditional?



--- a/xen/arch/x86/mm/Makefile
+++ b/xen/arch/x86/mm/Makefile
@@ -1,7 +1,7 @@
  obj-y += shadow/
  obj-$(CONFIG_HVM) += hap/
  
-obj-$(CONFIG_HVM) += altp2m.o

+obj-$(CONFIG_ALTP2M) += altp2m.o


This change I think wants to move to patch 5.



If this moves to patch 5, then the HVM=y && ALTP2M=n configuration will
break the build between patches 5 and 6, so I've decided to put it together
with the fixes for these build failures in patch 6.

Maybe I should merge patches 5 & 6 together then?

  -Sergiy



Re: [PATCH v6 7/8] xen: mapcache: Add support for grant mappings

2024-05-23 Thread Edgar E. Iglesias
On Thu, May 23, 2024 at 9:47 AM Manos Pitsidianakis <
manos.pitsidiana...@linaro.org> wrote:

> On Thu, 16 May 2024 18:48, "Edgar E. Iglesias" 
> wrote:
> >From: "Edgar E. Iglesias" 
> >
> >Add a second mapcache for grant mappings. The mapcache for
> >grants needs to work with XC_PAGE_SIZE granularity since
> >we can't map larger ranges than what has been granted to us.
> >
> >Like with foreign mappings (xen_memory), machines using grants
> >are expected to initialize the xen_grants MR and map it
> >into their address-map accordingly.
> >
> >Signed-off-by: Edgar E. Iglesias 
> >Reviewed-by: Stefano Stabellini 
> >---
> > hw/xen/xen-hvm-common.c |  12 ++-
> > hw/xen/xen-mapcache.c   | 163 ++--
> > include/hw/xen/xen-hvm-common.h |   3 +
> > include/sysemu/xen.h|   7 ++
> > 4 files changed, 152 insertions(+), 33 deletions(-)
> >
> >diff --git a/hw/xen/xen-hvm-common.c b/hw/xen/xen-hvm-common.c
> >index a0a0252da0..b8ace1c368 100644
> >--- a/hw/xen/xen-hvm-common.c
> >+++ b/hw/xen/xen-hvm-common.c
> >@@ -10,12 +10,18 @@
> > #include "hw/boards.h"
> > #include "hw/xen/arch_hvm.h"
> >
> >-MemoryRegion xen_memory;
> >+MemoryRegion xen_memory, xen_grants;
> >
> >-/* Check for xen memory.  */
> >+/* Check for any kind of xen memory, foreign mappings or grants.  */
> > bool xen_mr_is_memory(MemoryRegion *mr)
> > {
> >-return mr == &xen_memory;
> >+return mr == &xen_memory || mr == &xen_grants;
> >+}
> >+
> >+/* Check specifically for grants.  */
> >+bool xen_mr_is_grants(MemoryRegion *mr)
> >+{
> >+return mr == &xen_grants;
> > }
> >
> > void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion
> *mr,
> >diff --git a/hw/xen/xen-mapcache.c b/hw/xen/xen-mapcache.c
> >index a07c47b0b1..1cbc2aeaa9 100644
> >--- a/hw/xen/xen-mapcache.c
> >+++ b/hw/xen/xen-mapcache.c
> >@@ -14,6 +14,7 @@
> >
> > #include 
> >
> >+#include "hw/xen/xen-hvm-common.h"
> > #include "hw/xen/xen_native.h"
> > #include "qemu/bitmap.h"
> >
> >@@ -21,6 +22,8 @@
> > #include "sysemu/xen-mapcache.h"
> > #include "trace.h"
> >
> >+#include 
> >+#include 
> >
> > #if HOST_LONG_BITS == 32
> > #  define MCACHE_MAX_SIZE (1UL<<31) /* 2GB Cap */
> >@@ -41,6 +44,7 @@ typedef struct MapCacheEntry {
> > unsigned long *valid_mapping;
> > uint32_t lock;
> > #define XEN_MAPCACHE_ENTRY_DUMMY (1 << 0)
> >+#define XEN_MAPCACHE_ENTRY_GRANT (1 << 1)
>
> Might we get more entry kinds in the future? (for example foreign maps).
> Maybe this could be an enum.
>
>
Perhaps. Foreign mappings are already supported; this flag separates
ordinary foreign mappings from grant foreign mappings.
IMO, since this is not an external interface it's probably better to change
it once we have a concrete use-case at hand.
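
For comparison, the enum alternative floated above might look something like
this (hypothetical, not part of the patch):

typedef enum MapCacheEntryKind {
    MAPCACHE_ENTRY_NORMAL = 0,
    MAPCACHE_ENTRY_DUMMY,
    MAPCACHE_ENTRY_GRANT,
} MapCacheEntryKind;

at the cost of no longer being able to combine kinds as flag bits.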



> > uint8_t flags;
> > hwaddr size;
> > struct MapCacheEntry *next;
> >@@ -71,6 +75,8 @@ typedef struct MapCache {
> > } MapCache;
> >
> > static MapCache *mapcache;
> >+static MapCache *mapcache_grants;
> >+static xengnttab_handle *xen_region_gnttabdev;
> >
> > static inline void mapcache_lock(MapCache *mc)
> > {
> >@@ -131,6 +137,12 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f,
> void *opaque)
> > unsigned long max_mcache_size;
> > unsigned int bucket_shift;
> >
> >+xen_region_gnttabdev = xengnttab_open(NULL, 0);
> >+if (xen_region_gnttabdev == NULL) {
> >+error_report("mapcache: Failed to open gnttab device");
> >+exit(EXIT_FAILURE);
> >+}
> >+
> > if (HOST_LONG_BITS == 32) {
> > bucket_shift = 16;
> > } else {
> >@@ -159,6 +171,15 @@ void xen_map_cache_init(phys_offset_to_gaddr_t f,
> void *opaque)
> > mapcache = xen_map_cache_init_single(f, opaque,
> >  bucket_shift,
> >  max_mcache_size);
> >+
> >+/*
> >+ * Grant mappings must use XC_PAGE_SIZE granularity since we can't
> >+ * map anything beyond the number of pages granted to us.
> >+ */
> >+mapcache_grants = xen_map_cache_init_single(f, opaque,
> >+XC_PAGE_SHIFT,
> >+max_mcache_size);
> >+
> > setrlimit(RLIMIT_AS, &rlimit_as);
> > }
> >
> >@@ -168,17 +189,24 @@ static void xen_remap_bucket(MapCache *mc,
> >  hwaddr size,
> >  hwaddr address_index,
> >  bool dummy,
> >+ bool grant,
> >+ bool is_write,
> >  ram_addr_t ram_offset)
> > {
> > uint8_t *vaddr_base;
> >-xen_pfn_t *pfns;
> >+uint32_t *refs = NULL;
> >+xen_pfn_t *pfns = NULL;
> > int *err;
>
> You should use g_autofree to perform automatic cleanup on function exit
> instead of manually freeing, since the allocations should only live
> within the function call.
>
>
Sounds good, I'll do that in the next version.
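
A sketch of the g_autofree pattern being suggested (GLib, available via QEMU's
osdep headers); the helper name and shape are illustrative only:

static void example_remap(size_t nr_pfns)
{
    /* Both buffers are g_free()'d automatically when they go out of scope. */
    g_autofree xen_pfn_t *pfns = g_new0(xen_pfn_t, nr_pfns);
    g_autofree int *err = g_new0(int, nr_pfns);

    /* ... populate and use pfns/err, as xen_remap_bucket() would ... */
}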



> > unsigned 

Re: [PATCH v3 1/2] tools/xg: Streamline cpu policy serialise/deserialise calls

2024-05-23 Thread Roger Pau Monné
On Thu, May 23, 2024 at 10:41:29AM +0100, Alejandro Vallejo wrote:
> The idea is to use xc_cpu_policy_t as a single object containing both the
> serialised and deserialised forms of the policy. Note that we need lengths
> for the arrays, as the serialised policies may be shorter than the array
> capacities.
> 
> * Add the serialised lengths to the struct so we can distinguish
>   between length and capacity of the serialisation buffers.
> * Remove explicit buffer+lengths in serialise/deserialise calls
>   and use the internal buffer inside xc_cpu_policy_t instead.
> * Refactor everything to use the new serialisation functions.
> * Remove redundant serialization calls and avoid allocating dynamic
>   memory aside from the policy objects in xen-cpuid. Also minor cleanup
>   in the policy print call sites.
> 
> No functional change intended.
> 
> Signed-off-by: Alejandro Vallejo 

Acked-by: Roger Pau Monné 

Just two comments.

> ---
> v3:
>   * Better context scoping in xg_sr_common_x86.
> * Can't be const because write_record() takes non-const.
>   * Adjusted line length of xen-cpuid's print_policy.
>   * Adjusted error messages in xen-cpuid's print_policy.
>   * Reverted removal of overscoped loop indices.
> ---
>  tools/include/xenguest.h|  8 ++-
>  tools/libs/guest/xg_cpuid_x86.c | 98 -
>  tools/libs/guest/xg_private.h   |  2 +
>  tools/libs/guest/xg_sr_common_x86.c | 56 ++---
>  tools/misc/xen-cpuid.c  | 41 
>  5 files changed, 106 insertions(+), 99 deletions(-)
> 
> diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
> index e01f494b772a..563811cd8dde 100644
> --- a/tools/include/xenguest.h
> +++ b/tools/include/xenguest.h
> @@ -799,14 +799,16 @@ int xc_cpu_policy_set_domain(xc_interface *xch, 
> uint32_t domid,
>   xc_cpu_policy_t *policy);
>  
>  /* Manipulate a policy via architectural representations. */
> -int xc_cpu_policy_serialise(xc_interface *xch, const xc_cpu_policy_t *policy,
> -xen_cpuid_leaf_t *leaves, uint32_t *nr_leaves,
> -xen_msr_entry_t *msrs, uint32_t *nr_msrs);
> +int xc_cpu_policy_serialise(xc_interface *xch, xc_cpu_policy_t *policy);
>  int xc_cpu_policy_update_cpuid(xc_interface *xch, xc_cpu_policy_t *policy,
> const xen_cpuid_leaf_t *leaves,
> uint32_t nr);
>  int xc_cpu_policy_update_msrs(xc_interface *xch, xc_cpu_policy_t *policy,
>const xen_msr_entry_t *msrs, uint32_t nr);
> +int xc_cpu_policy_get_leaves(xc_interface *xch, const xc_cpu_policy_t 
> *policy,
> + const xen_cpuid_leaf_t **leaves, uint32_t *nr);
> +int xc_cpu_policy_get_msrs(xc_interface *xch, const xc_cpu_policy_t *policy,
> +   const xen_msr_entry_t **msrs, uint32_t *nr);

Maybe it would be helpful to have a comment clarifying that the return
of xc_cpu_policy_get_{leaves,msrs}() is a reference to the content of
the policy, not a copy of it (and hence is tied to the lifetime of
policy, and doesn't require explicit freeing).
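
Something along these lines, for instance (wording illustrative):

/*
 * The pointers returned by xc_cpu_policy_get_{leaves,msrs}() reference the
 * policy's internal buffers.  They remain valid only for the lifetime of
 * @policy and must not be freed by the caller.
 */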

>  
>  /* Compatibility calculations. */
>  bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
> diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
> index 4453178100ad..4f4b86b59470 100644
> --- a/tools/libs/guest/xg_cpuid_x86.c
> +++ b/tools/libs/guest/xg_cpuid_x86.c
> @@ -834,14 +834,13 @@ void xc_cpu_policy_destroy(xc_cpu_policy_t *policy)
>  }
>  }
>  
> -static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy,
> -  unsigned int nr_leaves, unsigned int 
> nr_entries)
> +static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy)
>  {
>  uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
>  int rc;
>  
>  rc = x86_cpuid_copy_from_buffer(&policy->policy, policy->leaves,
> -nr_leaves, &err_leaf, &err_subleaf);
> +policy->nr_leaves, &err_leaf, &err_subleaf);
>  if ( rc )
>  {
>  if ( err_leaf != -1 )
> @@ -851,7 +850,7 @@ static int deserialize_policy(xc_interface *xch, 
> xc_cpu_policy_t *policy,
>  }
>  
>  rc = x86_msr_copy_from_buffer(&policy->policy, policy->msrs,
> -  nr_entries, &err_msr);
> +  policy->nr_msrs, &err_msr);
>  if ( rc )
>  {
>  if ( err_msr != -1 )
> @@ -878,7 +877,10 @@ int xc_cpu_policy_get_system(xc_interface *xch, unsigned 
> int policy_idx,
>  return rc;
>  }
>  
> -rc = deserialize_policy(xch, policy, nr_leaves, nr_msrs);
> +policy->nr_leaves = nr_leaves;
> +policy->nr_msrs = nr_msrs;
> +
> +rc = deserialize_policy(xch, policy);
>  if ( rc )
>  {
>  errno = -rc;
> @@ -903,7 +905,10 @@ int xc_cpu_policy_get_domain(xc_interface 

Re: [XEN PATCH] x86/iommu: Conditionally compile platform-specific union entries

2024-05-23 Thread Roger Pau Monné
On Thu, May 23, 2024 at 09:19:53AM +, Teddy Astie wrote:
> If some platform driver isn't compiled in, remove its related union
> entries as they are not used.
> 
> Signed-off-by Teddy Astie 
> ---
>  xen/arch/x86/include/asm/iommu.h | 4 
>  xen/arch/x86/include/asm/pci.h   | 4 
>  2 files changed, 8 insertions(+)
> 
> diff --git a/xen/arch/x86/include/asm/iommu.h 
> b/xen/arch/x86/include/asm/iommu.h
> index 8dc464fbd3..99180940c4 100644
> --- a/xen/arch/x86/include/asm/iommu.h
> +++ b/xen/arch/x86/include/asm/iommu.h
> @@ -42,17 +42,21 @@ struct arch_iommu
>  struct list_head identity_maps;
>  
>  union {
> +#ifdef CONFIG_INTEL_IOMMU
>  /* Intel VT-d */
>  struct {
>  uint64_t pgd_maddr; /* io page directory machine address */
>  unsigned int agaw; /* adjusted guest address width, 0 is level 2 
> 30-bit */
>  unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the 
> domain uses */
>  } vtd;
> +#endif
> +#ifdef CONFIG_AMD_IOMMU
>  /* AMD IOMMU */
>  struct {
>  unsigned int paging_mode;
>  struct page_info *root_table;
>  } amd;
> +#endif
>  };
>  };
>  
> diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h
> index fd5480d67d..842710f0dc 100644
> --- a/xen/arch/x86/include/asm/pci.h
> +++ b/xen/arch/x86/include/asm/pci.h
> @@ -22,12 +22,16 @@ struct arch_pci_dev {
>   */
>  union {
>  /* Subset of struct arch_iommu's fields, to be used in dom_io. */
> +#ifdef CONFIG_INTEL_IOMMU
>  struct {
>  uint64_t pgd_maddr;
>  } vtd;
> +#endif
> +#ifdef CONFIG_AMD_IOMMU
>  struct {
>  struct page_info *root_table;
>  } amd;
> +#endif
>  };

The #ifdef and #endif preprocessor directives shouldn't be indented.

Would you mind adding /* CONFIG_{AMD,INTEL}_IOMMU */ comments in the
#endif directives?
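
Concretely, the requested form would look something like this (sketch):

    union {
#ifdef CONFIG_INTEL_IOMMU
        struct {
            uint64_t pgd_maddr;
        } vtd;
#endif /* CONFIG_INTEL_IOMMU */
#ifdef CONFIG_AMD_IOMMU
        struct {
            struct page_info *root_table;
        } amd;
#endif /* CONFIG_AMD_IOMMU */
    };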

I wonder if we could move the definitions of those structures to the
vendor specific headers, but that's more convoluted, and would require
including the iommu headers in pci.h

Thanks, Roger.



[PATCH v3 2/2] tools/xg: Clean up xend-style overrides for CPU policies

2024-05-23 Thread Alejandro Vallejo
Factor out policy getters/setters from both (CPUID and MSR) policy override
functions. Additionally, use host policy rather than featureset when
preparing the cur policy, saving one hypercall and several lines of
boilerplate.

No functional change intended.

Signed-off-by: Alejandro Vallejo 
---
v3:
  * Restored overscoped loop indices
  * Split long line in conditional
---
 tools/libs/guest/xg_cpuid_x86.c | 438 ++--
 1 file changed, 131 insertions(+), 307 deletions(-)

diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4f4b86b59470..1e631fd46d2f 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -36,6 +36,34 @@ enum {
 #define bitmaskof(idx)  (1u << ((idx) & 31))
 #define featureword_of(idx) ((idx) >> 5)
 
+static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy)
+{
+uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
+int rc;
+
+rc = x86_cpuid_copy_from_buffer(&policy->policy, policy->leaves,
+policy->nr_leaves, &err_leaf, &err_subleaf);
+if ( rc )
+{
+if ( err_leaf != -1 )
+ERROR("Failed to deserialise CPUID (err leaf %#x, subleaf %#x) (%d = %s)",
+  err_leaf, err_subleaf, -rc, strerror(-rc));
+return rc;
+}
+
+rc = x86_msr_copy_from_buffer(&policy->policy, policy->msrs,
+  policy->nr_msrs, &err_msr);
+if ( rc )
+{
+if ( err_msr != -1 )
+ERROR("Failed to deserialise MSR (err MSR %#x) (%d = %s)",
+  err_msr, -rc, strerror(-rc));
+return rc;
+}
+
+return 0;
+}
+
 int xc_get_cpu_levelling_caps(xc_interface *xch, uint32_t *caps)
 {
 struct xen_sysctl sysctl = {};
@@ -260,102 +288,37 @@ static int compare_leaves(const void *l, const void *r)
 return 0;
 }
 
-static xen_cpuid_leaf_t *find_leaf(
-xen_cpuid_leaf_t *leaves, unsigned int nr_leaves,
-const struct xc_xend_cpuid *xend)
+static xen_cpuid_leaf_t *find_leaf(xc_cpu_policy_t *p,
+   const struct xc_xend_cpuid *xend)
 {
 const xen_cpuid_leaf_t key = { xend->leaf, xend->subleaf };
 
-return bsearch(&key, leaves, nr_leaves, sizeof(*leaves), compare_leaves);
+return bsearch(&key, p->leaves, ARRAY_SIZE(p->leaves),
+   sizeof(*p->leaves), compare_leaves);
 }
 
-static int xc_cpuid_xend_policy(
-xc_interface *xch, uint32_t domid, const struct xc_xend_cpuid *xend)
+static int xc_cpuid_xend_policy(xc_interface *xch, uint32_t domid,
+const struct xc_xend_cpuid *xend,
+xc_cpu_policy_t *host,
+xc_cpu_policy_t *def,
+xc_cpu_policy_t *cur)
 {
-int rc;
-bool hvm;
-xc_domaininfo_t di;
-unsigned int nr_leaves, nr_msrs;
-uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
-/*
- * Three full policies.  The host, default for the domain type,
- * and domain current.
- */
-xen_cpuid_leaf_t *host = NULL, *def = NULL, *cur = NULL;
-unsigned int nr_host, nr_def, nr_cur;
-
-if ( (rc = xc_domain_getinfo_single(xch, domid, &di)) < 0 )
-{
-PERROR("Failed to obtain d%d info", domid);
-rc = -errno;
-goto fail;
-}
-hvm = di.flags & XEN_DOMINF_hvm_guest;
-
-rc = xc_cpu_policy_get_size(xch, &nr_leaves, &nr_msrs);
-if ( rc )
-{
-PERROR("Failed to obtain policy info size");
-rc = -errno;
-goto fail;
-}
-
-rc = -ENOMEM;
-if ( (host = calloc(nr_leaves, sizeof(*host))) == NULL ||
- (def  = calloc(nr_leaves, sizeof(*def)))  == NULL ||
- (cur  = calloc(nr_leaves, sizeof(*cur)))  == NULL )
-{
-ERROR("Unable to allocate memory for %u CPUID leaves", nr_leaves);
-goto fail;
-}
-
-/* Get the domain's current policy. */
-nr_msrs = 0;
-nr_cur = nr_leaves;
-rc = get_domain_cpu_policy(xch, domid, &nr_cur, cur, &nr_msrs, NULL);
-if ( rc )
-{
-PERROR("Failed to obtain d%d current policy", domid);
-rc = -errno;
-goto fail;
-}
+if ( !xend )
+return 0;
 
-/* Get the domain type's default policy. */
-nr_msrs = 0;
-nr_def = nr_leaves;
-rc = get_system_cpu_policy(xch, hvm ? XEN_SYSCTL_cpu_policy_hvm_default
-: XEN_SYSCTL_cpu_policy_pv_default,
-   &nr_def, def, &nr_msrs, NULL);
-if ( rc )
-{
-PERROR("Failed to obtain %s def policy", hvm ? "hvm" : "pv");
-rc = -errno;
-goto fail;
-}
+if ( !host || !def || !cur )
+return -EINVAL;
 
-/* Get the host policy. */
-nr_msrs = 0;
-nr_host = nr_leaves;
-rc = get_system_cpu_policy(xch, XEN_SYSCTL_cpu_policy_host,
-   &nr_host, host, &nr_msrs, NULL);
-if ( rc )
-{
-PERROR("Failed to 

[PATCH v3 1/2] tools/xg: Streamline cpu policy serialise/deserialise calls

2024-05-23 Thread Alejandro Vallejo
The idea is to use xc_cpu_policy_t as a single object containing both the
serialised and deserialised forms of the policy. Note that we need lengths
for the arrays, as the serialised policies may be shorter than the array
capacities.

* Add the serialised lengths to the struct so we can distinguish
  between length and capacity of the serialisation buffers.
* Remove explicit buffer+lengths in serialise/deserialise calls
  and use the internal buffer inside xc_cpu_policy_t instead.
* Refactor everything to use the new serialisation functions.
* Remove redundant serialization calls and avoid allocating dynamic
  memory aside from the policy objects in xen-cpuid. Also minor cleanup
  in the policy print call sites.

No functional change intended.

Signed-off-by: Alejandro Vallejo 
---
v3:
  * Better context scoping in xg_sr_common_x86.
* Can't be const because write_record() takes non-const.
  * Adjusted line length of xen-cpuid's print_policy.
  * Adjusted error messages in xen-cpuid's print_policy.
  * Reverted removal of overscoped loop indices.
---
 tools/include/xenguest.h|  8 ++-
 tools/libs/guest/xg_cpuid_x86.c | 98 -
 tools/libs/guest/xg_private.h   |  2 +
 tools/libs/guest/xg_sr_common_x86.c | 56 ++---
 tools/misc/xen-cpuid.c  | 41 
 5 files changed, 106 insertions(+), 99 deletions(-)

diff --git a/tools/include/xenguest.h b/tools/include/xenguest.h
index e01f494b772a..563811cd8dde 100644
--- a/tools/include/xenguest.h
+++ b/tools/include/xenguest.h
@@ -799,14 +799,16 @@ int xc_cpu_policy_set_domain(xc_interface *xch, uint32_t 
domid,
  xc_cpu_policy_t *policy);
 
 /* Manipulate a policy via architectural representations. */
-int xc_cpu_policy_serialise(xc_interface *xch, const xc_cpu_policy_t *policy,
-xen_cpuid_leaf_t *leaves, uint32_t *nr_leaves,
-xen_msr_entry_t *msrs, uint32_t *nr_msrs);
+int xc_cpu_policy_serialise(xc_interface *xch, xc_cpu_policy_t *policy);
 int xc_cpu_policy_update_cpuid(xc_interface *xch, xc_cpu_policy_t *policy,
const xen_cpuid_leaf_t *leaves,
uint32_t nr);
 int xc_cpu_policy_update_msrs(xc_interface *xch, xc_cpu_policy_t *policy,
   const xen_msr_entry_t *msrs, uint32_t nr);
+int xc_cpu_policy_get_leaves(xc_interface *xch, const xc_cpu_policy_t *policy,
+ const xen_cpuid_leaf_t **leaves, uint32_t *nr);
+int xc_cpu_policy_get_msrs(xc_interface *xch, const xc_cpu_policy_t *policy,
+   const xen_msr_entry_t **msrs, uint32_t *nr);
 
 /* Compatibility calculations. */
 bool xc_cpu_policy_is_compatible(xc_interface *xch, xc_cpu_policy_t *host,
diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 4453178100ad..4f4b86b59470 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -834,14 +834,13 @@ void xc_cpu_policy_destroy(xc_cpu_policy_t *policy)
 }
 }
 
-static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy,
-  unsigned int nr_leaves, unsigned int nr_entries)
+static int deserialize_policy(xc_interface *xch, xc_cpu_policy_t *policy)
 {
 uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
 int rc;
 
 rc = x86_cpuid_copy_from_buffer(&policy->policy, policy->leaves,
-nr_leaves, &err_leaf, &err_subleaf);
+policy->nr_leaves, &err_leaf, &err_subleaf);
 if ( rc )
 {
 if ( err_leaf != -1 )
@@ -851,7 +850,7 @@ static int deserialize_policy(xc_interface *xch, 
xc_cpu_policy_t *policy,
 }
 
 rc = x86_msr_copy_from_buffer(&policy->policy, policy->msrs,
-  nr_entries, &err_msr);
+  policy->nr_msrs, &err_msr);
 if ( rc )
 {
 if ( err_msr != -1 )
@@ -878,7 +877,10 @@ int xc_cpu_policy_get_system(xc_interface *xch, unsigned 
int policy_idx,
 return rc;
 }
 
-rc = deserialize_policy(xch, policy, nr_leaves, nr_msrs);
+policy->nr_leaves = nr_leaves;
+policy->nr_msrs = nr_msrs;
+
+rc = deserialize_policy(xch, policy);
 if ( rc )
 {
 errno = -rc;
@@ -903,7 +905,10 @@ int xc_cpu_policy_get_domain(xc_interface *xch, uint32_t 
domid,
 return rc;
 }
 
-rc = deserialize_policy(xch, policy, nr_leaves, nr_msrs);
+policy->nr_leaves = nr_leaves;
+policy->nr_msrs = nr_msrs;
+
+rc = deserialize_policy(xch, policy);
 if ( rc )
 {
 errno = -rc;
@@ -917,17 +922,14 @@ int xc_cpu_policy_set_domain(xc_interface *xch, uint32_t 
domid,
  xc_cpu_policy_t *policy)
 {
 uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
-unsigned int nr_leaves = ARRAY_SIZE(policy->leaves);
-unsigned int nr_msrs = 

[PATCH v3 0/2] Clean the policy manipulation path in domain creation

2024-05-23 Thread Alejandro Vallejo
v2 -> v3:
  * Style adjustments
  * Revert of loop index scope refactors

v1 -> v2:
  * Removed xc_cpu_policy from xenguest.h (dropped v1/patch1)
  * Added accessors for xc_cpu_policy so the serialised form can be extracted.
  * Modified xen-cpuid to use accessors.

 Original cover letter 

In the context of creating a domain, we currently issue a lot of hypercalls
redundantly while populating its CPU policy; likely a side effect of
organic growth more than anything else.

However, the worst part is not the overhead (this is a glacially cold
path), but the insane amounts of boilerplate that make it really hard to
pick apart what's going on. One major contributor to this situation is the
fact that what's effectively "setup" and "teardown" phases in policy
manipulation are not factored out from the functions that perform said
manipulations, leading to the same getters and setters being invoked many
times, when once each would do.

Another big contributor is the code being unaware of when a policy is
serialised and when it's not.

This patch attempts to alleviate this situation, yielding over 200 LoC
reduction.

Patch 1: Mechanical change. Makes xc_cpu_policy_t public so it's usable
 from clients of libxc/libxg.
Patch 2: Changes the (de)serialization wrappers in xenguest so they always
 serialise to/from the internal buffers of xc_cpu_policy_t. The
 struct is suitably expanded to hold extra information required.
Patch 3: Performs the refactor of the policy manipulation code so that it
 follows a strict: PULL_POLICIES, MUTATE_POLICY (n times), PUSH_POLICY.
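
In code, the intended shape is roughly the following; the function names are
placeholders, not the real libxenguest API:

    pull_policies(xch, domid, &host, &def, &cur);   /* setup: fetch everything once */
    for ( i = 0; i < nr_overrides; i++ )
        mutate_policy(cur, &overrides[i]);          /* n in-memory mutations */
    push_policy(xch, domid, cur);                   /* teardown: commit once */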


Alejandro Vallejo (2):
  tools/xg: Streamline cpu policy serialise/deserialise calls
  tools/xg: Clean up xend-style overrides for CPU policies

 tools/include/xenguest.h|   8 +-
 tools/libs/guest/xg_cpuid_x86.c | 530 ++--
 tools/libs/guest/xg_private.h   |   2 +
 tools/libs/guest/xg_sr_common_x86.c |  56 ++-
 tools/misc/xen-cpuid.c  |  41 +--
 5 files changed, 234 insertions(+), 403 deletions(-)

-- 
2.34.1




[xen-unstable test] 186078: tolerable FAIL

2024-05-23 Thread osstest service owner
flight 186078 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/186078/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-qcow2 8 xen-boot fail in 186066 pass in 186078
 test-armhf-armhf-xl-multivcpu  8 xen-boot  fail pass in 186066

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail in 186066 never 
pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail in 186066 
never pass
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 186066
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 186066
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 186066
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 186066
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 186066
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 186066
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-raw  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-vhd 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-qcow214 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-qcow215 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c
baseline version:
 xen  ced21fbb2842ac4655048bdee56232974ff9ff9c

Last test of basis   186078  2024-05-22 12:53:24 Z0 days
Testing same since  (not found) 0 attempts

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64-xtf  pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 

[XEN PATCH] x86/iommu: Conditionally compile platform-specific union entries

2024-05-23 Thread Teddy Astie
If some platform driver isn't compiled in, remove its related union
entries as they are not used.

Signed-off-by Teddy Astie 
---
 xen/arch/x86/include/asm/iommu.h | 4 
 xen/arch/x86/include/asm/pci.h   | 4 
 2 files changed, 8 insertions(+)

diff --git a/xen/arch/x86/include/asm/iommu.h b/xen/arch/x86/include/asm/iommu.h
index 8dc464fbd3..99180940c4 100644
--- a/xen/arch/x86/include/asm/iommu.h
+++ b/xen/arch/x86/include/asm/iommu.h
@@ -42,17 +42,21 @@ struct arch_iommu
 struct list_head identity_maps;
 
 union {
+#ifdef CONFIG_INTEL_IOMMU
 /* Intel VT-d */
 struct {
 uint64_t pgd_maddr; /* io page directory machine address */
 unsigned int agaw; /* adjusted guest address width, 0 is level 2 
30-bit */
 unsigned long *iommu_bitmap; /* bitmap of iommu(s) that the domain 
uses */
 } vtd;
+#endif
+#ifdef CONFIG_AMD_IOMMU
 /* AMD IOMMU */
 struct {
 unsigned int paging_mode;
 struct page_info *root_table;
 } amd;
+#endif
 };
 };
 
diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h
index fd5480d67d..842710f0dc 100644
--- a/xen/arch/x86/include/asm/pci.h
+++ b/xen/arch/x86/include/asm/pci.h
@@ -22,12 +22,16 @@ struct arch_pci_dev {
  */
 union {
 /* Subset of struct arch_iommu's fields, to be used in dom_io. */
+#ifdef CONFIG_INTEL_IOMMU
 struct {
 uint64_t pgd_maddr;
 } vtd;
+#endif
+#ifdef CONFIG_AMD_IOMMU
 struct {
 struct page_info *root_table;
 } amd;
+#endif
 };
 domid_t pseudo_domid;
 mfn_t leaf_mfn;
-- 
2.45.1



Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech



Re: [PATCH v2 4/4] tools: Drop libsystemd as a dependency

2024-05-23 Thread Jürgen Groß

On 16.05.24 20:58, Andrew Cooper wrote:

There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends.  Drop the dependency.

We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.

Rerun autogen.sh, and mark the dependency as removed in the build containers.

Signed-off-by: Andrew Cooper 


Reviewed-by: Juergen Gross 


Juergen




Re: [PATCH v2 4/4] tools: Drop libsystemd as a dependency

2024-05-23 Thread Andrew Cooper
On 23/05/2024 9:27 am, Jürgen Groß wrote:
> On 16.05.24 20:58, Andrew Cooper wrote:
>> diff --git a/automation/build/archlinux/current.dockerfile
>> b/automation/build/archlinux/current.dockerfile
>> index 3e37ab5c40c1..d29f1358c2bd 100644
>> --- a/automation/build/archlinux/current.dockerfile
>> +++ b/automation/build/archlinux/current.dockerfile
>> @@ -37,6 +37,7 @@ RUN pacman -S --refresh --sysupgrade --noconfirm
>> --noprogressbar --needed \
>>   sdl2 \
>>   spice \
>>   spice-protocol \
>> +    # systemd for Xen < 4.19
>
> Does this work as intended? A comment between the parameters and no
> "\" at the
> end of the line?

Sadly, yes.

Comments are stripped out on a line-granular basis, prior to Docker
interpreting the remainder.

This is the approved way to do comments in dockerfiles, and we already
have other examples of this in our dockerfiles.

See e.g. a0e29b316363d9 for what I'll be doing with these comments in
~3y time.

~Andrew



Re: [PATCH v2 4/4] tools: Drop libsystemd as a dependency

2024-05-23 Thread Jürgen Groß

On 16.05.24 20:58, Andrew Cooper wrote:

There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends.  Drop the dependency.

We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.

Rerun autogen.sh, and mark the dependency as removed in the build containers.

Signed-off-by: Andrew Cooper 
---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: George Dunlap 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Christian Lindig 
CC: Edwin Török 

v2:
  * Only strip out the library check.
---
  automation/build/archlinux/current.dockerfile |   1 +
  .../build/suse/opensuse-leap.dockerfile   |   1 +
  .../build/suse/opensuse-tumbleweed.dockerfile |   1 +
  automation/build/ubuntu/focal.dockerfile  |   1 +
  config/Tools.mk.in|   2 -
  m4/systemd.m4 |   9 -
  tools/configure   | 256 --
  7 files changed, 4 insertions(+), 267 deletions(-)

diff --git a/automation/build/archlinux/current.dockerfile 
b/automation/build/archlinux/current.dockerfile
index 3e37ab5c40c1..d29f1358c2bd 100644
--- a/automation/build/archlinux/current.dockerfile
+++ b/automation/build/archlinux/current.dockerfile
@@ -37,6 +37,7 @@ RUN pacman -S --refresh --sysupgrade --noconfirm 
--noprogressbar --needed \
  sdl2 \
  spice \
  spice-protocol \
+# systemd for Xen < 4.19


Does this work as intended? A comment between the parameters and no "\" at the
end of the line?


Juergen



Re: [PATCH v2 3/4] tools/{c,o}xenstored: Don't link against libsystemd

2024-05-23 Thread Jürgen Groß

On 16.05.24 20:58, Andrew Cooper wrote:

Use the local freestanding wrapper instead.

Signed-off-by: Andrew Cooper 


Reviewed-by: Juergen Gross  # tools/xenstored


Juergen


---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: George Dunlap 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Christian Lindig 
CC: Edwin Török 

v2:
  * Redo almost from scratch, using the freestanding wrapper instead.
---
  tools/ocaml/xenstored/Makefile| 2 --
  tools/ocaml/xenstored/systemd_stubs.c | 2 +-
  tools/xenstored/Makefile  | 5 -
  tools/xenstored/posix.c   | 4 ++--
  4 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/tools/ocaml/xenstored/Makefile b/tools/ocaml/xenstored/Makefile
index e8aaecf2e630..a8b8bb64698e 100644
--- a/tools/ocaml/xenstored/Makefile
+++ b/tools/ocaml/xenstored/Makefile
@@ -4,8 +4,6 @@ include $(OCAML_TOPLEVEL)/common.make
  
  # Include configure output (config.h)

  CFLAGS += -include $(XEN_ROOT)/tools/config.h
-CFLAGS-$(CONFIG_SYSTEMD)  += $(SYSTEMD_CFLAGS)
-LDFLAGS-$(CONFIG_SYSTEMD) += $(SYSTEMD_LIBS)
  
  CFLAGS  += $(CFLAGS-y)

  CFLAGS  += $(APPEND_CFLAGS)
diff --git a/tools/ocaml/xenstored/systemd_stubs.c 
b/tools/ocaml/xenstored/systemd_stubs.c
index f4c875075abe..7dbbdd35bf30 100644
--- a/tools/ocaml/xenstored/systemd_stubs.c
+++ b/tools/ocaml/xenstored/systemd_stubs.c
@@ -25,7 +25,7 @@
  
  #if defined(HAVE_SYSTEMD)
  
-#include <systemd/sd-daemon.h>
+#include <xen-sd-notify.h>
  
  CAMLprim value ocaml_sd_notify_ready(value ignore)

  {
diff --git a/tools/xenstored/Makefile b/tools/xenstored/Makefile
index e0897ed1ba30..09adfe1d5064 100644
--- a/tools/xenstored/Makefile
+++ b/tools/xenstored/Makefile
@@ -9,11 +9,6 @@ xenstored: LDLIBS += $(LDLIBS_libxenctrl)
  xenstored: LDLIBS += -lrt
  xenstored: LDLIBS += $(SOCKET_LIBS)
  
-ifeq ($(CONFIG_SYSTEMD),y)

-$(XENSTORED_OBJS-y): CFLAGS += $(SYSTEMD_CFLAGS)
-xenstored: LDLIBS += $(SYSTEMD_LIBS)
-endif
-
  TARGETS := xenstored
  
  .PHONY: all

diff --git a/tools/xenstored/posix.c b/tools/xenstored/posix.c
index d88c82d972d7..6037d739d013 100644
--- a/tools/xenstored/posix.c
+++ b/tools/xenstored/posix.c
@@ -27,7 +27,7 @@
  #include 
  #include 
  #if defined(HAVE_SYSTEMD)
-#include <systemd/sd-daemon.h>
+#include <xen-sd-notify.h>
  #endif
  #include 
  
@@ -393,7 +393,7 @@ void late_init(bool live_update)

  #if defined(HAVE_SYSTEMD)
if (!live_update) {
sd_notify(1, "READY=1");
-   fprintf(stderr, SD_NOTICE "xenstored is ready\n");
+   fprintf(stderr, "xenstored is ready\n");
}
  #endif
  }





Re: [PATCH v4 2/2] drivers/char: Use sub-page ro API to make just xhci dbc cap RO

2024-05-23 Thread Jan Beulich
On 22.05.2024 17:39, Marek Marczykowski-Górecki wrote:
> Not the whole page, which may contain other registers too. The XHCI
> specification describes DbC as designed to be controlled by a different
> driver, but does not mandate placing registers on a separate page. In fact
> on Tiger Lake and newer (at least), this page does contain other registers
> that Linux tries to use. And with share=yes, a domU would use them too.
> Without this patch, PV dom0 would fail to initialize the controller,
> while HVM would be killed on EPT violation.
> 
> With `share=yes`, this patch gives domU more access to the emulator
> (although a HVM with any emulated device already has plenty of it). This
> configuration is already documented as unsafe with untrusted guests and
> not security supported.
> 
> Signed-off-by: Marek Marczykowski-Górecki 

Reviewed-by: Jan Beulich 





Re: [PATCH v2 2/4] tools: Import standalone sd_notify() implementation from systemd

2024-05-23 Thread Jürgen Groß

On 16.05.24 20:58, Andrew Cooper wrote:

... in order to avoid linking against the whole of libsystemd.

Only minimal changes to the upstream copy, to function as a drop-in
replacement for sd_notify() and as a header-only library.

Signed-off-by: Andrew Cooper 


With s/cleanup(sd_closep)/cleanup(xen_sd_closep)/

Reviewed-by: Juergen Gross 


Juergen


---
CC: Anthony PERARD 
CC: Juergen Gross 
CC: George Dunlap 
CC: Jan Beulich 
CC: Stefano Stabellini 
CC: Julien Grall 
CC: Christian Lindig 
CC: Edwin Török 

v2:
  * New
---
  tools/include/xen-sd-notify.h | 98 +++
  1 file changed, 98 insertions(+)
  create mode 100644 tools/include/xen-sd-notify.h

diff --git a/tools/include/xen-sd-notify.h b/tools/include/xen-sd-notify.h
new file mode 100644
index ..eda9d8b22d9e
--- /dev/null
+++ b/tools/include/xen-sd-notify.h
@@ -0,0 +1,98 @@
+/* SPDX-License-Identifier: MIT-0 */
+
+/*
+ * Implement the systemd notify protocol without external dependencies.
+ * Supports both readiness notification on startup and on reloading,
+ * according to the protocol defined at:
+ * https://www.freedesktop.org/software/systemd/man/latest/sd_notify.html
+ * This protocol is guaranteed to be stable as per:
+ * https://systemd.io/PORTABILITY_AND_STABILITY/
+ *
+ * Differences from the upstream copy:
+ * - Rename/rework as a drop-in replacement for systemd/sd-daemon.h
+ * - Only take the subset Xen cares about
+ * - Respect -Wdeclaration-after-statement
+ */
+
+#ifndef XEN_SD_NOTIFY
+#define XEN_SD_NOTIFY
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static inline void xen_sd_closep(int *fd) {
+  if (!fd || *fd < 0)
+return;
+
+  close(*fd);
+  *fd = -1;
+}
+
+static inline int xen_sd_notify(const char *message) {
+  union sockaddr_union {
+struct sockaddr sa;
+struct sockaddr_un sun;
+  } socket_addr = {
+.sun.sun_family = AF_UNIX,
+  };
+  size_t path_length, message_length;
+  ssize_t written;
+  const char *socket_path;
+  int __attribute__((cleanup(sd_closep))) fd = -1;
+
+  /* Verify the argument first */
+  if (!message)
+return -EINVAL;
+
+  message_length = strlen(message);
+  if (message_length == 0)
+return -EINVAL;
+
+  /* If the variable is not set, the protocol is a noop */
+  socket_path = getenv("NOTIFY_SOCKET");
+  if (!socket_path)
+return 0; /* Not set? Nothing to do */
+
+  /* Only AF_UNIX is supported, with path or abstract sockets */
+  if (socket_path[0] != '/' && socket_path[0] != '@')
+return -EAFNOSUPPORT;
+
+  path_length = strlen(socket_path);
+  /* Ensure there is room for NUL byte */
+  if (path_length >= sizeof(socket_addr.sun.sun_path))
+return -E2BIG;
+
+  memcpy(socket_addr.sun.sun_path, socket_path, path_length);
+
+  /* Support for abstract socket */
+  if (socket_addr.sun.sun_path[0] == '@')
+socket_addr.sun.sun_path[0] = 0;
+
+  fd = socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0);
+  if (fd < 0)
+return -errno;
+
+  if (connect(fd, &socket_addr.sa, offsetof(struct sockaddr_un, sun_path) + path_length) != 0)
+return -errno;
+
+  written = write(fd, message, message_length);
+  if (written != (ssize_t) message_length)
+return written < 0 ? -errno : -EPROTO;
+
+  return 1; /* Notified! */
+}
+
+static inline int sd_notify(int unset_environment, const char *message) {
+int r = xen_sd_notify(message);
+
+if (unset_environment)
+unsetenv("NOTIFY_SOCKET");
+
+return r;
+}
+
+#endif /* XEN_SD_NOTIFY */





Re: [PATCH] x86/shadow: don't leave trace record field uninitialized

2024-05-23 Thread Oleksii K.
On Wed, 2024-05-22 at 12:17 +0200, Jan Beulich wrote:
> The emulation_count field is set only conditionally right now.  Convert
> all field setting to an initializer, thus guaranteeing that field to be
> set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.
> 
> While there also drop the "event" local variable, thus eliminating an
> instance of the being phased out u32 type.
> 
> Coverity ID: 1598430
> Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code")
> Signed-off-by: Jan Beulich 
Release-acked-by: Oleksii Kurochko 

~ Oleksii
> 
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -2093,20 +2093,18 @@ static inline void trace_shadow_emulate(
>  guest_l1e_t gl1e, write_val;
>  guest_va_t va;
>  uint32_t flags:29, emulation_count:3;
> -    } d;
> -    u32 event;
> -
> -    event = TRC_SHADOW_EMULATE | ((GUEST_PAGING_LEVELS-2)<<8);
> -
> -    d.gl1e = gl1e;
> -    d.write_val.l1 = this_cpu(trace_emulate_write_val);
> -    d.va = va;
> +    } d = {
> +    .gl1e = gl1e,
> +    .write_val.l1 = this_cpu(trace_emulate_write_val),
> +    .va = va,
>  #if GUEST_PAGING_LEVELS == 3
> -    d.emulation_count = this_cpu(trace_extra_emulation_count);
> +    .emulation_count =
> this_cpu(trace_extra_emulation_count),
>  #endif
> -    d.flags = this_cpu(trace_shadow_path_flags);
> +    .flags = this_cpu(trace_shadow_path_flags),
> +    };
>  
> -    trace(event, sizeof(d), &d);
> +    trace(TRC_SHADOW_EMULATE | ((GUEST_PAGING_LEVELS - 2) << 8),
> +  sizeof(d), &d);
>  }
>  }
>  #endif /* CONFIG_HVM */
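
A minimal, self-contained illustration of the zeroing behaviour relied upon
here (not the Xen code itself): with a designated initializer, every field not
named is default-initialised, so emulation_count reads as 0 when the
conditional assignment is compiled out.

    struct rec { unsigned int flags; unsigned int count; };
    struct rec d = { .flags = 3 };   /* d.count is guaranteed to be 0 */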


