Re: [PATCH] xen-pcifront: Handle missed Connected state

2022-10-05 Thread Juergen Gross

On 29.08.22 17:15, Jason Andryuk wrote:

An HVM guest with linux stubdom and 2 PCI devices failed to start as
libxl timed out waiting for the PCI devices to be added.  It happens
intermittently but with some regularity.  libxl wrote the two xenstore
entries for the devices, but then timed out waiting for backend state 4
(Connected) - the state stayed at 7 (Reconfiguring).  (PCI passthrough
to an HVM with stubdomain is PV passthrough to the stubdomain and then
HVM passthrough with the QEMU inside the stubdomain.)

The stubdom kernel never printed "pcifront pci-0: Installing PCI
frontend", so it seems to have missed state 4 which would have
called pcifront_try_connect -> pcifront_connect_and_init_dma

Have pcifront_detach_devices special-case state Initialised and call
pcifront_connect_and_init_dma.  Don't use pcifront_try_connect because
that sets the xenbus state which may throw off the backend.  After
connecting, skip the remainder of detach_devices since none have been
initialized yet.  When the backend switches to Reconfigured,
pcifront_attach_devices will pick them up again.

Signed-off-by: Jason Andryuk 


Pushed to xen/tip.git for-linus-6.1


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 2/2] xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices

2022-10-05 Thread Juergen Gross

On 05.10.22 19:48, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

As find_xen_grant_dma_data() is called from both interrupt and process
contexts, the access to xen_grant_dma_devices XArray must be protected
by xa_lock_irqsave to avoid deadlock scenario.
As XArray API doesn't provide xa_store_irqsave helper, call lockless
__xa_store directly and guard it externally.

Also move the storage of the XArray's entry to a separate helper.

Signed-off-by: Oleksandr Tyshchenko 
Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access under 
Xen")


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[linux-5.4 test] 173433: regressions - FAIL

2022-10-05 Thread osstest service owner
flight 173433 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173433/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 173353
 test-armhf-armhf-xl-credit2  14 guest-start  fail REGR. vs. 173353
 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeat fail REGR. vs. 
173353

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-examine  5 host-install   broken in 173427 pass in 173433
 test-amd64-amd64-xl-rtds  8 xen-boot fail in 173427 pass in 173433
 test-armhf-armhf-xl-credit1  14 guest-start  fail in 173427 pass in 173433
 test-armhf-armhf-xl-multivcpu 14 guest-start fail in 173427 pass in 173433
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 14 
guest-start/debianhvm.repeat fail pass in 173427
 test-armhf-armhf-xl-rtds 14 guest-startfail pass in 173427

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
173353
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail in 173427 like 173353
 test-armhf-armhf-xl-rtds15 migrate-support-check fail in 173427 never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail in 173427 never pass
 test-armhf-armhf-xl-credit1  18 guest-start/debian.repeatfail  like 173353
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173353
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173353
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173353
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173353
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173353
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173353
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173353
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173353
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173353
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173353
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173353
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   

Re: [PATCH 1/2] xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()

2022-10-05 Thread Juergen Gross

On 05.10.22 19:48, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

Take page offset into the account when calculating the number of pages
to be granted.

Signed-off-by: Oleksandr Tyshchenko 
Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access under 
Xen")


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered

2022-10-05 Thread Demi Marie Obenour
On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
>  wrote:
> >
> > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > This function does not work under Xen because Xen could have already
> > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > for this thread, as it prevents EFI tables that are in
> > > > EfiBootServicesData from being used under Xen.
> > > >
> > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > would allow efi_mem_reserve() to work normally.
> > >
> > > efi_mem_reserve() actually working would be a layering violation;
> > > controlling the EFI memory map is entirely Xen's job.
> >
> > Doing this properly would require Xen to understand all of the EFI
> > tables that could validly be in EfiBootServices* and which could be of
> > interest to dom0.  It might (at least on some very buggy firmware)
> > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> >
> > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > opinion that default behavior should be matching the intentions of the
> > > spec, and the intention of EfiBootServices* is for the space to be
> > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > that hypercall: It might use it for regions where data lives which it
> > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > entity would later want to consume. Code/data potentially usable by
> > > _anyone_ between two resets of the system cannot legitimately be freed
> > > (and hence imo is wrong to live in EfiBootServices* regions).
> >
> > I agree, but currently some such data *is* in EfiBootServices* regions,
> > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > configuration tables that point to EfiBootServicesData memory before
> > freeing that memory.
> >
> 
> That seems like a reasonable approach to me. Tables like MEMATTR or
> RT_PROP are mostly relevant for bare metal where the host kernel maps
> the runtime services, and in general, passing on these tables without
> knowing what they do is kind of fishy anyway. You might even argue
> that only known table types should be forwarded in the first place,
> regardless of the memory type.

Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
ESRT, but I am curious which others Xen should preserve.  Currently, Xen
does not know about RT_PROP or MEMATTR; could this be a cause of
problems?
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature


[qemu-mainline test] 173431: trouble: broken/fail/pass

2022-10-05 Thread osstest service owner
flight 173431 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173431/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-shadow   broken
 test-amd64-amd64-xl-shadow5 host-install(5)broken REGR. vs. 173408

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173408
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173408
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173408
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173408
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 173408
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173408
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173408
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173408
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 qemuufafd35a6dab8e70a7c395aaa8e1273267cf9f3c8
baseline version:
 qemuuefbf38d73e5dcc4d5f8b98c6e7a12be1f3b91745

Last test of basis   173408  2022-10-03 22:08:40 Z2 days

Re: [PATCH 2/2] xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices

2022-10-05 Thread Stefano Stabellini
On Wed, 5 Oct 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> As find_xen_grant_dma_data() is called from both interrupt and process
> contexts, the access to xen_grant_dma_devices XArray must be protected
> by xa_lock_irqsave to avoid deadlock scenario.
> As XArray API doesn't provide xa_store_irqsave helper, call lockless
> __xa_store directly and guard it externally.
> 
> Also move the storage of the XArray's entry to a separate helper.
> 
> Signed-off-by: Oleksandr Tyshchenko 
> Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access 
> under Xen")

Reviewed-by: Stefano Stabellini 


> ---
>  drivers/xen/grant-dma-ops.c | 24 +++-
>  1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index 1998d0e8ce82..c66f56d24013 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -25,7 +25,7 @@ struct xen_grant_dma_data {
>   bool broken;
>  };
>  
> -static DEFINE_XARRAY(xen_grant_dma_devices);
> +static DEFINE_XARRAY_FLAGS(xen_grant_dma_devices, XA_FLAGS_LOCK_IRQ);
>  
>  #define XEN_GRANT_DMA_ADDR_OFF   (1ULL << 63)
>  
> @@ -42,14 +42,29 @@ static inline grant_ref_t dma_to_grant(dma_addr_t dma)
>  static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
>  {
>   struct xen_grant_dma_data *data;
> + unsigned long flags;
>  
> - xa_lock(_grant_dma_devices);
> + xa_lock_irqsave(_grant_dma_devices, flags);
>   data = xa_load(_grant_dma_devices, (unsigned long)dev);
> - xa_unlock(_grant_dma_devices);
> + xa_unlock_irqrestore(_grant_dma_devices, flags);
>  
>   return data;
>  }
>  
> +static int store_xen_grant_dma_data(struct device *dev,
> + struct xen_grant_dma_data *data)
> +{
> + unsigned long flags;
> + int ret;
> +
> + xa_lock_irqsave(_grant_dma_devices, flags);
> + ret = xa_err(__xa_store(_grant_dma_devices, (unsigned long)dev, 
> data,
> + GFP_ATOMIC));
> + xa_unlock_irqrestore(_grant_dma_devices, flags);
> +
> + return ret;
> +}
> +
>  /*
>   * DMA ops for Xen frontends (e.g. virtio).
>   *
> @@ -338,8 +353,7 @@ void xen_grant_setup_dma_ops(struct device *dev)
>*/
>   data->backend_domid = iommu_spec.args[0];
>  
> - if (xa_err(xa_store(_grant_dma_devices, (unsigned long)dev, data,
> - GFP_KERNEL))) {
> + if (store_xen_grant_dma_data(dev, data)) {
>   dev_err(dev, "Cannot store Xen grant DMA data\n");
>   goto err;
>   }
> -- 
> 2.25.1
> 



Re: [PATCH 1/2] xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()

2022-10-05 Thread Stefano Stabellini
On Wed, 5 Oct 2022, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> Take page offset into the account when calculating the number of pages
> to be granted.
> 
> Signed-off-by: Oleksandr Tyshchenko 
> Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access 
> under Xen")

Reviewed-by: Stefano Stabellini 


> ---
>  drivers/xen/grant-dma-ops.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
> index 8973fc1e9ccc..1998d0e8ce82 100644
> --- a/drivers/xen/grant-dma-ops.c
> +++ b/drivers/xen/grant-dma-ops.c
> @@ -153,7 +153,7 @@ static dma_addr_t xen_grant_dma_map_page(struct device 
> *dev, struct page *page,
>unsigned long attrs)
>  {
>   struct xen_grant_dma_data *data;
> - unsigned int i, n_pages = PFN_UP(size);
> + unsigned int i, n_pages = PFN_UP(offset + size);
>   grant_ref_t grant;
>   dma_addr_t dma_handle;
>  
> @@ -185,7 +185,8 @@ static void xen_grant_dma_unmap_page(struct device *dev, 
> dma_addr_t dma_handle,
>unsigned long attrs)
>  {
>   struct xen_grant_dma_data *data;
> - unsigned int i, n_pages = PFN_UP(size);
> + unsigned long offset = dma_handle & (PAGE_SIZE - 1);
> + unsigned int i, n_pages = PFN_UP(offset + size);
>   grant_ref_t grant;
>  
>   if (WARN_ON(dir == DMA_NONE))
> -- 
> 2.25.1
> 



[xen-unstable test] 173430: tolerable FAIL - PUSHED

2022-10-05 Thread osstest service owner
flight 173430 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173430/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173422
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173422
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173422
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173422
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173422
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173422
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173422
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173422
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 173422
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173422
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173422
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173422
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  66a5633aa038f4abb4455463755974febac69034
baseline version:
 xen  

Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered

2022-10-05 Thread Ard Biesheuvel
On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
 wrote:
>
> On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > This function does not work under Xen because Xen could have already
> > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > for this thread, as it prevents EFI tables that are in
> > > EfiBootServicesData from being used under Xen.
> > >
> > > A much nicer approach would be for Xen to reserve boot services memory
> > > unconditionally, but provide a hypercall that dom0 could used to free
> > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > would allow efi_mem_reserve() to work normally.
> >
> > efi_mem_reserve() actually working would be a layering violation;
> > controlling the EFI memory map is entirely Xen's job.
>
> Doing this properly would require Xen to understand all of the EFI
> tables that could validly be in EfiBootServices* and which could be of
> interest to dom0.  It might (at least on some very buggy firmware)
> require a partial ACPI and/or SMBIOS implementation too, if the firmware
> decided to put an ACPI or SMBIOS table in EfiBootServices*.
>
> > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > for the case when -mapbs is used. As I've indicated before, I'm of the
> > opinion that default behavior should be matching the intentions of the
> > spec, and the intention of EfiBootServices* is for the space to be
> > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > that hypercall: It might use it for regions where data lives which it
> > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > entity would later want to consume. Code/data potentially usable by
> > _anyone_ between two resets of the system cannot legitimately be freed
> > (and hence imo is wrong to live in EfiBootServices* regions).
>
> I agree, but currently some such data *is* in EfiBootServices* regions,
> sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> configuration tables that point to EfiBootServicesData memory before
> freeing that memory.
>

That seems like a reasonable approach to me. Tables like MEMATTR or
RT_PROP are mostly relevant for bare metal where the host kernel maps
the runtime services, and in general, passing on these tables without
knowing what they do is kind of fishy anyway. You might even argue
that only known table types should be forwarded in the first place,
regardless of the memory type.



Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered

2022-10-05 Thread Demi Marie Obenour
On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > Linux has a function called efi_mem_reserve() that is used to reserve
> > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > This function does not work under Xen because Xen could have already
> > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > for this thread, as it prevents EFI tables that are in
> > EfiBootServicesData from being used under Xen.
> > 
> > A much nicer approach would be for Xen to reserve boot services memory
> > unconditionally, but provide a hypercall that dom0 could used to free
> > the parts of EfiBootServicesData memory that are no longer needed.  This
> > would allow efi_mem_reserve() to work normally.
> 
> efi_mem_reserve() actually working would be a layering violation;
> controlling the EFI memory map is entirely Xen's job.

Doing this properly would require Xen to understand all of the EFI
tables that could validly be in EfiBootServices* and which could be of
interest to dom0.  It might (at least on some very buggy firmware)
require a partial ACPI and/or SMBIOS implementation too, if the firmware
decided to put an ACPI or SMBIOS table in EfiBootServices*.

> As to the hypercall you suggest - I wouldn't mind its addition, but only
> for the case when -mapbs is used. As I've indicated before, I'm of the
> opinion that default behavior should be matching the intentions of the
> spec, and the intention of EfiBootServices* is for the space to be
> reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> that hypercall: It might use it for regions where data lives which it
> wouldn't care about itself, but which an eventual kexec-ed (or alike)
> entity would later want to consume. Code/data potentially usable by
> _anyone_ between two resets of the system cannot legitimately be freed
> (and hence imo is wrong to live in EfiBootServices* regions).

I agree, but currently some such data *is* in EfiBootServices* regions,
sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
configuration tables that point to EfiBootServicesData memory before
freeing that memory.

> In a way one could view the Dom0 kernel as an "or alike" entity ...

It is indeed such an entity.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH][4.17] EFI: don't convert memory marked for runtime use to ordinary RAM

2022-10-05 Thread Julien Grall

Hi Jan,

On 05/10/2022 12:55, Jan Beulich wrote:

On 05.10.2022 12:44, Julien Grall wrote:

On 04/10/2022 16:58, Jan Beulich wrote:

On 30.09.2022 14:51, Bertrand Marquis wrote:

On 30 Sep 2022, at 09:50, Jan Beulich  wrote:

efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
higher priority than the type of the range. To avoid accessing memory at
runtime which was re-used for other purposes, make
efi_arch_process_memory_map() follow suit. While on x86 in theory the
same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
E820_ACPI memory there and hence that type's handling can be left alone.

Fixes: bf6501a62e80 ("x86-64: EFI boot code")
Fixes: facac0af87ef ("x86-64: EFI runtime code")
Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
Signed-off-by: Jan Beulich 


Reviewed-by: Bertrand Marquis  #arm


Thanks. However ...


---
Partly RFC for Arm, for two reasons:

On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
For one like on x86 such ranges would likely better be retained, as Dom0
may (will?) have a need to look at tables placed there. Plus converting
such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
me as well. I'd be inclined to make the latter adjustment right here
(while the other change probably would better be separate, if there
aren't actually reasons for the present behavior).


... any views on this WB aspect at least (also Stefano or Julien)? Would be
good to know before I send v2.


I don't quite understand what you are questioning here. Looking at the
code, EfiACPIReclaimMemory will not be converted to RAM but added in a
separate array.

Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0
(see acpi_create_efi_mmap_table()).

So to me the code looks correct.


Oh, I've indeed not paid enough attention to the first argument passed
to meminfo_add_bank(). I'm sorry for the extra noise. However, the
question I wanted to have addressed before sending out v3 was that
regarding the present using of memory when EFI_MEMORY_WB is not set.
Is that correct for the EfiACPIReclaimMemory case, i.e. is the
consumer (Dom0) aware that there might be a restriction?


Looking at the code, we always set EFI_MEMORY_WB for the reclaimable 
region and the stage-2 mapping will be cachable.


So it looks like there would be a mismatch if EFI_MEMORY_WB is not set. 
However, given the region is reclaimable, shouldn't this imply that the 
flag is always set?



And would
this memory then be guaranteed to never be freed into the general pool
of RAM pages?


The region is not treated as RAM by Xen and not owned by the dom0. 
Therefore, it should not be possible to free the page because 
get_page_from_gfn() would not be able to get a reference.


Cheers,

--
Julien Grall



[PATCH 2/2] xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices

2022-10-05 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

As find_xen_grant_dma_data() is called from both interrupt and process
contexts, the access to xen_grant_dma_devices XArray must be protected
by xa_lock_irqsave to avoid deadlock scenario.
As XArray API doesn't provide xa_store_irqsave helper, call lockless
__xa_store directly and guard it externally.

Also move the storage of the XArray's entry to a separate helper.

Signed-off-by: Oleksandr Tyshchenko 
Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access 
under Xen")
---
 drivers/xen/grant-dma-ops.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 1998d0e8ce82..c66f56d24013 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -25,7 +25,7 @@ struct xen_grant_dma_data {
bool broken;
 };
 
-static DEFINE_XARRAY(xen_grant_dma_devices);
+static DEFINE_XARRAY_FLAGS(xen_grant_dma_devices, XA_FLAGS_LOCK_IRQ);
 
 #define XEN_GRANT_DMA_ADDR_OFF (1ULL << 63)
 
@@ -42,14 +42,29 @@ static inline grant_ref_t dma_to_grant(dma_addr_t dma)
 static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev)
 {
struct xen_grant_dma_data *data;
+   unsigned long flags;
 
-   xa_lock(_grant_dma_devices);
+   xa_lock_irqsave(_grant_dma_devices, flags);
data = xa_load(_grant_dma_devices, (unsigned long)dev);
-   xa_unlock(_grant_dma_devices);
+   xa_unlock_irqrestore(_grant_dma_devices, flags);
 
return data;
 }
 
+static int store_xen_grant_dma_data(struct device *dev,
+   struct xen_grant_dma_data *data)
+{
+   unsigned long flags;
+   int ret;
+
+   xa_lock_irqsave(_grant_dma_devices, flags);
+   ret = xa_err(__xa_store(_grant_dma_devices, (unsigned long)dev, 
data,
+   GFP_ATOMIC));
+   xa_unlock_irqrestore(_grant_dma_devices, flags);
+
+   return ret;
+}
+
 /*
  * DMA ops for Xen frontends (e.g. virtio).
  *
@@ -338,8 +353,7 @@ void xen_grant_setup_dma_ops(struct device *dev)
 */
data->backend_domid = iommu_spec.args[0];
 
-   if (xa_err(xa_store(_grant_dma_devices, (unsigned long)dev, data,
-   GFP_KERNEL))) {
+   if (store_xen_grant_dma_data(dev, data)) {
dev_err(dev, "Cannot store Xen grant DMA data\n");
goto err;
}
-- 
2.25.1




[PATCH 0/2] Misc fixes for Xen grant DMA-mapping layer

2022-10-05 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Hello all.

These are several fixes I collected when playing with virtio-net device
using Xen grant mappings.

Tested with both virtio-blk and virtio-net devices.

Oleksandr Tyshchenko (2):
  xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
  xen/virtio: Fix potential deadlock when accessing
xen_grant_dma_devices

 drivers/xen/grant-dma-ops.c | 29 ++---
 1 file changed, 22 insertions(+), 7 deletions(-)

-- 
2.25.1




[PATCH 1/2] xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()

2022-10-05 Thread Oleksandr Tyshchenko
From: Oleksandr Tyshchenko 

Take page offset into the account when calculating the number of pages
to be granted.

Signed-off-by: Oleksandr Tyshchenko 
Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access 
under Xen")
---
 drivers/xen/grant-dma-ops.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c
index 8973fc1e9ccc..1998d0e8ce82 100644
--- a/drivers/xen/grant-dma-ops.c
+++ b/drivers/xen/grant-dma-ops.c
@@ -153,7 +153,7 @@ static dma_addr_t xen_grant_dma_map_page(struct device 
*dev, struct page *page,
 unsigned long attrs)
 {
struct xen_grant_dma_data *data;
-   unsigned int i, n_pages = PFN_UP(size);
+   unsigned int i, n_pages = PFN_UP(offset + size);
grant_ref_t grant;
dma_addr_t dma_handle;
 
@@ -185,7 +185,8 @@ static void xen_grant_dma_unmap_page(struct device *dev, 
dma_addr_t dma_handle,
 unsigned long attrs)
 {
struct xen_grant_dma_data *data;
-   unsigned int i, n_pages = PFN_UP(size);
+   unsigned long offset = dma_handle & (PAGE_SIZE - 1);
+   unsigned int i, n_pages = PFN_UP(offset + size);
grant_ref_t grant;
 
if (WARN_ON(dir == DMA_NONE))
-- 
2.25.1




[linux-5.4 test] 173427: regressions - FAIL

2022-10-05 Thread osstest service owner
flight 173427 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173427/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-examine  5 host-install   broken REGR. vs. 173353
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 173353
 test-armhf-armhf-xl-credit2  14 guest-start  fail REGR. vs. 173353
 test-armhf-armhf-xl-credit1  14 guest-start  fail REGR. vs. 173353
 test-armhf-armhf-xl-multivcpu 14 guest-start fail REGR. vs. 173353

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds  8 xen-boot fail REGR. vs. 173353

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 
173353
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail  like 173353
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173353
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173353
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173353
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173353
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173353
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173353
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173353
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173353
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173353
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173353
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173353
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 

[linux-linus test] 173426: tolerable FAIL - PUSHED

2022-10-05 Thread osstest service owner
flight 173426 linux-linus real [real]
flight 173432 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/173426/
http://logs.test-lab.xenproject.org/osstest/logs/173432/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-freebsd12-amd64 18 guest-saverestore.2 fail pass in 
173432-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173421
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173421
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173421
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173421
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 173421
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173421
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173421
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173421
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass

version targeted for testing:
 linux2bca25eaeba6190efbfcb38ed169bd7ee43b5aaf
baseline version:
 linux0326074ff4652329f2a1a9c8685104576bd8d131

Last test of basis   173421  2022-10-05 00:13:01 Z0 days
Testing same since   173426  2022-10-05 08:41:36 Z0 days1 attempts


People who touched revisions under test:
  Adam Skladowski 
  Adrien Grassein  # for lontium-lt8912b
  Akhil R 
  Aleksa Savic 
  Aleksandr Mezin 
  Alexander Stein 
  Alexandru Gagniuc 

Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Marek Marczykowski-Górecki
On Wed, Oct 05, 2022 at 05:45:56PM +0200, Juergen Gross wrote:
> On 05.10.22 17:35, Marek Marczykowski-Górecki wrote:
> > On Wed, Oct 05, 2022 at 05:04:29PM +0200, Juergen Gross wrote:
> > > On 05.10.22 15:51, Marek Marczykowski-Górecki wrote:
> > > > On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote:
> > > > > On 05.10.22 15:25, Marek Marczykowski-Górecki wrote:
> > > > > > On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:
> > > > > > > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > When booting Xen with Linux dom0 nested under KVM,
> > > > > > > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio 
> > > > > > > > devices
> > > > > > > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants 
> > > > > > > > are
> > > > > > > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled.
> > > > > > > > 
> > > > > > > > This is probably uncommon corner case, but one that has bitten 
> > > > > > > > me in my
> > > > > > > > CI setup... I think Xen should set smarter
> > > > > > > > virtio_require_restricted_mem_acc(), that enforces it only for 
> > > > > > > > devices
> > > > > > > > really provided by another Xen VM (not by the "outer host"), 
> > > > > > > > but I'm not
> > > > > > > > sure how that could be done. Any ideas?
> > > > > > > > 
> > > > > > > 
> > > > > > > It should be possible to add a boot parameter for that purpose. 
> > > > > > > Using it
> > > > > > > would open a security hole, though (basically like all PCI 
> > > > > > > passthrough to
> > > > > > > PV guests).
> > > > > > 
> > > > > > What about excluding just dom0? At least currently, there is no way 
> > > > > > for
> > > > > > dom0 to see virtio devices provided by another Xen domU.
> > > > > 
> > > > > Even not via hotplug?
> > > > 
> > > > That's why I said "currently", IIUC hotplug of virtio devices under Xen
> > > > doesn't work yet, no?
> > > > With hotplug working, it would need to be a proper detection where the
> > > > backend lives, and probably with some extra considerations re Xen on
> > > > Xen (based on below, pv-shim could use grants).
> > > 
> > > As stated before, this isn't a problem specific to virtio devices. The 
> > > same
> > > applies to Xen PV devices.
> > 
> > Why is that an issue for Xen PV devices? They always use grants, so no need
> > for exception. But more relevant here, there is no protocol for L0
> > hypervisor (that would need to be Xen) to provide PV device to nested L1
> > guest (besides pv-shim case, which is already handled), so L1 guest
> > cannot confuse PV device provided by L0 and L1.
> 
> That's the point. Today using virtio the way you are using it is possible
> only because virtio devices don't have the security features of Xen PV
> devices. With adding grant support for virtio devices this has changed.
> 
> BTW, you can have the old virtio behavior back by not enabling
> CONFIG_XEN_VIRTIO.

Yes, I know, and currently I'm doing it. But at some point I'd like to
be able to enable CONFIG_XEN_VIRTIO without breaking the nested virt
case. Ideally KVM virtio devices would use something like grants, and
that thing would work even through nested virt, but I don't think that's
strictly necessary for ensuring new security properties of virtio
devices where they can be enforced.

> > > > For me specifically, a command line option would work (because I don't
> > > > use Xen-based virtio devices when nested under KVM, or anywhere at all,
> > > > at least not yet), but I can see future cases where you have virtio
> > > > devices from both L0 and L1 in the same guest, and then it wouldn't be
> > > > that simple.
> > > 
> > > Lets think of a general solution covering all PV devices (Xen and virtio).
> > 
> > In fact, I wonder what's the security benefit of
> > CONFIG_XEN_VIRTIO_FORCE_GRANT. If the backend lives in dom0 (or
> > stubdomain), it can access whole guest memory anyway, whether frontend
> > likes it or not. But if the backend is elsewhere (or guest is protected
> > with AMD SEV-SNP, XSM or similar), then the backend won't be able to access
> > memory outside of what frontend shares explicitly. So, in the non-dom0 case,
> > backend trying to provide non-grant-based virtio device will simply not
> > function (because of inability to access guest's memory), instead of
> > gaining unintended access. Am I missing some implicit memory sharing
> > here?
> 
> You are missing the possibility to have a deprivileged user land virtio
> backend.

Ok, but that backend either can xenforeignmemory_map() arbitrary guest's
page (in which case it doesn't matter if the frontend driver likes
non-grant-based device or not), or it cannot (in which case,
non-grant-based device will simply not work, backend still won't have
access to the guest memory).

Sure, the error reporting might be different, but I don't think it
changes the outcome security-wise.

> And BTW, SEV won't disable guest memory access, it 

Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Juergen Gross

On 05.10.22 17:35, Marek Marczykowski-Górecki wrote:

On Wed, Oct 05, 2022 at 05:04:29PM +0200, Juergen Gross wrote:

On 05.10.22 15:51, Marek Marczykowski-Górecki wrote:

On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote:

On 05.10.22 15:25, Marek Marczykowski-Górecki wrote:

On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:

On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:

Hi,

When booting Xen with Linux dom0 nested under KVM,
CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
required for virtio even if just CONFIG_XEN_VIRTIO is enabled.

This is probably uncommon corner case, but one that has bitten me in my
CI setup... I think Xen should set smarter
virtio_require_restricted_mem_acc(), that enforces it only for devices
really provided by another Xen VM (not by the "outer host"), but I'm not
sure how that could be done. Any ideas?



It should be possible to add a boot parameter for that purpose. Using it
would open a security hole, though (basically like all PCI passthrough to
PV guests).


What about excluding just dom0? At least currently, there is no way for
dom0 to see virtio devices provided by another Xen domU.


Even not via hotplug?


That's why I said "currently", IIUC hotplug of virtio devices under Xen
doesn't work yet, no?
With hotplug working, it would need to be a proper detection where the
backend lives, and probably with some extra considerations re Xen on
Xen (based on below, pv-shim could use grants).


As stated before, this isn't a problem specific to virtio devices. The same
applies to Xen PV devices.


Why is that an issue for Xen PV devices? They always use grants, so no need
for exception. But more relevant here, there is no protocol for L0
hypervisor (that would need to be Xen) to provide PV device to nested L1
guest (besides pv-shim case, which is already handled), so L1 guest
cannot confuse PV device provided by L0 and L1.


That's the point. Today using virtio the way you are using it is possible
only because virtio devices don't have the security features of Xen PV
devices. With adding grant support for virtio devices this has changed.

BTW, you can have the old virtio behavior back by not enabling
CONFIG_XEN_VIRTIO.




For me specifically, a command line option would work (because I don't
use Xen-based virtio devices when nested under KVM, or anywhere at all,
at least not yet), but I can see future cases where you have virtio
devices from both L0 and L1 in the same guest, and then it wouldn't be
that simple.


Lets think of a general solution covering all PV devices (Xen and virtio).


In fact, I wonder what's the security benefit of
CONFIG_XEN_VIRTIO_FORCE_GRANT. If the backend lives in dom0 (or
stubdomain), it can access whole guest memory anyway, whether frontend
likes it or not. But if the backend is elsewhere (or guest is protected
with AMD SEV-SNP, XSM or similar), then the backend won't be able to access
memory outside of what frontend shares explicitly. So, in the non-dom0 case,
backend trying to provide non-grant-based virtio device will simply not
function (because of inability to access guest's memory), instead of
gaining unintended access. Am I missing some implicit memory sharing
here?


You are missing the possibility to have a deprivileged user land virtio
backend.

And BTW, SEV won't disable guest memory access, it will just make it
impossible to interprete memory contents from outside. A malicious
backend can still easily crash a SEV guest by clobbering its memory.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions

2022-10-05 Thread Jan Beulich
On 05.10.2022 16:12, Roger Pau Monné wrote:
> Hm, I have the following diff attached below which is more limited,
> and not so bad I think.  Initially I wanted to introduce an
> efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE
> which is quite intrusive.
> 
> Let me know if you think the proposal below is better and I will
> formally send a patch with it.

Hmm, personally I like this slightly better for, as you say, its more
limited effect. Objectively, however, I'm still unconvinced of making
this an EFI special case. How would non-EFI firmware go about
communicating that it is going to access a device at runtime (which,
as said before, I consider a no-go in the first place)? Likely by
putting its BAR range(s) in an E820_RESERVED region.

Plus the MMIO range covered on the system in question is pretty large.
That way we're still allowing pretty wide an abuse by the firmware.
Furthermore the MCFG range would also be covered by an
EfiMemoryMappedIO descriptor (in fact that's the only use of the type
I had been aware of so far). IOW the change specifically permits an
overlap of a BAR with an MCFG range.

Who's the manufacturer of the system? Or put in different words - how
likely is it that we could first gain understanding on their
intentions with this region? You did say the system hangs hard without
some kind of workaround, but that doesn't clarify to me in how far a
use of the device by the firmware was actually involved there.

Have you considered other routes towards dealing with the issue? One
approach coming to mind would build on top of what you've been doing
so far (either variant): Besides avoiding the turning off of memory
decode, also invoke pci_ro_device(), thus protecting it from having
its BARs relocated, and also preventing any driver in Dom0 to gain
control of it, thus avoiding two parties competing for the device.

Relocating the BAR outside of the reserved region would be nice, but
will likely not resolve the hang.

In any event I'm still hoping to have a 3rd view on the situation as a
whole, irrespective of specific ideas towards possible workarounds ...

Independent of the above a couple of purely cosmetic comments:

> @@ -98,3 +99,28 @@ int pci_conf_write_intercept(unsigned int seg, unsigned 
> int bdf,
>  
>  return rc;
>  }
> +
> +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end)
> +{
> +/*
> + * Check if BAR is not overlapping with any memory region defined
> + * in the memory map.
> + */
> +if ( is_memory_hole(start, end) )
> +return true;
> +
> +/*
> + * Also allow BARs placed on EfiMemoryMappedIO regions in order to deal
> + * with EFI firmware using those regions to place the BARs of devices 
> that
> + * can be used during runtime.  But print a warning when doing so.
> + */
> +if ( !efi_all_runtime_mmio(mfn_to_maddr(start),
> +   mfn_to_maddr(mfn_add(end, 1))) )
> +return false;
> +
> +printk(XENLOG_WARNING
> +   "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved 
> region\n",
> +   >sbdf, mfn_x(start), mfn_x(end));

Perhaps re-word the message now that the check is a different one?

> --- a/xen/common/efi/runtime.c
> +++ b/xen/common/efi/runtime.c
> @@ -78,6 +78,30 @@ bool efi_enabled(unsigned int feature)
>  return test_bit(feature, _flags);
>  }
>  
> +/*
> + * This function checks if the entire range [start,end) is contained inside 
> of
> + * a single EfiMemoryMappedIO descriptor with the runtime attribute set.
> + */
> +bool efi_all_runtime_mmio(uint64_t start, uint64_t end)
> +{
> +unsigned int i;
> +
> +for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size )
> +{
> +EFI_MEMORY_DESCRIPTOR *desc = efi_memmap + i;

const?

> +uint64_t len = desc->NumberOfPages << EFI_PAGE_SHIFT;
> +
> +if ( desc->Type != EfiMemoryMappedIO ||
> + !(desc->Attribute & EFI_MEMORY_RUNTIME) )
> +continue;
> +
> +if ( start >= desc->PhysicalStart && end <= desc->PhysicalStart + 
> len )
> +return true;
> +}
> +
> +return false;
> +}
> +
>  #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */

Perhaps put the function inside this #ifdef?

Jan



Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Marek Marczykowski-Górecki
On Wed, Oct 05, 2022 at 05:04:29PM +0200, Juergen Gross wrote:
> On 05.10.22 15:51, Marek Marczykowski-Górecki wrote:
> > On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote:
> > > On 05.10.22 15:25, Marek Marczykowski-Górecki wrote:
> > > > On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:
> > > > > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > When booting Xen with Linux dom0 nested under KVM,
> > > > > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio 
> > > > > > devices
> > > > > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
> > > > > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled.
> > > > > > 
> > > > > > This is probably uncommon corner case, but one that has bitten me 
> > > > > > in my
> > > > > > CI setup... I think Xen should set smarter
> > > > > > virtio_require_restricted_mem_acc(), that enforces it only for 
> > > > > > devices
> > > > > > really provided by another Xen VM (not by the "outer host"), but 
> > > > > > I'm not
> > > > > > sure how that could be done. Any ideas?
> > > > > > 
> > > > > 
> > > > > It should be possible to add a boot parameter for that purpose. Using 
> > > > > it
> > > > > would open a security hole, though (basically like all PCI 
> > > > > passthrough to
> > > > > PV guests).
> > > > 
> > > > What about excluding just dom0? At least currently, there is no way for
> > > > dom0 to see virtio devices provided by another Xen domU.
> > > 
> > > Even not via hotplug?
> > 
> > That's why I said "currently", IIUC hotplug of virtio devices under Xen
> > doesn't work yet, no?
> > With hotplug working, it would need to be a proper detection where the
> > backend lives, and probably with some extra considerations re Xen on
> > Xen (based on below, pv-shim could use grants).
> 
> As stated before, this isn't a problem specific to virtio devices. The same
> applies to Xen PV devices.

Why is that an issue for Xen PV devices? They always use grants, so no need
for exception. But more relevant here, there is no protocol for L0
hypervisor (that would need to be Xen) to provide PV device to nested L1
guest (besides pv-shim case, which is already handled), so L1 guest
cannot confuse PV device provided by L0 and L1.

> > For me specifically, a command line option would work (because I don't
> > use Xen-based virtio devices when nested under KVM, or anywhere at all,
> > at least not yet), but I can see future cases where you have virtio
> > devices from both L0 and L1 in the same guest, and then it wouldn't be
> > that simple.
> 
> Lets think of a general solution covering all PV devices (Xen and virtio).

In fact, I wonder what's the security benefit of
CONFIG_XEN_VIRTIO_FORCE_GRANT. If the backend lives in dom0 (or
stubdomain), it can access whole guest memory anyway, whether frontend
likes it or not. But if the backend is elsewhere (or guest is protected
with AMD SEV-SNP, XSM or similar), then the backend won't be able to access
memory outside of what frontend shares explicitly. So, in the non-dom0 case,
backend trying to provide non-grant-based virtio device will simply not
function (because of inability to access guest's memory), instead of
gaining unintended access. Am I missing some implicit memory sharing
here?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Juergen Gross

On 05.10.22 15:51, Marek Marczykowski-Górecki wrote:

On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote:

On 05.10.22 15:25, Marek Marczykowski-Górecki wrote:

On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:

On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:

Hi,

When booting Xen with Linux dom0 nested under KVM,
CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
required for virtio even if just CONFIG_XEN_VIRTIO is enabled.

This is probably uncommon corner case, but one that has bitten me in my
CI setup... I think Xen should set smarter
virtio_require_restricted_mem_acc(), that enforces it only for devices
really provided by another Xen VM (not by the "outer host"), but I'm not
sure how that could be done. Any ideas?



It should be possible to add a boot parameter for that purpose. Using it
would open a security hole, though (basically like all PCI passthrough to
PV guests).


What about excluding just dom0? At least currently, there is no way for
dom0 to see virtio devices provided by another Xen domU.


Even not via hotplug?


That's why I said "currently", IIUC hotplug of virtio devices under Xen
doesn't work yet, no?
With hotplug working, it would need to be a proper detection where the
backend lives, and probably with some extra considerations re Xen on
Xen (based on below, pv-shim could use grants).


As stated before, this isn't a problem specific to virtio devices. The same
applies to Xen PV devices.



For me specifically, a command line option would work (because I don't
use Xen-based virtio devices when nested under KVM, or anywhere at all,
at least not yet), but I can see future cases where you have virtio
devices from both L0 and L1 in the same guest, and then it wouldn't be
that simple.


Lets think of a general solution covering all PV devices (Xen and virtio).




Something like this:
---8<---
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9b1a58dda935..6ac32b0b3720 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
   static void __init xen_pv_init_platform(void)
   {
  /* PV guests can't operate virtio devices without grants. */
-   if (IS_ENABLED(CONFIG_XEN_VIRTIO))
+   if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain())
  virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc);
  populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));
---8<---

This BTW raises also a question what will happen on Xen nested inside
Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0
wouldn't work on L0, no? Or maybe this is solved already for pv-shim
case?


This is a similar problem as with normal Xen PV devices.

You will need either a simple grant passthrough like with pv-shim (enabling
such devices for one guest in L1 only), or you need a grant multiplexer in
L1 Xen in case you want to be able to have multiple guests in L1 being able
to
use L0 PV devices.


This will be tricky, at least with the current frontend drivers.
Frontend kernel is in charge of assigning grant refs, _and_
communicating them to the backend. Such multiplexer would need to
intercept one or the other (either translate, or ensure they are
allocated from distinct ranges to begin with). I don't see how that
could be done with the current domU kernels. That might be better with
your idea of multiple grant v3 trees, where the hypervisor might dictate
grant ranges.


Yes, this is another advantage of the V3 approach I haven't thought of
before.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions

2022-10-05 Thread Roger Pau Monné
On Wed, Oct 05, 2022 at 10:53:38AM +0200, Jan Beulich wrote:
> On 05.10.2022 10:37, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote:
> >> On 04.10.2022 17:36, Roger Pau Monne wrote:
> >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
> >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by
> >>> EFI firmware.
> >>>
> >>> The current parsing of the EFI memory map is translating
> >>> EfiMemoryMappedIO to E820_RESERVED on x86.  This causes issues on some
> >>> boxes as the firmware is relying on using those regions to position
> >>> the BARs of devices being used (possibly during runtime) for the
> >>> firmware.
> >>>
> >>> Xen will disable memory decoding on any device that has BARs
> >>> positioned over any regions on the e820 memory map, hence the firmware
> >>> will malfunction after Xen turning off memory decoding for the
> >>> device(s) that have BARs mapped in EfiMemoryMappedIO regions.
> >>>
> >>> The system under which this was observed has:
> >>>
> >>> EFI memory map:
> >>> [...]
> >>>  0fd00-0fe7f type=11 attr=8000100d
> >>> [...]
> >>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> >>>
> >>> The device behind this BAR is:
> >>>
> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> >>> Controller (rev 09)
> >>>   Subsystem: Super Micro Computer Inc Device 091c
> >>>   Flags: fast devsel
> >>>   Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> >>>
> >>> For the record, the symptom observed in that machine was a hard freeze
> >>> when attempting to set an EFI variable (XEN_EFI_set_variable).
> >>>
> >>> Fix by allowing BARs of PCI devices to be positioned over reserved
> >>> memory regions, but print a warning message about such overlap.
> >>
> >> Somewhat hesitantly I might ack this change, but I'd like to give
> >> others (Andrew in particular) some time to voice their views. As
> >> said during the earlier discussion - I think we're relaxing things
> >> too much by going this route.
> > 
> > Another option would be to explicitly check in efi_memmap for
> > EfiMemoryMappedIO regions and BAR overlap and only allow those.  That
> > would be more cumbersome code wise AFAICT.
> 
> Indeed there's a question of balancing well here, between two outcomes
> neither of which is really desirable.

Hm, I have the following diff attached below which is more limited,
and not so bad I think.  Initially I wanted to introduce an
efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE
which is quite intrusive.

Let me know if you think the proposal below is better and I will
formally send a patch with it.

Thanks, Roger.
---
diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h
index f4a58c8acf..c8e1a9ecdb 100644
--- a/xen/arch/x86/include/asm/pci.h
+++ b/xen/arch/x86/include/asm/pci.h
@@ -57,14 +57,4 @@ static always_inline bool is_pci_passthrough_enabled(void)
 
 void arch_pci_init_pdev(struct pci_dev *pdev);
 
-static inline bool pci_check_bar(const struct pci_dev *pdev,
- mfn_t start, mfn_t end)
-{
-/*
- * Check if BAR is not overlapping with any memory region defined
- * in the memory map.
- */
-return is_memory_hole(start, end);
-}
-
 #endif /* __X86_PCI_H__ */
diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c
index 97b792e578..c3737e226d 100644
--- a/xen/arch/x86/pci.c
+++ b/xen/arch/x86/pci.c
@@ -4,6 +4,7 @@
  * Architecture-dependent PCI access functions.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -98,3 +99,28 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int 
bdf,
 
 return rc;
 }
+
+bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end)
+{
+/*
+ * Check if BAR is not overlapping with any memory region defined
+ * in the memory map.
+ */
+if ( is_memory_hole(start, end) )
+return true;
+
+/*
+ * Also allow BARs placed on EfiMemoryMappedIO regions in order to deal
+ * with EFI firmware using those regions to place the BARs of devices that
+ * can be used during runtime.  But print a warning when doing so.
+ */
+if ( !efi_all_runtime_mmio(mfn_to_maddr(start),
+   mfn_to_maddr(mfn_add(end, 1))) )
+return false;
+
+printk(XENLOG_WARNING
+   "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved 
region\n",
+   >sbdf, mfn_x(start), mfn_x(end));
+
+return true;
+}
diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c
index 13b0975866..b69c710ce3 100644
--- a/xen/common/efi/runtime.c
+++ b/xen/common/efi/runtime.c
@@ -78,6 +78,30 @@ bool efi_enabled(unsigned int feature)
 return test_bit(feature, _flags);
 }
 
+/*
+ * This function checks if the entire range [start,end) is contained inside of
+ * a single EfiMemoryMappedIO descriptor with the runtime attribute set.
+ */
+bool 

Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Roger Pau Monné
On Wed, Oct 05, 2022 at 01:34:06PM +, Andrew Cooper wrote:
> On 05/10/2022 10:55, Roger Pau Monné wrote:
> > On Tue, Oct 04, 2022 at 05:08:09PM +0100, Andrew Cooper wrote:
> >> A future change will want a cpuid-like identifier which doesn't have a 
> >> mapping
> >> to a feature bit.
> > Why we prefer this logic over just giving such pseudo features a
> > synthetic bit or akin?
> 
> Synthetic bits are (intentionally) not available to cmdline parsing.  We
> need something that is available for parsing.

I think Jan expressed my view better, in that it would be nicer to
just have the MSR_ARCH_CAPS bits in the featureset, and listed in
cpufeatureset.h like we handle CPUID features.

Maybe we want to go your proposed route now if that's easier however.

Thanks, Roger.



Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Marek Marczykowski-Górecki
On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote:
> On 05.10.22 15:25, Marek Marczykowski-Górecki wrote:
> > On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:
> > > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:
> > > > Hi,
> > > > 
> > > > When booting Xen with Linux dom0 nested under KVM,
> > > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
> > > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
> > > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled.
> > > > 
> > > > This is probably uncommon corner case, but one that has bitten me in my
> > > > CI setup... I think Xen should set smarter
> > > > virtio_require_restricted_mem_acc(), that enforces it only for devices
> > > > really provided by another Xen VM (not by the "outer host"), but I'm not
> > > > sure how that could be done. Any ideas?
> > > > 
> > > 
> > > It should be possible to add a boot parameter for that purpose. Using it
> > > would open a security hole, though (basically like all PCI passthrough to
> > > PV guests).
> > 
> > What about excluding just dom0? At least currently, there is no way for
> > dom0 to see virtio devices provided by another Xen domU.
> 
> Even not via hotplug?

That's why I said "currently", IIUC hotplug of virtio devices under Xen
doesn't work yet, no?
With hotplug working, it would need to be a proper detection where the
backend lives, and probably with some extra considerations re Xen on
Xen (based on below, pv-shim could use grants).

For me specifically, a command line option would work (because I don't
use Xen-based virtio devices when nested under KVM, or anywhere at all,
at least not yet), but I can see future cases where you have virtio
devices from both L0 and L1 in the same guest, and then it wouldn't be
that simple.

> > Something like this:
> > ---8<---
> > diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
> > index 9b1a58dda935..6ac32b0b3720 100644
> > --- a/arch/x86/xen/enlighten_pv.c
> > +++ b/arch/x86/xen/enlighten_pv.c
> > @@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, 
> > shadow_tls_desc);
> >   static void __init xen_pv_init_platform(void)
> >   {
> >  /* PV guests can't operate virtio devices without grants. */
> > -   if (IS_ENABLED(CONFIG_XEN_VIRTIO))
> > +   if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain())
> >  virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc);
> >  populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));
> > ---8<---
> > 
> > This BTW raises also a question what will happen on Xen nested inside
> > Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0
> > wouldn't work on L0, no? Or maybe this is solved already for pv-shim
> > case?
> 
> This is a similar problem as with normal Xen PV devices.
> 
> You will need either a simple grant passthrough like with pv-shim (enabling
> such devices for one guest in L1 only), or you need a grant multiplexer in
> L1 Xen in case you want to be able to have multiple guests in L1 being able
> to
> use L0 PV devices.

This will be tricky, at least with the current frontend drivers.
Frontend kernel is in charge of assigning grant refs, _and_
communicating them to the backend. Such multiplexer would need to
intercept one or the other (either translate, or ensure they are
allocated from distinct ranges to begin with). I don't see how that
could be done with the current domU kernels. That might be better with
your idea of multiple grant v3 trees, where the hypervisor might dictate
grant ranges.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Andrew Cooper
On 05/10/2022 14:11, Jan Beulich wrote:
> On 05.10.2022 14:58, Andrew Cooper wrote:
>> On 05/10/2022 07:59, Jan Beulich wrote:
>>> On 04.10.2022 18:08, Andrew Cooper wrote:
 A future change will want a cpuid-like identifier which doesn't have a 
 mapping
 to a feature bit.

  * Pass the feature name into the parse callback.
  * Exclude a feature value of ~0u from falling into the general set/clear 
 bit
paths.
  * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature
names.
>>> Hmm, I was envisioning this to be fitted into the existing model in a
>>> less adhoc way: (parts of) MSRs holding feature bits aren't very different
>>> from individual (pairs of) registers of CPUID output (in the case of
>>> ARCH_CAPS there would be a perhaps just abstract mask limiting things to
>>> the subset of bits which actually act as feature indicators). Hence I'd
>>> have expected them to gain proper entries in the public interface
>>> (cpufeatureset.h) and then be represented / processed the same way in
>>> featuresets and policies. All that would be left out at this point would
>>> be the exposing of the bit to guests (in patch 2, assuming the split into
>>> two patches is then actually still warranted), integration into
>>> guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for
>>> example, could easily learn of such right away).
>>>
>>> However, since I'm pretty sure you've considered such an approach, I guess
>>> I might be overlooking some caveat?
>> I have on multiple occasions considered putting MSR_ARCH_CAPS into the
>> existing X86_FEATURE_* infrastructure.  In the future, it's likely the
>> right course of action to take.
>>
>> However, doing so now will break speculation safety for guests in some
> Considering further text - s/speculation/migration/, I guess?

More "speculation safety on migrate".

Except its not just on migrate.  It can go wrong for suspend/resume on
the same host across a reboot which changes the microcode version.

>> cases.  The featureset API intended to be safe to treat as a regular
>> bitmap, and this is how it is used in practice.
>>
>> Honestly, I didn't imagine that we'd get bits like RSBA and RRSBA that
>> need to be treated with opposite polarity to be levelled safely.
>>
>> Even if we do end up putting MSR_ARCH_CAPS in X86_FEATURE_*, we still
>> need to remove the featureset api (replaced by the cpu policy
>> infrastructure and libx86) to retain migration safety.
> Well, I didn't mean to suggest to put all of the feature-like bits there
> right away. Just the one bit we're after would do for now. Afaict that
> wouldn't affect migration safety, while it would allow doing things in
> Xen in a more "natural" way.

That requires reworking how the MSR policies get set up, patches for
which have been stagnent on xen-devel, as well as a reasonable amount of
plumbing because the featureset<->policy conversions which only have
CPUID data right now.

If I go down this route, realise it is going to have to go out in an XSA
so will be backported, and that you're also committing to taking the VMX
MSRs in due course.  (which is all on my plan, but I don't want to start
the work now and have an argument down the line.)

~Andrew


Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Juergen Gross

On 05.10.22 15:25, Marek Marczykowski-Górecki wrote:

On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:

On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:

Hi,

When booting Xen with Linux dom0 nested under KVM,
CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
required for virtio even if just CONFIG_XEN_VIRTIO is enabled.

This is probably uncommon corner case, but one that has bitten me in my
CI setup... I think Xen should set smarter
virtio_require_restricted_mem_acc(), that enforces it only for devices
really provided by another Xen VM (not by the "outer host"), but I'm not
sure how that could be done. Any ideas?



It should be possible to add a boot parameter for that purpose. Using it
would open a security hole, though (basically like all PCI passthrough to
PV guests).


What about excluding just dom0? At least currently, there is no way for
dom0 to see virtio devices provided by another Xen domU.


Even not via hotplug?



Something like this:
---8<---
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9b1a58dda935..6ac32b0b3720 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
  static void __init xen_pv_init_platform(void)
  {
 /* PV guests can't operate virtio devices without grants. */
-   if (IS_ENABLED(CONFIG_XEN_VIRTIO))
+   if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain())
 virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc);
  
 populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));

---8<---

This BTW raises also a question what will happen on Xen nested inside
Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0
wouldn't work on L0, no? Or maybe this is solved already for pv-shim
case?


This is a similar problem as with normal Xen PV devices.

You will need either a simple grant passthrough like with pv-shim (enabling
such devices for one guest in L1 only), or you need a grant multiplexer in L1 
Xen in case you want to be able to have multiple guests in L1 being able to

use L0 PV devices.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Andrew Cooper
On 05/10/2022 10:55, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 05:08:09PM +0100, Andrew Cooper wrote:
>> A future change will want a cpuid-like identifier which doesn't have a 
>> mapping
>> to a feature bit.
> Why we prefer this logic over just giving such pseudo features a
> synthetic bit or akin?

Synthetic bits are (intentionally) not available to cmdline parsing.  We
need something that is available for parsing.

> Could we have a bit more of a description about what would be
> considered a pseudo feature?

I don't really have anything further to add beyond the comment in
gen-cpuid.py

It's a misc collection of things requiring special handling.

~Andrew


Re: [PATCH 2/2] x86: Activate Data Operand Invariant Timing Mode by default

2022-10-05 Thread Andrew Cooper
On 05/10/2022 13:09, Jan Beulich wrote:
> On 05.10.2022 12:20, Roger Pau Monné wrote:
>> On Tue, Oct 04, 2022 at 05:08:10PM +0100, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/cpu/common.c
>>> +++ b/xen/arch/x86/cpu/common.c
>>> @@ -209,6 +209,34 @@ void ctxt_switch_levelling(const struct vcpu *next)
>>> alternative_vcall(ctxt_switch_masking, next);
>>>  }
>>>  
>>> +bool __ro_after_init opt_doitm = true;
>>> +
>>> +static void doitm_init(void)
>>> +{
>>> +uint64_t val;
>>> +
>>> +if ( !opt_doitm || !cpu_has_arch_caps )
>>> +return;
>>> +
>>> +rdmsrl(MSR_ARCH_CAPABILITIES, val);
>>> +if ( !(val & ARCH_CAPS_DOITM) )
>>> +return;
>>> +
>>> +/*
>>> + * We are currently unable to enumerate MSR_ARCH_CAPS to guest.  As a
>>> + * consequence, guest kernels will believe they're safe even when they 
>>> are
>>> + * not.
>>> + *
>>> + * Until we can enumerate DOITM properly for guests, set it 
>>> unilaterally.
>>> + * This prevents otherwise-correct crypto from becoming vulnerable to
>>> + * timing sidechannels.
>>> + */
>>> +
>>> +rdmsrl(MSR_UARCH_MISC_CTRL, val);
>>> +val |= UARCH_CTRL_DOITM;
>>> +wrmsrl(MSR_UARCH_MISC_CTRL, val);
>> Is it possible for the firmware to have enabled DOITM and Xen needing to
>> clear it if !opt_doitm?
> I think a firmware setup option is quite plausible to expect, such that
> safety can also be achieved underneath an unaware OS. Note how in my
> earlier patch I did specifically set the bit both ways, for this very
> reason.

Firmware is not likely to set it, but we should cope with the case when
we're somewhere along a kexec chain.

I'll adjust.

~Andrew


Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Marek Marczykowski-Górecki
On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote:
> On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > When booting Xen with Linux dom0 nested under KVM,
> > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
> > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
> > required for virtio even if just CONFIG_XEN_VIRTIO is enabled.
> > 
> > This is probably uncommon corner case, but one that has bitten me in my
> > CI setup... I think Xen should set smarter
> > virtio_require_restricted_mem_acc(), that enforces it only for devices
> > really provided by another Xen VM (not by the "outer host"), but I'm not
> > sure how that could be done. Any ideas?
> > 
> 
> It should be possible to add a boot parameter for that purpose. Using it
> would open a security hole, though (basically like all PCI passthrough to
> PV guests).

What about excluding just dom0? At least currently, there is no way for
dom0 to see virtio devices provided by another Xen domU.

Something like this:
---8<---
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9b1a58dda935..6ac32b0b3720 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
 static void __init xen_pv_init_platform(void)
 {
/* PV guests can't operate virtio devices without grants. */
-   if (IS_ENABLED(CONFIG_XEN_VIRTIO))
+   if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain())
virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc);
 
populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP));
---8<---

This BTW raises also a question what will happen on Xen nested inside
Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0
wouldn't work on L0, no? Or maybe this is solved already for pv-shim
case?


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


[xen-unstable-smoke test] 173428: tolerable all pass - PUSHED

2022-10-05 Thread osstest service owner
flight 173428 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173428/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  66a5633aa038f4abb4455463755974febac69034
baseline version:
 xen  3161231abcb461314b205337fbd0530c7ead1696

Last test of basis   173410  2022-10-04 01:01:52 Z1 days
Testing same since   173428  2022-10-05 09:00:30 Z0 days1 attempts


People who touched revisions under test:
  Jan Beulich 
  Roger Pau Monné 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3161231abc..66a5633aa0  66a5633aa038f4abb4455463755974febac69034 -> smoke



Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Jan Beulich
On 05.10.2022 14:58, Andrew Cooper wrote:
> On 05/10/2022 07:59, Jan Beulich wrote:
>> On 04.10.2022 18:08, Andrew Cooper wrote:
>>> A future change will want a cpuid-like identifier which doesn't have a 
>>> mapping
>>> to a feature bit.
>>>
>>>  * Pass the feature name into the parse callback.
>>>  * Exclude a feature value of ~0u from falling into the general set/clear 
>>> bit
>>>paths.
>>>  * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature
>>>names.
>> Hmm, I was envisioning this to be fitted into the existing model in a
>> less adhoc way: (parts of) MSRs holding feature bits aren't very different
>> from individual (pairs of) registers of CPUID output (in the case of
>> ARCH_CAPS there would be a perhaps just abstract mask limiting things to
>> the subset of bits which actually act as feature indicators). Hence I'd
>> have expected them to gain proper entries in the public interface
>> (cpufeatureset.h) and then be represented / processed the same way in
>> featuresets and policies. All that would be left out at this point would
>> be the exposing of the bit to guests (in patch 2, assuming the split into
>> two patches is then actually still warranted), integration into
>> guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for
>> example, could easily learn of such right away).
>>
>> However, since I'm pretty sure you've considered such an approach, I guess
>> I might be overlooking some caveat?
> 
> I have on multiple occasions considered putting MSR_ARCH_CAPS into the
> existing X86_FEATURE_* infrastructure.  In the future, it's likely the
> right course of action to take.
> 
> However, doing so now will break speculation safety for guests in some

Considering further text - s/speculation/migration/, I guess?

> cases.  The featureset API intended to be safe to treat as a regular
> bitmap, and this is how it is used in practice.
> 
> Honestly, I didn't imagine that we'd get bits like RSBA and RRSBA that
> need to be treated with opposite polarity to be levelled safely.
> 
> Even if we do end up putting MSR_ARCH_CAPS in X86_FEATURE_*, we still
> need to remove the featureset api (replaced by the cpu policy
> infrastructure and libx86) to retain migration safety.

Well, I didn't mean to suggest to put all of the feature-like bits there
right away. Just the one bit we're after would do for now. Afaict that
wouldn't affect migration safety, while it would allow doing things in
Xen in a more "natural" way.

Jan



Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Juergen Gross

On 05.10.22 14:41, Marek Marczykowski-Górecki wrote:

Hi,

When booting Xen with Linux dom0 nested under KVM,
CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
required for virtio even if just CONFIG_XEN_VIRTIO is enabled.

This is probably uncommon corner case, but one that has bitten me in my
CI setup... I think Xen should set smarter
virtio_require_restricted_mem_acc(), that enforces it only for devices
really provided by another Xen VM (not by the "outer host"), but I'm not
sure how that could be done. Any ideas?



It should be possible to add a boot parameter for that purpose. Using it
would open a security hole, though (basically like all PCI passthrough to
PV guests).


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Andrew Cooper
On 05/10/2022 07:59, Jan Beulich wrote:
> On 04.10.2022 18:08, Andrew Cooper wrote:
>> A future change will want a cpuid-like identifier which doesn't have a 
>> mapping
>> to a feature bit.
>>
>>  * Pass the feature name into the parse callback.
>>  * Exclude a feature value of ~0u from falling into the general set/clear bit
>>paths.
>>  * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature
>>names.
> Hmm, I was envisioning this to be fitted into the existing model in a
> less adhoc way: (parts of) MSRs holding feature bits aren't very different
> from individual (pairs of) registers of CPUID output (in the case of
> ARCH_CAPS there would be a perhaps just abstract mask limiting things to
> the subset of bits which actually act as feature indicators). Hence I'd
> have expected them to gain proper entries in the public interface
> (cpufeatureset.h) and then be represented / processed the same way in
> featuresets and policies. All that would be left out at this point would
> be the exposing of the bit to guests (in patch 2, assuming the split into
> two patches is then actually still warranted), integration into
> guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for
> example, could easily learn of such right away).
>
> However, since I'm pretty sure you've considered such an approach, I guess
> I might be overlooking some caveat?

I have on multiple occasions considered putting MSR_ARCH_CAPS into the
existing X86_FEATURE_* infrastructure.  In the future, it's likely the
right course of action to take.

However, doing so now will break speculation safety for guests in some
cases.  The featureset API intended to be safe to treat as a regular
bitmap, and this is how it is used in practice.

Honestly, I didn't imagine that we'd get bits like RSBA and RRSBA that
need to be treated with opposite polarity to be levelled safely.

Even if we do end up putting MSR_ARCH_CAPS in X86_FEATURE_*, we still
need to remove the featureset api (replaced by the cpu policy
infrastructure and libx86) to retain migration safety.

>
>> --- a/xen/tools/gen-cpuid.py
>> +++ b/xen/tools/gen-cpuid.py
>> @@ -297,6 +297,19 @@ def crunch_numbers(state):
>>  RTM: [TSXLDTRK],
>>  }
>>  
>> +#
>> +# Pseudo feature names.  These don't map to a feature bit, but are
>> +# inserted into the values dictionary so they can be parsed and handled
>> +# specially
>> +#
>> +pseduo_names = (
>> +)
>> +
>> +for n in pseduo_names:
>> +if n in state.values:
>> +raise Fail("Pseduo feature name %s aliases real feature" % (n, 
>> ))
>> +state.values[n] = 0x
> Throughout this hunk: s/pseduo/pseudo/g.

Oops, yes.  Will fix.

~Andrew


CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt

2022-10-05 Thread Marek Marczykowski-Górecki
Hi,

When booting Xen with Linux dom0 nested under KVM,
CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices
provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are
required for virtio even if just CONFIG_XEN_VIRTIO is enabled.

This is probably uncommon corner case, but one that has bitten me in my
CI setup... I think Xen should set smarter
virtio_require_restricted_mem_acc(), that enforces it only for devices
really provided by another Xen VM (not by the "outer host"), but I'm not
sure how that could be done. Any ideas? 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: [PATCH 2/2] x86: Activate Data Operand Invariant Timing Mode by default

2022-10-05 Thread Jan Beulich
On 05.10.2022 12:20, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 05:08:10PM +0100, Andrew Cooper wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -209,6 +209,34 @@ void ctxt_switch_levelling(const struct vcpu *next)
>>  alternative_vcall(ctxt_switch_masking, next);
>>  }
>>  
>> +bool __ro_after_init opt_doitm = true;
>> +
>> +static void doitm_init(void)
>> +{
>> +uint64_t val;
>> +
>> +if ( !opt_doitm || !cpu_has_arch_caps )
>> +return;
>> +
>> +rdmsrl(MSR_ARCH_CAPABILITIES, val);
>> +if ( !(val & ARCH_CAPS_DOITM) )
>> +return;
>> +
>> +/*
>> + * We are currently unable to enumerate MSR_ARCH_CAPS to guest.  As a
>> + * consequence, guest kernels will believe they're safe even when they 
>> are
>> + * not.
>> + *
>> + * Until we can enumerate DOITM properly for guests, set it 
>> unilaterally.
>> + * This prevents otherwise-correct crypto from becoming vulnerable to
>> + * timing sidechannels.
>> + */
>> +
>> +rdmsrl(MSR_UARCH_MISC_CTRL, val);
>> +val |= UARCH_CTRL_DOITM;
>> +wrmsrl(MSR_UARCH_MISC_CTRL, val);
> 
> Is it possible for the firmware to have enabled DOITM and Xen needing to
> clear it if !opt_doitm?

I think a firmware setup option is quite plausible to expect, such that
safety can also be achieved underneath an unaware OS. Note how in my
earlier patch I did specifically set the bit both ways, for this very
reason.

Jan



Re: [PATCH v3 2/4] xen/pv: fix vendor checks for pmu emulation

2022-10-05 Thread Jan Beulich
On 05.10.2022 13:03, Juergen Gross wrote:
> The CPU vendor checks for pmu emulation are rather limited today, as
> the assumption seems to be that only Intel and AMD are existing and/or
> supported vendors.
> 
> Fix that by handling Centaur and Zhaoxin CPUs the same way as Intel,
> and Hygon the same way as AMD.
> 
> While at it fix the return type of is_intel_pmu_msr().
> 
> Suggested-by: Jan Beulich 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 





Re: [PATCH v3 1/4] xen/pv: add fault recovery control to pmu msr accesses

2022-10-05 Thread Jan Beulich
On 05.10.2022 13:02, Juergen Gross wrote:
> Today pmu_msr_read() and pmu_msr_write() fall back to the safe variants
> of read/write MSR in case the MSR access isn't emulated via Xen. Allow
> the caller to select that faults should not be recovered from by passing
> NULL for the error pointer.
> 
> Restructure the code to make it more readable.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 





Re: [PATCH][4.17] EFI: don't convert memory marked for runtime use to ordinary RAM

2022-10-05 Thread Jan Beulich
On 05.10.2022 12:44, Julien Grall wrote:
> On 04/10/2022 16:58, Jan Beulich wrote:
>> On 30.09.2022 14:51, Bertrand Marquis wrote:
 On 30 Sep 2022, at 09:50, Jan Beulich  wrote:

 efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
 higher priority than the type of the range. To avoid accessing memory at
 runtime which was re-used for other purposes, make
 efi_arch_process_memory_map() follow suit. While on x86 in theory the
 same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
 E820_ACPI memory there and hence that type's handling can be left alone.

 Fixes: bf6501a62e80 ("x86-64: EFI boot code")
 Fixes: facac0af87ef ("x86-64: EFI runtime code")
 Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
 Signed-off-by: Jan Beulich 
>>>
>>> Reviewed-by: Bertrand Marquis  #arm
>>
>> Thanks. However ...
>>
 ---
 Partly RFC for Arm, for two reasons:

 On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
 For one like on x86 such ranges would likely better be retained, as Dom0
 may (will?) have a need to look at tables placed there. Plus converting
 such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
 me as well. I'd be inclined to make the latter adjustment right here
 (while the other change probably would better be separate, if there
 aren't actually reasons for the present behavior).
>>
>> ... any views on this WB aspect at least (also Stefano or Julien)? Would be
>> good to know before I send v2.
> 
> I don't quite understand what you are questioning here. Looking at the 
> code, EfiACPIReclaimMemory will not be converted to RAM but added in a 
> separate array.
> 
> Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 
> (see acpi_create_efi_mmap_table()).
> 
> So to me the code looks correct.

Oh, I've indeed not paid enough attention to the first argument passed
to meminfo_add_bank(). I'm sorry for the extra noise. However, the
question I wanted to have addressed before sending out v3 was that
regarding the present using of memory when EFI_MEMORY_WB is not set.
Is that correct for the EfiACPIReclaimMemory case, i.e. is the
consumer (Dom0) aware that there might be a restriction? And would
this memory then be guaranteed to never be freed into the general pool
of RAM pages?

Jan



[PATCH v3 4/4] xen/pv: support selecting safe/unsafe msr accesses

2022-10-05 Thread Juergen Gross
Instead of always doing the safe variants for reading and writing MSRs
in Xen PV guests, make the behavior controllable via Kconfig option
and a boot parameter.

The default will be the current behavior, which is to always use the
safe variant.

Signed-off-by: Juergen Gross 
---
 .../admin-guide/kernel-parameters.txt |  6 +
 arch/x86/xen/Kconfig  |  9 +++
 arch/x86/xen/enlighten_pv.c   | 24 +++
 3 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 426fa892d311..1bda9cf18fae 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6836,6 +6836,12 @@
Crash from Xen panic notifier, without executing late
panic() code such as dumping handler.
 
+   xen_msr_safe=   [X86,XEN]
+   Format: 
+   Select whether to always use non-faulting (safe) MSR
+   access functions when running as Xen PV guest. The
+   default value is controlled by CONFIG_XEN_PV_MSR_SAFE.
+
xen_nopvspin[X86,XEN]
Disables the qspinlock slowpath using Xen PV 
optimizations.
This parameter is obsoleted by "nopvspin" parameter, 
which
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 85246dd9faa1..9b1ec5d8c99c 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -92,3 +92,12 @@ config XEN_DOM0
select X86_X2APIC if XEN_PVH && X86_64
help
  Support running as a Xen Dom0 guest.
+
+config XEN_PV_MSR_SAFE
+   bool "Always use safe MSR accesses in PV guests"
+   default y
+   depends on XEN_PV
+   help
+ Use safe (not faulting) MSR access functions even if the MSR access
+ should not fault anyway.
+ The default can be changed by using the "xen_msr_safe" boot parameter.
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index d5b0844a1b7c..daae454191f2 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -108,6 +108,16 @@ struct tls_descs {
  */
 static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc);
 
+static __read_mostly bool xen_msr_safe = IS_ENABLED(CONFIG_XEN_PV_MSR_SAFE);
+
+static int __init parse_xen_msr_safe(char *str)
+{
+   if (str)
+   return strtobool(str, _msr_safe);
+   return -EINVAL;
+}
+early_param("xen_msr_safe", parse_xen_msr_safe);
+
 static void __init xen_pv_init_platform(void)
 {
/* PV guests can't operate virtio devices without grants. */
@@ -1011,22 +1021,16 @@ static int xen_write_msr_safe(unsigned int msr, 
unsigned int low,
 
 static u64 xen_read_msr(unsigned int msr)
 {
-   /*
-* This will silently swallow a #GP from RDMSR.  It may be worth
-* changing that.
-*/
int err;
 
-   return xen_read_msr_safe(msr, );
+   return xen_do_read_msr(msr, xen_msr_safe ?  : NULL);
 }
 
 static void xen_write_msr(unsigned int msr, unsigned low, unsigned high)
 {
-   /*
-* This will silently swallow a #GP from WRMSR.  It may be worth
-* changing that.
-*/
-   xen_write_msr_safe(msr, low, high);
+   int err;
+
+   xen_do_write_msr(msr, low, high, xen_msr_safe ?  : NULL);
 }
 
 /* This is called once we have the cpu_possible_mask */
-- 
2.35.3




[PATCH v3 3/4] xen/pv: refactor msr access functions to support safe and unsafe accesses

2022-10-05 Thread Juergen Gross
Refactor and rename xen_read_msr_safe() and xen_write_msr_safe() to
support both cases of MSR accesses, safe ones and potentially GP-fault
generating ones.

This will prepare to no longer swallow GPs silently in xen_read_msr()
and xen_write_msr().

Signed-off-by: Juergen Gross 
Reviewed-by: Jan Beulich 
---
V2:
- init val in xen_do_read_msr() to 0 (Jan Beulich)
---
 arch/x86/xen/enlighten_pv.c | 75 +++--
 1 file changed, 56 insertions(+), 19 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 9b1a58dda935..d5b0844a1b7c 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -917,14 +917,18 @@ static void xen_write_cr4(unsigned long cr4)
native_write_cr4(cr4);
 }
 
-static u64 xen_read_msr_safe(unsigned int msr, int *err)
+static u64 xen_do_read_msr(unsigned int msr, int *err)
 {
-   u64 val;
+   u64 val = 0;/* Avoid uninitialized value for safe variant. */
 
if (pmu_msr_read(msr, , err))
return val;
 
-   val = native_read_msr_safe(msr, err);
+   if (err)
+   val = native_read_msr_safe(msr, err);
+   else
+   val = native_read_msr(msr);
+
switch (msr) {
case MSR_IA32_APICBASE:
val &= ~X2APIC_ENABLE;
@@ -933,23 +937,39 @@ static u64 xen_read_msr_safe(unsigned int msr, int *err)
return val;
 }
 
-static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
+static void set_seg(unsigned int which, unsigned int low, unsigned int high,
+   int *err)
 {
-   int ret;
-   unsigned int which;
-   u64 base;
+   u64 base = ((u64)high << 32) | low;
+
+   if (HYPERVISOR_set_segment_base(which, base) == 0)
+   return;
 
-   ret = 0;
+   if (err)
+   *err = -EIO;
+   else
+   WARN(1, "Xen set_segment_base(%u, %llx) failed\n", which, base);
+}
 
+/*
+ * Support write_msr_safe() and write_msr() semantics.
+ * With err == NULL write_msr() semantics are selected.
+ * Supplying an err pointer requires err to be pre-initialized with 0.
+ */
+static void xen_do_write_msr(unsigned int msr, unsigned int low,
+unsigned int high, int *err)
+{
switch (msr) {
-   case MSR_FS_BASE:   which = SEGBASE_FS; goto set;
-   case MSR_KERNEL_GS_BASE:which = SEGBASE_GS_USER; goto set;
-   case MSR_GS_BASE:   which = SEGBASE_GS_KERNEL; goto set;
-
-   set:
-   base = ((u64)high << 32) | low;
-   if (HYPERVISOR_set_segment_base(which, base) != 0)
-   ret = -EIO;
+   case MSR_FS_BASE:
+   set_seg(SEGBASE_FS, low, high, err);
+   break;
+
+   case MSR_KERNEL_GS_BASE:
+   set_seg(SEGBASE_GS_USER, low, high, err);
+   break;
+
+   case MSR_GS_BASE:
+   set_seg(SEGBASE_GS_KERNEL, low, high, err);
break;
 
case MSR_STAR:
@@ -965,11 +985,28 @@ static int xen_write_msr_safe(unsigned int msr, unsigned 
low, unsigned high)
break;
 
default:
-   if (!pmu_msr_write(msr, low, high, ))
-   ret = native_write_msr_safe(msr, low, high);
+   if (!pmu_msr_write(msr, low, high, err)) {
+   if (err)
+   *err = native_write_msr_safe(msr, low, high);
+   else
+   native_write_msr(msr, low, high);
+   }
}
+}
+
+static u64 xen_read_msr_safe(unsigned int msr, int *err)
+{
+   return xen_do_read_msr(msr, err);
+}
+
+static int xen_write_msr_safe(unsigned int msr, unsigned int low,
+ unsigned int high)
+{
+   int err = 0;
+
+   xen_do_write_msr(msr, low, high, );
 
-   return ret;
+   return err;
 }
 
 static u64 xen_read_msr(unsigned int msr)
-- 
2.35.3




[PATCH v3 2/4] xen/pv: fix vendor checks for pmu emulation

2022-10-05 Thread Juergen Gross
The CPU vendor checks for pmu emulation are rather limited today, as
the assumption seems to be that only Intel and AMD are existing and/or
supported vendors.

Fix that by handling Centaur and Zhaoxin CPUs the same way as Intel,
and Hygon the same way as AMD.

While at it fix the return type of is_intel_pmu_msr().

Suggested-by: Jan Beulich 
Signed-off-by: Juergen Gross 
---
V3:
- new patch
---
 arch/x86/xen/pmu.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 0f98cb1077e3..68aff1382872 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -131,7 +131,8 @@ static inline uint32_t get_fam15h_addr(u32 addr)
 
 static inline bool is_amd_pmu_msr(unsigned int msr)
 {
-   if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+   if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD &&
+   boot_cpu_data.x86_vendor != X86_VENDOR_HYGON)
return false;
 
if ((msr >= MSR_F15H_PERF_CTL &&
@@ -143,11 +144,13 @@ static inline bool is_amd_pmu_msr(unsigned int msr)
return false;
 }
 
-static int is_intel_pmu_msr(u32 msr_index, int *type, int *index)
+static bool is_intel_pmu_msr(u32 msr_index, int *type, int *index)
 {
u32 msr_index_pmc;
 
-   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
+   boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR &&
+   boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
return false;
 
switch (msr_index) {
-- 
2.35.3




[PATCH v3 0/4] xen/pv: sanitize xen pv guest msr accesses

2022-10-05 Thread Juergen Gross
Historically when running as Xen PV guest all MSR accesses have been
silently swallowing any GP faults, even when the kernel was using not
the *msr_safe() access functions.

Change that by making the behavior controllable via kernel config and
via a boot parameter.

This will help finding paths where MSRs are being accessed under Xen
which are not emulated by the hypervisor.

Changes in V3:
- new patch 2
- addressed comments

Juergen Gross (4):
  xen/pv: add fault recovery control to pmu msr accesses
  xen/pv: fix vendor checks for pmu emulation
  xen/pv: refactor msr access functions to support safe and unsafe
accesses
  xen/pv: support selecting safe/unsafe msr accesses

 .../admin-guide/kernel-parameters.txt |  6 ++
 arch/x86/xen/Kconfig  |  9 ++
 arch/x86/xen/enlighten_pv.c   | 99 +--
 arch/x86/xen/pmu.c| 71 +++--
 4 files changed, 127 insertions(+), 58 deletions(-)

-- 
2.35.3




[PATCH v3 1/4] xen/pv: add fault recovery control to pmu msr accesses

2022-10-05 Thread Juergen Gross
Today pmu_msr_read() and pmu_msr_write() fall back to the safe variants
of read/write MSR in case the MSR access isn't emulated via Xen. Allow
the caller to select that faults should not be recovered from by passing
NULL for the error pointer.

Restructure the code to make it more readable.

Signed-off-by: Juergen Gross 
---
V2:
- do some restructuring (Jan Beulich, Boris Ostrovsky)
V3:
- commit message rephrasing (Jan Beulich)
- more restructuring (Boris Ostrovsky)
---
 arch/x86/xen/pmu.c | 66 ++
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 21ecbe754cb2..0f98cb1077e3 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -131,6 +131,9 @@ static inline uint32_t get_fam15h_addr(u32 addr)
 
 static inline bool is_amd_pmu_msr(unsigned int msr)
 {
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+   return false;
+
if ((msr >= MSR_F15H_PERF_CTL &&
 msr < MSR_F15H_PERF_CTR + (amd_num_counters * 2)) ||
(msr >= MSR_K7_EVNTSEL0 &&
@@ -144,6 +147,9 @@ static int is_intel_pmu_msr(u32 msr_index, int *type, int 
*index)
 {
u32 msr_index_pmc;
 
+   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+   return false;
+
switch (msr_index) {
case MSR_CORE_PERF_FIXED_CTR_CTRL:
case MSR_IA32_DS_AREA:
@@ -290,48 +296,52 @@ static bool xen_amd_pmu_emulate(unsigned int msr, u64 
*val, bool is_read)
return false;
 }
 
+static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read,
+bool *emul)
+{
+   int type, index;
+
+   if (is_amd_pmu_msr(msr))
+   *emul = xen_amd_pmu_emulate(msr, val, is_read);
+   else if (is_intel_pmu_msr(msr, , ))
+   *emul = xen_intel_pmu_emulate(msr, val, type, index, is_read);
+   else
+   return false;
+
+   return true;
+}
+
 bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
 {
-   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
-   if (is_amd_pmu_msr(msr)) {
-   if (!xen_amd_pmu_emulate(msr, val, 1))
-   *val = native_read_msr_safe(msr, err);
-   return true;
-   }
-   } else {
-   int type, index;
+   bool emulated;
 
-   if (is_intel_pmu_msr(msr, , )) {
-   if (!xen_intel_pmu_emulate(msr, val, type, index, 1))
-   *val = native_read_msr_safe(msr, err);
-   return true;
-   }
+   if (!pmu_msr_chk_emulated(msr, val, true, ))
+   return false;
+
+   if (!emulated) {
+   *val = err ? native_read_msr_safe(msr, err)
+  : native_read_msr(msr);
}
 
-   return false;
+   return true;
 }
 
 bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
 {
uint64_t val = ((uint64_t)high << 32) | low;
+   bool emulated;
 
-   if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
-   if (is_amd_pmu_msr(msr)) {
-   if (!xen_amd_pmu_emulate(msr, , 0))
-   *err = native_write_msr_safe(msr, low, high);
-   return true;
-   }
-   } else {
-   int type, index;
+   if (!pmu_msr_chk_emulated(msr, , false, ))
+   return false;
 
-   if (is_intel_pmu_msr(msr, , )) {
-   if (!xen_intel_pmu_emulate(msr, , type, index, 0))
-   *err = native_write_msr_safe(msr, low, high);
-   return true;
-   }
+   if (!emulated) {
+   if (err)
+   *err = native_write_msr_safe(msr, low, high);
+   else
+   native_write_msr(msr, low, high);
}
 
-   return false;
+   return true;
 }
 
 static unsigned long long xen_amd_read_pmc(int counter)
-- 
2.35.3




[libvirt test] 173423: tolerable all pass - PUSHED

2022-10-05 Thread osstest service owner
flight 173423 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173423/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 173413
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173413
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173413
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  92f7aafced8d354cead03e50e1e7d57a99d29435
baseline version:
 libvirt  d6245e36c24969efcaa835ce6e9a9c953f47cc87

Last test of basis   173413  2022-10-04 04:25:20 Z1 days
Testing same since   173423  2022-10-05 04:20:23 Z0 days1 attempts


People who touched revisions under test:
  Christian Ehrhardt 
  Michal Privoznik 
  Peter Krempa 
  Stefan Berger 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs


Re: [PATCH][4.17] EFI: don't convert memory marked for runtime use to ordinary RAM

2022-10-05 Thread Julien Grall

Hi Jan,

On 04/10/2022 16:58, Jan Beulich wrote:

On 30.09.2022 14:51, Bertrand Marquis wrote:

On 30 Sep 2022, at 09:50, Jan Beulich  wrote:

efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME
higher priority than the type of the range. To avoid accessing memory at
runtime which was re-used for other purposes, make
efi_arch_process_memory_map() follow suit. While on x86 in theory the
same would apply to EfiACPIReclaimMemory, we don't actually "reclaim"
E820_ACPI memory there and hence that type's handling can be left alone.

Fixes: bf6501a62e80 ("x86-64: EFI boot code")
Fixes: facac0af87ef ("x86-64: EFI runtime code")
Fixes: 6d70ea10d49f ("Add ARM EFI boot support")
Signed-off-by: Jan Beulich 


Reviewed-by: Bertrand Marquis  #arm


Thanks. However ...


---
Partly RFC for Arm, for two reasons:

On Arm I question the conversion of EfiACPIReclaimMemory, in two ways:
For one like on x86 such ranges would likely better be retained, as Dom0
may (will?) have a need to look at tables placed there. Plus converting
such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to
me as well. I'd be inclined to make the latter adjustment right here
(while the other change probably would better be separate, if there
aren't actually reasons for the present behavior).


... any views on this WB aspect at least (also Stefano or Julien)? Would be
good to know before I send v2.


I don't quite understand what you are questioning here. Looking at the 
code, EfiACPIReclaimMemory will not be converted to RAM but added in a 
separate array.


Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 
(see acpi_create_efi_mmap_table()).


So to me the code looks correct.

Cheers,

--
Julien Grall



Re: [PATCH 2/2] x86: Activate Data Operand Invariant Timing Mode by default

2022-10-05 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 05:08:10PM +0100, Andrew Cooper wrote:
> Intel IceLake and later CPUs have microarchitectural behaviours which cause
> data-dependent timing behaviour.  This is not an issue for 99% of software,
> but it is a problem for cryptography routines.  On these CPUs, a new
> architectural feature, DOITM, was retrofitted in microcode.
> 
> For now, Xen can't enumerate DOITM to guest kernels; getting this working is
> still in progress.  The consequence is that guest kernels will incorrectly
> conclude that they are safe.
> 
> To maintain the safety of current software, activate DOITM unilaterally.  This
> will be relaxed in the future when we can enumerate the feature properly to
> guests.
> 
> As an emergency stopgap, this behaviour can be disabled by specifying
> `cpuid=no-doitm` on Xen's command line, but is not guaranteed ABI moving
> forward.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Wei Liu 
> CC: Henry Wang 
> CC: Marek Marczykowski-Górecki 
> CC: Demi Marie Obenour 
> ---
>  xen/arch/x86/cpu/common.c| 29 +
>  xen/arch/x86/cpuid.c |  5 +
>  xen/arch/x86/include/asm/processor.h |  2 ++
>  xen/tools/gen-cpuid.py   |  2 ++
>  4 files changed, 38 insertions(+)
> 
> diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
> index 0412dbc915e5..8c46a4db430a 100644
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -209,6 +209,34 @@ void ctxt_switch_levelling(const struct vcpu *next)
>   alternative_vcall(ctxt_switch_masking, next);
>  }
>  
> +bool __ro_after_init opt_doitm = true;
> +
> +static void doitm_init(void)
> +{
> +uint64_t val;
> +
> +if ( !opt_doitm || !cpu_has_arch_caps )
> +return;
> +
> +rdmsrl(MSR_ARCH_CAPABILITIES, val);
> +if ( !(val & ARCH_CAPS_DOITM) )
> +return;
> +
> +/*
> + * We are currently unable to enumerate MSR_ARCH_CAPS to guest.  As a
> + * consequence, guest kernels will believe they're safe even when they 
> are
> + * not.
> + *
> + * Until we can enumerate DOITM properly for guests, set it unilaterally.
> + * This prevents otherwise-correct crypto from becoming vulnerable to
> + * timing sidechannels.
> + */
> +
> +rdmsrl(MSR_UARCH_MISC_CTRL, val);
> +val |= UARCH_CTRL_DOITM;
> +wrmsrl(MSR_UARCH_MISC_CTRL, val);

Is it possible for the firmware to have enabled DOITM and Xen needing to
clear it if !opt_doitm?

Thanks, Roger.



Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 05:08:09PM +0100, Andrew Cooper wrote:
> A future change will want a cpuid-like identifier which doesn't have a mapping
> to a feature bit.

Why we prefer this logic over just giving such pseudo features a
synthetic bit or akin?

Could we have a bit more of a description about what would be
considered a pseudo feature?

Thanks, Roger.



[xen-unstable test] 173422: tolerable FAIL

2022-10-05 Thread osstest service owner
flight 173422 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173422/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-libvirt-xsm   7 xen-installfail pass in 173416

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail in 173416 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173416
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173416
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173416
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173416
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173416
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173416
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173416
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173416
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 173416
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173416
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173416
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173416
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass

version 

[PATCH v4 0/7] Make balloon drivers memory changes known to the rest of the kernel

2022-10-05 Thread Alexander Atanasov
Currently balloon drivers (Virtio,XEN, HyperV, VMWare, ...)
inflate and deflate the guest memory size but there is no
way to know how much the memory size is changed by them.

A common use of the ballooning is to emulate [1]
hot plug and hot unplug - due to the complexity of the later.
Hotplug has a notifier and one can also check the updated
memory size.

To improve this add InflatedTotal and InflatedFree
to /proc/meminfo and implement a balloon notifier.

Amount of inflated memory can be used:
 - si_meminfo(..) users can improve calculations
 - adjust cache/buffer sizes
 - adjust object/connection limits
 - as a hint for the oom a killer
 - by user space software that monitors memory pressure

Alexander Atanasov (7):
  Make place for common balloon code
  Enable balloon drivers to report inflated memory
  Display inflated memory to users
  drivers: virtio: balloon - update inflated memory
  Display inflated memory in logs
  drivers: vmware: balloon - report inflated memory
  drivers: hyperv: balloon - report inflated memory

 Documentation/filesystems/proc.rst|  6 +++
 MAINTAINERS   |  4 +-
 arch/powerpc/platforms/pseries/cmm.c  |  2 +-
 drivers/hv/hv_balloon.c   | 11 +
 drivers/misc/vmw_balloon.c|  6 ++-
 drivers/virtio/virtio_balloon.c   |  7 +++-
 fs/proc/meminfo.c | 10 +
 .../linux/{balloon_compaction.h => balloon.h} | 30 ++
 lib/show_mem.c|  8 
 mm/Makefile   |  2 +-
 mm/{balloon_compaction.c => balloon.c}| 40 +--
 mm/migrate.c  |  1 -
 mm/vmscan.c   |  1 -
 13 files changed, 110 insertions(+), 18 deletions(-)
 rename include/linux/{balloon_compaction.h => balloon.h} (86%)
 rename mm/{balloon_compaction.c => balloon.c} (88%)

v4:
 - add support in hyperV and vmware balloon drivers
 - display balloon memory in show_mem so it is logged on OOM and on sysrq
v3:
 - added missed EXPORT_SYMBOLS
Reported-by: kernel test robot 
 - instead of balloon_common.h just use balloon.h (yes, naming is hard)
 - cleaned up balloon.h - remove from files that do not use it and
   remove externs from function declarations
v2:
 - reworked from simple /proc/meminfo addition

Cc: Michael S. Tsirkin 
Cc: David Hildenbrand 
Cc: Wei Liu 
Cc: Nadav Amit 
Cc: pv-driv...@vmware.com
Cc: Jason Wang 
Cc: virtualizat...@lists.linux-foundation.org
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Dexuan Cui 
Cc: linux-hyp...@vger.kernel.org
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Oleksandr Tyshchenko 
Cc: xen-devel@lists.xenproject.org

1. https://lore.kernel.org/lkml/f0f12c84-a62e-5f8b-46ab-a651fe8f8...@redhat.com/

base-commit: 2bca25eaeba6190efbfcb38ed169bd7ee43b5aaf
-- 
2.31.1




Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions

2022-10-05 Thread Jan Beulich
On 05.10.2022 10:37, Roger Pau Monné wrote:
> On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote:
>> On 04.10.2022 17:36, Roger Pau Monne wrote:
>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and
>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by
>>> EFI firmware.
>>>
>>> The current parsing of the EFI memory map is translating
>>> EfiMemoryMappedIO to E820_RESERVED on x86.  This causes issues on some
>>> boxes as the firmware is relying on using those regions to position
>>> the BARs of devices being used (possibly during runtime) for the
>>> firmware.
>>>
>>> Xen will disable memory decoding on any device that has BARs
>>> positioned over any regions on the e820 memory map, hence the firmware
>>> will malfunction after Xen turning off memory decoding for the
>>> device(s) that have BARs mapped in EfiMemoryMappedIO regions.
>>>
>>> The system under which this was observed has:
>>>
>>> EFI memory map:
>>> [...]
>>>  0fd00-0fe7f type=11 attr=8000100d
>>> [...]
>>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
>>>
>>> The device behind this BAR is:
>>>
>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
>>> Controller (rev 09)
>>> Subsystem: Super Micro Computer Inc Device 091c
>>> Flags: fast devsel
>>> Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
>>>
>>> For the record, the symptom observed in that machine was a hard freeze
>>> when attempting to set an EFI variable (XEN_EFI_set_variable).
>>>
>>> Fix by allowing BARs of PCI devices to be positioned over reserved
>>> memory regions, but print a warning message about such overlap.
>>
>> Somewhat hesitantly I might ack this change, but I'd like to give
>> others (Andrew in particular) some time to voice their views. As
>> said during the earlier discussion - I think we're relaxing things
>> too much by going this route.
> 
> Another option would be to explicitly check in efi_memmap for
> EfiMemoryMappedIO regions and BAR overlap and only allow those.  That
> would be more cumbersome code wise AFAICT.

Indeed there's a question of balancing well here, between two outcomes
neither of which is really desirable.

Jan



Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions

2022-10-05 Thread Roger Pau Monné
On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote:
> On 04.10.2022 17:36, Roger Pau Monne wrote:
> > The EFI memory map contains two memory types (EfiMemoryMappedIO and
> > EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by
> > EFI firmware.
> > 
> > The current parsing of the EFI memory map is translating
> > EfiMemoryMappedIO to E820_RESERVED on x86.  This causes issues on some
> > boxes as the firmware is relying on using those regions to position
> > the BARs of devices being used (possibly during runtime) for the
> > firmware.
> > 
> > Xen will disable memory decoding on any device that has BARs
> > positioned over any regions on the e820 memory map, hence the firmware
> > will malfunction after Xen turning off memory decoding for the
> > device(s) that have BARs mapped in EfiMemoryMappedIO regions.
> > 
> > The system under which this was observed has:
> > 
> > EFI memory map:
> > [...]
> >  0fd00-0fe7f type=11 attr=8000100d
> > [...]
> > :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map
> > 
> > The device behind this BAR is:
> > 
> > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI 
> > Controller (rev 09)
> > Subsystem: Super Micro Computer Inc Device 091c
> > Flags: fast devsel
> > Memory at fe01 (32-bit, non-prefetchable) [size=4K]well
> > 
> > For the record, the symptom observed in that machine was a hard freeze
> > when attempting to set an EFI variable (XEN_EFI_set_variable).
> > 
> > Fix by allowing BARs of PCI devices to be positioned over reserved
> > memory regions, but print a warning message about such overlap.
> 
> Somewhat hesitantly I might ack this change, but I'd like to give
> others (Andrew in particular) some time to voice their views. As
> said during the earlier discussion - I think we're relaxing things
> too much by going this route.

Another option would be to explicitly check in efi_memmap for
EfiMemoryMappedIO regions and BAR overlap and only allow those.  That
would be more cumbersome code wise AFAICT.

> > --- a/xen/arch/x86/pci.c
> > +++ b/xen/arch/x86/pci.c
> > @@ -98,3 +98,30 @@ int pci_conf_write_intercept(unsigned int seg, unsigned 
> > int bdf,
> >  
> >  return rc;
> >  }
> > +
> > +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end)
> > +{
> > +unsigned long mfn;
> > +
> > +/*
> > + * Check if BAR is not overlapping with any memory region defined
> > + * in the memory map.
> > + */
> > +if ( is_memory_hole(start, end) )
> > +return true;
> > +
> > +/*
> > + * Also allow BARs placed on reserved regions in order to deal with EFI
> > + * firmware using EfiMemoryMappedIO regions to place the BARs of 
> > devices
> > + * that can be used during runtime.  But print a warning when doing so.
> > + */
> > +for ( mfn = mfn_x(start); mfn <= mfn_x(end); mfn++ )
> > +if ( !page_is_ram_type(mfn, RAM_TYPE_RESERVED) )
> > +return false;
> 
> We don't need to be arch-independent here and hence instead of this
> (potentially long) loop can't we use a single call to e820_all_mapped()?

Sure, was searching for a range checking functions but wrongly looked
into mm.c instead of e820.  I would have to make the function
non-init, but I think that's fine.

Thanks, Roger.



[linux-linus test] 173421: tolerable FAIL - PUSHED

2022-10-05 Thread osstest service owner
flight 173421 linux-linus real [real]
flight 173425 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/173421/
http://logs.test-lab.xenproject.org/osstest/logs/173425/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl   8 xen-bootfail pass in 173425-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl 15 migrate-support-check fail in 173425 never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail in 173425 never pass
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173414
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173414
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173414
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173414
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 173414
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 173414
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 173414
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173414
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass

version targeted for testing:
 linux0326074ff4652329f2a1a9c8685104576bd8d131
baseline version:
 linux725737e7c21d2d25a4312c2aaa82a52bd03e3126

Last test of basis   173414  2022-10-04 08:14:24 Z1 days
Failing since173419  2022-10-04 17:40:01 Z0 days2 attempts
Testing same since   173421  2022-10-05 00:13:01 Z0 days1 attempts


535 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm

[ovmf test] 173424: all pass - PUSHED

2022-10-05 Thread osstest service owner
flight 173424 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/173424/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 1bd2ff18664b9564a5802d0ac153b5023f2fa41e
baseline version:
 ovmf f931506815548cee5a3929856bfc98266c3c

Last test of basis   173420  2022-10-04 19:41:57 Z0 days
Testing same since   173424  2022-10-05 04:40:32 Z0 days1 attempts


People who touched revisions under test:
  Chasel Chiu 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   f931506815..1bd2ff1866  1bd2ff18664b9564a5802d0ac153b5023f2fa41e -> 
xen-tested-master



Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers

2022-10-05 Thread Jan Beulich
On 04.10.2022 18:08, Andrew Cooper wrote:
> A future change will want a cpuid-like identifier which doesn't have a mapping
> to a feature bit.
> 
>  * Pass the feature name into the parse callback.
>  * Exclude a feature value of ~0u from falling into the general set/clear bit
>paths.
>  * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature
>names.

Hmm, I was envisioning this to be fitted into the existing model in a
less adhoc way: (parts of) MSRs holding feature bits aren't very different
from individual (pairs of) registers of CPUID output (in the case of
ARCH_CAPS there would be a perhaps just abstract mask limiting things to
the subset of bits which actually act as feature indicators). Hence I'd
have expected them to gain proper entries in the public interface
(cpufeatureset.h) and then be represented / processed the same way in
featuresets and policies. All that would be left out at this point would
be the exposing of the bit to guests (in patch 2, assuming the split into
two patches is then actually still warranted), integration into
guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for
example, could easily learn of such right away).

However, since I'm pretty sure you've considered such an approach, I guess
I might be overlooking some caveat?

> --- a/xen/tools/gen-cpuid.py
> +++ b/xen/tools/gen-cpuid.py
> @@ -297,6 +297,19 @@ def crunch_numbers(state):
>  RTM: [TSXLDTRK],
>  }
>  
> +#
> +# Pseudo feature names.  These don't map to a feature bit, but are
> +# inserted into the values dictionary so they can be parsed and handled
> +# specially
> +#
> +pseduo_names = (
> +)
> +
> +for n in pseduo_names:
> +if n in state.values:
> +raise Fail("Pseduo feature name %s aliases real feature" % (n, ))
> +state.values[n] = 0x

Throughout this hunk: s/pseduo/pseudo/g.

Jan



Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered

2022-10-05 Thread Jan Beulich
On 04.10.2022 17:46, Demi Marie Obenour wrote:
> Linux has a function called efi_mem_reserve() that is used to reserve
> EfiBootServicesData memory that contains e.g. EFI configuration tables.
> This function does not work under Xen because Xen could have already
> clobbered the memory.  efi_mem_reserve() not working is the whole reason
> for this thread, as it prevents EFI tables that are in
> EfiBootServicesData from being used under Xen.
> 
> A much nicer approach would be for Xen to reserve boot services memory
> unconditionally, but provide a hypercall that dom0 could used to free
> the parts of EfiBootServicesData memory that are no longer needed.  This
> would allow efi_mem_reserve() to work normally.

efi_mem_reserve() actually working would be a layering violation;
controlling the EFI memory map is entirely Xen's job.

As to the hypercall you suggest - I wouldn't mind its addition, but only
for the case when -mapbs is used. As I've indicated before, I'm of the
opinion that default behavior should be matching the intentions of the
spec, and the intention of EfiBootServices* is for the space to be
reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
that hypercall: It might use it for regions where data lives which it
wouldn't care about itself, but which an eventual kexec-ed (or alike)
entity would later want to consume. Code/data potentially usable by
_anyone_ between two resets of the system cannot legitimately be freed
(and hence imo is wrong to live in EfiBootServices* regions). In a way
one could view the Dom0 kernel as an "or alike" entity ...

Jan