Re: [PATCH] xen-pcifront: Handle missed Connected state
On 29.08.22 17:15, Jason Andryuk wrote: An HVM guest with linux stubdom and 2 PCI devices failed to start as libxl timed out waiting for the PCI devices to be added. It happens intermittently but with some regularity. libxl wrote the two xenstore entries for the devices, but then timed out waiting for backend state 4 (Connected) - the state stayed at 7 (Reconfiguring). (PCI passthrough to an HVM with stubdomain is PV passthrough to the stubdomain and then HVM passthrough with the QEMU inside the stubdomain.) The stubdom kernel never printed "pcifront pci-0: Installing PCI frontend", so it seems to have missed state 4 which would have called pcifront_try_connect -> pcifront_connect_and_init_dma Have pcifront_detach_devices special-case state Initialised and call pcifront_connect_and_init_dma. Don't use pcifront_try_connect because that sets the xenbus state which may throw off the backend. After connecting, skip the remainder of detach_devices since none have been initialized yet. When the backend switches to Reconfigured, pcifront_attach_devices will pick them up again. Signed-off-by: Jason Andryuk Pushed to xen/tip.git for-linus-6.1 Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 2/2] xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices
On 05.10.22 19:48, Oleksandr Tyshchenko wrote: From: Oleksandr Tyshchenko As find_xen_grant_dma_data() is called from both interrupt and process contexts, the access to xen_grant_dma_devices XArray must be protected by xa_lock_irqsave to avoid deadlock scenario. As XArray API doesn't provide xa_store_irqsave helper, call lockless __xa_store directly and guard it externally. Also move the storage of the XArray's entry to a separate helper. Signed-off-by: Oleksandr Tyshchenko Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access under Xen") Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[linux-5.4 test] 173433: regressions - FAIL
flight 173433 linux-5.4 real [real] http://logs.test-lab.xenproject.org/osstest/logs/173433/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-qemut-rhel6hvm-amd 7 xen-installfail REGR. vs. 173353 test-armhf-armhf-xl-credit2 14 guest-start fail REGR. vs. 173353 test-armhf-armhf-xl-multivcpu 18 guest-start/debian.repeat fail REGR. vs. 173353 Tests which are failing intermittently (not blocking): test-amd64-amd64-examine 5 host-install broken in 173427 pass in 173433 test-amd64-amd64-xl-rtds 8 xen-boot fail in 173427 pass in 173433 test-armhf-armhf-xl-credit1 14 guest-start fail in 173427 pass in 173433 test-armhf-armhf-xl-multivcpu 14 guest-start fail in 173427 pass in 173433 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 14 guest-start/debianhvm.repeat fail pass in 173427 test-armhf-armhf-xl-rtds 14 guest-startfail pass in 173427 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 173353 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail in 173427 like 173353 test-armhf-armhf-xl-rtds15 migrate-support-check fail in 173427 never pass test-armhf-armhf-xl-rtds 16 saverestore-support-check fail in 173427 never pass test-armhf-armhf-xl-credit1 18 guest-start/debian.repeatfail like 173353 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173353 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173353 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173353 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173353 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173353 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173353 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173353 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173353 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173353 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173353 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173353 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail
Re: [PATCH 1/2] xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
On 05.10.22 19:48, Oleksandr Tyshchenko wrote: From: Oleksandr Tyshchenko Take page offset into the account when calculating the number of pages to be granted. Signed-off-by: Oleksandr Tyshchenko Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access under Xen") Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered
On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > wrote: > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > This function does not work under Xen because Xen could have already > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > for this thread, as it prevents EFI tables that are in > > > > EfiBootServicesData from being used under Xen. > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > would allow efi_mem_reserve() to work normally. > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > controlling the EFI memory map is entirely Xen's job. > > > > Doing this properly would require Xen to understand all of the EFI > > tables that could validly be in EfiBootServices* and which could be of > > interest to dom0. It might (at least on some very buggy firmware) > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > opinion that default behavior should be matching the intentions of the > > > spec, and the intention of EfiBootServices* is for the space to be > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > that hypercall: It might use it for regions where data lives which it > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > entity would later want to consume. Code/data potentially usable by > > > _anyone_ between two resets of the system cannot legitimately be freed > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > configuration tables that point to EfiBootServicesData memory before > > freeing that memory. > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > RT_PROP are mostly relevant for bare metal where the host kernel maps > the runtime services, and in general, passing on these tables without > knowing what they do is kind of fishy anyway. You might even argue > that only known table types should be forwarded in the first place, > regardless of the memory type. Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and ESRT, but I am curious which others Xen should preserve. Currently, Xen does not know about RT_PROP or MEMATTR; could this be a cause of problems? -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab signature.asc Description: PGP signature
[qemu-mainline test] 173431: trouble: broken/fail/pass
flight 173431 qemu-mainline real [real] http://logs.test-lab.xenproject.org/osstest/logs/173431/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-xl-shadow broken test-amd64-amd64-xl-shadow5 host-install(5)broken REGR. vs. 173408 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173408 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173408 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173408 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173408 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 173408 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173408 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173408 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173408 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass version targeted for testing: qemuufafd35a6dab8e70a7c395aaa8e1273267cf9f3c8 baseline version: qemuuefbf38d73e5dcc4d5f8b98c6e7a12be1f3b91745 Last test of basis 173408 2022-10-03 22:08:40 Z2 days
Re: [PATCH 2/2] xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices
On Wed, 5 Oct 2022, Oleksandr Tyshchenko wrote: > From: Oleksandr Tyshchenko > > As find_xen_grant_dma_data() is called from both interrupt and process > contexts, the access to xen_grant_dma_devices XArray must be protected > by xa_lock_irqsave to avoid deadlock scenario. > As XArray API doesn't provide xa_store_irqsave helper, call lockless > __xa_store directly and guard it externally. > > Also move the storage of the XArray's entry to a separate helper. > > Signed-off-by: Oleksandr Tyshchenko > Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access > under Xen") Reviewed-by: Stefano Stabellini > --- > drivers/xen/grant-dma-ops.c | 24 +++- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c > index 1998d0e8ce82..c66f56d24013 100644 > --- a/drivers/xen/grant-dma-ops.c > +++ b/drivers/xen/grant-dma-ops.c > @@ -25,7 +25,7 @@ struct xen_grant_dma_data { > bool broken; > }; > > -static DEFINE_XARRAY(xen_grant_dma_devices); > +static DEFINE_XARRAY_FLAGS(xen_grant_dma_devices, XA_FLAGS_LOCK_IRQ); > > #define XEN_GRANT_DMA_ADDR_OFF (1ULL << 63) > > @@ -42,14 +42,29 @@ static inline grant_ref_t dma_to_grant(dma_addr_t dma) > static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev) > { > struct xen_grant_dma_data *data; > + unsigned long flags; > > - xa_lock(_grant_dma_devices); > + xa_lock_irqsave(_grant_dma_devices, flags); > data = xa_load(_grant_dma_devices, (unsigned long)dev); > - xa_unlock(_grant_dma_devices); > + xa_unlock_irqrestore(_grant_dma_devices, flags); > > return data; > } > > +static int store_xen_grant_dma_data(struct device *dev, > + struct xen_grant_dma_data *data) > +{ > + unsigned long flags; > + int ret; > + > + xa_lock_irqsave(_grant_dma_devices, flags); > + ret = xa_err(__xa_store(_grant_dma_devices, (unsigned long)dev, > data, > + GFP_ATOMIC)); > + xa_unlock_irqrestore(_grant_dma_devices, flags); > + > + return ret; > +} > + > /* > * DMA ops for Xen frontends (e.g. virtio). > * > @@ -338,8 +353,7 @@ void xen_grant_setup_dma_ops(struct device *dev) >*/ > data->backend_domid = iommu_spec.args[0]; > > - if (xa_err(xa_store(_grant_dma_devices, (unsigned long)dev, data, > - GFP_KERNEL))) { > + if (store_xen_grant_dma_data(dev, data)) { > dev_err(dev, "Cannot store Xen grant DMA data\n"); > goto err; > } > -- > 2.25.1 >
Re: [PATCH 1/2] xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
On Wed, 5 Oct 2022, Oleksandr Tyshchenko wrote: > From: Oleksandr Tyshchenko > > Take page offset into the account when calculating the number of pages > to be granted. > > Signed-off-by: Oleksandr Tyshchenko > Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access > under Xen") Reviewed-by: Stefano Stabellini > --- > drivers/xen/grant-dma-ops.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c > index 8973fc1e9ccc..1998d0e8ce82 100644 > --- a/drivers/xen/grant-dma-ops.c > +++ b/drivers/xen/grant-dma-ops.c > @@ -153,7 +153,7 @@ static dma_addr_t xen_grant_dma_map_page(struct device > *dev, struct page *page, >unsigned long attrs) > { > struct xen_grant_dma_data *data; > - unsigned int i, n_pages = PFN_UP(size); > + unsigned int i, n_pages = PFN_UP(offset + size); > grant_ref_t grant; > dma_addr_t dma_handle; > > @@ -185,7 +185,8 @@ static void xen_grant_dma_unmap_page(struct device *dev, > dma_addr_t dma_handle, >unsigned long attrs) > { > struct xen_grant_dma_data *data; > - unsigned int i, n_pages = PFN_UP(size); > + unsigned long offset = dma_handle & (PAGE_SIZE - 1); > + unsigned int i, n_pages = PFN_UP(offset + size); > grant_ref_t grant; > > if (WARN_ON(dir == DMA_NONE)) > -- > 2.25.1 >
[xen-unstable test] 173430: tolerable FAIL - PUSHED
flight 173430 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/173430/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173422 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173422 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173422 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173422 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173422 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173422 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173422 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173422 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 173422 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173422 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173422 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173422 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass version targeted for testing: xen 66a5633aa038f4abb4455463755974febac69034 baseline version: xen
Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered
On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour wrote: > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > This function does not work under Xen because Xen could have already > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > for this thread, as it prevents EFI tables that are in > > > EfiBootServicesData from being used under Xen. > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > unconditionally, but provide a hypercall that dom0 could used to free > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > would allow efi_mem_reserve() to work normally. > > > > efi_mem_reserve() actually working would be a layering violation; > > controlling the EFI memory map is entirely Xen's job. > > Doing this properly would require Xen to understand all of the EFI > tables that could validly be in EfiBootServices* and which could be of > interest to dom0. It might (at least on some very buggy firmware) > require a partial ACPI and/or SMBIOS implementation too, if the firmware > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > for the case when -mapbs is used. As I've indicated before, I'm of the > > opinion that default behavior should be matching the intentions of the > > spec, and the intention of EfiBootServices* is for the space to be > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > that hypercall: It might use it for regions where data lives which it > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > entity would later want to consume. Code/data potentially usable by > > _anyone_ between two resets of the system cannot legitimately be freed > > (and hence imo is wrong to live in EfiBootServices* regions). > > I agree, but currently some such data *is* in EfiBootServices* regions, > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > configuration tables that point to EfiBootServicesData memory before > freeing that memory. > That seems like a reasonable approach to me. Tables like MEMATTR or RT_PROP are mostly relevant for bare metal where the host kernel maps the runtime services, and in general, passing on these tables without knowing what they do is kind of fishy anyway. You might even argue that only known table types should be forwarded in the first place, regardless of the memory type.
Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered
On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > Linux has a function called efi_mem_reserve() that is used to reserve > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > This function does not work under Xen because Xen could have already > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > for this thread, as it prevents EFI tables that are in > > EfiBootServicesData from being used under Xen. > > > > A much nicer approach would be for Xen to reserve boot services memory > > unconditionally, but provide a hypercall that dom0 could used to free > > the parts of EfiBootServicesData memory that are no longer needed. This > > would allow efi_mem_reserve() to work normally. > > efi_mem_reserve() actually working would be a layering violation; > controlling the EFI memory map is entirely Xen's job. Doing this properly would require Xen to understand all of the EFI tables that could validly be in EfiBootServices* and which could be of interest to dom0. It might (at least on some very buggy firmware) require a partial ACPI and/or SMBIOS implementation too, if the firmware decided to put an ACPI or SMBIOS table in EfiBootServices*. > As to the hypercall you suggest - I wouldn't mind its addition, but only > for the case when -mapbs is used. As I've indicated before, I'm of the > opinion that default behavior should be matching the intentions of the > spec, and the intention of EfiBootServices* is for the space to be > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > that hypercall: It might use it for regions where data lives which it > wouldn't care about itself, but which an eventual kexec-ed (or alike) > entity would later want to consume. Code/data potentially usable by > _anyone_ between two resets of the system cannot legitimately be freed > (and hence imo is wrong to live in EfiBootServices* regions). I agree, but currently some such data *is* in EfiBootServices* regions, sadly. When -mapbs is *not* used, I recommend uninstalling all of the configuration tables that point to EfiBootServicesData memory before freeing that memory. > In a way one could view the Dom0 kernel as an "or alike" entity ... It is indeed such an entity. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab signature.asc Description: PGP signature
Re: [PATCH][4.17] EFI: don't convert memory marked for runtime use to ordinary RAM
Hi Jan, On 05/10/2022 12:55, Jan Beulich wrote: On 05.10.2022 12:44, Julien Grall wrote: On 04/10/2022 16:58, Jan Beulich wrote: On 30.09.2022 14:51, Bertrand Marquis wrote: On 30 Sep 2022, at 09:50, Jan Beulich wrote: efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME higher priority than the type of the range. To avoid accessing memory at runtime which was re-used for other purposes, make efi_arch_process_memory_map() follow suit. While on x86 in theory the same would apply to EfiACPIReclaimMemory, we don't actually "reclaim" E820_ACPI memory there and hence that type's handling can be left alone. Fixes: bf6501a62e80 ("x86-64: EFI boot code") Fixes: facac0af87ef ("x86-64: EFI runtime code") Fixes: 6d70ea10d49f ("Add ARM EFI boot support") Signed-off-by: Jan Beulich Reviewed-by: Bertrand Marquis #arm Thanks. However ... --- Partly RFC for Arm, for two reasons: On Arm I question the conversion of EfiACPIReclaimMemory, in two ways: For one like on x86 such ranges would likely better be retained, as Dom0 may (will?) have a need to look at tables placed there. Plus converting such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to me as well. I'd be inclined to make the latter adjustment right here (while the other change probably would better be separate, if there aren't actually reasons for the present behavior). ... any views on this WB aspect at least (also Stefano or Julien)? Would be good to know before I send v2. I don't quite understand what you are questioning here. Looking at the code, EfiACPIReclaimMemory will not be converted to RAM but added in a separate array. Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 (see acpi_create_efi_mmap_table()). So to me the code looks correct. Oh, I've indeed not paid enough attention to the first argument passed to meminfo_add_bank(). I'm sorry for the extra noise. However, the question I wanted to have addressed before sending out v3 was that regarding the present using of memory when EFI_MEMORY_WB is not set. Is that correct for the EfiACPIReclaimMemory case, i.e. is the consumer (Dom0) aware that there might be a restriction? Looking at the code, we always set EFI_MEMORY_WB for the reclaimable region and the stage-2 mapping will be cachable. So it looks like there would be a mismatch if EFI_MEMORY_WB is not set. However, given the region is reclaimable, shouldn't this imply that the flag is always set? And would this memory then be guaranteed to never be freed into the general pool of RAM pages? The region is not treated as RAM by Xen and not owned by the dom0. Therefore, it should not be possible to free the page because get_page_from_gfn() would not be able to get a reference. Cheers, -- Julien Grall
[PATCH 2/2] xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices
From: Oleksandr Tyshchenko As find_xen_grant_dma_data() is called from both interrupt and process contexts, the access to xen_grant_dma_devices XArray must be protected by xa_lock_irqsave to avoid deadlock scenario. As XArray API doesn't provide xa_store_irqsave helper, call lockless __xa_store directly and guard it externally. Also move the storage of the XArray's entry to a separate helper. Signed-off-by: Oleksandr Tyshchenko Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access under Xen") --- drivers/xen/grant-dma-ops.c | 24 +++- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c index 1998d0e8ce82..c66f56d24013 100644 --- a/drivers/xen/grant-dma-ops.c +++ b/drivers/xen/grant-dma-ops.c @@ -25,7 +25,7 @@ struct xen_grant_dma_data { bool broken; }; -static DEFINE_XARRAY(xen_grant_dma_devices); +static DEFINE_XARRAY_FLAGS(xen_grant_dma_devices, XA_FLAGS_LOCK_IRQ); #define XEN_GRANT_DMA_ADDR_OFF (1ULL << 63) @@ -42,14 +42,29 @@ static inline grant_ref_t dma_to_grant(dma_addr_t dma) static struct xen_grant_dma_data *find_xen_grant_dma_data(struct device *dev) { struct xen_grant_dma_data *data; + unsigned long flags; - xa_lock(_grant_dma_devices); + xa_lock_irqsave(_grant_dma_devices, flags); data = xa_load(_grant_dma_devices, (unsigned long)dev); - xa_unlock(_grant_dma_devices); + xa_unlock_irqrestore(_grant_dma_devices, flags); return data; } +static int store_xen_grant_dma_data(struct device *dev, + struct xen_grant_dma_data *data) +{ + unsigned long flags; + int ret; + + xa_lock_irqsave(_grant_dma_devices, flags); + ret = xa_err(__xa_store(_grant_dma_devices, (unsigned long)dev, data, + GFP_ATOMIC)); + xa_unlock_irqrestore(_grant_dma_devices, flags); + + return ret; +} + /* * DMA ops for Xen frontends (e.g. virtio). * @@ -338,8 +353,7 @@ void xen_grant_setup_dma_ops(struct device *dev) */ data->backend_domid = iommu_spec.args[0]; - if (xa_err(xa_store(_grant_dma_devices, (unsigned long)dev, data, - GFP_KERNEL))) { + if (store_xen_grant_dma_data(dev, data)) { dev_err(dev, "Cannot store Xen grant DMA data\n"); goto err; } -- 2.25.1
[PATCH 0/2] Misc fixes for Xen grant DMA-mapping layer
From: Oleksandr Tyshchenko Hello all. These are several fixes I collected when playing with virtio-net device using Xen grant mappings. Tested with both virtio-blk and virtio-net devices. Oleksandr Tyshchenko (2): xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page() xen/virtio: Fix potential deadlock when accessing xen_grant_dma_devices drivers/xen/grant-dma-ops.c | 29 ++--- 1 file changed, 22 insertions(+), 7 deletions(-) -- 2.25.1
[PATCH 1/2] xen/virtio: Fix n_pages calculation in xen_grant_dma_map(unmap)_page()
From: Oleksandr Tyshchenko Take page offset into the account when calculating the number of pages to be granted. Signed-off-by: Oleksandr Tyshchenko Fixes: d6aca3504c7d ("xen/grant-dma-ops: Add option to restrict memory access under Xen") --- drivers/xen/grant-dma-ops.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/xen/grant-dma-ops.c b/drivers/xen/grant-dma-ops.c index 8973fc1e9ccc..1998d0e8ce82 100644 --- a/drivers/xen/grant-dma-ops.c +++ b/drivers/xen/grant-dma-ops.c @@ -153,7 +153,7 @@ static dma_addr_t xen_grant_dma_map_page(struct device *dev, struct page *page, unsigned long attrs) { struct xen_grant_dma_data *data; - unsigned int i, n_pages = PFN_UP(size); + unsigned int i, n_pages = PFN_UP(offset + size); grant_ref_t grant; dma_addr_t dma_handle; @@ -185,7 +185,8 @@ static void xen_grant_dma_unmap_page(struct device *dev, dma_addr_t dma_handle, unsigned long attrs) { struct xen_grant_dma_data *data; - unsigned int i, n_pages = PFN_UP(size); + unsigned long offset = dma_handle & (PAGE_SIZE - 1); + unsigned int i, n_pages = PFN_UP(offset + size); grant_ref_t grant; if (WARN_ON(dir == DMA_NONE)) -- 2.25.1
[linux-5.4 test] 173427: regressions - FAIL
flight 173427 linux-5.4 real [real] http://logs.test-lab.xenproject.org/osstest/logs/173427/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-amd64-examine 5 host-install broken REGR. vs. 173353 test-amd64-i386-qemut-rhel6hvm-amd 7 xen-installfail REGR. vs. 173353 test-armhf-armhf-xl-credit2 14 guest-start fail REGR. vs. 173353 test-armhf-armhf-xl-credit1 14 guest-start fail REGR. vs. 173353 test-armhf-armhf-xl-multivcpu 14 guest-start fail REGR. vs. 173353 Regressions which are regarded as allowable (not blocking): test-amd64-amd64-xl-rtds 8 xen-boot fail REGR. vs. 173353 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail blocked in 173353 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail like 173353 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173353 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173353 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173353 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173353 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173353 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173353 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173353 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173353 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173353 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173353 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173353 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-xl-pvshim14 guest-start fail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass
[linux-linus test] 173426: tolerable FAIL - PUSHED
flight 173426 linux-linus real [real] flight 173432 linux-linus real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/173426/ http://logs.test-lab.xenproject.org/osstest/logs/173432/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-amd64-freebsd12-amd64 18 guest-saverestore.2 fail pass in 173432-retest Tests which did not succeed, but are not blocking: test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173421 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173421 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173421 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173421 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 173421 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173421 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173421 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173421 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass version targeted for testing: linux2bca25eaeba6190efbfcb38ed169bd7ee43b5aaf baseline version: linux0326074ff4652329f2a1a9c8685104576bd8d131 Last test of basis 173421 2022-10-05 00:13:01 Z0 days Testing same since 173426 2022-10-05 08:41:36 Z0 days1 attempts People who touched revisions under test: Adam Skladowski Adrien Grassein # for lontium-lt8912b Akhil R Aleksa Savic Aleksandr Mezin Alexander Stein Alexandru Gagniuc
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On Wed, Oct 05, 2022 at 05:45:56PM +0200, Juergen Gross wrote: > On 05.10.22 17:35, Marek Marczykowski-Górecki wrote: > > On Wed, Oct 05, 2022 at 05:04:29PM +0200, Juergen Gross wrote: > > > On 05.10.22 15:51, Marek Marczykowski-Górecki wrote: > > > > On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote: > > > > > On 05.10.22 15:25, Marek Marczykowski-Górecki wrote: > > > > > > On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: > > > > > > > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > When booting Xen with Linux dom0 nested under KVM, > > > > > > > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio > > > > > > > > devices > > > > > > > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants > > > > > > > > are > > > > > > > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled. > > > > > > > > > > > > > > > > This is probably uncommon corner case, but one that has bitten > > > > > > > > me in my > > > > > > > > CI setup... I think Xen should set smarter > > > > > > > > virtio_require_restricted_mem_acc(), that enforces it only for > > > > > > > > devices > > > > > > > > really provided by another Xen VM (not by the "outer host"), > > > > > > > > but I'm not > > > > > > > > sure how that could be done. Any ideas? > > > > > > > > > > > > > > > > > > > > > > It should be possible to add a boot parameter for that purpose. > > > > > > > Using it > > > > > > > would open a security hole, though (basically like all PCI > > > > > > > passthrough to > > > > > > > PV guests). > > > > > > > > > > > > What about excluding just dom0? At least currently, there is no way > > > > > > for > > > > > > dom0 to see virtio devices provided by another Xen domU. > > > > > > > > > > Even not via hotplug? > > > > > > > > That's why I said "currently", IIUC hotplug of virtio devices under Xen > > > > doesn't work yet, no? > > > > With hotplug working, it would need to be a proper detection where the > > > > backend lives, and probably with some extra considerations re Xen on > > > > Xen (based on below, pv-shim could use grants). > > > > > > As stated before, this isn't a problem specific to virtio devices. The > > > same > > > applies to Xen PV devices. > > > > Why is that an issue for Xen PV devices? They always use grants, so no need > > for exception. But more relevant here, there is no protocol for L0 > > hypervisor (that would need to be Xen) to provide PV device to nested L1 > > guest (besides pv-shim case, which is already handled), so L1 guest > > cannot confuse PV device provided by L0 and L1. > > That's the point. Today using virtio the way you are using it is possible > only because virtio devices don't have the security features of Xen PV > devices. With adding grant support for virtio devices this has changed. > > BTW, you can have the old virtio behavior back by not enabling > CONFIG_XEN_VIRTIO. Yes, I know, and currently I'm doing it. But at some point I'd like to be able to enable CONFIG_XEN_VIRTIO without breaking the nested virt case. Ideally KVM virtio devices would use something like grants, and that thing would work even through nested virt, but I don't think that's strictly necessary for ensuring new security properties of virtio devices where they can be enforced. > > > > For me specifically, a command line option would work (because I don't > > > > use Xen-based virtio devices when nested under KVM, or anywhere at all, > > > > at least not yet), but I can see future cases where you have virtio > > > > devices from both L0 and L1 in the same guest, and then it wouldn't be > > > > that simple. > > > > > > Lets think of a general solution covering all PV devices (Xen and virtio). > > > > In fact, I wonder what's the security benefit of > > CONFIG_XEN_VIRTIO_FORCE_GRANT. If the backend lives in dom0 (or > > stubdomain), it can access whole guest memory anyway, whether frontend > > likes it or not. But if the backend is elsewhere (or guest is protected > > with AMD SEV-SNP, XSM or similar), then the backend won't be able to access > > memory outside of what frontend shares explicitly. So, in the non-dom0 case, > > backend trying to provide non-grant-based virtio device will simply not > > function (because of inability to access guest's memory), instead of > > gaining unintended access. Am I missing some implicit memory sharing > > here? > > You are missing the possibility to have a deprivileged user land virtio > backend. Ok, but that backend either can xenforeignmemory_map() arbitrary guest's page (in which case it doesn't matter if the frontend driver likes non-grant-based device or not), or it cannot (in which case, non-grant-based device will simply not work, backend still won't have access to the guest memory). Sure, the error reporting might be different, but I don't think it changes the outcome security-wise. > And BTW, SEV won't disable guest memory access, it
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On 05.10.22 17:35, Marek Marczykowski-Górecki wrote: On Wed, Oct 05, 2022 at 05:04:29PM +0200, Juergen Gross wrote: On 05.10.22 15:51, Marek Marczykowski-Górecki wrote: On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote: On 05.10.22 15:25, Marek Marczykowski-Górecki wrote: On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: Hi, When booting Xen with Linux dom0 nested under KVM, CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are required for virtio even if just CONFIG_XEN_VIRTIO is enabled. This is probably uncommon corner case, but one that has bitten me in my CI setup... I think Xen should set smarter virtio_require_restricted_mem_acc(), that enforces it only for devices really provided by another Xen VM (not by the "outer host"), but I'm not sure how that could be done. Any ideas? It should be possible to add a boot parameter for that purpose. Using it would open a security hole, though (basically like all PCI passthrough to PV guests). What about excluding just dom0? At least currently, there is no way for dom0 to see virtio devices provided by another Xen domU. Even not via hotplug? That's why I said "currently", IIUC hotplug of virtio devices under Xen doesn't work yet, no? With hotplug working, it would need to be a proper detection where the backend lives, and probably with some extra considerations re Xen on Xen (based on below, pv-shim could use grants). As stated before, this isn't a problem specific to virtio devices. The same applies to Xen PV devices. Why is that an issue for Xen PV devices? They always use grants, so no need for exception. But more relevant here, there is no protocol for L0 hypervisor (that would need to be Xen) to provide PV device to nested L1 guest (besides pv-shim case, which is already handled), so L1 guest cannot confuse PV device provided by L0 and L1. That's the point. Today using virtio the way you are using it is possible only because virtio devices don't have the security features of Xen PV devices. With adding grant support for virtio devices this has changed. BTW, you can have the old virtio behavior back by not enabling CONFIG_XEN_VIRTIO. For me specifically, a command line option would work (because I don't use Xen-based virtio devices when nested under KVM, or anywhere at all, at least not yet), but I can see future cases where you have virtio devices from both L0 and L1 in the same guest, and then it wouldn't be that simple. Lets think of a general solution covering all PV devices (Xen and virtio). In fact, I wonder what's the security benefit of CONFIG_XEN_VIRTIO_FORCE_GRANT. If the backend lives in dom0 (or stubdomain), it can access whole guest memory anyway, whether frontend likes it or not. But if the backend is elsewhere (or guest is protected with AMD SEV-SNP, XSM or similar), then the backend won't be able to access memory outside of what frontend shares explicitly. So, in the non-dom0 case, backend trying to provide non-grant-based virtio device will simply not function (because of inability to access guest's memory), instead of gaining unintended access. Am I missing some implicit memory sharing here? You are missing the possibility to have a deprivileged user land virtio backend. And BTW, SEV won't disable guest memory access, it will just make it impossible to interprete memory contents from outside. A malicious backend can still easily crash a SEV guest by clobbering its memory. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions
On 05.10.2022 16:12, Roger Pau Monné wrote: > Hm, I have the following diff attached below which is more limited, > and not so bad I think. Initially I wanted to introduce an > efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE > which is quite intrusive. > > Let me know if you think the proposal below is better and I will > formally send a patch with it. Hmm, personally I like this slightly better for, as you say, its more limited effect. Objectively, however, I'm still unconvinced of making this an EFI special case. How would non-EFI firmware go about communicating that it is going to access a device at runtime (which, as said before, I consider a no-go in the first place)? Likely by putting its BAR range(s) in an E820_RESERVED region. Plus the MMIO range covered on the system in question is pretty large. That way we're still allowing pretty wide an abuse by the firmware. Furthermore the MCFG range would also be covered by an EfiMemoryMappedIO descriptor (in fact that's the only use of the type I had been aware of so far). IOW the change specifically permits an overlap of a BAR with an MCFG range. Who's the manufacturer of the system? Or put in different words - how likely is it that we could first gain understanding on their intentions with this region? You did say the system hangs hard without some kind of workaround, but that doesn't clarify to me in how far a use of the device by the firmware was actually involved there. Have you considered other routes towards dealing with the issue? One approach coming to mind would build on top of what you've been doing so far (either variant): Besides avoiding the turning off of memory decode, also invoke pci_ro_device(), thus protecting it from having its BARs relocated, and also preventing any driver in Dom0 to gain control of it, thus avoiding two parties competing for the device. Relocating the BAR outside of the reserved region would be nice, but will likely not resolve the hang. In any event I'm still hoping to have a 3rd view on the situation as a whole, irrespective of specific ideas towards possible workarounds ... Independent of the above a couple of purely cosmetic comments: > @@ -98,3 +99,28 @@ int pci_conf_write_intercept(unsigned int seg, unsigned > int bdf, > > return rc; > } > + > +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) > +{ > +/* > + * Check if BAR is not overlapping with any memory region defined > + * in the memory map. > + */ > +if ( is_memory_hole(start, end) ) > +return true; > + > +/* > + * Also allow BARs placed on EfiMemoryMappedIO regions in order to deal > + * with EFI firmware using those regions to place the BARs of devices > that > + * can be used during runtime. But print a warning when doing so. > + */ > +if ( !efi_all_runtime_mmio(mfn_to_maddr(start), > + mfn_to_maddr(mfn_add(end, 1))) ) > +return false; > + > +printk(XENLOG_WARNING > + "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved > region\n", > + >sbdf, mfn_x(start), mfn_x(end)); Perhaps re-word the message now that the check is a different one? > --- a/xen/common/efi/runtime.c > +++ b/xen/common/efi/runtime.c > @@ -78,6 +78,30 @@ bool efi_enabled(unsigned int feature) > return test_bit(feature, _flags); > } > > +/* > + * This function checks if the entire range [start,end) is contained inside > of > + * a single EfiMemoryMappedIO descriptor with the runtime attribute set. > + */ > +bool efi_all_runtime_mmio(uint64_t start, uint64_t end) > +{ > +unsigned int i; > + > +for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size ) > +{ > +EFI_MEMORY_DESCRIPTOR *desc = efi_memmap + i; const? > +uint64_t len = desc->NumberOfPages << EFI_PAGE_SHIFT; > + > +if ( desc->Type != EfiMemoryMappedIO || > + !(desc->Attribute & EFI_MEMORY_RUNTIME) ) > +continue; > + > +if ( start >= desc->PhysicalStart && end <= desc->PhysicalStart + > len ) > +return true; > +} > + > +return false; > +} > + > #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */ Perhaps put the function inside this #ifdef? Jan
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On Wed, Oct 05, 2022 at 05:04:29PM +0200, Juergen Gross wrote: > On 05.10.22 15:51, Marek Marczykowski-Górecki wrote: > > On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote: > > > On 05.10.22 15:25, Marek Marczykowski-Górecki wrote: > > > > On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: > > > > > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: > > > > > > Hi, > > > > > > > > > > > > When booting Xen with Linux dom0 nested under KVM, > > > > > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio > > > > > > devices > > > > > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are > > > > > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled. > > > > > > > > > > > > This is probably uncommon corner case, but one that has bitten me > > > > > > in my > > > > > > CI setup... I think Xen should set smarter > > > > > > virtio_require_restricted_mem_acc(), that enforces it only for > > > > > > devices > > > > > > really provided by another Xen VM (not by the "outer host"), but > > > > > > I'm not > > > > > > sure how that could be done. Any ideas? > > > > > > > > > > > > > > > > It should be possible to add a boot parameter for that purpose. Using > > > > > it > > > > > would open a security hole, though (basically like all PCI > > > > > passthrough to > > > > > PV guests). > > > > > > > > What about excluding just dom0? At least currently, there is no way for > > > > dom0 to see virtio devices provided by another Xen domU. > > > > > > Even not via hotplug? > > > > That's why I said "currently", IIUC hotplug of virtio devices under Xen > > doesn't work yet, no? > > With hotplug working, it would need to be a proper detection where the > > backend lives, and probably with some extra considerations re Xen on > > Xen (based on below, pv-shim could use grants). > > As stated before, this isn't a problem specific to virtio devices. The same > applies to Xen PV devices. Why is that an issue for Xen PV devices? They always use grants, so no need for exception. But more relevant here, there is no protocol for L0 hypervisor (that would need to be Xen) to provide PV device to nested L1 guest (besides pv-shim case, which is already handled), so L1 guest cannot confuse PV device provided by L0 and L1. > > For me specifically, a command line option would work (because I don't > > use Xen-based virtio devices when nested under KVM, or anywhere at all, > > at least not yet), but I can see future cases where you have virtio > > devices from both L0 and L1 in the same guest, and then it wouldn't be > > that simple. > > Lets think of a general solution covering all PV devices (Xen and virtio). In fact, I wonder what's the security benefit of CONFIG_XEN_VIRTIO_FORCE_GRANT. If the backend lives in dom0 (or stubdomain), it can access whole guest memory anyway, whether frontend likes it or not. But if the backend is elsewhere (or guest is protected with AMD SEV-SNP, XSM or similar), then the backend won't be able to access memory outside of what frontend shares explicitly. So, in the non-dom0 case, backend trying to provide non-grant-based virtio device will simply not function (because of inability to access guest's memory), instead of gaining unintended access. Am I missing some implicit memory sharing here? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On 05.10.22 15:51, Marek Marczykowski-Górecki wrote: On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote: On 05.10.22 15:25, Marek Marczykowski-Górecki wrote: On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: Hi, When booting Xen with Linux dom0 nested under KVM, CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are required for virtio even if just CONFIG_XEN_VIRTIO is enabled. This is probably uncommon corner case, but one that has bitten me in my CI setup... I think Xen should set smarter virtio_require_restricted_mem_acc(), that enforces it only for devices really provided by another Xen VM (not by the "outer host"), but I'm not sure how that could be done. Any ideas? It should be possible to add a boot parameter for that purpose. Using it would open a security hole, though (basically like all PCI passthrough to PV guests). What about excluding just dom0? At least currently, there is no way for dom0 to see virtio devices provided by another Xen domU. Even not via hotplug? That's why I said "currently", IIUC hotplug of virtio devices under Xen doesn't work yet, no? With hotplug working, it would need to be a proper detection where the backend lives, and probably with some extra considerations re Xen on Xen (based on below, pv-shim could use grants). As stated before, this isn't a problem specific to virtio devices. The same applies to Xen PV devices. For me specifically, a command line option would work (because I don't use Xen-based virtio devices when nested under KVM, or anywhere at all, at least not yet), but I can see future cases where you have virtio devices from both L0 and L1 in the same guest, and then it wouldn't be that simple. Lets think of a general solution covering all PV devices (Xen and virtio). Something like this: ---8<--- diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 9b1a58dda935..6ac32b0b3720 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc); static void __init xen_pv_init_platform(void) { /* PV guests can't operate virtio devices without grants. */ - if (IS_ENABLED(CONFIG_XEN_VIRTIO)) + if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain()) virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc); populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP)); ---8<--- This BTW raises also a question what will happen on Xen nested inside Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0 wouldn't work on L0, no? Or maybe this is solved already for pv-shim case? This is a similar problem as with normal Xen PV devices. You will need either a simple grant passthrough like with pv-shim (enabling such devices for one guest in L1 only), or you need a grant multiplexer in L1 Xen in case you want to be able to have multiple guests in L1 being able to use L0 PV devices. This will be tricky, at least with the current frontend drivers. Frontend kernel is in charge of assigning grant refs, _and_ communicating them to the backend. Such multiplexer would need to intercept one or the other (either translate, or ensure they are allocated from distinct ranges to begin with). I don't see how that could be done with the current domU kernels. That might be better with your idea of multiple grant v3 trees, where the hypervisor might dictate grant ranges. Yes, this is another advantage of the V3 approach I haven't thought of before. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions
On Wed, Oct 05, 2022 at 10:53:38AM +0200, Jan Beulich wrote: > On 05.10.2022 10:37, Roger Pau Monné wrote: > > On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote: > >> On 04.10.2022 17:36, Roger Pau Monne wrote: > >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and > >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by > >>> EFI firmware. > >>> > >>> The current parsing of the EFI memory map is translating > >>> EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some > >>> boxes as the firmware is relying on using those regions to position > >>> the BARs of devices being used (possibly during runtime) for the > >>> firmware. > >>> > >>> Xen will disable memory decoding on any device that has BARs > >>> positioned over any regions on the e820 memory map, hence the firmware > >>> will malfunction after Xen turning off memory decoding for the > >>> device(s) that have BARs mapped in EfiMemoryMappedIO regions. > >>> > >>> The system under which this was observed has: > >>> > >>> EFI memory map: > >>> [...] > >>> 0fd00-0fe7f type=11 attr=8000100d > >>> [...] > >>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > >>> > >>> The device behind this BAR is: > >>> > >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI > >>> Controller (rev 09) > >>> Subsystem: Super Micro Computer Inc Device 091c > >>> Flags: fast devsel > >>> Memory at fe01 (32-bit, non-prefetchable) [size=4K]well > >>> > >>> For the record, the symptom observed in that machine was a hard freeze > >>> when attempting to set an EFI variable (XEN_EFI_set_variable). > >>> > >>> Fix by allowing BARs of PCI devices to be positioned over reserved > >>> memory regions, but print a warning message about such overlap. > >> > >> Somewhat hesitantly I might ack this change, but I'd like to give > >> others (Andrew in particular) some time to voice their views. As > >> said during the earlier discussion - I think we're relaxing things > >> too much by going this route. > > > > Another option would be to explicitly check in efi_memmap for > > EfiMemoryMappedIO regions and BAR overlap and only allow those. That > > would be more cumbersome code wise AFAICT. > > Indeed there's a question of balancing well here, between two outcomes > neither of which is really desirable. Hm, I have the following diff attached below which is more limited, and not so bad I think. Initially I wanted to introduce an efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE which is quite intrusive. Let me know if you think the proposal below is better and I will formally send a patch with it. Thanks, Roger. --- diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h index f4a58c8acf..c8e1a9ecdb 100644 --- a/xen/arch/x86/include/asm/pci.h +++ b/xen/arch/x86/include/asm/pci.h @@ -57,14 +57,4 @@ static always_inline bool is_pci_passthrough_enabled(void) void arch_pci_init_pdev(struct pci_dev *pdev); -static inline bool pci_check_bar(const struct pci_dev *pdev, - mfn_t start, mfn_t end) -{ -/* - * Check if BAR is not overlapping with any memory region defined - * in the memory map. - */ -return is_memory_hole(start, end); -} - #endif /* __X86_PCI_H__ */ diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c index 97b792e578..c3737e226d 100644 --- a/xen/arch/x86/pci.c +++ b/xen/arch/x86/pci.c @@ -4,6 +4,7 @@ * Architecture-dependent PCI access functions. */ +#include #include #include #include @@ -98,3 +99,28 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, return rc; } + +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) +{ +/* + * Check if BAR is not overlapping with any memory region defined + * in the memory map. + */ +if ( is_memory_hole(start, end) ) +return true; + +/* + * Also allow BARs placed on EfiMemoryMappedIO regions in order to deal + * with EFI firmware using those regions to place the BARs of devices that + * can be used during runtime. But print a warning when doing so. + */ +if ( !efi_all_runtime_mmio(mfn_to_maddr(start), + mfn_to_maddr(mfn_add(end, 1))) ) +return false; + +printk(XENLOG_WARNING + "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved region\n", + >sbdf, mfn_x(start), mfn_x(end)); + +return true; +} diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c index 13b0975866..b69c710ce3 100644 --- a/xen/common/efi/runtime.c +++ b/xen/common/efi/runtime.c @@ -78,6 +78,30 @@ bool efi_enabled(unsigned int feature) return test_bit(feature, _flags); } +/* + * This function checks if the entire range [start,end) is contained inside of + * a single EfiMemoryMappedIO descriptor with the runtime attribute set. + */ +bool
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On Wed, Oct 05, 2022 at 01:34:06PM +, Andrew Cooper wrote: > On 05/10/2022 10:55, Roger Pau Monné wrote: > > On Tue, Oct 04, 2022 at 05:08:09PM +0100, Andrew Cooper wrote: > >> A future change will want a cpuid-like identifier which doesn't have a > >> mapping > >> to a feature bit. > > Why we prefer this logic over just giving such pseudo features a > > synthetic bit or akin? > > Synthetic bits are (intentionally) not available to cmdline parsing. We > need something that is available for parsing. I think Jan expressed my view better, in that it would be nicer to just have the MSR_ARCH_CAPS bits in the featureset, and listed in cpufeatureset.h like we handle CPUID features. Maybe we want to go your proposed route now if that's easier however. Thanks, Roger.
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On Wed, Oct 05, 2022 at 03:34:56PM +0200, Juergen Gross wrote: > On 05.10.22 15:25, Marek Marczykowski-Górecki wrote: > > On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: > > > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: > > > > Hi, > > > > > > > > When booting Xen with Linux dom0 nested under KVM, > > > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices > > > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are > > > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled. > > > > > > > > This is probably uncommon corner case, but one that has bitten me in my > > > > CI setup... I think Xen should set smarter > > > > virtio_require_restricted_mem_acc(), that enforces it only for devices > > > > really provided by another Xen VM (not by the "outer host"), but I'm not > > > > sure how that could be done. Any ideas? > > > > > > > > > > It should be possible to add a boot parameter for that purpose. Using it > > > would open a security hole, though (basically like all PCI passthrough to > > > PV guests). > > > > What about excluding just dom0? At least currently, there is no way for > > dom0 to see virtio devices provided by another Xen domU. > > Even not via hotplug? That's why I said "currently", IIUC hotplug of virtio devices under Xen doesn't work yet, no? With hotplug working, it would need to be a proper detection where the backend lives, and probably with some extra considerations re Xen on Xen (based on below, pv-shim could use grants). For me specifically, a command line option would work (because I don't use Xen-based virtio devices when nested under KVM, or anywhere at all, at least not yet), but I can see future cases where you have virtio devices from both L0 and L1 in the same guest, and then it wouldn't be that simple. > > Something like this: > > ---8<--- > > diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c > > index 9b1a58dda935..6ac32b0b3720 100644 > > --- a/arch/x86/xen/enlighten_pv.c > > +++ b/arch/x86/xen/enlighten_pv.c > > @@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, > > shadow_tls_desc); > > static void __init xen_pv_init_platform(void) > > { > > /* PV guests can't operate virtio devices without grants. */ > > - if (IS_ENABLED(CONFIG_XEN_VIRTIO)) > > + if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain()) > > virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc); > > populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP)); > > ---8<--- > > > > This BTW raises also a question what will happen on Xen nested inside > > Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0 > > wouldn't work on L0, no? Or maybe this is solved already for pv-shim > > case? > > This is a similar problem as with normal Xen PV devices. > > You will need either a simple grant passthrough like with pv-shim (enabling > such devices for one guest in L1 only), or you need a grant multiplexer in > L1 Xen in case you want to be able to have multiple guests in L1 being able > to > use L0 PV devices. This will be tricky, at least with the current frontend drivers. Frontend kernel is in charge of assigning grant refs, _and_ communicating them to the backend. Such multiplexer would need to intercept one or the other (either translate, or ensure they are allocated from distinct ranges to begin with). I don't see how that could be done with the current domU kernels. That might be better with your idea of multiple grant v3 trees, where the hypervisor might dictate grant ranges. -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On 05/10/2022 14:11, Jan Beulich wrote: > On 05.10.2022 14:58, Andrew Cooper wrote: >> On 05/10/2022 07:59, Jan Beulich wrote: >>> On 04.10.2022 18:08, Andrew Cooper wrote: A future change will want a cpuid-like identifier which doesn't have a mapping to a feature bit. * Pass the feature name into the parse callback. * Exclude a feature value of ~0u from falling into the general set/clear bit paths. * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature names. >>> Hmm, I was envisioning this to be fitted into the existing model in a >>> less adhoc way: (parts of) MSRs holding feature bits aren't very different >>> from individual (pairs of) registers of CPUID output (in the case of >>> ARCH_CAPS there would be a perhaps just abstract mask limiting things to >>> the subset of bits which actually act as feature indicators). Hence I'd >>> have expected them to gain proper entries in the public interface >>> (cpufeatureset.h) and then be represented / processed the same way in >>> featuresets and policies. All that would be left out at this point would >>> be the exposing of the bit to guests (in patch 2, assuming the split into >>> two patches is then actually still warranted), integration into >>> guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for >>> example, could easily learn of such right away). >>> >>> However, since I'm pretty sure you've considered such an approach, I guess >>> I might be overlooking some caveat? >> I have on multiple occasions considered putting MSR_ARCH_CAPS into the >> existing X86_FEATURE_* infrastructure. In the future, it's likely the >> right course of action to take. >> >> However, doing so now will break speculation safety for guests in some > Considering further text - s/speculation/migration/, I guess? More "speculation safety on migrate". Except its not just on migrate. It can go wrong for suspend/resume on the same host across a reboot which changes the microcode version. >> cases. The featureset API intended to be safe to treat as a regular >> bitmap, and this is how it is used in practice. >> >> Honestly, I didn't imagine that we'd get bits like RSBA and RRSBA that >> need to be treated with opposite polarity to be levelled safely. >> >> Even if we do end up putting MSR_ARCH_CAPS in X86_FEATURE_*, we still >> need to remove the featureset api (replaced by the cpu policy >> infrastructure and libx86) to retain migration safety. > Well, I didn't mean to suggest to put all of the feature-like bits there > right away. Just the one bit we're after would do for now. Afaict that > wouldn't affect migration safety, while it would allow doing things in > Xen in a more "natural" way. That requires reworking how the MSR policies get set up, patches for which have been stagnent on xen-devel, as well as a reasonable amount of plumbing because the featureset<->policy conversions which only have CPUID data right now. If I go down this route, realise it is going to have to go out in an XSA so will be backported, and that you're also committing to taking the VMX MSRs in due course. (which is all on my plan, but I don't want to start the work now and have an argument down the line.) ~Andrew
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On 05.10.22 15:25, Marek Marczykowski-Górecki wrote: On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: Hi, When booting Xen with Linux dom0 nested under KVM, CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are required for virtio even if just CONFIG_XEN_VIRTIO is enabled. This is probably uncommon corner case, but one that has bitten me in my CI setup... I think Xen should set smarter virtio_require_restricted_mem_acc(), that enforces it only for devices really provided by another Xen VM (not by the "outer host"), but I'm not sure how that could be done. Any ideas? It should be possible to add a boot parameter for that purpose. Using it would open a security hole, though (basically like all PCI passthrough to PV guests). What about excluding just dom0? At least currently, there is no way for dom0 to see virtio devices provided by another Xen domU. Even not via hotplug? Something like this: ---8<--- diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 9b1a58dda935..6ac32b0b3720 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc); static void __init xen_pv_init_platform(void) { /* PV guests can't operate virtio devices without grants. */ - if (IS_ENABLED(CONFIG_XEN_VIRTIO)) + if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain()) virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc); populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP)); ---8<--- This BTW raises also a question what will happen on Xen nested inside Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0 wouldn't work on L0, no? Or maybe this is solved already for pv-shim case? This is a similar problem as with normal Xen PV devices. You will need either a simple grant passthrough like with pv-shim (enabling such devices for one guest in L1 only), or you need a grant multiplexer in L1 Xen in case you want to be able to have multiple guests in L1 being able to use L0 PV devices. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On 05/10/2022 10:55, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 05:08:09PM +0100, Andrew Cooper wrote: >> A future change will want a cpuid-like identifier which doesn't have a >> mapping >> to a feature bit. > Why we prefer this logic over just giving such pseudo features a > synthetic bit or akin? Synthetic bits are (intentionally) not available to cmdline parsing. We need something that is available for parsing. > Could we have a bit more of a description about what would be > considered a pseudo feature? I don't really have anything further to add beyond the comment in gen-cpuid.py It's a misc collection of things requiring special handling. ~Andrew
Re: [PATCH 2/2] x86: Activate Data Operand Invariant Timing Mode by default
On 05/10/2022 13:09, Jan Beulich wrote: > On 05.10.2022 12:20, Roger Pau Monné wrote: >> On Tue, Oct 04, 2022 at 05:08:10PM +0100, Andrew Cooper wrote: >>> --- a/xen/arch/x86/cpu/common.c >>> +++ b/xen/arch/x86/cpu/common.c >>> @@ -209,6 +209,34 @@ void ctxt_switch_levelling(const struct vcpu *next) >>> alternative_vcall(ctxt_switch_masking, next); >>> } >>> >>> +bool __ro_after_init opt_doitm = true; >>> + >>> +static void doitm_init(void) >>> +{ >>> +uint64_t val; >>> + >>> +if ( !opt_doitm || !cpu_has_arch_caps ) >>> +return; >>> + >>> +rdmsrl(MSR_ARCH_CAPABILITIES, val); >>> +if ( !(val & ARCH_CAPS_DOITM) ) >>> +return; >>> + >>> +/* >>> + * We are currently unable to enumerate MSR_ARCH_CAPS to guest. As a >>> + * consequence, guest kernels will believe they're safe even when they >>> are >>> + * not. >>> + * >>> + * Until we can enumerate DOITM properly for guests, set it >>> unilaterally. >>> + * This prevents otherwise-correct crypto from becoming vulnerable to >>> + * timing sidechannels. >>> + */ >>> + >>> +rdmsrl(MSR_UARCH_MISC_CTRL, val); >>> +val |= UARCH_CTRL_DOITM; >>> +wrmsrl(MSR_UARCH_MISC_CTRL, val); >> Is it possible for the firmware to have enabled DOITM and Xen needing to >> clear it if !opt_doitm? > I think a firmware setup option is quite plausible to expect, such that > safety can also be achieved underneath an unaware OS. Note how in my > earlier patch I did specifically set the bit both ways, for this very > reason. Firmware is not likely to set it, but we should cope with the case when we're somewhere along a kexec chain. I'll adjust. ~Andrew
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On Wed, Oct 05, 2022 at 02:57:01PM +0200, Juergen Gross wrote: > On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: > > Hi, > > > > When booting Xen with Linux dom0 nested under KVM, > > CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices > > provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are > > required for virtio even if just CONFIG_XEN_VIRTIO is enabled. > > > > This is probably uncommon corner case, but one that has bitten me in my > > CI setup... I think Xen should set smarter > > virtio_require_restricted_mem_acc(), that enforces it only for devices > > really provided by another Xen VM (not by the "outer host"), but I'm not > > sure how that could be done. Any ideas? > > > > It should be possible to add a boot parameter for that purpose. Using it > would open a security hole, though (basically like all PCI passthrough to > PV guests). What about excluding just dom0? At least currently, there is no way for dom0 to see virtio devices provided by another Xen domU. Something like this: ---8<--- diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 9b1a58dda935..6ac32b0b3720 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -111,7 +111,7 @@ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc); static void __init xen_pv_init_platform(void) { /* PV guests can't operate virtio devices without grants. */ - if (IS_ENABLED(CONFIG_XEN_VIRTIO)) + if (IS_ENABLED(CONFIG_XEN_VIRTIO) && !xen_initial_domain()) virtio_set_mem_acc_cb(virtio_require_restricted_mem_acc); populate_extra_pte(fix_to_virt(FIX_PARAVIRT_BOOTMAP)); ---8<--- This BTW raises also a question what will happen on Xen nested inside Xen, when L0 will provide virtio devices to L1. Grants set by L1 dom0 wouldn't work on L0, no? Or maybe this is solved already for pv-shim case? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
[xen-unstable-smoke test] 173428: tolerable all pass - PUSHED
flight 173428 xen-unstable-smoke real [real] http://logs.test-lab.xenproject.org/osstest/logs/173428/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass version targeted for testing: xen 66a5633aa038f4abb4455463755974febac69034 baseline version: xen 3161231abcb461314b205337fbd0530c7ead1696 Last test of basis 173410 2022-10-04 01:01:52 Z1 days Testing same since 173428 2022-10-05 09:00:30 Z0 days1 attempts People who touched revisions under test: Jan Beulich Roger Pau Monné jobs: build-arm64-xsm pass build-amd64 pass build-armhf pass build-amd64-libvirt pass test-armhf-armhf-xl pass test-arm64-arm64-xl-xsm pass test-amd64-amd64-xl-qemuu-debianhvm-amd64pass test-amd64-amd64-libvirt pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/xen.git 3161231abc..66a5633aa0 66a5633aa038f4abb4455463755974febac69034 -> smoke
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On 05.10.2022 14:58, Andrew Cooper wrote: > On 05/10/2022 07:59, Jan Beulich wrote: >> On 04.10.2022 18:08, Andrew Cooper wrote: >>> A future change will want a cpuid-like identifier which doesn't have a >>> mapping >>> to a feature bit. >>> >>> * Pass the feature name into the parse callback. >>> * Exclude a feature value of ~0u from falling into the general set/clear >>> bit >>>paths. >>> * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature >>>names. >> Hmm, I was envisioning this to be fitted into the existing model in a >> less adhoc way: (parts of) MSRs holding feature bits aren't very different >> from individual (pairs of) registers of CPUID output (in the case of >> ARCH_CAPS there would be a perhaps just abstract mask limiting things to >> the subset of bits which actually act as feature indicators). Hence I'd >> have expected them to gain proper entries in the public interface >> (cpufeatureset.h) and then be represented / processed the same way in >> featuresets and policies. All that would be left out at this point would >> be the exposing of the bit to guests (in patch 2, assuming the split into >> two patches is then actually still warranted), integration into >> guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for >> example, could easily learn of such right away). >> >> However, since I'm pretty sure you've considered such an approach, I guess >> I might be overlooking some caveat? > > I have on multiple occasions considered putting MSR_ARCH_CAPS into the > existing X86_FEATURE_* infrastructure. In the future, it's likely the > right course of action to take. > > However, doing so now will break speculation safety for guests in some Considering further text - s/speculation/migration/, I guess? > cases. The featureset API intended to be safe to treat as a regular > bitmap, and this is how it is used in practice. > > Honestly, I didn't imagine that we'd get bits like RSBA and RRSBA that > need to be treated with opposite polarity to be levelled safely. > > Even if we do end up putting MSR_ARCH_CAPS in X86_FEATURE_*, we still > need to remove the featureset api (replaced by the cpu policy > infrastructure and libx86) to retain migration safety. Well, I didn't mean to suggest to put all of the feature-like bits there right away. Just the one bit we're after would do for now. Afaict that wouldn't affect migration safety, while it would allow doing things in Xen in a more "natural" way. Jan
Re: CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
On 05.10.22 14:41, Marek Marczykowski-Górecki wrote: Hi, When booting Xen with Linux dom0 nested under KVM, CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are required for virtio even if just CONFIG_XEN_VIRTIO is enabled. This is probably uncommon corner case, but one that has bitten me in my CI setup... I think Xen should set smarter virtio_require_restricted_mem_acc(), that enforces it only for devices really provided by another Xen VM (not by the "outer host"), but I'm not sure how that could be done. Any ideas? It should be possible to add a boot parameter for that purpose. Using it would open a security hole, though (basically like all PCI passthrough to PV guests). Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On 05/10/2022 07:59, Jan Beulich wrote: > On 04.10.2022 18:08, Andrew Cooper wrote: >> A future change will want a cpuid-like identifier which doesn't have a >> mapping >> to a feature bit. >> >> * Pass the feature name into the parse callback. >> * Exclude a feature value of ~0u from falling into the general set/clear bit >>paths. >> * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature >>names. > Hmm, I was envisioning this to be fitted into the existing model in a > less adhoc way: (parts of) MSRs holding feature bits aren't very different > from individual (pairs of) registers of CPUID output (in the case of > ARCH_CAPS there would be a perhaps just abstract mask limiting things to > the subset of bits which actually act as feature indicators). Hence I'd > have expected them to gain proper entries in the public interface > (cpufeatureset.h) and then be represented / processed the same way in > featuresets and policies. All that would be left out at this point would > be the exposing of the bit to guests (in patch 2, assuming the split into > two patches is then actually still warranted), integration into > guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for > example, could easily learn of such right away). > > However, since I'm pretty sure you've considered such an approach, I guess > I might be overlooking some caveat? I have on multiple occasions considered putting MSR_ARCH_CAPS into the existing X86_FEATURE_* infrastructure. In the future, it's likely the right course of action to take. However, doing so now will break speculation safety for guests in some cases. The featureset API intended to be safe to treat as a regular bitmap, and this is how it is used in practice. Honestly, I didn't imagine that we'd get bits like RSBA and RRSBA that need to be treated with opposite polarity to be levelled safely. Even if we do end up putting MSR_ARCH_CAPS in X86_FEATURE_*, we still need to remove the featureset api (replaced by the cpu policy infrastructure and libx86) to retain migration safety. > >> --- a/xen/tools/gen-cpuid.py >> +++ b/xen/tools/gen-cpuid.py >> @@ -297,6 +297,19 @@ def crunch_numbers(state): >> RTM: [TSXLDTRK], >> } >> >> +# >> +# Pseudo feature names. These don't map to a feature bit, but are >> +# inserted into the values dictionary so they can be parsed and handled >> +# specially >> +# >> +pseduo_names = ( >> +) >> + >> +for n in pseduo_names: >> +if n in state.values: >> +raise Fail("Pseduo feature name %s aliases real feature" % (n, >> )) >> +state.values[n] = 0x > Throughout this hunk: s/pseduo/pseudo/g. Oops, yes. Will fix. ~Andrew
CONFIG_XEN_VIRTIO{_FORCE_GRANT} interferes with nested virt
Hi, When booting Xen with Linux dom0 nested under KVM, CONFIG_XEN_VIRTIO_FORCE_GRANT=y makes it unable to use virtio devices provided by L0 hypervisor (KVM with qemu). With PV dom0, grants are required for virtio even if just CONFIG_XEN_VIRTIO is enabled. This is probably uncommon corner case, but one that has bitten me in my CI setup... I think Xen should set smarter virtio_require_restricted_mem_acc(), that enforces it only for devices really provided by another Xen VM (not by the "outer host"), but I'm not sure how that could be done. Any ideas? -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab signature.asc Description: PGP signature
Re: [PATCH 2/2] x86: Activate Data Operand Invariant Timing Mode by default
On 05.10.2022 12:20, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 05:08:10PM +0100, Andrew Cooper wrote: >> --- a/xen/arch/x86/cpu/common.c >> +++ b/xen/arch/x86/cpu/common.c >> @@ -209,6 +209,34 @@ void ctxt_switch_levelling(const struct vcpu *next) >> alternative_vcall(ctxt_switch_masking, next); >> } >> >> +bool __ro_after_init opt_doitm = true; >> + >> +static void doitm_init(void) >> +{ >> +uint64_t val; >> + >> +if ( !opt_doitm || !cpu_has_arch_caps ) >> +return; >> + >> +rdmsrl(MSR_ARCH_CAPABILITIES, val); >> +if ( !(val & ARCH_CAPS_DOITM) ) >> +return; >> + >> +/* >> + * We are currently unable to enumerate MSR_ARCH_CAPS to guest. As a >> + * consequence, guest kernels will believe they're safe even when they >> are >> + * not. >> + * >> + * Until we can enumerate DOITM properly for guests, set it >> unilaterally. >> + * This prevents otherwise-correct crypto from becoming vulnerable to >> + * timing sidechannels. >> + */ >> + >> +rdmsrl(MSR_UARCH_MISC_CTRL, val); >> +val |= UARCH_CTRL_DOITM; >> +wrmsrl(MSR_UARCH_MISC_CTRL, val); > > Is it possible for the firmware to have enabled DOITM and Xen needing to > clear it if !opt_doitm? I think a firmware setup option is quite plausible to expect, such that safety can also be achieved underneath an unaware OS. Note how in my earlier patch I did specifically set the bit both ways, for this very reason. Jan
Re: [PATCH v3 2/4] xen/pv: fix vendor checks for pmu emulation
On 05.10.2022 13:03, Juergen Gross wrote: > The CPU vendor checks for pmu emulation are rather limited today, as > the assumption seems to be that only Intel and AMD are existing and/or > supported vendors. > > Fix that by handling Centaur and Zhaoxin CPUs the same way as Intel, > and Hygon the same way as AMD. > > While at it fix the return type of is_intel_pmu_msr(). > > Suggested-by: Jan Beulich > Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich
Re: [PATCH v3 1/4] xen/pv: add fault recovery control to pmu msr accesses
On 05.10.2022 13:02, Juergen Gross wrote: > Today pmu_msr_read() and pmu_msr_write() fall back to the safe variants > of read/write MSR in case the MSR access isn't emulated via Xen. Allow > the caller to select that faults should not be recovered from by passing > NULL for the error pointer. > > Restructure the code to make it more readable. > > Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich
Re: [PATCH][4.17] EFI: don't convert memory marked for runtime use to ordinary RAM
On 05.10.2022 12:44, Julien Grall wrote: > On 04/10/2022 16:58, Jan Beulich wrote: >> On 30.09.2022 14:51, Bertrand Marquis wrote: On 30 Sep 2022, at 09:50, Jan Beulich wrote: efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME higher priority than the type of the range. To avoid accessing memory at runtime which was re-used for other purposes, make efi_arch_process_memory_map() follow suit. While on x86 in theory the same would apply to EfiACPIReclaimMemory, we don't actually "reclaim" E820_ACPI memory there and hence that type's handling can be left alone. Fixes: bf6501a62e80 ("x86-64: EFI boot code") Fixes: facac0af87ef ("x86-64: EFI runtime code") Fixes: 6d70ea10d49f ("Add ARM EFI boot support") Signed-off-by: Jan Beulich >>> >>> Reviewed-by: Bertrand Marquis #arm >> >> Thanks. However ... >> --- Partly RFC for Arm, for two reasons: On Arm I question the conversion of EfiACPIReclaimMemory, in two ways: For one like on x86 such ranges would likely better be retained, as Dom0 may (will?) have a need to look at tables placed there. Plus converting such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to me as well. I'd be inclined to make the latter adjustment right here (while the other change probably would better be separate, if there aren't actually reasons for the present behavior). >> >> ... any views on this WB aspect at least (also Stefano or Julien)? Would be >> good to know before I send v2. > > I don't quite understand what you are questioning here. Looking at the > code, EfiACPIReclaimMemory will not be converted to RAM but added in a > separate array. > > Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 > (see acpi_create_efi_mmap_table()). > > So to me the code looks correct. Oh, I've indeed not paid enough attention to the first argument passed to meminfo_add_bank(). I'm sorry for the extra noise. However, the question I wanted to have addressed before sending out v3 was that regarding the present using of memory when EFI_MEMORY_WB is not set. Is that correct for the EfiACPIReclaimMemory case, i.e. is the consumer (Dom0) aware that there might be a restriction? And would this memory then be guaranteed to never be freed into the general pool of RAM pages? Jan
[PATCH v3 4/4] xen/pv: support selecting safe/unsafe msr accesses
Instead of always doing the safe variants for reading and writing MSRs in Xen PV guests, make the behavior controllable via Kconfig option and a boot parameter. The default will be the current behavior, which is to always use the safe variant. Signed-off-by: Juergen Gross --- .../admin-guide/kernel-parameters.txt | 6 + arch/x86/xen/Kconfig | 9 +++ arch/x86/xen/enlighten_pv.c | 24 +++ 3 files changed, 29 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 426fa892d311..1bda9cf18fae 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -6836,6 +6836,12 @@ Crash from Xen panic notifier, without executing late panic() code such as dumping handler. + xen_msr_safe= [X86,XEN] + Format: + Select whether to always use non-faulting (safe) MSR + access functions when running as Xen PV guest. The + default value is controlled by CONFIG_XEN_PV_MSR_SAFE. + xen_nopvspin[X86,XEN] Disables the qspinlock slowpath using Xen PV optimizations. This parameter is obsoleted by "nopvspin" parameter, which diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig index 85246dd9faa1..9b1ec5d8c99c 100644 --- a/arch/x86/xen/Kconfig +++ b/arch/x86/xen/Kconfig @@ -92,3 +92,12 @@ config XEN_DOM0 select X86_X2APIC if XEN_PVH && X86_64 help Support running as a Xen Dom0 guest. + +config XEN_PV_MSR_SAFE + bool "Always use safe MSR accesses in PV guests" + default y + depends on XEN_PV + help + Use safe (not faulting) MSR access functions even if the MSR access + should not fault anyway. + The default can be changed by using the "xen_msr_safe" boot parameter. diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index d5b0844a1b7c..daae454191f2 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -108,6 +108,16 @@ struct tls_descs { */ static DEFINE_PER_CPU(struct tls_descs, shadow_tls_desc); +static __read_mostly bool xen_msr_safe = IS_ENABLED(CONFIG_XEN_PV_MSR_SAFE); + +static int __init parse_xen_msr_safe(char *str) +{ + if (str) + return strtobool(str, _msr_safe); + return -EINVAL; +} +early_param("xen_msr_safe", parse_xen_msr_safe); + static void __init xen_pv_init_platform(void) { /* PV guests can't operate virtio devices without grants. */ @@ -1011,22 +1021,16 @@ static int xen_write_msr_safe(unsigned int msr, unsigned int low, static u64 xen_read_msr(unsigned int msr) { - /* -* This will silently swallow a #GP from RDMSR. It may be worth -* changing that. -*/ int err; - return xen_read_msr_safe(msr, ); + return xen_do_read_msr(msr, xen_msr_safe ? : NULL); } static void xen_write_msr(unsigned int msr, unsigned low, unsigned high) { - /* -* This will silently swallow a #GP from WRMSR. It may be worth -* changing that. -*/ - xen_write_msr_safe(msr, low, high); + int err; + + xen_do_write_msr(msr, low, high, xen_msr_safe ? : NULL); } /* This is called once we have the cpu_possible_mask */ -- 2.35.3
[PATCH v3 3/4] xen/pv: refactor msr access functions to support safe and unsafe accesses
Refactor and rename xen_read_msr_safe() and xen_write_msr_safe() to support both cases of MSR accesses, safe ones and potentially GP-fault generating ones. This will prepare to no longer swallow GPs silently in xen_read_msr() and xen_write_msr(). Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich --- V2: - init val in xen_do_read_msr() to 0 (Jan Beulich) --- arch/x86/xen/enlighten_pv.c | 75 +++-- 1 file changed, 56 insertions(+), 19 deletions(-) diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 9b1a58dda935..d5b0844a1b7c 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -917,14 +917,18 @@ static void xen_write_cr4(unsigned long cr4) native_write_cr4(cr4); } -static u64 xen_read_msr_safe(unsigned int msr, int *err) +static u64 xen_do_read_msr(unsigned int msr, int *err) { - u64 val; + u64 val = 0;/* Avoid uninitialized value for safe variant. */ if (pmu_msr_read(msr, , err)) return val; - val = native_read_msr_safe(msr, err); + if (err) + val = native_read_msr_safe(msr, err); + else + val = native_read_msr(msr); + switch (msr) { case MSR_IA32_APICBASE: val &= ~X2APIC_ENABLE; @@ -933,23 +937,39 @@ static u64 xen_read_msr_safe(unsigned int msr, int *err) return val; } -static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) +static void set_seg(unsigned int which, unsigned int low, unsigned int high, + int *err) { - int ret; - unsigned int which; - u64 base; + u64 base = ((u64)high << 32) | low; + + if (HYPERVISOR_set_segment_base(which, base) == 0) + return; - ret = 0; + if (err) + *err = -EIO; + else + WARN(1, "Xen set_segment_base(%u, %llx) failed\n", which, base); +} +/* + * Support write_msr_safe() and write_msr() semantics. + * With err == NULL write_msr() semantics are selected. + * Supplying an err pointer requires err to be pre-initialized with 0. + */ +static void xen_do_write_msr(unsigned int msr, unsigned int low, +unsigned int high, int *err) +{ switch (msr) { - case MSR_FS_BASE: which = SEGBASE_FS; goto set; - case MSR_KERNEL_GS_BASE:which = SEGBASE_GS_USER; goto set; - case MSR_GS_BASE: which = SEGBASE_GS_KERNEL; goto set; - - set: - base = ((u64)high << 32) | low; - if (HYPERVISOR_set_segment_base(which, base) != 0) - ret = -EIO; + case MSR_FS_BASE: + set_seg(SEGBASE_FS, low, high, err); + break; + + case MSR_KERNEL_GS_BASE: + set_seg(SEGBASE_GS_USER, low, high, err); + break; + + case MSR_GS_BASE: + set_seg(SEGBASE_GS_KERNEL, low, high, err); break; case MSR_STAR: @@ -965,11 +985,28 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high) break; default: - if (!pmu_msr_write(msr, low, high, )) - ret = native_write_msr_safe(msr, low, high); + if (!pmu_msr_write(msr, low, high, err)) { + if (err) + *err = native_write_msr_safe(msr, low, high); + else + native_write_msr(msr, low, high); + } } +} + +static u64 xen_read_msr_safe(unsigned int msr, int *err) +{ + return xen_do_read_msr(msr, err); +} + +static int xen_write_msr_safe(unsigned int msr, unsigned int low, + unsigned int high) +{ + int err = 0; + + xen_do_write_msr(msr, low, high, ); - return ret; + return err; } static u64 xen_read_msr(unsigned int msr) -- 2.35.3
[PATCH v3 2/4] xen/pv: fix vendor checks for pmu emulation
The CPU vendor checks for pmu emulation are rather limited today, as the assumption seems to be that only Intel and AMD are existing and/or supported vendors. Fix that by handling Centaur and Zhaoxin CPUs the same way as Intel, and Hygon the same way as AMD. While at it fix the return type of is_intel_pmu_msr(). Suggested-by: Jan Beulich Signed-off-by: Juergen Gross --- V3: - new patch --- arch/x86/xen/pmu.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index 0f98cb1077e3..68aff1382872 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -131,7 +131,8 @@ static inline uint32_t get_fam15h_addr(u32 addr) static inline bool is_amd_pmu_msr(unsigned int msr) { - if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD && + boot_cpu_data.x86_vendor != X86_VENDOR_HYGON) return false; if ((msr >= MSR_F15H_PERF_CTL && @@ -143,11 +144,13 @@ static inline bool is_amd_pmu_msr(unsigned int msr) return false; } -static int is_intel_pmu_msr(u32 msr_index, int *type, int *index) +static bool is_intel_pmu_msr(u32 msr_index, int *type, int *index) { u32 msr_index_pmc; - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL && + boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR && + boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN) return false; switch (msr_index) { -- 2.35.3
[PATCH v3 0/4] xen/pv: sanitize xen pv guest msr accesses
Historically when running as Xen PV guest all MSR accesses have been silently swallowing any GP faults, even when the kernel was using not the *msr_safe() access functions. Change that by making the behavior controllable via kernel config and via a boot parameter. This will help finding paths where MSRs are being accessed under Xen which are not emulated by the hypervisor. Changes in V3: - new patch 2 - addressed comments Juergen Gross (4): xen/pv: add fault recovery control to pmu msr accesses xen/pv: fix vendor checks for pmu emulation xen/pv: refactor msr access functions to support safe and unsafe accesses xen/pv: support selecting safe/unsafe msr accesses .../admin-guide/kernel-parameters.txt | 6 ++ arch/x86/xen/Kconfig | 9 ++ arch/x86/xen/enlighten_pv.c | 99 +-- arch/x86/xen/pmu.c| 71 +++-- 4 files changed, 127 insertions(+), 58 deletions(-) -- 2.35.3
[PATCH v3 1/4] xen/pv: add fault recovery control to pmu msr accesses
Today pmu_msr_read() and pmu_msr_write() fall back to the safe variants of read/write MSR in case the MSR access isn't emulated via Xen. Allow the caller to select that faults should not be recovered from by passing NULL for the error pointer. Restructure the code to make it more readable. Signed-off-by: Juergen Gross --- V2: - do some restructuring (Jan Beulich, Boris Ostrovsky) V3: - commit message rephrasing (Jan Beulich) - more restructuring (Boris Ostrovsky) --- arch/x86/xen/pmu.c | 66 ++ 1 file changed, 38 insertions(+), 28 deletions(-) diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c index 21ecbe754cb2..0f98cb1077e3 100644 --- a/arch/x86/xen/pmu.c +++ b/arch/x86/xen/pmu.c @@ -131,6 +131,9 @@ static inline uint32_t get_fam15h_addr(u32 addr) static inline bool is_amd_pmu_msr(unsigned int msr) { + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) + return false; + if ((msr >= MSR_F15H_PERF_CTL && msr < MSR_F15H_PERF_CTR + (amd_num_counters * 2)) || (msr >= MSR_K7_EVNTSEL0 && @@ -144,6 +147,9 @@ static int is_intel_pmu_msr(u32 msr_index, int *type, int *index) { u32 msr_index_pmc; + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) + return false; + switch (msr_index) { case MSR_CORE_PERF_FIXED_CTR_CTRL: case MSR_IA32_DS_AREA: @@ -290,48 +296,52 @@ static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read) return false; } +static bool pmu_msr_chk_emulated(unsigned int msr, uint64_t *val, bool is_read, +bool *emul) +{ + int type, index; + + if (is_amd_pmu_msr(msr)) + *emul = xen_amd_pmu_emulate(msr, val, is_read); + else if (is_intel_pmu_msr(msr, , )) + *emul = xen_intel_pmu_emulate(msr, val, type, index, is_read); + else + return false; + + return true; +} + bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err) { - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) { - if (is_amd_pmu_msr(msr)) { - if (!xen_amd_pmu_emulate(msr, val, 1)) - *val = native_read_msr_safe(msr, err); - return true; - } - } else { - int type, index; + bool emulated; - if (is_intel_pmu_msr(msr, , )) { - if (!xen_intel_pmu_emulate(msr, val, type, index, 1)) - *val = native_read_msr_safe(msr, err); - return true; - } + if (!pmu_msr_chk_emulated(msr, val, true, )) + return false; + + if (!emulated) { + *val = err ? native_read_msr_safe(msr, err) + : native_read_msr(msr); } - return false; + return true; } bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err) { uint64_t val = ((uint64_t)high << 32) | low; + bool emulated; - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) { - if (is_amd_pmu_msr(msr)) { - if (!xen_amd_pmu_emulate(msr, , 0)) - *err = native_write_msr_safe(msr, low, high); - return true; - } - } else { - int type, index; + if (!pmu_msr_chk_emulated(msr, , false, )) + return false; - if (is_intel_pmu_msr(msr, , )) { - if (!xen_intel_pmu_emulate(msr, , type, index, 0)) - *err = native_write_msr_safe(msr, low, high); - return true; - } + if (!emulated) { + if (err) + *err = native_write_msr_safe(msr, low, high); + else + native_write_msr(msr, low, high); } - return false; + return true; } static unsigned long long xen_amd_read_pmc(int counter) -- 2.35.3
[libvirt test] 173423: tolerable all pass - PUSHED
flight 173423 libvirt real [real] http://logs.test-lab.xenproject.org/osstest/logs/173423/ Failures :-/ but no regressions. Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 173413 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173413 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173413 test-amd64-i386-libvirt-xsm 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass version targeted for testing: libvirt 92f7aafced8d354cead03e50e1e7d57a99d29435 baseline version: libvirt d6245e36c24969efcaa835ce6e9a9c953f47cc87 Last test of basis 173413 2022-10-04 04:25:20 Z1 days Testing same since 173423 2022-10-05 04:20:23 Z0 days1 attempts People who touched revisions under test: Christian Ehrhardt Michal Privoznik Peter Krempa Stefan Berger jobs: build-amd64-xsm pass build-arm64-xsm pass build-i386-xsm pass build-amd64 pass build-arm64 pass build-armhf pass build-i386 pass build-amd64-libvirt pass build-arm64-libvirt pass build-armhf-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-arm64-pvopspass build-armhf-pvopspass build-i386-pvops pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass test-amd64-amd64-libvirt-xsm pass test-arm64-arm64-libvirt-xsm pass test-amd64-i386-libvirt-xsm pass test-amd64-amd64-libvirt pass test-arm64-arm64-libvirt pass test-armhf-armhf-libvirt pass test-amd64-i386-libvirt pass test-amd64-amd64-libvirt-pairpass test-amd64-i386-libvirt-pair pass test-arm64-arm64-libvirt-qcow2 pass test-armhf-armhf-libvirt-qcow2 pass test-arm64-arm64-libvirt-raw pass test-armhf-armhf-libvirt-raw pass test-amd64-i386-libvirt-raw pass test-amd64-amd64-libvirt-vhd pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs
Re: [PATCH][4.17] EFI: don't convert memory marked for runtime use to ordinary RAM
Hi Jan, On 04/10/2022 16:58, Jan Beulich wrote: On 30.09.2022 14:51, Bertrand Marquis wrote: On 30 Sep 2022, at 09:50, Jan Beulich wrote: efi_init_memory() in both relevant places is treating EFI_MEMORY_RUNTIME higher priority than the type of the range. To avoid accessing memory at runtime which was re-used for other purposes, make efi_arch_process_memory_map() follow suit. While on x86 in theory the same would apply to EfiACPIReclaimMemory, we don't actually "reclaim" E820_ACPI memory there and hence that type's handling can be left alone. Fixes: bf6501a62e80 ("x86-64: EFI boot code") Fixes: facac0af87ef ("x86-64: EFI runtime code") Fixes: 6d70ea10d49f ("Add ARM EFI boot support") Signed-off-by: Jan Beulich Reviewed-by: Bertrand Marquis #arm Thanks. However ... --- Partly RFC for Arm, for two reasons: On Arm I question the conversion of EfiACPIReclaimMemory, in two ways: For one like on x86 such ranges would likely better be retained, as Dom0 may (will?) have a need to look at tables placed there. Plus converting such ranges to RAM even if EFI_MEMORY_WB is not set looks suspicious to me as well. I'd be inclined to make the latter adjustment right here (while the other change probably would better be separate, if there aren't actually reasons for the present behavior). ... any views on this WB aspect at least (also Stefano or Julien)? Would be good to know before I send v2. I don't quite understand what you are questioning here. Looking at the code, EfiACPIReclaimMemory will not be converted to RAM but added in a separate array. Furthermore, all the EfiACPIReclaimMemory regions will be passed to dom0 (see acpi_create_efi_mmap_table()). So to me the code looks correct. Cheers, -- Julien Grall
Re: [PATCH 2/2] x86: Activate Data Operand Invariant Timing Mode by default
On Tue, Oct 04, 2022 at 05:08:10PM +0100, Andrew Cooper wrote: > Intel IceLake and later CPUs have microarchitectural behaviours which cause > data-dependent timing behaviour. This is not an issue for 99% of software, > but it is a problem for cryptography routines. On these CPUs, a new > architectural feature, DOITM, was retrofitted in microcode. > > For now, Xen can't enumerate DOITM to guest kernels; getting this working is > still in progress. The consequence is that guest kernels will incorrectly > conclude that they are safe. > > To maintain the safety of current software, activate DOITM unilaterally. This > will be relaxed in the future when we can enumerate the feature properly to > guests. > > As an emergency stopgap, this behaviour can be disabled by specifying > `cpuid=no-doitm` on Xen's command line, but is not guaranteed ABI moving > forward. > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Roger Pau Monné > CC: Wei Liu > CC: Henry Wang > CC: Marek Marczykowski-Górecki > CC: Demi Marie Obenour > --- > xen/arch/x86/cpu/common.c| 29 + > xen/arch/x86/cpuid.c | 5 + > xen/arch/x86/include/asm/processor.h | 2 ++ > xen/tools/gen-cpuid.py | 2 ++ > 4 files changed, 38 insertions(+) > > diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c > index 0412dbc915e5..8c46a4db430a 100644 > --- a/xen/arch/x86/cpu/common.c > +++ b/xen/arch/x86/cpu/common.c > @@ -209,6 +209,34 @@ void ctxt_switch_levelling(const struct vcpu *next) > alternative_vcall(ctxt_switch_masking, next); > } > > +bool __ro_after_init opt_doitm = true; > + > +static void doitm_init(void) > +{ > +uint64_t val; > + > +if ( !opt_doitm || !cpu_has_arch_caps ) > +return; > + > +rdmsrl(MSR_ARCH_CAPABILITIES, val); > +if ( !(val & ARCH_CAPS_DOITM) ) > +return; > + > +/* > + * We are currently unable to enumerate MSR_ARCH_CAPS to guest. As a > + * consequence, guest kernels will believe they're safe even when they > are > + * not. > + * > + * Until we can enumerate DOITM properly for guests, set it unilaterally. > + * This prevents otherwise-correct crypto from becoming vulnerable to > + * timing sidechannels. > + */ > + > +rdmsrl(MSR_UARCH_MISC_CTRL, val); > +val |= UARCH_CTRL_DOITM; > +wrmsrl(MSR_UARCH_MISC_CTRL, val); Is it possible for the firmware to have enabled DOITM and Xen needing to clear it if !opt_doitm? Thanks, Roger.
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On Tue, Oct 04, 2022 at 05:08:09PM +0100, Andrew Cooper wrote: > A future change will want a cpuid-like identifier which doesn't have a mapping > to a feature bit. Why we prefer this logic over just giving such pseudo features a synthetic bit or akin? Could we have a bit more of a description about what would be considered a pseudo feature? Thanks, Roger.
[xen-unstable test] 173422: tolerable FAIL
flight 173422 xen-unstable real [real] http://logs.test-lab.xenproject.org/osstest/logs/173422/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-amd64-i386-libvirt-xsm 7 xen-installfail pass in 173416 Tests which did not succeed, but are not blocking: test-amd64-i386-libvirt-xsm 15 migrate-support-check fail in 173416 never pass test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173416 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 173416 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173416 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173416 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173416 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 173416 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 173416 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173416 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 173416 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173416 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 173416 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173416 test-amd64-i386-xl-pvshim14 guest-start fail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-i386-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-amd64-i386-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl 15 migrate-support-checkfail never pass test-armhf-armhf-xl 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass version
[PATCH v4 0/7] Make balloon drivers memory changes known to the rest of the kernel
Currently balloon drivers (Virtio,XEN, HyperV, VMWare, ...) inflate and deflate the guest memory size but there is no way to know how much the memory size is changed by them. A common use of the ballooning is to emulate [1] hot plug and hot unplug - due to the complexity of the later. Hotplug has a notifier and one can also check the updated memory size. To improve this add InflatedTotal and InflatedFree to /proc/meminfo and implement a balloon notifier. Amount of inflated memory can be used: - si_meminfo(..) users can improve calculations - adjust cache/buffer sizes - adjust object/connection limits - as a hint for the oom a killer - by user space software that monitors memory pressure Alexander Atanasov (7): Make place for common balloon code Enable balloon drivers to report inflated memory Display inflated memory to users drivers: virtio: balloon - update inflated memory Display inflated memory in logs drivers: vmware: balloon - report inflated memory drivers: hyperv: balloon - report inflated memory Documentation/filesystems/proc.rst| 6 +++ MAINTAINERS | 4 +- arch/powerpc/platforms/pseries/cmm.c | 2 +- drivers/hv/hv_balloon.c | 11 + drivers/misc/vmw_balloon.c| 6 ++- drivers/virtio/virtio_balloon.c | 7 +++- fs/proc/meminfo.c | 10 + .../linux/{balloon_compaction.h => balloon.h} | 30 ++ lib/show_mem.c| 8 mm/Makefile | 2 +- mm/{balloon_compaction.c => balloon.c}| 40 +-- mm/migrate.c | 1 - mm/vmscan.c | 1 - 13 files changed, 110 insertions(+), 18 deletions(-) rename include/linux/{balloon_compaction.h => balloon.h} (86%) rename mm/{balloon_compaction.c => balloon.c} (88%) v4: - add support in hyperV and vmware balloon drivers - display balloon memory in show_mem so it is logged on OOM and on sysrq v3: - added missed EXPORT_SYMBOLS Reported-by: kernel test robot - instead of balloon_common.h just use balloon.h (yes, naming is hard) - cleaned up balloon.h - remove from files that do not use it and remove externs from function declarations v2: - reworked from simple /proc/meminfo addition Cc: Michael S. Tsirkin Cc: David Hildenbrand Cc: Wei Liu Cc: Nadav Amit Cc: pv-driv...@vmware.com Cc: Jason Wang Cc: virtualizat...@lists.linux-foundation.org Cc: "K. Y. Srinivasan" Cc: Haiyang Zhang Cc: Stephen Hemminger Cc: Dexuan Cui Cc: linux-hyp...@vger.kernel.org Cc: Juergen Gross Cc: Stefano Stabellini Cc: Oleksandr Tyshchenko Cc: xen-devel@lists.xenproject.org 1. https://lore.kernel.org/lkml/f0f12c84-a62e-5f8b-46ab-a651fe8f8...@redhat.com/ base-commit: 2bca25eaeba6190efbfcb38ed169bd7ee43b5aaf -- 2.31.1
Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions
On 05.10.2022 10:37, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote: >> On 04.10.2022 17:36, Roger Pau Monne wrote: >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by >>> EFI firmware. >>> >>> The current parsing of the EFI memory map is translating >>> EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some >>> boxes as the firmware is relying on using those regions to position >>> the BARs of devices being used (possibly during runtime) for the >>> firmware. >>> >>> Xen will disable memory decoding on any device that has BARs >>> positioned over any regions on the e820 memory map, hence the firmware >>> will malfunction after Xen turning off memory decoding for the >>> device(s) that have BARs mapped in EfiMemoryMappedIO regions. >>> >>> The system under which this was observed has: >>> >>> EFI memory map: >>> [...] >>> 0fd00-0fe7f type=11 attr=8000100d >>> [...] >>> :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map >>> >>> The device behind this BAR is: >>> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI >>> Controller (rev 09) >>> Subsystem: Super Micro Computer Inc Device 091c >>> Flags: fast devsel >>> Memory at fe01 (32-bit, non-prefetchable) [size=4K]well >>> >>> For the record, the symptom observed in that machine was a hard freeze >>> when attempting to set an EFI variable (XEN_EFI_set_variable). >>> >>> Fix by allowing BARs of PCI devices to be positioned over reserved >>> memory regions, but print a warning message about such overlap. >> >> Somewhat hesitantly I might ack this change, but I'd like to give >> others (Andrew in particular) some time to voice their views. As >> said during the earlier discussion - I think we're relaxing things >> too much by going this route. > > Another option would be to explicitly check in efi_memmap for > EfiMemoryMappedIO regions and BAR overlap and only allow those. That > would be more cumbersome code wise AFAICT. Indeed there's a question of balancing well here, between two outcomes neither of which is really desirable. Jan
Re: [PATCH for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions
On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote: > On 04.10.2022 17:36, Roger Pau Monne wrote: > > The EFI memory map contains two memory types (EfiMemoryMappedIO and > > EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by > > EFI firmware. > > > > The current parsing of the EFI memory map is translating > > EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some > > boxes as the firmware is relying on using those regions to position > > the BARs of devices being used (possibly during runtime) for the > > firmware. > > > > Xen will disable memory decoding on any device that has BARs > > positioned over any regions on the e820 memory map, hence the firmware > > will malfunction after Xen turning off memory decoding for the > > device(s) that have BARs mapped in EfiMemoryMappedIO regions. > > > > The system under which this was observed has: > > > > EFI memory map: > > [...] > > 0fd00-0fe7f type=11 attr=8000100d > > [...] > > :00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > > > > The device behind this BAR is: > > > > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI > > Controller (rev 09) > > Subsystem: Super Micro Computer Inc Device 091c > > Flags: fast devsel > > Memory at fe01 (32-bit, non-prefetchable) [size=4K]well > > > > For the record, the symptom observed in that machine was a hard freeze > > when attempting to set an EFI variable (XEN_EFI_set_variable). > > > > Fix by allowing BARs of PCI devices to be positioned over reserved > > memory regions, but print a warning message about such overlap. > > Somewhat hesitantly I might ack this change, but I'd like to give > others (Andrew in particular) some time to voice their views. As > said during the earlier discussion - I think we're relaxing things > too much by going this route. Another option would be to explicitly check in efi_memmap for EfiMemoryMappedIO regions and BAR overlap and only allow those. That would be more cumbersome code wise AFAICT. > > --- a/xen/arch/x86/pci.c > > +++ b/xen/arch/x86/pci.c > > @@ -98,3 +98,30 @@ int pci_conf_write_intercept(unsigned int seg, unsigned > > int bdf, > > > > return rc; > > } > > + > > +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) > > +{ > > +unsigned long mfn; > > + > > +/* > > + * Check if BAR is not overlapping with any memory region defined > > + * in the memory map. > > + */ > > +if ( is_memory_hole(start, end) ) > > +return true; > > + > > +/* > > + * Also allow BARs placed on reserved regions in order to deal with EFI > > + * firmware using EfiMemoryMappedIO regions to place the BARs of > > devices > > + * that can be used during runtime. But print a warning when doing so. > > + */ > > +for ( mfn = mfn_x(start); mfn <= mfn_x(end); mfn++ ) > > +if ( !page_is_ram_type(mfn, RAM_TYPE_RESERVED) ) > > +return false; > > We don't need to be arch-independent here and hence instead of this > (potentially long) loop can't we use a single call to e820_all_mapped()? Sure, was searching for a range checking functions but wrongly looked into mm.c instead of e820. I would have to make the function non-init, but I think that's fine. Thanks, Roger.
[linux-linus test] 173421: tolerable FAIL - PUSHED
flight 173421 linux-linus real [real] flight 173425 linux-linus real-retest [real] http://logs.test-lab.xenproject.org/osstest/logs/173421/ http://logs.test-lab.xenproject.org/osstest/logs/173425/ Failures :-/ but no regressions. Tests which are failing intermittently (not blocking): test-armhf-armhf-xl 8 xen-bootfail pass in 173425-retest Tests which did not succeed, but are not blocking: test-armhf-armhf-xl 15 migrate-support-check fail in 173425 never pass test-armhf-armhf-xl 16 saverestore-support-check fail in 173425 never pass test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 173414 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 173414 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 173414 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 173414 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 173414 test-armhf-armhf-libvirt 16 saverestore-support-checkfail like 173414 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail like 173414 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 173414 test-amd64-amd64-libvirt 15 migrate-support-checkfail never pass test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 15 migrate-support-checkfail never pass test-arm64-arm64-xl-seattle 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail never pass test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit1 15 migrate-support-checkfail never pass test-arm64-arm64-xl-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit1 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl 15 migrate-support-checkfail never pass test-arm64-arm64-xl 16 saverestore-support-checkfail never pass test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail never pass test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail never pass test-arm64-arm64-xl-credit2 15 migrate-support-checkfail never pass test-arm64-arm64-xl-credit2 16 saverestore-support-checkfail never pass test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass test-amd64-amd64-libvirt-qcow2 14 migrate-support-checkfail never pass test-amd64-amd64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail never pass test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail never pass test-arm64-arm64-xl-vhd 14 migrate-support-checkfail never pass test-arm64-arm64-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-xl-arndale 15 migrate-support-checkfail never pass test-armhf-armhf-xl-arndale 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit2 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit2 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail never pass test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-credit1 15 migrate-support-checkfail never pass test-armhf-armhf-xl-credit1 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-rtds 15 migrate-support-checkfail never pass test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail never pass test-armhf-armhf-xl-vhd 14 migrate-support-checkfail never pass test-armhf-armhf-xl-vhd 15 saverestore-support-checkfail never pass test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass test-armhf-armhf-libvirt 15 migrate-support-checkfail never pass test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass version targeted for testing: linux0326074ff4652329f2a1a9c8685104576bd8d131 baseline version: linux725737e7c21d2d25a4312c2aaa82a52bd03e3126 Last test of basis 173414 2022-10-04 08:14:24 Z1 days Failing since173419 2022-10-04 17:40:01 Z0 days2 attempts Testing same since 173421 2022-10-05 00:13:01 Z0 days1 attempts 535 people touched revisions under test, not listing them all jobs: build-amd64-xsm
[ovmf test] 173424: all pass - PUSHED
flight 173424 ovmf real [real] http://logs.test-lab.xenproject.org/osstest/logs/173424/ Perfect :-) All tests in this flight passed as required version targeted for testing: ovmf 1bd2ff18664b9564a5802d0ac153b5023f2fa41e baseline version: ovmf f931506815548cee5a3929856bfc98266c3c Last test of basis 173420 2022-10-04 19:41:57 Z0 days Testing same since 173424 2022-10-05 04:40:32 Z0 days1 attempts People who touched revisions under test: Chasel Chiu jobs: build-amd64-xsm pass build-i386-xsm pass build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl-qemuu-ovmf-amd64 pass test-amd64-i386-xl-qemuu-ovmf-amd64 pass sg-report-flight on osstest.test-lab.xenproject.org logs: /home/logs/logs images: /home/logs/images Logs, config files, etc. are available at http://logs.test-lab.xenproject.org/osstest/logs Explanation of these reports, and of osstest in general, is at http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master Test harness code can be found at http://xenbits.xen.org/gitweb?p=osstest.git;a=summary Pushing revision : To xenbits.xen.org:/home/xen/git/osstest/ovmf.git f931506815..1bd2ff1866 1bd2ff18664b9564a5802d0ac153b5023f2fa41e -> xen-tested-master
Re: [PATCH 1/2] x86/cpuid: Infrastructure to support pseudo feature identifiers
On 04.10.2022 18:08, Andrew Cooper wrote: > A future change will want a cpuid-like identifier which doesn't have a mapping > to a feature bit. > > * Pass the feature name into the parse callback. > * Exclude a feature value of ~0u from falling into the general set/clear bit >paths. > * In gen-cpuid.py, insert a placeholder to collect all the pseudo feature >names. Hmm, I was envisioning this to be fitted into the existing model in a less adhoc way: (parts of) MSRs holding feature bits aren't very different from individual (pairs of) registers of CPUID output (in the case of ARCH_CAPS there would be a perhaps just abstract mask limiting things to the subset of bits which actually act as feature indicators). Hence I'd have expected them to gain proper entries in the public interface (cpufeatureset.h) and then be represented / processed the same way in featuresets and policies. All that would be left out at this point would be the exposing of the bit to guests (in patch 2, assuming the split into two patches is then actually still warranted), integration into guest_rdmsr(), and at least some of the tool stack side (xen-cpuid, for example, could easily learn of such right away). However, since I'm pretty sure you've considered such an approach, I guess I might be overlooking some caveat? > --- a/xen/tools/gen-cpuid.py > +++ b/xen/tools/gen-cpuid.py > @@ -297,6 +297,19 @@ def crunch_numbers(state): > RTM: [TSXLDTRK], > } > > +# > +# Pseudo feature names. These don't map to a feature bit, but are > +# inserted into the values dictionary so they can be parsed and handled > +# specially > +# > +pseduo_names = ( > +) > + > +for n in pseduo_names: > +if n in state.values: > +raise Fail("Pseduo feature name %s aliases real feature" % (n, )) > +state.values[n] = 0x Throughout this hunk: s/pseduo/pseudo/g. Jan
Re: [PATCH v4 1/2] Avoid using EFI tables Xen may have clobbered
On 04.10.2022 17:46, Demi Marie Obenour wrote: > Linux has a function called efi_mem_reserve() that is used to reserve > EfiBootServicesData memory that contains e.g. EFI configuration tables. > This function does not work under Xen because Xen could have already > clobbered the memory. efi_mem_reserve() not working is the whole reason > for this thread, as it prevents EFI tables that are in > EfiBootServicesData from being used under Xen. > > A much nicer approach would be for Xen to reserve boot services memory > unconditionally, but provide a hypercall that dom0 could used to free > the parts of EfiBootServicesData memory that are no longer needed. This > would allow efi_mem_reserve() to work normally. efi_mem_reserve() actually working would be a layering violation; controlling the EFI memory map is entirely Xen's job. As to the hypercall you suggest - I wouldn't mind its addition, but only for the case when -mapbs is used. As I've indicated before, I'm of the opinion that default behavior should be matching the intentions of the spec, and the intention of EfiBootServices* is for the space to be reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using that hypercall: It might use it for regions where data lives which it wouldn't care about itself, but which an eventual kexec-ed (or alike) entity would later want to consume. Code/data potentially usable by _anyone_ between two resets of the system cannot legitimately be freed (and hence imo is wrong to live in EfiBootServices* regions). In a way one could view the Dom0 kernel as an "or alike" entity ... Jan