Re: [PATCH v2 3/3] hvc/xen: fix console unplug
On 20.10.23 18:15, David Woodhouse wrote: From: David Woodhouse On unplug of a Xen console, xencons_disconnect_backend() unconditionally calls free_irq() via unbind_from_irqhandler(), causing a warning of freeing an already-free IRQ: (qemu) device_del con1 [ 32.050919] [ cut here ] [ 32.050942] Trying to free already-free IRQ 33 [ 32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330 It should be using evtchn_put() to tear down the event channel binding, and let the Linux IRQ side of it be handled by notifier_del_irq() through the HVC code. On which topic... xencons_disconnect_backend() should call hvc_remove() *first*, rather than tearing down the event channel and grant mapping while they are in use. And then the IRQ is guaranteed to be freed by the time it's torn down by evtchn_put(). Since evtchn_put() also closes the actual event channel, avoid calling xenbus_free_evtchn() except in the failure path where the IRQ was not successfully set up. However, calling hvc_remove() at the start of xencons_disconnect_backend() still isn't early enough. An unplug request is indicated by the backend setting its state to XenbusStateClosing, which triggers a notification to xencons_backend_changed(), which... does nothing except set its own frontend state directly to XenbusStateClosed without *actually* tearing down the HVC device or, you know, making sure it isn't actively in use. So the backend sees the guest frontend set its state to XenbusStateClosed and stops servicing the interrupt... and the guest spins for ever in the domU_write_console() function waiting for the ring to drain. Fix that one by calling hvc_remove() from xencons_backend_changed() before signalling to the backend that it's OK to proceed with the removal. Tested with 'dd if=/dev/zero of=/dev/hvc1' while telling Qemu to remove the console device. Signed-off-by: David Woodhouse Cc: sta...@vger.kernel.org Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature.asc Description: OpenPGP digital signature
Re: [PATCH] hvc/xen: fix event channel handling for secondary consoles
On 18.10.23 01:46, David Woodhouse wrote: From: David Woodhouse The xencons_connect_backend() function allocates a local interdomain event channel with xenbus_alloc_evtchn(), then calls bind_interdomain_evtchn_to_irq_lateeoi() to bind to that port# on the *remote* domain. That doesn't work very well: (qemu) device_add xen-console,id=con1,chardev=pty0 [ 44.323872] xenconsole console-1: 2 xenbus_dev_probe on device/console/1 [ 44.323995] xenconsole: probe of console-1 failed with error -2 Fix it to use bind_evtchn_to_irq_lateeoi(), which does the right thing by just binding that *local* event channel to an irq. The backend will do the interdomain binding. This didn't affect the primary console because the setup for that is special — the toolstack allocates the guest event channel and the guest discovers it with HVMOP_get_param. Once that's fixed, there's also a warning on hot-unplug because xencons_disconnect_backend() unconditionally calls free_irq() via unbind_from_irqhandler(): (qemu) device_del con1 [ 32.050919] [ cut here ] [ 32.050942] Trying to free already-free IRQ 33 [ 32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330 Fix that by calling notifier_del_irq() first, which only calls free_irq() if the irq was requested in the first place. Then use I don't think the "if the irq was requested in the first place" is the correct reasoning. I think the problem is that notifier_del_irq() will be called another time through the .notifier_del hook. Two calls of notifier_del_irq() are fine, but one call of it and another call of free_irq() via unbind_from_irqhandler() is a problem. evtchn_put() to release the irq and event channel. Avoid calling xenbus_free_evtchn() in the normal case, as evtchn_put() will do that too. The only time xenbus_free_evtchn() needs to be called is for the cleanup when bind_evtchn_to_irq_lateeoi() fails. Finally, fix the error path in xen_hvc_init() when there's no primary console. It should still register the frontend driver, as there may be secondary consoles. (Qemu can always add secondary consoles, but only the toolstack can add the primary because it's special.) Fixes: fe415186b4 ("xen/console: harden hvc_xen against event channel storms") Signed-off-by: David Woodhouse Cc: sta...@vger.kernel.org With above fixed in the commit message: Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature.asc Description: OpenPGP digital signature
Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring
On 30.11.22 16:09, Roger Pau Monne wrote: The hvc machinery registers both a console and a tty device based on the hv ops provided by the specific implementation. Those two interfaces however have different locks, and there's no single locks that's shared between the tty and the console implementations, hence the driver needs to protect itself against concurrent accesses. Otherwise concurrent calls using the split interfaces are likely to corrupt the ring indexes, leaving the console unusable. Introduce a lock to xencons_info to serialize accesses to the shared ring. This is only required when using the shared memory console, concurrent accesses to the hypercall based console implementation are not an issue. Note the conditional logic in domU_read_console() is slightly modified so the notify_daemon() call can be done outside of the locked region: it's an hypercall and there's no need for it to be done with the lock held. Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen console') Signed-off-by: Roger Pau Monné Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v2] hvc/xen: lock console list traversal
On 30.11.22 17:36, Roger Pau Monne wrote: The currently lockless access to the xen console list in vtermno_to_xencons() is incorrect, as additions and removals from the list can happen anytime, and as such the traversal of the list to get the private console data for a given termno needs to happen with the lock held. Note users that modify the list already do so with the lock taken. Adjust current lock takers to use the _irq{save,restore} helpers, since the context in which vtermno_to_xencons() is called can have interrupts disabled. Use the _irq{save,restore} set of helpers to switch the current callers to disable interrupts in the locked region. I haven't checked if existing users could instead use the _irq variant, as I think it's safer to use _irq{save,restore} upfront. While there switch from using list_for_each_entry_safe to list_for_each_entry: the current entry cursor won't be removed as part of the code in the loop body, so using the _safe variant is pointless. Fixes: 02e19f9c7cac ('hvc_xen: implement multiconsole support') Signed-off-by: Roger Pau Monné Pushed to xen/tip.git for-linus-6.2 Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 4/6] xen: make remove callback of xen driver void returned
On 05.12.22 16:36, Dawei Li wrote: Since commit fc7a6209d571 ("bus: Make remove callback return void") forces bus_type::remove be void-returned, it doesn't make much sense for any bus based driver implementing remove callbalk to return non-void to its caller. This change is for xen bus based drivers. Signed-off-by: Dawei Li Acked-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH v2 30/44] cpuidle,xenpv: Make more PARAVIRT_XXL noinstr clean
On 19.09.22 12:00, Peter Zijlstra wrote: vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() leaves .noinstr.text section vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves .noinstr.text section vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Srivatsa S. Bhat (VMware) Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 0/5] xen: cleanup detection of non-essential pv devices
On 22.10.21 09:24, Jan Beulich wrote: On 22.10.2021 08:47, Juergen Gross wrote: Today the non-essential pv devices are hard coded in the xenbus driver and this list is lacking multiple entries. This series reworks the detection logic of non-essential devices by adding a flag for that purpose to struct xenbus_driver. I'm wondering whether it wouldn't better be the other way around: The (hopefully few) essential ones get flagged, thus also making it more prominent during patch review that a flag gets added (and justification provided), instead of having to spot the lack of a flag getting set. Not flagging a non-essential one is less problematic than not flagging an essential driver IMO. For some drivers I'm on the edge, BTW. The pv 9pfs driver ought to be non-essential in most cases, but there might be use cases where it is needed, so I didn't set its non_essential flag. Same applies to pv-usb and maybe pv-scsi, while pv-tpm probably really is essential. With the current series I'm ending up with 6 non-essential drivers and 6 essential ones, so either way needs the same number of drivers modified. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[PATCH 0/5] xen: cleanup detection of non-essential pv devices
Today the non-essential pv devices are hard coded in the xenbus driver and this list is lacking multiple entries. This series reworks the detection logic of non-essential devices by adding a flag for that purpose to struct xenbus_driver. Juergen Gross (5): xen: add "not_essential" flag to struct xenbus_driver xen: flag xen_drm_front to be not essential for system boot xen: flag hvc_xen to be not essential for system boot xen: flag pvcalls-front to be not essential for system boot xen: flag xen_snd_front to be not essential for system boot drivers/gpu/drm/xen/xen_drm_front.c| 1 + drivers/input/misc/xen-kbdfront.c | 1 + drivers/tty/hvc/hvc_xen.c | 1 + drivers/video/fbdev/xen-fbfront.c | 1 + drivers/xen/pvcalls-front.c| 1 + drivers/xen/xenbus/xenbus_probe_frontend.c | 14 +++--- include/xen/xenbus.h | 1 + sound/xen/xen_snd_front.c | 1 + 8 files changed, 10 insertions(+), 11 deletions(-) -- 2.26.2
[PATCH 3/5] xen: flag hvc_xen to be not essential for system boot
The Xen pv console driver is not essential for boot. Set the respective flag. Signed-off-by: Juergen Gross --- drivers/tty/hvc/hvc_xen.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index f0bf01ea069a..71e0dd2c0ce5 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -522,6 +522,7 @@ static struct xenbus_driver xencons_driver = { .remove = xencons_remove, .resume = xencons_resume, .otherend_changed = xencons_backend_changed, + .not_essential = true, }; #endif /* CONFIG_HVC_XEN_FRONTEND */ -- 2.26.2
Re: [PATCH 5/9] xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU
On 07.09.21 12:10, Jan Beulich wrote: xenboot_write_console() is dealing with these quite fine so I don't see why xenboot_console_setup() would return -ENOENT in this case. Adjust documentation accordingly. Signed-off-by: Jan Beulich Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 3/9] xen/x86: make "earlyprintk=xen" work better for PVH Dom0
On 07.09.21 12:09, Jan Beulich wrote: The xen_hvm_early_write() path better wouldn't be taken in this case; while port 0xE9 can be used, the hypercall path is quite a bit more efficient. Put that first, as it may also work for DomU-s (see also xen_raw_console_write()). While there also bail from the function when the first domU_write_console() failed - later ones aren't going to succeed. Signed-off-by: Jan Beulich Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 3/3] memblock: cleanup memblock_free interface
On 23.09.21 09:43, Mike Rapoport wrote: From: Mike Rapoport For ages memblock_free() interface dealt with physical addresses even despite the existence of memblock_alloc_xx() functions that return a virtual pointer. Introduce memblock_phys_free() for freeing physical ranges and repurpose memblock_free() to free virtual pointers to make the following pairing abundantly clear: int memblock_phys_free(phys_addr_t base, phys_addr_t size); phys_addr_t memblock_phys_alloc(phys_addr_t base, phys_addr_t size); void *memblock_alloc(phys_addr_t size, phys_addr_t align); void memblock_free(void *ptr, size_t size); Replace intermediate memblock_free_ptr() with memblock_free() and drop unnecessary aliases memblock_free_early() and memblock_free_early_nid(). Suggested-by: Linus Torvalds Signed-off-by: Mike Rapoport arch/x86/xen/ parts: Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 2/3] xen/x86: free_p2m_page: use memblock_free_ptr() to free a virtual pointer
On 23.09.21 09:43, Mike Rapoport wrote: From: Mike Rapoport free_p2m_page() wrongly passes a virtual pointer to memblock_free() that treats it as a physical address. Call memblock_free_ptr() instead that gets a virtual address to free the memory. Signed-off-by: Mike Rapoport Reviewed-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[PATCH 0/2] kvm: fix KVM_MAX_VCPU_ID handling
Revert commit 76b4f357d0e7d8f6f00 which was based on wrong reasoning and rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS in order to avoid the same issue in future. Juergen Gross (2): x86/kvm: revert commit 76b4f357d0e7d8f6f00 kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS Documentation/virt/kvm/devices/xics.rst| 2 +- Documentation/virt/kvm/devices/xive.rst| 2 +- arch/mips/kvm/mips.c | 2 +- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h| 4 ++-- arch/powerpc/kvm/book3s_xive.c | 2 +- arch/powerpc/kvm/powerpc.c | 2 +- arch/x86/include/asm/kvm_host.h| 2 +- arch/x86/kvm/ioapic.c | 2 +- arch/x86/kvm/ioapic.h | 4 ++-- arch/x86/kvm/x86.c | 2 +- include/linux/kvm_host.h | 4 ++-- tools/testing/selftests/kvm/kvm_create_max_vcpus.c | 2 +- virt/kvm/kvm_main.c| 2 +- 14 files changed, 17 insertions(+), 17 deletions(-) -- 2.26.2
[PATCH 2/2] kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS
KVM_MAX_VCPU_ID is not specifying the highest allowed vcpu-id, but the number of allowed vcpu-ids. This has already led to confusion, so rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS to make its semantics more clear Suggested-by: Eduardo Habkost Signed-off-by: Juergen Gross --- Documentation/virt/kvm/devices/xics.rst| 2 +- Documentation/virt/kvm/devices/xive.rst| 2 +- arch/mips/kvm/mips.c | 2 +- arch/powerpc/include/asm/kvm_book3s.h | 2 +- arch/powerpc/include/asm/kvm_host.h| 4 ++-- arch/powerpc/kvm/book3s_xive.c | 2 +- arch/powerpc/kvm/powerpc.c | 2 +- arch/x86/include/asm/kvm_host.h| 2 +- arch/x86/kvm/ioapic.c | 2 +- arch/x86/kvm/ioapic.h | 4 ++-- arch/x86/kvm/x86.c | 2 +- include/linux/kvm_host.h | 4 ++-- tools/testing/selftests/kvm/kvm_create_max_vcpus.c | 2 +- virt/kvm/kvm_main.c| 2 +- 14 files changed, 17 insertions(+), 17 deletions(-) diff --git a/Documentation/virt/kvm/devices/xics.rst b/Documentation/virt/kvm/devices/xics.rst index 2d6927e0b776..bf32c77174ab 100644 --- a/Documentation/virt/kvm/devices/xics.rst +++ b/Documentation/virt/kvm/devices/xics.rst @@ -22,7 +22,7 @@ Groups: Errors: === == --EINVAL Value greater than KVM_MAX_VCPU_ID. +-EINVAL Value greater than KVM_MAX_VCPU_IDS. -EFAULT Invalid user pointer for attr->addr. -EBUSY A vcpu is already connected to the device. === == diff --git a/Documentation/virt/kvm/devices/xive.rst b/Documentation/virt/kvm/devices/xive.rst index 8bdf3dc38f01..8b5e7b40bdf8 100644 --- a/Documentation/virt/kvm/devices/xive.rst +++ b/Documentation/virt/kvm/devices/xive.rst @@ -91,7 +91,7 @@ the legacy interrupt mode, referred as XICS (POWER7/8). Errors: === == - -EINVAL Value greater than KVM_MAX_VCPU_ID. + -EINVAL Value greater than KVM_MAX_VCPU_IDS. -EFAULT Invalid user pointer for attr->addr. -EBUSY A vCPU is already connected to the device. === == diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index 75c6f264c626..562aa878b266 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -1073,7 +1073,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) r = KVM_MAX_VCPUS; break; case KVM_CAP_MAX_VCPU_ID: - r = KVM_MAX_VCPU_ID; + r = KVM_MAX_VCPU_IDS; break; case KVM_CAP_MIPS_FPU: /* We don't handle systems with inconsistent cpu_has_fpu */ diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h index caaa0f592d8e..3d31f2c59e43 100644 --- a/arch/powerpc/include/asm/kvm_book3s.h +++ b/arch/powerpc/include/asm/kvm_book3s.h @@ -434,7 +434,7 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu); #define SPLIT_HACK_OFFS0xfb00 /* - * This packs a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the + * This packs a VCPU ID from the [0..KVM_MAX_VCPU_IDS) space down to the * [0..KVM_MAX_VCPUS) space, using knowledge of the guest's core stride * (but not its actual threading mode, which is not available) to avoid * collisions. diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 080a7feb7731..59cb38b04ede 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -33,11 +33,11 @@ #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE #include /* for MAX_SMT_THREADS */ -#define KVM_MAX_VCPU_ID(MAX_SMT_THREADS * KVM_MAX_VCORES) +#define KVM_MAX_VCPU_IDS (MAX_SMT_THREADS * KVM_MAX_VCORES) #define KVM_MAX_NESTED_GUESTS KVMPPC_NR_LPIDS #else -#define KVM_MAX_VCPU_IDKVM_MAX_VCPUS +#define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */ #define __KVM_HAVE_ARCH_INTC_INITIALIZED diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c index a18db9e16ea4..225008882958 100644 --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -1928,7 +1928,7 @@ int kvmppc_xive_set_nr_servers(struct kvmppc_xive *xive, u64 addr) pr_devel("%s nr_servers=%u\n", __func__, nr_servers); - if (!nr_servers || nr_servers > KVM_MAX_VCPU_ID) + if (!nr_servers || nr_servers > KVM_MAX_VCPU_IDS) return -EINVAL; mutex_lock(>lock); diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index b4e6f70b97b9..8ab90ce8738f 100644 --- a/arch/po
Re: [PATCH v2] xen/hvc: replace BUG_ON() with negative return value
On 07.07.21 11:57, Jan Beulich wrote: On 07.07.2021 11:10, Juergen Gross wrote: Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values. Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -86,7 +86,11 @@ static int __write_console(struct xencons_info *xencons, cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ Largely unrelated note: While in general the barriers here may want switching to virt_*mb(), this particular one looks to be too heavy anyway: a read barrier is all that's needed here afaict, just like there's only a write barrier between ring contents and producer writing in __write_console(). I agree. And btw, since I've got puzzled by the linuxppc-dev@ in the recipients list, I did look up relevant entries in ./MAINTAINERS. Shouldn't the file be part of "XEN HYPERVISOR INTERFACE"? I wouldn't mind. Greg, Jiri, what do you think? Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[PATCH v2] xen/hvc: replace BUG_ON() with negative return value
Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values. Signed-off-by: Juergen Gross --- V2: - drop BUG_ON() (Christophe Leroy, Greg Kroah-Hartmann) - replace WARN_ONCE() by pr_err_once() (Greg Kroah-Hartmann) - break out from original series --- drivers/tty/hvc/hvc_xen.c | 17 ++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index 92c9a476defc..8f143c09a169 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -86,7 +86,11 @@ static int __write_console(struct xencons_info *xencons, cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ - BUG_ON((prod - cons) > sizeof(intf->out)); + + if ((prod - cons) > sizeof(intf->out)) { + pr_err_once("xencons: Illegal ring page indices"); + return -EINVAL; + } while ((sent < len) && ((prod - cons) < sizeof(intf->out))) intf->out[MASK_XENCONS_IDX(prod++, intf->out)] = data[sent++]; @@ -114,7 +118,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len) */ while (len) { int sent = __write_console(cons, data, len); - + + if (sent < 0) + return sent; + data += sent; len -= sent; @@ -138,7 +145,11 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len) cons = intf->in_cons; prod = intf->in_prod; mb(); /* get pointers before reading ring */ - BUG_ON((prod - cons) > sizeof(intf->in)); + + if ((prod - cons) > sizeof(intf->in)) { + pr_err_once("xencons: Illegal ring page indices"); + return -EINVAL; + } while (cons != prod && recv < len) buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)]; -- 2.26.2
Re: [PATCH] bus: Make remove callback return void
On 06.07.21 11:50, Uwe Kleine-König wrote: The driver core ignores the return value of this callback because there is only little it can do when a device disappears. This is the final bit of a long lasting cleanup quest where several buses were converted to also return void from their remove callback. Additionally some resource leaks were fixed that were caused by drivers returning an error code in the expectation that the driver won't go away. With struct bus_type::remove returning void it's prevented that newly implemented buses return an ignored error code and so don't anticipate wrong expectations for driver authors. Signed-off-by: Uwe Kleine-König Xen-bits: Acked-by: Juergen Gross Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: OpenPGP public key OpenPGP_signature Description: OpenPGP digital signature
[PATCH 0/8] xen: harden frontends against malicious backends
Xen backends of para-virtualized devices can live in dom0 kernel, dom0 user land, or in a driver domain. This means that a backend might reside in a less trusted environment than the Xen core components, so a backend should not be able to do harm to a Xen guest (it can still mess up I/O data, but it shouldn't be able to e.g. crash a guest by other means or cause a privilege escalation in the guest). Unfortunately many frontends in the Linux kernel are fully trusting their respective backends. This series is starting to fix the most important frontends: console, disk and network. It was discussed to handle this as a security problem, but the topic was discussed in public before, so it isn't a real secret. Juergen Gross (8): xen: sync include/xen/interface/io/ring.h with Xen's newest version xen/blkfront: read response from backend only once xen/blkfront: don't take local copy of a request from the ring page xen/blkfront: don't trust the backend response data blindly xen/netfront: read response from backend only once xen/netfront: don't read data from request on the ring page xen/netfront: don't trust the backend response data blindly xen/hvc: replace BUG_ON() with negative return value drivers/block/xen-blkfront.c| 118 +- drivers/net/xen-netfront.c | 184 ++--- drivers/tty/hvc/hvc_xen.c | 15 +- include/xen/interface/io/ring.h | 278 ++-- 4 files changed, 369 insertions(+), 226 deletions(-) -- 2.26.2
Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
On 13.05.21 12:25, Greg Kroah-Hartman wrote: On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote: Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values. Signed-off-by: Juergen Gross --- drivers/tty/hvc/hvc_xen.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index 92c9a476defc..30d7ffb1e04c 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons, cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ + + if (WARN_ONCE((prod - cons) > sizeof(intf->out), + "Illegal ring page indices")) + return -EINVAL; How nice, you just rebooted on panic-on-warn systems :( + BUG_ON((prod - cons) > sizeof(intf->out)); Why keep this line? Failed to delete it, sorry. Please just fix this up properly, if userspace can trigger this, then both the WARN_ON() and BUG_ON() are not correct and need to be correctly handled. It can be triggered by the console backend, but I agree a WARN isn't the way to go here. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: application/pgp-keys OpenPGP_signature Description: OpenPGP digital signature
Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
On 13.05.21 12:16, Christophe Leroy wrote: Le 13/05/2021 à 12:03, Juergen Gross a écrit : Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values. Signed-off-by: Juergen Gross --- drivers/tty/hvc/hvc_xen.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index 92c9a476defc..30d7ffb1e04c 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons, cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ + + if (WARN_ONCE((prod - cons) > sizeof(intf->out), + "Illegal ring page indices")) + return -EINVAL; + BUG_ON((prod - cons) > sizeof(intf->out)); Why keep the BUG_ON() ? Oh, failed to delete it. Thanks for noticing. Juergen OpenPGP_0xB0DE9DD628BF132F.asc Description: application/pgp-keys OpenPGP_signature Description: OpenPGP digital signature
[PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value
Xen frontends shouldn't BUG() in case of illegal data received from their backends. So replace the BUG_ON()s when reading illegal data from the ring page with negative return values. Signed-off-by: Juergen Gross --- drivers/tty/hvc/hvc_xen.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c index 92c9a476defc..30d7ffb1e04c 100644 --- a/drivers/tty/hvc/hvc_xen.c +++ b/drivers/tty/hvc/hvc_xen.c @@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons, cons = intf->out_cons; prod = intf->out_prod; mb(); /* update queue values before going on */ + + if (WARN_ONCE((prod - cons) > sizeof(intf->out), + "Illegal ring page indices")) + return -EINVAL; + BUG_ON((prod - cons) > sizeof(intf->out)); while ((sent < len) && ((prod - cons) < sizeof(intf->out))) @@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char *data, int len) */ while (len) { int sent = __write_console(cons, data, len); - + + if (sent < 0) + return sent; + data += sent; len -= sent; @@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, int len) cons = intf->in_cons; prod = intf->in_prod; mb(); /* get pointers before reading ring */ - BUG_ON((prod - cons) > sizeof(intf->in)); + + if (WARN_ONCE((prod - cons) > sizeof(intf->in), + "Illegal ring page indices")) + return -EINVAL; while (cons != prod && recv < len) buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)]; -- 2.26.2
Re: [Xen-devel] [PATCH 19/21] treewide: add checks for the return value of memblock_alloc*()
On 16/01/2019 14:44, Mike Rapoport wrote: > Add check for the return value of memblock_alloc*() functions and call > panic() in case of error. > The panic message repeats the one used by panicing memblock allocators with > adjustment of parameters to include only relevant ones. > > The replacement was mostly automated with semantic patches like the one > below with manual massaging of format strings. > > @@ > expression ptr, size, align; > @@ > ptr = memblock_alloc(size, align); > + if (!ptr) > + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__, > size, align); > > Signed-off-by: Mike Rapoport For the Xen part: Reviewed-by: Juergen Gross Juergen
Re: [PATCH] idle/x86: remove the call to boot_init_stack_canary() from cpu_startup_entry()
On 19/10/2018 11:29, Christophe Leroy wrote: > commit d7880812b359 ("idle: Add the stack canary init to > cpu_startup_entry()") added the call to boot_init_stack_canary() > in cpu_startup_entry() in an #ifdef CONFIG_X86 statement, with > the intention to remove that #ifdef later. > > While implementing stack protector for powerpc, it has been > observed that calling boot_init_stack_canary() is also needed > for powerpc which uses per task (TLS) stack canary like the X86. > > However, calling boot_init_stack_canary() would break arches > using global stack canary (ARM, SH, MIPS and XTENSA). > > Instead of adding modifying the #ifdef in a > implemented the call to boot_init_stack_canary() in the function > calling cpu_startup_entry() I can't parse this sentence. > > On x86, we have two functions calling cpu_startup_entry(): > - start_secondary() > - cpu_bringup_and_idle() > > start_secondary() already calls boot_init_stack_canary(). > > This patch adds the call to boot_init_stack_canary() in > cpu_bringup_and_idle() and removes it from cpu_startup_entry() > > Signed-off-by: Christophe Leroy With the commit message made understandable you can add my Reviewed-by: Juergen Gross Juergen
Re: [PATCH 2/5] stop_machine: yield CPU during stop machine
On 21/10/16 14:05, Peter Zijlstra wrote: > On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote: >> stop_machine can take a very long time if the hypervisor does >> overcommitment for guest CPUs. When waiting for "the one", lets >> give up our CPU by using the new cpu_relax_yield. > > This seems something that would apply to most other virt stuff. Lets Cc > a few more lists for that. Corrected xen-devel mail address. Juergen > >> Signed-off-by: Christian Borntraeger>> --- >> kernel/stop_machine.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c >> index ec9ab2f..1eb8266 100644 >> --- a/kernel/stop_machine.c >> +++ b/kernel/stop_machine.c >> @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data) >> /* Simple state machine */ >> do { >> /* Chill out and ensure we re-read multi_stop_state. */ >> -cpu_relax(); >> +cpu_relax_yield(); >> if (msdata->state != curstate) { >> curstate = msdata->state; >> switch (curstate) { >> -- >> 2.5.5 >> > ___ > Virtualization mailing list > virtualizat...@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/virtualization >
Re: [PATCH v5 7/9] x86, xen: support vcpu preempted check
Corrected xen-devel mailing list address, added other Xen maintainers On 20/10/16 23:27, Pan Xinhui wrote: > From: Juergen Gross <jgr...@suse.com> > > Support the vcpu_is_preempted() functionality under Xen. This will > enhance lock performance on overcommitted hosts (more runnable vcpus > than physical cpus in the system) as doing busy waits for preempted > vcpus will hurt system performance far worse than early yielding. > > A quick test (4 vcpus on 1 physical cpu doing a parallel build job > with "make -j 8") reduced system time by about 5% with this patch. > > Signed-off-by: Juergen Gross <jgr...@suse.com> > Signed-off-by: Pan Xinhui <xinhui@linux.vnet.ibm.com> > --- > arch/x86/xen/spinlock.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c > index 3d6e006..74756bb 100644 > --- a/arch/x86/xen/spinlock.c > +++ b/arch/x86/xen/spinlock.c > @@ -114,7 +114,6 @@ void xen_uninit_lock_cpu(int cpu) > per_cpu(irq_name, cpu) = NULL; > } > > - > /* > * Our init of PV spinlocks is split in two init functions due to us > * using paravirt patching and jump labels patching and having to do > @@ -137,6 +136,8 @@ void __init xen_init_spinlocks(void) > pv_lock_ops.queued_spin_unlock = > PV_CALLEE_SAVE(__pv_queued_spin_unlock); > pv_lock_ops.wait = xen_qlock_wait; > pv_lock_ops.kick = xen_qlock_kick; > + > + pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen; > } > > /* >
Re: [PATCH v4 0/5] implement vcpu preempted check
On 19/10/16 12:20, Pan Xinhui wrote: > change from v3: > add x86 vcpu preempted check patch > change from v2: > no code change, fix typos, update some comments > change from v1: > a simplier definition of default vcpu_is_preempted > skip mahcine type check on ppc, and add config. remove dedicated macro. > add one patch to drop overload of rwsem_spin_on_owner and > mutex_spin_on_owner. > add more comments > thanks boqun and Peter's suggestion. > > This patch set aims to fix lock holder preemption issues. > > test-case: > perf record -a perf bench sched messaging -g 400 -p && perf report > > 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock > 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner > 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock > 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task > 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq > 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is > 2.49% sched-messaging [kernel.vmlinux] [k] system_call > > We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin > loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner. > These spin_on_onwer variant also cause rcu stall before we apply this patch > set > > We also have observed some performace improvements. > > PPC test result: > > 1 copy - 0.94% > 2 copy - 7.17% > 4 copy - 11.9% > 8 copy - 3.04% > 16 copy - 15.11% > > details below: > Without patch: > > 1 copy - File Write 4096 bufsize 8000 maxblocks 2188223.0 KBps (30.0 s, > 1 samples) > 2 copy - File Write 4096 bufsize 8000 maxblocks 1804433.0 KBps (30.0 s, > 1 samples) > 4 copy - File Write 4096 bufsize 8000 maxblocks 1237257.0 KBps (30.0 s, > 1 samples) > 8 copy - File Write 4096 bufsize 8000 maxblocks 1032658.0 KBps (30.0 s, > 1 samples) > 16 copy - File Write 4096 bufsize 8000 maxblocks 768000.0 KBps (30.1 > s, 1 samples) > > With patch: > > 1 copy - File Write 4096 bufsize 8000 maxblocks 2209189.0 KBps (30.0 s, > 1 samples) > 2 copy - File Write 4096 bufsize 8000 maxblocks 1943816.0 KBps (30.0 s, > 1 samples) > 4 copy - File Write 4096 bufsize 8000 maxblocks 1405591.0 KBps (30.0 s, > 1 samples) > 8 copy - File Write 4096 bufsize 8000 maxblocks 1065080.0 KBps (30.0 s, > 1 samples) > 16 copy - File Write 4096 bufsize 8000 maxblocks 904762.0 KBps (30.0 > s, 1 samples) > > X86 test result: > test-case after-patch before-patch > Execl Throughput |18307.9 lps |11701.6 lps > File Copy 1024 bufsize 2000 maxblocks | 1352407.3 KBps | 790418.9 KBps > File Copy 256 bufsize 500 maxblocks| 367555.6 KBps | 222867.7 KBps > File Copy 4096 bufsize 8000 maxblocks | 3675649.7 KBps | 1780614.4 KBps > Pipe Throughput| 11872208.7 lps | 11855628.9 lps > Pipe-based Context Switching | 1495126.5 lps | 1490533.9 lps > Process Creation |29881.2 lps |28572.8 lps > Shell Scripts (1 concurrent) |23224.3 lpm |22607.4 lpm > Shell Scripts (8 concurrent) | 3531.4 lpm | 3211.9 lpm > System Call Overhead | 10385653.0 lps | 10419979.0 lps > > Pan Xinhui (5): > kernel/sched: introduce vcpu preempted check interface > locking/osq: Drop the overload of osq_lock() > kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner > powerpc/spinlock: support vcpu preempted check > x86, kvm: support vcpu preempted check The attached patch adds Xen support for x86. Please tell me whether you want to add this patch to your series or if I should post it when your series has been accepted. You can add my Tested-by: Juergen Gross <jgr...@suse.com> for patches 1-3 and 5 (paravirt parts only). Juergen > > arch/powerpc/include/asm/spinlock.h | 8 > arch/x86/include/asm/paravirt_types.h | 6 ++ > arch/x86/include/asm/spinlock.h | 8 > arch/x86/include/uapi/asm/kvm_para.h | 3 ++- > arch/x86/kernel/kvm.c | 11 +++ > arch/x86/kernel/paravirt.c| 11 +++ > arch/x86/kvm/x86.c| 12 > include/linux/sched.h | 12 > kernel/locking/mutex.c | 15 +-- > kernel/locking/osq_lock.c | 10 +- > kernel/locking/rwsem-xadd.c | 16 +--- > 11 files changed, 105 insertions(+), 7 deletions(-) > >From c79b86d00a812d6207ef788d453e2d0
Re: [PATCH v2 0/4] implement vcpu preempted check
On 11/07/16 17:10, Waiman Long wrote: > On 07/06/2016 02:52 AM, Peter Zijlstra wrote: >> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote: >>> change fomr v1: >>> a simplier definition of default vcpu_is_preempted >>> skip mahcine type check on ppc, and add config. remove dedicated >>> macro. >>> add one patch to drop overload of rwsem_spin_on_owner and >>> mutex_spin_on_owner. >>> add more comments >>> thanks boqun and Peter's suggestion. >>> >>> This patch set aims to fix lock holder preemption issues. >>> >>> test-case: >>> perf record -a perf bench sched messaging -g 400 -p&& perf report >>> >>> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock >>> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner >>> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock >>> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task >>> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq >>> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is >>> 2.49% sched-messaging [kernel.vmlinux] [k] system_call >>> >>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in >>> some spin >>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner. >>> These spin_on_onwer variant also cause rcu stall before we apply this >>> patch set >>> >> Paolo, could you help out with an (x86) KVM interface for this? >> >> Waiman, could you see if you can utilize this to get rid of the >> SPIN_THRESHOLD in qspinlock_paravirt? > > That API is certainly useful to make the paravirt spinlock perform > better. However, I am not sure if we can completely get rid of the > SPIN_THRESHOLD at this point. It is not just the kvm, the xen code need > to be modified as well. This should be rather easy. The relevant information is included in the runstate data mapped into kernel memory. I can provide a patch for Xen if needed. Juergen ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 0/4] implement vcpu preempted check
On 06/07/16 10:19, Peter Zijlstra wrote: > On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote: >> On 06/07/16 08:52, Peter Zijlstra wrote: > >>> Paolo, could you help out with an (x86) KVM interface for this? >> >> Xen support of this interface should be rather easy. Could you please >> Cc: xen-devel-requ...@lists.xenproject.org in the next version? > > So meta question; aren't all you virt people looking at the regular > virtualization list? Or should we really dig out all the various > hypervisor lists and Cc them? Hmm, good question. Up to now I didn't look at the virtualization list, just changed that. :-) Can't speak for the other virt people. Juergen ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2 0/4] implement vcpu preempted check
On 06/07/16 08:52, Peter Zijlstra wrote: > On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote: >> change fomr v1: >> a simplier definition of default vcpu_is_preempted >> skip mahcine type check on ppc, and add config. remove dedicated macro. >> add one patch to drop overload of rwsem_spin_on_owner and >> mutex_spin_on_owner. >> add more comments >> thanks boqun and Peter's suggestion. >> >> This patch set aims to fix lock holder preemption issues. >> >> test-case: >> perf record -a perf bench sched messaging -g 400 -p && perf report >> >> 18.09% sched-messaging [kernel.vmlinux] [k] osq_lock >> 12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner >> 5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock >> 3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task >> 3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq >> 3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is >> 2.49% sched-messaging [kernel.vmlinux] [k] system_call >> >> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some >> spin >> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner. >> These spin_on_onwer variant also cause rcu stall before we apply this patch >> set >> > > Paolo, could you help out with an (x86) KVM interface for this? Xen support of this interface should be rather easy. Could you please Cc: xen-devel-requ...@lists.xenproject.org in the next version? Juergen ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev