Re: [PATCH v2 3/3] hvc/xen: fix console unplug

2023-10-23 Thread Juergen Gross

On 20.10.23 18:15, David Woodhouse wrote:

From: David Woodhouse 

On unplug of a Xen console, xencons_disconnect_backend() unconditionally
calls free_irq() via unbind_from_irqhandler(), causing a warning of
freeing an already-free IRQ:

(qemu) device_del con1
[   32.050919] [ cut here ]
[   32.050942] Trying to free already-free IRQ 33
[   32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 
__free_irq+0x1d4/0x330

It should be using evtchn_put() to tear down the event channel binding,
and let the Linux IRQ side of it be handled by notifier_del_irq() through
the HVC code.

On which topic... xencons_disconnect_backend() should call hvc_remove()
*first*, rather than tearing down the event channel and grant mapping
while they are in use. And then the IRQ is guaranteed to be freed by
the time it's torn down by evtchn_put().

Since evtchn_put() also closes the actual event channel, avoid calling
xenbus_free_evtchn() except in the failure path where the IRQ was not
successfully set up.

However, calling hvc_remove() at the start of xencons_disconnect_backend()
still isn't early enough. An unplug request is indicated by the backend
setting its state to XenbusStateClosing, which triggers a notification
to xencons_backend_changed(), which... does nothing except set its own
frontend state directly to XenbusStateClosed without *actually* tearing
down the HVC device or, you know, making sure it isn't actively in use.

So the backend sees the guest frontend set its state to XenbusStateClosed
and stops servicing the interrupt... and the guest spins for ever in the
domU_write_console() function waiting for the ring to drain.

Fix that one by calling hvc_remove() from xencons_backend_changed() before
signalling to the backend that it's OK to proceed with the removal.

Tested with 'dd if=/dev/zero of=/dev/hvc1' while telling Qemu to remove
the console device.

Signed-off-by: David Woodhouse 
Cc: sta...@vger.kernel.org


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH] hvc/xen: fix event channel handling for secondary consoles

2023-10-20 Thread Juergen Gross

On 18.10.23 01:46, David Woodhouse wrote:

From: David Woodhouse 

The xencons_connect_backend() function allocates a local interdomain
event channel with xenbus_alloc_evtchn(), then calls
bind_interdomain_evtchn_to_irq_lateeoi() to bind to that port# on the
*remote* domain.

That doesn't work very well:

(qemu) device_add xen-console,id=con1,chardev=pty0
[   44.323872] xenconsole console-1: 2 xenbus_dev_probe on device/console/1
[   44.323995] xenconsole: probe of console-1 failed with error -2

Fix it to use bind_evtchn_to_irq_lateeoi(), which does the right thing
by just binding that *local* event channel to an irq. The backend will
do the interdomain binding.

This didn't affect the primary console because the setup for that is
special — the toolstack allocates the guest event channel and the guest
discovers it with HVMOP_get_param.

Once that's fixed, there's also a warning on hot-unplug because
xencons_disconnect_backend() unconditionally calls free_irq() via
unbind_from_irqhandler():

(qemu) device_del con1
[   32.050919] [ cut here ]
[   32.050942] Trying to free already-free IRQ 33
[   32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 
__free_irq+0x1d4/0x330

Fix that by calling notifier_del_irq() first, which only calls
free_irq() if the irq was requested in the first place. Then use


I don't think the "if the irq was requested in the first place" is the correct
reasoning.

I think the problem is that notifier_del_irq() will be called another time
through the .notifier_del hook. Two calls of notifier_del_irq() are fine, but
one call of it and another call of free_irq() via unbind_from_irqhandler() is
a problem.


evtchn_put() to release the irq and event channel. Avoid calling
xenbus_free_evtchn() in the normal case, as evtchn_put() will do that
too. The only time xenbus_free_evtchn() needs to be called is for the
cleanup when bind_evtchn_to_irq_lateeoi() fails.

Finally, fix the error path in xen_hvc_init() when there's no primary
console. It should still register the frontend driver, as there may be
secondary consoles. (Qemu can always add secondary consoles, but only
the toolstack can add the primary because it's special.)

Fixes: fe415186b4 ("xen/console: harden hvc_xen against event channel storms")
Signed-off-by: David Woodhouse 
Cc: sta...@vger.kernel.org


With above fixed in the commit message:

Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: [PATCH v2] hvc/xen: prevent concurrent accesses to the shared ring

2023-03-17 Thread Juergen Gross

On 30.11.22 16:09, Roger Pau Monne wrote:

The hvc machinery registers both a console and a tty device based on
the hv ops provided by the specific implementation.  Those two
interfaces however have different locks, and there's no single locks
that's shared between the tty and the console implementations, hence
the driver needs to protect itself against concurrent accesses.
Otherwise concurrent calls using the split interfaces are likely to
corrupt the ring indexes, leaving the console unusable.

Introduce a lock to xencons_info to serialize accesses to the shared
ring.  This is only required when using the shared memory console,
concurrent accesses to the hypercall based console implementation are
not an issue.

Note the conditional logic in domU_read_console() is slightly modified
so the notify_daemon() call can be done outside of the locked region:
it's an hypercall and there's no need for it to be done with the lock
held.

Fixes: b536b4b96230 ('xen: use the hvc console infrastructure for Xen console')
Signed-off-by: Roger Pau Monné 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2] hvc/xen: lock console list traversal

2023-01-10 Thread Juergen Gross

On 30.11.22 17:36, Roger Pau Monne wrote:

The currently lockless access to the xen console list in
vtermno_to_xencons() is incorrect, as additions and removals from the
list can happen anytime, and as such the traversal of the list to get
the private console data for a given termno needs to happen with the
lock held.  Note users that modify the list already do so with the
lock taken.

Adjust current lock takers to use the _irq{save,restore} helpers,
since the context in which vtermno_to_xencons() is called can have
interrupts disabled.  Use the _irq{save,restore} set of helpers to
switch the current callers to disable interrupts in the locked region.
I haven't checked if existing users could instead use the _irq
variant, as I think it's safer to use _irq{save,restore} upfront.

While there switch from using list_for_each_entry_safe to
list_for_each_entry: the current entry cursor won't be removed as
part of the code in the loop body, so using the _safe variant is
pointless.

Fixes: 02e19f9c7cac ('hvc_xen: implement multiconsole support')
Signed-off-by: Roger Pau Monné 


Pushed to xen/tip.git for-linus-6.2


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 4/6] xen: make remove callback of xen driver void returned

2022-12-05 Thread Juergen Gross

On 05.12.22 16:36, Dawei Li wrote:

Since commit fc7a6209d571 ("bus: Make remove callback return
void") forces bus_type::remove be void-returned, it doesn't
make much sense for any bus based driver implementing remove
callbalk to return non-void to its caller.

This change is for xen bus based drivers.

Signed-off-by: Dawei Li 


Acked-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v2 30/44] cpuidle,xenpv: Make more PARAVIRT_XXL noinstr clean

2022-09-19 Thread Juergen Gross

On 19.09.22 12:00, Peter Zijlstra wrote:

vmlinux.o: warning: objtool: acpi_idle_enter_s2idle+0xde: call to wbinvd() 
leaves .noinstr.text section
vmlinux.o: warning: objtool: default_idle+0x4: call to arch_safe_halt() leaves 
.noinstr.text section
vmlinux.o: warning: objtool: xen_safe_halt+0xa: call to 
HYPERVISOR_sched_op.constprop.0() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) 
Reviewed-by: Srivatsa S. Bhat (VMware) 


Reviewed-by: Juergen Gross 


Juergen



OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 0/5] xen: cleanup detection of non-essential pv devices

2021-10-22 Thread Juergen Gross

On 22.10.21 09:24, Jan Beulich wrote:

On 22.10.2021 08:47, Juergen Gross wrote:

Today the non-essential pv devices are hard coded in the xenbus driver
and this list is lacking multiple entries.

This series reworks the detection logic of non-essential devices by
adding a flag for that purpose to struct xenbus_driver.


I'm wondering whether it wouldn't better be the other way around: The
(hopefully few) essential ones get flagged, thus also making it more
prominent during patch review that a flag gets added (and justification
provided), instead of having to spot the lack of a flag getting set.


Not flagging a non-essential one is less problematic than not flagging
an essential driver IMO.

For some drivers I'm on the edge, BTW. The pv 9pfs driver ought to be
non-essential in most cases, but there might be use cases where it is
needed, so I didn't set its non_essential flag.

Same applies to pv-usb and maybe pv-scsi, while pv-tpm probably really
is essential.

With the current series I'm ending up with 6 non-essential drivers and
6 essential ones, so either way needs the same number of drivers
modified.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH 0/5] xen: cleanup detection of non-essential pv devices

2021-10-22 Thread Juergen Gross
Today the non-essential pv devices are hard coded in the xenbus driver
and this list is lacking multiple entries.

This series reworks the detection logic of non-essential devices by
adding a flag for that purpose to struct xenbus_driver.

Juergen Gross (5):
  xen: add "not_essential" flag to struct xenbus_driver
  xen: flag xen_drm_front to be not essential for system boot
  xen: flag hvc_xen to be not essential for system boot
  xen: flag pvcalls-front to be not essential for system boot
  xen: flag xen_snd_front to be not essential for system boot

 drivers/gpu/drm/xen/xen_drm_front.c|  1 +
 drivers/input/misc/xen-kbdfront.c  |  1 +
 drivers/tty/hvc/hvc_xen.c  |  1 +
 drivers/video/fbdev/xen-fbfront.c  |  1 +
 drivers/xen/pvcalls-front.c|  1 +
 drivers/xen/xenbus/xenbus_probe_frontend.c | 14 +++---
 include/xen/xenbus.h   |  1 +
 sound/xen/xen_snd_front.c  |  1 +
 8 files changed, 10 insertions(+), 11 deletions(-)

-- 
2.26.2



[PATCH 3/5] xen: flag hvc_xen to be not essential for system boot

2021-10-22 Thread Juergen Gross
The Xen pv console driver is not essential for boot. Set the respective
flag.

Signed-off-by: Juergen Gross 
---
 drivers/tty/hvc/hvc_xen.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index f0bf01ea069a..71e0dd2c0ce5 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -522,6 +522,7 @@ static struct xenbus_driver xencons_driver = {
.remove = xencons_remove,
.resume = xencons_resume,
.otherend_changed = xencons_backend_changed,
+   .not_essential = true,
 };
 #endif /* CONFIG_HVC_XEN_FRONTEND */
 
-- 
2.26.2



Re: [PATCH 5/9] xen/x86: make "earlyprintk=xen" work for HVM/PVH DomU

2021-09-23 Thread Juergen Gross

On 07.09.21 12:10, Jan Beulich wrote:

xenboot_write_console() is dealing with these quite fine so I don't see
why xenboot_console_setup() would return -ENOENT in this case.

Adjust documentation accordingly.

Signed-off-by: Jan Beulich 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 3/9] xen/x86: make "earlyprintk=xen" work better for PVH Dom0

2021-09-23 Thread Juergen Gross

On 07.09.21 12:09, Jan Beulich wrote:

The xen_hvm_early_write() path better wouldn't be taken in this case;
while port 0xE9 can be used, the hypercall path is quite a bit more
efficient. Put that first, as it may also work for DomU-s (see also
xen_raw_console_write()).

While there also bail from the function when the first
domU_write_console() failed - later ones aren't going to succeed.

Signed-off-by: Jan Beulich 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 3/3] memblock: cleanup memblock_free interface

2021-09-23 Thread Juergen Gross

On 23.09.21 09:43, Mike Rapoport wrote:

From: Mike Rapoport 

For ages memblock_free() interface dealt with physical addresses even
despite the existence of memblock_alloc_xx() functions that return a
virtual pointer.

Introduce memblock_phys_free() for freeing physical ranges and repurpose
memblock_free() to free virtual pointers to make the following pairing
abundantly clear:

int memblock_phys_free(phys_addr_t base, phys_addr_t size);
phys_addr_t memblock_phys_alloc(phys_addr_t base, phys_addr_t size);

void *memblock_alloc(phys_addr_t size, phys_addr_t align);
void memblock_free(void *ptr, size_t size);

Replace intermediate memblock_free_ptr() with memblock_free() and drop
unnecessary aliases memblock_free_early() and memblock_free_early_nid().

Suggested-by: Linus Torvalds 
Signed-off-by: Mike Rapoport 


arch/x86/xen/ parts: Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 2/3] xen/x86: free_p2m_page: use memblock_free_ptr() to free a virtual pointer

2021-09-23 Thread Juergen Gross

On 23.09.21 09:43, Mike Rapoport wrote:

From: Mike Rapoport 

free_p2m_page() wrongly passes a virtual pointer to memblock_free() that
treats it as a physical address.

Call memblock_free_ptr() instead that gets a virtual address to free the
memory.

Signed-off-by: Mike Rapoport 


Reviewed-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH 0/2] kvm: fix KVM_MAX_VCPU_ID handling

2021-09-13 Thread Juergen Gross
Revert commit 76b4f357d0e7d8f6f00 which was based on wrong reasoning
and rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS in order to avoid the
same issue in future.

Juergen Gross (2):
  x86/kvm: revert commit 76b4f357d0e7d8f6f00
  kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS

 Documentation/virt/kvm/devices/xics.rst| 2 +-
 Documentation/virt/kvm/devices/xive.rst| 2 +-
 arch/mips/kvm/mips.c   | 2 +-
 arch/powerpc/include/asm/kvm_book3s.h  | 2 +-
 arch/powerpc/include/asm/kvm_host.h| 4 ++--
 arch/powerpc/kvm/book3s_xive.c | 2 +-
 arch/powerpc/kvm/powerpc.c | 2 +-
 arch/x86/include/asm/kvm_host.h| 2 +-
 arch/x86/kvm/ioapic.c  | 2 +-
 arch/x86/kvm/ioapic.h  | 4 ++--
 arch/x86/kvm/x86.c | 2 +-
 include/linux/kvm_host.h   | 4 ++--
 tools/testing/selftests/kvm/kvm_create_max_vcpus.c | 2 +-
 virt/kvm/kvm_main.c| 2 +-
 14 files changed, 17 insertions(+), 17 deletions(-)

-- 
2.26.2



[PATCH 2/2] kvm: rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS

2021-09-13 Thread Juergen Gross
KVM_MAX_VCPU_ID is not specifying the highest allowed vcpu-id, but the
number of allowed vcpu-ids. This has already led to confusion, so
rename KVM_MAX_VCPU_ID to KVM_MAX_VCPU_IDS to make its semantics more
clear

Suggested-by: Eduardo Habkost 
Signed-off-by: Juergen Gross 
---
 Documentation/virt/kvm/devices/xics.rst| 2 +-
 Documentation/virt/kvm/devices/xive.rst| 2 +-
 arch/mips/kvm/mips.c   | 2 +-
 arch/powerpc/include/asm/kvm_book3s.h  | 2 +-
 arch/powerpc/include/asm/kvm_host.h| 4 ++--
 arch/powerpc/kvm/book3s_xive.c | 2 +-
 arch/powerpc/kvm/powerpc.c | 2 +-
 arch/x86/include/asm/kvm_host.h| 2 +-
 arch/x86/kvm/ioapic.c  | 2 +-
 arch/x86/kvm/ioapic.h  | 4 ++--
 arch/x86/kvm/x86.c | 2 +-
 include/linux/kvm_host.h   | 4 ++--
 tools/testing/selftests/kvm/kvm_create_max_vcpus.c | 2 +-
 virt/kvm/kvm_main.c| 2 +-
 14 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/Documentation/virt/kvm/devices/xics.rst 
b/Documentation/virt/kvm/devices/xics.rst
index 2d6927e0b776..bf32c77174ab 100644
--- a/Documentation/virt/kvm/devices/xics.rst
+++ b/Documentation/virt/kvm/devices/xics.rst
@@ -22,7 +22,7 @@ Groups:
   Errors:
 
 ===  ==
--EINVAL  Value greater than KVM_MAX_VCPU_ID.
+-EINVAL  Value greater than KVM_MAX_VCPU_IDS.
 -EFAULT  Invalid user pointer for attr->addr.
 -EBUSY   A vcpu is already connected to the device.
 ===  ==
diff --git a/Documentation/virt/kvm/devices/xive.rst 
b/Documentation/virt/kvm/devices/xive.rst
index 8bdf3dc38f01..8b5e7b40bdf8 100644
--- a/Documentation/virt/kvm/devices/xive.rst
+++ b/Documentation/virt/kvm/devices/xive.rst
@@ -91,7 +91,7 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
 Errors:
 
   ===  ==
-  -EINVAL  Value greater than KVM_MAX_VCPU_ID.
+  -EINVAL  Value greater than KVM_MAX_VCPU_IDS.
   -EFAULT  Invalid user pointer for attr->addr.
   -EBUSY   A vCPU is already connected to the device.
   ===  ==
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 75c6f264c626..562aa878b266 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -1073,7 +1073,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long 
ext)
r = KVM_MAX_VCPUS;
break;
case KVM_CAP_MAX_VCPU_ID:
-   r = KVM_MAX_VCPU_ID;
+   r = KVM_MAX_VCPU_IDS;
break;
case KVM_CAP_MIPS_FPU:
/* We don't handle systems with inconsistent cpu_has_fpu */
diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index caaa0f592d8e..3d31f2c59e43 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -434,7 +434,7 @@ extern int kvmppc_h_logical_ci_store(struct kvm_vcpu *vcpu);
 #define SPLIT_HACK_OFFS0xfb00
 
 /*
- * This packs a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
+ * This packs a VCPU ID from the [0..KVM_MAX_VCPU_IDS) space down to the
  * [0..KVM_MAX_VCPUS) space, using knowledge of the guest's core stride
  * (but not its actual threading mode, which is not available) to avoid
  * collisions.
diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 080a7feb7731..59cb38b04ede 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -33,11 +33,11 @@
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #include /* for MAX_SMT_THREADS */
-#define KVM_MAX_VCPU_ID(MAX_SMT_THREADS * KVM_MAX_VCORES)
+#define KVM_MAX_VCPU_IDS   (MAX_SMT_THREADS * KVM_MAX_VCORES)
 #define KVM_MAX_NESTED_GUESTS  KVMPPC_NR_LPIDS
 
 #else
-#define KVM_MAX_VCPU_IDKVM_MAX_VCPUS
+#define KVM_MAX_VCPU_IDS   KVM_MAX_VCPUS
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index a18db9e16ea4..225008882958 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1928,7 +1928,7 @@ int kvmppc_xive_set_nr_servers(struct kvmppc_xive *xive, 
u64 addr)
 
pr_devel("%s nr_servers=%u\n", __func__, nr_servers);
 
-   if (!nr_servers || nr_servers > KVM_MAX_VCPU_ID)
+   if (!nr_servers || nr_servers > KVM_MAX_VCPU_IDS)
return -EINVAL;
 
mutex_lock(>lock);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index b4e6f70b97b9..8ab90ce8738f 100644
--- a/arch/po

Re: [PATCH v2] xen/hvc: replace BUG_ON() with negative return value

2021-07-07 Thread Juergen Gross

On 07.07.21 11:57, Jan Beulich wrote:

On 07.07.2021 11:10, Juergen Gross wrote:

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 


Reviewed-by: Jan Beulich 


--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,7 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */


Largely unrelated note: While in general the barriers here may want
switching to virt_*mb(), this particular one looks to be too heavy
anyway: a read barrier is all that's needed here afaict, just like
there's only a write barrier between ring contents and producer
writing in __write_console().


I agree.


And btw, since I've got puzzled by the linuxppc-dev@ in the recipients
list, I did look up relevant entries in ./MAINTAINERS. Shouldn't the
file be part of "XEN HYPERVISOR INTERFACE"?


I wouldn't mind. Greg, Jiri, what do you think?


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH v2] xen/hvc: replace BUG_ON() with negative return value

2021-07-07 Thread Juergen Gross
Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
V2:
- drop BUG_ON() (Christophe Leroy, Greg Kroah-Hartmann)
- replace WARN_ONCE() by pr_err_once() (Greg Kroah-Hartmann)
- break out from original series
---
 drivers/tty/hvc/hvc_xen.c | 17 ++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..8f143c09a169 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,7 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */
-   BUG_ON((prod - cons) > sizeof(intf->out));
+
+   if ((prod - cons) > sizeof(intf->out)) {
+   pr_err_once("xencons: Illegal ring page indices");
+   return -EINVAL;
+   }
 
while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
intf->out[MASK_XENCONS_IDX(prod++, intf->out)] = data[sent++];
@@ -114,7 +118,10 @@ static int domU_write_console(uint32_t vtermno, const char 
*data, int len)
 */
while (len) {
int sent = __write_console(cons, data, len);
-   
+
+   if (sent < 0)
+   return sent;
+
data += sent;
len -= sent;
 
@@ -138,7 +145,11 @@ static int domU_read_console(uint32_t vtermno, char *buf, 
int len)
cons = intf->in_cons;
prod = intf->in_prod;
mb();   /* get pointers before reading ring */
-   BUG_ON((prod - cons) > sizeof(intf->in));
+
+   if ((prod - cons) > sizeof(intf->in)) {
+   pr_err_once("xencons: Illegal ring page indices");
+   return -EINVAL;
+   }
 
while (cons != prod && recv < len)
buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
-- 
2.26.2



Re: [PATCH] bus: Make remove callback return void

2021-07-06 Thread Juergen Gross

On 06.07.21 11:50, Uwe Kleine-König wrote:

The driver core ignores the return value of this callback because there
is only little it can do when a device disappears.

This is the final bit of a long lasting cleanup quest where several
buses were converted to also return void from their remove callback.
Additionally some resource leaks were fixed that were caused by drivers
returning an error code in the expectation that the driver won't go
away.

With struct bus_type::remove returning void it's prevented that newly
implemented buses return an ignored error code and so don't anticipate
wrong expectations for driver authors.

Signed-off-by: Uwe Kleine-König 


Xen-bits:

Acked-by: Juergen Gross 


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH 0/8] xen: harden frontends against malicious backends

2021-05-13 Thread Juergen Gross
Xen backends of para-virtualized devices can live in dom0 kernel, dom0
user land, or in a driver domain. This means that a backend might
reside in a less trusted environment than the Xen core components, so
a backend should not be able to do harm to a Xen guest (it can still
mess up I/O data, but it shouldn't be able to e.g. crash a guest by
other means or cause a privilege escalation in the guest).

Unfortunately many frontends in the Linux kernel are fully trusting
their respective backends. This series is starting to fix the most
important frontends: console, disk and network.

It was discussed to handle this as a security problem, but the topic
was discussed in public before, so it isn't a real secret.

Juergen Gross (8):
  xen: sync include/xen/interface/io/ring.h with Xen's newest version
  xen/blkfront: read response from backend only once
  xen/blkfront: don't take local copy of a request from the ring page
  xen/blkfront: don't trust the backend response data blindly
  xen/netfront: read response from backend only once
  xen/netfront: don't read data from request on the ring page
  xen/netfront: don't trust the backend response data blindly
  xen/hvc: replace BUG_ON() with negative return value

 drivers/block/xen-blkfront.c| 118 +-
 drivers/net/xen-netfront.c  | 184 ++---
 drivers/tty/hvc/hvc_xen.c   |  15 +-
 include/xen/interface/io/ring.h | 278 ++--
 4 files changed, 369 insertions(+), 226 deletions(-)

-- 
2.26.2



Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Juergen Gross

On 13.05.21 12:25, Greg Kroah-Hartman wrote:

On Thu, May 13, 2021 at 12:03:02PM +0200, Juergen Gross wrote:

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
  drivers/tty/hvc/hvc_xen.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+ "Illegal ring page indices"))
+   return -EINVAL;


How nice, you just rebooted on panic-on-warn systems :(


+
BUG_ON((prod - cons) > sizeof(intf->out));


Why keep this line?


Failed to delete it, sorry.



Please just fix this up properly, if userspace can trigger this, then
both the WARN_ON() and BUG_ON() are not correct and need to be correctly
handled.


It can be triggered by the console backend, but I agree a WARN isn't the
way to go here.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Juergen Gross

On 13.05.21 12:16, Christophe Leroy wrote:



Le 13/05/2021 à 12:03, Juergen Gross a écrit :

Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
  drivers/tty/hvc/hvc_xen.c | 15 +--
  1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info 
*xencons,

  cons = intf->out_cons;
  prod = intf->out_prod;
  mb();    /* update queue values before going on */
+
+    if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+  "Illegal ring page indices"))
+    return -EINVAL;
+
  BUG_ON((prod - cons) > sizeof(intf->out));


Why keep the BUG_ON() ?


Oh, failed to delete it. Thanks for noticing.


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH 8/8] xen/hvc: replace BUG_ON() with negative return value

2021-05-13 Thread Juergen Gross
Xen frontends shouldn't BUG() in case of illegal data received from
their backends. So replace the BUG_ON()s when reading illegal data from
the ring page with negative return values.

Signed-off-by: Juergen Gross 
---
 drivers/tty/hvc/hvc_xen.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 92c9a476defc..30d7ffb1e04c 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -86,6 +86,11 @@ static int __write_console(struct xencons_info *xencons,
cons = intf->out_cons;
prod = intf->out_prod;
mb();   /* update queue values before going on */
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->out),
+ "Illegal ring page indices"))
+   return -EINVAL;
+
BUG_ON((prod - cons) > sizeof(intf->out));
 
while ((sent < len) && ((prod - cons) < sizeof(intf->out)))
@@ -114,7 +119,10 @@ static int domU_write_console(uint32_t vtermno, const char 
*data, int len)
 */
while (len) {
int sent = __write_console(cons, data, len);
-   
+
+   if (sent < 0)
+   return sent;
+
data += sent;
len -= sent;
 
@@ -138,7 +146,10 @@ static int domU_read_console(uint32_t vtermno, char *buf, 
int len)
cons = intf->in_cons;
prod = intf->in_prod;
mb();   /* get pointers before reading ring */
-   BUG_ON((prod - cons) > sizeof(intf->in));
+
+   if (WARN_ONCE((prod - cons) > sizeof(intf->in),
+ "Illegal ring page indices"))
+   return -EINVAL;
 
while (cons != prod && recv < len)
buf[recv++] = intf->in[MASK_XENCONS_IDX(cons++, intf->in)];
-- 
2.26.2



Re: [Xen-devel] [PATCH 19/21] treewide: add checks for the return value of memblock_alloc*()

2019-01-16 Thread Juergen Gross
On 16/01/2019 14:44, Mike Rapoport wrote:
> Add check for the return value of memblock_alloc*() functions and call
> panic() in case of error.
> The panic message repeats the one used by panicing memblock allocators with
> adjustment of parameters to include only relevant ones.
> 
> The replacement was mostly automated with semantic patches like the one
> below with manual massaging of format strings.
> 
> @@
> expression ptr, size, align;
> @@
> ptr = memblock_alloc(size, align);
> + if (!ptr)
> + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", __func__,
> size, align);
> 
> Signed-off-by: Mike Rapoport 

For the Xen part:

Reviewed-by: Juergen Gross 


Juergen


Re: [PATCH] idle/x86: remove the call to boot_init_stack_canary() from cpu_startup_entry()

2018-10-19 Thread Juergen Gross
On 19/10/2018 11:29, Christophe Leroy wrote:
> commit d7880812b359 ("idle: Add the stack canary init to
> cpu_startup_entry()") added the call to boot_init_stack_canary()
> in cpu_startup_entry() in an #ifdef CONFIG_X86 statement, with
> the intention to remove that #ifdef later.
> 
> While implementing stack protector for powerpc, it has been
> observed that calling boot_init_stack_canary() is also needed
> for powerpc which uses per task (TLS) stack canary like the X86.
> 
> However, calling boot_init_stack_canary() would break arches
> using global stack canary (ARM, SH, MIPS and XTENSA).
> 
> Instead of adding modifying the #ifdef in a
> implemented the call to boot_init_stack_canary() in the function
> calling cpu_startup_entry()

I can't parse this sentence.

> 
> On x86, we have two functions calling cpu_startup_entry():
> - start_secondary()
> - cpu_bringup_and_idle()
> 
> start_secondary() already calls boot_init_stack_canary().
> 
> This patch adds the call to boot_init_stack_canary() in
> cpu_bringup_and_idle() and removes it from cpu_startup_entry()
> 
> Signed-off-by: Christophe Leroy 

With the commit message made understandable you can add my

Reviewed-by: Juergen Gross 


Juergen


Re: [PATCH 2/5] stop_machine: yield CPU during stop machine

2016-10-21 Thread Juergen Gross
On 21/10/16 14:05, Peter Zijlstra wrote:
> On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote:
>> stop_machine can take a very long time if the hypervisor does
>> overcommitment for guest CPUs. When waiting for "the one", lets
>> give up our CPU by using the new cpu_relax_yield.
> 
> This seems something that would apply to most other virt stuff. Lets Cc
> a few more lists for that.

Corrected xen-devel mail address.


Juergen

> 
>> Signed-off-by: Christian Borntraeger 
>> ---
>>  kernel/stop_machine.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index ec9ab2f..1eb8266 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
>>  /* Simple state machine */
>>  do {
>>  /* Chill out and ensure we re-read multi_stop_state. */
>> -cpu_relax();
>> +cpu_relax_yield();
>>  if (msdata->state != curstate) {
>>  curstate = msdata->state;
>>  switch (curstate) {
>> -- 
>> 2.5.5
>>
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> 



Re: [PATCH v5 7/9] x86, xen: support vcpu preempted check

2016-10-20 Thread Juergen Gross
Corrected xen-devel mailing list address, added other Xen maintainers

On 20/10/16 23:27, Pan Xinhui wrote:
> From: Juergen Gross <jgr...@suse.com>
> 
> Support the vcpu_is_preempted() functionality under Xen. This will
> enhance lock performance on overcommitted hosts (more runnable vcpus
> than physical cpus in the system) as doing busy waits for preempted
> vcpus will hurt system performance far worse than early yielding.
> 
> A quick test (4 vcpus on 1 physical cpu doing a parallel build job
> with "make -j 8") reduced system time by about 5% with this patch.
> 
> Signed-off-by: Juergen Gross <jgr...@suse.com>
> Signed-off-by: Pan Xinhui <xinhui@linux.vnet.ibm.com>
> ---
>  arch/x86/xen/spinlock.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 3d6e006..74756bb 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -114,7 +114,6 @@ void xen_uninit_lock_cpu(int cpu)
>   per_cpu(irq_name, cpu) = NULL;
>  }
>  
> -
>  /*
>   * Our init of PV spinlocks is split in two init functions due to us
>   * using paravirt patching and jump labels patching and having to do
> @@ -137,6 +136,8 @@ void __init xen_init_spinlocks(void)
>   pv_lock_ops.queued_spin_unlock = 
> PV_CALLEE_SAVE(__pv_queued_spin_unlock);
>   pv_lock_ops.wait = xen_qlock_wait;
>   pv_lock_ops.kick = xen_qlock_kick;
> +
> + pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen;
>  }
>  
>  /*
> 



Re: [PATCH v4 0/5] implement vcpu preempted check

2016-10-19 Thread Juergen Gross
On 19/10/16 12:20, Pan Xinhui wrote:
> change from v3:
>   add x86 vcpu preempted check patch
> change from v2:
>   no code change, fix typos, update some comments
> change from v1:
>   a simplier definition of default vcpu_is_preempted
>   skip mahcine type check on ppc, and add config. remove dedicated macro.
>   add one patch to drop overload of rwsem_spin_on_owner and 
> mutex_spin_on_owner. 
>   add more comments
>   thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch 
> set
> 
> We also have observed some performace improvements.
> 
> PPC test result:
> 
> 1 copy - 0.94%
> 2 copy - 7.17%
> 4 copy - 11.9%
> 8 copy -  3.04%
> 16 copy - 15.11%
> 
> details below:
> Without patch:
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks  2188223.0 KBps  (30.0 s, 
> 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks  1804433.0 KBps  (30.0 s, 
> 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks  1237257.0 KBps  (30.0 s, 
> 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks  1032658.0 KBps  (30.0 s, 
> 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks   768000.0 KBps  (30.1 
> s, 1 samples)
> 
> With patch: 
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks  2209189.0 KBps  (30.0 s, 
> 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks  1943816.0 KBps  (30.0 s, 
> 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks  1405591.0 KBps  (30.0 s, 
> 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks  1065080.0 KBps  (30.0 s, 
> 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks   904762.0 KBps  (30.0 
> s, 1 samples)
> 
> X86 test result:
>   test-case   after-patch   before-patch
> Execl Throughput   |18307.9 lps  |11701.6 lps 
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks|   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput| 11872208.7 lps  | 11855628.9 lps 
> Pipe-based Context Switching   |  1495126.5 lps  |  1490533.9 lps 
> Process Creation   |29881.2 lps  |28572.8 lps 
> Shell Scripts (1 concurrent)   |23224.3 lpm  |22607.4 lpm 
> Shell Scripts (8 concurrent)   | 3531.4 lpm  | 3211.9 lpm 
> System Call Overhead   | 10385653.0 lps  | 10419979.0 lps 
> 
> Pan Xinhui (5):
>   kernel/sched: introduce vcpu preempted check interface
>   locking/osq: Drop the overload of osq_lock()
>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>   powerpc/spinlock: support vcpu preempted check
>   x86, kvm: support vcpu preempted check

The attached patch adds Xen support for x86. Please tell me whether you
want to add this patch to your series or if I should post it when your
series has been accepted.

You can add my

Tested-by: Juergen Gross <jgr...@suse.com>

for patches 1-3 and 5 (paravirt parts only).


Juergen

> 
>  arch/powerpc/include/asm/spinlock.h   |  8 
>  arch/x86/include/asm/paravirt_types.h |  6 ++
>  arch/x86/include/asm/spinlock.h   |  8 
>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>  arch/x86/kernel/kvm.c | 11 +++
>  arch/x86/kernel/paravirt.c| 11 +++
>  arch/x86/kvm/x86.c| 12 
>  include/linux/sched.h | 12 
>  kernel/locking/mutex.c    | 15 +--
>  kernel/locking/osq_lock.c | 10 +-
>  kernel/locking/rwsem-xadd.c   | 16 +---
>  11 files changed, 105 insertions(+), 7 deletions(-)
> 

>From c79b86d00a812d6207ef788d453e2d0

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-11 Thread Juergen Gross
On 11/07/16 17:10, Waiman Long wrote:
> On 07/06/2016 02:52 AM, Peter Zijlstra wrote:
>> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>>> change fomr v1:
>>> a simplier definition of default vcpu_is_preempted
>>> skip mahcine type check on ppc, and add config. remove dedicated
>>> macro.
>>> add one patch to drop overload of rwsem_spin_on_owner and
>>> mutex_spin_on_owner.
>>> add more comments
>>> thanks boqun and Peter's suggestion.
>>>
>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> test-case:
>>> perf record -a perf bench sched messaging -g 400 -p&&  perf report
>>>
>>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>>   5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>>   3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>>   3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>>   3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>>   2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>>
>>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in
>>> some spin
>>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>>> These spin_on_onwer variant also cause rcu stall before we apply this
>>> patch set
>>>
>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Waiman, could you see if you can utilize this to get rid of the
>> SPIN_THRESHOLD in qspinlock_paravirt?
> 
> That API is certainly useful to make the paravirt spinlock perform
> better. However, I am not sure if we can completely get rid of the
> SPIN_THRESHOLD at this point. It is not just the kvm, the xen code need
> to be modified as well.

This should be rather easy. The relevant information is included in the
runstate data mapped into kernel memory. I can provide a patch for Xen
if needed.


Juergen
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Juergen Gross
On 06/07/16 10:19, Peter Zijlstra wrote:
> On Wed, Jul 06, 2016 at 09:47:18AM +0200, Juergen Gross wrote:
>> On 06/07/16 08:52, Peter Zijlstra wrote:
> 
>>> Paolo, could you help out with an (x86) KVM interface for this?
>>
>> Xen support of this interface should be rather easy. Could you please
>> Cc: xen-devel-requ...@lists.xenproject.org in the next version?
> 
> So meta question; aren't all you virt people looking at the regular
> virtualization list? Or should we really dig out all the various
> hypervisor lists and Cc them?

Hmm, good question. Up to now I didn't look at the virtualization list,
just changed that. :-)

Can't speak for the other virt people.


Juergen

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v2 0/4] implement vcpu preempted check

2016-07-06 Thread Juergen Gross
On 06/07/16 08:52, Peter Zijlstra wrote:
> On Tue, Jun 28, 2016 at 10:43:07AM -0400, Pan Xinhui wrote:
>> change fomr v1:
>>  a simplier definition of default vcpu_is_preempted
>>  skip mahcine type check on ppc, and add config. remove dedicated macro.
>>  add one patch to drop overload of rwsem_spin_on_owner and 
>> mutex_spin_on_owner. 
>>  add more comments
>>  thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>>
>> test-case:
>> perf record -a perf bench sched messaging -g 400 -p && perf report
>>
>> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
>> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
>>
>> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some 
>> spin
>> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
>> These spin_on_onwer variant also cause rcu stall before we apply this patch 
>> set
>>
> 
> Paolo, could you help out with an (x86) KVM interface for this?

Xen support of this interface should be rather easy. Could you please
Cc: xen-devel-requ...@lists.xenproject.org in the next version?


Juergen
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev