On 24.10.23 12:41, Juergen Gross wrote:
On 24.10.23 09:43, David Woodhouse wrote:
On Tue, 2023-10-24 at 08:53 +0200, Juergen Gross wrote:

I'm puzzled. This path doesn't contain any of the RCU usage I've added in
commit 87797fad6cce.

Are you sure that with just reverting commit 87797fad6cce the issue doesn't
manifest anymore? I'd rather expect commit 721255b9826b having caused this
behavior, just telling from the messages above.

Retesting in the cold light of day, yes. Using v6.6-rc5 which is the
parent commit of the offending 87797fad6cce.

I now see this warning at boot time again, which I believe was an
aspect of what you were trying to fix:

[    0.059014] xen:events: Using FIFO-based ABI
[    0.059029] xen:events: Xen HVM callback vector for event delivery is enabled
[    0.059227] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.059296]
[    0.059297] =============================
[    0.059298] [ BUG: Invalid wait context ]
[    0.059299] 6.6.0-rc5 #1374 Not tainted
[    0.059300] -----------------------------
[    0.059301] swapper/0/0 is trying to lock:
[    0.059303] ffffffff8ad595f8 (evtchn_rwlock){....}-{3:3}, at: xen_evtchn_do_upcall+0x59/0xd0

Indeed.

What I still not get is why the rcu_dereference_check() splat isn't
happening without my patch.

IMHO it should be related to the fact that cpuhp_report_idle_dead()
is trying to send an IPI via xen_send_IPI_one(), which is using
notify_remote_via_irq(), which in turn needs to call irq_get_chip_data().
This is using the maple-tree since 721255b9826b, which is using
rcu_read_lock().

I can probably change xen_send_IPI_one() to not need irq_get_chip_data().

David, could you test the attached patch, please? Build tested only.


Juergen
From 9bcb7b86578db4c14f460ce954eb35b821e11ad8 Mon Sep 17 00:00:00 2001
From: Juergen Gross <jgr...@suse.com>
Date: Tue, 24 Oct 2023 13:51:36 +0200
Subject: [PATCH] xen/events: avoid using info_for_irq() in xen_send_IPI_one()

xen_send_IPI_one() is being used by cpuhp_report_idle_dead() after
it calls rcu_report_dead(), meaning that any RCU usage by
xen_send_IPI_one() is a bad idea.

Unfortunately xen_send_IPI_one() is using notify_remote_via_irq()
today, which is using irq_get_chip_data() via info_for_irq(). And
irq_get_chip_data() in turn is using a maple-tree lookup requiring
RCU.

Avoid this problem by caching the ipi event channels in another
percpu variable, allowing the use notify_remote_via_evtchn() in
xen_send_IPI_one().

Fixes: 721255b9826b ("genirq: Use a maple tree for interrupt descriptor management")
Reported-by: David Woodhouse <dw...@infradead.org>
Signed-off-by: Juergen Gross <jgr...@suse.com>
---
 drivers/xen/events/events_base.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 1b2136fe0fa5..2cf0c2b69386 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -164,6 +164,8 @@ static DEFINE_PER_CPU(int [NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1};
 
 /* IRQ <-> IPI mapping */
 static DEFINE_PER_CPU(int [XEN_NR_IPIS], ipi_to_irq) = {[0 ... XEN_NR_IPIS-1] = -1};
+/* Cache for IPI event channels - needed for hot cpu unplug (avoid RCU usage). */
+static DEFINE_PER_CPU(evtchn_port_t [XEN_NR_IPIS], ipi_to_evtchn) = {[0 ... XEN_NR_IPIS-1] = 0};
 
 /* Event channel distribution data */
 static atomic_t channels_on_cpu[NR_CPUS];
@@ -366,6 +368,7 @@ static int xen_irq_info_ipi_setup(unsigned cpu,
 	info->u.ipi = ipi;
 
 	per_cpu(ipi_to_irq, cpu)[ipi] = irq;
+	per_cpu(ipi_to_evtchn, cpu)[ipi] = evtchn;
 
 	return xen_irq_info_common_setup(info, irq, IRQT_IPI, evtchn, 0);
 }
@@ -981,6 +984,7 @@ static void __unbind_from_irq(unsigned int irq)
 			break;
 		case IRQT_IPI:
 			per_cpu(ipi_to_irq, cpu)[ipi_from_irq(irq)] = -1;
+			per_cpu(ipi_to_evtchn, cpu)[ipi_from_irq(irq)] = 0;
 			break;
 		case IRQT_EVTCHN:
 			dev = info->u.interdomain;
@@ -1631,7 +1635,7 @@ EXPORT_SYMBOL_GPL(evtchn_put);
 
 void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)
 {
-	int irq;
+	evtchn_port_t evtchn;
 
 #ifdef CONFIG_X86
 	if (unlikely(vector == XEN_NMI_VECTOR)) {
@@ -1642,9 +1646,9 @@ void xen_send_IPI_one(unsigned int cpu, enum ipi_vector vector)
 		return;
 	}
 #endif
-	irq = per_cpu(ipi_to_irq, cpu)[vector];
-	BUG_ON(irq < 0);
-	notify_remote_via_irq(irq);
+	evtchn = per_cpu(ipi_to_evtchn, cpu)[vector];
+	BUG_ON(evtchn == 0);
+	notify_remote_via_evtchn(evtchn);
 }
 
 struct evtchn_loop_ctrl {
-- 
2.35.3

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

Reply via email to