Re: Linux 6.7-rc1+: WARNING at drivers/xen/evtchn.c:167 evtchn_interrupt

2024-03-12 Thread Demi Marie Obenour
On Tue, Mar 12, 2024 at 02:10:02PM +0100, Juergen Gross wrote:
> On 30.11.23 03:34, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > While testing 6.7-rc3 on Qubes OS I found several warning like in the
> > subject in dom0 log. I see them when running 6.7-rc1 too. I'm not sure
> > what exactly triggers the issue, but my guess would be unbinding an
> > event channel from userspace (closing vchan connection).
> > 
> > Specific message:
> > 
> > [   83.973377] [ cut here ]
> > [   83.975523] Interrupt for port 77, but apparently not enabled; per-user 
> > a0e9f1d1
> 
> Finally I think I have a fix (thanks to Demi for finding the problematic patch
> through bisecting).
> 
> Could you please try the attached patch? It is based on current upstream, but
> I think it should apply to 6.7 or stable 6.6, too.
> 
> 
> Juergen

> From 9d673c37b2d0c9aa274c53f619c4e9e43a419f41 Mon Sep 17 00:00:00 2001
> From: Juergen Gross 
> To: linux-ker...@vger.kernel.org
> Cc: Juergen Gross 
> Cc: Stefano Stabellini 
> Cc: Oleksandr Tyshchenko 
> Cc: xen-devel@lists.xenproject.org
> Date: Tue, 12 Mar 2024 13:52:24 +0100
> Subject: [PATCH] xen/evtchn: avoid WARN() when unbinding an event channel
> 
> When unbinding a user event channel, the related handler might be
> called a last time in case the kernel was built with
> CONFIG_DEBUG_SHIRQ. This might cause a WARN() in the handler.
> 
> Avoid that by adding an "unbinding" flag to struct user_event which
> will short circuit the handler.
> 
> Fixes: 9e90e58c11b7 ("xen: evtchn: Allow shared registration of IRQ handers")
> Signed-off-by: Juergen Gross 
> ---
>  drivers/xen/evtchn.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
> index 59717628ca42..f6a2216c2c87 100644
> --- a/drivers/xen/evtchn.c
> +++ b/drivers/xen/evtchn.c
> @@ -85,6 +85,7 @@ struct user_evtchn {
>   struct per_user_data *user;
>   evtchn_port_t port;
>   bool enabled;
> + bool unbinding;
>  };
>  
>  static void evtchn_free_ring(evtchn_port_t *ring)
> @@ -164,6 +165,10 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
>   struct per_user_data *u = evtchn->user;
>   unsigned int prod, cons;
>  
> + /* Handler might be called when tearing down the IRQ. */
> + if (evtchn->unbinding)
> + return IRQ_HANDLED;
> +
>   WARN(!evtchn->enabled,
>"Interrupt for port %u, but apparently not enabled; per-user %p\n",
>evtchn->port, u);
> @@ -421,6 +426,7 @@ static void evtchn_unbind_from_user(struct per_user_data 
> *u,
>  
>   BUG_ON(irq < 0);
>  
> + evtchn->unbinding = true;
>   unbind_from_irqhandler(irq, evtchn);
>  
>   del_evtchn(u, evtchn);
> -- 
> 2.35.3

That fixes the problem, thanks!  Code looks good to me.

Reported-by: Demi Marie Obenour 
Tested-by: Demi Marie Obenour 
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab


signature.asc
Description: PGP signature


Re: Linux 6.7-rc1+: WARNING at drivers/xen/evtchn.c:167 evtchn_interrupt

2024-03-12 Thread Jürgen Groß

On 30.11.23 03:34, Marek Marczykowski-Górecki wrote:

Hi,

While testing 6.7-rc3 on Qubes OS I found several warning like in the
subject in dom0 log. I see them when running 6.7-rc1 too. I'm not sure
what exactly triggers the issue, but my guess would be unbinding an
event channel from userspace (closing vchan connection).

Specific message:

[   83.973377] [ cut here ]
[   83.975523] Interrupt for port 77, but apparently not enabled; per-user 
a0e9f1d1


Finally I think I have a fix (thanks to Demi for finding the problematic patch
through bisecting).

Could you please try the attached patch? It is based on current upstream, but
I think it should apply to 6.7 or stable 6.6, too.


Juergen
From 9d673c37b2d0c9aa274c53f619c4e9e43a419f41 Mon Sep 17 00:00:00 2001
From: Juergen Gross 
To: linux-ker...@vger.kernel.org
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Oleksandr Tyshchenko 
Cc: xen-devel@lists.xenproject.org
Date: Tue, 12 Mar 2024 13:52:24 +0100
Subject: [PATCH] xen/evtchn: avoid WARN() when unbinding an event channel

When unbinding a user event channel, the related handler might be
called a last time in case the kernel was built with
CONFIG_DEBUG_SHIRQ. This might cause a WARN() in the handler.

Avoid that by adding an "unbinding" flag to struct user_event which
will short circuit the handler.

Fixes: 9e90e58c11b7 ("xen: evtchn: Allow shared registration of IRQ handers")
Signed-off-by: Juergen Gross 
---
 drivers/xen/evtchn.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/xen/evtchn.c b/drivers/xen/evtchn.c
index 59717628ca42..f6a2216c2c87 100644
--- a/drivers/xen/evtchn.c
+++ b/drivers/xen/evtchn.c
@@ -85,6 +85,7 @@ struct user_evtchn {
 	struct per_user_data *user;
 	evtchn_port_t port;
 	bool enabled;
+	bool unbinding;
 };
 
 static void evtchn_free_ring(evtchn_port_t *ring)
@@ -164,6 +165,10 @@ static irqreturn_t evtchn_interrupt(int irq, void *data)
 	struct per_user_data *u = evtchn->user;
 	unsigned int prod, cons;
 
+	/* Handler might be called when tearing down the IRQ. */
+	if (evtchn->unbinding)
+		return IRQ_HANDLED;
+
 	WARN(!evtchn->enabled,
 	 "Interrupt for port %u, but apparently not enabled; per-user %p\n",
 	 evtchn->port, u);
@@ -421,6 +426,7 @@ static void evtchn_unbind_from_user(struct per_user_data *u,
 
 	BUG_ON(irq < 0);
 
+	evtchn->unbinding = true;
 	unbind_from_irqhandler(irq, evtchn);
 
 	del_evtchn(u, evtchn);
-- 
2.35.3



Re: Linux 6.7-rc1+: WARNING at drivers/xen/evtchn.c:167 evtchn_interrupt

2024-01-29 Thread Marek Marczykowski-Górecki
On Sat, Dec 02, 2023 at 10:46:38PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Dec 01, 2023 at 09:29:30AM +0100, Juergen Gross wrote:
> > Hi Marek,
> > 
> > On 30.11.23 03:34, Marek Marczykowski-Górecki wrote:
> > > Hi,
> > > 
> > > While testing 6.7-rc3 on Qubes OS I found several warning like in the
> > > subject in dom0 log. I see them when running 6.7-rc1 too. I'm not sure
> > > what exactly triggers the issue, but my guess would be unbinding an
> > > event channel from userspace (closing vchan connection).
> > > 
> > > Specific message:
> > > 
> > > [   83.973377] [ cut here ]
> > > [   83.975523] Interrupt for port 77, but apparently not enabled; 
> > > per-user a0e9f1d1
> > 
> > Just a guess, but I think this might happen when the vchan connection
> > is closed while there is still some traffic. This could result in events
> > triggering in parallel to unbinding the event channel.
> > 
> > Could you please test the attached patch (only compile tested)?
> 
> Unfortunately that doesn't help, I get exactly the same traceback.

Hi, this is still an issue on 6.7.2, any other ideas?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: Linux 6.7-rc1+: WARNING at drivers/xen/evtchn.c:167 evtchn_interrupt

2023-12-02 Thread Marek Marczykowski-Górecki
On Fri, Dec 01, 2023 at 09:29:30AM +0100, Juergen Gross wrote:
> Hi Marek,
> 
> On 30.11.23 03:34, Marek Marczykowski-Górecki wrote:
> > Hi,
> > 
> > While testing 6.7-rc3 on Qubes OS I found several warning like in the
> > subject in dom0 log. I see them when running 6.7-rc1 too. I'm not sure
> > what exactly triggers the issue, but my guess would be unbinding an
> > event channel from userspace (closing vchan connection).
> > 
> > Specific message:
> > 
> > [   83.973377] [ cut here ]
> > [   83.975523] Interrupt for port 77, but apparently not enabled; per-user 
> > a0e9f1d1
> 
> Just a guess, but I think this might happen when the vchan connection
> is closed while there is still some traffic. This could result in events
> triggering in parallel to unbinding the event channel.
> 
> Could you please test the attached patch (only compile tested)?

Unfortunately that doesn't help, I get exactly the same traceback.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature


Re: Linux 6.7-rc1+: WARNING at drivers/xen/evtchn.c:167 evtchn_interrupt

2023-12-01 Thread Juergen Gross

Hi Marek,

On 30.11.23 03:34, Marek Marczykowski-Górecki wrote:

Hi,

While testing 6.7-rc3 on Qubes OS I found several warning like in the
subject in dom0 log. I see them when running 6.7-rc1 too. I'm not sure
what exactly triggers the issue, but my guess would be unbinding an
event channel from userspace (closing vchan connection).

Specific message:

[   83.973377] [ cut here ]
[   83.975523] Interrupt for port 77, but apparently not enabled; per-user 
a0e9f1d1


Just a guess, but I think this might happen when the vchan connection
is closed while there is still some traffic. This could result in events
triggering in parallel to unbinding the event channel.

Could you please test the attached patch (only compile tested)?


Juergen

From e29761a8d5d68d7432659d601af60f556e3136a9 Mon Sep 17 00:00:00 2001
From: Juergen Gross 
Date: Fri, 1 Dec 2023 08:03:39 +0100
Subject: [PATCH] xen/events: fix race when unbinding an event channel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When unbinding an event channel the unbind function should only return
to the caller when no interrupt handler can be active any more.

When switching from a rwlock to RCU for freeing event channels this
assumption was broken.

Fix that by replacing the "is_active" flag of an event channel with an
atomic used to keep track of event channel freeing AND interrupt
handler activity.

Rearrange struct irq_info a little bit to avoid adding new padding
holes.

Fixes: 87797fad6cce ("xen/events: replace evtchn_rwlock with RCU")
Reported-by: Marek Marczykowski-Górecki 
Signed-off-by: Juergen Gross 
---
 drivers/xen/events/events_base.c | 51 ++--
 1 file changed, 35 insertions(+), 16 deletions(-)

diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index f5edb9e27e3c..c480bc498ba8 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -106,7 +106,7 @@ struct irq_info {
 #define EVT_MASK_REASON_EXPLICIT	0x01
 #define EVT_MASK_REASON_TEMPORARY	0x02
 #define EVT_MASK_REASON_EOI_PENDING	0x04
-	u8 is_active;		/* Is event just being handled? */
+	bool is_static;   /* Is event channel static */
 	unsigned irq;
 	evtchn_port_t evtchn;   /* event channel */
 	unsigned short cpu; /* cpu bound */
@@ -114,7 +114,23 @@ struct irq_info {
 	unsigned int irq_epoch; /* If eoi_cpu valid: irq_epoch of event */
 	u64 eoi_time;   /* Time in jiffies when to EOI. */
 	raw_spinlock_t lock;
-	bool is_static;   /* Is event channel static */
+
+	/*
+	 * When freeing an event channel it is marked with EVT_ACT_FREEING.
+	 * When an IRQ handler is starting the event channel is marked with
+	 * EVT_ACT_HANDLER | EVT_ACT_EOI.
+	 * When leaving the IRQ handler EVT_ACT_HANDLER is cleared. EVT_ACT_EOI
+	 * is cleared either at the same time or in case of a late-EOI event
+	 * when the EOI is signalled.
+	 * A IRQ handler may only be entered if none of the flags is set.
+	 * Freeing an event channel is done via RCU, but the freeing function
+	 * starting the RCU process may only return if EVT_ACT_HANDLER is not
+	 * set (no handler is active any more).
+	 */
+	atomic_t active;	/* Is event just being handled? */
+#define EVT_ACT_HANDLER		0x0001	/* IRQ handler active */
+#define EVT_ACT_EOI		0x0002	/* EOI pending */
+#define EVT_ACT_FREEING		0x0004	/* even channel being freed */
 
 	union {
 		unsigned short virq;
@@ -640,8 +656,8 @@ static void xen_irq_lateeoi_locked(struct irq_info *info, bool spurious)
 
 	info->eoi_time = 0;
 
-	/* is_active hasn't been reset yet, do it now. */
-	smp_store_release(>is_active, 0);
+	/* EVT_ACT_EOI hasn't been reset yet, do it now. */
+	atomic_fetch_andnot_release(EVT_ACT_EOI, >active);
 	do_unmask(info, EVT_MASK_REASON_EOI_PENDING);
 }
 
@@ -792,9 +808,9 @@ static void xen_free_irq(struct irq_info *info)
 }
 
 /* Not called for lateeoi events. */
-static void event_handler_exit(struct irq_info *info)
+static void event_handler_exit(struct irq_info *info, int flag)
 {
-	smp_store_release(>is_active, 0);
+	atomic_fetch_andnot_release(EVT_ACT_HANDLER | flag, >active);
 	clear_evtchn(info->evtchn);
 }
 
@@ -819,7 +835,7 @@ static void do_eoi_pirq(struct irq_info *info)
 	if (!VALID_EVTCHN(info->evtchn))
 		return;
 
-	event_handler_exit(info);
+	event_handler_exit(info, EVT_ACT_EOI);
 
 	if (pirq_needs_eoi(info)) {
 		rc = HYPERVISOR_physdev_op(PHYSDEVOP_eoi, );
@@ -968,6 +984,8 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
 			return;
 	}
 
+	atomic_fetch_or_acquire(EVT_ACT_FREEING, >active);
+
 	evtchn = info->evtchn;
 
 	if (VALID_EVTCHN(evtchn)) {
@@ -997,6 +1015,10 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
 		xen_irq_info_cleanup(info);
 	}
 
+	/* Wait until all IRQ handlers are gone. */
+	while (atomic_read_acquire(>active) & EVT_ACT_HANDLER)
+		cpu_relax();
+
 	xen_free_irq(info);

Linux 6.7-rc1+: WARNING at drivers/xen/evtchn.c:167 evtchn_interrupt

2023-11-29 Thread Marek Marczykowski-Górecki
Hi,

While testing 6.7-rc3 on Qubes OS I found several warning like in the
subject in dom0 log. I see them when running 6.7-rc1 too. I'm not sure
what exactly triggers the issue, but my guess would be unbinding an
event channel from userspace (closing vchan connection).

Specific message:

[   83.973377] [ cut here ]
[   83.975523] Interrupt for port 77, but apparently not enabled; per-user 
a0e9f1d1
[   83.979083] WARNING: CPU: 1 PID: 2432 at drivers/xen/evtchn.c:167 
evtchn_interrupt+0xb6/0xc0 [xen_evtchn]
[   83.982869] Modules linked in: joydev intel_rapl_msr intel_rapl_common 
snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg 
snd_intel_sdw_acpi ppdev snd_hda_codec snd_hda_core snd_hwdep snd_seq 
snd_seq_device snd_pcm parport_pc parport pcspkr snd_timer snd e1000e i2c_piix4 
soundcore fuse loop xenfs dm_thin_pool dm_persistent_data dm_bio_prison 
crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic 
ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 virtio_scsi bochs 
serio_raw drm_vram_helper xhci_pci xhci_pci_renesas floppy drm_ttm_helper 
xhci_hcd ttm ata_generic pata_acpi qemu_fw_cfg virtio_console xen_privcmd 
xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn scsi_dh_rdac 
scsi_dh_emc scsi_dh_alua uinput dm_multipath
[   84.019753] CPU: 1 PID: 2432 Comm: qrexec-client Not tainted 
6.7.0-0.rc3.1.qubes.1.fc37.x86_64 #1
[   84.027045] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014
[   84.033828] RIP: e030:evtchn_interrupt+0xb6/0xc0 [xen_evtchn]
[   84.037814] Code: ba 01 00 00 00 be 1d 00 00 00 48 8d bb 88 00 00 00 e8 ce 
93 27 c1 eb b4 8b 76 20 48 89 da 48 c7 c7 70 52 21 c0 e8 0a 5c f0 c0 <0f> 0b e9 
61 ff ff ff 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90
[   84.048932] RSP: e02b:c90041e7fd60 EFLAGS: 00010082
[   84.051252] RAX:  RBX: 888104fc9a80 RCX: 0027
[   84.054569] RDX: 88813ff21608 RSI: 0001 RDI: 88813ff21600
[   84.057606] RBP: 88810522a280 R08:  R09: c90041e7fbf8
[   84.060864] R10: 0003 R11: 82f460c8 R12: 888105d5b6a4
[   84.064043] R13: 888105d5b760 R14: 88810522a280 R15: 888105d5b600
[   84.066598] FS:  75ee3eb8ed40() GS:88813ff0() 
knlGS:
[   84.069251] CS:  1e030 DS:  ES:  CR0: 80050033
[   84.071293] CR2: 7b86fa0f0f5c CR3: 000106e18000 CR4: 00040660
[   84.073911] Call Trace:
[   84.074746]  
[   84.075471]  ? evtchn_interrupt+0xb6/0xc0 [xen_evtchn]
[   84.077448]  ? __warn+0x81/0x130
[   84.078671]  ? evtchn_interrupt+0xb6/0xc0 [xen_evtchn]
[   84.080507]  ? report_bug+0x171/0x1a0
[   84.081830]  ? prb_read_valid+0x1b/0x30
[   84.083682]  ? handle_bug+0x41/0x70
[   84.084952]  ? exc_invalid_op+0x17/0x70
[   84.086314]  ? asm_exc_invalid_op+0x1a/0x20
[   84.088520]  ? evtchn_interrupt+0xb6/0xc0 [xen_evtchn]
[   84.090300]  __free_irq+0x114/0x2b0
[   84.091478]  free_irq+0x32/0x70
[   84.092522]  unbind_from_irqhandler+0x31/0xb0
[   84.094035]  evtchn_release+0x2b/0xa0 [xen_evtchn]
[   84.095643]  __fput+0x9a/0x2c0
[   84.096734]  __x64_sys_close+0x3d/0x80
[   84.097995]  do_syscall_64+0x63/0xe0
[   84.099188]  ? __vm_munmap+0xbb/0x150
[   84.100431]  ? syscall_exit_work+0x103/0x130
[   84.101854]  ? syscall_exit_to_user_mode+0x2b/0x40
[   84.103509]  ? do_syscall_64+0x6f/0xe0
[   84.104757]  ? syscall_exit_to_user_mode+0x2b/0x40
[   84.106350]  ? do_syscall_64+0x6f/0xe0
[   84.107587]  ? exit_to_user_mode_prepare+0xbc/0xd0
[   84.109180]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[   84.110863] RIP: 0033:0x75ee3ed15a0a
[   84.111999] Code: 00 00 0f 05 48 3d 00 f0 ff ff 77 44 c3 0f 1f 00 48 83 ec 
18 89 7c 24 0c e8 33 bc f8 ff 8b 7c 24 0c 89 c2 b8 03 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 36 89 d7 89 44 24 0c e8 93 bc f8 ff 8b 44 24
[   84.118267] RSP: 002b:70ad0f70 EFLAGS: 0293 ORIG_RAX: 
0003
[   84.120756] RAX: ffda RBX: 55d3d8f70970 RCX: 75ee3ed15a0a
[   84.123675] RDX:  RSI:  RDI: 000f
[   84.126012] RBP:  R08:  R09: 
[   84.128366] R10: 75ee3ec31468 R11: 0293 R12: 
[   84.130699] R13:  R14: 70ad2eaa R15: 55d3d8f6f2a0
[   84.133050]  
[   84.133810] ---[ end trace  ]---

Full log: https://openqa.qubes-os.org/tests/86647/logfile?filename=serial0.txt

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


signature.asc
Description: PGP signature