Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-03-08 Thread Oliver Neukum
Am Montag, den 08.03.2021, 09:50 +0100 schrieb Bruno Thomsen:
> 
> Tested-by: Bruno Thomsen 
> 
> I have not observed any oops with patches applied. Patches have seen
> more than 10 weeks of runtime testing across multiple devices.

Hi,

that is good news. I shall send them upstream.

Regards
Oliver




Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-03-08 Thread Bruno Thomsen
Den fre. 26. feb. 2021 kl. 15.14 skrev Bruno Thomsen :
>
> Den tor. 25. feb. 2021 kl. 10.57 skrev Oliver Neukum :
> >
> > Am Mittwoch, den 24.02.2021, 16:21 +0100 schrieb Bruno Thomsen:
> >
> > Hi,
> >
> > > No, this is not a regression from 5.10. It seems that many attempts to
> > > fix cdc-acm in the 5.x kernel series have failed to fix the root cause of
> > > these oops. I have not seen this on 4.14 and 4.19, but I have observed
> > > it on at least 5.3 and newer kernels in slight variations.
> > > I guess this is because cdc-acm is very common in the embedded
> > > ARM world and rarely used on servers or laptops. Combined with
> > > ARM devices still commonly use 4.x LTS kernels. Not sure if
> > > hardening options on the kernel has increased change of reproducing
> > > oops.
> >
> > OK, so this is not an additional problem.
> > According to your logs, an URB that should have been killed wasn't.
>
> Thanks for looking into this bug rapport.
>
> > > I am ready to test new patches and will continue to report oops
> >
> > Could you test the attached patches?
>
> Yes, I am already running tests on the patches.
> I have not seen any oops yet and it seems the USB cdc-acm driver is still
> working as intended.
>
> The only notable trace I have seen is this new error from the cdc-acm driver
> but everything kept on working.
> kernel: cdc_acm 1-1.1:1.7: acm_start_wb - usb_submit_urb(write bulk) failed: 
> -19
>
> Other then that I see this common error (should probably be a warning) during
> device enumeration:
> kernel: cdc_acm 1-1.2:1.0: failed to set dtr/rts
>
> I will post an update next week when the patches have survived some
> more runtime.

Tested-by: Bruno Thomsen 

I have not observed any oops with patches applied. Patches have seen
more than 10 weeks of runtime testing across multiple devices.

/Bruno


Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-02-26 Thread Bruno Thomsen
Den tor. 25. feb. 2021 kl. 10.57 skrev Oliver Neukum :
>
> Am Mittwoch, den 24.02.2021, 16:21 +0100 schrieb Bruno Thomsen:
>
> Hi,
>
> > No, this is not a regression from 5.10. It seems that many attempts to
> > fix cdc-acm in the 5.x kernel series have failed to fix the root cause of
> > these oops. I have not seen this on 4.14 and 4.19, but I have observed
> > it on at least 5.3 and newer kernels in slight variations.
> > I guess this is because cdc-acm is very common in the embedded
> > ARM world and rarely used on servers or laptops. Combined with
> > ARM devices still commonly use 4.x LTS kernels. Not sure if
> > hardening options on the kernel has increased change of reproducing
> > oops.
>
> OK, so this is not an additional problem.
> According to your logs, an URB that should have been killed wasn't.

Thanks for looking into this bug rapport.

> > I am ready to test new patches and will continue to report oops
>
> Could you test the attached patches?

Yes, I am already running tests on the patches.
I have not seen any oops yet and it seems the USB cdc-acm driver is still
working as intended.

The only notable trace I have seen is this new error from the cdc-acm driver
but everything kept on working.
kernel: cdc_acm 1-1.1:1.7: acm_start_wb - usb_submit_urb(write bulk) failed: -19

Other then that I see this common error (should probably be a warning) during
device enumeration:
kernel: cdc_acm 1-1.2:1.0: failed to set dtr/rts

I will post an update next week when the patches have survived some
more runtime.

/Bruno


Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-02-25 Thread Oliver Neukum
Am Mittwoch, den 24.02.2021, 16:21 +0100 schrieb Bruno Thomsen:

Hi,

> No, this is not a regression from 5.10. It seems that many attempts to
> fix cdc-acm in the 5.x kernel series have failed to fix the root cause of
> these oops. I have not seen this on 4.14 and 4.19, but I have observed
> it on at least 5.3 and newer kernels in slight variations.
> I guess this is because cdc-acm is very common in the embedded
> ARM world and rarely used on servers or laptops. Combined with
> ARM devices still commonly use 4.x LTS kernels. Not sure if
> hardening options on the kernel has increased change of reproducing
> oops.

OK, so this is not an additional problem.
According to your logs, an URB that should have been killed wasn't.

> I am ready to test new patches and will continue to report oops

Could you test the attached patches?

Regards
Oliver

From 307097e80657ca44ac99da8efc8397070b1aff3f Mon Sep 17 00:00:00 2001
From: Oliver Neukum 
Date: Thu, 18 Feb 2021 13:42:40 +0100
Subject: [PATCH 1/2] cdc-wdm: untangle a circular dependency between callback
 and softint

We have a cycle of callbacks scheduling works which submit
URBs with thos callbacks. This needs to be blocked, stopped
and unblocked to untangle the circle..

Signed-off-by: Oliver Neukum 
---
 drivers/usb/class/cdc-wdm.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 508b1c3f8b73..d1e4a7379beb 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -321,12 +321,23 @@ static void wdm_int_callback(struct urb *urb)
 
 }
 
-static void kill_urbs(struct wdm_device *desc)
+static void poison_urbs(struct wdm_device *desc)
 {
 	/* the order here is essential */
-	usb_kill_urb(desc->command);
-	usb_kill_urb(desc->validity);
-	usb_kill_urb(desc->response);
+	usb_poison_urb(desc->command);
+	usb_poison_urb(desc->validity);
+	usb_poison_urb(desc->response);
+}
+
+static void unpoison_urbs(struct wdm_device *desc)
+{
+	/*
+	 *  the order here is not essential
+	 *  it is symmetrical just to be nice
+	 */
+	usb_unpoison_urb(desc->response);
+	usb_unpoison_urb(desc->validity);
+	usb_unpoison_urb(desc->command);
 }
 
 static void free_urbs(struct wdm_device *desc)
@@ -741,11 +752,12 @@ static int wdm_release(struct inode *inode, struct file *file)
 	if (!desc->count) {
 		if (!test_bit(WDM_DISCONNECTING, >flags)) {
 			dev_dbg(>intf->dev, "wdm_release: cleanup\n");
-			kill_urbs(desc);
+			poison_urbs(desc);
 			spin_lock_irq(>iuspin);
 			desc->resp_count = 0;
 			spin_unlock_irq(>iuspin);
 			desc->manage_power(desc->intf, 0);
+			unpoison_urbs(desc);
 		} else {
 			/* must avoid dev_printk here as desc->intf is invalid */
 			pr_debug(KBUILD_MODNAME " %s: device gone - cleaning up\n", __func__);
@@ -1037,9 +1049,9 @@ static void wdm_disconnect(struct usb_interface *intf)
 	wake_up_all(>wait);
 	mutex_lock(>rlock);
 	mutex_lock(>wlock);
+	poison_urbs(desc);
 	cancel_work_sync(>rxwork);
 	cancel_work_sync(>service_outs_intr);
-	kill_urbs(desc);
 	mutex_unlock(>wlock);
 	mutex_unlock(>rlock);
 
@@ -1080,9 +1092,10 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message)
 		set_bit(WDM_SUSPENDING, >flags);
 		spin_unlock_irq(>iuspin);
 		/* callback submits work - order is essential */
-		kill_urbs(desc);
+		poison_urbs(desc);
 		cancel_work_sync(>rxwork);
 		cancel_work_sync(>service_outs_intr);
+		unpoison_urbs(desc);
 	}
 	if (!PMSG_IS_AUTO(message)) {
 		mutex_unlock(>wlock);
@@ -1140,7 +1153,7 @@ static int wdm_pre_reset(struct usb_interface *intf)
 	wake_up_all(>wait);
 	mutex_lock(>rlock);
 	mutex_lock(>wlock);
-	kill_urbs(desc);
+	poison_urbs(desc);
 	cancel_work_sync(>rxwork);
 	cancel_work_sync(>service_outs_intr);
 	return 0;
@@ -1151,6 +1164,7 @@ static int wdm_post_reset(struct usb_interface *intf)
 	struct wdm_device *desc = wdm_find_device(intf);
 	int rv;
 
+	unpoison_urbs(desc);
 	clear_bit(WDM_OVERFLOW, >flags);
 	clear_bit(WDM_RESETTING, >flags);
 	rv = recover_from_urb_loss(desc);
-- 
2.26.2

From 3eeb644af140174ebad6ddce5526bcaf42ccd9c9 Mon Sep 17 00:00:00 2001
From: Oliver Neukum 
Date: Thu, 18 Feb 2021 13:52:28 +0100
Subject: [PATCH 2/2] cdc-acm: untangle a circular dependency between callback
 and softint

We have a cycle of callbacks scheduling works which submit
URBs with thos callbacks. This needs to be blocked, stopped
and unblocked to untangle the circle.

Signed-off-by: Oliver Neukum 
---
 drivers/usb/class/cdc-acm.c | 41 +
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 781905745812..235fd1f654a4 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -147,17 +147,29 @@ static inline int acm_set_control(struct acm *acm, int control)
 #define acm_send_break(acm, ms) \
 	acm_ctrl_msg(acm, USB_CDC_REQ_SEND_BREAK, ms, 

Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-02-24 Thread Bruno Thomsen
Den man. 22. feb. 2021 kl. 10.36 skrev Oliver Neukum :
>
> Am Donnerstag, den 18.02.2021, 16:52 +0100 schrieb Bruno Thomsen:
> > Den fre. 12. feb. 2021 kl. 16.33 skrev Bruno Thomsen 
> > :
> > > Hi,
> > >
> > > I have been experience random kernel oops in the cdc-acm driver on
> > > imx7 (arm arch). Normally it happens during the first 1-3min runtime
> > > after power-on. Below oops is from 5.8.17 mainline kernel with an
> > > extra patch back-ported in an attempt to fix it:
> > > 38203b8385 ("usb: cdc-acm: fix cooldown mechanism")
> >
> > I can now boot board with 5.11 kernel without any extra patches and
> > it produce similar issue. Hopefully that make the oops more useful.
> > Issue has been seen on multiple devices, so I don't think it's a bad
> > hardware issue.
>
> is this a regression from 5.10?

Hi Oliver

No, this is not a regression from 5.10. It seems that many attempts to
fix cdc-acm in the 5.x kernel series have failed to fix the root cause of
these oops. I have not seen this on 4.14 and 4.19, but I have observed
it on at least 5.3 and newer kernels in slight variations.
I guess this is because cdc-acm is very common in the embedded
ARM world and rarely used on servers or laptops. Combined with
ARM devices still commonly use 4.x LTS kernels. Not sure if
hardening options on the kernel has increased change of reproducing
oops.

I am ready to test new patches and will continue to report oops

/Bruno


Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-02-22 Thread Oliver Neukum
Am Donnerstag, den 18.02.2021, 16:52 +0100 schrieb Bruno Thomsen:
> Den fre. 12. feb. 2021 kl. 16.33 skrev Bruno Thomsen 
> :
> > Hi,
> > 
> > I have been experience random kernel oops in the cdc-acm driver on
> > imx7 (arm arch). Normally it happens during the first 1-3min runtime
> > after power-on. Below oops is from 5.8.17 mainline kernel with an
> > extra patch back-ported in an attempt to fix it:
> > 38203b8385 ("usb: cdc-acm: fix cooldown mechanism")
> 
> I can now boot board with 5.11 kernel without any extra patches and
> it produce similar issue. Hopefully that make the oops more useful.
> Issue has been seen on multiple devices, so I don't think it's a bad
> hardware issue.

Hi,

is this a regression from 5.10?

Regards
Oliver




Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten

2021-02-18 Thread Bruno Thomsen
Den fre. 12. feb. 2021 kl. 16.33 skrev Bruno Thomsen :
>
> Hi,
>
> I have been experience random kernel oops in the cdc-acm driver on
> imx7 (arm arch). Normally it happens during the first 1-3min runtime
> after power-on. Below oops is from 5.8.17 mainline kernel with an
> extra patch back-ported in an attempt to fix it:
> 38203b8385 ("usb: cdc-acm: fix cooldown mechanism")

I can now boot board with 5.11 kernel without any extra patches and
it produce similar issue. Hopefully that make the oops more useful.
Issue has been seen on multiple devices, so I don't think it's a bad
hardware issue.

[ 76.458010] 8<--- cut here ---
[ 76.461178] Unable to handle kernel paging request at virtual address 6b6b6b93
[ 76.472958] pgd = f805d813
[ 76.475788] [6b6b6b93] *pgd=
[ 76.488068] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[ 76.493441] Modules linked in: xt_TCPMSS xt_tcpmss xt_hl nf_log_ipv6
nf_log_ipv4 nf_log_common xt_policy xt_limit xt_conntrack xt_tcpudp
xt_pkttype ip6table_mangle iptable_nat nf_nat nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle ip6table_filter
ip6_tables iptable_filter ip_tables des_generic md5 sch_fq_codel
cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii cdc_acm usb_storage
ip_tunnel xfrm_user xfrm6_tunnel tunnel6 xfrm4_tunnel tunnel4 esp6
esp4 ah6 ah4 xfrm_algo xt_LOG xt_LED xt_comment x_tables ipv6
[ 76.539032] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G T 5.11.0 #1
[ 76.546539] Hardware name: Freescale i.MX7 Dual (Device Tree)
[ 76.552295] Workqueue: events acm_softint [cdc_acm]
[ 76.557223] PC is at usb_kill_urb+0x8/0x24
[ 76.561337] LR is at acm_softint+0x4c/0x10c [cdc_acm]
[ 76.566415] pc : [<805911a8>] lr : [<7f1168c4>] psr: 200e0113
[ 76.572689] sp : 84113f08 ip : 8575de7c fp : 840e92bc
[ 76.577920] r10:  r9 : 893222a8 r8 : 89322008
[ 76.583151] r7 : 89322000 r6 : 89322438 r5 : 89322448 r4 : 000a
[ 76.589686] r3 : 6b6b6b6b r2 : 12d79029 r1 : 800e0113 r0 : 6b6b6b6b
[ 76.596223] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 76.603369] Control: 10c5387d Table: 8933406a DAC: 0051
[ 76.609120] Process kworker/0:0 (pid: 5, stack limit = 0x8fb8cf7e)
[ 76.615315] Stack: (0x84113f08 to 0x84114000)
[ 76.619685] 3f00: 89322448 840e9280 bf6caf40 bf6ce100  
[ 76.627875] 3f20:  8013f14c 84112000 bf6caf40 bf6caf58
840e9280 bf6caf40 840e9294
[ 76.636065] 3f40: bf6caf58 80c03d00 0008 84112000 bf6caf40
8013f3e8  80d0bbb0
[ 76.644255] 3f60:  840a7640 840a77c0 84112000 840ede7c
8013f3a4 840e9280 840a7664
[ 76.652445] 3f80:  801467d8  840a77c0 80146694
  
[ 76.660634] 3fa0:    80100150 
  
[ 76.668823] 3fc0:     
  
[ 76.677012] 3fe0:     0013
  
[ 76.685203] [<805911a8>] (usb_kill_urb) from [<7f1168c4>]
(acm_softint+0x4c/0x10c [cdc_acm])
[ 76.693690] [<7f1168c4>] (acm_softint [cdc_acm]) from [<8013f14c>]
(process_one_work+0x1bc/0x414)
[ 76.702605] [<8013f14c>] (process_one_work) from [<8013f3e8>]
(worker_thread+0x44/0x4dc)
[ 76.710719] [<8013f3e8>] (worker_thread) from [<801467d8>]
(kthread+0x144/0x180)
[ 76.718139] [<801467d8>] (kthread) from [<80100150>] (ret_from_fork+0x14/0x24)
[ 76.725380] Exception stack(0x84113fb0 to 0x84113ff8)
[ 76.730443] 3fa0:    
[ 76.738632] 3fc0:     
  
[ 76.746819] 3fe0:     0013 
[ 76.753448] Code: eae0 eb081505 e2503000 012fff1e (e5932028)
[ 76.761647] ---[ end trace 05b398f82b2a04b9 ]---