Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Am Montag, den 08.03.2021, 09:50 +0100 schrieb Bruno Thomsen: > > Tested-by: Bruno Thomsen > > I have not observed any oops with patches applied. Patches have seen > more than 10 weeks of runtime testing across multiple devices. Hi, that is good news. I shall send them upstream. Regards Oliver
Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Den fre. 26. feb. 2021 kl. 15.14 skrev Bruno Thomsen : > > Den tor. 25. feb. 2021 kl. 10.57 skrev Oliver Neukum : > > > > Am Mittwoch, den 24.02.2021, 16:21 +0100 schrieb Bruno Thomsen: > > > > Hi, > > > > > No, this is not a regression from 5.10. It seems that many attempts to > > > fix cdc-acm in the 5.x kernel series have failed to fix the root cause of > > > these oops. I have not seen this on 4.14 and 4.19, but I have observed > > > it on at least 5.3 and newer kernels in slight variations. > > > I guess this is because cdc-acm is very common in the embedded > > > ARM world and rarely used on servers or laptops. Combined with > > > ARM devices still commonly use 4.x LTS kernels. Not sure if > > > hardening options on the kernel has increased change of reproducing > > > oops. > > > > OK, so this is not an additional problem. > > According to your logs, an URB that should have been killed wasn't. > > Thanks for looking into this bug rapport. > > > > I am ready to test new patches and will continue to report oops > > > > Could you test the attached patches? > > Yes, I am already running tests on the patches. > I have not seen any oops yet and it seems the USB cdc-acm driver is still > working as intended. > > The only notable trace I have seen is this new error from the cdc-acm driver > but everything kept on working. > kernel: cdc_acm 1-1.1:1.7: acm_start_wb - usb_submit_urb(write bulk) failed: > -19 > > Other then that I see this common error (should probably be a warning) during > device enumeration: > kernel: cdc_acm 1-1.2:1.0: failed to set dtr/rts > > I will post an update next week when the patches have survived some > more runtime. Tested-by: Bruno Thomsen I have not observed any oops with patches applied. Patches have seen more than 10 weeks of runtime testing across multiple devices. /Bruno
Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Den tor. 25. feb. 2021 kl. 10.57 skrev Oliver Neukum : > > Am Mittwoch, den 24.02.2021, 16:21 +0100 schrieb Bruno Thomsen: > > Hi, > > > No, this is not a regression from 5.10. It seems that many attempts to > > fix cdc-acm in the 5.x kernel series have failed to fix the root cause of > > these oops. I have not seen this on 4.14 and 4.19, but I have observed > > it on at least 5.3 and newer kernels in slight variations. > > I guess this is because cdc-acm is very common in the embedded > > ARM world and rarely used on servers or laptops. Combined with > > ARM devices still commonly use 4.x LTS kernels. Not sure if > > hardening options on the kernel has increased change of reproducing > > oops. > > OK, so this is not an additional problem. > According to your logs, an URB that should have been killed wasn't. Thanks for looking into this bug rapport. > > I am ready to test new patches and will continue to report oops > > Could you test the attached patches? Yes, I am already running tests on the patches. I have not seen any oops yet and it seems the USB cdc-acm driver is still working as intended. The only notable trace I have seen is this new error from the cdc-acm driver but everything kept on working. kernel: cdc_acm 1-1.1:1.7: acm_start_wb - usb_submit_urb(write bulk) failed: -19 Other then that I see this common error (should probably be a warning) during device enumeration: kernel: cdc_acm 1-1.2:1.0: failed to set dtr/rts I will post an update next week when the patches have survived some more runtime. /Bruno
Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Am Mittwoch, den 24.02.2021, 16:21 +0100 schrieb Bruno Thomsen: Hi, > No, this is not a regression from 5.10. It seems that many attempts to > fix cdc-acm in the 5.x kernel series have failed to fix the root cause of > these oops. I have not seen this on 4.14 and 4.19, but I have observed > it on at least 5.3 and newer kernels in slight variations. > I guess this is because cdc-acm is very common in the embedded > ARM world and rarely used on servers or laptops. Combined with > ARM devices still commonly use 4.x LTS kernels. Not sure if > hardening options on the kernel has increased change of reproducing > oops. OK, so this is not an additional problem. According to your logs, an URB that should have been killed wasn't. > I am ready to test new patches and will continue to report oops Could you test the attached patches? Regards Oliver From 307097e80657ca44ac99da8efc8397070b1aff3f Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Thu, 18 Feb 2021 13:42:40 +0100 Subject: [PATCH 1/2] cdc-wdm: untangle a circular dependency between callback and softint We have a cycle of callbacks scheduling works which submit URBs with thos callbacks. This needs to be blocked, stopped and unblocked to untangle the circle.. Signed-off-by: Oliver Neukum --- drivers/usb/class/cdc-wdm.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c index 508b1c3f8b73..d1e4a7379beb 100644 --- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -321,12 +321,23 @@ static void wdm_int_callback(struct urb *urb) } -static void kill_urbs(struct wdm_device *desc) +static void poison_urbs(struct wdm_device *desc) { /* the order here is essential */ - usb_kill_urb(desc->command); - usb_kill_urb(desc->validity); - usb_kill_urb(desc->response); + usb_poison_urb(desc->command); + usb_poison_urb(desc->validity); + usb_poison_urb(desc->response); +} + +static void unpoison_urbs(struct wdm_device *desc) +{ + /* + * the order here is not essential + * it is symmetrical just to be nice + */ + usb_unpoison_urb(desc->response); + usb_unpoison_urb(desc->validity); + usb_unpoison_urb(desc->command); } static void free_urbs(struct wdm_device *desc) @@ -741,11 +752,12 @@ static int wdm_release(struct inode *inode, struct file *file) if (!desc->count) { if (!test_bit(WDM_DISCONNECTING, >flags)) { dev_dbg(>intf->dev, "wdm_release: cleanup\n"); - kill_urbs(desc); + poison_urbs(desc); spin_lock_irq(>iuspin); desc->resp_count = 0; spin_unlock_irq(>iuspin); desc->manage_power(desc->intf, 0); + unpoison_urbs(desc); } else { /* must avoid dev_printk here as desc->intf is invalid */ pr_debug(KBUILD_MODNAME " %s: device gone - cleaning up\n", __func__); @@ -1037,9 +1049,9 @@ static void wdm_disconnect(struct usb_interface *intf) wake_up_all(>wait); mutex_lock(>rlock); mutex_lock(>wlock); + poison_urbs(desc); cancel_work_sync(>rxwork); cancel_work_sync(>service_outs_intr); - kill_urbs(desc); mutex_unlock(>wlock); mutex_unlock(>rlock); @@ -1080,9 +1092,10 @@ static int wdm_suspend(struct usb_interface *intf, pm_message_t message) set_bit(WDM_SUSPENDING, >flags); spin_unlock_irq(>iuspin); /* callback submits work - order is essential */ - kill_urbs(desc); + poison_urbs(desc); cancel_work_sync(>rxwork); cancel_work_sync(>service_outs_intr); + unpoison_urbs(desc); } if (!PMSG_IS_AUTO(message)) { mutex_unlock(>wlock); @@ -1140,7 +1153,7 @@ static int wdm_pre_reset(struct usb_interface *intf) wake_up_all(>wait); mutex_lock(>rlock); mutex_lock(>wlock); - kill_urbs(desc); + poison_urbs(desc); cancel_work_sync(>rxwork); cancel_work_sync(>service_outs_intr); return 0; @@ -1151,6 +1164,7 @@ static int wdm_post_reset(struct usb_interface *intf) struct wdm_device *desc = wdm_find_device(intf); int rv; + unpoison_urbs(desc); clear_bit(WDM_OVERFLOW, >flags); clear_bit(WDM_RESETTING, >flags); rv = recover_from_urb_loss(desc); -- 2.26.2 From 3eeb644af140174ebad6ddce5526bcaf42ccd9c9 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Thu, 18 Feb 2021 13:52:28 +0100 Subject: [PATCH 2/2] cdc-acm: untangle a circular dependency between callback and softint We have a cycle of callbacks scheduling works which submit URBs with thos callbacks. This needs to be blocked, stopped and unblocked to untangle the circle. Signed-off-by: Oliver Neukum --- drivers/usb/class/cdc-acm.c | 41 + 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c index 781905745812..235fd1f654a4 100644 --- a/drivers/usb/class/cdc-acm.c +++ b/drivers/usb/class/cdc-acm.c @@ -147,17 +147,29 @@ static inline int acm_set_control(struct acm *acm, int control) #define acm_send_break(acm, ms) \ acm_ctrl_msg(acm, USB_CDC_REQ_SEND_BREAK, ms,
Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Den man. 22. feb. 2021 kl. 10.36 skrev Oliver Neukum : > > Am Donnerstag, den 18.02.2021, 16:52 +0100 schrieb Bruno Thomsen: > > Den fre. 12. feb. 2021 kl. 16.33 skrev Bruno Thomsen > > : > > > Hi, > > > > > > I have been experience random kernel oops in the cdc-acm driver on > > > imx7 (arm arch). Normally it happens during the first 1-3min runtime > > > after power-on. Below oops is from 5.8.17 mainline kernel with an > > > extra patch back-ported in an attempt to fix it: > > > 38203b8385 ("usb: cdc-acm: fix cooldown mechanism") > > > > I can now boot board with 5.11 kernel without any extra patches and > > it produce similar issue. Hopefully that make the oops more useful. > > Issue has been seen on multiple devices, so I don't think it's a bad > > hardware issue. > > is this a regression from 5.10? Hi Oliver No, this is not a regression from 5.10. It seems that many attempts to fix cdc-acm in the 5.x kernel series have failed to fix the root cause of these oops. I have not seen this on 4.14 and 4.19, but I have observed it on at least 5.3 and newer kernels in slight variations. I guess this is because cdc-acm is very common in the embedded ARM world and rarely used on servers or laptops. Combined with ARM devices still commonly use 4.x LTS kernels. Not sure if hardening options on the kernel has increased change of reproducing oops. I am ready to test new patches and will continue to report oops /Bruno
Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Am Donnerstag, den 18.02.2021, 16:52 +0100 schrieb Bruno Thomsen: > Den fre. 12. feb. 2021 kl. 16.33 skrev Bruno Thomsen > : > > Hi, > > > > I have been experience random kernel oops in the cdc-acm driver on > > imx7 (arm arch). Normally it happens during the first 1-3min runtime > > after power-on. Below oops is from 5.8.17 mainline kernel with an > > extra patch back-ported in an attempt to fix it: > > 38203b8385 ("usb: cdc-acm: fix cooldown mechanism") > > I can now boot board with 5.11 kernel without any extra patches and > it produce similar issue. Hopefully that make the oops more useful. > Issue has been seen on multiple devices, so I don't think it's a bad > hardware issue. Hi, is this a regression from 5.10? Regards Oliver
Re: usb: cdc-acm: BUG kmalloc-128 Poison overwritten
Den fre. 12. feb. 2021 kl. 16.33 skrev Bruno Thomsen : > > Hi, > > I have been experience random kernel oops in the cdc-acm driver on > imx7 (arm arch). Normally it happens during the first 1-3min runtime > after power-on. Below oops is from 5.8.17 mainline kernel with an > extra patch back-ported in an attempt to fix it: > 38203b8385 ("usb: cdc-acm: fix cooldown mechanism") I can now boot board with 5.11 kernel without any extra patches and it produce similar issue. Hopefully that make the oops more useful. Issue has been seen on multiple devices, so I don't think it's a bad hardware issue. [ 76.458010] 8<--- cut here --- [ 76.461178] Unable to handle kernel paging request at virtual address 6b6b6b93 [ 76.472958] pgd = f805d813 [ 76.475788] [6b6b6b93] *pgd= [ 76.488068] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 76.493441] Modules linked in: xt_TCPMSS xt_tcpmss xt_hl nf_log_ipv6 nf_log_ipv4 nf_log_common xt_policy xt_limit xt_conntrack xt_tcpudp xt_pkttype ip6table_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle ip6table_filter ip6_tables iptable_filter ip_tables des_generic md5 sch_fq_codel cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet mii cdc_acm usb_storage ip_tunnel xfrm_user xfrm6_tunnel tunnel6 xfrm4_tunnel tunnel4 esp6 esp4 ah6 ah4 xfrm_algo xt_LOG xt_LED xt_comment x_tables ipv6 [ 76.539032] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G T 5.11.0 #1 [ 76.546539] Hardware name: Freescale i.MX7 Dual (Device Tree) [ 76.552295] Workqueue: events acm_softint [cdc_acm] [ 76.557223] PC is at usb_kill_urb+0x8/0x24 [ 76.561337] LR is at acm_softint+0x4c/0x10c [cdc_acm] [ 76.566415] pc : [<805911a8>] lr : [<7f1168c4>] psr: 200e0113 [ 76.572689] sp : 84113f08 ip : 8575de7c fp : 840e92bc [ 76.577920] r10: r9 : 893222a8 r8 : 89322008 [ 76.583151] r7 : 89322000 r6 : 89322438 r5 : 89322448 r4 : 000a [ 76.589686] r3 : 6b6b6b6b r2 : 12d79029 r1 : 800e0113 r0 : 6b6b6b6b [ 76.596223] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 76.603369] Control: 10c5387d Table: 8933406a DAC: 0051 [ 76.609120] Process kworker/0:0 (pid: 5, stack limit = 0x8fb8cf7e) [ 76.615315] Stack: (0x84113f08 to 0x84114000) [ 76.619685] 3f00: 89322448 840e9280 bf6caf40 bf6ce100 [ 76.627875] 3f20: 8013f14c 84112000 bf6caf40 bf6caf58 840e9280 bf6caf40 840e9294 [ 76.636065] 3f40: bf6caf58 80c03d00 0008 84112000 bf6caf40 8013f3e8 80d0bbb0 [ 76.644255] 3f60: 840a7640 840a77c0 84112000 840ede7c 8013f3a4 840e9280 840a7664 [ 76.652445] 3f80: 801467d8 840a77c0 80146694 [ 76.660634] 3fa0: 80100150 [ 76.668823] 3fc0: [ 76.677012] 3fe0: 0013 [ 76.685203] [<805911a8>] (usb_kill_urb) from [<7f1168c4>] (acm_softint+0x4c/0x10c [cdc_acm]) [ 76.693690] [<7f1168c4>] (acm_softint [cdc_acm]) from [<8013f14c>] (process_one_work+0x1bc/0x414) [ 76.702605] [<8013f14c>] (process_one_work) from [<8013f3e8>] (worker_thread+0x44/0x4dc) [ 76.710719] [<8013f3e8>] (worker_thread) from [<801467d8>] (kthread+0x144/0x180) [ 76.718139] [<801467d8>] (kthread) from [<80100150>] (ret_from_fork+0x14/0x24) [ 76.725380] Exception stack(0x84113fb0 to 0x84113ff8) [ 76.730443] 3fa0: [ 76.738632] 3fc0: [ 76.746819] 3fe0: 0013 [ 76.753448] Code: eae0 eb081505 e2503000 012fff1e (e5932028) [ 76.761647] ---[ end trace 05b398f82b2a04b9 ]---