[PATCH v2] r8152: fix lockup when runtime PM is enabled
When an interface is brought up which was previously suspended (via runtime PM), it would hang. This happens because napi_disable is called before napi_enable. Solve this by avoiding napi_enable in the resume during open function (netif_running is true when open is called, IFF_UP is set after a successful open; netif_running is false when close is called, but IFF_UP is then still set). While at it, remove WORK_ENABLE check from rtl8152_open (introduced with the original change) because it cannot happen: - After this patch, runtime resume will not set it during rtl8152_open. - When link is up, rtl8152_open is not called. - When link is down during system/auto suspend/resume, it is not set. Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready") Link: https://lkml.kernel.org/r/20151205105912.GA1766@al Signed-off-by: Peter Wu <pe...@lekensteyn.nl> --- v2: moved rtl_runtime_suspend_enable from close to rtl8152_suspend Tested with the scenario from https://lkml.kernel.org/r/20151208111008.GA18728@al (added debug prints to verify the suspend/resume moments) --- drivers/net/usb/r8152.c | 21 +++-- 1 file changed, 3 insertions(+), 18 deletions(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index d9427ca..2e32c41 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3067,17 +3067,6 @@ static int rtl8152_open(struct net_device *netdev) mutex_lock(>control); - /* The WORK_ENABLE may be set when autoresume occurs */ - if (test_bit(WORK_ENABLE, >flags)) { - clear_bit(WORK_ENABLE, >flags); - usb_kill_urb(tp->intr_urb); - cancel_delayed_work_sync(>schedule); - - /* disable the tx/rx, if the workqueue has enabled them. */ - if (netif_carrier_ok(netdev)) - tp->rtl_ops.disable(tp); - } - tp->rtl_ops.up(tp); rtl8152_set_speed(tp, AUTONEG_ENABLE, @@ -3124,12 +3113,6 @@ static int rtl8152_close(struct net_device *netdev) } else { mutex_lock(>control); - /* The autosuspend may have been enabled and wouldn't -* be disable when autoresume occurs, because the -* netif_running() would be false. -*/ - rtl_runtime_suspend_enable(tp, false); - tp->rtl_ops.down(tp); mutex_unlock(>control); @@ -3512,7 +3495,7 @@ static int rtl8152_resume(struct usb_interface *intf) netif_device_attach(tp->netdev); } - if (netif_running(tp->netdev)) { + if (netif_running(tp->netdev) && tp->netdev->flags & IFF_UP) { if (test_bit(SELECTIVE_SUSPEND, >flags)) { rtl_runtime_suspend_enable(tp, false); clear_bit(SELECTIVE_SUSPEND, >flags); @@ -3532,6 +3515,8 @@ static int rtl8152_resume(struct usb_interface *intf) } usb_submit_urb(tp->intr_urb, GFP_KERNEL); } else if (test_bit(SELECTIVE_SUSPEND, >flags)) { + if (tp->netdev->flags & IFF_UP) + rtl_runtime_suspend_enable(tp, false); clear_bit(SELECTIVE_SUSPEND, >flags); } -- 2.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] r8152: fix lockup when runtime PM is enabled
On Tue, Dec 08, 2015 at 03:18:59AM +, Hayes Wang wrote: > Peter Wu > > Sent: Tuesday, December 08, 2015 12:59 AM > [...] > > + if (tp->netdev->flags & IFF_UP) { > > Maybe you could just replace the checking of netif_running(tp->netdev) > with this. Simply replacing netif_running by IFF_UP does not work, it hangs during close when the device is suspended. This patch is correct, but I have a v2 patch that moves rtl_runtime_suspend_enable from close to suspend. This is the evaluated scenario (run = netif_running, up = IFF_UP set): # suspended before open suspend (run=0, up=0) open(run=1) resume (run=1, up=0) <-- fixed by patch (open ends) # while up suspend (run=1, up=1) resume (run=1, up=1) <-- no issue, values match suspend (run=1, up=1) # close while suspended close (run=0, up=1) resume (run=0, up=1) <-- fixed in patch v2 (close cont and ends) <-- rtl_runtime_suspend_enable removed in v2 suspend (run=0, up=0) # while down resume (run=0, up=0) suspend (run=0, up=0) # open while suspended, open fails open(run=1) resume (run=1, up=0) <-- fixed by patch (open fails) suspend (run=0, up=0) > Excuse me. I have a question. Before the open() is finished, the > netif_running() would be true, but the IFF_UP wouldn't be set. Is it > right? That is right, this happens behind the scenes: # open netif_running = true open() if open succeeded set IFF_UP else netif_running = false # close netif_running = false close() clear IFF_UP -- Kind regards, Peter Wu https://lekensteyn.nl -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] r8152: fix lockup when runtime PM is enabled
On Tue, Dec 08, 2015 at 12:39:12PM +, Hayes Wang wrote: > Peter Wu [mailto:pe...@lekensteyn.nl] > > Sent: Tuesday, December 08, 2015 7:18 PM > > > > When an interface is brought up which was previously suspended (via > > runtime PM), it would hang. This happens because napi_disable is called > > before napi_enable. > > > > Solve this by avoiding napi_enable in the resume during open function > > (netif_running is true when open is called, IFF_UP is set after a > > successful open; netif_running is false when close is called, but IFF_UP > > is then still set). > > > > While at it, remove WORK_ENABLE check from rtl8152_open (introduced with > > the original change) because it cannot happen: > > > > - After this patch, runtime resume will not set it during rtl8152_open. > > - When link is up, rtl8152_open is not called. > > - When link is down during system/auto suspend/resume, it is not set. > > > > Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready") > > Link: https://lkml.kernel.org/r/20151205105912.GA1766@al > > Signed-off-by: Peter Wu <pe...@lekensteyn.nl> > > --- > > v2: moved rtl_runtime_suspend_enable from close to rtl8152_suspend > > Acked-by: Hayes Wang <hayesw...@realtek.com> > > Best Regards, > Hayes I found another problem with runtime PM. When a device is suspended via autosuspend and a system suspend takes place, there is no network I/O after resume. Triggering a renegotiation (ethtool -r eth1) brings back network activity. Can you have a look? I guess that reset_resume needs different treatment, but don't know how to do it properly as suspend is not called before system reset (because the device is apparently already in suspended state). I think that this new issue can be addressed in a different patch, this patch solves a more severe lockup. Opinions? -- Kind regards, Peter Wu https://lekensteyn.nl -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] r8152: fix lockup when runtime PM is enabled
When an interface is brought up which was previously suspended (via runtime PM), it would hang. This happens because napi_disable is called before napi_enable. Solve this by avoiding napi_disable before the device is fully up. While at it, remove WORK_ENABLE check from rtl8152_open (introduced with the original change) because it cannot happen: - After this patch, runtime resume will not set it during rtl8152_open. - When link is up, rtl8152_open is not called. - When link is down during system/auto suspend/resume, it is not set. Fixes: 41cec84cf285 ("r8152: don't enable napi before rx ready") Link: https://lkml.kernel.org/r/20151205105912.GA1766@al Signed-off-by: Peter Wu <pe...@lekensteyn.nl> --- drivers/net/usb/r8152.c | 23 +++ 1 file changed, 7 insertions(+), 16 deletions(-) diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c index d9427ca..b8b083e 100644 --- a/drivers/net/usb/r8152.c +++ b/drivers/net/usb/r8152.c @@ -3067,17 +3067,6 @@ static int rtl8152_open(struct net_device *netdev) mutex_lock(>control); - /* The WORK_ENABLE may be set when autoresume occurs */ - if (test_bit(WORK_ENABLE, >flags)) { - clear_bit(WORK_ENABLE, >flags); - usb_kill_urb(tp->intr_urb); - cancel_delayed_work_sync(>schedule); - - /* disable the tx/rx, if the workqueue has enabled them. */ - if (netif_carrier_ok(netdev)) - tp->rtl_ops.disable(tp); - } - tp->rtl_ops.up(tp); rtl8152_set_speed(tp, AUTONEG_ENABLE, @@ -3516,11 +3505,13 @@ static int rtl8152_resume(struct usb_interface *intf) if (test_bit(SELECTIVE_SUSPEND, >flags)) { rtl_runtime_suspend_enable(tp, false); clear_bit(SELECTIVE_SUSPEND, >flags); - napi_disable(>napi); - set_bit(WORK_ENABLE, >flags); - if (netif_carrier_ok(tp->netdev)) - rtl_start_rx(tp); - napi_enable(>napi); + if (tp->netdev->flags & IFF_UP) { + napi_disable(>napi); + set_bit(WORK_ENABLE, >flags); + if (netif_carrier_ok(tp->netdev)) + rtl_start_rx(tp); + napi_enable(>napi); + } } else { tp->rtl_ops.up(tp); rtl8152_set_speed(tp, AUTONEG_ENABLE, -- 2.6.3 -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: (4.3.0) r8152: deadlock related to runtime suspend?
On Mon, Dec 07, 2015 at 05:11:50PM +0800, Lu Baolu wrote: > Hi Peter, > > Have you ever tried disabling auto-pm? Did things go smoothly if auto-pm is > disabled? > > I always disable usb auto-pm in below way. > > # echo on | tee /sys/bus/usb/devices/*/power/control > # echo on > /sys/bus/pci/devices//power/control > > Thanks, > Baolu Hi Baolu, The deadlock does not seem to occur with auto-PM disabled, but that is a workaround for the issue. The hang can always be reproduced under this test: - Start a QEMU VM, passing through the USB adapter - This VM boots to a busybox shell with no other services running or udev magic (to reduce interference). - Enable runtime PM for all devices by default (see script below) - From the console, invoke "ip link set eth1 up" (eth0 is a virtio adapter). # somewhere in /init after mounting filesystems echo /sbin/hotplug > /proc/sys/kernel/hotplug echo auto | tee /sys/bus/pci/devices/*/power/control \ /sys/bus/usb/devices/*/power/control >/dev/null #!/bin/sh # /sbin/hotplug path="/sys/$DEVPATH/power/control" [ -e "$path" ] || return newval=auto read status < "$path" if [ "x$status" != "x$newval" ]; then echo "$DEVPATH: $status -> $newval" >/dev/kmsg echo $newval > "$path" fi With "auto", the ip command hangs (a trace can be found on the bottom of this mail). With "on", it does not. If I keep a loop spinning that invokes `ethtool eth1`, the command returns immediately without issues (presumably because the device is not suspended through runtime PM). Under some circumstances I get a lockdep warning (when trying to bring an interface down if I remember correctly). Its trace can be found on the bottom of this mail. I'll keep testing. For the lockdep warning, my initial guess is that calling schedule_delayed_work_sync under tp->lock is a bad idea because scheduled work can execute and try to claim tp->lock too. Maybe there are two different lockup cases here, I'll keep testing. Kind regards, Peter > On 12/05/2015 06:59 PM, Peter Wu wrote: > > Hi, > > > > I rarely use a Realtek USB 3.0 Gigabit Ethernet adapter (vid/pid > > 0bda:8153), but when I did last night, it resulted in a lockup of > > processes doing networking ("ip link", "ping", "ethtool", ...). > > > > A (few) minute(s) before that event, I noticed that there was no network > > connectivity (ping hung) which was somehow solved by invoking "ethtool > > eth1" (triggering runtime pm wakeup?). This same trick did not work at > > the next event. Invoking "ethtool eth1", "ip link", etc. hung completely > > and interrupt (^C) did not work at all. > > > > Since that did not work, I pulled the USB adapter and re-inserted it, > > hoping it would reset things. That did not work at all, there was a > > "usb disconnect" message, but no further driver messages. > > > > Fast forward an hour, and it has become a disaster. I have terminated > > and killed many programs via SysRq but am still unable to get a stable > > system that does not hang on network I/O. Even the suspend process > > fails so in the end I attempted to shutdown the system. After half an > > hour after getting the poweroff message, I issued SysRq + B to reboot > > (since SysRq + O did not shut down either). > > > > Attached are logs with various various backtraces from SysRq and failed > > suspend. Let me know if you need more information! > > > > By the way, often I have to rmmod xhci and re-insert it, otherwise > > plugging it in does not result in a detection. A USB 2.0 port does not > > have this problem (runtime PM is enabled for all devices). This is the > > USB 3.0 port: > > > > 02:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 > > Host Controller [1033:0194] (rev 03) > -- lockdep splat from the bare machine: == [ INFO: possible circular locking dependency detected ] 4.3.0-custom #1 Tainted: G O --- kworker/0:1/38 is trying to acquire lock: (>control){+.+.+.}, at: [] rtl8152_resume+0x24/0x130 [r8152] but task is already holding lock: ((&(>schedule)->work)){+.+.+.}, at: [] process_one_work+0x15c/0x660 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 ((&(>schedule)->work)){+.+.+.}: [] lock_acquire+0xc3/0x1d0 [] flush_work+0x3d/0x290
Re: (4.3.0) r8152: deadlock related to runtime suspend?
On Mon, Dec 07, 2015 at 07:08:50PM +0800, Lu Baolu wrote: > > > On 12/07/2015 05:37 PM, Peter Wu wrote: > > On Mon, Dec 07, 2015 at 05:11:50PM +0800, Lu Baolu wrote: > >> Hi Peter, > >> > >> Have you ever tried disabling auto-pm? Did things go smoothly if auto-pm > >> is disabled? > >> > >> I always disable usb auto-pm in below way. > >> > >> # echo on | tee /sys/bus/usb/devices/*/power/control > >> # echo on > /sys/bus/pci/devices//power/control > >> > >> Thanks, > >> Baolu > > Hi Baolu, > > > > The deadlock does not seem to occur with auto-PM disabled, but that is a > > workaround for the issue. The hang can always be reproduced under this > > test: > > > > - Start a QEMU VM, passing through the USB adapter > > I would suggest you to start with bare metal. > > When you pass through the host controller to a guest VM, you > probably use IOMMU unit to let hardware access the memory > directly, but things like pci configure space access, interrupt and > IO port access still rely on QEMU. This introduces a lot of complexities. It is a USB device, not a PCI device, so such issues do not apply here I think. I have found a possible reason for this lockup. The resume code may execute napi_disable while napi_enable was not called before. This autoresume thing happens in the open function which explains why all other rtnl users are blocked. Is this a sane analysis? Kind regards, Peter > Thanks, > Baolu > > > - This VM boots to a busybox shell with no other services running or > >udev magic (to reduce interference). > > - Enable runtime PM for all devices by default (see script below) > > - From the console, invoke "ip link set eth1 up" (eth0 is a virtio > >adapter). > > > > # somewhere in /init after mounting filesystems > > echo /sbin/hotplug > /proc/sys/kernel/hotplug > > echo auto | tee /sys/bus/pci/devices/*/power/control \ > > /sys/bus/usb/devices/*/power/control >/dev/null > > > > #!/bin/sh > > # /sbin/hotplug > > path="/sys/$DEVPATH/power/control" > > [ -e "$path" ] || return > > newval=auto > > read status < "$path" > > if [ "x$status" != "x$newval" ]; then > > echo "$DEVPATH: $status -> $newval" >/dev/kmsg > > echo $newval > "$path" > > fi > > > > With "auto", the ip command hangs (a trace can be found on the bottom of > > this mail). With "on", it does not. > > > > If I keep a loop spinning that invokes `ethtool eth1`, the command > > returns immediately without issues (presumably because the device is not > > suspended through runtime PM). > > > > Under some circumstances I get a lockdep warning (when trying to bring > > an interface down if I remember correctly). Its trace can be found on > > the bottom of this mail. > > > > I'll keep testing. For the lockdep warning, my initial guess is that > > calling schedule_delayed_work_sync under tp->lock is a bad idea because > > scheduled work can execute and try to claim tp->lock too. > > > > Maybe there are two different lockup cases here, I'll keep testing. > > > > Kind regards, > > Peter > > > >> On 12/05/2015 06:59 PM, Peter Wu wrote: > >>> Hi, > >>> > >>> I rarely use a Realtek USB 3.0 Gigabit Ethernet adapter (vid/pid > >>> 0bda:8153), but when I did last night, it resulted in a lockup of > >>> processes doing networking ("ip link", "ping", "ethtool", ...). > >>> > >>> A (few) minute(s) before that event, I noticed that there was no network > >>> connectivity (ping hung) which was somehow solved by invoking "ethtool > >>> eth1" (triggering runtime pm wakeup?). This same trick did not work at > >>> the next event. Invoking "ethtool eth1", "ip link", etc. hung completely > >>> and interrupt (^C) did not work at all. > >>> > >>> Since that did not work, I pulled the USB adapter and re-inserted it, > >>> hoping it would reset things. That did not work at all, there was a > >>> "usb disconnect" message, but no further driver messages. > >>> > >>> Fast forward an hour, and it has become a disaster. I have terminated > >>> and killed many programs via SysRq but am still unable to get a stable > >>> s
(4.3.0) r8152: deadlock related to runtime suspend?
Hi, I rarely use a Realtek USB 3.0 Gigabit Ethernet adapter (vid/pid 0bda:8153), but when I did last night, it resulted in a lockup of processes doing networking ("ip link", "ping", "ethtool", ...). A (few) minute(s) before that event, I noticed that there was no network connectivity (ping hung) which was somehow solved by invoking "ethtool eth1" (triggering runtime pm wakeup?). This same trick did not work at the next event. Invoking "ethtool eth1", "ip link", etc. hung completely and interrupt (^C) did not work at all. Since that did not work, I pulled the USB adapter and re-inserted it, hoping it would reset things. That did not work at all, there was a "usb disconnect" message, but no further driver messages. Fast forward an hour, and it has become a disaster. I have terminated and killed many programs via SysRq but am still unable to get a stable system that does not hang on network I/O. Even the suspend process fails so in the end I attempted to shutdown the system. After half an hour after getting the poweroff message, I issued SysRq + B to reboot (since SysRq + O did not shut down either). Attached are logs with various various backtraces from SysRq and failed suspend. Let me know if you need more information! By the way, often I have to rmmod xhci and re-insert it, otherwise plugging it in does not result in a detection. A USB 2.0 port does not have this problem (runtime PM is enabled for all devices). This is the USB 3.0 port: 02:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) -- Kind regards, Peter Wu https://lekensteyn.nl Dec 05 00:32:58 al kernel: usb 3-1: USB disconnect, device number 4 Dec 05 00:34:43 al NetworkManager[441]: Could not get scan request result: Timeout was reached Dec 05 00:35:45 al NetworkManager[441]: Could not get scan request result: Timeout was reached Dec 05 00:36:13 al kernel: INFO: task wpa_supplicant:573 blocked for more than 120 seconds. Dec 05 00:36:13 al kernel: Tainted: G O4.3.0-1-ARCH #1 Dec 05 00:36:13 al kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 05 00:36:13 al kernel: wpa_supplicant D 88023bc956c0 0 573 1 0x Dec 05 00:36:13 al kernel: 8800aca2fa48 0086 880232600c80 88022d101900 Dec 05 00:36:13 al kernel: 8800aca3 818b4504 88022d101900 Dec 05 00:36:13 al kernel: 818b4508 8800aca2fa60 8157a09a 818b4500 Dec 05 00:36:13 al kernel: Call Trace: Dec 05 00:36:13 al kernel: [] schedule+0x3a/0x90 Dec 05 00:36:13 al kernel: [] schedule_preempt_disabled+0x15/0x20 Dec 05 00:36:13 al kernel: [] __mutex_lock_slowpath+0xce/0x140 Dec 05 00:36:13 al kernel: [] mutex_lock+0x17/0x30 Dec 05 00:36:13 al kernel: [] rtnl_lock+0x15/0x20 Dec 05 00:36:13 al kernel: [] nl80211_pre_doit+0xeb/0x180 [cfg80211] Dec 05 00:36:13 al kernel: [] genl_family_rcv_msg+0x1c8/0x3c0 Dec 05 00:36:13 al kernel: [] ? genl_family_rcv_msg+0x3c0/0x3c0 Dec 05 00:36:13 al kernel: [] genl_rcv_msg+0x76/0xb0 Dec 05 00:36:13 al kernel: [] netlink_rcv_skb+0xa4/0xc0 Dec 05 00:36:13 al kernel: [] genl_rcv+0x28/0x40 Dec 05 00:36:13 al kernel: [] netlink_unicast+0x11a/0x1a0 Dec 05 00:36:13 al kernel: [] netlink_sendmsg+0x4f8/0x610 Dec 05 00:36:13 al kernel: [] sock_sendmsg+0x17/0x30 Dec 05 00:36:13 al kernel: [] ___sys_sendmsg+0x295/0x2b0 Dec 05 00:36:13 al kernel: [] ? should_numa_migrate_memory+0x54/0x130 Dec 05 00:36:13 al kernel: [] ? handle_mm_fault+0xb38/0x18a0 Dec 05 00:36:13 al kernel: [] __sys_sendmsg+0x51/0x90 Dec 05 00:36:13 al kernel: [] SyS_sendmsg+0x12/0x20 Dec 05 00:36:13 al kernel: [] entry_SYSCALL_64_fastpath+0x12/0x71 Dec 05 00:36:13 al kernel: INFO: task Qt bearer threa:8292 blocked for more than 120 seconds. Dec 05 00:36:13 al kernel: Tainted: G O4.3.0-1-ARCH #1 Dec 05 00:36:13 al kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Dec 05 00:36:13 al kernel: Qt bearer threa D 88023bd156c0 0 8292 8222 0x Dec 05 00:36:13 al kernel: 8800ac9bfc88 0086 880232601900 880214323200 Dec 05 00:36:13 al kernel: 8800ac9c 818b4504 880214323200 Dec 05 00:36:13 al kernel: 818b4508 8800ac9bfca0 8157a09a 818b4500 Dec 05 00:36:13 al kernel: Call Trace: Dec 05 00:36:13 al kernel: [] schedule+0x3a/0x90 Dec 05 00:36:13 al kernel: [] schedule_preempt_disabled+0x15/0x20 Dec 05 00:36:13 al kernel: [] __mutex_lock_slowpath+0xce/0x140 Dec 05 00:36:13 al kernel: [] mutex_lock+0x17/0x30 Dec 05 00:36:13 al kernel: [] rtnetlink_rcv+0x19/0x30 Dec 05 00:36:13 al kernel: [] netlink_unicast+0x11a/0x1a0 Dec 05 00:36:13 al kernel: [] netlink_sendmsg+0x4f8/0x610 Dec 05 00:36:13 al kernel:
[v3.13,v3.15] XHCI breaks suspend, causes overheating
Hi, I do not often suspend my laptop with a device inserted on the USB 3.0 port, but when I did last night, it trigged an immediate wake up. Not only that, but after resume, some kworkers were hogging CPU. Another problem is that the laptop overheats in some cases (see end of mail). Some details: Distribution: Arch Linux x86_64 USB peripheral: Samsung S3 (GT-i9300) phone USB HCD: 02:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) Kernel: v3.15-rc8-105-gc593e89 (also confirmed for v3.13.6, other kernels have not been tested) I managed to get the following backtrace with SysRq + l: NMI backtrace for cpu 0 CPU: 0 PID: 11849 Comm: kworker/0:2 Tainted: G O 3.15.0-rc8-custom-00058-gd2cfd31 #1 Hardware name: CLEVO CO.B7130 /B7130 , BIOS 6.00 08/27/2010 Workqueue: pm pm_runtime_work .. [815f8bbf] ? _raw_spin_unlock_irqrestore+0x5f/0x80 [a019b54b] xhci_hub_control+0xbb/0x1450 [xhci_hcd] [81475cfc] ? usb_hcd_submit_urb+0x2dc/0xa80 [81475de7] usb_hcd_submit_urb+0x3c7/0xa80 [8109f535] ? mark_held_locks+0x75/0xa0 [810a4b91] ? __raw_spin_lock_init+0x21/0x60 [8109d1f1] ? lockdep_init_map+0x61/0x530 [814776e2] usb_submit_urb+0x372/0x600 [814780df] usb_start_wait_urb+0x5f/0x150 [810821ed] ? get_parent_ip+0xd/0x50 [81478295] usb_control_msg+0xc5/0x110 [8146b7cc] hub_port_status+0x7c/0x120 [8146d599] hub_activate+0xc9/0x620 [8109f6bd] ? trace_hardirqs_on_caller+0x15d/0x200 [8146dc01] hub_resume+0x31/0x70 [8147b626] usb_resume_interface.isra.5+0xb6/0x120 [8147b788] usb_suspend_both+0xf8/0x1f0 [8147cb83] usb_runtime_suspend+0x33/0x80 [8147cb50] ? usb_probe_interface+0x330/0x330 [814174f2] __rpm_callback+0x32/0x70 [81417556] rpm_callback+0x26/0xa0 [814180d1] rpm_suspend+0x111/0x660 [81418e3d] __pm_runtime_suspend+0x5d/0x90 [8147cbf0] ? usb_runtime_resume+0x20/0x20 [8147cc1a] usb_runtime_idle+0x2a/0x40 [814174f2] __rpm_callback+0x32/0x70 [8141898a] rpm_idle+0x27a/0x310 [8141947f] pm_runtime_work+0xbf/0xd0 [8106bb54] process_one_work+0x1d4/0x560 [8106baf4] ? process_one_work+0x174/0x560 [8106c93c] worker_thread+0x12c/0x3d0 [8106c810] ? manage_workers.isra.23+0x2e0/0x2e0 [810733d6] kthread+0xf6/0x110 [810732e0] ? kthread_create_on_node+0x200/0x200 [8160152c] ret_from_fork+0x7c/0xb0 [810732e0] ? kthread_create_on_node+0x200/0x200 /proc/../stack was not usable, it just showed start_worker (iirc) on top. `perf report` mentions kworker and khubd as top users with call traces corresponding to the above backtrace. CPU usage increased a bit, but the machine was kind of usable: root 11856 1.2 0.0 0 0 ?S00:51 0:10 [kworker/0:3] peter 555 1.8 1.1 3085412 87532 ? Sl Jun09 1:48 kwin root 3 1.9 0.0 0 0 ?SJun09 1:54 [ksoftirqd/0] root14 1.9 0.0 0 0 ?SJun09 1:53 [ksoftirqd/1] peter 561 2.1 2.4 3373080 193416 ? Sl Jun09 2:06 kdeinit4: plasma-desktop [kdeinit] root 11707 2.4 0.0 0 0 ?S00:36 0:40 [kworker/1:3] root 354 2.5 1.0 312392 82996 tty7 Ssl+ Jun09 2:27 /usr/bin/X :0 vt7 -nolisten tcp -auth /var/run/xauth/A:0-dfgwjb root39 4.5 0.0 0 0 ?SJun09 4:24 [khubd] root 11629 12.9 0.0 0 0 ?S00:33 4:02 [kworker/1:2] root 12076 14.6 0.0 0 0 ?S01:02 0:17 [kworker/1:0] root 11849 16.5 0.0 0 0 ?S00:44 3:25 [kworker/0:2] In an attempt to get more details, I executed: echo workqueue:workqueue_queue_work tracing/set_event cat /sys/kernel/debug/tracing/trace_pipe | pv /tmp/trace.txt When the bug is triggered, I see 100k events/s with lines such as: khubd-39[000] d..3 7342.846004: workqueue_queue_work: work struct=8800ae0f0ab8 function=pm_runtime_work workqueue=8802326bde00 req_cpu=256 cpu=0 Attempts to suspend the laptop again failed consistently. After some tries, the laptop began to beep (presumably thermal limits). I have never heard such beeps before, the laptop has never been overheating. Today, I tried to reproduce this bug again. The reproduction steps are: 1. Start the ADB server (`adb shell`). There is no device connected yet. 2. Connect phone to USB 3.0 port (USB debugging is enabled on the Android phone). 3. Try to suspend. 4. If it wakes up again, try to suspend again. Be
[3.11-rc5..] NULL pointer dereference, lockup after XHCI device dropped out
Hi, I hit a BUG followed by a lockup when trying to wipe a new external USB 3.0 1TB 2.5 Western Digital My Passport (WDBBEP0010BBK-EESN) HDD. The following command is used (with sudo): dd if=/dev/zero of=/dev/sdd bs=1M In another shell, I kept sending a USR1 signal to watch progress (maybe not related, but mentioned for completeness): while killall -USR1 dd; do sleep 30; done While doing this over SSH, another KDE session was open with someone playing the Java game Minecraft. I have so far reproduced this lockup three times. The first time, the lock-up occured after: 84838+0 records in 84838+0 records out 88959090688 bytes (89 GB) copied, 789.707 s, 113 MB/s I have no logs from the first lockup. The second time (after a reboot): 388514+1 records in 388514+1 records out 407386685440 bytes (407 GB) copied, 3881.25 s, 105 MB/s The second log part (not third lockup log) below is from this session. I omitted the remaining lines that are printed when the CPU got hung. The third BUG occured 22 minutes (156 GB) after rebooting and running dd and has similar log entries to the previous lock-up. This kernel used the same kernel sources, but with XHCI debugging enabled. No differences were noticable in the console output, other than the missing xHCI xhci_drop_endpoint called with disabled ep message in the new logs. See the first console output below. I have no idea why the USB 3.0 HDD would drop out, but that should not trigger a NULL pointer dereference anyhow. The desktop was running kernel v3.11-rc5-35-gf1d6e17 plus three r8169 patches that are unrelated to this issue and has 24 GB RAM and a i5-2320 CPU. I do not know how to easily reproduce this bug, the current reproduction method described above takes time. One possibly related bug is https://bugzilla.kernel.org/show_bug.cgi?id=51641, but I do not have a panic. Any ideas? If you want me to test something or open a new bug, let me know. Regards, Peter -- === third lockup log === [ 17.724296] usb 6-2: new SuperSpeed USB device number 2 using xhci_hcd [ 17.746522] usb 6-2: Parent hub missing LPM exit latency info. Power management will be impacted. [ 17.759786] usb 6-2: New USB device found, idVendor=1058, idProduct=0748 [ 17.766494] usb 6-2: New USB device strings: Mfr=1, Product=2, SerialNumber=5 [ 17.773628] usb 6-2: Product: My Passport 0748 [ 17.778071] usb 6-2: Manufacturer: Western Digital [ 17.782868] usb 6-2: SerialNumber: 57584231414E35383439 [ 17.790595] usb-storage 6-2:1.0: USB Mass Storage device detected [ 17.796995] scsi9 : usb-storage 6-2:1.0 [ 18.800604] scsi 9:0:0:0: Direct-Access WD My Passport 0748 1022 PQ: 0 ANSI: 6 [ 18.808994] scsi 9:0:0:1: Enclosure WD SES Device 1022 PQ: 0 ANSI: 6 [ 18.817618] sd 9:0:0:0: Attached scsi generic sg4 type 0 [ 18.819416] sd 9:0:0:0: [sdd] Spinning up disk... [ 18.827943] scsi 9:0:0:1: Attached scsi generic sg5 type 13 [ 19.823185] ..ready [ 20.828276] sd 9:0:0:0: [sdd] 1953458176 512-byte logical blocks: (1.00 TB/931 GiB) [ 20.836548] sd 9:0:0:0: [sdd] Write Protect is off [ 20.841352] sd 9:0:0:0: [sdd] Mode Sense: 47 00 10 08 [ 20.847172] sd 9:0:0:0: [sdd] No Caching mode page present [ 20.852674] sd 9:0:0:0: [sdd] Assuming drive cache: write through [ 20.860714] sd 9:0:0:0: [sdd] No Caching mode page present [ 20.866208] sd 9:0:0:0: [sdd] Assuming drive cache: write through [ 20.885996] sdd: unknown partition table [ 20.891706] sd 9:0:0:0: [sdd] No Caching mode page present [ 20.897214] sd 9:0:0:0: [sdd] Assuming drive cache: write through [ 20.903310] sd 9:0:0:0: [sdd] Attached SCSI disk [ 1355.971539] usb 6-2: USB disconnect, device number 2 [ 1385.834230] sd 9:0:0:0: Device offlined - not ready after error recovery [ 1385.842002] sd 9:0:0:0: [sdd] Unhandled error code [ 1385.846811] sd 9:0:0:0: [sdd] [ 1385.849980] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK [ 1385.856015] sd 9:0:0:0: [sdd] CDB: [ 1385.859518] Write(10): 2a 00 10 4f eb 00 00 00 f0 00 [ 1385.864748] end_request: I/O error, dev sdd, sector 273672960 [ 1385.870509] quiet_error: 6 callbacks suppressed [ 1385.875047] Buffer I/O error on device sdd, logical block 34209120 [ 1385.881240] lost page write due to I/O error on sdd [ 1385.886144] Buffer I/O error on device sdd, logical block 34209121 [ 1385.892326] lost page write due to I/O error on sdd [ 1385.897221] Buffer I/O error on device sdd, logical block 34209122 [ 1385.903405] lost page write due to I/O error on sdd [ 1385.908290] Buffer I/O error on device sdd, logical block 34209123 [ 1385.914478] lost page write due to I/O error on sdd [ 1385.919372] Buffer I/O error on device sdd, logical block 34209124 [ 1385.925557] lost page write due to I/O error on sdd [ 1385.930448] Buffer I/O error on device sdd, logical block 34209125 [ 1385.936630] lost page write due to I/O error on sdd [ 1385.941530] Buffer I/O error on device sdd, logical
Re: [PATCH 1/2] Revert Revert HID: Fix logitech-dj: missing Unifying device issue
On Tuesday 13 August 2013 08:13:17 Peter Hurley wrote: On 08/12/2013 05:54 PM, Peter Wu wrote: On Thursday 18 July 2013 16:28:01 Peter Hurley wrote: Before we revert to using the workaround, I'd like to suggest that this new hidden problem may be an interaction with the xhci_hcd host controller driver only. Looking at the related bug, the OP indicates the machine only has USB3 ports. Additionally, comments #7, #100, and #104 of the original bug report add additional information that would seem to confirm this suspicion. Let me add I have this USB device running on the uhci_hcd driver with or without this workaround on v3.10. This problem does not seem specific to xhci, uhci seems also effected. If true, it would certainly help to have a bug report confirming uhci failure from a bare-metal system which contained: 1) kernel version 2) complete dmesg output 3) lsusb -v output 4) lsmod output 5) usbmon capture from a plug attempt I was too fast in drawing a conclusion, besides the kernel I also upgraded some other packages. Today the issue also showed up in 3.9.9 + updated packages. When checking the dmesg, the issue solved by this patch did not occur (the enumeration was successful). Today I upgraded a system (running Arch Linux) from kernel 3.9.9 to 3.10.5. After a reboot to 3.10.5, things broke. The setup: - There are two USB receivers plugged into USB 1.1 ports (different buses according to lsusb, uhci), each receiver is paired to a K360 keyboard. - One of the receivers are passed to a QEMU guest with -usbdevice host:$busid. $devid. This keyboard is working (probably because QEMU performed a reset). - Since 3.10.5, the keyboard that is *not* passed to the QEMU guest is not functioning on reboot. After closing the QEMU guest, the USB bus gets reset(?) after which the other keyboard suddenly gets detected. I had only booted 3.10.5 twice before rolling back to 3.9.9, both boots triggered the issue. Do I need to provide a usbmon, lsusb, dmesg and/ or other details from 3.10.5? Do both keyboards work on bare metal? Seems like this problem might be specific to qemu (or kvm) and you may get more insight on those lists. I haven't tested that, the system automatically boots into openbox + QEMU. Previously, both keyboards worked on bare metal, so I think it still works. Note that there are other Arch Linux users who have reported issues[1][2] Unfortunately, not even one user in the referenced reports identified the usb hub the receiver was plugged into. I've asked it now. since upgrading to 3.10.z. Triggering a re-enumeration by writing the magic HID++ message[3] makes the paired devices appear again (as reported in forums[1], I haven't tried this on the affected UHCI machine). While the underlying bug is fixed, can this patch be forwarded to stable? I see that 3.10.6 has been released, but still without this patch. This is still a workaround and not really a fix for the underlying bug. I meant to say, while the underlying bug is *being* fixed. Anyway, can this patch be applied to 3.10? Sorry for the confusion with uhci, looking further it seems that the wrong USB receiver is being passed to QEMU. It's not a kernel issue, perhaps I can blame libusbx. Regards, Peter -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html