Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c
On 10.12.2017 00:38, Alexander Kappner wrote: Hi Mathias, thanks for the patch! The system now resumes cleanly from hibernate even with usbmuxd doing its thing. Tested-by: Alexander KappnerWhile testing this I hit some other issues with xhci-debugfs.c but I'll write these up in a separate bug. Thanks, Adding reported-by and tested-by tags, and pushing patch to my for-usb-linus branch -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c
Hi Mathias, thanks for the patch! The system now resumes cleanly from hibernate even with usbmuxd doing its thing. Tested-by: Alexander KappnerWhile testing this I hit some other issues with xhci-debugfs.c but I'll write these up in a separate bug. On 12/08/2017 09:01 AM, Mathias Nyman wrote: On 08.12.2017 13:06, Alexander Kappner wrote: Hi, I think we need to dig a bit deeper. It's good to check if spriv is valid but there are probably other reasons than kzalloc failing. I agree -- this small allocation is unlikely to fail in practice. Also, while my patch prevents the kernel oops, it also prevents the debugfs entries from being created. I've been debugging this more trying to come up with a better solution, but I might need some guidance as I'm not too familiar with the USB subsystem. The immediate cause of the crash was usbmuxd sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn calls xhci_debugfs_create_endpoint. The ioctl handler acquires a device-specific lock via usb_lock_device. When the system resumed from hibernate, xhci_resume was called. This in turn called xhci_mem_cleanup to deallocate the device structures, which include setting the debugfs_private pointer to NULL (via xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl is somehow racing with the hibernate. The call to xhci_resume is protected by a host-controller specific lock (xhci->lock) but it doesn't attempt to take the usb_lock_device device-specific lock. Now my suspicion is that xhci_resume freed and zeroed the device structures while racing with the ioctl handler. I can't seem to find any exclusion mechanism that would prevent xhci_resume from racing with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter). Am I missing something? If not, is there any reason why an ioctl might need to execute in parallel with the xhci_resume? If not, can we just do a busy wait in xhci_resume until all pending ioctls have returned? I'm not sure, but if I recall correctly then power management is supposed to make sure a driver doesn't access usb devices while the host controller is still resuming. The odd thing here is that xhci_debugfs_remove_slot(xhci, slot_id), and xhci_free_virt_device(xhci, slot_id) are called together when xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first() That means both the xhci_virt_device *dev and dev->debugfs_private should both be freed and xhci->devs[slot_id] set NULL for that virt_device. so xhci_add_endpoint() should fail a lot earlier because the xhci->devs[slot_id] should be a null pointer as well. Allocation is also done together in xhci_alloc_dev() Looking at it more closely there is actually the .free_dev callback that first frees the dev->debugs_private but the virt_dev is only freed conditionally later Attached a patch that frees them together, can you try it out? If it doesn't help we need to add some elaborate tracing Thanks -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c
On 08.12.2017 13:06, Alexander Kappner wrote: Hi, I think we need to dig a bit deeper. It's good to check if spriv is valid but there are probably other reasons than kzalloc failing. I agree -- this small allocation is unlikely to fail in practice. Also, while my patch prevents the kernel oops, it also prevents the debugfs entries from being created. I've been debugging this more trying to come up with a better solution, but I might need some guidance as I'm not too familiar with the USB subsystem. The immediate cause of the crash was usbmuxd sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn calls xhci_debugfs_create_endpoint. The ioctl handler acquires a device-specific lock via usb_lock_device. When the system resumed from hibernate, xhci_resume was called. This in turn called xhci_mem_cleanup to deallocate the device structures, which include setting the debugfs_private pointer to NULL (via xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl is somehow racing with the hibernate. The call to xhci_resume is protected by a host-controller specific lock (xhci->lock) but it doesn't attempt to take the usb_lock_device device-specific lock. Now my suspicion is that xhci_resume freed and zeroed the device structures while racing with the ioctl handler. I can't seem to find any exclusion mechanism that would prevent xhci_resume from racing with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter). Am I missing something? If not, is there any reason why an ioctl might need to execute in parallel with the xhci_resume? If not, can we just do a busy wait in xhci_resume until all pending ioctls have returned? I'm not sure, but if I recall correctly then power management is supposed to make sure a driver doesn't access usb devices while the host controller is still resuming. The odd thing here is that xhci_debugfs_remove_slot(xhci, slot_id), and xhci_free_virt_device(xhci, slot_id) are called together when xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first() That means both the xhci_virt_device *dev and dev->debugfs_private should both be freed and xhci->devs[slot_id] set NULL for that virt_device. so xhci_add_endpoint() should fail a lot earlier because the xhci->devs[slot_id] should be a null pointer as well. Allocation is also done together in xhci_alloc_dev() Looking at it more closely there is actually the .free_dev callback that first frees the dev->debugs_private but the virt_dev is only freed conditionally later Attached a patch that frees them together, can you try it out? If it doesn't help we need to add some elaborate tracing Thanks -Mathias >From 811aad86287e78879b7e7f28e0c266bcd2a2f73e Mon Sep 17 00:00:00 2001 From: Mathias NymanDate: Fri, 8 Dec 2017 18:50:18 +0200 Subject: [PATCH] xhci: Only free xhci_virt_device->debugfs_private if device is freed freeing device is conditional, we otherwise might end up with a xhci virt_device that doesnt have a valid debugs_private pointer. For testing only Signed-off-by: Mathias Nyman --- drivers/usb/host/xhci.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 2424d30..da6dbe3 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -3525,8 +3525,6 @@ static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev) struct xhci_slot_ctx *slot_ctx; int i, ret; - xhci_debugfs_remove_slot(xhci, udev->slot_id); - #ifndef CONFIG_USB_DEFAULT_PERSIST /* * We called pm_runtime_get_noresume when the device was attached. @@ -3555,8 +3553,10 @@ static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev) } ret = xhci_disable_slot(xhci, udev->slot_id); - if (ret) + if (ret) { + xhci_debugfs_remove_slot(xhci, udev->slot_id); xhci_free_virt_device(xhci, udev->slot_id); + } } int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id) -- 2.7.4
Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c
Hi, > I think we need to dig a bit deeper. It's good to check if spriv is valid but there are probably other reasons than kzalloc failing. I agree -- this small allocation is unlikely to fail in practice. Also, while my patch prevents the kernel oops, it also prevents the debugfs entries from being created. I've been debugging this more trying to come up with a better solution, but I might need some guidance as I'm not too familiar with the USB subsystem. The immediate cause of the crash was usbmuxd sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn calls xhci_debugfs_create_endpoint. The ioctl handler acquires a device-specific lock via usb_lock_device. When the system resumed from hibernate, xhci_resume was called. This in turn called xhci_mem_cleanup to deallocate the device structures, which include setting the debugfs_private pointer to NULL (via xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl is somehow racing with the hibernate. The call to xhci_resume is protected by a host-controller specific lock (xhci->lock) but it doesn't attempt to take the usb_lock_device device-specific lock. Now my suspicion is that xhci_resume freed and zeroed the device structures while racing with the ioctl handler. I can't seem to find any exclusion mechanism that would prevent xhci_resume from racing with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter). Am I missing something? If not, is there any reason why an ioctl might need to execute in parallel with the xhci_resume? If not, can we just do a busy wait in xhci_resume until all pending ioctls have returned? > (Added this as reply subject, please remember to send mail with proper subject) My apologies, thanks for fixing. On 12/07/2017 04:47 AM, Mathias Nyman wrote: On 07.12.2017 11:26, Alexander Kappner wrote: Date: Wed, 6 Dec 2017 15:28:37 -0800 Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c (Added this as reply subject, please remember to send mail with proper subject) My kernel crashed just after resuming from hibernate and starting usbmuxd (a user-space daemon for iOS device pairing) with several USB devices connected (dmesg attached). Backtrace leads to: 0x8170465d is in xhci_debugfs_create_endpoint (drivers/usb/host/xhci-debugfs.c:381). 376 int ep_index) 377 { 378 struct xhci_ep_priv *epriv; 379 struct xhci_slot_priv *spriv = dev->debugfs_private; 380 381 if (spriv->eps[ep_index]) 382 return; 383 384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL); 385 if (!epriv) The read violation happens at address 0x40 and sizeof(struct xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here. Thanks for reporting and debugging this, much appreciated. spriv gets allocated in xhci_debugfs_create_slot: ... priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) return; ... There's no separate error path if this allocation fails, so we might be left with NULL in priv. Subsequent users of priv thus need to check for this NULL - so this is what the patch does. I think we need to dig a bit deeper. It's good to check if spriv is valid but there are probably other reasons than kzalloc failing. Resume from hibernation will reset the xhci controller, reinitializing a lot. It could be related. -Mathias (keeping rest of message as reference) There might be other ways of triggering this null pointer dereference, including when xhci_resume frees the device structures (e.g. after returning from a hibernate), but I wasn't able to find or reproduce it. [63953.758083] BUG: unable to handle kernel NULL pointer dereference at 0040 [63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0 [63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0 [63953.758093] Oops: [#1] PREEMPT SMP [63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211 qmi_wwan ecdh_generic thinkpad_acpi rfkill [63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P O 4.14.0.1-12769-g1deab8c #1 [63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W (1.35 ) 11/10/2016 [63953.758105] task: 8810527ba0c0 task.stack: c9000a8ec000 [63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0 [63953.758108] RSP: 0018:c9000a8efc80 EFLAGS: 00010206 [63953.758109] RAX: RBX: 88105a71c000 RCX: 0003 [63953.758110] RDX: 0003 RSI: 880c0b57e000 RDI: 88105a71c238 [63953.758110] RBP: 0003 R08: 881063800600 R09: 0003 [63953.758111] R10: 88105a71c238 R11: 0001 R12: 0011
Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c
On 07.12.2017 11:26, Alexander Kappner wrote: Date: Wed, 6 Dec 2017 15:28:37 -0800 Subject: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c (Added this as reply subject, please remember to send mail with proper subject) My kernel crashed just after resuming from hibernate and starting usbmuxd (a user-space daemon for iOS device pairing) with several USB devices connected (dmesg attached). Backtrace leads to: 0x8170465d is in xhci_debugfs_create_endpoint (drivers/usb/host/xhci-debugfs.c:381). 376 int ep_index) 377 { 378 struct xhci_ep_priv *epriv; 379 struct xhci_slot_priv *spriv = dev->debugfs_private; 380 381 if (spriv->eps[ep_index]) 382 return; 383 384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL); 385 if (!epriv) The read violation happens at address 0x40 and sizeof(struct xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here. Thanks for reporting and debugging this, much appreciated. spriv gets allocated in xhci_debugfs_create_slot: ... priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) return; ... There's no separate error path if this allocation fails, so we might be left with NULL in priv. Subsequent users of priv thus need to check for this NULL - so this is what the patch does. I think we need to dig a bit deeper. It's good to check if spriv is valid but there are probably other reasons than kzalloc failing. Resume from hibernation will reset the xhci controller, reinitializing a lot. It could be related. -Mathias (keeping rest of message as reference) There might be other ways of triggering this null pointer dereference, including when xhci_resume frees the device structures (e.g. after returning from a hibernate), but I wasn't able to find or reproduce it. [63953.758083] BUG: unable to handle kernel NULL pointer dereference at 0040 [63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0 [63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0 [63953.758093] Oops: [#1] PREEMPT SMP [63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211 qmi_wwan ecdh_generic thinkpad_acpi rfkill [63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P O 4.14.0.1-12769-g1deab8c #1 [63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W (1.35 ) 11/10/2016 [63953.758105] task: 8810527ba0c0 task.stack: c9000a8ec000 [63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0 [63953.758108] RSP: 0018:c9000a8efc80 EFLAGS: 00010206 [63953.758109] RAX: RBX: 88105a71c000 RCX: 0003 [63953.758110] RDX: 0003 RSI: 880c0b57e000 RDI: 88105a71c238 [63953.758110] RBP: 0003 R08: 881063800600 R09: 0003 [63953.758111] R10: 88105a71c238 R11: 0001 R12: 0011 [63953.758112] R13: 0018 R14: R15: [63953.758113] FS: 7f0a77715700() GS:8810a3d0() knlGS: [63953.758114] CS: 0010 DS: ES: CR0: 80050033 [63953.758115] CR2: 0040 CR3: 0003f91a8006 CR4: 003606e0 [63953.758115] DR0: DR1: DR2: [63953.758116] DR3: DR6: fffe0ff0 DR7: 0400 [63953.758117] Call Trace: [63953.758120] xhci_add_endpoint+0x127/0x2b0 [63953.758123] usb_hcd_alloc_bandwidth+0x1ad/0x300 [63953.758125] usb_set_configuration+0x1c8/0x880 [63953.758128] usbdev_do_ioctl+0xc41/0x1120 [63953.758130] usbdev_ioctl+0xa/0x10 [63953.758151] do_vfs_ioctl+0x8b/0x5c0 [63953.758153] ? __fget+0x6c/0xb0 [63953.758155] SyS_ioctl+0x76/0x90 [63953.758157] do_syscall_64+0x6b/0x290 [63953.758173] entry_SYSCALL64_slow_path+0x25/0x25 [63953.758175] RIP: 0033:0x7f0a76a151c7 [63953.758175] RSP: 002b:7ffd1431b0c8 EFLAGS: 0202 ORIG_RAX: 0010 [63953.758177] RAX: ffda RBX: 023239a0 RCX: 7f0a76a151c7 [63953.758178] RDX: 7ffd1431b0dc RSI: 80045505 RDI: 000e [63953.758178] RBP: 023240c0 R08: 7ffd1431b008 R09: 0004 [63953.758179] R10: 7ffd1431aec0 R11: 0202 R12: 023240c0 [63953.758180] R13: 0001 R14: 0056 R15: 0038 [63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee <49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d [63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP: c9000a8efc80 [63953.758203] CR2: 0040 [63953.758204] ---[ end trace 1f7ea9a959f02054 ]--- Signed-off-by: Alexander Kappner---