Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

2017-12-11 Thread Mathias Nyman

On 10.12.2017 00:38, Alexander Kappner wrote:

Hi Mathias,

thanks for the patch! The system now resumes cleanly from hibernate even with 
usbmuxd doing its thing.

Tested-by: Alexander Kappner 

While testing this I hit some other issues with xhci-debugfs.c but I'll write 
these up in a separate bug.



Thanks,
Adding reported-by and tested-by tags, and pushing patch to my for-usb-linus 
branch

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

2017-12-09 Thread Alexander Kappner

Hi Mathias,

thanks for the patch! The system now resumes cleanly from hibernate even 
with usbmuxd doing its thing.


Tested-by: Alexander Kappner 

While testing this I hit some other issues with xhci-debugfs.c but I'll 
write these up in a separate bug.


On 12/08/2017 09:01 AM, Mathias Nyman wrote:

On 08.12.2017 13:06, Alexander Kappner wrote:

Hi,


I think we need to dig a bit deeper. It's good to check if spriv is
valid
but there are probably other reasons than kzalloc failing.


I agree -- this small allocation is  unlikely to fail in practice.
Also, while my patch prevents the kernel oops, it also prevents the
debugfs entries from being created.

I've been debugging this more trying to come up with a better
solution, but I might need some guidance as I'm not too familiar with
the USB subsystem. The immediate cause of the crash was usbmuxd
sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
acquires a device-specific lock via usb_lock_device.

When the system resumed from hibernate, xhci_resume was called. This
in turn called xhci_mem_cleanup to deallocate the device structures,
which include setting the debugfs_private pointer to NULL  (via
xhci_free_virt_devices_depth_first). It thus seems likely that the
ioctl is somehow racing with the hibernate. The call to xhci_resume
is protected by a host-controller specific lock (xhci->lock) but it
doesn't attempt to take the usb_lock_device device-specific lock.

Now my suspicion is that xhci_resume freed and zeroed the device
structures while racing with the ioctl handler. I can't seem to find
any exclusion mechanism that would prevent xhci_resume from racing
with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
that matter). Am I missing something? If not, is there any reason why
an ioctl might need to execute in parallel with the xhci_resume? If
not, can we just do a busy wait in xhci_resume until all pending
ioctls have returned?


I'm not sure, but if I recall correctly then power management is supposed
to make sure a driver doesn't access usb devices while the host 
controller

is still resuming.

The odd thing here is that
xhci_debugfs_remove_slot(xhci, slot_id), and
xhci_free_virt_device(xhci, slot_id) are called together when
xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()

That means both the xhci_virt_device *dev and dev->debugfs_private
should both be freed and xhci->devs[slot_id] set NULL for that 
virt_device.


so xhci_add_endpoint() should fail a lot earlier because the 
xhci->devs[slot_id]

should be a null pointer as well.

Allocation is also done together in xhci_alloc_dev()

Looking at it more closely there is actually the .free_dev callback that
first frees the dev->debugs_private but the virt_dev is only freed
conditionally later

Attached a patch that frees them together, can you try it out?

If it doesn't help we need to add some elaborate tracing

Thanks
-Mathias




--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

2017-12-08 Thread Mathias Nyman

On 08.12.2017 13:06, Alexander Kappner wrote:

Hi,


I think we need to dig a bit deeper. It's good to check if spriv is
valid
but there are probably other reasons than kzalloc failing.


I agree -- this small allocation is  unlikely to fail in practice.
Also, while my patch prevents the kernel oops, it also prevents the
debugfs entries from being created.

I've been debugging this more trying to come up with a better
solution, but I might need some guidance as I'm not too familiar with
the USB subsystem. The immediate cause of the crash was usbmuxd
sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
acquires a device-specific lock via usb_lock_device.

When the system resumed from hibernate, xhci_resume was called. This
in turn called xhci_mem_cleanup to deallocate the device structures,
which include setting the debugfs_private pointer to NULL  (via
xhci_free_virt_devices_depth_first). It thus seems likely that the
ioctl is somehow racing with the hibernate. The call to xhci_resume
is protected by a host-controller specific lock (xhci->lock) but it
doesn't attempt to take the usb_lock_device device-specific lock.

Now my suspicion is that xhci_resume freed and zeroed the device
structures while racing with the ioctl handler. I can't seem to find
any exclusion mechanism that would prevent xhci_resume from racing
with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
that matter). Am I missing something? If not, is there any reason why
an ioctl might need to execute in parallel with the xhci_resume? If
not, can we just do a busy wait in xhci_resume until all pending
ioctls have returned?


I'm not sure, but if I recall correctly then power management is supposed
to make sure a driver doesn't access usb devices while the host controller
is still resuming.

The odd thing here is that
xhci_debugfs_remove_slot(xhci, slot_id), and
xhci_free_virt_device(xhci, slot_id) are called together when
xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()

That means both the xhci_virt_device *dev and dev->debugfs_private
should both be freed and xhci->devs[slot_id] set NULL for that virt_device.

so xhci_add_endpoint() should fail a lot earlier because the xhci->devs[slot_id]
should be a null pointer as well.

Allocation is also done together in xhci_alloc_dev()

Looking at it more closely there is actually the .free_dev callback that
first frees the dev->debugs_private but the virt_dev is only freed
conditionally later

Attached a patch that frees them together, can you try it out?

If it doesn't help we need to add some elaborate tracing

Thanks
-Mathias


>From 811aad86287e78879b7e7f28e0c266bcd2a2f73e Mon Sep 17 00:00:00 2001
From: Mathias Nyman 
Date: Fri, 8 Dec 2017 18:50:18 +0200
Subject: [PATCH] xhci: Only free xhci_virt_device->debugfs_private if device
 is freed

freeing device is conditional, we otherwise might end up with a
xhci virt_device that doesnt have a valid debugs_private pointer.
For testing only

Signed-off-by: Mathias Nyman 
---
 drivers/usb/host/xhci.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 2424d30..da6dbe3 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3525,8 +3525,6 @@ static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	struct xhci_slot_ctx *slot_ctx;
 	int i, ret;
 
-	xhci_debugfs_remove_slot(xhci, udev->slot_id);
-
 #ifndef CONFIG_USB_DEFAULT_PERSIST
 	/*
 	 * We called pm_runtime_get_noresume when the device was attached.
@@ -3555,8 +3553,10 @@ static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	}
 
 	ret = xhci_disable_slot(xhci, udev->slot_id);
-	if (ret)
+	if (ret) {
+		xhci_debugfs_remove_slot(xhci, udev->slot_id);
 		xhci_free_virt_device(xhci, udev->slot_id);
+	}
 }
 
 int xhci_disable_slot(struct xhci_hcd *xhci, u32 slot_id)
-- 
2.7.4



Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

2017-12-08 Thread Alexander Kappner

Hi,

> I think we need to dig a bit deeper. It's good to check if spriv is 
valid

but there are probably other reasons than kzalloc failing.

I agree -- this small allocation is  unlikely to fail in practice. Also, 
while my patch prevents the kernel oops, it also prevents the debugfs 
entries from being created.


I've been debugging this more trying to come up with a better solution, 
but I might need some guidance as I'm not too familiar with the USB 
subsystem. The immediate cause of the crash was usbmuxd sending a 
USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if it fails_ 
calls usb_hcd_alloc_bandwidth to try and reset the device, which in turn 
calls xhci_debugfs_create_endpoint. The ioctl handler acquires a 
device-specific lock via usb_lock_device.


When the system resumed from hibernate, xhci_resume was called. This in 
turn called xhci_mem_cleanup to deallocate the device structures, which 
include setting the debugfs_private pointer to NULL  (via 
xhci_free_virt_devices_depth_first). It thus seems likely that the ioctl 
is somehow racing with the hibernate. The call to xhci_resume is 
protected by a host-controller specific lock (xhci->lock) but it doesn't 
attempt to take the usb_lock_device device-specific lock.


Now my suspicion is that xhci_resume freed and zeroed the device 
structures while racing with the ioctl handler. I can't seem to find any 
exclusion mechanism that would prevent xhci_resume from racing with the 
USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for that matter). 
Am I missing something? If not, is there any reason why an ioctl might 
need to execute in parallel with the xhci_resume? If not, can we just do 
a busy wait in xhci_resume until all pending ioctls have returned?


> (Added this as reply subject, please remember to send mail with 
proper subject)

My apologies, thanks for fixing.


On 12/07/2017 04:47 AM, Mathias Nyman wrote:

On 07.12.2017 11:26, Alexander Kappner wrote:

Date: Wed, 6 Dec 2017 15:28:37 -0800
Subject: [PATCH] usb-core: Fix potential null pointer dereference in 
xhci-debugfs.c


(Added this as reply subject, please remember to send mail with proper 
subject)




My kernel crashed just after resuming from hibernate and starting 
usbmuxd

(a user-space daemon for iOS device pairing) with several USB devices
connected (dmesg attached).

Backtrace leads to:

0x8170465d is in xhci_debugfs_create_endpoint
(drivers/usb/host/xhci-debugfs.c:381).
376   int ep_index)
377 {
378 struct xhci_ep_priv *epriv;
379 struct xhci_slot_priv   *spriv = dev->debugfs_private;
380
381 if (spriv->eps[ep_index])
382 return;
383
384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
385 if (!epriv)

The read violation happens at address 0x40 and sizeof(struct
xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.



Thanks for reporting and debugging this, much appreciated.


spriv gets allocated in xhci_debugfs_create_slot:

...
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
 return;
...

There's no separate error path if this allocation fails, so we might be
left with NULL in priv. Subsequent users of priv thus need to check for
this NULL - so this is what the patch does.



I think we need to dig a bit deeper. It's good to check if spriv is valid
but there are probably other reasons than kzalloc failing.

Resume from hibernation will reset the xhci controller, reinitializing 
a lot.

It could be related.

-Mathias

(keeping rest of message as reference)


There might be other ways of triggering this null pointer dereference,
including when xhci_resume frees the device structures (e.g. after
returning from a hibernate), but I wasn't able to find or reproduce it.

[63953.758083] BUG: unable to handle kernel NULL pointer dereference at
0040
[63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
[63953.758093] Oops:  [#1] PREEMPT SMP
[63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
qmi_wwan ecdh_generic thinkpad_acpi rfkill
[63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P   O
4.14.0.1-12769-g1deab8c #1
[63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS 
N1EET62W

(1.35 ) 11/10/2016
[63953.758105] task: 8810527ba0c0 task.stack: c9000a8ec000
[63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758108] RSP: 0018:c9000a8efc80 EFLAGS: 00010206
[63953.758109] RAX:  RBX: 88105a71c000 RCX:
0003
[63953.758110] RDX: 0003 RSI: 880c0b57e000 RDI:
88105a71c238
[63953.758110] RBP: 0003 R08: 881063800600 R09:
0003
[63953.758111] R10: 88105a71c238 R11: 0001 R12:
0011

Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

2017-12-07 Thread Mathias Nyman

On 07.12.2017 11:26, Alexander Kappner wrote:

Date: Wed, 6 Dec 2017 15:28:37 -0800
Subject: [PATCH] usb-core: Fix potential null pointer dereference in 
xhci-debugfs.c


(Added this as reply subject, please remember to send mail with proper subject)



My kernel crashed just after resuming from hibernate and starting usbmuxd
(a user-space daemon for iOS device pairing) with several USB devices
connected (dmesg attached).

Backtrace leads to:

0x8170465d is in xhci_debugfs_create_endpoint
(drivers/usb/host/xhci-debugfs.c:381).
376   int ep_index)
377 {
378 struct xhci_ep_priv *epriv;
379 struct xhci_slot_priv   *spriv = dev->debugfs_private;
380
381 if (spriv->eps[ep_index])
382 return;
383
384 epriv = kzalloc(sizeof(*epriv), GFP_KERNEL);
385 if (!epriv)

The read violation happens at address 0x40 and sizeof(struct
xhci_ep_priv)=0x40, so it seems ep_index is 1 and spriv is NULL here.



Thanks for reporting and debugging this, much appreciated.
 

spriv gets allocated in xhci_debugfs_create_slot:

...
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
 return;
...

There's no separate error path if this allocation fails, so we might be
left with NULL in priv. Subsequent users of priv thus need to check for
this NULL - so this is what the patch does.



I think we need to dig a bit deeper. It's good to check if spriv is valid
but there are probably other reasons than kzalloc failing.

Resume from hibernation will reset the xhci controller, reinitializing a lot.
It could be related.

-Mathias

(keeping rest of message as reference)


There might be other ways of triggering this null pointer dereference,
including when xhci_resume frees the device structures (e.g. after
returning from a hibernate), but I wasn't able to find or reproduce it.

[63953.758083] BUG: unable to handle kernel NULL pointer dereference at
0040
[63953.758090] IP: xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758091] PGD bb911d067 P4D bb911d067 PUD 10500ff067 PMD 0
[63953.758093] Oops:  [#1] PREEMPT SMP
[63953.758094] Modules linked in: ipheth tun nvidia_modeset(PO) iwlmvm
mac80211 iwlwifi nvidia(PO) btusb btrtl btbcm btintel bluetooth cfg80211
qmi_wwan ecdh_generic thinkpad_acpi rfkill
[63953.758103] CPU: 4 PID: 27091 Comm: usbmuxd Tainted: P   O
4.14.0.1-12769-g1deab8c #1
[63953.758104] Hardware name: LENOVO 20ENCTO1WW/20ENCTO1WW, BIOS N1EET62W
(1.35 ) 11/10/2016
[63953.758105] task: 8810527ba0c0 task.stack: c9000a8ec000
[63953.758107] RIP: 0010:xhci_debugfs_create_endpoint+0x1d/0xa0
[63953.758108] RSP: 0018:c9000a8efc80 EFLAGS: 00010206
[63953.758109] RAX:  RBX: 88105a71c000 RCX:
0003
[63953.758110] RDX: 0003 RSI: 880c0b57e000 RDI:
88105a71c238
[63953.758110] RBP: 0003 R08: 881063800600 R09:
0003
[63953.758111] R10: 88105a71c238 R11: 0001 R12:
0011
[63953.758112] R13: 0018 R14:  R15:

[63953.758113] FS:  7f0a77715700() GS:8810a3d0()
knlGS:
[63953.758114] CS:  0010 DS:  ES:  CR0: 80050033
[63953.758115] CR2: 0040 CR3: 0003f91a8006 CR4:
003606e0
[63953.758115] DR0:  DR1:  DR2:

[63953.758116] DR3:  DR6: fffe0ff0 DR7:
0400
[63953.758117] Call Trace:
[63953.758120]  xhci_add_endpoint+0x127/0x2b0
[63953.758123]  usb_hcd_alloc_bandwidth+0x1ad/0x300
[63953.758125]  usb_set_configuration+0x1c8/0x880
[63953.758128]  usbdev_do_ioctl+0xc41/0x1120
[63953.758130]  usbdev_ioctl+0xa/0x10
[63953.758151]  do_vfs_ioctl+0x8b/0x5c0
[63953.758153]  ? __fget+0x6c/0xb0
[63953.758155]  SyS_ioctl+0x76/0x90
[63953.758157]  do_syscall_64+0x6b/0x290
[63953.758173]  entry_SYSCALL64_slow_path+0x25/0x25
[63953.758175] RIP: 0033:0x7f0a76a151c7
[63953.758175] RSP: 002b:7ffd1431b0c8 EFLAGS: 0202 ORIG_RAX:
0010
[63953.758177] RAX: ffda RBX: 023239a0 RCX:
7f0a76a151c7
[63953.758178] RDX: 7ffd1431b0dc RSI: 80045505 RDI:
000e
[63953.758178] RBP: 023240c0 R08: 7ffd1431b008 R09:
0004
[63953.758179] R10: 7ffd1431aec0 R11: 0202 R12:
023240c0
[63953.758180] R13: 0001 R14: 0056 R15:
0038
[63953.758182] Code: e9 39 ff ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 41 57 41 56 41 55 41 54 55 48 63 ea 53 4c 8b b6 88 15 00 00 4d 8d 2c ee
<49> 83 7d 28 00 74 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 8b 3d
[63953.758202] RIP: xhci_debugfs_create_endpoint+0x1d/0xa0 RSP:
c9000a8efc80
[63953.758203] CR2: 0040
[63953.758204] ---[ end trace 1f7ea9a959f02054 ]---

Signed-off-by: Alexander Kappner 
---