Re: [PATCH v3] usb: core: hub: hub_port_init lock controller instead of bus

2016-04-27 Thread Mathias Nyman

On 25.04.2016 15:48, Chris Bainbridge wrote:

The XHCI controller presents two USB buses to the system - one for USB2
and one for USB3. The hub init code (hub_port_init) is reentrant but
only locks one bus per thread, leading to a race condition failure when
two threads attempt to simultaneously initialise a USB2 and USB3 device:

[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
[   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this failure occurred on 6% of all boots.

The call traces at the point of failure are:

Call Trace:
  [] schedule+0x37/0x90
  [] usb_kill_urb+0x8d/0xd0
  [] ? wake_up_atomic_t+0x30/0x30
  [] usb_start_wait_urb+0xbe/0x150
  [] usb_control_msg+0xbc/0xf0
  [] hub_port_init+0x51e/0xb70
  [] hub_event+0x817/0x1570
  [] process_one_work+0x1ff/0x620
  [] ? process_one_work+0x15f/0x620
  [] worker_thread+0x64/0x4b0
  [] ? rescuer_thread+0x390/0x390
  [] kthread+0x105/0x120
  [] ? kthread_create_on_node+0x200/0x200
  [] ret_from_fork+0x3f/0x70
  [] ? kthread_create_on_node+0x200/0x200

Call Trace:
  [] xhci_setup_device+0x53d/0xa40
  [] xhci_address_device+0xe/0x10
  [] hub_port_init+0x1bf/0xb70
  [] ? trace_hardirqs_on+0xd/0x10
  [] hub_event+0x817/0x1570
  [] process_one_work+0x1ff/0x620
  [] ? process_one_work+0x15f/0x620
  [] worker_thread+0x64/0x4b0
  [] ? rescuer_thread+0x390/0x390
  [] kthread+0x105/0x120
  [] ? kthread_create_on_node+0x200/0x200
  [] ret_from_fork+0x3f/0x70
  [] ? kthread_create_on_node+0x200/0x200

Which results from the two call chains:

hub_port_init
  usb_get_device_descriptor
   usb_get_descriptor
usb_control_msg
 usb_internal_control_msg
  usb_start_wait_urb
   usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
  hub_set_address
   xhci_address_device
xhci_setup_device

Mathias Nyman explains the current behaviour violates the XHCI spec:

  hub_port_reset() will end up moving the corresponding xhci device slot
  to default state.

  As hub_port_reset() is called several times in hub_port_init() it
  sounds reasonable that we could end up with two threads having their
  xhci device slots in default state at the same time, which according to
  xhci 4.5.3 specs still is a big no no:

  "Note: Software shall not transition more than one Device Slot to the
   Default State at a time"

  So both threads fail at their next task after this.
  One fails to read the descriptor, and the other fails addressing the
  device.

Fix this in hub_port_init by locking the USB controller (instead of an
individual bus) to prevent simultaneous initialisation of both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Link: https://lkml.org/lkml/2016/2/8/312
Link: https://lkml.org/lkml/2016/2/4/748
Signed-off-by: Chris Bainbridge 
---


Acked-by: Mathias Nyman 



[PATCH v3] usb: core: hub: hub_port_init lock controller instead of bus

2016-04-25 Thread Chris Bainbridge
The XHCI controller presents two USB buses to the system - one for USB2
and one for USB3. The hub init code (hub_port_init) is reentrant but
only locks one bus per thread, leading to a race condition failure when
two threads attempt to simultaneously initialise a USB2 and USB3 device:

[8.034843] xhci_hcd :00:14.0: Timeout while waiting for setup device 
command
[   13.183701] usb 3-3: device descriptor read/all, error -110

On a test system this failure occurred on 6% of all boots.

The call traces at the point of failure are:

Call Trace:
 [] schedule+0x37/0x90
 [] usb_kill_urb+0x8d/0xd0
 [] ? wake_up_atomic_t+0x30/0x30
 [] usb_start_wait_urb+0xbe/0x150
 [] usb_control_msg+0xbc/0xf0
 [] hub_port_init+0x51e/0xb70
 [] hub_event+0x817/0x1570
 [] process_one_work+0x1ff/0x620
 [] ? process_one_work+0x15f/0x620
 [] worker_thread+0x64/0x4b0
 [] ? rescuer_thread+0x390/0x390
 [] kthread+0x105/0x120
 [] ? kthread_create_on_node+0x200/0x200
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x200/0x200

Call Trace:
 [] xhci_setup_device+0x53d/0xa40
 [] xhci_address_device+0xe/0x10
 [] hub_port_init+0x1bf/0xb70
 [] ? trace_hardirqs_on+0xd/0x10
 [] hub_event+0x817/0x1570
 [] process_one_work+0x1ff/0x620
 [] ? process_one_work+0x15f/0x620
 [] worker_thread+0x64/0x4b0
 [] ? rescuer_thread+0x390/0x390
 [] kthread+0x105/0x120
 [] ? kthread_create_on_node+0x200/0x200
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x200/0x200

Which results from the two call chains:

hub_port_init
 usb_get_device_descriptor
  usb_get_descriptor
   usb_control_msg
usb_internal_control_msg
 usb_start_wait_urb
  usb_submit_urb / wait_for_completion_timeout / usb_kill_urb

hub_port_init
 hub_set_address
  xhci_address_device
   xhci_setup_device

Mathias Nyman explains the current behaviour violates the XHCI spec:

 hub_port_reset() will end up moving the corresponding xhci device slot
 to default state.

 As hub_port_reset() is called several times in hub_port_init() it
 sounds reasonable that we could end up with two threads having their
 xhci device slots in default state at the same time, which according to
 xhci 4.5.3 specs still is a big no no:

 "Note: Software shall not transition more than one Device Slot to the
  Default State at a time"

 So both threads fail at their next task after this.
 One fails to read the descriptor, and the other fails addressing the
 device.

Fix this in hub_port_init by locking the USB controller (instead of an
individual bus) to prevent simultaneous initialisation of both buses.

Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in 
parallel")
Link: https://lkml.org/lkml/2016/2/8/312
Link: https://lkml.org/lkml/2016/2/4/748
Signed-off-by: Chris Bainbridge 
---
This is technically the same as the last patch version, but with a reworked
commit message to better explain the cause of the issue.
---
 drivers/usb/core/hcd.c  | 15 +--
 drivers/usb/core/hub.c  |  8 
 include/linux/usb.h |  3 +--
 include/linux/usb/hcd.h |  1 +
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 2ca2cef7f681..980fc5774151 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -994,7 +994,7 @@ static void usb_bus_init (struct usb_bus *bus)
bus->bandwidth_allocated = 0;
bus->bandwidth_int_reqs  = 0;
bus->bandwidth_isoc_reqs = 0;
-   mutex_init(&bus->usb_address0_mutex);
+   mutex_init(&bus->devnum_next_mutex);
 }
 
 /*-*/
@@ -2521,6 +2521,14 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
return NULL;
}
if (primary_hcd == NULL) {
+   hcd->address0_mutex = kmalloc(sizeof(*hcd->address0_mutex),
+   GFP_KERNEL);
+   if (!hcd->address0_mutex) {
+   kfree(hcd);
+   dev_dbg(dev, "hcd address0 mutex alloc failed\n");
+   return NULL;
+   }
+   mutex_init(hcd->address0_mutex);
hcd->bandwidth_mutex = kmalloc(sizeof(*hcd->bandwidth_mutex),
GFP_KERNEL);
if (!hcd->bandwidth_mutex) {
@@ -2532,6 +2540,7 @@ struct usb_hcd *usb_create_shared_hcd(const struct 
hc_driver *driver,
dev_set_drvdata(dev, hcd);
} else {
mutex_lock(&usb_port_peer_mutex);
+   hcd->address0_mutex = primary_hcd->address0_mutex;
hcd->bandwidth_mutex = primary_hcd->bandwidth_mutex;
hcd->primary_hcd = primary_hcd;
primary_hcd->primary_hcd = primary_hcd;
@@ -2598,8 +2607,10 @@ static void hcd_release(struct kref *kref)
struct usb_hcd *hcd = container_of (kref, struct usb_hcd, kref);
 
mutex_lock(&usb_port_peer_mutex);
-   if (usb_hcd_is_primary_hcd(hc