Re: xHCI issues Reset Device Command at invalid states
On Mon, 2 Jan 2017, Felipe Balbi wrote: > Hi, > > Alan Sternwrites: > > On Mon, 2 Jan 2017, Mathias Nyman wrote: > > > >> On 30.12.2016 14:01, Felipe Balbi wrote: > >> > > >> > Hi Mathias, > >> > > >> > So the problem I found with v4.10-rc1 doesn't appear to be a > >> > regression. I can't, however, trigger it with Broadwell, only Skylake > >> > and Kabylake. > >> > > >> > According to tracepoints, our Reset Device Command sometimes completes > >> > with "Context State Error", which tells us that we're issuing Reset > >> > Device Command when we shouldn't. > >> > >> This could happen if reset device is issued twice, > >> > >> xhci reset device command only works for slots in Addressed or > >> Configured states. Reset device sets the slot to "Defult" state. > > > > Doesn't the xHCI hardware automatically assign addresses to devices? > > And send a Set-Address request after a reset has completed? > > > > And doesn't this mean the slot should automatically go from the Default > > state to the Addressed state? It should remain in the Default state > > only for a brief time, while the Set-Address request is in progress. > > My copy of xHCI spec says differently. The only thing that moves default > from Default to Address is an Address Device command with BSR set to > 0. The only situation where we don't see Default State is if we issue > Address Device with BSR set to 0 from Enabled state. However, a Reset > Device command will put the device in Default State where it should > remain until another Address Device command with BSR = 0 is issued. [After re-reading the xHCI spec to refresh my memory...] Yes, you're right. > > Unless there's something wrong with the device, and it doesn't accept > > the Set-Address request. In which case, usbcore would naturally retry > > the reset. Is that what happened here? > > Doesn't seem to be what's happening here. My Test case is merely a > series of resets (started via userland with libusb) to mimic USB CV's > 500 enumeration test (which is implemented with a series of USB Resets). > > I haven't come up with an explanation as to why the device moved from > Configured/Addressed back to Default yet. In order to answer that, I > have to finish my Slot and Endpoint context tracers which I started > writing today. If I were to guess, I'd say that we issued Enable Slot > command, then Address Device command (BSR=1), then Reset Device > command. This would cause te error above ;-) During a normal enumeration under the "new" scheme, the hub driver issues a port reset and then invokes the HCD's reset_device and enable_device callbacks (which should leave the device slot in the Default state) in order to read the device descriptor. It then does another port reset. The HCD's reset_device callback for this second reset should _always_ get a Context State Error completion code; I don't know why you don't see this all the time. Of course, if something goes wrong during a device reset then almost anything is possible. There are plenty of error pathways that skip over the address_device callback. > We should, anyway, apply the other patch I sent [1] (BTW, anybody knows > what's wrong with marc.info?? It's not indexing messages anymore), to > prevent us from even issuing the command if we're in the wrong state. > > [1] https://marc.info/?i=20170102123412.9214-1-felipe.ba...@linux.intel.com Agreed. The command's name is a little misleading; a Reset Device command does not actually reset the device. Rather, it resets the device's Slot Context. Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xHCI issues Reset Device Command at invalid states
Hi, Alan Sternwrites: > On Mon, 2 Jan 2017, Mathias Nyman wrote: > >> On 30.12.2016 14:01, Felipe Balbi wrote: >> > >> > Hi Mathias, >> > >> > So the problem I found with v4.10-rc1 doesn't appear to be a >> > regression. I can't, however, trigger it with Broadwell, only Skylake >> > and Kabylake. >> > >> > According to tracepoints, our Reset Device Command sometimes completes >> > with "Context State Error", which tells us that we're issuing Reset >> > Device Command when we shouldn't. >> >> This could happen if reset device is issued twice, >> >> xhci reset device command only works for slots in Addressed or >> Configured states. Reset device sets the slot to "Defult" state. > > Doesn't the xHCI hardware automatically assign addresses to devices? > And send a Set-Address request after a reset has completed? > > And doesn't this mean the slot should automatically go from the Default > state to the Addressed state? It should remain in the Default state > only for a brief time, while the Set-Address request is in progress. My copy of xHCI spec says differently. The only thing that moves default from Default to Address is an Address Device command with BSR set to 0. The only situation where we don't see Default State is if we issue Address Device with BSR set to 0 from Enabled state. However, a Reset Device command will put the device in Default State where it should remain until another Address Device command with BSR = 0 is issued. > Unless there's something wrong with the device, and it doesn't accept > the Set-Address request. In which case, usbcore would naturally retry > the reset. Is that what happened here? Doesn't seem to be what's happening here. My Test case is merely a series of resets (started via userland with libusb) to mimic USB CV's 500 enumeration test (which is implemented with a series of USB Resets). I haven't come up with an explanation as to why the device moved from Configured/Addressed back to Default yet. In order to answer that, I have to finish my Slot and Endpoint context tracers which I started writing today. If I were to guess, I'd say that we issued Enable Slot command, then Address Device command (BSR=1), then Reset Device command. This would cause te error above ;-) We should, anyway, apply the other patch I sent [1] (BTW, anybody knows what's wrong with marc.info?? It's not indexing messages anymore), to prevent us from even issuing the command if we're in the wrong state. [1] https://marc.info/?i=20170102123412.9214-1-felipe.ba...@linux.intel.com -- balbi -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xHCI issues Reset Device Command at invalid states
On Mon, 2 Jan 2017, Mathias Nyman wrote: > On 30.12.2016 14:01, Felipe Balbi wrote: > > > > Hi Mathias, > > > > So the problem I found with v4.10-rc1 doesn't appear to be a > > regression. I can't, however, trigger it with Broadwell, only Skylake > > and Kabylake. > > > > According to tracepoints, our Reset Device Command sometimes completes > > with "Context State Error", which tells us that we're issuing Reset > > Device Command when we shouldn't. > > This could happen if reset device is issued twice, > > xhci reset device command only works for slots in Addressed or > Configured states. Reset device sets the slot to "Defult" state. Doesn't the xHCI hardware automatically assign addresses to devices? And send a Set-Address request after a reset has completed? And doesn't this mean the slot should automatically go from the Default state to the Addressed state? It should remain in the Default state only for a brief time, while the Set-Address request is in progress. Unless there's something wrong with the device, and it doesn't accept the Set-Address request. In which case, usbcore would naturally retry the reset. Is that what happened here? > If the slot is already in Default state when a reset device command > is issued xHC will return a "Context State Error" > > > > > Another thing I noticed is that we're clearing PortFeature(PortReset) > > less than 20ms after setting it. Full tracepoint data attached (slot 28 > > is the device in question. Slot 12, AFAICT, it's parent hub), but here's > > a snippet showing both problems: > > The device is connected to a external hub. I think this should be handled > by usb core. > > hub_port_reset() in usb core should have 10ms TDRST to drive resume > and 10 ms TRSTRCY for reset recovery, (looks like it has 10+40ms for recovery) > > I haven't checked, but I think it should be ok to ask the hub about the port > status > GetPortStatus() even during reset recovery, as long as we don't communicate > with the > device behind the hut that is being reset. Sure, it would be okay, but what's the reason? Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xHCI issues Reset Device Command at invalid states
Hi, Mathias Nymanwrites: > On 30.12.2016 14:01, Felipe Balbi wrote: >> >> Hi Mathias, >> >> So the problem I found with v4.10-rc1 doesn't appear to be a >> regression. I can't, however, trigger it with Broadwell, only Skylake >> and Kabylake. >> >> According to tracepoints, our Reset Device Command sometimes completes >> with "Context State Error", which tells us that we're issuing Reset >> Device Command when we shouldn't. > > This could happen if reset device is issued twice, > > xhci reset device command only works for slots in Addressed or > Configured states. Reset device sets the slot to "Defult" state. > > If the slot is already in Default state when a reset device command > is issued xHC will return a "Context State Error" yeah, I also confirmed that USB Spec itself has no restriction in that regard. Meaning that Bus Reset is valid from Default too, so this is an xHCI peculiarity. Patch sent, thanks. The problem of the broken USB headset is still there, however. What I see is that a control message to the headset times out. See the snippet below: > -0 [002] d.h234.570372: xhci_handle_command: CMD: Stop > Ring Command: slot 22 sp 0 ep 1 flags C slot state configured We stop the ring. > -0 [002] d.h234.570374: xhci_dbg_cancel_urb: Removing > canceled TD starting at 0x1be118790 (dma). > -0 [002] d.h234.570377: xhci_dbg_cancel_urb: Finding > endpoint context > -0 [002] d.h234.570379: xhci_dbg_cancel_urb: Cycle state = > 0x1 > -0 [002] d.h234.570380: xhci_dbg_cancel_urb: New dequeue > segment = 8801be51d340 (virtual) > -0 [002] d.h234.570382: xhci_dbg_cancel_urb: New dequeue > pointer = 0x1be1187c0 (DMA) > -0 [002] d.h234.570384: xhci_dbg_cancel_urb: Set TR Deq > Ptr cmd, new deq seg = 8801be51d340 (0x1be118000 dma), new deq ptr = > 8801be1187c0 (0x1be1187c0 dma), new cycle = 1 > -0 [002] d.h234.570407: xhci_queue_trb: CMD: Set TR > Dequeue Pointer Command: deq 0001be1187c1 stream 0 slot 22 ep 1 flags C > slot state configured > -0 [002] dNh134.570416: xhci_urb_giveback: ep0out-control: > urb 8801bb8aa480 pipe 2147488384 slot 22 length 0/111 sgs 0/0 stream 0 > flags 0200 >device-reset-2777 [002] 34.570421: xhci_urb_enqueue: ep0out-control: >urb 8801bb8aa480 pipe 2147488384 slot 22 length 0/111 sgs 0/0 stream 0 >flags 00110200 > -0 [003] d.h234.570422: xhci_handle_event: EVENT: TRB > 0001be035940 status 'Success' len 0 slot 22 ep 0 type 'Command Completion > Event' flags e:c slot state reserved > -0 [003] d.h234.570422: xhci_handle_command: CMD: Set TR > Dequeue Pointer Command: deq 0001be1187c1 stream 0 slot 22 ep 1 flags C > slot state configured > -0 [003] d.h234.570423: xhci_dbg_cancel_urb: Successful > Set TR Deq Ptr cmd, deq = @1be1187c0 >device-reset-2777 [002] d..134.570426: xhci_queue_trb: CTRL: bRequestType >81 bRequest 06 wValue 2200 wIndex 0003 wLength 111 length 8 TD size 0 intr 0 >type 'Setup Stage' flags b:I:i:c:s:i:e:c slot state configured >device-reset-2777 [002] d..134.570426: xhci_queue_trb: CTRL: Buffer >0001bd199280 length 111 TD size 0 intr 0 type 'Data Stage' flags >b:i:i:c:s:I:e:C slot state configured >device-reset-2777 [002] d..134.570426: xhci_queue_trb: CTRL: Buffer > length 0 TD size 0 intr 0 type 'Status Stage' flags >b:i:I:c:s:i:e:C slot state configured we queue several TRBs. After queueing them, we unconditionally call giveback_first_trb() which unconditionally calls xhci_ring_ep_doorbell(). So the ring is definitely restarted. > -0 [002] d.h234.570514: xhci_handle_event: EVENT: TRB > 0001be1187c0 status 'Stall Error' len 8 slot 22 ep 1 type 'Transfer > Event' flags e:c slot state reserved we get Stall Error... and continue getting Stall errors until our control transfer, supposedly, times out > -0 [002] d.h234.570515: xhci_handle_transfer: CTRL: > bRequestType 81 bRequest 06 wValue 2200 wIndex 0003 wLength 111 length 8 TD > size 0 intr 0 type 'Setup Stage' flags b:I:i:c:s:i:e:C slot state configured > -0 [002] d.h234.570516: xhci_queue_trb: CMD: Reset > Endpoint Command: ctx slot 22 ep 1 flags C slot state > configured > -0 [002] d.h234.570516: xhci_dbg_reset_ep: Cleaning up > stalled endpoint ring > -0 [002] d.h234.570517: xhci_dbg_cancel_urb: Finding > endpoint context > -0 [002] d.h234.570517: xhci_dbg_cancel_urb: Cycle state = > 0x1 > -0 [002] d.h234.570517: xhci_dbg_cancel_urb: New dequeue > segment = 8801be51d340 (virtual) > -0 [002] d.h234.570518: xhci_dbg_cancel_urb: New dequeue > pointer = 0x1be1187f0 (DMA) > -0 [002] d.h234.570518: xhci_dbg_reset_ep: Queueing new > dequeue state > -0 [002] d.h2
Re: xHCI issues Reset Device Command at invalid states
On 02.01.2017 14:13, Felipe Balbi wrote: Hi, Felipe Balbiwrites: Hi Mathias, So the problem I found with v4.10-rc1 doesn't appear to be a regression. I can't, however, trigger it with Broadwell, only Skylake and Kabylake. According to tracepoints, our Reset Device Command sometimes completes with "Context State Error", which tells us that we're issuing Reset Device Command when we shouldn't. Another thing I noticed is that we're clearing PortFeature(PortReset) less than 20ms after setting it. Full tracepoint data attached (slot 28 actually, this is not a problem. Reset is supposed to be 10ms minimum. device-reset-2819 [002] d..1 154.868782: xhci_queue_trb: CMD: Reset Device Command: slot 28 flags C -0 [003] d.h2 154.868832: xhci_handle_event: EVENT: TRB 0001be2b2e60 status 'Context State Error' len 0 slot 28 ep 0 type 'Command Completion Event' flags e:C And here's xHCI telling us that slot 28's Context was is invalid status. I traced this down to Slot being in Default state. According to Figure 10 Slot State Diagram, Reset Device is only allow from Configured or Addressed. Any other states, Reset Device is invalid and shouldn't be issued. Now I'm wondering whether we should hide this fact from USB Core, or do we have a bigger problem with USB Core itself. when core calls hcd->driver->reset_device() xhci will always submit a reset device command without checking the slot state, even if it will return context state error when slot is in a default state. As we just checked it looks like USB2 specs, fig 9.1 allows resetting the device is default state, so this limitation is xhci specific. -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xHCI issues Reset Device Command at invalid states
On 30.12.2016 14:01, Felipe Balbi wrote: Hi Mathias, So the problem I found with v4.10-rc1 doesn't appear to be a regression. I can't, however, trigger it with Broadwell, only Skylake and Kabylake. According to tracepoints, our Reset Device Command sometimes completes with "Context State Error", which tells us that we're issuing Reset Device Command when we shouldn't. This could happen if reset device is issued twice, xhci reset device command only works for slots in Addressed or Configured states. Reset device sets the slot to "Defult" state. If the slot is already in Default state when a reset device command is issued xHC will return a "Context State Error" Another thing I noticed is that we're clearing PortFeature(PortReset) less than 20ms after setting it. Full tracepoint data attached (slot 28 is the device in question. Slot 12, AFAICT, it's parent hub), but here's a snippet showing both problems: The device is connected to a external hub. I think this should be handled by usb core. hub_port_reset() in usb core should have 10ms TDRST to drive resume and 10 ms TRSTRCY for reset recovery, (looks like it has 10+40ms for recovery) I haven't checked, but I think it should be ok to ask the hub about the port status GetPortStatus() even during reset recovery, as long as we don't communicate with the device behind the hut that is being reset. -Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xHCI issues Reset Device Command at invalid states
Hi, Felipe Balbiwrites: > Hi Mathias, > > So the problem I found with v4.10-rc1 doesn't appear to be a > regression. I can't, however, trigger it with Broadwell, only Skylake > and Kabylake. > > According to tracepoints, our Reset Device Command sometimes completes > with "Context State Error", which tells us that we're issuing Reset > Device Command when we shouldn't. > > Another thing I noticed is that we're clearing PortFeature(PortReset) > less than 20ms after setting it. Full tracepoint data attached (slot 28 actually, this is not a problem. Reset is supposed to be 10ms minimum. >> device-reset-2819 [002] d..1 154.868782: xhci_queue_trb: CMD: Reset >> Device Command: slot 28 flags C >> -0 [003] d.h2 154.868832: xhci_handle_event: EVENT: TRB >> 0001be2b2e60 status 'Context State Error' len 0 slot 28 ep 0 type >> 'Command Completion Event' flags e:C > > And here's xHCI telling us that slot 28's Context was is invalid status. I traced this down to Slot being in Default state. According to Figure 10 Slot State Diagram, Reset Device is only allow from Configured or Addressed. Any other states, Reset Device is invalid and shouldn't be issued. Now I'm wondering whether we should hide this fact from USB Core, or do we have a bigger problem with USB Core itself. Alan, Mathias; any ideas? -- balbi signature.asc Description: PGP signature