Re: xHCI issues Reset Device Command at invalid states

2017-01-03 Thread Alan Stern
On Mon, 2 Jan 2017, Felipe Balbi wrote:

> Hi,
> 
> Alan Stern  writes:
> > On Mon, 2 Jan 2017, Mathias Nyman wrote:
> >
> >> On 30.12.2016 14:01, Felipe Balbi wrote:
> >> >
> >> > Hi Mathias,
> >> >
> >> > So the problem I found with v4.10-rc1 doesn't appear to be a
> >> > regression. I can't, however, trigger it with Broadwell, only Skylake
> >> > and Kabylake.
> >> >
> >> > According to tracepoints, our Reset Device Command sometimes completes
> >> > with "Context State Error", which tells us that we're issuing Reset
> >> > Device Command when we shouldn't.
> >> 
> >> This could happen if reset device is issued twice,
> >> 
> >> xhci reset device command only works for slots in Addressed or
> >> Configured states. Reset device sets the slot to "Defult" state.
> >
> > Doesn't the xHCI hardware automatically assign addresses to devices?  
> > And send a Set-Address request after a reset has completed?
> >
> > And doesn't this mean the slot should automatically go from the Default 
> > state to the Addressed state?  It should remain in the Default state 
> > only for a brief time, while the Set-Address request is in progress.
> 
> My copy of xHCI spec says differently. The only thing that moves default
> from Default to Address is an Address Device command with BSR set to
> 0. The only situation where we don't see Default State is if we issue
> Address Device with BSR set to 0 from Enabled state. However, a Reset
> Device command will put the device in  Default State where it should
> remain until another Address Device command with BSR = 0 is issued.

[After re-reading the xHCI spec to refresh my memory...]

Yes, you're right.

> > Unless there's something wrong with the device, and it doesn't accept 
> > the Set-Address request.  In which case, usbcore would naturally retry 
> > the reset.  Is that what happened here?
> 
> Doesn't seem to be what's happening here. My Test case is merely a
> series of resets (started via userland with libusb) to mimic USB CV's
> 500 enumeration test (which is implemented with a series of USB Resets).
> 
> I haven't come up with an explanation as to why the device moved from
> Configured/Addressed back to Default yet. In order to answer that, I
> have to finish my Slot and Endpoint context tracers which I started
> writing today. If I were to guess, I'd say that we issued Enable Slot
> command, then Address Device command (BSR=1), then Reset Device
> command. This would cause te error above ;-)

During a normal enumeration under the "new" scheme, the hub driver
issues a port reset and then invokes the HCD's reset_device and
enable_device callbacks (which should leave the device slot in the
Default state) in order to read the device descriptor.  It then does
another port reset.  The HCD's reset_device callback for this second
reset should _always_ get a Context State Error completion code; I
don't know why you don't see this all the time.

Of course, if something goes wrong during a device reset then almost
anything is possible.  There are plenty of error pathways that skip
over the address_device callback.

> We should, anyway, apply the other patch I sent [1] (BTW, anybody knows
> what's wrong with marc.info?? It's not indexing messages anymore), to
> prevent us from even issuing the command if we're in the wrong state.
> 
> [1] https://marc.info/?i=20170102123412.9214-1-felipe.ba...@linux.intel.com

Agreed.  The command's name is a little misleading; a Reset Device 
command does not actually reset the device.  Rather, it resets the 
device's Slot Context.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xHCI issues Reset Device Command at invalid states

2017-01-02 Thread Felipe Balbi

Hi,

Alan Stern  writes:
> On Mon, 2 Jan 2017, Mathias Nyman wrote:
>
>> On 30.12.2016 14:01, Felipe Balbi wrote:
>> >
>> > Hi Mathias,
>> >
>> > So the problem I found with v4.10-rc1 doesn't appear to be a
>> > regression. I can't, however, trigger it with Broadwell, only Skylake
>> > and Kabylake.
>> >
>> > According to tracepoints, our Reset Device Command sometimes completes
>> > with "Context State Error", which tells us that we're issuing Reset
>> > Device Command when we shouldn't.
>> 
>> This could happen if reset device is issued twice,
>> 
>> xhci reset device command only works for slots in Addressed or
>> Configured states. Reset device sets the slot to "Defult" state.
>
> Doesn't the xHCI hardware automatically assign addresses to devices?  
> And send a Set-Address request after a reset has completed?
>
> And doesn't this mean the slot should automatically go from the Default 
> state to the Addressed state?  It should remain in the Default state 
> only for a brief time, while the Set-Address request is in progress.

My copy of xHCI spec says differently. The only thing that moves default
from Default to Address is an Address Device command with BSR set to
0. The only situation where we don't see Default State is if we issue
Address Device with BSR set to 0 from Enabled state. However, a Reset
Device command will put the device in  Default State where it should
remain until another Address Device command with BSR = 0 is issued.



> Unless there's something wrong with the device, and it doesn't accept 
> the Set-Address request.  In which case, usbcore would naturally retry 
> the reset.  Is that what happened here?

Doesn't seem to be what's happening here. My Test case is merely a
series of resets (started via userland with libusb) to mimic USB CV's
500 enumeration test (which is implemented with a series of USB Resets).

I haven't come up with an explanation as to why the device moved from
Configured/Addressed back to Default yet. In order to answer that, I
have to finish my Slot and Endpoint context tracers which I started
writing today. If I were to guess, I'd say that we issued Enable Slot
command, then Address Device command (BSR=1), then Reset Device
command. This would cause te error above ;-)

We should, anyway, apply the other patch I sent [1] (BTW, anybody knows
what's wrong with marc.info?? It's not indexing messages anymore), to
prevent us from even issuing the command if we're in the wrong state.

[1] https://marc.info/?i=20170102123412.9214-1-felipe.ba...@linux.intel.com

-- 
balbi
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xHCI issues Reset Device Command at invalid states

2017-01-02 Thread Alan Stern
On Mon, 2 Jan 2017, Mathias Nyman wrote:

> On 30.12.2016 14:01, Felipe Balbi wrote:
> >
> > Hi Mathias,
> >
> > So the problem I found with v4.10-rc1 doesn't appear to be a
> > regression. I can't, however, trigger it with Broadwell, only Skylake
> > and Kabylake.
> >
> > According to tracepoints, our Reset Device Command sometimes completes
> > with "Context State Error", which tells us that we're issuing Reset
> > Device Command when we shouldn't.
> 
> This could happen if reset device is issued twice,
> 
> xhci reset device command only works for slots in Addressed or
> Configured states. Reset device sets the slot to "Defult" state.

Doesn't the xHCI hardware automatically assign addresses to devices?  
And send a Set-Address request after a reset has completed?

And doesn't this mean the slot should automatically go from the Default 
state to the Addressed state?  It should remain in the Default state 
only for a brief time, while the Set-Address request is in progress.

Unless there's something wrong with the device, and it doesn't accept 
the Set-Address request.  In which case, usbcore would naturally retry 
the reset.  Is that what happened here?

> If the slot is already in Default state when a reset device command
> is issued xHC will return a  "Context State Error"
> 
> >
> > Another thing I noticed is that we're clearing PortFeature(PortReset)
> > less than 20ms after setting it. Full tracepoint data attached (slot 28
> > is the device in question. Slot 12, AFAICT, it's parent hub), but here's
> > a snippet showing both problems:
> 
> The device is connected to a external hub. I think this should be handled
> by usb core.
> 
> hub_port_reset() in usb core should have 10ms TDRST to drive resume
> and 10 ms TRSTRCY for reset recovery, (looks like it has 10+40ms for recovery)
> 
> I haven't checked, but I think it should be ok to ask the hub about the port 
> status
> GetPortStatus() even during reset recovery, as long as we don't communicate 
> with the
> device behind the hut that is being reset.

Sure, it would be okay, but what's the reason?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xHCI issues Reset Device Command at invalid states

2017-01-02 Thread Felipe Balbi

Hi,

Mathias Nyman  writes:
> On 30.12.2016 14:01, Felipe Balbi wrote:
>>
>> Hi Mathias,
>>
>> So the problem I found with v4.10-rc1 doesn't appear to be a
>> regression. I can't, however, trigger it with Broadwell, only Skylake
>> and Kabylake.
>>
>> According to tracepoints, our Reset Device Command sometimes completes
>> with "Context State Error", which tells us that we're issuing Reset
>> Device Command when we shouldn't.
>
> This could happen if reset device is issued twice,
>
> xhci reset device command only works for slots in Addressed or
> Configured states. Reset device sets the slot to "Defult" state.
>
> If the slot is already in Default state when a reset device command
> is issued xHC will return a  "Context State Error"

yeah, I also confirmed that USB Spec itself has no restriction in that
regard. Meaning that Bus Reset is valid from Default too, so this is an
xHCI peculiarity.

Patch sent, thanks.

The problem of the broken USB headset is still there, however. What I
see is that a control message to the headset times out. See the snippet
below:

>  -0 [002] d.h234.570372: xhci_handle_command: CMD: Stop 
> Ring Command: slot 22 sp 0 ep 1 flags C slot state configured

We stop the ring.

>  -0 [002] d.h234.570374: xhci_dbg_cancel_urb: Removing 
> canceled TD starting at 0x1be118790 (dma).
>  -0 [002] d.h234.570377: xhci_dbg_cancel_urb: Finding 
> endpoint context
>  -0 [002] d.h234.570379: xhci_dbg_cancel_urb: Cycle state = 
> 0x1
>  -0 [002] d.h234.570380: xhci_dbg_cancel_urb: New dequeue 
> segment = 8801be51d340 (virtual)
>  -0 [002] d.h234.570382: xhci_dbg_cancel_urb: New dequeue 
> pointer = 0x1be1187c0 (DMA)
>  -0 [002] d.h234.570384: xhci_dbg_cancel_urb: Set TR Deq 
> Ptr cmd, new deq seg = 8801be51d340 (0x1be118000 dma), new deq ptr = 
> 8801be1187c0 (0x1be1187c0 dma), new cycle = 1
>  -0 [002] d.h234.570407: xhci_queue_trb: CMD: Set TR 
> Dequeue Pointer Command: deq 0001be1187c1 stream 0 slot 22 ep 1 flags C 
> slot state configured
>  -0 [002] dNh134.570416: xhci_urb_giveback: ep0out-control: 
> urb 8801bb8aa480 pipe 2147488384 slot 22 length 0/111 sgs 0/0 stream 0 
> flags 0200
>device-reset-2777  [002] 34.570421: xhci_urb_enqueue: ep0out-control: 
>urb 8801bb8aa480 pipe 2147488384 slot 22 length 0/111 sgs 0/0 stream 0 
>flags 00110200
>  -0 [003] d.h234.570422: xhci_handle_event: EVENT: TRB 
> 0001be035940 status 'Success' len 0 slot 22 ep 0 type 'Command Completion 
> Event' flags e:c slot state reserved
>  -0 [003] d.h234.570422: xhci_handle_command: CMD: Set TR 
> Dequeue Pointer Command: deq 0001be1187c1 stream 0 slot 22 ep 1 flags C 
> slot state configured
>  -0 [003] d.h234.570423: xhci_dbg_cancel_urb: Successful 
> Set TR Deq Ptr cmd, deq = @1be1187c0
>device-reset-2777  [002] d..134.570426: xhci_queue_trb: CTRL: bRequestType 
>81 bRequest 06 wValue 2200 wIndex 0003 wLength 111 length 8 TD size 0 intr 0 
>type 'Setup Stage' flags b:I:i:c:s:i:e:c slot state configured
>device-reset-2777  [002] d..134.570426: xhci_queue_trb: CTRL: Buffer 
>0001bd199280 length 111 TD size 0 intr 0 type 'Data Stage' flags 
>b:i:i:c:s:I:e:C slot state configured
>device-reset-2777  [002] d..134.570426: xhci_queue_trb: CTRL: Buffer 
> length 0 TD size 0 intr 0 type 'Status Stage' flags 
>b:i:I:c:s:i:e:C slot state configured

we queue several TRBs. After queueing them, we unconditionally call
giveback_first_trb() which unconditionally calls
xhci_ring_ep_doorbell(). So the ring is definitely restarted.

>  -0 [002] d.h234.570514: xhci_handle_event: EVENT: TRB 
> 0001be1187c0 status 'Stall Error' len 8 slot 22 ep 1 type 'Transfer 
> Event' flags e:c slot state reserved

we get Stall Error... and continue getting Stall errors until our
control transfer, supposedly, times out 

>  -0 [002] d.h234.570515: xhci_handle_transfer: CTRL: 
> bRequestType 81 bRequest 06 wValue 2200 wIndex 0003 wLength 111 length 8 TD 
> size 0 intr 0 type 'Setup Stage' flags b:I:i:c:s:i:e:C slot state configured
>  -0 [002] d.h234.570516: xhci_queue_trb: CMD: Reset 
> Endpoint Command: ctx  slot 22 ep 1 flags C slot state 
> configured
>  -0 [002] d.h234.570516: xhci_dbg_reset_ep: Cleaning up 
> stalled endpoint ring
>  -0 [002] d.h234.570517: xhci_dbg_cancel_urb: Finding 
> endpoint context
>  -0 [002] d.h234.570517: xhci_dbg_cancel_urb: Cycle state = 
> 0x1
>  -0 [002] d.h234.570517: xhci_dbg_cancel_urb: New dequeue 
> segment = 8801be51d340 (virtual)
>  -0 [002] d.h234.570518: xhci_dbg_cancel_urb: New dequeue 
> pointer = 0x1be1187f0 (DMA)
>  -0 [002] d.h234.570518: xhci_dbg_reset_ep: Queueing new 
> dequeue state
>  -0 [002] d.h2

Re: xHCI issues Reset Device Command at invalid states

2017-01-02 Thread Mathias Nyman

On 02.01.2017 14:13, Felipe Balbi wrote:


Hi,

Felipe Balbi  writes:

Hi Mathias,

So the problem I found with v4.10-rc1 doesn't appear to be a
regression. I can't, however, trigger it with Broadwell, only Skylake
and Kabylake.

According to tracepoints, our Reset Device Command sometimes completes
with "Context State Error", which tells us that we're issuing Reset
Device Command when we shouldn't.

Another thing I noticed is that we're clearing PortFeature(PortReset)
less than 20ms after setting it. Full tracepoint data attached (slot 28


actually, this is not a problem. Reset is supposed to be 10ms minimum.


device-reset-2819  [002] d..1   154.868782: xhci_queue_trb: CMD: Reset Device 
Command: slot 28 flags C
   -0 [003] d.h2   154.868832: xhci_handle_event: EVENT: TRB 
0001be2b2e60 status 'Context State Error' len 0 slot 28 ep 0 type 'Command 
Completion Event' flags e:C


And here's xHCI telling us that slot 28's Context was is invalid status.


I traced this down to Slot being in Default state. According to Figure
10 Slot State Diagram, Reset Device is only allow from Configured or
Addressed. Any other states, Reset Device is invalid and shouldn't be
issued.

Now I'm wondering whether we should hide this fact from USB Core, or do
we have a bigger problem with USB Core itself.


when core calls hcd->driver->reset_device() xhci
will always submit a reset device command  without checking the slot state,
even if it will return context state error when slot is in a default state.

As we just checked it looks like USB2 specs, fig 9.1 allows resetting
the device is default state, so this limitation is xhci specific.

-Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xHCI issues Reset Device Command at invalid states

2017-01-02 Thread Mathias Nyman

On 30.12.2016 14:01, Felipe Balbi wrote:


Hi Mathias,

So the problem I found with v4.10-rc1 doesn't appear to be a
regression. I can't, however, trigger it with Broadwell, only Skylake
and Kabylake.

According to tracepoints, our Reset Device Command sometimes completes
with "Context State Error", which tells us that we're issuing Reset
Device Command when we shouldn't.


This could happen if reset device is issued twice,

xhci reset device command only works for slots in Addressed or
Configured states. Reset device sets the slot to "Defult" state.

If the slot is already in Default state when a reset device command
is issued xHC will return a  "Context State Error"



Another thing I noticed is that we're clearing PortFeature(PortReset)
less than 20ms after setting it. Full tracepoint data attached (slot 28
is the device in question. Slot 12, AFAICT, it's parent hub), but here's
a snippet showing both problems:


The device is connected to a external hub. I think this should be handled
by usb core.

hub_port_reset() in usb core should have 10ms TDRST to drive resume
and 10 ms TRSTRCY for reset recovery, (looks like it has 10+40ms for recovery)

I haven't checked, but I think it should be ok to ask the hub about the port 
status
GetPortStatus() even during reset recovery, as long as we don't communicate 
with the
device behind the hut that is being reset.

-Mathias

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: xHCI issues Reset Device Command at invalid states

2017-01-02 Thread Felipe Balbi

Hi,

Felipe Balbi  writes:
> Hi Mathias,
>
> So the problem I found with v4.10-rc1 doesn't appear to be a
> regression. I can't, however, trigger it with Broadwell, only Skylake
> and Kabylake.
>
> According to tracepoints, our Reset Device Command sometimes completes
> with "Context State Error", which tells us that we're issuing Reset
> Device Command when we shouldn't.
>
> Another thing I noticed is that we're clearing PortFeature(PortReset)
> less than 20ms after setting it. Full tracepoint data attached (slot 28

actually, this is not a problem. Reset is supposed to be 10ms minimum.

>> device-reset-2819  [002] d..1   154.868782: xhci_queue_trb: CMD: Reset 
>> Device Command: slot 28 flags C
>>   -0 [003] d.h2   154.868832: xhci_handle_event: EVENT: TRB 
>> 0001be2b2e60 status 'Context State Error' len 0 slot 28 ep 0 type 
>> 'Command Completion Event' flags e:C
>
> And here's xHCI telling us that slot 28's Context was is invalid status.

I traced this down to Slot being in Default state. According to Figure
10 Slot State Diagram, Reset Device is only allow from Configured or
Addressed. Any other states, Reset Device is invalid and shouldn't be
issued.

Now I'm wondering whether we should hide this fact from USB Core, or do
we have a bigger problem with USB Core itself.

Alan, Mathias; any ideas?

-- 
balbi


signature.asc
Description: PGP signature