Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-12-01 Thread Baolin Wang
On 2 December 2016 at 09:17, Lu Baolu  wrote:
> Hi,
>
> On 12/01/2016 04:03 PM, Baolin Wang wrote:
>> On 1 December 2016 at 15:44, Lu Baolu  wrote:
>>> Hi,
>>>
>>> On 12/01/2016 03:35 PM, Baolin Wang wrote:
 On 1 December 2016 at 14:35, Lu Baolu  wrote:
> Hi,
>
> On 12/01/2016 02:04 PM, Baolin Wang wrote:
>> Hi Baolu,
>>
>> On 1 December 2016 at 13:45, Lu Baolu  wrote:
>>> Hi,
>>>
>>> On 11/30/2016 05:02 PM, Baolin Wang wrote:
 If the hardware never responds to the stop endpoint command, the
 URBs will never be completed, and we might hang the USB subsystem.
 The original watchdog timer is used to watch if one stop endpoint
 command is timeout, if timeout, then the watchdog timer will set
 XHCI_STATE_DYING, try to halt the xHCI host, and give back all
 pending URBs.

 But now we already have one command timer to control command timeout,
 thus we can also use the command timer to watch the stop endpoint
 command, instead of one duplicate watchdog timer which need to be
 removed.

 Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
 this is the last stop endpoint command of one endpoint. Since we
 can make sure we only set one stop endpoint command for one endpoint
 by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
 this flag.
>>> I am afraid you can't do this. "stop_cmds_pending" was added
>>> to fix the problem described in the comments that you want to
>>> remove. But I didn't find any fix of this problem in your patch.
>> Now we can not pending another stop endpoint command for the same one
>> endpoint, since will check 'EP_HALT_PENDING' flag in
>> xhci_urb_dequeue() function to avoid this. But after some
>> investigation, I think I missed the stop endpoint command in
>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
>> maybe need to add 'EP_HALT_PENDING' flag checking in
>> xhci_stop_device() function. DId I miss something else? Thanks.
> Consider below three threads running on different CPUs at the same time.
>
> Thread A: xhci_handle_cmd_stop_ep()  --- in interrupt handler
> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
> Thread C: xhci_urb_dequeue --- called by usb core
>
> They are serialized by xhci->lock. Let's consider below sequence:
>
> Thread A:
> - delete xhci->cmd_timer), but will fail due to Thread B.
> - clear EP_HALT_PENDING bit
>
> Thread B:
> - halt the host controller
>
> Thread C:
> - set EP_HALT_PENDING bit
> - enqueue another stop endpoint command
> - add the timer back
 Ah, thanks for you comments.
 If thread B halted the host, then the thread C xhci_urb_dequeue() will
 check host state failed, then just return and can not set another stop
 endpoint command.
>>> Not so simple. Thread B will release xhci->lock before xhci_halt().
>> Yes.
>>
 But from your example, I think your meaning is we
 should not halt the host by thread B, since we have handled the stop
 endpoint command event.

 So I think we need to check the xhci command of this stop endpoint
 command in xhci_stop_endpoint_command_timeout(), if the xhci command
 of this stop endpoint command is not in the command list (which means
 the stop endpoint command event has been handled), then just return
 and do not halt the controller. Right?

>>> I'd like suggest you to put this change into a separated patch.
>>> It's actually a different topic from the main purpose of this patch.
>> OK, I will. Thanks for your comments.
>>
>
> If you are going to work out a v2 patch, please consider whether
> we can totally leverage the common command mechanism to
> handle this stop endpoint command.
>
> Currently, when a stop endpoint command is issued for urb unlink,
> there are two timers for this command. This is duplicated and we
> should remove the stop-endpoint-only timer. The timeout functions
> are also different. The stop-endpoint-only timer just halt the host
> controller (this should be the last sort of way), while the common
> command timer will try to recover the situation by aborting and
> restart the command ring.

Yes, thanks for your reminding and I will think about that.

-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-12-01 Thread Baolin Wang
On 1 December 2016 at 21:28, Mathias Nyman
 wrote:
> On 01.12.2016 06:54, Baolin Wang wrote:
>>
>> On 30 November 2016 at 22:09, Mathias Nyman
>>  wrote:
>>>
>>> On 30.11.2016 11:02, Baolin Wang wrote:


 If the hardware never responds to the stop endpoint command, the
 URBs will never be completed, and we might hang the USB subsystem.
 The original watchdog timer is used to watch if one stop endpoint
 command is timeout, if timeout, then the watchdog timer will set
 XHCI_STATE_DYING, try to halt the xHCI host, and give back all
 pending URBs.

 But now we already have one command timer to control command timeout,
 thus we can also use the command timer to watch the stop endpoint
 command, instead of one duplicate watchdog timer which need to be
 removed.

 Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
 this is the last stop endpoint command of one endpoint. Since we
 can make sure we only set one stop endpoint command for one endpoint
 by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
 this flag.

 We also need to clean up the command queue before trying to halt the
 xHCI host in xhci_stop_endpoint_command_timeout() function.
>>>
>>>
>>>
>>> This isn't a bad idea.
>>>
>>> There are anyway some corner cases and details that need to be
>>> checked, such as suspend (which will clear the command queue), module
>>> unload
>>> and abrupt host removal (like pci hotplug removal of host controller)
>>> we need to make sure we can trust the command timer to always return the
>>> canceled URB
>>
>>
>> Yes, you are right, we need to check these carefully.
>>
>> Suspend process, module unload and abrupt host removal, they all will
>> issue usb_disconnect() firstly before clear the command queue, it will
>> check URBs for every endpoint by
>> usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(),
>> which will make sure every URBs of endpoints will be cancelled by the
>> stop endpoint command responding or the timeout function of stop
>> endpoint command (xhci_stop_endpoint_command_timeout()) in
>> usb_hcd_flush_endpoint(). From that point, we can make sure the
>> command timer will be useful to handle stop endpoint command timeout.
>> Please correct me if I said something wrong. Thanks.
>>
>
> This relies on current queued command that times out to be the stop endpoint
> command.
>
> If host partially, or completely hangs there might be any number of commands
> in the
> command queue, and we would need to wait for each one of them to timeout,
> finish
> before we finally get to the stop endpoint command, and give back the urb.
>
> I think it would be better to first fix the issues with the current watchdog
> function, get
> those fixes into stable, and then think about moving to the command queue
> timer.

OK, make sense. Thanks.

>
> In short, this patch doesn't currently fix any existing issue, but might
> cause the
> timeout to be more unreliable
>
> -Mathias



-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-12-01 Thread Lu Baolu
Hi,

On 12/01/2016 04:03 PM, Baolin Wang wrote:
> On 1 December 2016 at 15:44, Lu Baolu  wrote:
>> Hi,
>>
>> On 12/01/2016 03:35 PM, Baolin Wang wrote:
>>> On 1 December 2016 at 14:35, Lu Baolu  wrote:
 Hi,

 On 12/01/2016 02:04 PM, Baolin Wang wrote:
> Hi Baolu,
>
> On 1 December 2016 at 13:45, Lu Baolu  wrote:
>> Hi,
>>
>> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>>> If the hardware never responds to the stop endpoint command, the
>>> URBs will never be completed, and we might hang the USB subsystem.
>>> The original watchdog timer is used to watch if one stop endpoint
>>> command is timeout, if timeout, then the watchdog timer will set
>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>>> pending URBs.
>>>
>>> But now we already have one command timer to control command timeout,
>>> thus we can also use the command timer to watch the stop endpoint
>>> command, instead of one duplicate watchdog timer which need to be
>>> removed.
>>>
>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>>> this is the last stop endpoint command of one endpoint. Since we
>>> can make sure we only set one stop endpoint command for one endpoint
>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>>> this flag.
>> I am afraid you can't do this. "stop_cmds_pending" was added
>> to fix the problem described in the comments that you want to
>> remove. But I didn't find any fix of this problem in your patch.
> Now we can not pending another stop endpoint command for the same one
> endpoint, since will check 'EP_HALT_PENDING' flag in
> xhci_urb_dequeue() function to avoid this. But after some
> investigation, I think I missed the stop endpoint command in
> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
> maybe need to add 'EP_HALT_PENDING' flag checking in
> xhci_stop_device() function. DId I miss something else? Thanks.
 Consider below three threads running on different CPUs at the same time.

 Thread A: xhci_handle_cmd_stop_ep()  --- in interrupt handler
 Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
 Thread C: xhci_urb_dequeue --- called by usb core

 They are serialized by xhci->lock. Let's consider below sequence:

 Thread A:
 - delete xhci->cmd_timer), but will fail due to Thread B.
 - clear EP_HALT_PENDING bit

 Thread B:
 - halt the host controller

 Thread C:
 - set EP_HALT_PENDING bit
 - enqueue another stop endpoint command
 - add the timer back
>>> Ah, thanks for you comments.
>>> If thread B halted the host, then the thread C xhci_urb_dequeue() will
>>> check host state failed, then just return and can not set another stop
>>> endpoint command.
>> Not so simple. Thread B will release xhci->lock before xhci_halt().
> Yes.
>
>>> But from your example, I think your meaning is we
>>> should not halt the host by thread B, since we have handled the stop
>>> endpoint command event.
>>>
>>> So I think we need to check the xhci command of this stop endpoint
>>> command in xhci_stop_endpoint_command_timeout(), if the xhci command
>>> of this stop endpoint command is not in the command list (which means
>>> the stop endpoint command event has been handled), then just return
>>> and do not halt the controller. Right?
>>>
>> I'd like suggest you to put this change into a separated patch.
>> It's actually a different topic from the main purpose of this patch.
> OK, I will. Thanks for your comments.
>

If you are going to work out a v2 patch, please consider whether
we can totally leverage the common command mechanism to
handle this stop endpoint command.

Currently, when a stop endpoint command is issued for urb unlink,
there are two timers for this command. This is duplicated and we
should remove the stop-endpoint-only timer. The timeout functions
are also different. The stop-endpoint-only timer just halt the host
controller (this should be the last sort of way), while the common
command timer will try to recover the situation by aborting and
restart the command ring.

Best regards,
Lu Baolu


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-12-01 Thread Mathias Nyman

On 01.12.2016 06:54, Baolin Wang wrote:

On 30 November 2016 at 22:09, Mathias Nyman
 wrote:

On 30.11.2016 11:02, Baolin Wang wrote:


If the hardware never responds to the stop endpoint command, the
URBs will never be completed, and we might hang the USB subsystem.
The original watchdog timer is used to watch if one stop endpoint
command is timeout, if timeout, then the watchdog timer will set
XHCI_STATE_DYING, try to halt the xHCI host, and give back all
pending URBs.

But now we already have one command timer to control command timeout,
thus we can also use the command timer to watch the stop endpoint
command, instead of one duplicate watchdog timer which need to be
removed.

Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
this is the last stop endpoint command of one endpoint. Since we
can make sure we only set one stop endpoint command for one endpoint
by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
this flag.

We also need to clean up the command queue before trying to halt the
xHCI host in xhci_stop_endpoint_command_timeout() function.



This isn't a bad idea.

There are anyway some corner cases and details that need to be
checked, such as suspend (which will clear the command queue), module unload
and abrupt host removal (like pci hotplug removal of host controller)
we need to make sure we can trust the command timer to always return the
canceled URB


Yes, you are right, we need to check these carefully.

Suspend process, module unload and abrupt host removal, they all will
issue usb_disconnect() firstly before clear the command queue, it will
check URBs for every endpoint by
usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(),
which will make sure every URBs of endpoints will be cancelled by the
stop endpoint command responding or the timeout function of stop
endpoint command (xhci_stop_endpoint_command_timeout()) in
usb_hcd_flush_endpoint(). From that point, we can make sure the
command timer will be useful to handle stop endpoint command timeout.
Please correct me if I said something wrong. Thanks.



This relies on current queued command that times out to be the stop endpoint 
command.

If host partially, or completely hangs there might be any number of commands in 
the
command queue, and we would need to wait for each one of them to timeout, finish
before we finally get to the stop endpoint command, and give back the urb.

I think it would be better to first fix the issues with the current watchdog 
function, get
those fixes into stable, and then think about moving to the command queue timer.

In short, this patch doesn't currently fix any existing issue, but might cause 
the
timeout to be more unreliable

-Mathias   
 


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-12-01 Thread Baolin Wang
On 1 December 2016 at 15:44, Lu Baolu  wrote:
> Hi,
>
> On 12/01/2016 03:35 PM, Baolin Wang wrote:
>> On 1 December 2016 at 14:35, Lu Baolu  wrote:
>>> Hi,
>>>
>>> On 12/01/2016 02:04 PM, Baolin Wang wrote:
 Hi Baolu,

 On 1 December 2016 at 13:45, Lu Baolu  wrote:
> Hi,
>
> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>> If the hardware never responds to the stop endpoint command, the
>> URBs will never be completed, and we might hang the USB subsystem.
>> The original watchdog timer is used to watch if one stop endpoint
>> command is timeout, if timeout, then the watchdog timer will set
>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>> pending URBs.
>>
>> But now we already have one command timer to control command timeout,
>> thus we can also use the command timer to watch the stop endpoint
>> command, instead of one duplicate watchdog timer which need to be
>> removed.
>>
>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>> this is the last stop endpoint command of one endpoint. Since we
>> can make sure we only set one stop endpoint command for one endpoint
>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>> this flag.
> I am afraid you can't do this. "stop_cmds_pending" was added
> to fix the problem described in the comments that you want to
> remove. But I didn't find any fix of this problem in your patch.
 Now we can not pending another stop endpoint command for the same one
 endpoint, since will check 'EP_HALT_PENDING' flag in
 xhci_urb_dequeue() function to avoid this. But after some
 investigation, I think I missed the stop endpoint command in
 xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
 maybe need to add 'EP_HALT_PENDING' flag checking in
 xhci_stop_device() function. DId I miss something else? Thanks.
>>> Consider below three threads running on different CPUs at the same time.
>>>
>>> Thread A: xhci_handle_cmd_stop_ep()  --- in interrupt handler
>>> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
>>> Thread C: xhci_urb_dequeue --- called by usb core
>>>
>>> They are serialized by xhci->lock. Let's consider below sequence:
>>>
>>> Thread A:
>>> - delete xhci->cmd_timer), but will fail due to Thread B.
>>> - clear EP_HALT_PENDING bit
>>>
>>> Thread B:
>>> - halt the host controller
>>>
>>> Thread C:
>>> - set EP_HALT_PENDING bit
>>> - enqueue another stop endpoint command
>>> - add the timer back
>> Ah, thanks for you comments.
>> If thread B halted the host, then the thread C xhci_urb_dequeue() will
>> check host state failed, then just return and can not set another stop
>> endpoint command.
>
> Not so simple. Thread B will release xhci->lock before xhci_halt().

Yes.

>
>> But from your example, I think your meaning is we
>> should not halt the host by thread B, since we have handled the stop
>> endpoint command event.
>>
>> So I think we need to check the xhci command of this stop endpoint
>> command in xhci_stop_endpoint_command_timeout(), if the xhci command
>> of this stop endpoint command is not in the command list (which means
>> the stop endpoint command event has been handled), then just return
>> and do not halt the controller. Right?
>>
>
> I'd like suggest you to put this change into a separated patch.
> It's actually a different topic from the main purpose of this patch.

OK, I will. Thanks for your comments.

-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Lu Baolu
Hi,

On 12/01/2016 03:35 PM, Baolin Wang wrote:
> On 1 December 2016 at 14:35, Lu Baolu  wrote:
>> Hi,
>>
>> On 12/01/2016 02:04 PM, Baolin Wang wrote:
>>> Hi Baolu,
>>>
>>> On 1 December 2016 at 13:45, Lu Baolu  wrote:
 Hi,

 On 11/30/2016 05:02 PM, Baolin Wang wrote:
> If the hardware never responds to the stop endpoint command, the
> URBs will never be completed, and we might hang the USB subsystem.
> The original watchdog timer is used to watch if one stop endpoint
> command is timeout, if timeout, then the watchdog timer will set
> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
> pending URBs.
>
> But now we already have one command timer to control command timeout,
> thus we can also use the command timer to watch the stop endpoint
> command, instead of one duplicate watchdog timer which need to be
> removed.
>
> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
> this is the last stop endpoint command of one endpoint. Since we
> can make sure we only set one stop endpoint command for one endpoint
> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
> this flag.
 I am afraid you can't do this. "stop_cmds_pending" was added
 to fix the problem described in the comments that you want to
 remove. But I didn't find any fix of this problem in your patch.
>>> Now we can not pending another stop endpoint command for the same one
>>> endpoint, since will check 'EP_HALT_PENDING' flag in
>>> xhci_urb_dequeue() function to avoid this. But after some
>>> investigation, I think I missed the stop endpoint command in
>>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
>>> maybe need to add 'EP_HALT_PENDING' flag checking in
>>> xhci_stop_device() function. DId I miss something else? Thanks.
>> Consider below three threads running on different CPUs at the same time.
>>
>> Thread A: xhci_handle_cmd_stop_ep()  --- in interrupt handler
>> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
>> Thread C: xhci_urb_dequeue --- called by usb core
>>
>> They are serialized by xhci->lock. Let's consider below sequence:
>>
>> Thread A:
>> - delete xhci->cmd_timer), but will fail due to Thread B.
>> - clear EP_HALT_PENDING bit
>>
>> Thread B:
>> - halt the host controller
>>
>> Thread C:
>> - set EP_HALT_PENDING bit
>> - enqueue another stop endpoint command
>> - add the timer back
> Ah, thanks for you comments.
> If thread B halted the host, then the thread C xhci_urb_dequeue() will
> check host state failed, then just return and can not set another stop
> endpoint command.

Not so simple. Thread B will release xhci->lock before xhci_halt().

> But from your example, I think your meaning is we
> should not halt the host by thread B, since we have handled the stop
> endpoint command event.
>
> So I think we need to check the xhci command of this stop endpoint
> command in xhci_stop_endpoint_command_timeout(), if the xhci command
> of this stop endpoint command is not in the command list (which means
> the stop endpoint command event has been handled), then just return
> and do not halt the controller. Right?
>

I'd like suggest you to put this change into a separated patch.
It's actually a different topic from the main purpose of this patch.

Best regards,
Lu Baolu


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Baolin Wang
On 1 December 2016 at 14:35, Lu Baolu  wrote:
> Hi,
>
> On 12/01/2016 02:04 PM, Baolin Wang wrote:
>> Hi Baolu,
>>
>> On 1 December 2016 at 13:45, Lu Baolu  wrote:
>>> Hi,
>>>
>>> On 11/30/2016 05:02 PM, Baolin Wang wrote:
 If the hardware never responds to the stop endpoint command, the
 URBs will never be completed, and we might hang the USB subsystem.
 The original watchdog timer is used to watch if one stop endpoint
 command is timeout, if timeout, then the watchdog timer will set
 XHCI_STATE_DYING, try to halt the xHCI host, and give back all
 pending URBs.

 But now we already have one command timer to control command timeout,
 thus we can also use the command timer to watch the stop endpoint
 command, instead of one duplicate watchdog timer which need to be
 removed.

 Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
 this is the last stop endpoint command of one endpoint. Since we
 can make sure we only set one stop endpoint command for one endpoint
 by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
 this flag.
>>> I am afraid you can't do this. "stop_cmds_pending" was added
>>> to fix the problem described in the comments that you want to
>>> remove. But I didn't find any fix of this problem in your patch.
>> Now we can not pending another stop endpoint command for the same one
>> endpoint, since will check 'EP_HALT_PENDING' flag in
>> xhci_urb_dequeue() function to avoid this. But after some
>> investigation, I think I missed the stop endpoint command in
>> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
>> maybe need to add 'EP_HALT_PENDING' flag checking in
>> xhci_stop_device() function. DId I miss something else? Thanks.
>
> Consider below three threads running on different CPUs at the same time.
>
> Thread A: xhci_handle_cmd_stop_ep()  --- in interrupt handler
> Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
> Thread C: xhci_urb_dequeue --- called by usb core
>
> They are serialized by xhci->lock. Let's consider below sequence:
>
> Thread A:
> - delete xhci->cmd_timer), but will fail due to Thread B.
> - clear EP_HALT_PENDING bit
>
> Thread B:
> - halt the host controller
>
> Thread C:
> - set EP_HALT_PENDING bit
> - enqueue another stop endpoint command
> - add the timer back

Ah, thanks for you comments.
If thread B halted the host, then the thread C xhci_urb_dequeue() will
check host state failed, then just return and can not set another stop
endpoint command. But from your example, I think your meaning is we
should not halt the host by thread B, since we have handled the stop
endpoint command event.

So I think we need to check the xhci command of this stop endpoint
command in xhci_stop_endpoint_command_timeout(), if the xhci command
of this stop endpoint command is not in the command list (which means
the stop endpoint command event has been handled), then just return
and do not halt the controller. Right?

-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Lu Baolu
Hi,

On 12/01/2016 02:04 PM, Baolin Wang wrote:
> Hi Baolu,
>
> On 1 December 2016 at 13:45, Lu Baolu  wrote:
>> Hi,
>>
>> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>>> If the hardware never responds to the stop endpoint command, the
>>> URBs will never be completed, and we might hang the USB subsystem.
>>> The original watchdog timer is used to watch if one stop endpoint
>>> command is timeout, if timeout, then the watchdog timer will set
>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>>> pending URBs.
>>>
>>> But now we already have one command timer to control command timeout,
>>> thus we can also use the command timer to watch the stop endpoint
>>> command, instead of one duplicate watchdog timer which need to be
>>> removed.
>>>
>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>>> this is the last stop endpoint command of one endpoint. Since we
>>> can make sure we only set one stop endpoint command for one endpoint
>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>>> this flag.
>> I am afraid you can't do this. "stop_cmds_pending" was added
>> to fix the problem described in the comments that you want to
>> remove. But I didn't find any fix of this problem in your patch.
> Now we can not pending another stop endpoint command for the same one
> endpoint, since will check 'EP_HALT_PENDING' flag in
> xhci_urb_dequeue() function to avoid this. But after some
> investigation, I think I missed the stop endpoint command in
> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
> maybe need to add 'EP_HALT_PENDING' flag checking in
> xhci_stop_device() function. DId I miss something else? Thanks.

Consider below three threads running on different CPUs at the same time.

Thread A: xhci_handle_cmd_stop_ep()  --- in interrupt handler
Thread B: xhci_stop_endpoint_command_timeout() --- timer expired
Thread C: xhci_urb_dequeue --- called by usb core

They are serialized by xhci->lock. Let's consider below sequence:

Thread A:
- delete xhci->cmd_timer), but will fail due to Thread B.
- clear EP_HALT_PENDING bit

Thread B:
- halt the host controller

Thread C:
- set EP_HALT_PENDING bit
- enqueue another stop endpoint command
- add the timer back

Best regards,
Lu Baolu

>
>> - * The timer may also fire if the host takes a very long time to respond to 
>> the
>> - * command, and the stop endpoint command completion handler cannot delete 
>> the
>> - * timer before the timer function is called.  Another endpoint 
>> cancellation may
>> - * sneak in before the timer function can grab the lock, and that may queue
>> - * another stop endpoint command and add the timer back.  So we cannot use a
>> - * simple flag to say whether there is a pending stop endpoint command for a
>> - * particular endpoint.
>> - *
>> - * Instead we use a combination of that flag and a counter for the number of
>> - * pending stop endpoint commands.  If the timer is the tail end of the last
>> - * stop endpoint command, and the endpoint's command is still pending, we 
>> assume
>> - * the host is dying.
>>
>> Best regards,
>> Lu Baolu
>>
>>> We also need to clean up the command queue before trying to halt the
>>> xHCI host in xhci_stop_endpoint_command_timeout() function.
>>>
>>> Signed-off-by: Baolin Wang 
>
>



Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Baolin Wang
On 1 December 2016 at 14:04, Baolin Wang  wrote:
> Hi Baolu,
>
> On 1 December 2016 at 13:45, Lu Baolu  wrote:
>> Hi,
>>
>> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>>> If the hardware never responds to the stop endpoint command, the
>>> URBs will never be completed, and we might hang the USB subsystem.
>>> The original watchdog timer is used to watch if one stop endpoint
>>> command is timeout, if timeout, then the watchdog timer will set
>>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>>> pending URBs.
>>>
>>> But now we already have one command timer to control command timeout,
>>> thus we can also use the command timer to watch the stop endpoint
>>> command, instead of one duplicate watchdog timer which need to be
>>> removed.
>>>
>>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>>> this is the last stop endpoint command of one endpoint. Since we
>>> can make sure we only set one stop endpoint command for one endpoint
>>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>>> this flag.
>>
>> I am afraid you can't do this. "stop_cmds_pending" was added
>> to fix the problem described in the comments that you want to
>> remove. But I didn't find any fix of this problem in your patch.
>
> Now we can not pending another stop endpoint command for the same one
> endpoint, since will check 'EP_HALT_PENDING' flag in
> xhci_urb_dequeue() function to avoid this. But after some
> investigation, I think I missed the stop endpoint command in
> xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
> maybe need to add 'EP_HALT_PENDING' flag checking in
> xhci_stop_device() function. DId I miss something else? Thanks.

Sorry, wait, we will wait for the stop endpoint command finished in
xhci_stop_device() function. So we don't need check 'EP_HALT_PENDING'
flag as the orignal code did.

-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Baolin Wang
Hi Baolu,

On 1 December 2016 at 13:45, Lu Baolu  wrote:
> Hi,
>
> On 11/30/2016 05:02 PM, Baolin Wang wrote:
>> If the hardware never responds to the stop endpoint command, the
>> URBs will never be completed, and we might hang the USB subsystem.
>> The original watchdog timer is used to watch if one stop endpoint
>> command is timeout, if timeout, then the watchdog timer will set
>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>> pending URBs.
>>
>> But now we already have one command timer to control command timeout,
>> thus we can also use the command timer to watch the stop endpoint
>> command, instead of one duplicate watchdog timer which need to be
>> removed.
>>
>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>> this is the last stop endpoint command of one endpoint. Since we
>> can make sure we only set one stop endpoint command for one endpoint
>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>> this flag.
>
> I am afraid you can't do this. "stop_cmds_pending" was added
> to fix the problem described in the comments that you want to
> remove. But I didn't find any fix of this problem in your patch.

Now we can not pending another stop endpoint command for the same one
endpoint, since will check 'EP_HALT_PENDING' flag in
xhci_urb_dequeue() function to avoid this. But after some
investigation, I think I missed the stop endpoint command in
xhci_stop_device() which did not check the 'EP_HALT_PENDING' flag,
maybe need to add 'EP_HALT_PENDING' flag checking in
xhci_stop_device() function. DId I miss something else? Thanks.

>
> - * The timer may also fire if the host takes a very long time to respond to 
> the
> - * command, and the stop endpoint command completion handler cannot delete 
> the
> - * timer before the timer function is called.  Another endpoint cancellation 
> may
> - * sneak in before the timer function can grab the lock, and that may queue
> - * another stop endpoint command and add the timer back.  So we cannot use a
> - * simple flag to say whether there is a pending stop endpoint command for a
> - * particular endpoint.
> - *
> - * Instead we use a combination of that flag and a counter for the number of
> - * pending stop endpoint commands.  If the timer is the tail end of the last
> - * stop endpoint command, and the endpoint's command is still pending, we 
> assume
> - * the host is dying.
>
> Best regards,
> Lu Baolu
>
>>
>> We also need to clean up the command queue before trying to halt the
>> xHCI host in xhci_stop_endpoint_command_timeout() function.
>>
>> Signed-off-by: Baolin Wang 
>



-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Lu Baolu
Hi,

On 11/30/2016 05:02 PM, Baolin Wang wrote:
> If the hardware never responds to the stop endpoint command, the
> URBs will never be completed, and we might hang the USB subsystem.
> The original watchdog timer is used to watch if one stop endpoint
> command is timeout, if timeout, then the watchdog timer will set
> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
> pending URBs.
>
> But now we already have one command timer to control command timeout,
> thus we can also use the command timer to watch the stop endpoint
> command, instead of one duplicate watchdog timer which need to be
> removed.
>
> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
> this is the last stop endpoint command of one endpoint. Since we
> can make sure we only set one stop endpoint command for one endpoint
> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
> this flag.

I am afraid you can't do this. "stop_cmds_pending" was added
to fix the problem described in the comments that you want to
remove. But I didn't find any fix of this problem in your patch.

- * The timer may also fire if the host takes a very long time to respond to the
- * command, and the stop endpoint command completion handler cannot delete the
- * timer before the timer function is called.  Another endpoint cancellation 
may
- * sneak in before the timer function can grab the lock, and that may queue
- * another stop endpoint command and add the timer back.  So we cannot use a
- * simple flag to say whether there is a pending stop endpoint command for a
- * particular endpoint.
- *
- * Instead we use a combination of that flag and a counter for the number of
- * pending stop endpoint commands.  If the timer is the tail end of the last
- * stop endpoint command, and the endpoint's command is still pending, we 
assume
- * the host is dying.

Best regards,
Lu Baolu

>
> We also need to clean up the command queue before trying to halt the
> xHCI host in xhci_stop_endpoint_command_timeout() function.
>
> Signed-off-by: Baolin Wang 



Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Baolin Wang
On 30 November 2016 at 22:09, Mathias Nyman
 wrote:
> On 30.11.2016 11:02, Baolin Wang wrote:
>>
>> If the hardware never responds to the stop endpoint command, the
>> URBs will never be completed, and we might hang the USB subsystem.
>> The original watchdog timer is used to watch if one stop endpoint
>> command is timeout, if timeout, then the watchdog timer will set
>> XHCI_STATE_DYING, try to halt the xHCI host, and give back all
>> pending URBs.
>>
>> But now we already have one command timer to control command timeout,
>> thus we can also use the command timer to watch the stop endpoint
>> command, instead of one duplicate watchdog timer which need to be
>> removed.
>>
>> Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
>> this is the last stop endpoint command of one endpoint. Since we
>> can make sure we only set one stop endpoint command for one endpoint
>> by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
>> this flag.
>>
>> We also need to clean up the command queue before trying to halt the
>> xHCI host in xhci_stop_endpoint_command_timeout() function.
>
>
> This isn't a bad idea.
>
> There are anyway some corner cases and details that need to be
> checked, such as suspend (which will clear the command queue), module unload
> and abrupt host removal (like pci hotplug removal of host controller)
> we need to make sure we can trust the command timer to always return the
> canceled URB

Yes, you are right, we need to check these carefully.

Suspend process, module unload and abrupt host removal, they all will
issue usb_disconnect() firstly before clear the command queue, it will
check URBs for every endpoint by
usb_disconnect()--->usb_disable_device()--->usb_disable_endpoint(),
which will make sure every URBs of endpoints will be cancelled by the
stop endpoint command responding or the timeout function of stop
endpoint command (xhci_stop_endpoint_command_timeout()) in
usb_hcd_flush_endpoint(). From that point, we can make sure the
command timer will be useful to handle stop endpoint command timeout.
Please correct me if I said something wrong. Thanks.

-- 
Baolin.wang
Best Regards


Re: [RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Mathias Nyman

On 30.11.2016 11:02, Baolin Wang wrote:

If the hardware never responds to the stop endpoint command, the
URBs will never be completed, and we might hang the USB subsystem.
The original watchdog timer is used to watch if one stop endpoint
command is timeout, if timeout, then the watchdog timer will set
XHCI_STATE_DYING, try to halt the xHCI host, and give back all
pending URBs.

But now we already have one command timer to control command timeout,
thus we can also use the command timer to watch the stop endpoint
command, instead of one duplicate watchdog timer which need to be
removed.

Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
this is the last stop endpoint command of one endpoint. Since we
can make sure we only set one stop endpoint command for one endpoint
by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
this flag.

We also need to clean up the command queue before trying to halt the
xHCI host in xhci_stop_endpoint_command_timeout() function.


This isn't a bad idea.

There are anyway some corner cases and details that need to be
checked, such as suspend (which will clear the command queue), module unload
and abrupt host removal (like pci hotplug removal of host controller)
we need to make sure we can trust the command timer to always return the 
canceled URB

-Mathias




[RFC] usb: host: xhci: Remove the watchdog timer and use command timer to watch stop endpoint command

2016-11-30 Thread Baolin Wang
If the hardware never responds to the stop endpoint command, the
URBs will never be completed, and we might hang the USB subsystem.
The original watchdog timer is used to watch if one stop endpoint
command is timeout, if timeout, then the watchdog timer will set
XHCI_STATE_DYING, try to halt the xHCI host, and give back all
pending URBs.

But now we already have one command timer to control command timeout,
thus we can also use the command timer to watch the stop endpoint
command, instead of one duplicate watchdog timer which need to be
removed.

Meanwhile we don't need the 'stop_cmds_pending' flag to identy if
this is the last stop endpoint command of one endpoint. Since we
can make sure we only set one stop endpoint command for one endpoint
by 'EP_HALT_PENDING' flag in xhci_urb_dequeue() function. Thus remove
this flag.

We also need to clean up the command queue before trying to halt the
xHCI host in xhci_stop_endpoint_command_timeout() function.

Signed-off-by: Baolin Wang 
---
 drivers/usb/host/xhci-mem.c  |   10 +--
 drivers/usb/host/xhci-ring.c |   61 +-
 drivers/usb/host/xhci.c  |8 +-
 drivers/usb/host/xhci.h  |5 
 4 files changed, 21 insertions(+), 63 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index d6f59a3..2a5cd89 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -844,14 +844,6 @@ void xhci_free_stream_info(struct xhci_hcd *xhci,
 
 /* Device context manipulation */
 
-static void xhci_init_endpoint_timer(struct xhci_hcd *xhci,
-   struct xhci_virt_ep *ep)
-{
-   setup_timer(&ep->stop_cmd_timer, xhci_stop_endpoint_command_watchdog,
-   (unsigned long)ep);
-   ep->xhci = xhci;
-}
-
 static void xhci_free_tt_info(struct xhci_hcd *xhci,
struct xhci_virt_device *virt_dev,
int slot_id)
@@ -1014,7 +1006,7 @@ int xhci_alloc_virt_device(struct xhci_hcd *xhci, int 
slot_id,
 
/* Initialize the cancellation list and watchdog timers for each ep */
for (i = 0; i < 31; i++) {
-   xhci_init_endpoint_timer(xhci, &dev->eps[i]);
+   dev->eps[i].xhci = xhci;
INIT_LIST_HEAD(&dev->eps[i].cancelled_td_list);
INIT_LIST_HEAD(&dev->eps[i].bw_endpoint_list);
}
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2057d08..67a1bd6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -561,18 +561,6 @@ static void td_to_noop(struct xhci_hcd *xhci, struct 
xhci_ring *ep_ring,
}
 }
 
-static void xhci_stop_watchdog_timer_in_irq(struct xhci_hcd *xhci,
-   struct xhci_virt_ep *ep)
-{
-   ep->ep_state &= ~EP_HALT_PENDING;
-   /* Can't del_timer_sync in interrupt, so we attempt to cancel.  If the
-* timer is running on another CPU, we don't decrement stop_cmds_pending
-* (since we didn't successfully stop the watchdog timer).
-*/
-   if (del_timer(&ep->stop_cmd_timer))
-   ep->stop_cmds_pending--;
-}
-
 /* Must be called with xhci->lock held in interrupt context */
 static void xhci_giveback_urb_in_irq(struct xhci_hcd *xhci,
struct xhci_td *cur_td, int status)
@@ -664,7 +652,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, 
int slot_id,
ep = &xhci->devs[slot_id]->eps[ep_index];
 
if (list_empty(&ep->cancelled_td_list)) {
-   xhci_stop_watchdog_timer_in_irq(xhci, ep);
+   ep->ep_state &= ~EP_HALT_PENDING;
ep->stopped_td = NULL;
ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
return;
@@ -719,7 +707,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, 
int slot_id,
list_del_init(&cur_td->td_list);
}
last_unlinked_td = cur_td;
-   xhci_stop_watchdog_timer_in_irq(xhci, ep);
+   ep->ep_state &= ~EP_HALT_PENDING;
 
/* If necessary, queue a Set Transfer Ring Dequeue Pointer command */
if (deq_state.new_deq_ptr && deq_state.new_deq_seg) {
@@ -817,38 +805,19 @@ static void xhci_kill_endpoint_urbs(struct xhci_hcd *xhci,
}
 }
 
-/* Watchdog timer function for when a stop endpoint command fails to complete.
+/* This timeout function for when a stop endpoint command fails to complete.
  * In this case, we assume the host controller is broken or dying or dead.  The
  * host may still be completing some other events, so we have to be careful to
  * let the event ring handler and the URB dequeueing/enqueueing functions know
  * through xhci->state.
- *
- * The timer may also fire if the host takes a very long time to respond to the
- * command, and the stop endpoint command completion handler cannot delete the
- * timer before the timer function is called.  Another endpoint cancellation 
may
- * sneak in befo