Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-04-09 Thread Bart Van Assche
On Tue, 2019-04-09 at 10:44 -0400, Alan Stern wrote:
> On Mon, 8 Apr 2019, Martin K. Petersen wrote:
> 
> > 
> > Alan,
> > 
> > > So it looks as though the SCSI subsystem doesn't like to have a reset 
> > > handler call scsi_remove_host.
> > 
> > Are you talking about a PCI device removal handler or a SCSI error
> > handler?
> 
> The context of this discussion is a USB mass-storage device where the
> device's port on its upstream hub has been powered off.  The
> powered-off port causes an executing command to time out.  As a result
> the SCSI error handler runs and calls the USB reset routine, but the
> reset fails because the kernel is unable to communicate with the device
> through the powered-off port.  This causes the USB reset routine to
> unbind the device from its USB driver, which in turn calls
> scsi_remove_host -- while the error handler is still running.

>From which context does that unbind happen? From inside a SCSI EH callback
or from the context of a workqueue? I think the former is not allowed but
that the latter is allowed. The SRP initiator driver (ib_srp.c) follows the
latter approach. See also srp_queue_remove_work().

Bart.


RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-04-03 Thread Kento.A.Kobayashi
Hi,

>> Root Cause
>> - Block layer timeout happens after power off UAS USB device which is 
>> accessed as reproduce step. During timeout error handler process, scsi host 
>> state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot 
>> be released. And in final, usb subsystem hangs up.
>> Follow is function call:
>> blk_mq_timeout_work 
>>   …->scsi_times_out  (… means some functions are not listed before this 
>> function.)
>> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>>   … -> scsi_error_handler
>> …-> uas_eh_device_reset_handler
>> -> usb_lock_device_for_reset  <- take lock
>>   -> usb_reset_device
>> …-> rebind = uas_post_reset (return 1 since ENODEV) 
>> …-> usb_unbind_and_rebind_marked_interfaces (rebind=1)
>>…-> uas_disconnect  (scsi_host_set_state to 
>> SHOST_CANCEL_RECOVERY)
>> … -> scsi_queue_rq
>
>How does scsi_queue_rq get called here?  As far as I can see, this shouldn't 
>happen.

We confirmed the function call path on linux 4.9 when this problem occured 
since we are working on it. In linux 4.9, the last function is scsi_request_fn 
instead of scsi_queue_rq. In staging.git, we think the scsi_queue_rq is called 
by follow path.
uas_disconnect
|- scsi_remove_host
 |- scsi_forget_host
  |- __scsi_remove_device
   |- device_del
|- bus_remove_device
 |- device_release_driver
  |- device_release_driver_internal
   |- __device_release_driver
|- drv->remove(dev) (sd_remove)  
 |- sd_shutdown
  |- sd_sync_cache
   |- scsi_execute
|- __scsi_execute
 |- blk_execute_rq
  |- blk_execute_rq_nowait
   |- blk_mq_sched_insert_request
|- blk_mq_run_hw_queue
 |- __blk_mq_delay_run_hw_queue
  |- __blk_mq_run_hw_queue
   |- blk_mq_sched_dispatch_requests
|- blk_mq_dispatch_rq_list
 |- q->mq_ops->queue_rq (scsi_queue_rq)

>> Countermeasure
>> - Make uas_post_reset doesn’t return 1 when ENODEV returns from 
>> uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces 
>> doesn’t need to do unbind/rebind operations in this situation.
>> blk_mq_timeout_work
>>   …->scsi_times_out  (… means some functions are not listed before this 
>> function.)
>> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) 
>>   … -> scsi_error_handler
>>…-> uas_eh_device_reset_handler (*1)
>>-> usb_lock_device_for_reset  <- take lock
>>  -> usb_reset_device
>>-> usb_reset_and_verify_device (return ENODEV and FAILED will 
>> be reported to *1)
>>-> uas_post_reset returns 0 when ENODEV => rebind=0 
>>-> usb_unbind_and_rebind_marked_interfaces (rebind=0)
>
>The difference is that uas_disconnect wasn't called here.  But that routine 
>should not cause any problems -- you're always supposed to be able to unbind a 
>driver from a device.  So it looks like this is not the right way to solve the 
>problem.

We confirmed usb_driver_release_interface will call usb_unbind_interface when 
this problem occurs.
So usb_unbind_interface will call driver disconnect callbak.

Regards,
Kento Kobayashi


Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-03-08 Thread Alan Stern
On Fri, 8 Mar 2019, Oliver Neukum wrote:

> On Fr, 2019-03-08 at 09:13 +, kento.a.kobaya...@sony.com wrote:
> > The usb_reset_and_verify_device included in usb_reset_device fails
> > with -ENODEV after power off hub port, and the -ENODEV error will
> > be reported to uas_eh_bus_reset_handler and upper layer, so it
> > doesn't need to do rebind if -ENODEV happens.
> 
> Hi,
> 
> no I am sorry, that is an assumption you just cannot make.
> Anything can trigger a reset. That being SCSI is the common
> case certainly, but not the only case. And in those cases
> we cannot depend on upper layers doing the right thing, if
> we just ignore an error.
> 
> NACK
> 
>   Sorry
>   Oliver

Note that the reset routines in usb-storage do not make any exceptions
for -ENODEV.

Alan Stern



Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-03-08 Thread Oliver Neukum
On Fr, 2019-03-08 at 09:13 +, kento.a.kobaya...@sony.com wrote:
> The usb_reset_and_verify_device included in usb_reset_device fails
> with -ENODEV after power off hub port, and the -ENODEV error will
> be reported to uas_eh_bus_reset_handler and upper layer, so it
> doesn't need to do rebind if -ENODEV happens.

Hi,

no I am sorry, that is an assumption you just cannot make.
Anything can trigger a reset. That being SCSI is the common
case certainly, but not the only case. And in those cases
we cannot depend on upper layers doing the right thing, if
we just ignore an error.

NACK

Sorry
Oliver



[PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-03-08 Thread Kento.A.Kobayashi
The issue happens with following steps:
Access usb3.0/3.1 device that uses uas driver.
Power off hub port connecting device by ioctl(USBDEVFS_CONTROL).
Wait longer than 30s(scsi layer timeout period is 30s).
Execute commands like lsusb, no response and usb subsytem hangs.

After scsi layer timeout, uas_eh_bus_reset_handler works and enter
usb_reset_device. During reset, current uas_post_reset returns 1 if
uas_configure_endpoints fails with -ENODEV, then it tries to rebind
uas driver. The unbind request cannot complete because program goes
to host_not_ready in scsi_request_fn. As result, it stops at device
reset process and the lock took before reset will not be released.

The usb_reset_and_verify_device included in usb_reset_device fails
with -ENODEV after power off hub port, and the -ENODEV error will
be reported to uas_eh_bus_reset_handler and upper layer, so it
doesn't need to do rebind if -ENODEV happens.

Signed-off-by: Kento Kobayashi 
---
drivers/usb/storage/uas.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c
index 36742e8..24b09fd 100644
--- a/drivers/usb/storage/uas.c
+++ b/drivers/usb/storage/uas.c
@@ -1116,6 +1116,9 @@ static int uas_post_reset(struct usb_interface *intf)

scsi_unblock_requests(shost);

+   if (err == -ENODEV)
+   return 0;
+
return err ? 1 : 0;
 }

-- 
2.7.4


RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-03-03 Thread Jacky.Cao
Dear Greg-san:

I confirmed our company email related counter person that below footer is added 
automatically
after we send email to who doesn't work in our company.
I am very sorry that I didn't notice this issue and make you confuse.
Now I am confirming with our counter person whether there is solution to not 
add below footer.
After that, I will re-send patch mail to you.

Best Regards
Jacky

-Original Message-
From: Greg KH [mailto:gre...@linuxfoundation.org]
Sent: Monday, March 04, 2019 2:23 PM
To: Cao, Jacky
Cc: oneu...@suse.com; st...@rowland.harvard.edu; linux-...@vger.kernel.org; 
linux-s...@vger.kernel.org; usb-stor...@lists.one-eyed-alien.net; 
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

On Mon, Mar 04, 2019 at 02:08:54AM +, jacky@sony.com wrote:
> 
>
> This email is confidential and intended only for the use of the individual or 
> entity named above and may contain information that is privileged. If you are 
> not the intended recipient, you are notified that any dissemination, 
> distribution or copying of this email is strictly prohibited. If you have 
> received this email in error, please notify us immediately by return email or 
> telephone and destroy the original message. - This mail is sent via Sony Asia 
> Pacific Mail Gateway..

This footer requires me to delete this email as it is not compatible
with doing Linux kernel development.

greg k-h



This email is confidential and intended only for the use of the individual or 
entity named above and may contain information that is privileged. If you are 
not the intended recipient, you are notified that any dissemination, 
distribution or copying of this email is strictly prohibited. If you have 
received this email in error, please notify us immediately by return email or 
telephone and destroy the original message. - This mail is sent via Sony Asia 
Pacific Mail Gateway..


Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port

2019-03-03 Thread Greg KH
On Mon, Mar 04, 2019 at 02:08:54AM +, jacky@sony.com wrote:
> 
> 
> This email is confidential and intended only for the use of the individual or 
> entity named above and may contain information that is privileged. If you are 
> not the intended recipient, you are notified that any dissemination, 
> distribution or copying of this email is strictly prohibited. If you have 
> received this email in error, please notify us immediately by return email or 
> telephone and destroy the original message. - This mail is sent via Sony Asia 
> Pacific Mail Gateway..

This footer requires me to delete this email as it is not compatible
with doing Linux kernel development.

greg k-h