Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port
On Tue, 2019-04-09 at 10:44 -0400, Alan Stern wrote: > On Mon, 8 Apr 2019, Martin K. Petersen wrote: > > > > > Alan, > > > > > So it looks as though the SCSI subsystem doesn't like to have a reset > > > handler call scsi_remove_host. > > > > Are you talking about a PCI device removal handler or a SCSI error > > handler? > > The context of this discussion is a USB mass-storage device where the > device's port on its upstream hub has been powered off. The > powered-off port causes an executing command to time out. As a result > the SCSI error handler runs and calls the USB reset routine, but the > reset fails because the kernel is unable to communicate with the device > through the powered-off port. This causes the USB reset routine to > unbind the device from its USB driver, which in turn calls > scsi_remove_host -- while the error handler is still running. >From which context does that unbind happen? From inside a SCSI EH callback or from the context of a workqueue? I think the former is not allowed but that the latter is allowed. The SRP initiator driver (ib_srp.c) follows the latter approach. See also srp_queue_remove_work(). Bart.
RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub port
Hi, >> Root Cause >> - Block layer timeout happens after power off UAS USB device which is >> accessed as reproduce step. During timeout error handler process, scsi host >> state becomes SHOST_CANCEL_RECOVERY that causes IO hangs up and lock cannot >> be released. And in final, usb subsystem hangs up. >> Follow is function call: >> blk_mq_timeout_work >> …->scsi_times_out (… means some functions are not listed before this >> function.) >> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) >> … -> scsi_error_handler >> …-> uas_eh_device_reset_handler >> -> usb_lock_device_for_reset <- take lock >> -> usb_reset_device >> …-> rebind = uas_post_reset (return 1 since ENODEV) >> …-> usb_unbind_and_rebind_marked_interfaces (rebind=1) >>…-> uas_disconnect (scsi_host_set_state to >> SHOST_CANCEL_RECOVERY) >> … -> scsi_queue_rq > >How does scsi_queue_rq get called here? As far as I can see, this shouldn't >happen. We confirmed the function call path on linux 4.9 when this problem occured since we are working on it. In linux 4.9, the last function is scsi_request_fn instead of scsi_queue_rq. In staging.git, we think the scsi_queue_rq is called by follow path. uas_disconnect |- scsi_remove_host |- scsi_forget_host |- __scsi_remove_device |- device_del |- bus_remove_device |- device_release_driver |- device_release_driver_internal |- __device_release_driver |- drv->remove(dev) (sd_remove) |- sd_shutdown |- sd_sync_cache |- scsi_execute |- __scsi_execute |- blk_execute_rq |- blk_execute_rq_nowait |- blk_mq_sched_insert_request |- blk_mq_run_hw_queue |- __blk_mq_delay_run_hw_queue |- __blk_mq_run_hw_queue |- blk_mq_sched_dispatch_requests |- blk_mq_dispatch_rq_list |- q->mq_ops->queue_rq (scsi_queue_rq) >> Countermeasure >> - Make uas_post_reset doesn’t return 1 when ENODEV returns from >> uas_configure_endpoints since usb_unbind_and_rebind_marded_interfaces >> doesn’t need to do unbind/rebind operations in this situation. >> blk_mq_timeout_work >> …->scsi_times_out (… means some functions are not listed before this >> function.) >> …-> scsi_eh_scmd_add(scsi_host_set_state to SHOST_RECOVERY) >> … -> scsi_error_handler >>…-> uas_eh_device_reset_handler (*1) >>-> usb_lock_device_for_reset <- take lock >> -> usb_reset_device >>-> usb_reset_and_verify_device (return ENODEV and FAILED will >> be reported to *1) >>-> uas_post_reset returns 0 when ENODEV => rebind=0 >>-> usb_unbind_and_rebind_marked_interfaces (rebind=0) > >The difference is that uas_disconnect wasn't called here. But that routine >should not cause any problems -- you're always supposed to be able to unbind a >driver from a device. So it looks like this is not the right way to solve the >problem. We confirmed usb_driver_release_interface will call usb_unbind_interface when this problem occurs. So usb_unbind_interface will call driver disconnect callbak. Regards, Kento Kobayashi
Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port
On Fri, 8 Mar 2019, Oliver Neukum wrote: > On Fr, 2019-03-08 at 09:13 +, kento.a.kobaya...@sony.com wrote: > > The usb_reset_and_verify_device included in usb_reset_device fails > > with -ENODEV after power off hub port, and the -ENODEV error will > > be reported to uas_eh_bus_reset_handler and upper layer, so it > > doesn't need to do rebind if -ENODEV happens. > > Hi, > > no I am sorry, that is an assumption you just cannot make. > Anything can trigger a reset. That being SCSI is the common > case certainly, but not the only case. And in those cases > we cannot depend on upper layers doing the right thing, if > we just ignore an error. > > NACK > > Sorry > Oliver Note that the reset routines in usb-storage do not make any exceptions for -ENODEV. Alan Stern
Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port
On Fr, 2019-03-08 at 09:13 +, kento.a.kobaya...@sony.com wrote: > The usb_reset_and_verify_device included in usb_reset_device fails > with -ENODEV after power off hub port, and the -ENODEV error will > be reported to uas_eh_bus_reset_handler and upper layer, so it > doesn't need to do rebind if -ENODEV happens. Hi, no I am sorry, that is an assumption you just cannot make. Anything can trigger a reset. That being SCSI is the common case certainly, but not the only case. And in those cases we cannot depend on upper layers doing the right thing, if we just ignore an error. NACK Sorry Oliver
[PATCH] usb: uas: fix usb subsystem hang after power off hub port
The issue happens with following steps: Access usb3.0/3.1 device that uses uas driver. Power off hub port connecting device by ioctl(USBDEVFS_CONTROL). Wait longer than 30s(scsi layer timeout period is 30s). Execute commands like lsusb, no response and usb subsytem hangs. After scsi layer timeout, uas_eh_bus_reset_handler works and enter usb_reset_device. During reset, current uas_post_reset returns 1 if uas_configure_endpoints fails with -ENODEV, then it tries to rebind uas driver. The unbind request cannot complete because program goes to host_not_ready in scsi_request_fn. As result, it stops at device reset process and the lock took before reset will not be released. The usb_reset_and_verify_device included in usb_reset_device fails with -ENODEV after power off hub port, and the -ENODEV error will be reported to uas_eh_bus_reset_handler and upper layer, so it doesn't need to do rebind if -ENODEV happens. Signed-off-by: Kento Kobayashi --- drivers/usb/storage/uas.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/storage/uas.c b/drivers/usb/storage/uas.c index 36742e8..24b09fd 100644 --- a/drivers/usb/storage/uas.c +++ b/drivers/usb/storage/uas.c @@ -1116,6 +1116,9 @@ static int uas_post_reset(struct usb_interface *intf) scsi_unblock_requests(shost); + if (err == -ENODEV) + return 0; + return err ? 1 : 0; } -- 2.7.4
RE: [PATCH] usb: uas: fix usb subsystem hang after power off hub port
Dear Greg-san: I confirmed our company email related counter person that below footer is added automatically after we send email to who doesn't work in our company. I am very sorry that I didn't notice this issue and make you confuse. Now I am confirming with our counter person whether there is solution to not add below footer. After that, I will re-send patch mail to you. Best Regards Jacky -Original Message- From: Greg KH [mailto:gre...@linuxfoundation.org] Sent: Monday, March 04, 2019 2:23 PM To: Cao, Jacky Cc: oneu...@suse.com; st...@rowland.harvard.edu; linux-...@vger.kernel.org; linux-s...@vger.kernel.org; usb-stor...@lists.one-eyed-alien.net; linux-kernel@vger.kernel.org Subject: Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port On Mon, Mar 04, 2019 at 02:08:54AM +, jacky@sony.com wrote: > > > This email is confidential and intended only for the use of the individual or > entity named above and may contain information that is privileged. If you are > not the intended recipient, you are notified that any dissemination, > distribution or copying of this email is strictly prohibited. If you have > received this email in error, please notify us immediately by return email or > telephone and destroy the original message. - This mail is sent via Sony Asia > Pacific Mail Gateway.. This footer requires me to delete this email as it is not compatible with doing Linux kernel development. greg k-h This email is confidential and intended only for the use of the individual or entity named above and may contain information that is privileged. If you are not the intended recipient, you are notified that any dissemination, distribution or copying of this email is strictly prohibited. If you have received this email in error, please notify us immediately by return email or telephone and destroy the original message. - This mail is sent via Sony Asia Pacific Mail Gateway..
Re: [PATCH] usb: uas: fix usb subsystem hang after power off hub port
On Mon, Mar 04, 2019 at 02:08:54AM +, jacky@sony.com wrote: > > > This email is confidential and intended only for the use of the individual or > entity named above and may contain information that is privileged. If you are > not the intended recipient, you are notified that any dissemination, > distribution or copying of this email is strictly prohibited. If you have > received this email in error, please notify us immediately by return email or > telephone and destroy the original message. - This mail is sent via Sony Asia > Pacific Mail Gateway.. This footer requires me to delete this email as it is not compatible with doing Linux kernel development. greg k-h