Re: [usb-storage] UAS hangs khubd on USB disconnect
On Thu, Dec 12, 2013 at 05:04:31PM -0500, Alan Stern wrote: On Wed, 11 Dec 2013, Sarah Sharp wrote: Hi Hans, I've been testing the UAS code you sent a pull request for against 3.13-rc1, and I've run into a rather nasty issue with USB disconnect. I ran some tests with a USB 3.0 storage device under xHCI. The disk has three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4). There was a btrfs partition on sdb3, but I deleted it. If I start to play a movie on the ext4 partition, and then yank the USB cable, the uas driver is unbound from the device. It looks like something goes wrong in the SCSI layer shortly after that, causing an oops in sysfs_remove_group(). I did a little testing. It turns out this WARN (not an oops) is the result of recent changes to sysfs, combined with the peculiar way the SCSI layer handles targets. In the new kernel, when you call device_del for some object, the object's directory and everything beneath it get removed from sysfs. This wasn't true in the past. When a USB drive is unplugged, almost everything below it gets unregistered. But not the SCSI target -- it remains registered until the number of reap references drops to 0. This doesn't happen until all the devices beneath it are released, which happens when all the open file references are closed and the filesystem is unmounted. So scsi_target_reap_usercontext ends up calling device_del for the target after everything else has been removed from sysfs. As part of normal device_del processing, attribute groups get removed. In particular the power/ subdirectory is removed from the target's sysfs directory. But the sysfs directories are long gone by this time, so sysfs_remove_group complains that it was asked to remove a non-existent subdirectory. Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. Sarah Sharp -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 13 Dec 2013, Sarah Sharp wrote: Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? Easily. The downside is that there would no longer be any warning when someone tries to remove a wrong subdirectory by mistake. The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
Hello, guys. (cc'ing Greg) On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote: On Fri, 13 Dec 2013, Sarah Sharp wrote: Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? Easily. The downside is that there would no longer be any warning when someone tries to remove a wrong subdirectory by mistake. The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. While the recent sysfs changes made this issue more visible, Greg wants to make sure that devices are removed from leaf up in all cases and keep the warning to ensure that. Would there be a way fix SCSI removal ordering? Thanks. -- tejun -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote: On Fri, 13 Dec 2013, Sarah Sharp wrote: Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? Easily. The downside is that there would no longer be any warning when someone tries to remove a wrong subdirectory by mistake. The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. I can certainly test it with my UAS device as well. I don't know if the issue arises from the late device_del. Looking at Hans' stack trace, the BUG in blk_free_tags gets triggered when the scsi_host is released before the block_queue release. So I don't think moving the scsi_target delete sooner would help? I really don't know anything about the SCSI or block layer though. I can confirm that simply removing the BUG() call in blk_free_tags allows the partitions on the UAS device to be mounted after it was hot-removed in the middle of video playback. Hans, maybe in order to get an answer to your question[1], you should submit a patch to the block layer maintainer, Jens Axboe? Sarah Sharp [1] http://www.spinics.net/lists/linux-scsi/msg70002.html -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote: Hello, guys. (cc'ing Greg) On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote: On Fri, 13 Dec 2013, Sarah Sharp wrote: Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? Easily. The downside is that there would no longer be any warning when someone tries to remove a wrong subdirectory by mistake. The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. While the recent sysfs changes made this issue more visible, Greg wants to make sure that devices are removed from leaf up in all cases and keep the warning to ensure that. Would there be a way fix SCSI removal ordering? Could someone analyse the actual problem? We're quite careful even on host remove to iterate and remove all the devices, then targets, then host (and allied transport objects). Which removal is inverted? James -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 2013-12-13 at 11:18 -0800, James Bottomley wrote: On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote: Hello, guys. (cc'ing Greg) On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote: On Fri, 13 Dec 2013, Sarah Sharp wrote: Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? Easily. The downside is that there would no longer be any warning when someone tries to remove a wrong subdirectory by mistake. The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. While the recent sysfs changes made this issue more visible, Greg wants to make sure that devices are removed from leaf up in all cases and keep the warning to ensure that. Would there be a way fix SCSI removal ordering? Could someone analyse the actual problem? We're quite careful even on host remove to iterate and remove all the devices, then targets, then host (and allied transport objects). Which removal is inverted? Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). James --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 13 Dec 2013, James Bottomley wrote: I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. While the recent sysfs changes made this issue more visible, Greg wants to make sure that devices are removed from leaf up in all cases and keep the warning to ensure that. Would there be a way fix SCSI removal ordering? Could someone analyse the actual problem? We're quite careful even on host remove to iterate and remove all the devices, then targets, then host (and allied transport objects). Which removal is inverted? The scsi_host is removed before the scsi_target. The reason is that scsi_remove_host() calls device_del(shost-shost_gendev) directly, but scsi_target_reap_usercontext() doesn't call device_del(starget-dev) until it gets invoked (indirectly) from scsi_device_dev_release_usercontext(), by way of scsi_target_reap(). Thus, the host gets removed from visibility at the appropriate time, but the target remains until all the scsi_devices beneath it are not only removed from visibility but also released (their refcounts drop to 0). This can occur much later if, for example, a scsi_device holds a mounted filesystem. The scsi_host and scsi_device are removed when the underlying USB device is unplugged. But the scsi_device isn't released, and hence the scsi_target isn't removed, until the filesystem is unmounted. Broadly speaking, the correct approach would be to call scsi_target_reap() from __scsi_remove_device() instead of from scsi_device_dev_release_usercontext(). Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
Hi James, On 12/13/2013 09:03 PM, James Bottomley wrote: On Fri, 2013-12-13 at 11:18 -0800, James Bottomley wrote: On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote: Hello, guys. (cc'ing Greg) On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote: On Fri, 13 Dec 2013, Sarah Sharp wrote: Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Thanks for looking into this. I think just getting rid of the WARN would be sufficient. Can you make a patch for that? Easily. The downside is that there would no longer be any warning when someone tries to remove a wrong subdirectory by mistake. The patch still won't help with the UAS issues with scsi_init_shared_tag_map though. I wasn't clear on the reason for that problem. Does it also arise from late device_del for scsi_target? I could try to change the way that works, if anybody (Hans?) would like to test it. While the recent sysfs changes made this issue more visible, Greg wants to make sure that devices are removed from leaf up in all cases and keep the warning to ensure that. Would there be a way fix SCSI removal ordering? Could someone analyse the actual problem? We're quite careful even on host remove to iterate and remove all the devices, then targets, then host (and allied transport objects). Which removal is inverted? Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). Thanks I'll give this patch a try. As for backtraces I've posted some (partial) backtraces as well as reproduction instructions here: http://www.spinics.net/lists/linux-scsi/msg70002.html Regards, Hans -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 13 Dec 2013, James Bottomley wrote: Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). James --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the This is not right. The problem is that you don't keep track explicitly of the number of references to a target; you rely implicitly on starget-devices being non-empty. starget-reap_ref is only a count of local operations that should block removal. Consider, for example, what would happen if there is more than one LUN. What if one of them is removed while the other remains? A more invasive change is needed. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, Dec 13, 2013 at 12:03:19PM -0800, James Bottomley wrote: Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). I can confirm that this patch fixes both the sysfs warning, and the issue with USB storage disconnect during video playback. I did trigger a new (possibly unrelated?) mutex deadlock warning. dmesg is attached. Sarah Sharp --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the Dec 13 13:02:02 xanatos kernel: [7.029300] usb usb4: bus auto-suspend, wakeup 1 Dec 13 13:02:02 xanatos kernel: [7.040327] input: SynPS/2 Synaptics TouchPad as /devices/platform/i8042/serio1/input/input11 Dec 13 13:02:02 xanatos kernel: [7.112065] btusb 3-1.4:1.0: usb_probe_interface Dec 13 13:02:02 xanatos kernel: [7.112070] btusb 3-1.4:1.0: usb_probe_interface - got id Dec 13 13:02:02 xanatos kernel: [7.122731] usbcore: registered new interface driver btusb Dec 13 13:02:02 xanatos kernel: [7.167710] Linux video capture interface: v2.00 Dec 13 13:02:02 xanatos kernel: [7.235181] uvcvideo 3-1.6:1.0: usb_probe_interface Dec 13 13:02:02 xanatos kernel: [7.235187] uvcvideo 3-1.6:1.0: usb_probe_interface - got id Dec 13 13:02:02 xanatos kernel: [7.235293] uvcvideo: Found UVC 1.00 device Integrated Camera (04f2:b2ea) Dec 13 13:02:02 xanatos kernel: [7.242661] input: Integrated Camera as /devices/pci:00/:00:1a.0/usb3/3-1/3-1.6/3-1.6:1.0/input/input20 Dec 13 13:02:02 xanatos kernel: [7.244470] usbcore: registered new interface driver uvcvideo Dec 13 13:02:02 xanatos kernel: [7.244473] USB Video Class driver (1.1.1) Dec 13 13:02:03 xanatos kernel: [8.044806] bio: create slab bio-2 at 2 Dec 13 13:02:03 xanatos kernel: [8.261355] Adding 4085756k swap on /dev/mapper/cryptswap1. Priority:-1 extents:1 across:4085756k SSFS Dec 13 13:02:03 xanatos kernel: [8.407323] e1000e :00:19.0: irq 44 for MSI/MSI-X Dec 13 13:02:03 xanatos kernel: [8.510442] e1000e :00:19.0: irq 44 for MSI/MSI-X Dec 13 13:02:03 xanatos kernel: [8.510945] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready Dec 13 13:02:03 xanatos kernel: [8.516037] iwlwifi :03:00.0: L1 Enabled; Disabling L0S Dec 13 13:02:03 xanatos kernel: [8.517364] iwlwifi :03:00.0: Radio type=0x1-0x2-0x0 Dec 13 13:02:03 xanatos kernel: [8.785685] iwlwifi :03:00.0: L1 Enabled; Disabling L0S Dec 13 13:02:03 xanatos kernel: [8.792724] iwlwifi :03:00.0: Radio type=0x1-0x2-0x0 Dec 13 13:02:04 xanatos kernel: [8.876409] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready Dec 13 13:02:04 xanatos kernel: [9.787586] usb 3-1.6: usb auto-suspend, wakeup 0 Dec 13 13:02:05 xanatos kernel: [9.910341] psmouse serio2: alps: Unknown ALPS touchpad: E7=10 00 64, EC=10 00 64 Dec 13 13:02:06 xanatos kernel: [ 11.169530] psmouse serio2: trackpoint: IBM TrackPoint firmware: 0x0e, buttons: 3/3 Dec 13 13:02:06 xanatos kernel: [ 11.375342] input: TPPS/2 IBM TrackPoint as /devices/platform/i8042/serio1/serio2/input/input19 Dec 13 13:02:07 xanatos kernel: [ 11.917809] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx Dec 13 13:02:07 xanatos kernel: [ 11.917863] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready Dec 13 13:02:12 xanatos kernel: [ 17.125215] Dec 13 13:02:12 xanatos kernel: [ 17.125218] == Dec 13 13:02:12 xanatos kernel: [ 17.125219] [ INFO: possible circular locking dependency detected ] Dec 13 13:02:12 xanatos kernel: [ 17.125221] 3.13.0-rc1+ #140 Not tainted Dec 13 13:02:12
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote: On Fri, 13 Dec 2013, James Bottomley wrote: Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). James --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the This is not right. The problem is that you don't keep track explicitly of the number of references to a target; you rely implicitly on starget-devices being non-empty. starget-reap_ref is only a count of local operations that should block removal. No, it was supposed explicitly to be a visibility counter to answer the question when can we delete the target. It's incremented every time we add a device to the target (and when we do an operation that may remove one to keep an atomic context before we blow it away) and decremented every time we remove one. Consider, for example, what would happen if there is more than one LUN. What if one of them is removed while the other remains? Then the reap reference remains above zero and the target stays. A more invasive change is needed. I think you might be right in that we need to kill the list_empty check, but I think that should be it. James -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
Hi, On 12/13/2013 09:03 PM, James Bottomley wrote: snip Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). James --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the I've given this patch a try and it fixes the blk-tag.c: 89 BUG() I was seeing. As for the other patch you (James) have send for that problem: diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the That too fixes the blk-tag.c: 89 BUG() I was seeing. Either patch by itself seems to be enough to fix this issue for me. Thanks Regards, Hans -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 13 Dec 2013, James Bottomley wrote: On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote: On Fri, 13 Dec 2013, James Bottomley wrote: Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). James --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the This is not right. The problem is that you don't keep track explicitly of the number of references to a target; you rely implicitly on starget-devices being non-empty. starget-reap_ref is only a count of local operations that should block removal. No, it was supposed explicitly to be a visibility counter to answer the question when can we delete the target. It's incremented every time we add a device to the target (and when we do an operation that may remove one to keep an atomic context before we blow it away) and decremented every time we remove one. Sorry, but you're wrong. starget-reap_ref is _not_ incremented every time we add a device to the target. That's one of the things we need to fix. Consider, for example, what would happen if there is more than one LUN. What if one of them is removed while the other remains? Then the reap reference remains above zero and the target stays. A more invasive change is needed. I think you might be right in that we need to kill the list_empty check, but I think that should be it. That, plus a one or two other things. Look over the patch below. Alan Stern Index: usb-3.13/drivers/scsi/scsi_scan.c === --- usb-3.13.orig/drivers/scsi/scsi_scan.c +++ usb-3.13/drivers/scsi/scsi_scan.c @@ -334,6 +334,7 @@ static void scsi_target_dev_release(stru struct device *parent = dev-parent; struct scsi_target *starget = to_scsi_target(dev); + WARN_ON(!list_empty(starget-devices)); kfree(starget); put_device(parent); } @@ -481,7 +482,7 @@ void scsi_target_reap(struct scsi_target spin_lock_irqsave(shost-host_lock, flags); state = starget-state; - if (--starget-reap_ref == 0 list_empty(starget-devices)) { + if (--starget-reap_ref == 0) { empty = 1; starget-state = STARGET_DEL; } Index: usb-3.13/drivers/scsi/scsi_sysfs.c === --- usb-3.13.orig/drivers/scsi/scsi_sysfs.c +++ usb-3.13/drivers/scsi/scsi_sysfs.c @@ -369,17 +369,13 @@ static void scsi_device_dev_release_user { struct scsi_device *sdev; struct device *parent; - struct scsi_target *starget; struct list_head *this, *tmp; unsigned long flags; sdev = container_of(work, struct scsi_device, ew.work); - parent = sdev-sdev_gendev.parent; - starget = to_scsi_target(parent); spin_lock_irqsave(sdev-host-host_lock, flags); - starget-reap_ref++; list_del(sdev-siblings); list_del(sdev-same_target_siblings); list_del(sdev-starved_entry); @@ -399,13 +395,10 @@ static void scsi_device_dev_release_user /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); - if (parent) - put_device(parent); + put_device(parent); } static void scsi_device_dev_release(struct device *dev) @@ -1044,6 +1037,8 @@ void __scsi_remove_device(struct scsi_de } else put_device(sdev-sdev_dev); +
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 2013-12-13 at 19:48 -0500, Alan Stern wrote: On Fri, 13 Dec 2013, James Bottomley wrote: On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote: On Fri, 13 Dec 2013, James Bottomley wrote: Actually, I think I have this figured out. There's a thinko in one of the scsi_target_reap() cases. The original (and still existing) problem with targets is that nothing creates them and nothing destroys them, so, while we could rely on the refcounting of the device model to preserve the actual target object, we had no idea when to remove it from visibility. That was the job of the reap reference, to track visibility. It looks like the reap on device last put is occurring too late. I think we should reap immediately after doing the sdev device_del, so does this fix the warn on? (I'm not sure because no-one has actually posted a backtrace, but it sounds like this is the problem). James --- diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 8ff62c2..98d4eb3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) /* NULL queue means the device can't be used */ sdev-request_queue = NULL; - scsi_target_reap(scsi_target(sdev)); - kfree(sdev-inquiry); kfree(sdev); @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev) } else put_device(sdev-sdev_dev); + scsi_target_reap(scsi_target(sdev)); + /* * Stop accepting new requests and wait until all queuecommand() and * scsi_run_queue() invocations have finished before tearing down the This is not right. The problem is that you don't keep track explicitly of the number of references to a target; you rely implicitly on starget-devices being non-empty. starget-reap_ref is only a count of local operations that should block removal. No, it was supposed explicitly to be a visibility counter to answer the question when can we delete the target. It's incremented every time we add a device to the target (and when we do an operation that may remove one to keep an atomic context before we blow it away) and decremented every time we remove one. Sorry, but you're wrong. starget-reap_ref is _not_ incremented every time we add a device to the target. That's one of the things we need to fix. Well, then we would have a pretty astonishing cockup in the code. The found case of scsi_alloc_target increments the reference each time it's called, so scsi_add_device() definitely behaves like this. I suppose it's possible the list_empty() check is covering a miscount in some of the other probing routines, but that would mean we have stale targets for a lot of our use cases. I'll audit the code. James -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Fri, 13 Dec 2013, James Bottomley wrote: Sorry, but you're wrong. starget-reap_ref is _not_ incremented every time we add a device to the target. That's one of the things we need to fix. Well, then we would have a pretty astonishing cockup in the code. The found case of scsi_alloc_target increments the reference each time it's called, so scsi_add_device() definitely behaves like this. You forgot that __scsi_add_device() calls scsi_target_reap() at the end. So the reference count is incremented and then decremented again. It's easy enough to check that the scsi_probe_and_add_lun pathway doesn't elevate the refcount. Print out the value of starget-reap_ref just after __scsi_add_device() calls scsi_alloc_target() and just before it calls scsi_target_reap(). I suppose it's possible the list_empty() check is covering a miscount in some of the other probing routines, but that would mean we have stale targets for a lot of our use cases. I'll audit the code. That's probably right; whenever a target has more than one LUN we must end up leaking the target. In the common case of one LUN it works out, because the list is empty by the time the scsi_device is released. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [usb-storage] UAS hangs khubd on USB disconnect
On Wed, 11 Dec 2013, Sarah Sharp wrote: Hi Hans, I've been testing the UAS code you sent a pull request for against 3.13-rc1, and I've run into a rather nasty issue with USB disconnect. I ran some tests with a USB 3.0 storage device under xHCI. The disk has three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4). There was a btrfs partition on sdb3, but I deleted it. If I start to play a movie on the ext4 partition, and then yank the USB cable, the uas driver is unbound from the device. It looks like something goes wrong in the SCSI layer shortly after that, causing an oops in sysfs_remove_group(). I did a little testing. It turns out this WARN (not an oops) is the result of recent changes to sysfs, combined with the peculiar way the SCSI layer handles targets. In the new kernel, when you call device_del for some object, the object's directory and everything beneath it get removed from sysfs. This wasn't true in the past. When a USB drive is unplugged, almost everything below it gets unregistered. But not the SCSI target -- it remains registered until the number of reap references drops to 0. This doesn't happen until all the devices beneath it are released, which happens when all the open file references are closed and the filesystem is unmounted. So scsi_target_reap_usercontext ends up calling device_del for the target after everything else has been removed from sysfs. As part of normal device_del processing, attribute groups get removed. In particular the power/ subdirectory is removed from the target's sysfs directory. But the sysfs directories are long gone by this time, so sysfs_remove_group complains that it was asked to remove a non-existent subdirectory. Given the way things work now, I suspect these warnings are truly harmless. We could simply get rid of the WARN in sysfs_remove_group. The alternative is to call device_del for SCSI targets earlier on, such as when their hosts are unregistered. I don't know how James would feel about this approach. It would be difficult because targets use their own reference counts instead of relying on the usual device refcounting mechanism. Alan Stern -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html