Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Sarah Sharp
On Thu, Dec 12, 2013 at 05:04:31PM -0500, Alan Stern wrote:
 On Wed, 11 Dec 2013, Sarah Sharp wrote:
 
  Hi Hans,
  
  I've been testing the UAS code you sent a pull request for against
  3.13-rc1, and I've run into a rather nasty issue with USB disconnect.
  
  I ran some tests with a USB 3.0 storage device under xHCI.  The disk has
  three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4).
  There was a btrfs partition on sdb3, but I deleted it.
  
  If I start to play a movie on the ext4 partition, and then yank the USB
  cable, the uas driver is unbound from the device.  It looks like
  something goes wrong in the SCSI layer shortly after that, causing an
  oops in sysfs_remove_group().
 
 I did a little testing.  It turns out this WARN (not an oops) is the 
 result of recent changes to sysfs, combined with the peculiar way the 
 SCSI layer handles targets.
 
 In the new kernel, when you call device_del for some object, the 
 object's directory and everything beneath it get removed from sysfs.  
 This wasn't true in the past.
 
 When a USB drive is unplugged, almost everything below it gets
 unregistered.  But not the SCSI target -- it remains registered until
 the number of reap references drops to 0.  This doesn't happen until
 all the devices beneath it are released, which happens when all the
 open file references are closed and the filesystem is unmounted.
 
 So scsi_target_reap_usercontext ends up calling device_del for the
 target after everything else has been removed from sysfs.  As part of
 normal device_del processing, attribute groups get removed.  In
 particular the power/ subdirectory is removed from the target's sysfs
 directory.  But the sysfs directories are long gone by this time, so
 sysfs_remove_group complains that it was asked to remove a non-existent
 subdirectory.
 
 Given the way things work now, I suspect these warnings are truly 
 harmless.  We could simply get rid of the WARN in sysfs_remove_group.
 
 The alternative is to call device_del for SCSI targets earlier on, such 
 as when their hosts are unregistered.  I don't know how James would 
 feel about this approach.  It would be difficult because targets use 
 their own reference counts instead of relying on the usual device 
 refcounting mechanism.

Thanks for looking into this.  I think just getting rid of the WARN
would be sufficient.  Can you make a patch for that?

The patch still won't help with the UAS issues with
scsi_init_shared_tag_map though.

Sarah Sharp
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Alan Stern
On Fri, 13 Dec 2013, Sarah Sharp wrote:

  Given the way things work now, I suspect these warnings are truly 
  harmless.  We could simply get rid of the WARN in sysfs_remove_group.
  
  The alternative is to call device_del for SCSI targets earlier on, such 
  as when their hosts are unregistered.  I don't know how James would 
  feel about this approach.  It would be difficult because targets use 
  their own reference counts instead of relying on the usual device 
  refcounting mechanism.
 
 Thanks for looking into this.  I think just getting rid of the WARN
 would be sufficient.  Can you make a patch for that?

Easily.  The downside is that there would no longer be any warning 
when someone tries to remove a wrong subdirectory by mistake.

 The patch still won't help with the UAS issues with
 scsi_init_shared_tag_map though.

I wasn't clear on the reason for that problem.  Does it also arise from 
late device_del for scsi_target?  I could try to change the way that 
works, if anybody (Hans?) would like to test it.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Tejun Heo
Hello, guys.

(cc'ing Greg)

On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
 On Fri, 13 Dec 2013, Sarah Sharp wrote:
 
   Given the way things work now, I suspect these warnings are truly 
   harmless.  We could simply get rid of the WARN in sysfs_remove_group.
   
   The alternative is to call device_del for SCSI targets earlier on, such 
   as when their hosts are unregistered.  I don't know how James would 
   feel about this approach.  It would be difficult because targets use 
   their own reference counts instead of relying on the usual device 
   refcounting mechanism.
  
  Thanks for looking into this.  I think just getting rid of the WARN
  would be sufficient.  Can you make a patch for that?
 
 Easily.  The downside is that there would no longer be any warning 
 when someone tries to remove a wrong subdirectory by mistake.
 
  The patch still won't help with the UAS issues with
  scsi_init_shared_tag_map though.
 
 I wasn't clear on the reason for that problem.  Does it also arise from 
 late device_del for scsi_target?  I could try to change the way that 
 works, if anybody (Hans?) would like to test it.

While the recent sysfs changes made this issue more visible, Greg
wants to make sure that devices are removed from leaf up in all cases
and keep the warning to ensure that.  Would there be a way fix SCSI
removal ordering?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Sarah Sharp
On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
 On Fri, 13 Dec 2013, Sarah Sharp wrote:
 
   Given the way things work now, I suspect these warnings are truly 
   harmless.  We could simply get rid of the WARN in sysfs_remove_group.
   
   The alternative is to call device_del for SCSI targets earlier on, such 
   as when their hosts are unregistered.  I don't know how James would 
   feel about this approach.  It would be difficult because targets use 
   their own reference counts instead of relying on the usual device 
   refcounting mechanism.
  
  Thanks for looking into this.  I think just getting rid of the WARN
  would be sufficient.  Can you make a patch for that?
 
 Easily.  The downside is that there would no longer be any warning 
 when someone tries to remove a wrong subdirectory by mistake.
 
  The patch still won't help with the UAS issues with
  scsi_init_shared_tag_map though.
 
 I wasn't clear on the reason for that problem.  Does it also arise from 
 late device_del for scsi_target?  I could try to change the way that 
 works, if anybody (Hans?) would like to test it.

I can certainly test it with my UAS device as well.  I don't know if the
issue arises from the late device_del.  Looking at Hans' stack trace,
the BUG in blk_free_tags gets triggered when the scsi_host is released
before the block_queue release.  So I don't think moving the scsi_target
delete sooner would help?  I really don't know anything about the SCSI
or block layer though.

I can confirm that simply removing the BUG() call in blk_free_tags
allows the partitions on the UAS device to be mounted after it was
hot-removed in the middle of video playback.  Hans, maybe in order to
get an answer to your question[1], you should submit a patch to the
block layer maintainer, Jens Axboe?

Sarah Sharp

[1] http://www.spinics.net/lists/linux-scsi/msg70002.html
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread James Bottomley
On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote:
 Hello, guys.
 
 (cc'ing Greg)
 
 On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
  On Fri, 13 Dec 2013, Sarah Sharp wrote:
  
Given the way things work now, I suspect these warnings are truly 
harmless.  We could simply get rid of the WARN in sysfs_remove_group.

The alternative is to call device_del for SCSI targets earlier on, such 
as when their hosts are unregistered.  I don't know how James would 
feel about this approach.  It would be difficult because targets use 
their own reference counts instead of relying on the usual device 
refcounting mechanism.
   
   Thanks for looking into this.  I think just getting rid of the WARN
   would be sufficient.  Can you make a patch for that?
  
  Easily.  The downside is that there would no longer be any warning 
  when someone tries to remove a wrong subdirectory by mistake.
  
   The patch still won't help with the UAS issues with
   scsi_init_shared_tag_map though.
  
  I wasn't clear on the reason for that problem.  Does it also arise from 
  late device_del for scsi_target?  I could try to change the way that 
  works, if anybody (Hans?) would like to test it.
 
 While the recent sysfs changes made this issue more visible, Greg
 wants to make sure that devices are removed from leaf up in all cases
 and keep the warning to ensure that.  Would there be a way fix SCSI
 removal ordering?

Could someone analyse the actual problem?  We're quite careful even on
host remove to iterate and remove all the devices, then targets, then
host (and allied transport objects).  Which removal is inverted?

James


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread James Bottomley
On Fri, 2013-12-13 at 11:18 -0800, James Bottomley wrote:
 On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote:
  Hello, guys.
  
  (cc'ing Greg)
  
  On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:
   On Fri, 13 Dec 2013, Sarah Sharp wrote:
   
 Given the way things work now, I suspect these warnings are truly 
 harmless.  We could simply get rid of the WARN in sysfs_remove_group.
 
 The alternative is to call device_del for SCSI targets earlier on, 
 such 
 as when their hosts are unregistered.  I don't know how James would 
 feel about this approach.  It would be difficult because targets use 
 their own reference counts instead of relying on the usual device 
 refcounting mechanism.

Thanks for looking into this.  I think just getting rid of the WARN
would be sufficient.  Can you make a patch for that?
   
   Easily.  The downside is that there would no longer be any warning 
   when someone tries to remove a wrong subdirectory by mistake.
   
The patch still won't help with the UAS issues with
scsi_init_shared_tag_map though.
   
   I wasn't clear on the reason for that problem.  Does it also arise from 
   late device_del for scsi_target?  I could try to change the way that 
   works, if anybody (Hans?) would like to test it.
  
  While the recent sysfs changes made this issue more visible, Greg
  wants to make sure that devices are removed from leaf up in all cases
  and keep the warning to ensure that.  Would there be a way fix SCSI
  removal ordering?
 
 Could someone analyse the actual problem?  We're quite careful even on
 host remove to iterate and remove all the devices, then targets, then
 host (and allied transport objects).  Which removal is inverted?

Actually, I think I have this figured out.  There's a thinko in one of
the scsi_target_reap() cases.  The original (and still existing) problem
with targets is that nothing creates them and nothing destroys them, so,
while we could rely on the refcounting of the device model to preserve
the actual target object, we had no idea when to remove it from
visibility.  That was the job of the reap reference, to track
visibility.  It looks like the reap on device last put is occurring too
late.  I think we should reap immediately after doing the sdev
device_del, so does this fix the warn on? (I'm not sure because no-one
has actually posted a backtrace, but it sounds like this is the
problem).

James

---

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..98d4eb3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
/* NULL queue means the device can't be used */
sdev-request_queue = NULL;
 
-   scsi_target_reap(scsi_target(sdev));
-
kfree(sdev-inquiry);
kfree(sdev);
 
@@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
} else
put_device(sdev-sdev_dev);
 
+   scsi_target_reap(scsi_target(sdev));
+
/*
 * Stop accepting new requests and wait until all queuecommand() and
 * scsi_run_queue() invocations have finished before tearing down the


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Alan Stern
On Fri, 13 Dec 2013, James Bottomley wrote:

   I wasn't clear on the reason for that problem.  Does it also arise from 
   late device_del for scsi_target?  I could try to change the way that 
   works, if anybody (Hans?) would like to test it.
  
  While the recent sysfs changes made this issue more visible, Greg
  wants to make sure that devices are removed from leaf up in all cases
  and keep the warning to ensure that.  Would there be a way fix SCSI
  removal ordering?
 
 Could someone analyse the actual problem?  We're quite careful even on
 host remove to iterate and remove all the devices, then targets, then
 host (and allied transport objects).  Which removal is inverted?

The scsi_host is removed before the scsi_target.

The reason is that scsi_remove_host() calls
device_del(shost-shost_gendev) directly, but
scsi_target_reap_usercontext() doesn't call device_del(starget-dev)
until it gets invoked (indirectly) from
scsi_device_dev_release_usercontext(), by way of scsi_target_reap().

Thus, the host gets removed from visibility at the appropriate time,
but the target remains until all the scsi_devices beneath it are not
only removed from visibility but also released (their refcounts drop to
0).

This can occur much later if, for example, a scsi_device holds a
mounted filesystem.  The scsi_host and scsi_device are removed when the
underlying USB device is unplugged.  But the scsi_device isn't
released, and hence the scsi_target isn't removed, until the filesystem
is unmounted.

Broadly speaking, the correct approach would be to call
scsi_target_reap() from __scsi_remove_device() instead of from
scsi_device_dev_release_usercontext().

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Hans de Goede

Hi James,

On 12/13/2013 09:03 PM, James Bottomley wrote:

On Fri, 2013-12-13 at 11:18 -0800, James Bottomley wrote:

On Fri, 2013-12-13 at 13:33 -0500, Tejun Heo wrote:

Hello, guys.

(cc'ing Greg)

On Fri, Dec 13, 2013 at 01:19:36PM -0500, Alan Stern wrote:

On Fri, 13 Dec 2013, Sarah Sharp wrote:


Given the way things work now, I suspect these warnings are truly
harmless.  We could simply get rid of the WARN in sysfs_remove_group

The alternative is to call device_del for SCSI targets earlier on, such
as when their hosts are unregistered.  I don't know how James would
feel about this approach.  It would be difficult because targets use
their own reference counts instead of relying on the usual device
refcounting mechanism.


Thanks for looking into this.  I think just getting rid of the WARN
would be sufficient.  Can you make a patch for that?


Easily.  The downside is that there would no longer be any warning
when someone tries to remove a wrong subdirectory by mistake.


The patch still won't help with the UAS issues with
scsi_init_shared_tag_map though.


I wasn't clear on the reason for that problem.  Does it also arise from
late device_del for scsi_target?  I could try to change the way that
works, if anybody (Hans?) would like to test it.


While the recent sysfs changes made this issue more visible, Greg
wants to make sure that devices are removed from leaf up in all cases
and keep the warning to ensure that.  Would there be a way fix SCSI
removal ordering?


Could someone analyse the actual problem?  We're quite careful even on
host remove to iterate and remove all the devices, then targets, then
host (and allied transport objects).  Which removal is inverted?


Actually, I think I have this figured out.  There's a thinko in one of
the scsi_target_reap() cases.  The original (and still existing) problem
with targets is that nothing creates them and nothing destroys them, so,
while we could rely on the refcounting of the device model to preserve
the actual target object, we had no idea when to remove it from
visibility.  That was the job of the reap reference, to track
visibility.  It looks like the reap on device last put is occurring too
late.  I think we should reap immediately after doing the sdev
device_del, so does this fix the warn on? (I'm not sure because no-one
has actually posted a backtrace, but it sounds like this is the
problem).


Thanks I'll give this patch a try. As for backtraces I've posted some
(partial) backtraces as well as reproduction instructions here:
http://www.spinics.net/lists/linux-scsi/msg70002.html

Regards,

Hans
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Alan Stern
On Fri, 13 Dec 2013, James Bottomley wrote:

 Actually, I think I have this figured out.  There's a thinko in one of
 the scsi_target_reap() cases.  The original (and still existing) problem
 with targets is that nothing creates them and nothing destroys them, so,
 while we could rely on the refcounting of the device model to preserve
 the actual target object, we had no idea when to remove it from
 visibility.  That was the job of the reap reference, to track
 visibility.  It looks like the reap on device last put is occurring too
 late.  I think we should reap immediately after doing the sdev
 device_del, so does this fix the warn on? (I'm not sure because no-one
 has actually posted a backtrace, but it sounds like this is the
 problem).
 
 James
 
 ---
 
 diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
 index 8ff62c2..98d4eb3 100644
 --- a/drivers/scsi/scsi_sysfs.c
 +++ b/drivers/scsi/scsi_sysfs.c
 @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct 
 work_struct *work)
   /* NULL queue means the device can't be used */
   sdev-request_queue = NULL;
  
 - scsi_target_reap(scsi_target(sdev));
 -
   kfree(sdev-inquiry);
   kfree(sdev);
  
 @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
   } else
   put_device(sdev-sdev_dev);
  
 + scsi_target_reap(scsi_target(sdev));
 +
   /*
* Stop accepting new requests and wait until all queuecommand() and
* scsi_run_queue() invocations have finished before tearing down the

This is not right.  The problem is that you don't keep track explicitly 
of the number of references to a target; you rely implicitly on 
starget-devices being non-empty.  starget-reap_ref is only a count of 
local operations that should block removal.

Consider, for example, what would happen if there is more than one LUN.  
What if one of them is removed while the other remains?

A more invasive change is needed.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Sarah Sharp
On Fri, Dec 13, 2013 at 12:03:19PM -0800, James Bottomley wrote:
 Actually, I think I have this figured out.  There's a thinko in one of
 the scsi_target_reap() cases.  The original (and still existing) problem
 with targets is that nothing creates them and nothing destroys them, so,
 while we could rely on the refcounting of the device model to preserve
 the actual target object, we had no idea when to remove it from
 visibility.  That was the job of the reap reference, to track
 visibility.  It looks like the reap on device last put is occurring too
 late.  I think we should reap immediately after doing the sdev
 device_del, so does this fix the warn on? (I'm not sure because no-one
 has actually posted a backtrace, but it sounds like this is the
 problem).

I can confirm that this patch fixes both the sysfs warning, and the
issue with USB storage disconnect during video playback.  I did trigger
a new (possibly unrelated?) mutex deadlock warning.  dmesg is attached.

Sarah Sharp

 ---
 
 diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
 index 8ff62c2..98d4eb3 100644
 --- a/drivers/scsi/scsi_sysfs.c
 +++ b/drivers/scsi/scsi_sysfs.c
 @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct 
 work_struct *work)
   /* NULL queue means the device can't be used */
   sdev-request_queue = NULL;
  
 - scsi_target_reap(scsi_target(sdev));
 -
   kfree(sdev-inquiry);
   kfree(sdev);
  
 @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
   } else
   put_device(sdev-sdev_dev);
  
 + scsi_target_reap(scsi_target(sdev));
 +
   /*
* Stop accepting new requests and wait until all queuecommand() and
* scsi_run_queue() invocations have finished before tearing down the
 
 
Dec 13 13:02:02 xanatos kernel: [7.029300] usb usb4: bus auto-suspend, 
wakeup 1
Dec 13 13:02:02 xanatos kernel: [7.040327] input: SynPS/2 Synaptics 
TouchPad as /devices/platform/i8042/serio1/input/input11
Dec 13 13:02:02 xanatos kernel: [7.112065] btusb 3-1.4:1.0: 
usb_probe_interface
Dec 13 13:02:02 xanatos kernel: [7.112070] btusb 3-1.4:1.0: 
usb_probe_interface - got id
Dec 13 13:02:02 xanatos kernel: [7.122731] usbcore: registered new 
interface driver btusb
Dec 13 13:02:02 xanatos kernel: [7.167710] Linux video capture interface: 
v2.00
Dec 13 13:02:02 xanatos kernel: [7.235181] uvcvideo 3-1.6:1.0: 
usb_probe_interface
Dec 13 13:02:02 xanatos kernel: [7.235187] uvcvideo 3-1.6:1.0: 
usb_probe_interface - got id
Dec 13 13:02:02 xanatos kernel: [7.235293] uvcvideo: Found UVC 1.00 device 
Integrated Camera (04f2:b2ea)
Dec 13 13:02:02 xanatos kernel: [7.242661] input: Integrated Camera as 
/devices/pci:00/:00:1a.0/usb3/3-1/3-1.6/3-1.6:1.0/input/input20
Dec 13 13:02:02 xanatos kernel: [7.244470] usbcore: registered new 
interface driver uvcvideo
Dec 13 13:02:02 xanatos kernel: [7.244473] USB Video Class driver (1.1.1)
Dec 13 13:02:03 xanatos kernel: [8.044806] bio: create slab bio-2 at 2
Dec 13 13:02:03 xanatos kernel: [8.261355] Adding 4085756k swap on 
/dev/mapper/cryptswap1.  Priority:-1 extents:1 across:4085756k SSFS
Dec 13 13:02:03 xanatos kernel: [8.407323] e1000e :00:19.0: irq 44 for 
MSI/MSI-X
Dec 13 13:02:03 xanatos kernel: [8.510442] e1000e :00:19.0: irq 44 for 
MSI/MSI-X
Dec 13 13:02:03 xanatos kernel: [8.510945] IPv6: ADDRCONF(NETDEV_UP): eth1: 
link is not ready
Dec 13 13:02:03 xanatos kernel: [8.516037] iwlwifi :03:00.0: L1 
Enabled; Disabling L0S
Dec 13 13:02:03 xanatos kernel: [8.517364] iwlwifi :03:00.0: Radio 
type=0x1-0x2-0x0
Dec 13 13:02:03 xanatos kernel: [8.785685] iwlwifi :03:00.0: L1 
Enabled; Disabling L0S
Dec 13 13:02:03 xanatos kernel: [8.792724] iwlwifi :03:00.0: Radio 
type=0x1-0x2-0x0
Dec 13 13:02:04 xanatos kernel: [8.876409] IPv6: ADDRCONF(NETDEV_UP): 
wlan1: link is not ready
Dec 13 13:02:04 xanatos kernel: [9.787586] usb 3-1.6: usb auto-suspend, 
wakeup 0
Dec 13 13:02:05 xanatos kernel: [9.910341] psmouse serio2: alps: Unknown 
ALPS touchpad: E7=10 00 64, EC=10 00 64
Dec 13 13:02:06 xanatos kernel: [   11.169530] psmouse serio2: trackpoint: IBM 
TrackPoint firmware: 0x0e, buttons: 3/3
Dec 13 13:02:06 xanatos kernel: [   11.375342] input: TPPS/2 IBM TrackPoint as 
/devices/platform/i8042/serio1/serio2/input/input19
Dec 13 13:02:07 xanatos kernel: [   11.917809] e1000e: eth1 NIC Link is Up 1000 
Mbps Full Duplex, Flow Control: Rx/Tx
Dec 13 13:02:07 xanatos kernel: [   11.917863] IPv6: ADDRCONF(NETDEV_CHANGE): 
eth1: link becomes ready
Dec 13 13:02:12 xanatos kernel: [   17.125215] 
Dec 13 13:02:12 xanatos kernel: [   17.125218] 
==
Dec 13 13:02:12 xanatos kernel: [   17.125219] [ INFO: possible circular 
locking dependency detected ]
Dec 13 13:02:12 xanatos kernel: [   17.125221] 3.13.0-rc1+ #140 Not tainted
Dec 13 13:02:12 

Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread James Bottomley
On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote:
 On Fri, 13 Dec 2013, James Bottomley wrote:
 
  Actually, I think I have this figured out.  There's a thinko in one of
  the scsi_target_reap() cases.  The original (and still existing) problem
  with targets is that nothing creates them and nothing destroys them, so,
  while we could rely on the refcounting of the device model to preserve
  the actual target object, we had no idea when to remove it from
  visibility.  That was the job of the reap reference, to track
  visibility.  It looks like the reap on device last put is occurring too
  late.  I think we should reap immediately after doing the sdev
  device_del, so does this fix the warn on? (I'm not sure because no-one
  has actually posted a backtrace, but it sounds like this is the
  problem).
  
  James
  
  ---
  
  diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
  index 8ff62c2..98d4eb3 100644
  --- a/drivers/scsi/scsi_sysfs.c
  +++ b/drivers/scsi/scsi_sysfs.c
  @@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct 
  work_struct *work)
  /* NULL queue means the device can't be used */
  sdev-request_queue = NULL;
   
  -   scsi_target_reap(scsi_target(sdev));
  -
  kfree(sdev-inquiry);
  kfree(sdev);
   
  @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
  } else
  put_device(sdev-sdev_dev);
   
  +   scsi_target_reap(scsi_target(sdev));
  +
  /*
   * Stop accepting new requests and wait until all queuecommand() and
   * scsi_run_queue() invocations have finished before tearing down the
 
 This is not right.  The problem is that you don't keep track explicitly 
 of the number of references to a target; you rely implicitly on 
 starget-devices being non-empty.  starget-reap_ref is only a count of 
 local operations that should block removal.

No, it was supposed explicitly to be a visibility counter to answer the
question when can we delete the target.  It's incremented every time we
add a device to the target (and when we do an operation that may remove
one to keep an atomic context before we blow it away) and decremented
every time we remove one.

 Consider, for example, what would happen if there is more than one LUN.  
 What if one of them is removed while the other remains?

Then the reap reference remains above zero and the target stays.

 A more invasive change is needed.

I think you might be right in that we need to kill the list_empty check,
but I think that should be it.

James



--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Hans de Goede

Hi,

On 12/13/2013 09:03 PM, James Bottomley wrote:

snip


Actually, I think I have this figured out.  There's a thinko in one of
the scsi_target_reap() cases.  The original (and still existing) problem
with targets is that nothing creates them and nothing destroys them, so,
while we could rely on the refcounting of the device model to preserve
the actual target object, we had no idea when to remove it from
visibility.  That was the job of the reap reference, to track
visibility.  It looks like the reap on device last put is occurring too
late.  I think we should reap immediately after doing the sdev
device_del, so does this fix the warn on? (I'm not sure because no-one
has actually posted a backtrace, but it sounds like this is the
problem).

James

---

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..98d4eb3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
/* NULL queue means the device can't be used */
sdev-request_queue = NULL;

-   scsi_target_reap(scsi_target(sdev));
-
kfree(sdev-inquiry);
kfree(sdev);

@@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
} else
put_device(sdev-sdev_dev);

+   scsi_target_reap(scsi_target(sdev));
+
/*
 * Stop accepting new requests and wait until all queuecommand() and
 * scsi_run_queue() invocations have finished before tearing down the


I've given this patch a try and it fixes the blk-tag.c: 89 BUG() I was seeing.

As for the other patch you (James) have send for that problem:

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..98d4eb3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,8 +399,6 @@ static void scsi_device_dev_release_usercontext(struct 
work_struct *work)
/* NULL queue means the device can't be used */
sdev-request_queue = NULL;

-   scsi_target_reap(scsi_target(sdev));
-
kfree(sdev-inquiry);
kfree(sdev);

@@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
} else
put_device(sdev-sdev_dev);

+   scsi_target_reap(scsi_target(sdev));
+
/*
 * Stop accepting new requests and wait until all queuecommand() and
 * scsi_run_queue() invocations have finished before tearing down the

That too fixes the blk-tag.c: 89 BUG() I was seeing. Either patch by itself
seems to be enough to fix this issue for me.

Thanks  Regards,

Hans
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Alan Stern
On Fri, 13 Dec 2013, James Bottomley wrote:

 On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote:
  On Fri, 13 Dec 2013, James Bottomley wrote:
  
   Actually, I think I have this figured out.  There's a thinko in one of
   the scsi_target_reap() cases.  The original (and still existing) problem
   with targets is that nothing creates them and nothing destroys them, so,
   while we could rely on the refcounting of the device model to preserve
   the actual target object, we had no idea when to remove it from
   visibility.  That was the job of the reap reference, to track
   visibility.  It looks like the reap on device last put is occurring too
   late.  I think we should reap immediately after doing the sdev
   device_del, so does this fix the warn on? (I'm not sure because no-one
   has actually posted a backtrace, but it sounds like this is the
   problem).
   
   James
   
   ---
   
   diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
   index 8ff62c2..98d4eb3 100644
   --- a/drivers/scsi/scsi_sysfs.c
   +++ b/drivers/scsi/scsi_sysfs.c
   @@ -399,8 +399,6 @@ static void 
   scsi_device_dev_release_usercontext(struct work_struct *work)
 /* NULL queue means the device can't be used */
 sdev-request_queue = NULL;

   - scsi_target_reap(scsi_target(sdev));
   -
 kfree(sdev-inquiry);
 kfree(sdev);

   @@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
 } else
 put_device(sdev-sdev_dev);

   + scsi_target_reap(scsi_target(sdev));
   +
 /*
  * Stop accepting new requests and wait until all queuecommand() and
  * scsi_run_queue() invocations have finished before tearing down the
  
  This is not right.  The problem is that you don't keep track explicitly 
  of the number of references to a target; you rely implicitly on 
  starget-devices being non-empty.  starget-reap_ref is only a count of 
  local operations that should block removal.
 
 No, it was supposed explicitly to be a visibility counter to answer the
 question when can we delete the target.  It's incremented every time we
 add a device to the target (and when we do an operation that may remove
 one to keep an atomic context before we blow it away) and decremented
 every time we remove one.

Sorry, but you're wrong.  starget-reap_ref is _not_ incremented every 
time we add a device to the target.  That's one of the things we need to 
fix.

  Consider, for example, what would happen if there is more than one LUN.  
  What if one of them is removed while the other remains?
 
 Then the reap reference remains above zero and the target stays.
 
  A more invasive change is needed.
 
 I think you might be right in that we need to kill the list_empty check,
 but I think that should be it.

That, plus a one or two other things.  Look over the patch below.

Alan Stern



Index: usb-3.13/drivers/scsi/scsi_scan.c
===
--- usb-3.13.orig/drivers/scsi/scsi_scan.c
+++ usb-3.13/drivers/scsi/scsi_scan.c
@@ -334,6 +334,7 @@ static void scsi_target_dev_release(stru
struct device *parent = dev-parent;
struct scsi_target *starget = to_scsi_target(dev);
 
+   WARN_ON(!list_empty(starget-devices));
kfree(starget);
put_device(parent);
 }
@@ -481,7 +482,7 @@ void scsi_target_reap(struct scsi_target
 
spin_lock_irqsave(shost-host_lock, flags);
state = starget-state;
-   if (--starget-reap_ref == 0  list_empty(starget-devices)) {
+   if (--starget-reap_ref == 0) {
empty = 1;
starget-state = STARGET_DEL;
}
Index: usb-3.13/drivers/scsi/scsi_sysfs.c
===
--- usb-3.13.orig/drivers/scsi/scsi_sysfs.c
+++ usb-3.13/drivers/scsi/scsi_sysfs.c
@@ -369,17 +369,13 @@ static void scsi_device_dev_release_user
 {
struct scsi_device *sdev;
struct device *parent;
-   struct scsi_target *starget;
struct list_head *this, *tmp;
unsigned long flags;
 
sdev = container_of(work, struct scsi_device, ew.work);
-
parent = sdev-sdev_gendev.parent;
-   starget = to_scsi_target(parent);
 
spin_lock_irqsave(sdev-host-host_lock, flags);
-   starget-reap_ref++;
list_del(sdev-siblings);
list_del(sdev-same_target_siblings);
list_del(sdev-starved_entry);
@@ -399,13 +395,10 @@ static void scsi_device_dev_release_user
/* NULL queue means the device can't be used */
sdev-request_queue = NULL;
 
-   scsi_target_reap(scsi_target(sdev));
-
kfree(sdev-inquiry);
kfree(sdev);
 
-   if (parent)
-   put_device(parent);
+   put_device(parent);
 }
 
 static void scsi_device_dev_release(struct device *dev)
@@ -1044,6 +1037,8 @@ void __scsi_remove_device(struct scsi_de
} else
put_device(sdev-sdev_dev);
 
+   

Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread James Bottomley
On Fri, 2013-12-13 at 19:48 -0500, Alan Stern wrote:
 On Fri, 13 Dec 2013, James Bottomley wrote:
 
  On Fri, 2013-12-13 at 16:06 -0500, Alan Stern wrote:
   On Fri, 13 Dec 2013, James Bottomley wrote:
   
Actually, I think I have this figured out.  There's a thinko in one of
the scsi_target_reap() cases.  The original (and still existing) problem
with targets is that nothing creates them and nothing destroys them, so,
while we could rely on the refcounting of the device model to preserve
the actual target object, we had no idea when to remove it from
visibility.  That was the job of the reap reference, to track
visibility.  It looks like the reap on device last put is occurring too
late.  I think we should reap immediately after doing the sdev
device_del, so does this fix the warn on? (I'm not sure because no-one
has actually posted a backtrace, but it sounds like this is the
problem).

James

---

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 8ff62c2..98d4eb3 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -399,8 +399,6 @@ static void 
scsi_device_dev_release_usercontext(struct work_struct *work)
/* NULL queue means the device can't be used */
sdev-request_queue = NULL;
 
-   scsi_target_reap(scsi_target(sdev));
-
kfree(sdev-inquiry);
kfree(sdev);
 
@@ -1044,6 +1042,8 @@ void __scsi_remove_device(struct scsi_device 
*sdev)
} else
put_device(sdev-sdev_dev);
 
+   scsi_target_reap(scsi_target(sdev));
+
/*
 * Stop accepting new requests and wait until all 
queuecommand() and
 * scsi_run_queue() invocations have finished before tearing 
down the
   
   This is not right.  The problem is that you don't keep track explicitly 
   of the number of references to a target; you rely implicitly on 
   starget-devices being non-empty.  starget-reap_ref is only a count of 
   local operations that should block removal.
  
  No, it was supposed explicitly to be a visibility counter to answer the
  question when can we delete the target.  It's incremented every time we
  add a device to the target (and when we do an operation that may remove
  one to keep an atomic context before we blow it away) and decremented
  every time we remove one.
 
 Sorry, but you're wrong.  starget-reap_ref is _not_ incremented every 
 time we add a device to the target.  That's one of the things we need to 
 fix.

Well, then we would have a pretty astonishing cockup in the code.  The
found case of scsi_alloc_target increments the reference each time it's
called, so scsi_add_device() definitely behaves like this.  I suppose
it's possible the list_empty() check is covering a miscount in some of
the other probing routines, but that would mean we have stale targets
for a lot of our use cases.  I'll audit the code.

James


--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-13 Thread Alan Stern
On Fri, 13 Dec 2013, James Bottomley wrote:

  Sorry, but you're wrong.  starget-reap_ref is _not_ incremented every 
  time we add a device to the target.  That's one of the things we need to 
  fix.
 
 Well, then we would have a pretty astonishing cockup in the code.  The
 found case of scsi_alloc_target increments the reference each time it's
 called, so scsi_add_device() definitely behaves like this.

You forgot that __scsi_add_device() calls scsi_target_reap() at the 
end.  So the reference count is incremented and then decremented again.

It's easy enough to check that the scsi_probe_and_add_lun pathway
doesn't elevate the refcount.  Print out the value of starget-reap_ref
just after __scsi_add_device() calls scsi_alloc_target() and just
before it calls scsi_target_reap().

  I suppose
 it's possible the list_empty() check is covering a miscount in some of
 the other probing routines, but that would mean we have stale targets
 for a lot of our use cases.  I'll audit the code.

That's probably right; whenever a target has more than one LUN we must 
end up leaking the target.  In the common case of one LUN it works out, 
because the list is empty by the time the scsi_device is released.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [usb-storage] UAS hangs khubd on USB disconnect

2013-12-12 Thread Alan Stern
On Wed, 11 Dec 2013, Sarah Sharp wrote:

 Hi Hans,
 
 I've been testing the UAS code you sent a pull request for against
 3.13-rc1, and I've run into a rather nasty issue with USB disconnect.
 
 I ran some tests with a USB 3.0 storage device under xHCI.  The disk has
 three 10GB partitions: ext3 (sdb1), ext4 (sdb2), and fat32 (sdb4).
 There was a btrfs partition on sdb3, but I deleted it.
 
 If I start to play a movie on the ext4 partition, and then yank the USB
 cable, the uas driver is unbound from the device.  It looks like
 something goes wrong in the SCSI layer shortly after that, causing an
 oops in sysfs_remove_group().

I did a little testing.  It turns out this WARN (not an oops) is the 
result of recent changes to sysfs, combined with the peculiar way the 
SCSI layer handles targets.

In the new kernel, when you call device_del for some object, the 
object's directory and everything beneath it get removed from sysfs.  
This wasn't true in the past.

When a USB drive is unplugged, almost everything below it gets
unregistered.  But not the SCSI target -- it remains registered until
the number of reap references drops to 0.  This doesn't happen until
all the devices beneath it are released, which happens when all the
open file references are closed and the filesystem is unmounted.

So scsi_target_reap_usercontext ends up calling device_del for the
target after everything else has been removed from sysfs.  As part of
normal device_del processing, attribute groups get removed.  In
particular the power/ subdirectory is removed from the target's sysfs
directory.  But the sysfs directories are long gone by this time, so
sysfs_remove_group complains that it was asked to remove a non-existent
subdirectory.

Given the way things work now, I suspect these warnings are truly 
harmless.  We could simply get rid of the WARN in sysfs_remove_group.

The alternative is to call device_del for SCSI targets earlier on, such 
as when their hosts are unregistered.  I don't know how James would 
feel about this approach.  It would be difficult because targets use 
their own reference counts instead of relying on the usual device 
refcounting mechanism.

Alan Stern

--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html