Re: [PATCH] Add support for asynchronous scans to libata

2007-04-02 Thread James Smart

Matt,

Have your resolved the unload race conditions yet?

We'd like to update lpfc for the async scans, but our testing gets blocked
very quickly by the bugs. The bugs are not necessarily specific to lpfc
or to FC.

Stack traces are below. Simple ismod/rmmod loop can trigger them

-- james s



First one is an rmmod while lpfc is active in it's scan_start function.
I think we're in a sleep waiting for the board to be ready when the driver
is yanked out from underneath us.

Second one is an rmmod while do_scsi_scan_host is in a loop calling
scan_finished.

Third one is an rmmod immediately after completion of do_scan_async.


STACK TRACE 1
-

BUG: unable to handle kernel paging request at virtual address f8941836
 printing eip:
f8941836
*pde = 37e59067
Oops:  [#1]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror 
dm_mod video sbs i2c_e6
 printing eip:
f8941836
*pde = 37e59067
 chipreg pcspkr tg3 scsi_transport_fc mptspi mptscsih mptbase 
scsi_transport_spi sd_mod scsi_mod exd
CPU:2
EIP:0060:[f8941836]Not tainted VLI
EFLAGS: 00010246   (2.6.20-rc7-jimp #1)
EIP is at 0xf8941836
eax:    ebx:    ecx: c190   edx: 0286
esi: f7ab6000   edi: f887158d   ebp: f7ab62e8   esp: ecec9f30
ds: 007b   es: 007b   ss: 0068
Process scsi_scan_6 (pid: 15927, ti=ecec9000 task=f2309000 task.ti=ecec9000)
Stack: c04a2183 f65b6cc4 f8963c80 f8963c80 f71e38d0 ffef c04a1cef 8180
   0008 f6e4aa00  0046  f7ab62e8 f7ab6000 f887158d
   f75fa3e0 f7ab62e8 f7ab6000 f887158d f75fa3e0 f894d6dc f89593e4 f7ab6000
Call Trace:
 [c04a2183] sysfs_dirent_exist+0x20/0x5f
 [c04a1cef] sysfs_add_file+0x66/0x70
 [f887158d] do_scan_async+0x0/0x126 [scsi_mod]
 [f887158d] do_scan_async+0x0/0x126 [scsi_mod]
 [f88713e0] do_scsi_scan_host+0x30/0x8b [scsi_mod]
 [f88715aa] do_scan_async+0x1d/0x126 [scsi_mod]
 [c041f7b4] complete+0x39/0x48
 [f887158d] do_scan_async+0x0/0x126 [scsi_mod]
 [c0436924] kthread+0xb0/0xd8
 [c0436874] kthread+0x0/0xd8
 [c0404977] kernel_thread_helper+0x7/0x10
 ===
Code:  Bad EIP value.
EIP: [f8941836] 0xf8941836 SS:ESP 0068:ecec9f30
 0Oops:  [#2]
SMP
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror 
dm_mod video sbs i2c_ed
CPU:3
EIP:0060:[f8941836]Not tainted VLI
EFLAGS: 00010246   (2.6.20-rc7-jimp #1)
EIP is at 0xf8941836
eax:    ebx:    ecx: c1902000   edx: 0286
esi: f74ff000   edi: f887158d   ebp: f74ff2e8   esp: ec3c1f30
ds: 007b   es: 007b   ss: 0068
Process scsi_scan_5 (pid: 15919, ti=ec3c1000 task=f7b92000 task.ti=ec3c1000)
Stack: c04a2183 ece3821c f8963c80 f8963c80 f6b3942c ffef c04a1cef 8180
   0008 f6efea00  0046 0002 f74ff2e8 f74ff000 f887158d
   f7897de0 f74ff2e8 f74ff000 f887158d f7897de0 f894d6dc f89593e4 f74ff000
Call Trace:
 [c04a2183] sysfs_dirent_exist+0x20/0x5f
 [c04a1cef] sysfs_add_file+0x66/0x70
 [f887158d] do_scan_async+0x0/0x126 [scsi_mod]
 [f887158d] do_scan_async+0x0/0x126 [scsi_mod]
 [f88713e0] do_scsi_scan_host+0x30/0x8b [scsi_mod]
 [f88715aa] do_scan_async+0x1d/0x126 [scsi_mod]
 [f887158d] do_scan_async+0x0/0x126 [scsi_mod]
 [c0436924] kthread+0xb0/0xd8
 [c0436874] kthread+0x0/0xd8
 [c0404977] kernel_thread_helper+0x7/0x10
 ===

STACK TRACE 2
-

BUG: unable to handle kernel NULL pointer dereference at virtual address 

 printing eip:
c0548435
*pde = 
Oops:  [#3]
SMP
Modules linked in: lpfc autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 
dm_mirror dm_mod video sbs d
CPU:3
EIP:0060:[c0548435]Not tainted VLI
EFLAGS: 00010202   (2.6.20-rc7-jimp #1)
EIP is at make_class_name+0x27/0x7d
eax:    ebx:    ecx:    edx: 000b
esi: f887c29a   edi:    ebp:    esp: edd27ec4
ds: 007b   es: 007b   ss: 0068
Process fc_wq_7 (pid: 16050, ti=edd27000 task=f68b4550 task.ti=edd27000)
Stack: f65d35f8 f8889e20 f65d35f8 f65d35f0 f8889da0 c0548616  f65d35f0
   f69f0030 f6fdc000  c054869c f65d3400 f88720f8 f65d3400 f69f0030
   f8872152 f65d3400 f69f f88721c9 f6fdc014 f6d1d848 f8872233 f8872247
Call Trace:
 [c0548616] class_device_del+0x8c/0x10a
 [c054869c] class_device_unregister+0x8/0x10
 [f88720f8] __scsi_remove_device+0x1d/0x60 [scsi_mod]
 [f8872152] scsi_remove_device+0x17/0x20 [scsi_mod]
 [f88721c9] __scsi_remove_target+0x6e/0xa1 [scsi_mod]
 [f8872233] __remove_child+0x0/0x18 [scsi_mod]
 [f8872247] __remove_child+0x14/0x18 [scsi_mod]
 [c0545dc0] device_for_each_child+0x1a/0x3c
 [f887222a] scsi_remove_target+0x2e/0x37 [scsi_mod]
 [f88b7ae6] fc_rport_final_delete+0x51/0x9a [scsi_transport_fc]
 [c04339ff] run_workqueue+0x85/0x125
 [c042eb72] do_sigaction+0x10f/0x149
 [f88b7a95] fc_rport_final_delete+0x0/0x9a [scsi_transport_fc]
 [c0434359] worker_thread+0xf9/0x124
 [c0421072] default_wake_function+0x0/0xc
 [c0434260] worker_thread+0x0/0x124
 [c0436924] 

Re: [PATCH] Add support for asynchronous scans to libata

2006-12-12 Thread Jeff Garzik

Matthew Wilcox wrote:

On Mon, Dec 11, 2006 at 10:58:06AM -0500, Jeff Garzik wrote:
The time-consuming portion already takes place in a thread.  Do you mean 
multiple threads?  Or, ATA's scan is in one thread, while work continues 
in other threads?


Patch seems sane, provided that I am educated a bit :)


Each host will be scanned in its own thread.  So boot-up and
initialisation of other devices (eg, USB, networking, PS/2, etc) will
continue.  Drive ordering and numbering is maintained, and the default
configuration right now is to not enable this feature.

We have infrastructure in place to make sure that all discs are found
before we try to start init or autorun RAID.  If there's anywhere we're
missing a call to scsi_complete_async_scans(), it'll affect SCSI as much
as it will IDE, and I'm extremely interested in fixing it.


oh, err, hum.

libata uses one scsihost per port, not one-per-controller, so it won't 
work like you think.


This will conflict with AHCI's staggered spin-up, at least.

Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for asynchronous scans to libata

2006-12-11 Thread Jeff Garzik

Matthew Wilcox wrote:

Some of the drivers (AHCI was mentioned to me as a culprit) take a long
time to discover all the devices attached to them.  Even for ones which
are relatively quick, if you put a lot of them in a machine, it will
take a long time in aggregate.  This can be fixed by adding support for
asynchronous scsi scans, which causes the time-consuming portions of
initialisation to take place in threads.

Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]


The time-consuming portion already takes place in a thread.  Do you mean 
multiple threads?  Or, ATA's scan is in one thread, while work continues 
in other threads?


Patch seems sane, provided that I am educated a bit :)

Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for asynchronous scans to libata

2006-12-11 Thread Matthew Wilcox
On Mon, Dec 11, 2006 at 10:58:06AM -0500, Jeff Garzik wrote:
 The time-consuming portion already takes place in a thread.  Do you mean 
 multiple threads?  Or, ATA's scan is in one thread, while work continues 
 in other threads?
 
 Patch seems sane, provided that I am educated a bit :)

Each host will be scanned in its own thread.  So boot-up and
initialisation of other devices (eg, USB, networking, PS/2, etc) will
continue.  Drive ordering and numbering is maintained, and the default
configuration right now is to not enable this feature.

We have infrastructure in place to make sure that all discs are found
before we try to start init or autorun RAID.  If there's anywhere we're
missing a call to scsi_complete_async_scans(), it'll affect SCSI as much
as it will IDE, and I'm extremely interested in fixing it.

-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for asynchronous scans to libata

2006-12-11 Thread Jeff Garzik

Matthew Wilcox wrote:

Some of the drivers (AHCI was mentioned to me as a culprit) take a long
time to discover all the devices attached to them.  Even for ones which
are relatively quick, if you put a lot of them in a machine, it will
take a long time in aggregate.  This can be fixed by adding support for
asynchronous scsi scans, which causes the time-consuming portions of
initialisation to take place in threads.

Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]


ACK.  I tried to apply the patch, but git-applymbox choked on every 
single file modified.  Quite possibly, its due to a whitespace cleanup 
in Alan territory.


If you would either (a) wait several hours for libata-dev.git#upstream 
to mirror out, and resend, or (b) supply a git:// URL to pull from, I'll 
merge.


Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for asynchronous scans to libata

2006-12-11 Thread Matthew Wilcox
On Mon, Dec 11, 2006 at 11:18:17AM -0500, Jeff Garzik wrote:
 ACK.  I tried to apply the patch, but git-applymbox choked on every 
 single file modified.  Quite possibly, its due to a whitespace cleanup 
 in Alan territory.
 
 If you would either (a) wait several hours for libata-dev.git#upstream 
 to mirror out, and resend, or (b) supply a git:// URL to pull from, I'll 
 merge.

Here's a merge against your upstream head.  It's perhaps more of a
conceptual merge as it adds entries to drivers that have been added
since I wrote the patch.  I also added a couple of missing slave_destroy
methods to the newly-added drivers.

Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Add support for asynchronous scans to libata

2006-12-11 Thread Matthew Wilcox
On Mon, Dec 11, 2006 at 10:02:40PM -0700, Matthew Wilcox wrote:
 On Mon, Dec 11, 2006 at 11:18:17AM -0500, Jeff Garzik wrote:
  ACK.  I tried to apply the patch, but git-applymbox choked on every 
  single file modified.  Quite possibly, its due to a whitespace cleanup 
  in Alan territory.
  
  If you would either (a) wait several hours for libata-dev.git#upstream 
  to mirror out, and resend, or (b) supply a git:// URL to pull from, I'll 
  merge.
 
 Here's a merge against your upstream head.  It's perhaps more of a
 conceptual merge as it adds entries to drivers that have been added
 since I wrote the patch.  I also added a couple of missing slave_destroy
 methods to the newly-added drivers.
 
 Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]

... this time with patch ...

Add support for asynchronous scans to libata

Some of the drivers (AHCI was mentioned to me as a culprit) take a long
time to discover all the devices attached to them.  Even for ones which
are relatively quick, if you put a lot of them in a machine, it will
take a long time in aggregate.  This can be fixed by adding support for
asynchronous scsi scans, which causes the time-consuming portions of
initialisation to take place in threads.

Signed-off-by: Matthew Wilcox [EMAIL PROTECTED]

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index f36da48..bbbec6d 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -242,6 +242,8 @@ static struct scsi_host_template ahci_sht = {
.dma_boundary   = AHCI_DMA_BOUNDARY,
.slave_configure= ata_scsi_slave_config,
.slave_destroy  = ata_scsi_slave_destroy,
+   .scan_start = ata_scsi_scan_start,
+   .scan_finished  = ata_scsi_scan_finished,
.bios_param = ata_std_bios_param,
.suspend= ata_scsi_device_suspend,
.resume = ata_scsi_device_resume,
diff --git a/drivers/ata/ata_generic.c b/drivers/ata/ata_generic.c
index 908751d..82c182b 100644
--- a/drivers/ata/ata_generic.c
+++ b/drivers/ata/ata_generic.c
@@ -116,6 +116,8 @@ static struct scsi_host_template generic_sht = {
.dma_boundary   = ATA_DMA_BOUNDARY,
.slave_configure= ata_scsi_slave_config,
.slave_destroy  = ata_scsi_slave_destroy,
+   .scan_start = ata_scsi_scan_start,
+   .scan_finished  = ata_scsi_scan_finished,
.bios_param = ata_std_bios_param,
.resume = ata_scsi_device_resume,
.suspend= ata_scsi_device_suspend,
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index 7959e4c..0d5b132 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -274,6 +274,8 @@ static struct scsi_host_template piix_sht = {
.dma_boundary   = ATA_DMA_BOUNDARY,
.slave_configure= ata_scsi_slave_config,
.slave_destroy  = ata_scsi_slave_destroy,
+   .scan_start = ata_scsi_scan_start,
+   .scan_finished  = ata_scsi_scan_finished,
.bios_param = ata_std_bios_param,
.resume = ata_scsi_device_resume,
.suspend= ata_scsi_device_suspend,
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 5fffccd..0d80396 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5712,6 +5712,13 @@ static struct ata_port * ata_port_add(const struct 
ata_probe_ent *ent,
return NULL;
}
 
+   if (!ent-sht-scan_start) {
+   printk(KERN_WARNING ata%u: Overriding NULL scan_start\n,
+   port_no);
+   ent-sht-scan_start = ata_scsi_scan_start;
+   ent-sht-scan_finished = ata_scsi_scan_finished;
+   }
+
shost = scsi_host_alloc(ent-sht, sizeof(struct ata_port));
if (!shost)
return NULL;
@@ -5747,6 +5754,51 @@ void ata_host_init(struct ata_host *host, struct device 
*dev,
host-ops = ops;
 }
 
+void ata_scsi_scan_start(struct Scsi_Host *shost)
+{
+   struct ata_port *ap = ata_shost_to_port(shost);
+   int rc;
+
+   if (ap-ops-error_handler) {
+   struct ata_eh_info *ehi = ap-eh_info;
+   unsigned long flags;
+
+   ata_port_probe(ap);
+
+   /* kick EH for boot probing */
+   spin_lock_irqsave(ap-lock, flags);
+
+   ehi-probe_mask = (1  ATA_MAX_DEVICES) - 1;
+   ehi-action |= ATA_EH_SOFTRESET;
+   ehi-flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
+
+   ap-pflags |= ATA_PFLAG_LOADING;
+   ata_port_schedule_eh(ap);
+
+   spin_unlock_irqrestore(ap-lock, flags);
+
+   /* wait for EH to finish */
+   ata_port_wait_eh(ap);
+   } else {
+   DPRINTK(ata%u: bus probe begin\n, ap-id);
+   rc = ata_bus_probe