Re: [PATCH] Add support for asynchronous scans to libata
Matt, Have your resolved the unload race conditions yet? We'd like to update lpfc for the async scans, but our testing gets blocked very quickly by the bugs. The bugs are not necessarily specific to lpfc or to FC. Stack traces are below. Simple ismod/rmmod loop can trigger them -- james s First one is an rmmod while lpfc is active in it's scan_start function. I think we're in a sleep waiting for the board to be ready when the driver is yanked out from underneath us. Second one is an rmmod while do_scsi_scan_host is in a loop calling scan_finished. Third one is an rmmod immediately after completion of do_scan_async. STACK TRACE 1 - BUG: unable to handle kernel paging request at virtual address f8941836 printing eip: f8941836 *pde = 37e59067 Oops: [#1] SMP Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror dm_mod video sbs i2c_e6 printing eip: f8941836 *pde = 37e59067 chipreg pcspkr tg3 scsi_transport_fc mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod exd CPU:2 EIP:0060:[f8941836]Not tainted VLI EFLAGS: 00010246 (2.6.20-rc7-jimp #1) EIP is at 0xf8941836 eax: ebx: ecx: c190 edx: 0286 esi: f7ab6000 edi: f887158d ebp: f7ab62e8 esp: ecec9f30 ds: 007b es: 007b ss: 0068 Process scsi_scan_6 (pid: 15927, ti=ecec9000 task=f2309000 task.ti=ecec9000) Stack: c04a2183 f65b6cc4 f8963c80 f8963c80 f71e38d0 ffef c04a1cef 8180 0008 f6e4aa00 0046 f7ab62e8 f7ab6000 f887158d f75fa3e0 f7ab62e8 f7ab6000 f887158d f75fa3e0 f894d6dc f89593e4 f7ab6000 Call Trace: [c04a2183] sysfs_dirent_exist+0x20/0x5f [c04a1cef] sysfs_add_file+0x66/0x70 [f887158d] do_scan_async+0x0/0x126 [scsi_mod] [f887158d] do_scan_async+0x0/0x126 [scsi_mod] [f88713e0] do_scsi_scan_host+0x30/0x8b [scsi_mod] [f88715aa] do_scan_async+0x1d/0x126 [scsi_mod] [c041f7b4] complete+0x39/0x48 [f887158d] do_scan_async+0x0/0x126 [scsi_mod] [c0436924] kthread+0xb0/0xd8 [c0436874] kthread+0x0/0xd8 [c0404977] kernel_thread_helper+0x7/0x10 === Code: Bad EIP value. EIP: [f8941836] 0xf8941836 SS:ESP 0068:ecec9f30 0Oops: [#2] SMP Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror dm_mod video sbs i2c_ed CPU:3 EIP:0060:[f8941836]Not tainted VLI EFLAGS: 00010246 (2.6.20-rc7-jimp #1) EIP is at 0xf8941836 eax: ebx: ecx: c1902000 edx: 0286 esi: f74ff000 edi: f887158d ebp: f74ff2e8 esp: ec3c1f30 ds: 007b es: 007b ss: 0068 Process scsi_scan_5 (pid: 15919, ti=ec3c1000 task=f7b92000 task.ti=ec3c1000) Stack: c04a2183 ece3821c f8963c80 f8963c80 f6b3942c ffef c04a1cef 8180 0008 f6efea00 0046 0002 f74ff2e8 f74ff000 f887158d f7897de0 f74ff2e8 f74ff000 f887158d f7897de0 f894d6dc f89593e4 f74ff000 Call Trace: [c04a2183] sysfs_dirent_exist+0x20/0x5f [c04a1cef] sysfs_add_file+0x66/0x70 [f887158d] do_scan_async+0x0/0x126 [scsi_mod] [f887158d] do_scan_async+0x0/0x126 [scsi_mod] [f88713e0] do_scsi_scan_host+0x30/0x8b [scsi_mod] [f88715aa] do_scan_async+0x1d/0x126 [scsi_mod] [f887158d] do_scan_async+0x0/0x126 [scsi_mod] [c0436924] kthread+0xb0/0xd8 [c0436874] kthread+0x0/0xd8 [c0404977] kernel_thread_helper+0x7/0x10 === STACK TRACE 2 - BUG: unable to handle kernel NULL pointer dereference at virtual address printing eip: c0548435 *pde = Oops: [#3] SMP Modules linked in: lpfc autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_mirror dm_mod video sbs d CPU:3 EIP:0060:[c0548435]Not tainted VLI EFLAGS: 00010202 (2.6.20-rc7-jimp #1) EIP is at make_class_name+0x27/0x7d eax: ebx: ecx: edx: 000b esi: f887c29a edi: ebp: esp: edd27ec4 ds: 007b es: 007b ss: 0068 Process fc_wq_7 (pid: 16050, ti=edd27000 task=f68b4550 task.ti=edd27000) Stack: f65d35f8 f8889e20 f65d35f8 f65d35f0 f8889da0 c0548616 f65d35f0 f69f0030 f6fdc000 c054869c f65d3400 f88720f8 f65d3400 f69f0030 f8872152 f65d3400 f69f f88721c9 f6fdc014 f6d1d848 f8872233 f8872247 Call Trace: [c0548616] class_device_del+0x8c/0x10a [c054869c] class_device_unregister+0x8/0x10 [f88720f8] __scsi_remove_device+0x1d/0x60 [scsi_mod] [f8872152] scsi_remove_device+0x17/0x20 [scsi_mod] [f88721c9] __scsi_remove_target+0x6e/0xa1 [scsi_mod] [f8872233] __remove_child+0x0/0x18 [scsi_mod] [f8872247] __remove_child+0x14/0x18 [scsi_mod] [c0545dc0] device_for_each_child+0x1a/0x3c [f887222a] scsi_remove_target+0x2e/0x37 [scsi_mod] [f88b7ae6] fc_rport_final_delete+0x51/0x9a [scsi_transport_fc] [c04339ff] run_workqueue+0x85/0x125 [c042eb72] do_sigaction+0x10f/0x149 [f88b7a95] fc_rport_final_delete+0x0/0x9a [scsi_transport_fc] [c0434359] worker_thread+0xf9/0x124 [c0421072] default_wake_function+0x0/0xc [c0434260] worker_thread+0x0/0x124 [c0436924]
Re: [PATCH] Add support for asynchronous scans to libata
Matthew Wilcox wrote: On Mon, Dec 11, 2006 at 10:58:06AM -0500, Jeff Garzik wrote: The time-consuming portion already takes place in a thread. Do you mean multiple threads? Or, ATA's scan is in one thread, while work continues in other threads? Patch seems sane, provided that I am educated a bit :) Each host will be scanned in its own thread. So boot-up and initialisation of other devices (eg, USB, networking, PS/2, etc) will continue. Drive ordering and numbering is maintained, and the default configuration right now is to not enable this feature. We have infrastructure in place to make sure that all discs are found before we try to start init or autorun RAID. If there's anywhere we're missing a call to scsi_complete_async_scans(), it'll affect SCSI as much as it will IDE, and I'm extremely interested in fixing it. oh, err, hum. libata uses one scsihost per port, not one-per-controller, so it won't work like you think. This will conflict with AHCI's staggered spin-up, at least. Jeff - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add support for asynchronous scans to libata
Matthew Wilcox wrote: Some of the drivers (AHCI was mentioned to me as a culprit) take a long time to discover all the devices attached to them. Even for ones which are relatively quick, if you put a lot of them in a machine, it will take a long time in aggregate. This can be fixed by adding support for asynchronous scsi scans, which causes the time-consuming portions of initialisation to take place in threads. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] The time-consuming portion already takes place in a thread. Do you mean multiple threads? Or, ATA's scan is in one thread, while work continues in other threads? Patch seems sane, provided that I am educated a bit :) Jeff - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add support for asynchronous scans to libata
On Mon, Dec 11, 2006 at 10:58:06AM -0500, Jeff Garzik wrote: The time-consuming portion already takes place in a thread. Do you mean multiple threads? Or, ATA's scan is in one thread, while work continues in other threads? Patch seems sane, provided that I am educated a bit :) Each host will be scanned in its own thread. So boot-up and initialisation of other devices (eg, USB, networking, PS/2, etc) will continue. Drive ordering and numbering is maintained, and the default configuration right now is to not enable this feature. We have infrastructure in place to make sure that all discs are found before we try to start init or autorun RAID. If there's anywhere we're missing a call to scsi_complete_async_scans(), it'll affect SCSI as much as it will IDE, and I'm extremely interested in fixing it. - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add support for asynchronous scans to libata
Matthew Wilcox wrote: Some of the drivers (AHCI was mentioned to me as a culprit) take a long time to discover all the devices attached to them. Even for ones which are relatively quick, if you put a lot of them in a machine, it will take a long time in aggregate. This can be fixed by adding support for asynchronous scsi scans, which causes the time-consuming portions of initialisation to take place in threads. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] ACK. I tried to apply the patch, but git-applymbox choked on every single file modified. Quite possibly, its due to a whitespace cleanup in Alan territory. If you would either (a) wait several hours for libata-dev.git#upstream to mirror out, and resend, or (b) supply a git:// URL to pull from, I'll merge. Jeff - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add support for asynchronous scans to libata
On Mon, Dec 11, 2006 at 11:18:17AM -0500, Jeff Garzik wrote: ACK. I tried to apply the patch, but git-applymbox choked on every single file modified. Quite possibly, its due to a whitespace cleanup in Alan territory. If you would either (a) wait several hours for libata-dev.git#upstream to mirror out, and resend, or (b) supply a git:// URL to pull from, I'll merge. Here's a merge against your upstream head. It's perhaps more of a conceptual merge as it adds entries to drivers that have been added since I wrote the patch. I also added a couple of missing slave_destroy methods to the newly-added drivers. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] - To unsubscribe from this list: send the line unsubscribe linux-ide in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Add support for asynchronous scans to libata
On Mon, Dec 11, 2006 at 10:02:40PM -0700, Matthew Wilcox wrote: On Mon, Dec 11, 2006 at 11:18:17AM -0500, Jeff Garzik wrote: ACK. I tried to apply the patch, but git-applymbox choked on every single file modified. Quite possibly, its due to a whitespace cleanup in Alan territory. If you would either (a) wait several hours for libata-dev.git#upstream to mirror out, and resend, or (b) supply a git:// URL to pull from, I'll merge. Here's a merge against your upstream head. It's perhaps more of a conceptual merge as it adds entries to drivers that have been added since I wrote the patch. I also added a couple of missing slave_destroy methods to the newly-added drivers. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] ... this time with patch ... Add support for asynchronous scans to libata Some of the drivers (AHCI was mentioned to me as a culprit) take a long time to discover all the devices attached to them. Even for ones which are relatively quick, if you put a lot of them in a machine, it will take a long time in aggregate. This can be fixed by adding support for asynchronous scsi scans, which causes the time-consuming portions of initialisation to take place in threads. Signed-off-by: Matthew Wilcox [EMAIL PROTECTED] diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index f36da48..bbbec6d 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -242,6 +242,8 @@ static struct scsi_host_template ahci_sht = { .dma_boundary = AHCI_DMA_BOUNDARY, .slave_configure= ata_scsi_slave_config, .slave_destroy = ata_scsi_slave_destroy, + .scan_start = ata_scsi_scan_start, + .scan_finished = ata_scsi_scan_finished, .bios_param = ata_std_bios_param, .suspend= ata_scsi_device_suspend, .resume = ata_scsi_device_resume, diff --git a/drivers/ata/ata_generic.c b/drivers/ata/ata_generic.c index 908751d..82c182b 100644 --- a/drivers/ata/ata_generic.c +++ b/drivers/ata/ata_generic.c @@ -116,6 +116,8 @@ static struct scsi_host_template generic_sht = { .dma_boundary = ATA_DMA_BOUNDARY, .slave_configure= ata_scsi_slave_config, .slave_destroy = ata_scsi_slave_destroy, + .scan_start = ata_scsi_scan_start, + .scan_finished = ata_scsi_scan_finished, .bios_param = ata_std_bios_param, .resume = ata_scsi_device_resume, .suspend= ata_scsi_device_suspend, diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c index 7959e4c..0d5b132 100644 --- a/drivers/ata/ata_piix.c +++ b/drivers/ata/ata_piix.c @@ -274,6 +274,8 @@ static struct scsi_host_template piix_sht = { .dma_boundary = ATA_DMA_BOUNDARY, .slave_configure= ata_scsi_slave_config, .slave_destroy = ata_scsi_slave_destroy, + .scan_start = ata_scsi_scan_start, + .scan_finished = ata_scsi_scan_finished, .bios_param = ata_std_bios_param, .resume = ata_scsi_device_resume, .suspend= ata_scsi_device_suspend, diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c index 5fffccd..0d80396 100644 --- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -5712,6 +5712,13 @@ static struct ata_port * ata_port_add(const struct ata_probe_ent *ent, return NULL; } + if (!ent-sht-scan_start) { + printk(KERN_WARNING ata%u: Overriding NULL scan_start\n, + port_no); + ent-sht-scan_start = ata_scsi_scan_start; + ent-sht-scan_finished = ata_scsi_scan_finished; + } + shost = scsi_host_alloc(ent-sht, sizeof(struct ata_port)); if (!shost) return NULL; @@ -5747,6 +5754,51 @@ void ata_host_init(struct ata_host *host, struct device *dev, host-ops = ops; } +void ata_scsi_scan_start(struct Scsi_Host *shost) +{ + struct ata_port *ap = ata_shost_to_port(shost); + int rc; + + if (ap-ops-error_handler) { + struct ata_eh_info *ehi = ap-eh_info; + unsigned long flags; + + ata_port_probe(ap); + + /* kick EH for boot probing */ + spin_lock_irqsave(ap-lock, flags); + + ehi-probe_mask = (1 ATA_MAX_DEVICES) - 1; + ehi-action |= ATA_EH_SOFTRESET; + ehi-flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET; + + ap-pflags |= ATA_PFLAG_LOADING; + ata_port_schedule_eh(ap); + + spin_unlock_irqrestore(ap-lock, flags); + + /* wait for EH to finish */ + ata_port_wait_eh(ap); + } else { + DPRINTK(ata%u: bus probe begin\n, ap-id); + rc = ata_bus_probe