[PATCH] Bug 4940 Repeatable Kernel Panic on Adaptec 2015S I20 device on bootup
Suitable for both the 2.4 and 2.6 versions of the driver. Applies to the scsi-misc-2.6 git tree.

Prevent the driver from loading if another driver (i2o) has already claimed the resources associated with the card. Discussion of this bug can be found at http://bugzilla.kernel.org/show_bug.cgi?id=4940, where it was agreed to use pci_request_regions in both the dpt_i2o and the i2o driver to prevent both drivers from loading on the same adapter(s).

Signed-off-by: Mark Salyzyn [EMAIL PROTECTED]

Index: a/drivers/scsi/dpt_i2o.c
===
--- a/drivers/scsi/dpt_i2o.c 2005-07-26 11:42:03.0 -0400
+++ b/drivers/scsi/dpt_i2o.c 2005-08-08 09:50:33.247595544 -0400
@@ -908,8 +908,12 @@
 	}
-
+	if (pci_request_regions(pDev)) {
+		PERROR("dpti: adpt_config_hba: pci request region failed\n");
+		return -EINVAL;
+	}
 	base_addr_virt = ioremap(base_addr0_phys,hba_map0_area_size);
 	if (!base_addr_virt) {
+		pci_release_regions(pDev);
 		PERROR("dpti: adpt_config_hba: io remap failed\n");
 		return -EINVAL;
 	}
@@ -919,6 +924,7 @@
 		if (!msg_addr_virt) {
 			PERROR("dpti: adpt_config_hba: io remap failed on BAR1\n");
 			iounmap(base_addr_virt);
+			pci_release_regions(pDev);
 			return -EINVAL;
 		}
 	} else {
@@ -932,6 +938,7 @@
 			iounmap(msg_addr_virt);
 		}
 		iounmap(base_addr_virt);
+		pci_release_regions(pDev);
 		return -ENOMEM;
 	}
 	memset(pHba, 0, sizeof(adpt_hba));
@@ -1027,6 +1034,7 @@
 	up(&adpt_configuration_lock);
 	iounmap(pHba->base_addr_virt);
+	pci_release_regions(pHba->pDev);
 	if(pHba->msg_addr_virt != pHba->base_addr_virt){
 		iounmap(pHba->msg_addr_virt);
 	}

Sincerely
-- Mark Salyzyn

- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
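The acquire-then-unwind pattern in this patch (claim the regions first, and release them on every later failure path) can be sketched in user space. This is only an illustration, not the driver code: `struct fake_hw`, `request_regions`, `map_bar` and the other helpers are hypothetical stand-ins for the PCI and ioremap calls, and the sketch uses goto labels where the patch repeats the release call in each branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an adapter; not a kernel structure. */
struct fake_hw {
	bool regions_busy;   /* another driver already owns the regions */
	bool fail_map0;      /* force the first mapping to fail */
	bool fail_map1;      /* force the second mapping to fail */
	int regions_held;    /* how many region claims are outstanding */
	int maps_held;       /* how many mappings are outstanding */
};

static int request_regions(struct fake_hw *hw)
{
	if (hw->regions_busy)
		return -EBUSY;
	hw->regions_held++;
	return 0;
}

static void release_regions(struct fake_hw *hw)
{
	assert(hw->regions_held > 0);
	hw->regions_held--;
}

static bool map_bar(struct fake_hw *hw, bool fail)
{
	if (fail)
		return false;
	hw->maps_held++;
	return true;
}

static void unmap_bar(struct fake_hw *hw)
{
	hw->maps_held--;
}

/* Claim the regions before touching the hardware; undo in reverse order. */
static int config_hba(struct fake_hw *hw)
{
	int err = -EINVAL;   /* the patch returns -EINVAL on these paths */

	if (request_regions(hw))
		return err;              /* someone else owns the card: do not load */
	if (!map_bar(hw, hw->fail_map0))
		goto out_release;
	if (!map_bar(hw, hw->fail_map1))
		goto out_unmap0;
	return 0;

out_unmap0:
	unmap_bar(hw);
out_release:
	release_regions(hw);
	return err;
}
```

Calling `config_hba()` on a `struct fake_hw` with `regions_busy` set fails without holding anything, which is the behaviour the patch is after when i2o has already bound to the adapter.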
Re: [PATCH] Bug 4940 Repeatable Kernel Panic on Adaptec 2015S I20 device on bootup
On Mon, Aug 08, 2005 at 03:16:18PM +0100, Christoph Hellwig wrote:
> Please either update the driver to use the pci_driver model or, even better, remove it completely and let everyone use the i2o drivers now that they have full 64-bit DMA and management support.

In the meantime, I ack the fix for what we have now. I don't see the point of fixing dpt_i2o much further, given that in another 6 months your wish can probably come true.

Alan
Re: 2.6.13-rc5-mm1 doesnt boot on x86_64
On Mon, Aug 08, 2005 at 07:11:26PM +0200, Andi Kleen wrote:
> On Mon, Aug 08, 2005 at 09:48:19AM -0700, Ashok Raj wrote:
> > Folks,
> >
> > I am getting this on the recent 2.6.13-rc5-mm1 kernel built with defconfig.
> >
> > Cheers,
> > Ashok Raj
> >
> > --- [cut here ] - [please bite here ] -
> > Kernel BUG at include/linux/list.h:165
> > invalid operand: [1] SMP
> > CPU 2
> > Modules linked in:
> > Pid: 1, comm: swapper Not tainted 2.6.13-rc5-mm1
> > RIP: 0010:[802b9ef4] 802b9ef4{attribute_container_unregist}
> > RSP: 0018:8100bfb63f00 EFLAGS: 00010283
> > RAX: 8100bfbd4c58 RBX: 8100bfbd4c00 RCX: 804e6600
> > RDX: 00200200 RSI: RDI: 804e6600
> > RBP: R08: 8100bfbd4c48 R09: 0020
> > R10: R11: 8019baa0 R12: 80100190
> > R13: R14: 8010 R15: 80627fb0
> > FS: () GS:80616980() knlGS:
> > CS: 0010 DS: 0018 ES: 0018 CR0: 8005003b
> > CR2: CR3: 00101000 CR4: 06e0
> > Process swapper (pid: 1, threadinfo 8100bfb62000, task 8100bfb614d0)
> > Stack: 8032643d 8064499f 80100190 80651288 8010b249 0246 00020800 804ae180
> > Call Trace:8032643d{spi_release_transport+13} 8064499f{ahd}
> > 8010b249{init+505} 8010e896{child_rip+8} 8010b050{init+0} 8010e88e{child_rip+0}
>
> Looks like a SCSI problem. The machine has an Adaptec SCSI adapter, right?

Yep, it's an Adaptec problem.

Actually I don't need AIC7XXX, since my system requires only CONFIG_FUSION. I turned that option off, and it seems to boot fine now.

Ashok

> -AndI
> > Code: 0f 0b a3 e1 d9 44 80 ff ff ff ff c2 a5 00 49 8b 00 4c 39 40
> > RIP 802b9ef4{attribute_container_unregister+52} RSP 8100bfb63f0
> > Kernel panic - not syncing: Attempted to kill init!

--
Cheers,
Ashok Raj
- Open Source Technology Center
Re: 2.6.13-rc5-mm1 doesnt boot on x86_64
On Mon, Aug 08, 2005 at 12:33:29PM -0500, James Bottomley wrote:
> On Mon, 2005-08-08 at 19:11 +0200, Andi Kleen wrote:
> > Looks like a SCSI problem. The machine has an Adaptec SCSI adapter, right?
>
> The traceback looks pretty meaningless.
>
> What was happening on the machine before this? I.e. was it booting up, in which case can we have the prior dmesg file; or was the aic79xxx driver being removed?

I can get the trace again, but basically the system was booting. AIC_7XXX was defined in defconfig, but my system doesn't have it.

It seems the scenario was that the driver tried to probe, found nothing, and then tried to de-register, resulting in the BUG().

I will try to get the recompile and the entire dmesg log in the meantime.

> James

--
Cheers,
Ashok Raj
- Open Source Technology Center
Re: 2.6.13-rc5-mm1 doesnt boot on x86_64
James Bottomley [EMAIL PROTECTED] wrote:
> On Mon, 2005-08-08 at 19:11 +0200, Andi Kleen wrote:
> > Looks like a SCSI problem. The machine has an Adaptec SCSI adapter, right?
>
> The traceback looks pretty meaningless.
>
> What was happening on the machine before this. i.e. was it booting up, in which case can we have the prior dmesg file; or was the aic79xxx driver being removed?

-mm has extra list_head debugging goodies. I'd be suspecting a list_head corruption detected somewhere under spi_release_transport().

--- 25/include/linux/list.h~list_del-debug 2005-03-08 11:40:27.0 -0800
+++ 25-akpm/include/linux/list.h 2005-03-08 11:40:49.0 -0800
@@ -5,7 +5,9 @@

 #include <linux/stddef.h>
 #include <linux/prefetch.h>
+#include <linux/kernel.h>
 #include <asm/system.h>
+#include <asm/bug.h>

 /*
  * These are non-NULL pointers that will result in page faults
@@ -160,6 +162,8 @@ static inline void __list_del(struct lis
  */
 static inline void list_del(struct list_head *entry)
 {
+	BUG_ON(entry->prev->next != entry);
+	BUG_ON(entry->next->prev != entry);
 	__list_del(entry->prev, entry->next);
 	entry->next = LIST_POISON1;
 	entry->prev = LIST_POISON2;
_
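To see why the extra checks in list_del() would fire when an entry is removed that is not (or is no longer) on a list, here is a small user-space sketch. It is an assumption-laden stand-in: it reports the inconsistency with a return value where the -mm patch calls BUG_ON, and it marks removed entries with NULL where the kernel uses LIST_POISON1/LIST_POISON2. `list_del_checked` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list.h. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	assert(head->next && head->prev);
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry only if its neighbours still point back at it.
 * Returns false (and changes nothing) for an entry that was never
 * added or was already removed; the -mm debug patch would BUG() there.
 */
static bool list_del_checked(struct list_head *entry)
{
	if (!entry->next || !entry->prev)
		return false;
	if (entry->prev->next != entry || entry->next->prev != entry)
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;   /* stand-in for LIST_POISON1 */
	entry->prev = NULL;   /* stand-in for LIST_POISON2 */
	return true;
}
```

A second `list_del_checked()` on the same entry, or a call on a zero-initialised entry that was never added, returns false. Whether that is what happened under spi_release_transport() in the report above is not established by the thread.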
Re: Synchronizing scsi_remove_host and the error handler
On 08/07/05 11:36, James Bottomley wrote:
> On Sun, 2005-08-07 at 10:59 -0400, Alan Stern wrote:
> > What sort of synchronization is there between scsi_remove_host and the error-handler thread? Offhand I can see two possible problems, depending on how the LLD is written:
>
> There isn't any by design. I think you're not thinking about how this works correctly.
>
> What remove host does is loop over the active devices removing them from visibility and trying to do a final put on the generic devices before removing the host from visibility and doing a final put on it. However, any outstanding user will have a reference and will keep all the bits of the hierarchy in place until that reference is relinquished.

Which automatically implies that any such entity (holding a ref) trying to do any kind of action to what it is holding, should get an error result, else it would be misled to believe that things are ok, when in fact the whole thing is coming down...

Luben
Re: Synchronizing scsi_remove_host and the error handler
Luben Tuikov [EMAIL PROTECTED] wrote:
> On 08/07/05 17:57, James Bottomley wrote:
> > Alan Stern wrote:
> > > The only resource that matters for this discussion is associated with the LLD itself, not with any of its hosts: the host template. Once the SCSI core has released all references to the template, it can't call the LLD any more. The problem is that the LLD has no way to know when all the references have been dropped.
> > >
> > > This suggests that the entire problem could be solved by adding a kref to struct scsi_host_template.
> > >
> > > Would you agree to a patch adding such a kref?
> >
> > Well, not really. The host template usually exists as a variable in the module, so its lifetime is tied to the lifetime of the module. Adding a kref wouldn't help because it will still be freed when the module is removed. If there's a case where the template is being freed prematurely because the module is being removed, then we have the module refcounting wrong somewhere. Have you run across such a case?
>
> Hmm, I think Alan has a point. From an object point of view, who is the parent and who is the child when talking about LLDD/module and the host template?
>
> The reason is that the host template could be simulated on behalf of the underlying transport (as is the case for USB). So, it could be the case that the host template _should_ be removed but the entity which removed it should stay, and that entity wants to know when the host template is to be removed. (Actually it doesn't, but setting the release method in the kobj_ktype would do wonders. ;-) )
>
> In which case, it does make sense to include a kset to the host template (since it has many children), and anyone using/manipulating it does kset_get(). When that entity is done with the host template it does a kset_put(). Such entities could be the managing layer above, error handling, etc.
>
> Another solution, just as good, is to use the template _only_ for the registration call, as a _template_, and as soon as the registration call has returned, the caller (LLDD/module) can free the template. After all, it is only a template.

While you could do this, I believe it would not answer Alan's question since, if I read correctly, it involved the functions called through the host template (i.e., through shost->hostt).

> Then the managing layer allocates the actual host struct and gives it to the LLDD. The LLDD does a kset_get() on it while it lives, and when it is to die, it does a kset_put(). And if it gets a method call, it would error it out, after the put...

We already have a kobject for the scsi_host object, so it is unclear why we would switch to a kset. Looking at the kernel tree, I do not see any users of ksets outside the driver model subsystems.

I believe James's suggestion of adding to scsi_host_dev_release would give the LLDD a better indicator of release cleanup without changing the LLDD interface to use ksets.

-andmike
--
Michael Anderson [EMAIL PROTECTED]
Re: Synchronizing scsi_remove_host and the error handler
James Bottomley [EMAIL PROTECTED] wrote:
> On Sun, 2005-08-07 at 14:43 -0400, Alan Stern wrote:
> > What host device release method? scsi_host_template->release is marked OBSOLETE and for use only with old-style drivers.
>
> scsi_host_dev_release
> ^ This is the one I was thinking of adding to.

Is the thought here that if an LLDD provided some new scsi_host_template function, we would call it from scsi_host_dev_release?

-andmike
--
Michael Anderson [EMAIL PROTECTED]
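The idea being discussed (notify the low-level driver from the host's final release rather than from scsi_remove_host) can be sketched as a reference count with a release hook. This is a user-space illustration under stated assumptions: `struct host`, `host_get`, `host_put` and the `release` callback are hypothetical names, the counter is a plain int rather than a kref, and no such template hook is confirmed by the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host object; not struct Scsi_Host. */
struct host {
	int refcount;                       /* plain int; a kernel kref is atomic */
	void (*release)(struct host *h);    /* hypothetical driver-supplied hook */
	int released;                       /* bookkeeping for the example */
};

static void host_get(struct host *h)
{
	h->refcount++;
}

/* Drop one reference; the hook runs only when the last one goes away. */
static void host_put(struct host *h)
{
	assert(h->refcount > 0);
	if (--h->refcount == 0 && h->release)
		h->release(h);
}

/* Example hook: a driver would free its per-host resources here. */
static void note_release(struct host *h)
{
	h->released++;
}
```

The point of the design is that "remove" and "release" are different events: an outstanding user that still holds a reference delays the hook, so the driver learns when the core can no longer call into it for that host.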
[PATCH] Expose the underlying physical disks of Fusion Integrated RAID devices
This patch is actually deceptively simple for what it does. For all fusion devices with integrated raid devices, we make the card pretend it has two channels, then all I/O on channel 1 is directed to the underlying physical discs of the integrated raid assembly.

The net effect is that all the physical discs show up correctly on virtual channel 1 (we also specify no_uld_attach for anything on virtual channel 1 so that they can only be accessed using ioctls to the sg device).

The net effect is something like this:

mptbase: Initiating ioc22 bringup
ioc22: 53C1030: Capabilities={Initiator}
scsi27 : ioc22: LSI53C1030, FwRev=01032920h, Ports=1, MaxQ=222, IRQ=57
  Vendor: LSILOGIC  Model: 1030 IM  Rev: 1000
  Type: Direct-Access  ANSI SCSI revision: 02
SCSI device sdb: 17813504 512-byte hdwr sectors (9121 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 17813504 512-byte hdwr sectors (9121 MB)
SCSI device sdb: drive cache: write through
 sdb: unknown partition table
Attached scsi disk sdb at scsi27, channel 0, id 0, lun 0
Attached scsi generic sg1 at scsi27, channel 0, id 0, lun 0, type 0
  Vendor: QUANTUM  Model: ATLAS IV 9 WLS  Rev: 0B0B
  Type: Direct-Access  ANSI SCSI revision: 03
Attached scsi generic sg2 at scsi27, channel 1, id 0, lun 0, type 0
  Vendor: QUANTUM  Model: ATLAS IV 9 WLS  Rev: 0B0B
  Type: Direct-Access  ANSI SCSI revision: 03
Attached scsi generic sg3 at scsi27, channel 1, id 1, lun 0, type 0

where you can see that sdb is the RAID device and sg2 and sg3 the underlying SCSI discs.

James

diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -171,6 +171,7 @@ static void mptscsih_fillbuf(char *buffe
 void mptscsih_remove(struct pci_dev *);
 void mptscsih_shutdown(struct pci_dev *);
+static int mptscsih_is_raid_volume(MPT_SCSI_HOST *hd, uint id);
 #ifdef CONFIG_PM
 int mptscsih_suspend(struct pci_dev *pdev, pm_message_t state);
 int mptscsih_resume(struct pci_dev *pdev);
@@ -1274,6 +1275,12 @@ mptscsih_qcmd(struct scsi_cmnd *SCpnt, v
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}

+	if (SCpnt->device->channel && !mptscsih_is_raid_volume(hd, target)) {
+		SCpnt->result = DID_NO_CONNECT << 16;
+		done(SCpnt);
+		return 0;
+	}
+
 	/*
 	 * Put together a MPT SCSI request...
 	 */
@@ -1318,9 +1325,12 @@ mptscsih_qcmd(struct scsi_cmnd *SCpnt, v
 	/* Use the above information to set up the message frame */
 	pScsiReq->TargetID = (u8) target;
-	pScsiReq->Bus = (u8) SCpnt->device->channel;
+	pScsiReq->Bus = 0;
 	pScsiReq->ChainOffset = 0;
-	pScsiReq->Function = MPI_FUNCTION_SCSI_IO_REQUEST;
+	if (SCpnt->device->channel)
+		pScsiReq->Function = MPI_FUNCTION_RAID_SCSI_IO_PASSTHROUGH;
+	else
+		pScsiReq->Function = MPI_FUNCTION_SCSI_IO_REQUEST;
 	pScsiReq->CDBLength = SCpnt->cmd_len;
 	pScsiReq->SenseBufferLength = MPT_SENSE_BUFFER_SIZE;
 	pScsiReq->Reserved = 0;
@@ -2145,6 +2155,9 @@ mptscsih_slave_alloc(struct scsi_device
 	if (hd == NULL)
 		return -ENODEV;

+	if (device->channel)
+		device->no_uld_attach = 1;
+
 	if ((vdev = hd->Targets[target]) != NULL)
 		goto out;

diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c
--- a/drivers/message/fusion/mptspi.c
+++ b/drivers/message/fusion/mptspi.c
@@ -237,7 +237,10 @@ mptspi_probe(struct pci_dev *pdev, const
 	sh->max_id = MPT_MAX_SCSI_DEVICES;
 	sh->max_lun = MPT_LAST_LUN + 1;
-	sh->max_channel = 0;
+	if (ioc->spi_data.isRaid)
+		sh->max_channel = 1;
+	else
+		sh->max_channel = 0;
 	sh->this_id = ioc->pfacts[0].PortSCSIID;

 	/* Required entry.
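The routing rule in the fusion patch can be summarised in a few lines of stand-alone C. This is a sketch, not the driver: `route_command`, `struct route` and the enum values are hypothetical placeholders (the real function codes live in the MPI headers), and it assumes the rejected case is a command on virtual channel 1 whose target the driver's helper does not recognise as part of the RAID assembly.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder function codes; the real values are defined by the MPI headers. */
enum mpt_function {
	FN_SCSI_IO_REQUEST,
	FN_RAID_SCSI_IO_PASSTHROUGH
};

struct route {
	bool accepted;               /* false: complete with "no connect" */
	int bus;                     /* bus number placed in the request frame */
	enum mpt_function function;  /* request type placed in the frame */
};

/*
 * channel 0: ordinary I/O to the RAID volume (or plain disks).
 * channel 1: passthrough to a physical disc behind the volume, allowed
 *            only if the caller says the target is recognised.
 */
static struct route route_command(int channel, bool target_recognised)
{
	struct route r = { false, 0, FN_SCSI_IO_REQUEST };

	assert(channel == 0 || channel == 1);   /* max_channel is at most 1 */

	if (channel != 0 && !target_recognised)
		return r;                /* the patch sets DID_NO_CONNECT here */

	r.accepted = true;
	r.bus = 0;                       /* the frame always carries bus 0 */
	r.function = channel ? FN_RAID_SCSI_IO_PASSTHROUGH
			     : FN_SCSI_IO_REQUEST;
	return r;
}
```

The virtual channel never reaches the firmware as a bus number; it only selects which request function is used.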
[PATCH] aacraid: adapter support update
Received from Mark Salyzyn

This patch adds the product ID for the ICP9067MA adapter. The entries for the ICP9085LI, ICP5085BR, IBM8k ASR4810SAS were incorrect and would not initialize the adapters correctly.

Applies to scsi-misc-2.6 git tree

Signed-off-by: Mark Haverkamp [EMAIL PROTECTED]

Index: scsi-misc-aac-1/drivers/scsi/aacraid/linit.c
===
--- scsi-misc-aac-1.orig/drivers/scsi/aacraid/linit.c 2005-08-08 09:11:56.0 -0700
+++ scsi-misc-aac-1/drivers/scsi/aacraid/linit.c 2005-08-08 11:04:51.0 -0700
@@ -120,36 +120,39 @@
 	{ 0x9005, 0x0286, 0x9005, 0x02a3, 0, 0, 29 }, /* ICP5085AU (Hurricane) */
 	{ 0x9005, 0x0285, 0x9005, 0x02a4, 0, 0, 30 }, /* ICP9085LI (Marauder-X) */
 	{ 0x9005, 0x0285, 0x9005, 0x02a5, 0, 0, 31 }, /* ICP5085BR (Marauder-E) */
-	{ 0x9005, 0x0287, 0x9005, 0x0800, 0, 0, 32 }, /* Themisto Jupiter Platform */
-	{ 0x9005, 0x0200, 0x9005, 0x0200, 0, 0, 32 }, /* Themisto Jupiter Platform */
-	{ 0x9005, 0x0286, 0x9005, 0x0800, 0, 0, 33 }, /* Callisto Jupiter Platform */
-	{ 0x9005, 0x0285, 0x9005, 0x028e, 0, 0, 34 }, /* ASR-2020SA SATA PCI-X ZCR (Skyhawk) */
-	{ 0x9005, 0x0285, 0x9005, 0x028f, 0, 0, 35 }, /* ASR-2025SA SATA SO-DIMM PCI-X ZCR (Terminator) */
-	{ 0x9005, 0x0285, 0x9005, 0x0290, 0, 0, 36 }, /* AAR-2410SA PCI SATA 4ch (Jaguar II) */
-	{ 0x9005, 0x0285, 0x1028, 0x0291, 0, 0, 37 }, /* CERC SATA RAID 2 PCI SATA 6ch (DellCorsair) */
-	{ 0x9005, 0x0285, 0x9005, 0x0292, 0, 0, 38 }, /* AAR-2810SA PCI SATA 8ch (Corsair-8) */
-	{ 0x9005, 0x0285, 0x9005, 0x0293, 0, 0, 39 }, /* AAR-21610SA PCI SATA 16ch (Corsair-16) */
-	{ 0x9005, 0x0285, 0x9005, 0x0294, 0, 0, 40 }, /* ESD SO-DIMM PCI-X SATA ZCR (Prowler) */
-	{ 0x9005, 0x0285, 0x103C, 0x3227, 0, 0, 41 }, /* AAR-2610SA PCI SATA 6ch */
-	{ 0x9005, 0x0285, 0x9005, 0x0296, 0, 0, 42 }, /* ASR-2240S (SabreExpress) */
-	{ 0x9005, 0x0285, 0x9005, 0x0297, 0, 0, 43 }, /* ASR-4005SAS */
-	{ 0x9005, 0x0285, 0x1014, 0x02F2, 0, 0, 44 }, /* IBM 8i (AvonPark) */
-	{ 0x9005, 0x0285, 0x1014, 0x0312, 0, 0, 44 }, /* IBM 8i (AvonPark Lite) */
-	{ 0x9005, 0x0285, 0x9005, 0x0298, 0, 0, 45 }, /* ASR-4000SAS (BlackBird) */
-	{ 0x9005, 0x0285, 0x9005, 0x0299, 0, 0, 46 }, /* ASR-4800SAS (Marauder-X) */
-	{ 0x9005, 0x0285, 0x9005, 0x029a, 0, 0, 47 }, /* ASR-4805SAS (Marauder-E) */
-	{ 0x9005, 0x0286, 0x9005, 0x02a2, 0, 0, 48 }, /* ASR-4810SAS (Hurricane */
-
-	{ 0x9005, 0x0285, 0x1028, 0x0287, 0, 0, 49 }, /* Perc 320/DC*/
-	{ 0x1011, 0x0046, 0x9005, 0x0365, 0, 0, 50 }, /* Adaptec 5400S (Mustang)*/
-	{ 0x1011, 0x0046, 0x9005, 0x0364, 0, 0, 51 }, /* Adaptec 5400S (Mustang)*/
-	{ 0x1011, 0x0046, 0x9005, 0x1364, 0, 0, 52 }, /* Dell PERC2/QC */
-	{ 0x1011, 0x0046, 0x103c, 0x10c2, 0, 0, 53 }, /* HP NetRAID-4M */
-
-	{ 0x9005, 0x0285, 0x1028, PCI_ANY_ID, 0, 0, 54 }, /* Dell Catchall */
-	{ 0x9005, 0x0285, 0x17aa, PCI_ANY_ID, 0, 0, 55 }, /* Legend Catchall */
-	{ 0x9005, 0x0285, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 56 }, /* Adaptec Catch All */
-	{ 0x9005, 0x0286, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 57 }, /* Adaptec Rocket Catch All */
+	{ 0x9005, 0x0286, 0x9005, 0x02a6, 0, 0, 32 }, /* ICP9067MA (Intruder-6) */
+	{ 0x9005, 0x0287, 0x9005, 0x0800, 0, 0, 33 }, /* Themisto Jupiter Platform */
+	{ 0x9005, 0x0200, 0x9005, 0x0200, 0, 0, 33 }, /* Themisto Jupiter Platform */
+	{ 0x9005, 0x0286, 0x9005, 0x0800, 0, 0, 34 }, /* Callisto Jupiter Platform */
+	{ 0x9005, 0x0285, 0x9005, 0x028e, 0, 0, 35 }, /* ASR-2020SA SATA PCI-X ZCR (Skyhawk) */
+	{ 0x9005, 0x0285, 0x9005, 0x028f, 0, 0, 36 }, /* ASR-2025SA SATA SO-DIMM PCI-X ZCR (Terminator) */
+	{ 0x9005, 0x0285, 0x9005, 0x0290, 0, 0, 37 }, /* AAR-2410SA PCI SATA 4ch (Jaguar II) */
+	{ 0x9005, 0x0285, 0x1028, 0x0291, 0, 0, 38 }, /* CERC SATA RAID 2 PCI SATA 6ch (DellCorsair) */
+	{ 0x9005, 0x0285, 0x9005, 0x0292, 0, 0, 39 }, /* AAR-2810SA PCI SATA 8ch (Corsair-8) */
+	{ 0x9005, 0x0285, 0x9005, 0x0293, 0, 0, 40 }, /* AAR-21610SA PCI SATA 16ch (Corsair-16) */
+	{ 0x9005, 0x0285, 0x9005, 0x0294, 0, 0, 41 }, /* ESD SO-DIMM PCI-X SATA ZCR (Prowler) */
+	{ 0x9005, 0x0285, 0x103C, 0x3227, 0, 0, 42 }, /* AAR-2610SA PCI SATA 6ch */
+	{ 0x9005, 0x0285, 0x9005, 0x0296, 0, 0, 43 }, /* ASR-2240S (SabreExpress) */
+	{ 0x9005, 0x0285, 0x9005, 0x0297, 0, 0, 44 }, /* ASR-4005SAS */
+	{ 0x9005, 0x0285, 0x1014, 0x02F2, 0, 0, 45 }, /* IBM 8i (AvonPark) */
+	{ 0x9005, 0x0285, 0x1014, 0x0312, 0, 0, 45 }, /* IBM 8i (AvonPark Lite) */
+	{ 0x9005, 0x0286, 0x1014, 0x9580, 0, 0, 46 }, /* IBM 8k/8k-l8 (Aurora) */
+	{ 0x9005, 0x0286, 0x1014, 0x9540, 0, 0, 47 }, /* IBM 8k/8k-l4 (Aurora Lite) */
+	{ 0x9005, 0x0285, 0x9005, 0x0298, 0, 0, 48 }, /*
Re: Netlink allocation for iSCSI and others
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 08 Aug 2005 23:19:37 +0200

> David S. Miller wrote:
> > So we can increase MAX_LINKS to 256 and that's what I think I will do for 2.6.14 unless there is a very serious objection.
> >
> > The tables sized by MAX_LINKS in af_netlink.c are dynamically allocated, and the only linear iterations over MAX_LINKS are for the netlink socket procfs seq-file dumper, so it's not a performance issue either.
>
> I think we should increase it when allocating new numbers to save the unused memory for the larger nltable and additional pid hashes. Userspace shouldn't care if we change it.

Agreed. So we have 17 netlink numbers to allocate at this point, and that should be good for a while.
Re: Synchronizing scsi_remove_host and the error handler
Alan Stern [EMAIL PROTECTED] wrote:
> Here you are wrong. In fact the core makes no such guarantees. It _will_ try to enter the host (for things like telling disk drives to flush their caches) for as long as it retains a reference to the host structure.

Well, after scsi_remove_host returns, no commands can be sent through the LLDD's queuecommand. There is still the issue you previously mentioned about a possible race between the scsi error handler and a call to scsi_remove_host.

> I tried to provoke such a case, but failed. Evidently I've been barking up the wrong tree. Normally the SCSI core doesn't call the LLD unless some process has opened a device file for something on that host. This will automatically do try_module_get on the LLD, making it impossible for the LLD to be removed from memory.
>
> I'm not certain this reasoning is 100% reliable, though -- does it cover _every_ case where the core calls the LLD? Maybe something like this little patch would be a good idea:

In USB is a driver controlling more than one host instance. As I believe the suggestion that James indicated as a possible addition would be at per host instance granularity, and your solution below looks like a notification once all instances controlled by a driver are gone.

> --- usb-2.6.orig/drivers/scsi/hosts.c
> +++ usb-2.6/drivers/scsi/hosts.c
> @@ -205,6 +205,9 @@ int scsi_add_host(struct Scsi_Host *shos
>  		goto out_destroy_host;
>
>  	scsi_proc_host_add(shost);
> +	shost->parent_driver = dev->driver;
> +	if (shost->parent_driver)
> +		get_driver(shost->parent_driver);

Just a nit: for a general-case solution, dev can be NULL for drivers that live on a bus that has not been converted to the driver model and may not have a parent.

> But if you think this isn't needed, then okay.
>
> There is an exceptional case: device scanning. During scanning, there is nothing to directly prevent the LLD from being unloaded. There is an indirect protection, however, because the scanning thread will own shost->scan_mutex, and scsi_remove_host acquires the mutex before returning.

I think in the past someone has indicated that we really should be getting refs during scanning to hold things in place. I guess, as you note above and below, scan_mutex is holding things in place.

> memory. In fact, I've been using the template as kind of a surrogate, standing for all of the LLD's module -- so long as one is present, so is the other.
>
> I still have one related question. This is a little bit off to one side, but maybe you folks can suggest possible solutions. The question concerns a deadlock I _was_ able to generate earlier today with a patched usb-storage.
>
> My USB mass-storage test device doesn't respond to TEST UNIT READY, so it causes a timeout and kicks the error handler into action. This happens during device scanning, just prior to reading the partition table. The error handler goes through various stages of processing, leading up to a bus reset. I disconnected the USB device just before the bus reset routine was called.
>
> Now, usb-storage implements a SCSI bus reset by actually performing a USB port reset. The USB subsystem requires the caller to acquire a device-specific semaphore before doing a port reset, and the subsystem itself acquires this same semaphore when notifying drivers about a disconnection. (The idea is that we don't want drivers trying to handle a disconnect and a reset on the same device at the same time.)
>
> So here's how things end up. The scanning thread owns shost->scan_mutex and is waiting for the error handler to finish. The EH thread is executing usb-storage's bus_reset routine and is waiting to acquire the device semaphore. USB's khubd thread owns the device semaphore and has invoked the usb-storage disconnect routine. Among other things, this routine calls scsi_remove_host, which tries to acquire the scan_mutex.
>
> How should this deadlock be resolved? The current code has an extremely inelegant solution, and I would like to find a better one. Any ideas?
>
> Alan Stern

I do not mean to push everything down into the LLDD, but in this case it is unclear what type of protection the scsi mid layer could add to protect against unknown future events while scanning.

Is the current inelegant solution in the usb storage driver using some form of state model? If it is not, it would appear that a state model that uses a spin lock, or something other than the device semaphore, to determine whether a disconnect was happening during a reset (or vice versa) would be a good idea.

-andmike
--
Michael Anderson [EMAIL PROTECTED]
Re: Synchronizing scsi_remove_host and the error handler
On 08/08/05 16:41, Alan Stern wrote:
> I still have one related question. This is a little bit off to one side, but maybe you folks can suggest possible solutions. The question concerns a deadlock I _was_ able to generate earlier today with a patched usb-storage.
>
> My USB mass-storage test device doesn't respond to TEST UNIT READY, so it causes a timeout and kicks the error handler into action. This happens during device scanning, just prior to reading the partition table. The error handler goes through various stages of processing, leading up to a bus reset. I disconnected the USB device just before the bus reset routine was called.

I think that scanning is a special process which should involve the minimum of error handling, by either ignoring errors and trying to connect to the device anyway, or on the first error, give up the device. Which policy would one follow depends on the transport.

If the latter, you need to blacklist the device as not supporting TUR. Then on any error, like you pulling the cable during scanning, the scanning process will give up and all will be well.

Luben
Re: Synchronizing scsi_remove_host and the error handler
On Mon, 2005-08-08 at 16:41 -0400, Alan Stern wrote:
> > > Would you agree to a patch adding such a kref?
> >
> > Well, not really. The host template usually exists as a variable in the module, so its lifetime is tied to the lifetime of the module. Adding a kref wouldn't help because it will still be freed when the module is removed. If there's a case where the template is being freed prematurely because the module is being removed, then we have the module refcounting wrong somewhere. Have you run across such a case?
>
> I tried to provoke such a case, but failed. Evidently I've been barking up the wrong tree. Normally the SCSI core doesn't call the LLD unless some process has opened a device file for something on that host. This will automatically do try_module_get on the LLD, making it impossible for the LLD to be removed from memory.
>
> I'm not certain this reasoning is 100% reliable, though -- does it cover _every_ case where the core calls the LLD? Maybe something like this little patch would be a good idea:

Yes, I was considering something similar, since all the last put of a driver does is send the completion that driver_unregister() should be waiting for

> +	struct device_driver *parent_driver;

I don't think we need this. The underlying device has to be the parent of shost_gendev, so you can get the parent driver as shost_gendev.parent->driver

> +	if (shost->parent_driver)
> +		put_driver(shost->parent_driver);

And just before this would be the place to do the final release of all the resources the HBA is holding.

James
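The lookup James describes (derive the driver from the host device's parent when needed, and pin it only if it exists) can be sketched with stand-in structures. These are hypothetical, cut-down types, not the kernel's struct device or struct device_driver, and `refs` is a plain counter standing in for whatever get_driver/put_driver would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the driver-model structures. */
struct device_driver {
	const char *name;
	int refs;                 /* stand-in for the driver's reference count */
};

struct device {
	struct device *parent;
	struct device_driver *driver;
};

/* Follow gendev->parent->driver, tolerating a missing parent or driver. */
static struct device_driver *parent_driver_of(const struct device *gendev)
{
	if (!gendev || !gendev->parent)
		return NULL;      /* e.g. a bus not converted to the driver model */
	return gendev->parent->driver;
}

/* Pin the parent driver if there is one; returns it, or NULL. */
static struct device_driver *pin_parent_driver(const struct device *gendev)
{
	struct device_driver *drv = parent_driver_of(gendev);

	if (drv)
		drv->refs++;
	return drv;
}
```

Deriving the pointer on demand avoids storing a second copy in the host structure, at the cost of a NULL check at every use.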
Re: Synchronizing scsi_remove_host and the error handler
On Mon, 8 Aug 2005, Luben Tuikov wrote:
> On 08/08/05 16:41, Alan Stern wrote:
> > I still have one related question. This is a little bit off to one side, but maybe you folks can suggest possible solutions. The question concerns a deadlock I _was_ able to generate earlier today with a patched usb-storage.
> >
> > My USB mass-storage test device doesn't respond to TEST UNIT READY, so it causes a timeout and kicks the error handler into action. This happens during device scanning, just prior to reading the partition table. The error handler goes through various stages of processing, leading up to a bus reset. I disconnected the USB device just before the bus reset routine was called.
>
> I think that scanning is a special process which should involve the minimum of error handling, by either ignoring errors and trying to connect to the device anyway, or on the first error, give up the device. Which policy would one follow depends on the transport.

No, that's not feasible. We can't just ignore errors, and we do have to cope with them. Scanning is a particularly vulnerable time, since it involves sending commands that don't occur most of the time during normal operation.

> If the latter, you need to blacklist the device as not supporting TUR. Then on any error, like you pulling the cable during scanning, the scanning process will give up and all will be well.

You can't blacklist devices you don't know about. The kernel should work regardless.

Alan Stern
Re: Synchronizing scsi_remove_host and the error handler
On Mon, 8 Aug 2005, Stefan Richter wrote:

> > Here you are wrong. In fact the core makes no such guarantees. It _will_ try to enter the host (for things like telling disk drives to flush their caches) for as long as it retains a reference to the host structure.
>
> Sure. But after all high-level drivers were detached (and that should have happened right before scsi_remove_host returns) I don't see why the host's ref count might not be down to zero.

There might still be outstanding references: processes holding files open, that sort of thing. It happens all the time in refcounting systems.

On Mon, 8 Aug 2005, Mike Anderson wrote:

> Well post scsi_remove_host returning no commands can be sent through the LLDD's queuecommand. There is still the issue you previously mentioned about a possible race of the scsi error handler and a call to scsi_remove_host.

Yes, and there's another pathway too: the procfs interface.

> In USB is a driver controlling more than one host instance. As I believe the suggestion that James indicated as a possible addition would be at per host instance granularity and your solution below looks like a notification once all instances controlled by a driver are gone.

That's right. They are not the same thing.

> > --- usb-2.6.orig/drivers/scsi/hosts.c
> > +++ usb-2.6/drivers/scsi/hosts.c
> > @@ -205,6 +205,9 @@ int scsi_add_host(struct Scsi_Host *shos
> >  		goto out_destroy_host;
> >
> >  	scsi_proc_host_add(shost);
> > +	shost->parent_driver = dev->driver;
> > +	if (shost->parent_driver)
> > +		get_driver(shost->parent_driver);
>
> Just a nit, for a general case solution dev can be null for drivers who live on a bus that is not converted to the driver model and may not have a parent.

Okay, so the patch needs to test for (dev != NULL) before making the assignment. The idea is still sound.

> > So here's how things end up. The scanning thread owns shost->scan_mutex and is waiting for the error handler to finish. The EH thread is executing usb-storage's bus_reset routine and is waiting to acquire the device semaphore. USB's khubd thread owns the device semaphore and has invoked the usb-storage disconnect routine. Among other things, this routine calls scsi_remove_host, which tries to acquire the scan_mutex.
> >
> > How should this deadlock be resolved? The current code has an extremely inelegant solution, and I would like to find a better one. Any ideas?
> >
> > Alan Stern
>
> I do not mean to push everything down into the LLDD, but in this case it is unclear what type of protection the scsi mid layer could add to protect against unknown future events while scanning. Is the current inelegant solution in the usb storage driver using some form of state model? It would appear that, if it is not, a state model that uses a spin lock or something other than the device semaphore to determine that a disconnect was happening during a reset or vice versa would be a good idea.

On Mon, 8 Aug 2005, Stefan Richter wrote:

> Can't the eh_*_reset_handler use down_*_trylock? If the semaphore was already down, there should be a means for the reset handler to figure out the reason so that it can back out in an appropriate way. I hope there is a limited set of reasons...

To answer both of you at once... The current code does use down_*_trylock, and it does check the device state for disconnect-in-progress. If things don't work out, it sleeps for a short while and then tries again. Like I said, it's inelegant.

The underlying problem is that as things stand, the natural order of locking is device semaphore first, scan_mutex second -- that's what happens during a disconnect. (It's also what would happen during a probe, if we performed device scanning from the probe routine.) But when the error handler tries to do a bus reset during scanning, the locking order is reversed.

It's a hard problem. The only reason I've kept the current code is because USB port resets occur very infrequently.
Alan Stern
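For readers following the thread, the trylock-and-retry approach described in this message can be modelled in user space. The following is only a sketch under stated assumptions: it uses POSIX threads rather than kernel semaphores, and struct fake_dev, bus_reset, the three result codes and the bounded retry count are all hypothetical names invented for illustration. It is not the usb-storage code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical result codes; the real handler's return values are not shown in the thread. */
enum reset_result { RESET_DONE, RESET_ABORTED_DISCONNECT, RESET_GAVE_UP };

/* Hypothetical stand-in for the per-device state. */
struct fake_dev {
	pthread_mutex_t dev_lock;   /* plays the role of the device semaphore */
	atomic_bool disconnecting;  /* set by the disconnect path */
	int resets_done;
};

/*
 * Reset handler that may run while another thread holds the scan mutex.
 * It never blocks on dev_lock: the disconnect path can hold dev_lock
 * while waiting for the scan mutex, and blocking here would close the cycle.
 * The retry bound is an assumption so that the sketch always terminates.
 */
static enum reset_result bus_reset(struct fake_dev *dev, int max_tries)
{
	struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };

	for (int i = 0; i < max_tries; i++) {
		if (atomic_load(&dev->disconnecting))
			return RESET_ABORTED_DISCONNECT;
		if (pthread_mutex_trylock(&dev->dev_lock) == 0) {
			dev->resets_done++;     /* a real port reset would go here */
			pthread_mutex_unlock(&dev->dev_lock);
			return RESET_DONE;
		}
		nanosleep(&pause, NULL);        /* lock busy: sleep briefly, retry */
	}
	return RESET_GAVE_UP;
}
```

The polling loop is what makes the approach inelegant: it avoids the lock-order inversion only by refusing to wait, so progress depends on the other side finishing or on the disconnect flag being seen.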
Re: Synchronizing scsi_remove_host and the error handler
Alan Stern wrote:
> On Mon, 8 Aug 2005, Stefan Richter wrote:
> > But after all high-level drivers were detached (and that should have happened right before scsi_remove_host returns) I don't see why the host's ref count might not be down to zero.
>
> There might still be outstanding references: processes holding files open, that sort of thing.

But as long as files are open etc., scsi high-level drivers are 'in use', cannot be detached, and scsi_remove_host does not return. Or that's what I assumed so far.
-- 
Stefan Richter
-=-=-=-= =--- -=--=
http://arcgraph.de/sr/
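The disagreement in this exchange is about whether a remove call can return while references are still held. A minimal user-space model of the case Alan Stern describes might look like the sketch below. Everything in it (struct fake_host, host_get, host_put, host_remove) is a hypothetical illustration, not the SCSI core API, and it does not settle whether scsi_remove_host actually behaves this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted host; not struct Scsi_Host. */
struct fake_host {
	int refcount;   /* one count per holder (driver, open file, ...) */
	bool removed;   /* set once the remove call has run */
	bool released;  /* set when the last reference is dropped */
};

static void host_get(struct fake_host *h)
{
	h->refcount++;
}

static void host_put(struct fake_host *h)
{
	if (--h->refcount == 0)
		h->released = true;     /* only now may the backing data go away */
}

/* Remove marks the host as gone and drops only the caller's own reference. */
static void host_remove(struct fake_host *h)
{
	h->removed = true;
	host_put(h);
}
```

In this model a holder that took a reference before host_remove keeps the structure alive afterwards, so "removed" and "released" are distinct events.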
Re: Synchronizing scsi_remove_host and the error handler
Alan Stern wrote:
> The current code does use down_*_trylock, and it does check the device state for disconnect-in-progress. If things don't work out, it sleeps for a short while and then tries again.

I think the error handler should rather give SUCCESS back to scsi if it knows the device was or is being disconnected. After that, commands would be enqueued again, and these commands would be immediately done() with an appropriate result code.

IMO the error handler should return SUCCESS rather than FAILED because, if the device is disconnected, there is simply nothing left to do that could advance recovery from error. Retries (in the error handler) will not help with error recovery either. Or do you retry in hope that the device might be reconnected and then a device reset should happen?

A more general thought: It appears from the example you explained that maybe the USB highlevel carries too much of a burden to watch out for concurrency. Tasks such as mutually excluding port resets and disconnects --- which seems to me to be a task unspecific to highlevel protocols, correct me if I'm wrong --- might be better moved into the USB core. Then the highlevel would have to worry less about how exactly to avoid races and deadlocks, only about _what_ to do if there is a conflict. (The kind of conflict should be indicated by the core by means of return values or status flags or the like.)

But I am of course speaking without any knowledge about USB and Linux' USB stack.
-- 
Stefan Richter
-=-=-=-= =--- -=--=
http://arcgraph.de/sr/
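The behaviour proposed in this message (report success from the reset handler once the device is known to be gone, then complete any later commands immediately with an error result) can be sketched as a small user-space model. The type and function names below, including EH_SUCCESS, CMD_NO_CONNECT, device_reset and queue_command, are hypothetical placeholders chosen for illustration; they are not the SCSI midlayer's constants or callbacks.

```c
#include <assert.h>
#include <stdbool.h>

enum eh_result { EH_SUCCESS, EH_FAILED };      /* hypothetical names */
enum cmd_result { CMD_OK, CMD_NO_CONNECT };    /* hypothetical names */

struct fake_cmd {
	enum cmd_result result;
	bool done;
};

struct fake_dev {
	bool disconnected;
	int resets_done;
};

/* Reset handler: with the device gone there is nothing left to recover. */
static enum eh_result device_reset(struct fake_dev *dev)
{
	if (dev->disconnected)
		return EH_SUCCESS;      /* let error handling finish */
	dev->resets_done++;             /* a real reset would go here */
	return EH_SUCCESS;
}

/* Queue path: commands for a disconnected device complete at once. */
static void queue_command(struct fake_dev *dev, struct fake_cmd *cmd)
{
	cmd->result = dev->disconnected ? CMD_NO_CONNECT : CMD_OK;
	cmd->done = true;   /* a real driver would complete CMD_OK later */
}
```

The design point is that the failure is reported through the per-command result rather than through the error handler, so recovery does not keep retrying against hardware that no longer exists.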
RE: [PATCH] SPI transport class and generic Domain Validation for fusion
On Monday, August 08, 2005 3:47 PM, James Bottomley wrote:

> Eric,
>
> This attached patch should do DV on both physical devices and the underlying devices of fusion IM assemblies (providing you apply it on top of the prior underlying device exposure patch), which, I believe was your only outstanding concern with the last generic DV patch.
>
> There is one slight unsightly piece: the IM device still attaches to the transport class however all the parameters it shows actually belong to the underlying device at that id ... I could do with finding a way of persuading the SPI transport class not to attach to RAID devices.
>
> Since this addresses all of LSI's prior concerns, may I now apply it?

Roy/Laura/Roy/Steve, and others - This is a patch to remove our internal DV (domain validation) code and replace it with the generic DV implementation in the Linux kernel. This is a year-old push from kernel.org, for the drivers in the upstream kernel.

James: I've not been able to review this patch, nor the other one you sent. One concern is whether the SPI transport handles async events. Meaning, will it do domain validation on a RAID1 volume for a new drive that was hot swapped with a good disk? In the driver, look at mptscsih_event_process - this code is handling an async event from the firmware telling the driver to perform DV on a disk that was just added.

Please don't rush this into the kernel until we have time to review, and/or talk to our internal test teams about the test effort, and/or talk to our customers explaining the risk in removing this code.

Eric Moore
Re: 2.6.13-rc5-mm1 doesnt boot on x86_64
On Mon, 2005-08-08 at 10:42 -0700, Andrew Morton wrote:
> -mm has extra list_head debugging goodies. I'd be suspecting a list_head corruption detected somewhere under spi_release_transport().

Aha, looking in wrong driver ... the problem actually appears to be a double release of the transport template in aic79xx. Try this patch

James

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -2326,8 +2326,6 @@ done:
 	return (retval);
 }
 
-static void ahd_linux_exit(void);
-
 static void ahd_linux_set_width(struct scsi_target *starget, int width)
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
@@ -2772,7 +2770,7 @@ ahd_linux_init(void)
 	if (ahd_linux_detect(&aic79xx_driver_template) > 0)
 		return 0;
 	spi_release_transport(ahd_linux_transport_template);
-	ahd_linux_exit();
+
 	return -ENODEV;
 }
 
diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c b/drivers/scsi/aic7xxx/aic7xxx_osm.c
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -2331,8 +2331,6 @@ ahc_platform_dump_card_state(struct ahc_
 {
 }
 
-static void ahc_linux_exit(void);
-
 static void ahc_linux_set_width(struct scsi_target *starget, int width)
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
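To make the double-release failure mode concrete, here is a user-space sketch of an init error path that releases a shared resource exactly once. It assumes, as the message suggests but the diff does not show, that the exit routine also releases the template. All names (struct fake_template, release_template, driver_init, driver_exit) are hypothetical and stand in for the aic79xx routines only loosely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the transport template. */
struct fake_template { int unused; };

static struct fake_template *tmpl;
static int release_calls;       /* counts releases so tests can check them */

static void release_template(struct fake_template *t)
{
	release_calls++;
	free(t);
}

/* Normal teardown: releases the template acquired by a successful init. */
static void driver_exit(void)
{
	release_template(tmpl);
	tmpl = NULL;
}

static int driver_init(int devices_found)
{
	tmpl = malloc(sizeof(*tmpl));
	if (!tmpl)
		return -ENOMEM;
	if (devices_found > 0)
		return 0;               /* success: driver_exit() releases later */

	/* Failure path: release once here and do NOT also call driver_exit(),
	 * which would release the same template a second time. */
	release_template(tmpl);
	tmpl = NULL;
	return -ENODEV;
}
```

Keeping the failure path and the exit path from both owning the same release is the whole fix in this model; clearing the pointer afterwards is an extra precaution added for the sketch.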
Re: Synchronizing scsi_remove_host and the error handler
--- Alan Stern [EMAIL PROTECTED] wrote:
> > I think that scanning is a special process which should involve the minimum of error handling, by either ignoring errors and trying to connect to the device anyway, or on the first error, give up the device. Which policy one follows depends on the transport.
>
> No, that's not feasible. We can't just ignore errors, and we do have to cope with them. Scanning is a particularly vulnerable time, since it involves sending commands that don't occur most of the time during normal operation.

Your reasoning is correct, but your conclusion is not. Exactly, scanning is a particularly vulnerable time and this is why we do not want the full I_T nexus error recovery. At scanning time we're just probing here and there.

> > If the latter, you need to blacklist the device as not supporting TUR. Then on any error, like you pulling the cable during scanning, the scanning process will give up and all will be well.
>
> You can't blacklist devices you don't know about. The kernel should work regardless.

Hmm, I thought there was a blacklist somewhere in scsi, maybe /proc/scsi/device_info?

So then you do scheme #1 as I described: ignore as much as possible and try to establish an I_T nexus and _then_ try to poke around with the device.

Luben
Re: [Bugme-new] [Bug 5003] New: Problem with symbios driver on recent -mm trees
> On Fri, 2005-08-05 at 07:36 -0700, Martin J. Bligh wrote:
> > How come it works on all mainline kernels, and not -mm then? ;-) Did we fix an error path to detect failures, maybe?
>
> Well, OK, it might be something to do with your drives trying to negotiate IU and QAS. Support for this was added to the sym2 driver but never verified (because no-one seemed to have drives that could do it).
>
> The attached should stop the driver from negotiating these two parameters, if you could try it (it will produce complaints about static functions defined but not used, but you can ignore them).

Nope, is the same as before with this patch

M.

> James
>
> diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
> --- a/drivers/scsi/sym53c8xx_2/sym_glue.c
> +++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
> @@ -2122,10 +2122,12 @@ static struct spi_function_template sym2
>  	.show_width	= 1,
>  	.set_dt		= sym2_set_dt,
>  	.show_dt	= 1,
> +#if 0
>  	.set_iu		= sym2_set_iu,
>  	.show_iu	= 1,
>  	.set_qas	= sym2_set_qas,
>  	.show_qas	= 1,
> +#endif
>  	.get_signalling	= sym2_get_signalling,
>  };