[PATCH] Bug 4940 Repeatable Kernel Panic on Adaptec 2015S I20 device on bootup

2005-08-08 Thread Salyzyn, Mark
Suitable for both the 2.4 and 2.6 versions of the driver. Applies to the
scsi-misc-2.6 git tree.

Prevent driver from loading if another driver (i2o) has already claimed
the resources associated with the card. Discussion associated with this
bug can be referenced at http://bugzilla.kernel.org/show_bug.cgi?id=4940
where it was agreed to use pci_request_regions in both the dpt_i2o and
the i2o driver to prevent both drivers loading on the same adapter(s).
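
As an illustration of why claiming the regions keeps two drivers off the same adapter, here is a minimal userspace sketch. It is not kernel code: the owner table and the names claim_regions, release_regions and probe are hypothetical stand-ins for the exclusive-claim behaviour of pci_request_regions/pci_release_regions, and the map_ok flag simulates a later setup step (such as ioremap) failing.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_ADAPTERS 4

/* Hypothetical registry: owner[i] names the driver holding adapter i, or NULL. */
static const char *owner[MAX_ADAPTERS];

/* Claim an adapter's regions; fails with -EBUSY if another driver holds them. */
static int claim_regions(int adapter, const char *driver)
{
	if (adapter < 0 || adapter >= MAX_ADAPTERS)
		return -EINVAL;
	if (owner[adapter] != NULL)
		return -EBUSY;
	owner[adapter] = driver;
	return 0;
}

/* Drop a claim; only the current owner can release it. */
static void release_regions(int adapter, const char *driver)
{
	if (adapter >= 0 && adapter < MAX_ADAPTERS &&
	    owner[adapter] != NULL && strcmp(owner[adapter], driver) == 0)
		owner[adapter] = NULL;
}

/*
 * Probe an adapter: claim first, then do the (simulated) mapping step.
 * Every failure after the claim must release it again, which is what the
 * added pci_release_regions() calls on the error paths are for.
 */
static int probe(int adapter, const char *driver, int map_ok)
{
	int rc = claim_regions(adapter, driver);

	if (rc)
		return rc;
	if (!map_ok) {
		release_regions(adapter, driver);
		return -EINVAL;
	}
	return 0;
}
```

With this model, the second driver to probe an already claimed adapter backs off instead of touching the hardware, and a failed probe leaves the adapter free for the other driver.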

Signed-off-by: Mark Salyzyn [EMAIL PROTECTED]

Index: a/drivers/scsi/dpt_i2o.c
===
--- a/drivers/scsi/dpt_i2o.c	2005-07-26 11:42:03.0 -0400
+++ b/drivers/scsi/dpt_i2o.c	2005-08-08 09:50:33.247595544 -0400
@@ -908,8 +908,12 @@
 	}
 
-
+	if (pci_request_regions(pDev)) {
+		PERROR("dpti: adpt_config_hba: pci request region failed\n");
+		return -EINVAL;
+	}
 	base_addr_virt = ioremap(base_addr0_phys,hba_map0_area_size);
 	if (!base_addr_virt) {
+		pci_release_regions(pDev);
 		PERROR("dpti: adpt_config_hba: io remap failed\n");
 		return -EINVAL;
 	}
@@ -919,6 +924,7 @@
 		if (!msg_addr_virt) {
 			PERROR("dpti: adpt_config_hba: io remap failed on BAR1\n");
 			iounmap(base_addr_virt);
+			pci_release_regions(pDev);
 			return -EINVAL;
 		}
 	} else {
@@ -932,6 +938,7 @@
 			iounmap(msg_addr_virt);
 		}
 		iounmap(base_addr_virt);
+		pci_release_regions(pDev);
 		return -ENOMEM;
 	}
 	memset(pHba, 0, sizeof(adpt_hba));
@@ -1027,6 +1034,7 @@
 	up(&adpt_configuration_lock);
 
 	iounmap(pHba->base_addr_virt);
+	pci_release_regions(pHba->pDev);
 	if(pHba->msg_addr_virt != pHba->base_addr_virt){
 		iounmap(pHba->msg_addr_virt);
 	}

Sincerely -- Mark Salyzyn
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Bug 4940 Repeatable Kernel Panic on Adaptec 2015S I20 device on bootup

2005-08-08 Thread Alan Cox
On Mon, Aug 08, 2005 at 03:16:18PM +0100, Christoph Hellwig wrote:
 Please either update the driver to use the pci_driver model or even
 better remove it completely and let everyone use the i2o drivers now
 that they have full 64bit dma and management support.

In the meantime, I ack the fix for what we have now. I don't see the point
of fixing dpt_i2o much further, given that in another 6 months your wish can
probably come true.

Alan



Re: 2.6.13-rc5-mm1 doesnt boot on x86_64

2005-08-08 Thread Ashok Raj
On Mon, Aug 08, 2005 at 07:11:26PM +0200, Andi Kleen wrote:
 On Mon, Aug 08, 2005 at 09:48:19AM -0700, Ashok Raj wrote:
  Folks,
  
  I am getting this on the recent 2.6.13-rc5-mm1 kernel built with defconfig.
  
  Cheers,
  Ashok Raj
  
  --- [cut here ] - [please bite here ] -
  Kernel BUG at include/linux/list.h:165
  invalid operand:  [1] SMP
  CPU 2
  Modules linked in:
  Pid: 1, comm: swapper Not tainted 2.6.13-rc5-mm1
  RIP: 0010:[802b9ef4] 
  802b9ef4{attribute_container_unregist}RSP: 0018:8100bfb63f00  
  EFLAGS: 00010283
  RAX: 8100bfbd4c58 RBX: 8100bfbd4c00 RCX: 804e6600
  RDX: 00200200 RSI:  RDI: 804e6600
  RBP:  R08: 8100bfbd4c48 R09: 0020
  R10:  R11: 8019baa0 R12: 80100190
  R13:  R14: 8010 R15: 80627fb0
  FS:  () GS:80616980() knlGS:
  CS:  0010 DS: 0018 ES: 0018 CR0: 8005003b
  CR2:  CR3: 00101000 CR4: 06e0
  Process swapper (pid: 1, threadinfo 8100bfb62000, task 8100bfb614d0)
  Stack: 8032643d  8064499f 80100190
 80651288  8010b249 0246
 00020800 804ae180
  Call Trace:8032643d{spi_release_transport+13} 
  8064499f{ahd}   8010b249{init+505} 
  8010e896{child_rip+8}
 8010b050{init+0} 8010e88e{child_rip+0}
 
 Looks like a SCSI problem. The machine has an Adaptec SCSI adapter, right?

Yep, it's an Adaptec problem.

Actually I don't need AIC7XXX, since my system requires only CONFIG_FUSION.
I turned that option off, and it seems to boot fine now.

Ashok


 
 -Andi
  
  
  Code: 0f 0b a3 e1 d9 44 80 ff ff ff ff c2 a5 00 49 8b 00 4c 39 40
  RIP 802b9ef4{attribute_container_unregister+52} RSP 
  8100bfb63f0 0Kernel panic - not syncing: Attempted to kill init!
  

-- 
Cheers,
Ashok Raj
- Open Source Technology Center


Re: 2.6.13-rc5-mm1 doesnt boot on x86_64

2005-08-08 Thread Ashok Raj
On Mon, Aug 08, 2005 at 12:33:29PM -0500, James Bottomley wrote:
 On Mon, 2005-08-08 at 19:11 +0200, Andi Kleen wrote:
  Looks like a SCSI problem. The machine has an Adaptec SCSI adapter, right?
 
 The traceback looks pretty meaningless.
 
 What was happening on the machine before this.  i.e. was it booting up,
 in which case can we have the prior dmesg file; or was the aic79xxx
 driver being removed?

I can get the trace again, but basically the system was booting. 

AIC_7XXX was defined in defconfig, but my system doesn't have that hardware.
It seems the scenario was that the driver tried to probe, found nothing, and
then tried to deregister, resulting in the BUG().

I will try to recompile and get the entire dmesg log in the meantime.
 
 James
 
 

-- 
Cheers,
Ashok Raj
- Open Source Technology Center


Re: 2.6.13-rc5-mm1 doesnt boot on x86_64

2005-08-08 Thread Andrew Morton
James Bottomley [EMAIL PROTECTED] wrote:

 On Mon, 2005-08-08 at 19:11 +0200, Andi Kleen wrote:
  Looks like a SCSI problem. The machine has an Adaptec SCSI adapter, right?
 
 The traceback looks pretty meaningless.
 
 What was happening on the machine before this.  i.e. was it booting up,
 in which case can we have the prior dmesg file; or was the aic79xxx
 driver being removed?
 

-mm has extra list_head debugging goodies.  I'd be suspecting a list_head
corruption detected somewhere under spi_release_transport().


--- 25/include/linux/list.h~list_del-debug	2005-03-08 11:40:27.0 -0800
+++ 25-akpm/include/linux/list.h	2005-03-08 11:40:49.0 -0800
@@ -5,7 +5,9 @@
 
 #include <linux/stddef.h>
 #include <linux/prefetch.h>
+#include <linux/kernel.h>
 #include <asm/system.h>
+#include <asm/bug.h>
 
 /*
  * These are non-NULL pointers that will result in page faults
@@ -160,6 +162,8 @@ static inline void __list_del(struct lis
  */
 static inline void list_del(struct list_head *entry)
 {
+	BUG_ON(entry->prev->next != entry);
+	BUG_ON(entry->next->prev != entry);
 	__list_del(entry->prev, entry->next);
 	entry->next = LIST_POISON1;
 	entry->prev = LIST_POISON2;
_
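
To show what the two added checks catch, here is a minimal userspace sketch of the same idea. It is an illustration only: list_del_checked is a hypothetical name, it returns false where the kernel patch would hit BUG_ON, and NULL is used in place of the kernel's LIST_POISON values.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert entry right after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/*
 * Unlink entry, but only if both neighbours still point back at it.
 * Returns false (where the -mm patch would BUG) for an entry that was
 * already deleted or was never linked into the list it claims to be on.
 */
static bool list_del_checked(struct list_head *entry)
{
	if (entry->prev == NULL || entry->next == NULL)
		return false;
	if (entry->prev->next != entry || entry->next->prev != entry)
		return false;
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = NULL;	/* stand-in for LIST_POISON1 */
	entry->prev = NULL;	/* stand-in for LIST_POISON2 */
	return true;
}
```

Deleting an entry twice, or deleting one that was never added, fails the neighbour check. That is one plausible way an unregister call that follows a failed probe could trip the check reported in this thread.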



Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Luben Tuikov
On 08/07/05 11:36, James Bottomley wrote:
 On Sun, 2005-08-07 at 10:59 -0400, Alan Stern wrote:
 
What sort of synchronization is there between scsi_remove_host and the 
error-handler thread?  Offhand I can see two possible problems, depending 
on how the LLD is written:
 
 
 There isn't any by design.
 
 I think you're not thinking about how this works correctly.  What remove
 host does is loop over the active devices removing them from visibility
 and trying to do a final put on the generic devices before removing the
 host from visibility and doing a final put on it.
 
 However, any outstanding user will have a reference and will keep all
 the bits of the hierarchy in place until that reference is relinquished.

Which automatically implies that any such entity (holding a ref)
trying to do any kind of action to what it is holding, should get
an error result, else it would be misled to believe that things are ok,
when in fact the whole thing is coming down...

Luben
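
The behaviour discussed in this message can be sketched in userspace as follows. This is a single-threaded illustration under stated assumptions, not the SCSI core's implementation: struct host, host_remove, host_submit and the freed_hosts counter are all hypothetical, and a real implementation would need atomic reference counts and locking.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct host {
	int refcount;
	bool removed;	/* set once removal has started */
};

static int freed_hosts;	/* demo-only counter of hosts actually freed */

static struct host *host_alloc(void)
{
	struct host *h = calloc(1, sizeof(*h));

	if (h)
		h->refcount = 1;	/* the creator's reference */
	return h;
}

static struct host *host_get(struct host *h)
{
	h->refcount++;
	return h;
}

/* Drop one reference; the memory goes away only with the last one. */
static void host_put(struct host *h)
{
	if (--h->refcount == 0) {
		free(h);
		freed_hosts++;
	}
}

/* Hide the host and drop the creator's reference. */
static void host_remove(struct host *h)
{
	h->removed = true;
	host_put(h);
}

/* Any action through a still-held reference fails once removal began. */
static int host_submit(struct host *h)
{
	return h->removed ? -ENODEV : 0;
}
```

An outstanding holder keeps the structure alive after host_remove, but every host_submit through that reference returns -ENODEV until the holder does its own host_put.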


Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Mike Anderson
Luben Tuikov [EMAIL PROTECTED] wrote:
 On 08/07/05 17:57, James Bottomley wrote:
  Alan Stern wrote:
 The only resource that matters for this discussion is associated with the
 LLD itself, not with any of its hosts: the host template.  Once the SCSI
 core has released all references to the template, it can't call the LLD
 any more.  The problem is that the LLD has no way to know when all the
 references have been dropped.  This suggests that the entire problem could
 be solved by adding a kref to struct scsi_host_template.
 
 Would you agree to a patch adding such a kref?
  
  
  Well, not really.  The host template usually exists as a variable in the
  module, so its lifetime is tied to the lifetime of the module.  Adding a
  kref wouldn't help because it will still be freed when the module is
  removed.  If there's a case where the template is being freed
  prematurely because the module is being removed, then we have the module
  refcounting wrong somewhere.  Have you run across such a case?
 
 Hmm, I think Alan has a point.
 
 From object point of view, who is the parent and who is the child
 when talking about LLDD/module and the host template?
 
 The reason is that the host template could be simulated on behalf
 of the underlying transport (as is the case for USB).
 
 So, it could be the case that the host template _should_ be
 removed but the entity which removed it should stay, and that
 entity wants to know when the host template is to be removed.
 
 (Actually it doesn't, but setting the release method in the
 kobj_ktype would do wonders. ;-) )
 
 In which case, it does make sense to include a kset to the
 host template (since it has many children), and anyone
 using/manipulating it does kset_get().  When that entity
 is done with the host template it does a kset_put().
 
 Such entities could be the managing layer above, error
 handling, etc.
 
 Another solution, just as good, is to use the template
 _only_ for the registration call, as a _template_, and
 as soon as the registration call has returned, the
 caller (LLDD/module) can free the template.  After all,
 it is only a template.

While you could do this, I believe it would not answer Alan's question,
which, if I read it correctly, involved the functions called through the
host template (i.e., through shost->hostt).

 
 Then the managing layer allocates the actual host struct
 and gives it to the LLDD.  The LLDD does a kset_get()
 on it while it lives, and when it is to die, it does a
 kset_put().  And if it gets a method call, it would
 error it out, after the put...
 

We already have a kobject for the scsi_host object, so it is unclear why we
would switch to a kset. Looking at the kernel tree, I do not see any users of
ksets outside the driver model subsystems. I believe James's suggestion of
adding to scsi_host_dev_release would give the LLDD a better indicator of
release cleanup without changing the LLDD interface to use ksets.



-andmike
--
Michael Anderson
[EMAIL PROTECTED]


Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Mike Anderson
James Bottomley [EMAIL PROTECTED] wrote:
 On Sun, 2005-08-07 at 14:43 -0400, Alan Stern wrote:
  What host device release method?  scsi_host_template->release is marked 
  OBSOLETE and for use only with old-style drivers.  scsi_host_dev_release 
  ^
 This is the one I was thinking of adding to.

Is the thought here that if a LLDD provided some new scsi_host_template
function we would call this from scsi_host_dev_release?
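
One way to picture the question is the userspace sketch below. It assumes a hypothetical optional template callback named host_released that is invoked from the release path when the last reference is dropped. No such callback is named in this thread; the struct layouts and the last_released_id variable exist only for the illustration.

```c
#include <stddef.h>

struct host;

struct host_template {
	/* Hypothetical optional hook: called once, on final release. */
	void (*host_released)(struct host *h);
};

struct host {
	const struct host_template *tmpl;
	int refcount;
	int id;
};

static int last_released_id = -1;	/* demo-only record of the callback */

/* Example LLD-side callback: note which host went away. */
static void note_release(struct host *h)
{
	last_released_id = h->id;
}

/* Stand-in for the release function; runs only when refcount hits zero. */
static void host_dev_release(struct host *h)
{
	if (h->tmpl && h->tmpl->host_released)
		h->tmpl->host_released(h);
}

static void host_put(struct host *h)
{
	if (--h->refcount == 0)
		host_dev_release(h);
}
```

The hook fires per host instance, and only after every reference is gone, so it gives the LLD a definite point after which the core will not call into that host again.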

-andmike
--
Michael Anderson
[EMAIL PROTECTED]


[PATCH] Expose the underlying physical disks of Fusion Integrated RAID devices

2005-08-08 Thread James Bottomley
This patch is actually deceptively simple for what it does.

For all fusion devices with integrated raid devices, we make the card
pretend it has two channels, then all I/O on channel 1 is directed to
the underlying physical discs of the integrated raid assembly.  The net
effect is that all the physical discs show up correctly on virtual
channel 1 (we also specify no_uld_attach for anything on virtual channel
1 so that they can only be accessed using ioctls to the sg device).

The net effect is something like this:

mptbase: Initiating ioc22 bringup
ioc22: 53C1030: Capabilities={Initiator}
scsi27 : ioc22: LSI53C1030, FwRev=01032920h, Ports=1, MaxQ=222, IRQ=57
  Vendor: LSILOGIC  Model: 1030  IM  Rev: 1000
  Type:   Direct-Access  ANSI SCSI revision: 02
SCSI device sdb: 17813504 512-byte hdwr sectors (9121 MB)
SCSI device sdb: drive cache: write through
SCSI device sdb: 17813504 512-byte hdwr sectors (9121 MB)
SCSI device sdb: drive cache: write through
 sdb: unknown partition table
Attached scsi disk sdb at scsi27, channel 0, id 0, lun 0
Attached scsi generic sg1 at scsi27, channel 0, id 0, lun 0,  type 0
  Vendor: QUANTUM   Model: ATLAS IV 9 WLSRev: 0B0B
  Type:   Direct-Access  ANSI SCSI revision: 03
Attached scsi generic sg2 at scsi27, channel 1, id 0, lun 0,  type 0
  Vendor: QUANTUM   Model: ATLAS IV 9 WLSRev: 0B0B
  Type:   Direct-Access  ANSI SCSI revision: 03
Attached scsi generic sg3 at scsi27, channel 1, id 1, lun 0,  type 0

where you can see that sdb is the RAID device and sg2 and sg3 the
underlying SCSI discs.
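
The routing rule described above can be summarised in a small userspace sketch. The enum values, struct route and route_command are hypothetical stand-ins, and the target_in_raid parameter stands in for the result of the driver's mptscsih_is_raid_volume() check; only the decision logic mirrors the patch.

```c
#include <stdbool.h>

/* Stand-ins for the two MPI function codes used by the patch. */
enum mpi_function {
	FN_SCSI_IO_REQUEST,
	FN_RAID_SCSI_IO_PASSTHROUGH
};

struct route {
	bool ok;			/* false: complete with "no connect" */
	int bus;			/* bus number sent to the firmware */
	enum mpi_function function;
};

/*
 * Decide how a command on a given virtual channel is sent.
 * Channel 0 is ordinary I/O; channel 1 is the pass-through to the
 * physical discs and is refused unless the RAID check passes.
 */
static struct route route_command(int channel, bool target_in_raid)
{
	struct route r = { true, 0, FN_SCSI_IO_REQUEST };

	if (channel && !target_in_raid) {
		r.ok = false;
		return r;
	}
	if (channel)
		r.function = FN_RAID_SCSI_IO_PASSTHROUGH;
	return r;
}
```

Note that the bus number sent to the firmware is always 0: the second channel exists only on the Linux side as a way to address the physical discs.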

James

diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -171,6 +171,7 @@ static void mptscsih_fillbuf(char *buffe
 
 void	mptscsih_remove(struct pci_dev *);
 void	mptscsih_shutdown(struct pci_dev *);
+static int mptscsih_is_raid_volume(MPT_SCSI_HOST *hd, uint id);
 #ifdef CONFIG_PM
 int	mptscsih_suspend(struct pci_dev *pdev, pm_message_t state);
 int	mptscsih_resume(struct pci_dev *pdev);
@@ -1274,6 +1275,12 @@ mptscsih_qcmd(struct scsi_cmnd *SCpnt, v
 		return SCSI_MLQUEUE_HOST_BUSY;
 	}
 
+	if (SCpnt->device->channel && !mptscsih_is_raid_volume(hd, target)) {
+		SCpnt->result = DID_NO_CONNECT << 16;
+		done(SCpnt);
+		return 0;
+	}
+
 	/*
 	 *  Put together a MPT SCSI request...
 	 */
@@ -1318,9 +1325,12 @@ mptscsih_qcmd(struct scsi_cmnd *SCpnt, v
 	/* Use the above information to set up the message frame
 	 */
 	pScsiReq->TargetID = (u8) target;
-	pScsiReq->Bus = (u8) SCpnt->device->channel;
+	pScsiReq->Bus = 0;
 	pScsiReq->ChainOffset = 0;
-	pScsiReq->Function = MPI_FUNCTION_SCSI_IO_REQUEST;
+	if (SCpnt->device->channel)
+		pScsiReq->Function = MPI_FUNCTION_RAID_SCSI_IO_PASSTHROUGH;
+	else
+		pScsiReq->Function = MPI_FUNCTION_SCSI_IO_REQUEST;
 	pScsiReq->CDBLength = SCpnt->cmd_len;
 	pScsiReq->SenseBufferLength = MPT_SENSE_BUFFER_SIZE;
 	pScsiReq->Reserved = 0;
@@ -2145,6 +2155,9 @@ mptscsih_slave_alloc(struct scsi_device 
 	if (hd == NULL)
 		return -ENODEV;
 
+	if (device->channel)
+		device->no_uld_attach = 1;
+
 	if ((vdev = hd->Targets[target]) != NULL)
 		goto out;
 
diff --git a/drivers/message/fusion/mptspi.c b/drivers/message/fusion/mptspi.c
--- a/drivers/message/fusion/mptspi.c
+++ b/drivers/message/fusion/mptspi.c
@@ -237,7 +237,10 @@ mptspi_probe(struct pci_dev *pdev, const
 	sh->max_id = MPT_MAX_SCSI_DEVICES;
 
 	sh->max_lun = MPT_LAST_LUN + 1;
-	sh->max_channel = 0;
+	if (ioc->spi_data.isRaid)
+		sh->max_channel = 1;
+	else
+		sh->max_channel = 0;
 	sh->this_id = ioc->pfacts[0].PortSCSIID;
 
 	/* Required entry.




[PATCH] aacraid: adapter support update

2005-08-08 Thread Mark Haverkamp
Received from Mark Salyzyn

This patch adds the product ID for the ICP9067MA adapter.

The entries for the ICP9085LI, ICP5085BR, IBM8k & ASR4810SAS were
incorrect and would not initialize the adapters correctly.

Applies to scsi-misc-2.6 git tree

Signed-off-by: Mark Haverkamp [EMAIL PROTECTED]


Index: scsi-misc-aac-1/drivers/scsi/aacraid/linit.c
===
--- scsi-misc-aac-1.orig/drivers/scsi/aacraid/linit.c	2005-08-08 09:11:56.0 -0700
+++ scsi-misc-aac-1/drivers/scsi/aacraid/linit.c	2005-08-08 11:04:51.0 -0700
@@ -120,36 +120,39 @@
 	{ 0x9005, 0x0286, 0x9005, 0x02a3, 0, 0, 29 }, /* ICP5085AU (Hurricane) */
 	{ 0x9005, 0x0285, 0x9005, 0x02a4, 0, 0, 30 }, /* ICP9085LI (Marauder-X) */
 	{ 0x9005, 0x0285, 0x9005, 0x02a5, 0, 0, 31 }, /* ICP5085BR (Marauder-E) */
-	{ 0x9005, 0x0287, 0x9005, 0x0800, 0, 0, 32 }, /* Themisto Jupiter Platform */
-	{ 0x9005, 0x0200, 0x9005, 0x0200, 0, 0, 32 }, /* Themisto Jupiter Platform */
-	{ 0x9005, 0x0286, 0x9005, 0x0800, 0, 0, 33 }, /* Callisto Jupiter Platform */
-	{ 0x9005, 0x0285, 0x9005, 0x028e, 0, 0, 34 }, /* ASR-2020SA SATA PCI-X ZCR (Skyhawk) */
-	{ 0x9005, 0x0285, 0x9005, 0x028f, 0, 0, 35 }, /* ASR-2025SA SATA SO-DIMM PCI-X ZCR (Terminator) */
-	{ 0x9005, 0x0285, 0x9005, 0x0290, 0, 0, 36 }, /* AAR-2410SA PCI SATA 4ch (Jaguar II) */
-	{ 0x9005, 0x0285, 0x1028, 0x0291, 0, 0, 37 }, /* CERC SATA RAID 2 PCI SATA 6ch (DellCorsair) */
-	{ 0x9005, 0x0285, 0x9005, 0x0292, 0, 0, 38 }, /* AAR-2810SA PCI SATA 8ch (Corsair-8) */
-	{ 0x9005, 0x0285, 0x9005, 0x0293, 0, 0, 39 }, /* AAR-21610SA PCI SATA 16ch (Corsair-16) */
-	{ 0x9005, 0x0285, 0x9005, 0x0294, 0, 0, 40 }, /* ESD SO-DIMM PCI-X SATA ZCR (Prowler) */
-	{ 0x9005, 0x0285, 0x103C, 0x3227, 0, 0, 41 }, /* AAR-2610SA PCI SATA 6ch */
-	{ 0x9005, 0x0285, 0x9005, 0x0296, 0, 0, 42 }, /* ASR-2240S (SabreExpress) */
-	{ 0x9005, 0x0285, 0x9005, 0x0297, 0, 0, 43 }, /* ASR-4005SAS */
-	{ 0x9005, 0x0285, 0x1014, 0x02F2, 0, 0, 44 }, /* IBM 8i (AvonPark) */
-	{ 0x9005, 0x0285, 0x1014, 0x0312, 0, 0, 44 }, /* IBM 8i (AvonPark Lite) */
-	{ 0x9005, 0x0285, 0x9005, 0x0298, 0, 0, 45 }, /* ASR-4000SAS (BlackBird) */
-	{ 0x9005, 0x0285, 0x9005, 0x0299, 0, 0, 46 }, /* ASR-4800SAS (Marauder-X) */
-	{ 0x9005, 0x0285, 0x9005, 0x029a, 0, 0, 47 }, /* ASR-4805SAS (Marauder-E) */
-	{ 0x9005, 0x0286, 0x9005, 0x02a2, 0, 0, 48 }, /* ASR-4810SAS (Hurricane */
-
-	{ 0x9005, 0x0285, 0x1028, 0x0287, 0, 0, 49 }, /* Perc 320/DC*/
-	{ 0x1011, 0x0046, 0x9005, 0x0365, 0, 0, 50 }, /* Adaptec 5400S (Mustang)*/
-	{ 0x1011, 0x0046, 0x9005, 0x0364, 0, 0, 51 }, /* Adaptec 5400S (Mustang)*/
-	{ 0x1011, 0x0046, 0x9005, 0x1364, 0, 0, 52 }, /* Dell PERC2/QC */
-	{ 0x1011, 0x0046, 0x103c, 0x10c2, 0, 0, 53 }, /* HP NetRAID-4M */
-
-	{ 0x9005, 0x0285, 0x1028, PCI_ANY_ID, 0, 0, 54 }, /* Dell Catchall */
-	{ 0x9005, 0x0285, 0x17aa, PCI_ANY_ID, 0, 0, 55 }, /* Legend Catchall */
-	{ 0x9005, 0x0285, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 56 }, /* Adaptec Catch All */
-	{ 0x9005, 0x0286, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 57 }, /* Adaptec Rocket Catch All */
+	{ 0x9005, 0x0286, 0x9005, 0x02a6, 0, 0, 32 }, /* ICP9067MA (Intruder-6) */
+	{ 0x9005, 0x0287, 0x9005, 0x0800, 0, 0, 33 }, /* Themisto Jupiter Platform */
+	{ 0x9005, 0x0200, 0x9005, 0x0200, 0, 0, 33 }, /* Themisto Jupiter Platform */
+	{ 0x9005, 0x0286, 0x9005, 0x0800, 0, 0, 34 }, /* Callisto Jupiter Platform */
+	{ 0x9005, 0x0285, 0x9005, 0x028e, 0, 0, 35 }, /* ASR-2020SA SATA PCI-X ZCR (Skyhawk) */
+	{ 0x9005, 0x0285, 0x9005, 0x028f, 0, 0, 36 }, /* ASR-2025SA SATA SO-DIMM PCI-X ZCR (Terminator) */
+	{ 0x9005, 0x0285, 0x9005, 0x0290, 0, 0, 37 }, /* AAR-2410SA PCI SATA 4ch (Jaguar II) */
+	{ 0x9005, 0x0285, 0x1028, 0x0291, 0, 0, 38 }, /* CERC SATA RAID 2 PCI SATA 6ch (DellCorsair) */
+	{ 0x9005, 0x0285, 0x9005, 0x0292, 0, 0, 39 }, /* AAR-2810SA PCI SATA 8ch (Corsair-8) */
+	{ 0x9005, 0x0285, 0x9005, 0x0293, 0, 0, 40 }, /* AAR-21610SA PCI SATA 16ch (Corsair-16) */
+	{ 0x9005, 0x0285, 0x9005, 0x0294, 0, 0, 41 }, /* ESD SO-DIMM PCI-X SATA ZCR (Prowler) */
+	{ 0x9005, 0x0285, 0x103C, 0x3227, 0, 0, 42 }, /* AAR-2610SA PCI SATA 6ch */
+	{ 0x9005, 0x0285, 0x9005, 0x0296, 0, 0, 43 }, /* ASR-2240S (SabreExpress) */
+	{ 0x9005, 0x0285, 0x9005, 0x0297, 0, 0, 44 }, /* ASR-4005SAS */
+	{ 0x9005, 0x0285, 0x1014, 0x02F2, 0, 0, 45 }, /* IBM 8i (AvonPark) */
+	{ 0x9005, 0x0285, 0x1014, 0x0312, 0, 0, 45 }, /* IBM 8i (AvonPark Lite) */
+	{ 0x9005, 0x0286, 0x1014, 0x9580, 0, 0, 46 }, /* IBM 8k/8k-l8 (Aurora) */
+	{ 0x9005, 0x0286, 0x1014, 0x9540, 0, 0, 47 }, /* IBM 8k/8k-l4 (Aurora Lite) */
+	{ 0x9005, 0x0285, 0x9005, 0x0298, 0, 0, 48 }, /*

Re: Netlink allocation for iSCSI and others

2005-08-08 Thread David S. Miller
From: Patrick McHardy [EMAIL PROTECTED]
Date: Mon, 08 Aug 2005 23:19:37 +0200

 David S. Miller wrote:
  So we can increase MAX_LINKS to 256 and that's what I think I will do
  for 2.6.14 unless there is a very serious objection.  The tables sized
  by MAX_LINKS in af_netlink.c are dynamically allocated, and the only
  linear iterations over MAX_LINKS are for the netlink socket procfs
  seq-file dumper, so it's not a performance issue either.
 
 I think we should increase it when allocating new numbers to save the
 unused memory for the larger nltable and additional pid hashes.
 Userspace shouldn't care if we change it.

Agreed.  So we have 17 netlink numbers to allocate at this
point, and that should be good for a while.


Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Mike Anderson
Alan Stern [EMAIL PROTECTED] wrote:
 Here you are wrong.  In fact the core makes no such guarantees.  It _will_
 try to enter the host (for things like telling disk drives to flush
 their caches) for as long as it retains a reference to the host structure.

Well, once scsi_remove_host has returned, no commands can be sent through
the LLDD's queuecommand. There is still the issue you previously mentioned
about a possible race between the SCSI error handler and a call to
scsi_remove_host.


 I tried to provoke such a case, but failed.  Evidently I've been barking
 up the wrong tree.  Normally the SCSI core doesn't call the LLD unless
 some process has opened a device file for something on that host.  This
 will automatically do try_module_get on the LLD, making it impossible for
 the LLD to be removed from memory.
 
 I'm not certain this reasoning is 100% reliable, though -- does it cover
 _every_ case where the core calls the LLD?  Maybe something like this 
 little patch would be a good idea:

In USB, one driver controls more than one host instance. I believe the
addition James suggested would work at per-host-instance granularity,
whereas your solution below looks like a notification once all instances
controlled by a driver are gone.

 --- usb-2.6.orig/drivers/scsi/hosts.c
 +++ usb-2.6/drivers/scsi/hosts.c
 @@ -205,6 +205,9 @@ int scsi_add_host(struct Scsi_Host *shos
 		goto out_destroy_host;
  
 	scsi_proc_host_add(shost);
 +	shost->parent_driver = dev->driver;
 +	if (shost->parent_driver)
 +		get_driver(shost->parent_driver);

Just a nit: for a general-case solution, dev can be NULL for drivers that
live on a bus that is not converted to the driver model and may not have a
parent.

 But if you think this isn't needed, then okay.
 
 There is an exceptional case: device scanning.  During scanning, there is
 nothing to directly prevent the LLD from being unloaded.  There is an
 indirect protection, however, because the scanning thread will own
 shost->scan_mutex, and scsi_remove_host acquires the mutex before
 returning.

I think in the past someone has indicated that we really should be getting
refs during scanning to hold things in place. I guess, as you note above
and below, scan_mutex is what currently holds things in place.


 memory.  In fact, I've been using the template as kind of a surrogate,
 standing for all of the LLD's module -- so long as one is present, so is
 the other.
 
 
 I still have one related question.  This is a little bit off to one 
 side, but maybe you folks can suggest possible solutions.  The question 
 concerns a deadlock I _was_ able to generate earlier today with a patched
 usb-storage.
 
 My USB mass-storage test device doesn't respond to TEST UNIT READY, so it
 causes a timeout and kicks the error handler into action.  This happens 
 during device scanning, just prior to reading the partition table.  The 
 error handler goes through various stages of processing, leading up to a 
 bus reset.  I disconnected the USB device just before the bus reset 
 routine was called.
 
 Now, usb-storage implements a SCSI bus reset by actually performing a
 USB port reset.  The USB subsystem requires the caller to acquire a
 device-specific semaphore before doing a port reset, and the subsystem
 itself acquires this same semaphore when notifying drivers about a
 disconnection.  (The idea is that we don't want drivers trying to handle
 a disconnect and a reset on the same device at the same time.)
 
 So here's how things end up.
 
   The scanning thread owns shost->scan_mutex and is waiting
   for the error handler to finish.
 
   The EH thread is executing usb-storage's bus_reset routine
   and is waiting to acquire the device semaphore.
 
   USB's khubd thread owns the device semaphore and has invoked
   the usb-storage disconnect routine.  Among other things, this
   routine calls scsi_remove_host, which tries to acquire the
   scan_mutex.
 
 How should this deadlock be resolved?  The current code has an extremely 
 inelegant solution, and I would like to find a better one.  Any ideas?
 
 Alan Stern
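
The three-way wait described above forms a cycle, which can be checked mechanically. The following userspace sketch models it as a wait-for table, on the simplifying assumption that each thread blocks on at most one other thread. The enum names and in_wait_cycle are hypothetical; nothing here is kernel code.

```c
#include <stdbool.h>

enum { SCAN_THREAD, EH_THREAD, KHUBD_THREAD, NTHREADS };

/*
 * waits_for[t] is the thread that t is blocked on, or -1 if t can run.
 * Follow the chain from start; coming back to start within n hops means
 * start is part of a cycle and can never make progress.
 */
static bool in_wait_cycle(const int waits_for[], int n, int start)
{
	int t = start;

	for (int hops = 0; hops < n; hops++) {
		t = waits_for[t];
		if (t < 0)
			return false;	/* chain ends at a runnable thread */
		if (t == start)
			return true;	/* came back around: deadlock */
	}
	return false;
}
```

Filling the table with scan waits on EH, EH waits on khubd (which holds the device semaphore), and khubd waits on scan (which holds scan_mutex) reports a cycle; removing any one of the three edges breaks it.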
 

I do not mean to push everything down into the LLDD, but in this case it
is unclear what protection the SCSI mid layer could add against unknown
future events while scanning. Does the current inelegant solution in the
usb-storage driver use some form of state model? If it does not, a state
model that uses a spin lock, or something other than the device semaphore,
to detect that a disconnect is happening during a reset (or vice versa)
would seem like a good idea.

-andmike
--
Michael Anderson
[EMAIL PROTECTED]


Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Luben Tuikov
On 08/08/05 16:41, Alan Stern wrote:
 I still have one related question.  This is a little bit off to one 
 side, but maybe you folks can suggest possible solutions.  The question 
 concerns a deadlock I _was_ able to generate earlier today with a patched
 usb-storage.
 
 My USB mass-storage test device doesn't respond to TEST UNIT READY, so it
 causes a timeout and kicks the error handler into action.  This happens 
 during device scanning, just prior to reading the partition table.  The 
 error handler goes through various stages of processing, leading up to a 
 bus reset.  I disconnected the USB device just before the bus reset 
 routine was called.

I think that scanning is a special process which should involve
the minimum of error handling, by either ignoring errors and trying
to connect to the device anyway, or giving up on the device at the
first error.  Which policy one follows depends on the transport.

If the latter, you need to blacklist the device as not supporting
TUR.  Then on any error, like you pulling the cable during scanning,
the scanning process will give up and all will be well.

Luben


Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread James Bottomley
On Mon, 2005-08-08 at 16:41 -0400, Alan Stern wrote:
   Would you agree to a patch adding such a kref?
  
  Well, not really.  The host template usually exists as a variable in the
  module, so its lifetime is tied to the lifetime of the module.  Adding a
  kref wouldn't help because it will still be freed when the module is
  removed.  If there's a case where the template is being freed
  prematurely because the module is being removed, then we have the module
  refcounting wrong somewhere.  Have you run across such a case?
 
 I tried to provoke such a case, but failed.  Evidently I've been barking
 up the wrong tree.  Normally the SCSI core doesn't call the LLD unless
 some process has opened a device file for something on that host.  This
 will automatically do try_module_get on the LLD, making it impossible for
 the LLD to be removed from memory.
 
 I'm not certain this reasoning is 100% reliable, though -- does it cover
 _every_ case where the core calls the LLD?  Maybe something like this 
 little patch would be a good idea:

Yes, I was considering something similar, since all the last put of a
driver does is send the completion that driver_unregister() should be
waiting for.

 + struct device_driver*parent_driver;

I don't think we need this.  The underlying device has to be the parent
of shost_gendev, so you can get the parent driver as

shost_gendev.parent->driver

 +	if (shost->parent_driver)
 +		put_driver(shost->parent_driver);

And just before this would be the place to do the final release of all
the resources the HBA is holding.
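
A minimal userspace sketch of this lookup, including the NULL-parent case raised elsewhere in the thread, might look as follows. The struct layouts and the helpers parent_driver, pin_parent_driver and unpin_parent_driver are hypothetical; the refs counter merely stands in for what get_driver/put_driver would do.

```c
#include <stddef.h>

struct driver {
	const char *name;
	int refs;	/* stand-in for the driver's reference count */
};

struct device {
	struct device *parent;
	struct driver *driver;
};

/* Derive the parent's driver instead of caching it in the host. */
static struct driver *parent_driver(const struct device *gendev)
{
	if (gendev == NULL || gendev->parent == NULL)
		return NULL;	/* bus not converted to the driver model */
	return gendev->parent->driver;
}

/* Take a reference at add time; returns 1 if a driver was pinned. */
static int pin_parent_driver(const struct device *gendev)
{
	struct driver *drv = parent_driver(gendev);

	if (drv == NULL)
		return 0;
	drv->refs++;
	return 1;
}

/* Drop the reference at release time, after the final cleanup. */
static void unpin_parent_driver(const struct device *gendev)
{
	struct driver *drv = parent_driver(gendev);

	if (drv != NULL)
		drv->refs--;
}
```

Because the driver is looked up through the parent pointer each time, no extra field is needed in the host structure, and hosts without a parent device are simply skipped.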

James




Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Alan Stern
On Mon, 8 Aug 2005, Luben Tuikov wrote:

 On 08/08/05 16:41, Alan Stern wrote:
  I still have one related question.  This is a little bit off to one 
  side, but maybe you folks can suggest possible solutions.  The question 
  concerns a deadlock I _was_ able to generate earlier today with a patched
  usb-storage.
  
  My USB mass-storage test device doesn't respond to TEST UNIT READY, so it
  causes a timeout and kicks the error handler into action.  This happens 
  during device scanning, just prior to reading the partition table.  The 
  error handler goes through various stages of processing, leading up to a 
  bus reset.  I disconnected the USB device just before the bus reset 
  routine was called.
 
 I think that scanning is a special process which should involve
 the minimum of error handling, by either ignoring errors and trying
 to connect to the device anyway, or on the first error, give up
 the device.  Which policy would one follow depends on the transport.

No, that's not feasible.  We can't just ignore errors, and we do have to 
cope with them.  Scanning is a particularly vulnerable time, since it 
involves sending commands that don't occur most of the time during normal 
operation.

 If the latter, you need to blacklist the device as not supporting
 TUR.  Then on any error, like you pulling the cable during scanning,
 the scanning process will give up and all will be well.

You can't blacklist devices you don't know about.  The kernel should work 
regardless.

Alan Stern



Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Alan Stern
On Mon, 8 Aug 2005, Stefan Richter wrote:

  Here you are wrong.  In fact the core makes no such guarantees.  It _will_
  try to enter the host (for things like telling disk drives to flush
  their caches) for as long as it retains a reference to the host structure.
 
 Sure. But after all high-level drivers were detached (and that should
 have happened right before scsi_remove_host returns) I don't see why  
 the host's ref count might not be down to zero.

There might still be outstanding references: processes holding files open, 
that sort of thing.  It happens all the time in refcounting systems.
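
As a rough illustration of why the count need not reach zero when removal returns, here is a minimal userspace sketch. The type and function names (model_host, model_host_get, and so on) are hypothetical stand-ins, not the kernel's struct Scsi_Host API; it only assumes that the registration holds one reference and every open file holds another.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted host; not the kernel's struct Scsi_Host. */
struct model_host {
	int refcount;   /* starts at 1: the reference held by the registration */
	bool removed;   /* set once the host has been removed */
	bool freed;     /* set when the last reference is dropped */
};

static void model_host_init(struct model_host *h)
{
	h->refcount = 1;
	h->removed = false;
	h->freed = false;
}

/* Take an extra reference, e.g. on behalf of a process opening a device file. */
static void model_host_get(struct model_host *h)
{
	assert(!h->freed);
	h->refcount++;
}

/* Drop a reference; only the last put marks the structure as freed. */
static void model_host_put(struct model_host *h)
{
	assert(h->refcount > 0);
	if (--h->refcount == 0)
		h->freed = true;   /* real code would run the release function here */
}

/* Model of removal: mark the host gone and drop the registration reference. */
static void model_host_remove(struct model_host *h)
{
	h->removed = true;
	model_host_put(h);
}
```

In this model, a host that is opened once and then removed is marked removed but stays allocated until the opener calls model_host_put, which is the window being discussed here.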


On Mon, 8 Aug 2005, Mike Anderson wrote:

 Well post scsi_remove_host returning no commands can be sent through the
 LLDD's queuecommand. There is still the issue you previously mentioned
 about a possible race of the scsi error handler and a call to
 scsi_remove_host.

Yes, and there's another pathway too: the procfs interface.


 In USB, is one driver controlling more than one host instance? I believe
 the suggestion that James indicated as a possible addition would be at per
 host instance granularity, whereas your solution below looks like a
 notification once all instances controlled by a driver are gone.

That's right.  They are not the same thing.

  --- usb-2.6.orig/drivers/scsi/hosts.c
  +++ usb-2.6/drivers/scsi/hosts.c
  @@ -205,6 +205,9 @@ int scsi_add_host(struct Scsi_Host *shos
  goto out_destroy_host;
   
  scsi_proc_host_add(shost);
  +   shost->parent_driver = dev->driver;
  +   if (shost->parent_driver)
  +   get_driver(shost->parent_driver);
 
 Just a nit: for a general-case solution, dev can be null for drivers that live
 on a bus that is not converted to the driver model and may not have a parent.

Okay, so the patch needs to test for (dev != NULL) before making the 
assignment.  The idea is still sound.
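
A minimal sketch of that adjusted assignment, written against simplified stand-in structures so it compiles in userspace. The helper name scsi_host_pin_parent_driver and the struct layouts are assumptions for illustration only; the real patch would make the change inline in scsi_add_host.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct device_driver { int refcount; };
struct device { struct device_driver *driver; };
struct Scsi_Host { struct device_driver *parent_driver; };

/* Stand-in for get_driver(): take a reference on the driver. */
static void get_driver(struct device_driver *drv)
{
	drv->refcount++;
}

/*
 * Hypothetical helper: record the parent driver and take a reference on it,
 * tolerating dev == NULL (a bus not converted to the driver model) as well
 * as a device that has no driver bound.
 */
static void scsi_host_pin_parent_driver(struct Scsi_Host *shost,
					struct device *dev)
{
	assert(shost != NULL);
	shost->parent_driver = dev ? dev->driver : NULL;
	if (shost->parent_driver)
		get_driver(shost->parent_driver);
}
```

With dev == NULL the host simply ends up with no parent driver and no reference is taken, so the matching put on the release side would need the same NULL test.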


  So here's how things end up.
  
  The scanning thread owns shost->scan_mutex and is waiting
  for the error handler to finish.
  
  The EH thread is executing usb-storage's bus_reset routine
  and is waiting to acquire the device semaphore.
  
  USB's khubd thread owns the device semaphore and has invoked
  the usb-storage disconnect routine.  Among other things, this
  routine calls scsi_remove_host, which tries to acquire the
  scan_mutex.
  
  How should this deadlock be resolved?  The current code has an extremely 
  inelegant solution, and I would like to find a better one.  Any ideas?
  
  Alan Stern
  
 
 I do not mean to push everything down into the LLDD, but in this case it
 is unclear what type of protection the scsi mid layer could add to protect
 against unknown future events while scanning. Is the current inelegant
 solution in the usb storage driver using some form of state model? It
 would appear that, if it is not, a state model that uses a spin lock or
 something other than the device semaphore to determine whether a disconnect
 was happening during a reset, or vice versa, would be a good idea.

On Mon, 8 Aug 2005, Stefan Richter wrote:

 Can't the eh_*_reset_handler use down_*_trylock? If the semaphore was
 already down, there should be a means for the reset handler to figure
 out the reason so that it can back out in an appropriate way. I hope 
 there is a limited set of reasons...

To answer both of you at once...  The current code does use
down_*_trylock, and it does check the device state for
disconnect-in-progress.  If things don't work out, it sleeps for a short
while and then tries again.  Like I said, it's inelegant.

The underlying problem is that as things stand, the natural order of
locking is device semaphore first, scan_mutex second -- that's what
happens during a disconnect.  (It's also what would happen during a probe,
if we performed device scanning from the probe routine.)  But when the
error handler tries to do a bus reset during scanning, the locking order
is reversed.  It's a hard problem.  The only reason I've kept the current
code is because USB port resets occur very infrequently.
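
For readers who want the shape of that workaround spelled out, here is a userspace sketch under stated assumptions: the device semaphore is modelled with a C11 atomic_flag, the disconnect-in-progress state with an atomic_bool, and every name (usb_model, model_bus_reset, the result codes) is hypothetical rather than taken from usb-storage. The point it illustrates is that the reset path is entered while the scanner already holds scan_mutex, so it must never block on the device semaphore; it may only try, check for a disconnect, and retry.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the device; not the usb-storage code. */
struct usb_model {
	atomic_flag dev_sem;         /* set = held; stands in for the device semaphore */
	atomic_bool disconnecting;   /* set by the disconnect path */
};

enum reset_result { RESET_DONE, RESET_SKIPPED_DISCONNECT, RESET_GAVE_UP };

/*
 * Model of a bus reset issued by the error handler.  Blocking on dev_sem
 * here would invert the "device semaphore first, scan_mutex second" order,
 * so the lock is only ever tried, never waited for.
 */
static enum reset_result model_bus_reset(struct usb_model *d, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		/* A disconnect in progress means there is nothing left to reset. */
		if (atomic_load(&d->disconnecting))
			return RESET_SKIPPED_DISCONNECT;

		/* Try-lock: succeeds only if the flag was previously clear. */
		if (!atomic_flag_test_and_set(&d->dev_sem)) {
			/* Lock acquired; a real handler would reset the port here. */
			atomic_flag_clear(&d->dev_sem);
			return RESET_DONE;
		}

		/* Lock busy: the driver sleeps briefly here; this model just retries. */
	}
	return RESET_GAVE_UP;
}
```

The bounded retry count is an addition of this sketch; the message above only says the code sleeps and tries again.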

Alan Stern



Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Stefan Richter

Alan Stern wrote:

On Mon, 8 Aug 2005, Stefan Richter wrote:

But after all high-level drivers were detached (and that should
have happened right before scsi_remove_host returns) I don't see why  
the host's ref count might not be down to zero.


There might still be outstanding references: processes holding files 
open, that sort of thing.


But as long as files are open etc., scsi high-level drivers are
'in use', cannot be detached, and scsi_remove_host does not return.

Or that's what I assumed so far.
--
Stefan Richter
-=-=-=-= =--- -=--=
http://arcgraph.de/sr/


Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Stefan Richter

Alan Stern wrote:

The current code does use
down_*_trylock, and it does check the device state for
disconnect-in-progress.  If things don't work out, it sleeps
for a short while and then tries again.


I think the error handler should rather give SUCCESS back to
scsi if it knows the device was or is being disconnected.

After that, commands would be enqueued again, and these
commands would be immediately done() with an appropriate
result code.

IMO the error handler should return SUCCESS rather than
FAILED because, if the device is disconnected, there is
simply nothing left to do that could advance recovery
from error. Retries (in the error handler) will not help
with error recovery either. Or do you retry in hope that
the device might be reconnected and then a device reset
should happen?
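
To make the proposal concrete, a small sketch under assumptions: the enum values stand in for the mid-layer's SUCCESS and FAILED codes and for the per-command result, and the struct and function names are hypothetical. It shows the two halves of the idea: the reset handler reports success for a device that is gone, and later commands are completed at once with an error result instead of being sent.

```c
#include <stdbool.h>

/* Stand-ins for the mid-layer's SUCCESS/FAILED codes; values are illustrative. */
enum eh_result { EH_SUCCESS, EH_FAILED };

/* Stand-ins for per-command completion results; values are illustrative. */
enum cmd_result { MODEL_DID_OK, MODEL_DID_NO_CONNECT };

/* Hypothetical device state as seen by the low-level driver. */
struct eh_model {
	bool disconnected;   /* device was, or is being, disconnected */
	bool reset_works;    /* outcome a real reset would have */
};

/* Reset handler: report success when there is nothing left to recover. */
static enum eh_result model_eh_bus_reset(const struct eh_model *d)
{
	if (d->disconnected)
		return EH_SUCCESS;
	return d->reset_works ? EH_SUCCESS : EH_FAILED;
}

/* Commands queued after recovery finish immediately if the device is gone. */
static enum cmd_result model_queuecommand(const struct eh_model *d)
{
	return d->disconnected ? MODEL_DID_NO_CONNECT : MODEL_DID_OK;
}
```

Whether the retried commands then fail quickly enough to release a waiting scanner is exactly the open question in this thread; the sketch does not settle it.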


A more general thought: It appears from the example you
explained that maybe the USB highlevel carries too much of a
burden to watch out for concurrency. Tasks like mutually
excluding port resets and disconnects --- which seem to me
not to be specific to highlevel protocols, correct me if
I'm wrong --- might be better moved into the USB core. Then
the highlevel would have to worry less about how exactly to
avoid races and deadlocks, only about _what_ to do if there
is a conflict. (The kind of conflict should be indicated by
the core by means of return values or status flags or the
like.)

But I am of course speaking without any knowledge about USB
and Linux' USB stack.
--
Stefan Richter
-=-=-=-= =--- -=--=
http://arcgraph.de/sr/


RE: [PATCH] SPI transport class and generic Domain Validation for fusion

2005-08-08 Thread Moore, Eric Dean
On Monday, August 08, 2005 3:47 PM, James Bottomley wrote:
 Eric,
 
 This attached patch should do DV on both physical devices and the
 underlying devices of fusion IM assemblies (providing you apply it on
 top of the prior underlying device exposure patch), which, I believe, was
 your only outstanding concern with the last generic DV patch.
 
 There is one slight unsightly piece: the IM device still attaches to the
 transport class; however, all the parameters it shows actually belong to
 the underlying device at that id ... I could do with finding a way of
 persuading the SPI transport class not to attach to RAID devices.
 
 Since this addresses all of LSI's prior concerns, may I now apply it?
 

Roy/Laura/Roy/Steve, and others - This is a patch to remove our 
internal DV (domain validation) code and replace it with the generic 
DV implementation in the Linux kernel.  This is a year-old push from 
kernel.org for the drivers in the upstream kernel.

James:  I've not been able to review this patch, nor the other one you sent.
One concern is whether the SPI transport class handles async events.  Meaning,
will it do domain validation on a RAID1 volume for a new drive that was hot
swapped with a good disk?  In the driver, look at mptscsih_event_process - this
code is handling an async event from the firmware telling the driver to perform 
DV on a disk that was just added.

Please don't rush this into the kernel until we have had time to review it, and/or 
talk to our internal test teams about the test effort, and/or talk to our customers
to explain the risk in removing this code.

Eric Moore



Re: 2.6.13-rc5-mm1 doesnt boot on x86_64

2005-08-08 Thread James Bottomley
On Mon, 2005-08-08 at 10:42 -0700, Andrew Morton wrote:
 -mm has extra list_head debugging goodies.  I'd be suspecting a list_head
 corruption detected somewhere under spi_release_transport().

Aha, looking in wrong driver ... the problem actually appears to be a
double release of the transport template in aic79xx.  Try this patch

James

diff --git a/drivers/scsi/aic7xxx/aic79xx_osm.c b/drivers/scsi/aic7xxx/aic79xx_osm.c
--- a/drivers/scsi/aic7xxx/aic79xx_osm.c
+++ b/drivers/scsi/aic7xxx/aic79xx_osm.c
@@ -2326,8 +2326,6 @@ done:
return (retval);
 }
 
-static void ahd_linux_exit(void);
-
 static void ahd_linux_set_width(struct scsi_target *starget, int width)
 {
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
@@ -2772,7 +2770,7 @@ ahd_linux_init(void)
if (ahd_linux_detect(&aic79xx_driver_template) > 0)
return 0;
spi_release_transport(ahd_linux_transport_template);
-   ahd_linux_exit();
+
return -ENODEV;
 }
 
diff --git a/drivers/scsi/aic7xxx/aic7xxx_osm.c b/drivers/scsi/aic7xxx/aic7xxx_osm.c
--- a/drivers/scsi/aic7xxx/aic7xxx_osm.c
+++ b/drivers/scsi/aic7xxx/aic7xxx_osm.c
@@ -2331,8 +2331,6 @@ ahc_platform_dump_card_state(struct ahc_
 {
 }
 
-static void ahc_linux_exit(void);
-
 static void ahc_linux_set_width(struct scsi_target *starget, int width)
 {
struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);




Re: Synchronizing scsi_remove_host and the error handler

2005-08-08 Thread Luben Tuikov
--- Alan Stern [EMAIL PROTECTED] wrote:
  I think that scanning is a special process which should involve
  the minimum of error handling, by either ignoring errors and trying
  to connect to the device anyway, or on the first error, give up
  the device.  Which policy would one follow depends on the transport.
 
 No, that's not feasible.  We can't just ignore errors, and we do have to 
 cope with them.  Scanning is a particular vulnerable time, since it 
 involves sending commands that don't occur most of the time during normal 
 operation.

Your reasoning is correct, but your conclusion is not.

Exactly, scanning is a particularly vulnerable time and this is why
we do not want the full I_T nexus error recovery.  At scanning time
we're just probing here and there.
 
  If the latter, you need to blacklist the device as not supporting
  TUR.  Then on any error, like you pulling the cable during scanning,
  the scanning process will give up and all will be well.
 
 You can't blacklist devices you don't know about.  The kernel should work 
 regardless.

Hmm, I thought there was a black listing somewhere in scsi, maybe
/proc/scsi/device_info?

So then you do scheme #1 as I described: ignore as much as possible and try
to establish an I_T nexus and _then_ try to poke around with the device.

  Luben



Re: [Bugme-new] [Bug 5003] New: Problem with symbios driver on recent -mm trees

2005-08-08 Thread Martin J. Bligh
 On Fri, 2005-08-05 at 07:36 -0700, Martin J. Bligh wrote:
 Howcome it works on all mainline kernels, and not -mm then? ;-)
 Did we fix an error path to detect failures, maybe?
 
 Well, OK, it might be something to do with your drives trying to
 negotiate IU and QAS.  Support for this was added to the sym2 driver but
 never verified (because no-one seemed to have drives that could do it).
 
 The attached should stop the driver from negotiating these two
 parameters, if you could try it (it will produce complaints about static
 functions defined but not used, but you can ignore them).

Nope, it's the same as before with this patch.

M.
 
 James
 
 diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
 --- a/drivers/scsi/sym53c8xx_2/sym_glue.c
 +++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
 @@ -2122,10 +2122,12 @@ static struct spi_function_template sym2
   .show_width = 1,
   .set_dt = sym2_set_dt,
   .show_dt= 1,
 +#if 0
   .set_iu = sym2_set_iu,
   .show_iu= 1,
   .set_qas= sym2_set_qas,
   .show_qas   = 1,
 +#endif
   .get_signalling = sym2_get_signalling,
  };
  
 
 
 
 

