Re: [patch 0/3] clean gendisk out of scsi ULD structs

2007-07-07 Thread James Bottomley
On Fri, 2007-07-06 at 23:25 -0400, Douglas Gilbert wrote:
 James Bottomley wrote:
  On Thu, 2007-07-05 at 14:06 -0700, Kristen Carlson Accardi wrote:
  Since gendisk will now become part of struct scsi_device, we don't need
  to store this value in any private data structs where they already store
  scsi_device.  This series cleans up a few drivers which did this.
  
  Actually, as Al pointed out, we do have lifetime rules issues with doing
  this.  The problem is that gendisk itself always has a shorter lifetime
  than scsi_device (not much shorter, usually, but if you execute a legal
  ULD unbind manoeuvre you'll end up with a dangling gendisk pointer).
 
 What about having short-lived scsi_device objects? For example:
 one that lives long enough for a pass-through to send a
 SCSI command (and receive its response) to one of a target's
 well known logical units.

This is sort of what we already do for REPORT_LUNS (except that we use
lun 0 instead of the REPORT LUN well known lun).  What additions do you
want to see?

  The other problem with taking gendisk out of the ULD structure and
  putting it into the scsi_device is that for the sg driver, we have two
  of them (one for the attached ULD and one for the sg driver).
 
 Add the bsg driver and that would make three of them. Or; if
 the lu's peripheral device type was not of interest to sd, st,
 sr, and osst; back to two gendisk objects (i.e. one each
 for sg and bsg).

gendisk is actually used to facilitate the SCSI ULD infrastructure; bsg,
being block, doesn't actually use it, so we'd still only have two.

  The fundamental issue seems to be that the gendisk is the holder of all
  the other info (queue, ULD etc) not vice versa ... and this patch is
  trying to reverse that relationship.
 
 A minor issue is the name gendisk ... unless, of course,
 you go and look at its definition in linux/genhd.h in
 which case the name looks somewhat appropriate. It looks
 like a mess [queue, ULD name, major/minor(s), partitions,
 capacity, disk_stats, kobjects, etc]. That is a considerable
 amount of superfluous information for just a tag for
 requests coming into (a) given queue when that queue leads
 to a non-block device.

The benefits outweigh the costs ... in the same way that we have a block
queue above all the character devices so we can treat SCSI queueing in a
uniform fashion regardless of device.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-07-07 Thread Jeff Garzik

James Bottomley wrote:

On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:

Jeff Garzik wrote:

Boaz Harrosh wrote:

FUJITA Tomonori wrote:

From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 17:00:21 +0300


Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I
fixed it. Now it works with 127.

I think that we can just remove blk_queue_max_phys_segments since the
ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.


Who should send a patch upstream? (I cc'ed Jeff Garzik)

Boaz

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..3660f3e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)

ata_scsi_sdev_config(sdev);

-   blk_queue_max_phys_segments(sdev-request_queue, LIBATA_MAX_PRD);
-
sdev-manage_start_stop = 1;

I don't mind the patch, but could someone refresh me as to the context?

Is there something wrong with the above code, or is it simply redundant 
to the scsi_host_template settings in each LLDD?


Jeff




Hi Jeff
What happens is that if SCSI-ml sets an higher value than LIBATA_MAX_PRD (=128)
than every thing is OK and libata-scsi will only see its LIBATA_MAX_PRD. But 
what
happens if SCSI-ml sets a lower value? It will than crash on unexpected high sg
count. My first Patch was an if  but Tomo said that it is redundant since
drivers do that already. So I guess it is your call. Can it be removed or we 
need
a: if ( sdev-request_queue-max_phys_segments  LIBATA_MAX_PRD)


Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to
set because what a driver wants to see is what comes out the other side
of dma_map_sg() (which is controlled by hw_segments) not what goes in.
However, I could see libata having an interest in the phys_segments if
the physical region descriptors are going to the device via PIO ... is
that what this is for?


LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements 
permitted by the HBA's DMA engine, for a single ATA command.


The PIO code doesn't really care about segments.

Jeff



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-07-07 Thread James Bottomley
On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:
 James Bottomley wrote:
  On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:
  Jeff Garzik wrote:
  Boaz Harrosh wrote:
  FUJITA Tomonori wrote:
  From: Boaz Harrosh [EMAIL PROTECTED]
  Subject: Re: [PATCH v2] add bidi support for block pc requests
  Date: Thu, 17 May 2007 17:00:21 +0300
 
  Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I
  fixed it. Now it works with 127.
  I think that we can just remove blk_queue_max_phys_segments since the
  ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.
 
  Who should send a patch upstream? (I cc'ed Jeff Garzik)
 
  Boaz
 
  diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
  index dd81fa7..3660f3e 100644
  --- a/drivers/ata/libata-scsi.c
  +++ b/drivers/ata/libata-scsi.c
  @@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 
   ata_scsi_sdev_config(sdev);
 
  -blk_queue_max_phys_segments(sdev-request_queue, 
  LIBATA_MAX_PRD);
  -
   sdev-manage_start_stop = 1;
  I don't mind the patch, but could someone refresh me as to the context?
 
  Is there something wrong with the above code, or is it simply redundant 
  to the scsi_host_template settings in each LLDD?
 
Jeff
 
 
 
  Hi Jeff
  What happens is that if SCSI-ml sets an higher value than LIBATA_MAX_PRD 
  (=128)
  than every thing is OK and libata-scsi will only see its LIBATA_MAX_PRD. 
  But what
  happens if SCSI-ml sets a lower value? It will than crash on unexpected 
  high sg
  count. My first Patch was an if  but Tomo said that it is redundant 
  since
  drivers do that already. So I guess it is your call. Can it be removed or 
  we need
  a: if ( sdev-request_queue-max_phys_segments  LIBATA_MAX_PRD)
  
  Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to
  set because what a driver wants to see is what comes out the other side
  of dma_map_sg() (which is controlled by hw_segments) not what goes in.
  However, I could see libata having an interest in the phys_segments if
  the physical region descriptors are going to the device via PIO ... is
  that what this is for?
 
 LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements 
 permitted by the HBA's DMA engine, for a single ATA command.

Then it's the wrong parameter you're setting:  phys_segments is what you
have going into a dma_map_sg() but hw_segments is what you end up after
it (mathemtaically, the mapping never splits segments, so hw_segments =
phys_segments always, but it's still the wrong way to bound this); so if
you want to limit the SG elements in the HBA, you should set hw_segments
not phys_segments (which is what the sg_tablesize parameter of the host
structure corresponds to).

However, I suspect this has to do with our iommu problem, doesn't it?
The IOMMU may merge the segments beyond your 64k DMA boundary, so you
need to split them again ... in that case you need phys_segments
bounded, not hw_segments?  I really wish we could get this iommu
parameter problem fixed so we didn't have to have all of this confusing
code.

 The PIO code doesn't really care about segments.

OK ... I just thought it was PIO because the only time you should worry
about phys_segments is if you're not going to do a dma_map_sg() on the
scatterlist.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] add bidi support for block pc requests

2007-07-07 Thread Jeff Garzik

James Bottomley wrote:

On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:
LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements 
permitted by the HBA's DMA engine, for a single ATA command.



Then it's the wrong parameter you're setting:  phys_segments is what you
have going into a dma_map_sg() but hw_segments is what you end up after
it (mathemtaically, the mapping never splits segments, so hw_segments =
phys_segments always, but it's still the wrong way to bound this); so if
you want to limit the SG elements in the HBA, you should set hw_segments
not phys_segments (which is what the sg_tablesize parameter of the host
structure corresponds to).


Honestly I'm betting the code is a result of paranoia^H^H^Hconservatism 
on my part:  setting every DMA and segment limit I can find, in the 
hopes that somehow, somewhere, that information will get filtered down 
to the IOMMU and whatnot.  :)


Given what you say above, and the fact that I set sg_tablesize, 
presumably Boaz's patch to remove phys_segment limiting in libata-scsi 
is the right thing to do.  Your knowledge in this area is most likely 
stronger than mine.




However, I suspect this has to do with our iommu problem, doesn't it?
The IOMMU may merge the segments beyond your 64k DMA boundary, so you
need to split them again ... in that case you need phys_segments
bounded, not hw_segments?


This is why I set every segment limitation I could find :)  And then Ben 
tells me IOMMU may ignore all of that anyway, and so I have code to 
split inside libata as well :)


I just know what IDE needs:  S/G ent shouldn't cross 64k boundary, nor 
exceed 64k in size.  The maximum number of S/G ents is actually a guess. 
 Hard data is tough to come by.  Some DMA engines will cycle through 
all of memory until they hit a stop bit.  Others have limits yet to be 
discovered.  :)


Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Problem: NULL pointer dereference when disconnecting SAS expander-expander link

2007-07-07 Thread Alex Winawer
Hello,

While testing some new SAS hardware, I have encountered an issue that results 
in an Unable to handle kernel NULL pointer dereference message from the 
kernel. The stack trace taken from syslog output is attached.

The problem occurs when connecting then disconnecting an external cable 
between two JBOD disk boxes. The problem does not seem to occur when 
connecting and disconnecting a single disk box directly to the HBA.

To reproduce:
1. Boot with the hardware connected as pictured below. All 32 external disks
   are found and no problems are noticed.
2. Disconnect cable B. The 16 disks and enclosure target from disk box 2 are
   removed with no errors noticed. There are some 'failed to synchronize
   cache' messages if the disks are not removed through /sys first but the
   the error will occur either way.
3. Reconnect cable B. No indications that anything has happened from the OS. I 
   have tried waiting for over 2 minutes after connecting the cable.
4. Disconnect cable B again and the attached messages are logged. A hard reset
   is then required to recover.

+---Host w/LSI3801E HBA+
|  LSI1068E|
+--+
    Cable A
+---Disk box 1-+
|  |
| LSISASx12A   |
|   ||\`== LSISASx12A  8 HDDs |
|   || |
|   \` LSISASx12A  8 HDDs |
+--+
    Cable B
+---Disk box 2-+
|  |
| LSISASx12A   |
|   ||\`== LSISASx12A  8 HDDs |
|   || |
|   \` LSISASx12A  8 HDDs |
+--+


For the attached error, the disk boxes were full of SATA disks and the system 
was running the Debian backports.org 2.6.21-1-amd64 (2.6.21-4~bpo.1) kernel. 
The problem also seems to exist with the Debian etch 2.6.18-4-amd64 kernel. 
Happy to try any kernel versions and configs that would be useful.

The diagram represents my current understanding of the expander setup in the 
disk boxes but I could be mistaken. I can provide further details of the view 
of the hardware from the host if it is of interest.

The server also has an on-board LSI1064 connected to 4 internal SAS HDDs:
$ cat /proc/mpt/summary
ioc0: LSISAS1068E, FwRev=0112h, Ports=1, MaxQ=511, IRQ=19
ioc1: LSISAS1064, FwRev=01102800h, Ports=1, MaxQ=511, IRQ=58

I will continue to investigate and will report any findings but any help in 
resolving the issue would be greatly appreciated.

-- 
Alex Winawer, Unix Systems Programmer
Systems Development  Support, Oxford University Computing Services
Jul  6 09:46:05 just-read-the-instructions kernel: mptbase: ioc0: 
LogInfo(0x3005): Originator={IOP}, Code={Task Terminated}, SubCode(0x)
Jul  6 09:48:09 just-read-the-instructions kernel: Unable to handle kernel NULL 
pointer dereference at 02c0 RIP: 
Jul  6 09:48:09 just-read-the-instructions kernel:  [8025a762] 
mutex_lock+0x0/0xb
Jul  6 09:48:09 just-read-the-instructions kernel: PGD 10ce07067 PUD 10ea3e067 
PMD 0 
Jul  6 09:48:09 just-read-the-instructions kernel: Oops: 0002 [1] SMP 
Jul  6 09:48:09 just-read-the-instructions kernel: CPU 0 
Jul  6 09:48:09 just-read-the-instructions kernel: Modules linked in: raid456 
xor ipv6 iptable_mangle iptable_nat nf_nat xt_tcpudp nf_conntrack_ipv4 xt_state 
nf_conntrack nfnetlink ipt_owner ipt_REJECT xt_limit ipt_LOG xt_hashlimit 
ip6_tables ipt_addrtype iptable_filter ip_tables x_tables 8021q serio_raw 
psmouse i2c_nforce2 shpchp i2c_core pci_hotplug pcspkr k8temp sg sr_mod cdrom 
joydev evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod 
ide_generic ata_generic sata_nv libata sd_mod generic usb_storage usbhid hid 
mptsas mptscsih mptbase scsi_transport_sas amd74xx e1000 scsi_mod forcedeth 
ide_core ohci_hcd ehci_hcd thermal processor fan
Jul  6 09:48:09 just-read-the-instructions kernel: Pid: 14, comm: events/0 Not 
tainted 2.6.21-1-amd64 #1
Jul  6 09:48:09 just-read-the-instructions kernel: RIP: 
0010:[8025a762]  [8025a762] mutex_lock+0x0/0xb
Jul  6 09:48:09 just-read-the-instructions kernel: RSP: 0018:810120201c88  
EFLAGS: 00010246
Jul  6 09:48:09 just-read-the-instructions kernel: RAX:  RBX: 
81011b1f3000 RCX: 
Jul  6 09:48:09 just-read-the-instructions kernel: RDX: 81011a9784c0 RSI: 
81011b1f3000 RDI: 02c0
Jul  6 09:48:09 just-read-the-instructions kernel: RBP: 0004 R08: 
000c R09: 81011b3392a0
Jul  6 09:48:09 just-read-the-instructions kernel: R10: fff4 R11: 
810120201ca8 R12: 
Jul  6 09:48:09 just-read-the-instructions kernel: R13: 02c0 R14: 
 R15: 05b0
Jul  6 09:48:09 just-read-the-instructions kernel: FS:  2b86508c56d0() 
GS:804d9000()