Re: [patch 0/3] clean gendisk out of scsi ULD structs
On Fri, 2007-07-06 at 23:25 -0400, Douglas Gilbert wrote:
James Bottomley wrote:
On Thu, 2007-07-05 at 14:06 -0700, Kristen Carlson Accardi wrote:

Since gendisk will now become part of struct scsi_device, we don't need to store this value in any private data structs where they already store scsi_device. This series cleans up a few drivers which did this.

Actually, as Al pointed out, we do have lifetime rules issues with doing this. The problem is that gendisk itself always has a shorter lifetime than scsi_device (not much shorter, usually, but if you execute a legal ULD unbind manoeuvre you'll end up with a dangling gendisk pointer).

What about having short-lived scsi_device objects? For example: one that lives long enough for a pass-through to send a SCSI command (and receive its response) to one of a target's well known logical units.

This is sort of what we already do for REPORT_LUNS (except that we use lun 0 instead of the REPORT LUN well known lun). What additions do you want to see?

The other problem with taking gendisk out of the ULD structure and putting it into the scsi_device is that for the sg driver, we have two of them (one for the attached ULD and one for the sg driver).

Add the bsg driver and that would make three of them. Or, if the lu's peripheral device type was not of interest to sd, st, sr and osst, back to two gendisk objects (i.e. one each for sg and bsg).

gendisk is actually used to facilitate the SCSI ULD infrastructure; bsg, being block, doesn't actually use it, so we'd still only have two. The fundamental issue seems to be that the gendisk is the holder of all the other info (queue, ULD, etc.), not vice versa ... and this patch is trying to reverse that relationship.

A minor issue is the name gendisk ... unless, of course, you go and look at its definition in linux/genhd.h, in which case the name looks somewhat appropriate. It looks like a mess [queue, ULD name, major/minor(s), partitions, capacity, disk_stats, kobjects, etc.]. That is a considerable amount of superfluous information for just a tag for requests coming into a given queue when that queue leads to a non-block device.

The benefits outweigh the costs ... in the same way that we have a block queue above all the character devices, so we can treat SCSI queueing in a uniform fashion regardless of device.

James

To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote:
On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:
Jeff Garzik wrote:
Boaz Harrosh wrote:
FUJITA Tomonori wrote:

From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 17:00:21 +0300

Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I fixed it. Now it works with 127.

I think that we can just remove blk_queue_max_phys_segments since the ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.

Who should send a patch upstream? (I cc'ed Jeff Garzik)

Boaz

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..3660f3e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 	ata_scsi_sdev_config(sdev);

-	blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
-
 	sdev->manage_start_stop = 1;

I don't mind the patch, but could someone refresh me as to the context? Is there something wrong with the above code, or is it simply redundant to the scsi_host_template settings in each LLDD?

Jeff

Hi Jeff,

What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128), then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what happens if SCSI-ml sets a lower value? It will then crash on an unexpectedly high sg count. My first patch was an "if", but Tomo said that it is redundant since drivers do that already. So I guess it is your call. Can it be removed, or do we need a:

	if (sdev->request_queue->max_phys_segments < LIBATA_MAX_PRD)

Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to set, because what a driver wants to see is what comes out the other side of dma_map_sg() (which is controlled by hw_segments), not what goes in. However, I could see libata having an interest in phys_segments if the physical region descriptors are going to the device via PIO ... is that what this is for?

LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements permitted by the HBA's DMA engine for a single ATA command. The PIO code doesn't really care about segments.

Jeff
Re: [PATCH v2] add bidi support for block pc requests
On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:
James Bottomley wrote:
On Sun, 2007-06-03 at 10:45 +0300, Boaz Harrosh wrote:
Jeff Garzik wrote:
Boaz Harrosh wrote:
FUJITA Tomonori wrote:

From: Boaz Harrosh [EMAIL PROTECTED]
Subject: Re: [PATCH v2] add bidi support for block pc requests
Date: Thu, 17 May 2007 17:00:21 +0300

Yes Tomo found it at ata_scsi_slave_config(). Attached below the way I fixed it. Now it works with 127.

I think that we can just remove blk_queue_max_phys_segments since the ata drivers seem to set sg_tablesize to LIBATA_MAX_PRD.

Who should send a patch upstream? (I cc'ed Jeff Garzik)

Boaz

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index dd81fa7..3660f3e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -800,8 +800,6 @@ int ata_scsi_slave_config(struct scsi_device *sdev)
 	ata_scsi_sdev_config(sdev);

-	blk_queue_max_phys_segments(sdev->request_queue, LIBATA_MAX_PRD);
-
 	sdev->manage_start_stop = 1;

I don't mind the patch, but could someone refresh me as to the context? Is there something wrong with the above code, or is it simply redundant to the scsi_host_template settings in each LLDD?

Jeff

Hi Jeff,

What happens is that if SCSI-ml sets a higher value than LIBATA_MAX_PRD (=128), then everything is OK and libata-scsi will only see its LIBATA_MAX_PRD. But what happens if SCSI-ml sets a lower value? It will then crash on an unexpectedly high sg count. My first patch was an "if", but Tomo said that it is redundant since drivers do that already. So I guess it is your call. Can it be removed, or do we need a:

	if (sdev->request_queue->max_phys_segments < LIBATA_MAX_PRD)

Ordinarily, in SCSI, phys_segments is the wrong thing for a driver to set, because what a driver wants to see is what comes out the other side of dma_map_sg() (which is controlled by hw_segments), not what goes in. However, I could see libata having an interest in phys_segments if the physical region descriptors are going to the device via PIO ... is that what this is for?

LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements permitted by the HBA's DMA engine, for a single ATA command.

Then it's the wrong parameter you're setting: phys_segments is what you have going into a dma_map_sg(), but hw_segments is what you end up with after it (mathematically, the mapping never splits segments, so hw_segments <= phys_segments always, but it's still the wrong way to bound this); so if you want to limit the SG elements in the HBA, you should set hw_segments, not phys_segments (hw_segments is what the sg_tablesize parameter of the host structure corresponds to).

However, I suspect this has to do with our iommu problem, doesn't it? The IOMMU may merge the segments beyond your 64k DMA boundary, so you need to split them again ... in that case you need phys_segments bounded, not hw_segments? I really wish we could get this iommu parameter problem fixed so we didn't have to have all of this confusing code.

The PIO code doesn't really care about segments.

OK ... I just thought it was PIO because the only time you should worry about phys_segments is if you're not going to do a dma_map_sg() on the scatterlist.

James
Re: [PATCH v2] add bidi support for block pc requests
James Bottomley wrote:
On Sat, 2007-07-07 at 11:27 -0400, Jeff Garzik wrote:

LIBATA_MAX_PRD is the maximum number of DMA scatter/gather elements permitted by the HBA's DMA engine, for a single ATA command.

Then it's the wrong parameter you're setting: phys_segments is what you have going into a dma_map_sg(), but hw_segments is what you end up with after it (mathematically, the mapping never splits segments, so hw_segments <= phys_segments always, but it's still the wrong way to bound this); so if you want to limit the SG elements in the HBA, you should set hw_segments, not phys_segments (hw_segments is what the sg_tablesize parameter of the host structure corresponds to).

Honestly, I'm betting the code is a result of paranoia^H^H^Hconservatism on my part: setting every DMA and segment limit I can find, in the hopes that somehow, somewhere, that information will get filtered down to the IOMMU and whatnot. :)

Given what you say above, and the fact that I set sg_tablesize, presumably Boaz's patch to remove phys_segment limiting in libata-scsi is the right thing to do. Your knowledge in this area is most likely stronger than mine.

However, I suspect this has to do with our iommu problem, doesn't it? The IOMMU may merge the segments beyond your 64k DMA boundary, so you need to split them again ... in that case you need phys_segments bounded, not hw_segments?

This is why I set every segment limitation I could find :) And then Ben tells me the IOMMU may ignore all of that anyway, and so I have code to split inside libata as well :)

I just know what IDE needs: an S/G entry shouldn't cross a 64k boundary, nor exceed 64k in size. The maximum number of S/G entries is actually a guess. Hard data is tough to come by. Some DMA engines will cycle through all of memory until they hit a stop bit. Others have limits yet to be discovered. :)

Jeff
Problem: NULL pointer dereference when disconnecting SAS expander-expander link
Hello,

While testing some new SAS hardware, I have encountered an issue that results in an "Unable to handle kernel NULL pointer dereference" message from the kernel. The stack trace taken from syslog output is attached. The problem occurs when connecting and then disconnecting an external cable between two JBOD disk boxes. The problem does not seem to occur when connecting and disconnecting a single disk box directly to the HBA.

To reproduce:

1. Boot with the hardware connected as pictured below. All 32 external disks are found and no problems are noticed.
2. Disconnect cable B. The 16 disks and the enclosure target from disk box 2 are removed with no errors noticed. There are some 'failed to synchronize cache' messages if the disks are not removed through /sys first, but the error will occur either way.
3. Reconnect cable B. There is no indication from the OS that anything has happened. I have tried waiting for over 2 minutes after connecting the cable.
4. Disconnect cable B again, and the attached messages are logged. A hard reset is then required to recover.

    Host w/ LSI3801E HBA (LSI1068E)
      |
      | Cable A
      |
    Disk box 1: LSISASx12A -+- LSISASx12A == 8 HDDs
                            \- LSISASx12A == 8 HDDs
      |
      | Cable B
      |
    Disk box 2: LSISASx12A -+- LSISASx12A == 8 HDDs
                            \- LSISASx12A == 8 HDDs

For the attached error, the disk boxes were full of SATA disks and the system was running the Debian backports.org 2.6.21-1-amd64 (2.6.21-4~bpo.1) kernel. The problem also seems to exist with the Debian etch 2.6.18-4-amd64 kernel. I am happy to try any kernel versions and configs that would be useful.

The diagram represents my current understanding of the expander setup in the disk boxes, but I could be mistaken. I can provide further details of the view of the hardware from the host if that is of interest.
The server also has an on-board LSI1064 connected to 4 internal SAS HDDs:

$ cat /proc/mpt/summary
ioc0: LSISAS1068E, FwRev=0112h, Ports=1, MaxQ=511, IRQ=19
ioc1: LSISAS1064, FwRev=01102800h, Ports=1, MaxQ=511, IRQ=58

I will continue to investigate and will report any findings, but any help in resolving the issue would be greatly appreciated.

--
Alex Winawer, Unix Systems Programmer
Systems Development Support, Oxford University Computing Services

Jul 6 09:46:05 just-read-the-instructions kernel: mptbase: ioc0: LogInfo(0x3005): Originator={IOP}, Code={Task Terminated}, SubCode(0x)
Jul 6 09:48:09 just-read-the-instructions kernel: Unable to handle kernel NULL pointer dereference at 02c0 RIP:
Jul 6 09:48:09 just-read-the-instructions kernel: [8025a762] mutex_lock+0x0/0xb
Jul 6 09:48:09 just-read-the-instructions kernel: PGD 10ce07067 PUD 10ea3e067 PMD 0
Jul 6 09:48:09 just-read-the-instructions kernel: Oops: 0002 [1] SMP
Jul 6 09:48:09 just-read-the-instructions kernel: CPU 0
Jul 6 09:48:09 just-read-the-instructions kernel: Modules linked in: raid456 xor ipv6 iptable_mangle iptable_nat nf_nat xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink ipt_owner ipt_REJECT xt_limit ipt_LOG xt_hashlimit ip6_tables ipt_addrtype iptable_filter ip_tables x_tables 8021q serio_raw psmouse i2c_nforce2 shpchp i2c_core pci_hotplug pcspkr k8temp sg sr_mod cdrom joydev evdev ext3 jbd mbcache dm_mirror dm_snapshot dm_mod raid1 md_mod ide_generic ata_generic sata_nv libata sd_mod generic usb_storage usbhid hid mptsas mptscsih mptbase scsi_transport_sas amd74xx e1000 scsi_mod forcedeth ide_core ohci_hcd ehci_hcd thermal processor fan
Jul 6 09:48:09 just-read-the-instructions kernel: Pid: 14, comm: events/0 Not tainted 2.6.21-1-amd64 #1
Jul 6 09:48:09 just-read-the-instructions kernel: RIP: 0010:[8025a762] [8025a762] mutex_lock+0x0/0xb
Jul 6 09:48:09 just-read-the-instructions kernel: RSP: 0018:810120201c88 EFLAGS: 00010246
Jul 6 09:48:09 just-read-the-instructions kernel: RAX: RBX: 81011b1f3000 RCX:
Jul 6 09:48:09 just-read-the-instructions kernel: RDX: 81011a9784c0 RSI: 81011b1f3000 RDI: 02c0
Jul 6 09:48:09 just-read-the-instructions kernel: RBP: 0004 R08: 000c R09: 81011b3392a0
Jul 6 09:48:09 just-read-the-instructions kernel: R10: fff4 R11: 810120201ca8 R12:
Jul 6 09:48:09 just-read-the-instructions kernel: R13: 02c0 R14: R15: 05b0
Jul 6 09:48:09 just-read-the-instructions kernel: FS: 2b86508c56d0() GS:804d9000()