Re: MPT Fusion initialization in 2.6.23.12
On Tue, Jan 15, 2008 at 08:53:05AM -0800, Grant Grundler wrote:
> On Jan 15, 2008 8:37 AM, Karen Shaeffer [EMAIL PROTECTED] wrote:
> > This is a Sun X4200 M2 Netra server. The LSI SAS1064 RAID-enabled
> > Ultra320 SCSI controller supports RAID0, RAID1, and RAID1E modes.
> > The implementation uses RAID1, so this chip is RAID-enabled.
>
> http://www.sun.com/servers/netra/x4200/specs.xml says you only have
> SAS and NOT u320 SCSI (unless an add-on u320 SCSI card is installed).
> LSI u320 SCSI cards can support RAID0/1, but HP-UX didn't ship with
> RAID enabled because of firmware bugs (which may have been fixed in
> the mean time). If you post lspci -vt and/or lspci output, it would be
> clear what's in the box.

Hi,

~$ lspci -vt
-+-[80]-+-00.0  nVidia Corporation: Unknown device 005e
 |      +-01.0  nVidia Corporation: Unknown device 00d3
 |      +-0a.0  nVidia Corporation: Unknown device 0057
 |      +-0b.0-[81]--
 |      +-0c.0-[82]--
 |      +-0d.0-[83-85]--+-00.0-[85]--
 |      |               \-00.2-[84]--
 |      +-0e.0-[86]--
 |      +-10.0-[87]--
 |      +-10.1  Advanced Micro Devices [AMD]: Unknown device 7459
 |      +-11.0-[8e]--+-01.0  Intel Corp. 82546EB Gigabit Ethernet Controller (Copper)
 |      |            +-01.1  Intel Corp. 82546EB Gigabit Ethernet Controller (Copper)
 |      |            \-02.0  LSI Logic / Symbios Logic: Unknown device 0050
 |      \-11.1  Advanced Micro Devices [AMD]: Unknown device 7459
 \-[00]-+-00.0  nVidia Corporation: Unknown device 005e
        +-01.0  nVidia Corporation: Unknown device 0051
        +-01.1  nVidia Corporation: Unknown device 0052
        +-02.0  nVidia Corporation: Unknown device 005a
        +-02.1  nVidia Corporation: Unknown device 005b
        +-06.0  nVidia Corporation: Unknown device 0053
        +-09.0-[01]--03.0  ATI Technologies Inc Rage XL
        +-0a.0  nVidia Corporation: Unknown device 0057
        +-0b.0-[02]--
        +-0c.0-[03]--
        +-0d.0-[04-06]--+-00.0-[06]--
        |               \-00.2-[05]--
        +-0e.0-[07]--
        +-18.0  Advanced Micro Devices [AMD] K8 NorthBridge
        +-18.1  Advanced Micro Devices [AMD] K8 NorthBridge
        +-18.2  Advanced Micro Devices [AMD] K8 NorthBridge
        +-18.3  Advanced Micro Devices [AMD] K8 NorthBridge
        +-19.0  Advanced Micro Devices [AMD] K8 NorthBridge
        +-19.1  Advanced Micro Devices [AMD] K8 NorthBridge
        +-19.2  Advanced Micro Devices [AMD] K8 NorthBridge
        \-19.3  Advanced Micro Devices [AMD] K8 NorthBridge

8e:02.0 SCSI storage controller: LSI Logic / Symbios Logic: Unknown device 0050 (rev 02)
        Subsystem: LSI Logic / Symbios Logic: Unknown device 3060
        Flags: bus master, 66MHz, medium devsel, latency 72, IRQ 18
        I/O ports at e400 [disabled] [size=256]
        Memory at fe9bc000 (64-bit, non-prefetchable) [size=16K]
        Memory at fe9a (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at fe60 [disabled] [size=2M]
        Capabilities: [50] Power Management version 2
        Capabilities: [98] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
        Capabilities: [68] PCI-X non-bridge device.
        Capabilities: [b0] #11 []

This is, of course, on the system board. I will be following up on this
issue soon. I was swamped with other issues today.

Thanks,
Karen
--
Karen Shaeffer
Neuralscape, Palo Alto, Ca. 94306
[EMAIL PROTECTED]
http://www.neuralscape.com
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] use dynamically allocated sense buffer
On Jan. 15, 2008, 17:20 +0200, James Bottomley [EMAIL PROTECTED] wrote:
> On Tue, 2008-01-15 at 18:23 +0900, FUJITA Tomonori wrote:
>> This is the second version of
>> http://marc.info/?l=linux-scsi&m=119933628210006&w=2
>>
>> I gave up once, but I found that the performance loss is negligible
>> (within 1%) by using kmem_cache_alloc instead of mempool. I used
>> scsi_debug with fake_rw=1 and disktest (DIO reads with 8 threads)
>> again:
>>
>> scsi-misc (slub)         | 486.9 MB/s  IOPS 124652.9/s
>> dynamic sense buf (slub) | 483.2 MB/s  IOPS 123704.1/s
>> scsi-misc (slab)         | 467.0 MB/s  IOPS 119544.3/s
>> dynamic sense buf (slab) | 468.7 MB/s  IOPS 119986.0/s
>>
>> The results are the averages of three runs with a server using two
>> dual-core 1.60 GHz Xeon processors with DDR2 memory. I doubt that
>> anyone will complain about a performance regression due to this
>> patch. In addition, unlike scsi_debug, the real LLDs allocate their
>> own data structure per scsi_cmnd, so the performance differences
>> would be smaller (and with the real hard disk overheads). Here are
>> the full results:
>>
>> http://www.kernel.org/pub/linux/kernel/people/tomo/sense/results.txt
>
> Heh, that's one of those good news, bad news things. Certainly good
> news for you. The bad news for the rest of us is that you just
> implicated mempool in a performance problem, and since mempools are
> the core of the SCSI scatterlist allocations and sit at the heart of
> the critical path in SCSI, we have a potential performance issue in
> the whole of SCSI.
>
> James

Looking at mempool's code this is peculiar, as what seems to be its
critical path for alloc and free looks pretty harmless and lightweight.
Maybe an extra memory barrier, a spin_{,un}lock_*, and two extra
function calls (one of which could be eliminated, BTW, if the order of
arguments to the mempool_{alloc,free}_t functions were the same as for
kmem_cache_{alloc,free}).

Benny
Re: Actually using the sg table/chain code
On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote:
> On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley [EMAIL PROTECTED] wrote:
>> On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote:
>>> On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley [EMAIL PROTECTED] wrote:
>>>> I thought, now we have this new shiny code to increase the
>>>> scatterlist table size, I'd try it out. It turns out there's a
>>>> pretty vast block conspiracy that prevents us going over 128
>>>> entries in a scatterlist.
>>>>
>>>> The first problems are in SCSI: the host parameters sg_tablesize
>>>> and max_sectors are used to set the queue limits max_hw_segments
>>>> and max_sectors respectively (the former is the maximum number of
>>>> entries the HBA can tolerate in a scatterlist for each
>>>> transaction, the latter is a total transfer cap on the maximum
>>>> number of 512-byte sectors). The default settings, assuming the
>>>> HBA doesn't vary them, are sg_tablesize at SG_ALL (255) and
>>>> max_sectors at SCSI_DEFAULT_MAX_SECTORS (1024). A quick
>>>> calculation shows the latter is actually 512k or 128 pages (at 4k
>>>> pages), hence the persistent 128 entry limit.
>>>>
>>>> However, raising max_sectors and sg_tablesize together still
>>>> doesn't help: there's actually an insidious limit sitting in the
>>>> block layer as well. This is what blk_queue_max_sectors says:
>>>>
>>>> void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors)
>>>> {
>>>> 	if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
>>>> 		max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
>>>> 		printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
>>>> 	}
>>>>
>>>> 	if (BLK_DEF_MAX_SECTORS > max_sectors)
>>>> 		q->max_hw_sectors = q->max_sectors = max_sectors;
>>>> 	else {
>>>> 		q->max_sectors = BLK_DEF_MAX_SECTORS;
>>>> 		q->max_hw_sectors = max_sectors;
>>>> 	}
>>>> }
>>>>
>>>> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS,
>>>> which is defined in blkdev.h to 1024, thus also forcing the queue
>>>> down to 128 scatterlist entries.
>>>>
>>>> Once I raised this limit as well, I was able to transfer over 128
>>>> scatterlist elements during benchmark test runs of normal I/O
>>>> (actually kernel compiles seem best; they hit 608 scatterlist
>>>> entries). So my question: is there any reason not to raise this
>>>> limit to something large (like 65536) or even eliminate it
>>>> altogether?
>>>>
>>>> James
>>>
>>> I have an old branch here where I've swept through the scsi drivers
>>> just to remove the SG_ALL limit. Unfortunately some drivers mean
>>> literally 255 when using SG_ALL. So I went driver by driver and
>>> carefully inspected the code to change it to something driver
>>> specific if they really meant 255. I used sg_tablesize = ~0; to
>>> indicate "I don't care, any will do", and a driver constant if
>>> there is a real limit, then removed SG_ALL at the end. Should I
>>> freshen up this branch and send it?
>>
>> By all means; however, I think having the defined constant SG_ALL is
>> useful (even if it is eventually just set to ~0): it means "I can
>> support any scatterlist size". Having the drivers that can't support
>> SG_ALL set sg_tablesize correctly is pretty vital.
>>
>> Thanks,
>> James
>
> OK, will do. I have found the old branch and am looking. I agree with
> you about SG_ALL. I will fix it to have a patch per changed driver,
> without changing SG_ALL, and then a final patch to just change
> SG_ALL.
>
> Boaz

James, hi.

Reinspecting the code, what should I do with drivers that do not
support chaining due to software that still does sglist++? Should I set
their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard code it to 128, and
put a FIXME: in the submit message? Or should we fix them first and
serialize this effort on top of those fixes? (Also in light of the
other email where you removed the chaining flag.)

Boaz
Re: INITIO scsi driver fails to work properly
On Wed, 2008-01-16 at 14:59 +0900, FUJITA Tomonori wrote:
> On Tue, 15 Jan 2008 09:16:06 -0600, James Bottomley [EMAIL PROTECTED] wrote:
>> On Sun, 2008-01-13 at 14:28 +0200, Filippos Papadopoulos wrote:
>>> On 1/11/08, James Bottomley [EMAIL PROTECTED] wrote:
>>>> On Fri, 2008-01-11 at 18:44 +0200, Filippos Papadopoulos wrote:
>>>>> On Jan 11, 2008 5:44 PM, James Bottomley [EMAIL PROTECTED] wrote:
>>>>> I haven't reported "initio: I/O port range 0x0 is busy".
>>>>
>>>> Sorry ... we appear to have several reporters of different bugs in
>>>> this thread. That message was copied by Chuck Ebbert from a Red
>>>> Hat bugzilla ... I was assuming it was the same problem.
>>>>
>>>>> I applied the patch on 2.6.24-rc6-git9 but unfortunately the same
>>>>> thing happens.
>>>>
>>>> First off, has this driver ever worked for you in 2.6? Just
>>>> booting SLES9 (2.6.5) or RHEL4 (2.6.9) ... or one of their open
>>>> equivalents to check a really old kernel would be helpful. If you
>>>> can get it to work, then we can proceed with a patch reversion
>>>> regime based on the assumption that the problem is a recent
>>>> commit.
>>>
>>> Yes, it works under 2.6.16.13. See the beginning of this thread; I
>>> mention there some things about newer versions.
>>
>> Thanks; actually, I see this:
>>
>>> I tried to install OpenSUSE 10.3 (kernel 2.6.22.5) and the latest
>>> OpenSUSE 11.0 Alpha 0 (kernel 2.6.24-rc4), but although the initio
>>> driver gets loaded during the installation process, yast reports
>>> that no hard disk is found.
>>
>> Could you try with a vanilla 2.6.22 kernel? The reason for all of
>> this is that 2.6.22 predates Alan's conversion of this driver (which
>> was my 95% candidate for the source of the bug). I want you to try
>> the vanilla kernel just in case the opensuse one contains a
>> backport.
>>
>>> Yes, you are right. I compiled the vanilla 2.6.22 and the initio
>>> driver works. Tell me if you want to apply any patch to it.
>>
>> That's good news ... at least we know where the issue lies; now the
>> problem comes: there are two candidate patches for this issue,
>> Alan's driver update patch and Tomo's accessors patch.
>> Unfortunately, due to merge conflicts, the two are pretty hopelessly
>> intertwined. I think I already spotted one bug in the accessor
>> conversion, so I'll look at that again. Alan's also going to acquire
>> an initio board and retest his conversions. I'm afraid it might be a
>> while before we have anything for you to test.
>
> Can you try this patch?
>
> Thanks,
>
> diff --git a/drivers/scsi/initio.c b/drivers/scsi/initio.c
> index 01bf018..6891d2b 100644
> --- a/drivers/scsi/initio.c
> +++ b/drivers/scsi/initio.c
> @@ -2609,6 +2609,7 @@ static void initio_build_scb(struct initio_host * host, struct scsi_ctrl_blk * c
>  	cblk->bufptr = cpu_to_le32((u32)dma_addr);
>  	cmnd->SCp.dma_handle = dma_addr;
> +	cblk->sglen = nseg;
>  	cblk->flags |= SCF_SG;	/* Turn on SG list flag */
>  	total_len = 0;

We already tried a variant of this here:

http://marc.info/?l=linux-scsi&m=120002863806103&w=2

The answer was negative. Although I've saved the patch because it's
clearly one of the bugs.

James
Re: Actually using the sg table/chain code
On Tue, Jan 15 2008, James Bottomley wrote:
> I thought, now we have this new shiny code to increase the scatterlist
> table size, I'd try it out. It turns out there's a pretty vast block
> conspiracy that prevents us going over 128 entries in a scatterlist.
>
> [...]
>
> So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS, which
> is defined in blkdev.h to 1024, thus also forcing the queue down to
> 128 scatterlist entries.
>
> Once I raised this limit as well, I was able to transfer over 128
> scatterlist elements during benchmark test runs of normal I/O
> (actually kernel compiles seem best; they hit 608 scatterlist
> entries). So my question: is there any reason not to raise this limit
> to something large (like 65536) or even eliminate it altogether?

That function is meant for low level drivers to set their hw limits. So
ideally it should just set ->max_hw_sectors to what the driver asks
for. As Jeff mentions, a long time ago we experimentally decided that
going above 512k typically didn't yield any benefit, so Linux should
not generate commands larger than that for normal fs io. That is what
BLK_DEF_MAX_SECTORS does.

IOW, the driver calls blk_queue_max_sectors() with its real limit, 64mb
for instance. Linux then sets that as the hw limit, and puts a
reasonable limit on the generated size based on
throughput/latency/memory concerns. I think that is quite reasonable,
and there's nothing preventing users from setting a larger size using
sysfs by echoing something into queue/max_sectors_kb. You can set 512kb
there easily, as long as max_hw_sectors_kb is honored.

--
Jens Axboe
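The sysfs knob Jens refers to looks like the following (a configuration sketch; the disk name sda is an assumption for the example, and the soft limit can only be raised up to the read-only hard limit):

```shell
# Per-queue limits exported by the block layer (path assumes a disk named sda):
#   max_hw_sectors_kb - hard limit reported by the driver (read-only)
#   max_sectors_kb    - soft limit used for normal fs I/O (writable)
cat /sys/block/sda/queue/max_hw_sectors_kb
cat /sys/block/sda/queue/max_sectors_kb

# Raise the fs I/O cap to 512k; must not exceed max_hw_sectors_kb.
echo 512 > /sys/block/sda/queue/max_sectors_kb
```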
Re: Actually using the sg table/chain code
On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote:
> [...]
>
> Reinspecting the code, what should I do with drivers that do not
> support chaining due to software that still does sglist++? Should I
> set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard code it to
> 128, and put a FIXME: in the submit message? Or should we fix them
> first and serialize this effort on top of those fixes? (Also in light
> of the other email where you removed the chaining flag.)

How many of them are left? The correct value is clearly
SCSI_MAX_SG_SEGMENTS, which [PATCH] remove use_sg_chaining fortunately
moved into a shared header. Worst case, just use that and add a FIXME
comment giving the real value (if there is one).

James
Re: SCSI power management for AHCI
On Tue, 15 Jan 2008, Pavel Machek wrote:
> Hi!
>
> This is my first attempt at ahci autosuspend. It is _very_ hacky at
> this moment; I'll seriously need to clean it up. But it seems to work
> here.

How does this interact with Link Power Management? Should there be a
stronger connection between the two?

Alan Stern
Re: Actually using the sg table/chain code
On Wed, 2008-01-16 at 16:06 +0100, Jens Axboe wrote:
> On Tue, Jan 15 2008, James Bottomley wrote:
>> [...]
>>
>> So my question: is there any reason not to raise this limit to
>> something large (like 65536) or even eliminate it altogether?
>
> That function is meant for low level drivers to set their hw limits.
> So ideally it should just set ->max_hw_sectors to what the driver
> asks for. As Jeff mentions, a long time ago we experimentally decided
> that going above 512k typically didn't yield any benefit, so Linux
> should not generate commands larger than that for normal fs io. That
> is what BLK_DEF_MAX_SECTORS does.
>
> IOW, the driver calls blk_queue_max_sectors() with its real limit,
> 64mb for instance. Linux then sets that as the hw limit, and puts a
> reasonable limit on the generated size based on
> throughput/latency/memory concerns. I think that is quite reasonable,
> and there's nothing preventing users from setting a larger size using
> sysfs by echoing something into queue/max_sectors_kb. You can set
> 512kb there easily, as long as max_hw_sectors_kb is honored.

Yes, I can buy the argument for filesystem I/O. What about tapes, which
currently use the block queue and have internal home grown stuff to
handle larger transfers ... how are they supposed to set the larger
default sector size? Just modify the bare q->max_sectors?

James
Re: Actually using the sg table/chain code
On Wed, Jan 16 2008, James Bottomley wrote:
> [...]
>
> Yes, I can buy the argument for filesystem I/O. What about tapes,
> which currently use the block queue and have internal home grown
> stuff to handle larger transfers ... how are they supposed to set the
> larger default sector size? Just modify the bare q->max_sectors?

Yep, either that or we add a function for setting that.

--
Jens Axboe
Re: Actually using the sg table/chain code
On Wed, Jan 16 2008 at 17:09 +0200, James Bottomley [EMAIL PROTECTED] wrote:
> [...]
>
> How many of them are left? The correct value is clearly
> SCSI_MAX_SG_SEGMENTS, which [PATCH] remove use_sg_chaining fortunately
> moved into a shared header. Worst case, just use that and add a FIXME
> comment giving the real value (if there is one).
>
> James

I have 9 up to now and 10 more drivers to check. All but one are
software-limited: they walk the scatterlist one entry at a time with
SCp.buffer++, so once that is fixed they should be able to go back to
SG_ALL. But for now I will set them to SCSI_MAX_SG_SEGMENTS as you
requested. I have not checked drivers that did not use SG_ALL, but I
trust those are usually smaller.

Boaz
Re: Actually using the sg table/chain code
On Wed, Jan 16 2008 at 18:11 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Wed, Jan 16 2008 at 17:09 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Wed, 2008-01-16 at 16:01 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 19:35 +0200, Boaz Harrosh [EMAIL PROTECTED] wrote: On Tue, Jan 15 2008 at 18:49 +0200, James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2008-01-15 at 18:09 +0200, Boaz Harrosh wrote: On Tue, Jan 15 2008 at 17:52 +0200, James Bottomley [EMAIL PROTECTED] wrote: I thought, now we had this new shiny code to increase the scatterlist table size I'd try it out. It turns out there's a pretty vast block conspiracy that prevents us going over 128 entries in a scatterlist. The first problems are in SCSI: The host parameters sg_tablesize and max_sectors are used to set the queue limits max_hw_segments and max_sectors respectively (the former is the maximum number of entries the HBA can tolerate in a scatterlist for each transaction, the latter is a total transfer cap on the maxiumum number of 512 byte sectors). The default settings, assuming the HBA doesn't vary them are sg_tablesize at SG_ALL (255) and max_sectors at SCSI_DEFAULT_MAX_SECTORS (1024). A quick calculation shows the latter is actually 512k or 128 pages (at 4k pages), hence the persistent 128 entry limit. However, raising max_sectors and sg_tablesize together still doesn't help: There's actually an insidious limit sitting in the block layer as well. 
This is what blk_queue_max_sectors says:

void blk_queue_max_sectors(struct request_queue *q, unsigned int max_sectors)
{
	if ((max_sectors << 9) < PAGE_CACHE_SIZE) {
		max_sectors = 1 << (PAGE_CACHE_SHIFT - 9);
		printk("%s: set to minimum %d\n", __FUNCTION__, max_sectors);
	}

	if (BLK_DEF_MAX_SECTORS > max_sectors)
		q->max_hw_sectors = q->max_sectors = max_sectors;
	else {
		q->max_sectors = BLK_DEF_MAX_SECTORS;
		q->max_hw_sectors = max_sectors;
	}
}

So it imposes a maximum possible setting of BLK_DEF_MAX_SECTORS, which is defined in blkdev.h to 1024, thus also forcing the queue down to 128 scatterlist entries. Once I raised this limit as well, I was able to transfer over 128 scatterlist elements during benchmark test runs of normal I/O (actually kernel compiles seem best; they hit 608 scatterlist entries). So my question: is there any reason not to raise this limit to something large (like 65536) or even eliminate it altogether? James

I have an old branch here where I've swept through the SCSI drivers just to remove the SG_ALL limit. Unfortunately some drivers literally mean 255 when using SG_ALL, so I went driver by driver and carefully inspected the code to change it to something driver-specific if they really meant 255. I have used sg_tablesize = ~0; to indicate "I don't care, any will do", and some driver constant if there is a real limit, then removed SG_ALL at the end. Should I freshen up this branch and send it?

By all means; however, I think having the defined constant SG_ALL is useful (even if it is eventually just set to ~0): it means "I can support any scatterlist size". Having the drivers that can't support SG_ALL set sg_tablesize correctly is pretty vital. Thanks, James

OK, will do. I have found the old branch and am looking. I agree with you about SG_ALL. I will fix it to have a patch per changed driver, without changing SG_ALL, and then a final patch to just change SG_ALL.
Boaz

James, hi. Reinspecting the code, what should I do with drivers that do not support chaining due to software that still does sglist++? Should I set their sg_tablesize to SG_MAX_SINGLE_ALLOC, or hard-code it to 128 and put a FIXME in the submit message? Or should we fix them first and serialize this effort on top of those fixes? (Also in light of the other email where you removed the chaining flag.)

How many of them are left? The correct value is clearly SCSI_MAX_SG_SEGMENTS, which fortunately "[PATCH] remove use_sg_chaining" moved into a shared header. Worst case, just use that and add a FIXME comment giving the real value (if there is one). James

I have 9 up to now and 10 more drivers to check. All but one are software-limited, walking the list one by one with SCp.buffer++, so once that's fixed they should be able to go back to SG_ALL. But for now I will set them to SCSI_MAX_SG_SEGMENTS as you requested. I have not checked drivers that did not use SG_ALL, but I trust those are usually smaller. Boaz

James, hi. Looking at the patches, I just realized that I made a mistake and did not work on top of your "[PATCH] remove use_sg_chaining". Now rebasing should be easy, but I think my patch should go first, because there are some 10-15 drivers that are not chain-ready but will work perfectly after my patch that sets sg_tablesize to SCSI_MAX_SG_SEGMENTS. Should I rebase, or should "[PATCH] remove use_sg_chaining" be rebased?
what is return code 70000
Hi, I already grepped, but I don't find the definition of return code = 0x00070000. Just got with FC and 2.4.18 of Scientific Linux:

sd 1:0:0:0: SCSI error: return code = 0x00070000
end_request: I/O error, dev sdb, sector 294388752
device-mapper: multipath: Failing path 8:16.
sd 1:0:1:0: SCSI error: return code = 0x00070000
end_request: I/O error, dev sdc, sector 1713114128
device-mapper: multipath: Failing path 8:32.
sd 2:0:1:0: SCSI error: return code = 0x00070000
end_request: I/O error, dev sde, sector 2094272016
device-mapper: multipath: Failing path 8:64.

Since I have some error handling patches in queue for 2.6.22, I would like to know if I would have caught this error, but 0x00070000 is pretty meaningless to me :( Thanks, Bernd -- Bernd Schubert Q-Leap Networks GmbH
Re: what is return code 70000
On Wed, 2008-01-16 at 19:13 +0100, Bernd Schubert wrote: Hi, I already grepped, but I don't find the definition of return code = 0x00070000. Just got with FC and 2.4.18 of Scientific Linux: sd 1:0:0:0: SCSI error: return code = 0x00070000 end_request: I/O error, dev sdb, sector 294388752 device-mapper: multipath: Failing path 8:16. sd 1:0:1:0: SCSI error: return code = 0x00070000 end_request: I/O error, dev sdc, sector 1713114128 device-mapper: multipath: Failing path 8:32. sd 2:0:1:0: SCSI error: return code = 0x00070000 end_request: I/O error, dev sde, sector 2094272016 device-mapper: multipath: Failing path 8:64. Since I have some error handling patches in queue for 2.6.22, I would like to know if I would have caught this error, but 0x00070000 is pretty meaningless to me :(

SCSI returns are 32-bit numbers with definitions in include/scsi/scsi.h, going (from lowest to highest):

1. status byte: the status return code from the command, if successfully executed
2. message byte: now misnamed; message is SPI-specific, so what it actually carries is task status interlaced with possible SPI message responses
3. host byte: these are the DID_ codes, specific error codes returned by drivers
4. driver byte: additional qualification of the error in the host byte

In your case, it's showing DID_ERROR. James
Re: what is return code 70000
On Wednesday 16 January 2008 19:27:43 James Bottomley wrote: On Wed, 2008-01-16 at 19:13 +0100, Bernd Schubert wrote: Hi, I already grepped, but I don't find the definition of return code = 0x00070000. Just got with FC and 2.4.18 of Scientific Linux: sd 1:0:0:0: SCSI error: return code = 0x00070000 end_request: I/O error, dev sdb, sector 294388752 device-mapper: multipath: Failing path 8:16. sd 1:0:1:0: SCSI error: return code = 0x00070000 end_request: I/O error, dev sdc, sector 1713114128 device-mapper: multipath: Failing path 8:32. sd 2:0:1:0: SCSI error: return code = 0x00070000 end_request: I/O error, dev sde, sector 2094272016 device-mapper: multipath: Failing path 8:64. Since I have some error handling patches in queue for 2.6.22, I would like to know if I would have caught this error, but 0x00070000 is pretty meaningless to me :(

SCSI returns are 32-bit numbers with definitions in include/scsi/scsi.h, going (from lowest to highest): 1. status byte: the status return code from the command, if successfully executed 2. message byte: now misnamed; message is SPI-specific, so what it actually carries is task status interlaced with possible SPI message responses 3. host byte: these are the DID_ codes, specific error codes returned by drivers 4. driver byte: additional qualification of the error in the host byte

Ah, thanks! Now I understand.

In your case, it's showing DID_ERROR.

Hmm, which means I wouldn't have caught it :( Well, this is Fibre Channel; so far I have only had trouble with native SCSI systems. Thanks again, Bernd -- Bernd Schubert Q-Leap Networks GmbH