Re: 2.6.23-rc9 boot failure (megaraid?)
On Thu, 4 Oct 2007 09:28:34 +0200 Jens Axboe [EMAIL PROTECTED] wrote: On Thu, Oct 04 2007, FUJITA Tomonori wrote: On Wed, 3 Oct 2007 17:32:55 -0600 Patro, Sumant [EMAIL PROTECTED] wrote: -Original Message- From: FUJITA Tomonori [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 02, 2007 5:01 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; DL-MegaRAID Linux; linux-scsi@vger.kernel.org Subject: Re: 2.6.23-rc9 boot failure (megaraid?) On Tue, 02 Oct 2007 15:38:13 -0500 James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2007-10-02 at 20:15 +0200, Adrian Bunk wrote: Cc's added, the complete bug report is at http://lkml.org/lkml/2007/10/2/243 On Tue, Oct 02, 2007 at 12:48:26PM -0400, Burton Windle wrote: 2.6.23-rc9 fails to boot for me; 2.6.22.9 works fine. System is a Dell Poweredge with PERC 2/DC with RAID1 volume. ... Thanks for your report. Diff'ing the dmesg's shows: -- snip -- scsi0: scanning scsi channel 4 [P0] for physical devices. scsi0: scanning scsi channel 5 [P1] for physical devices. st: Version 20070203, fixed bufsize 32768, s/g segs 256 -sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB) +sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512. +sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Asking for cache data failed sd 0:0:0:0: [sda] Assuming drive cache: write through -sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB) +sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512. +sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Asking for cache data failed sd 0:0:0:0: [sda] Assuming drive cache: write through sda: sda1 + sda: p1 exceeds device capacity -- snip -- - case MEGA_BULK_DATA: - if (scb-cmd-use_sg == 0) - length = scb-cmd-request_bufflen; - else { - struct scatterlist *sgl = - (struct scatterlist *)scb-cmd-request_buffer; - length = sgl-length; - } - pci_unmap_page(adapter-dev, scb-dma_h_bulkdata, - length, scb-dma_direction); - break; - This is the problem piece I think. We've reintroduced a very old bug: commit 51c928c34fa7cff38df584ad01de988805877dba Author: James Bottomley [EMAIL PROTECTED] Date: Sat Oct 1 09:38:05 2005 -0500 [SCSI] Legacy MegaRAID: Fix READ CAPACITY Some Legacy megaraid cards can't actually cope with the scatter/gather version of the READ CAPACITY command (which is what we now send them since altering all SCSI internal I/O to go via the block layer). Fix this (and a few other broken megaraid driver assumptions) by sending the non-sg version of the command if the sg list only has a single element. Signed-off-by: James Bottomley [EMAIL PROTECTED] So what we have to do is put back the check for use_sg == 1 and send that as a bulk transfer command. Sorry about this. Can this fix the problem? Thanks, diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 3907f67..da56163 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -1753,6 +1753,14 @@ mega_build_sglist(adapter_t *adapter, scb_t *scb, u32 *buf, u32 *len) *len = 0; + if (scsi_sg_count(cmd) == 1 !adapter-has_64bit_addr) { + sg = scsi_sglist(cmd); + scb-dma_h_bulkdata = sg_dma_address(sg); + *buf = (u32)scb-dma_h_bulkdata; + *len = sg_dma_len(sg); + return 0; + } + scsi_for_each_sg(cmd, sg, sgcnt, idx) { if (adapter-has_64bit_addr) { scb-sgl64[idx].address = sg_dma_address(sg); With this patch I see the correct logical disk size reported. Thanks. Great, thanks for testing! Can you try the following patch instead of the above patch? http://marc.info/?l=linux-scsim=119137033016550w=2 I know the changes are pretty trivial and it should work... Tomo, this is the patch I added. Thanks. I thought that it will be sent via scsi-misc because the scsi accessor patch introduced this bug. But either is ok with me. BTW,
Re: 2.6.23-rc9 boot failure (megaraid?)
On Thu, Oct 04 2007, FUJITA Tomonori wrote: On Thu, 4 Oct 2007 09:28:34 +0200 Jens Axboe [EMAIL PROTECTED] wrote: On Thu, Oct 04 2007, FUJITA Tomonori wrote: On Wed, 3 Oct 2007 17:32:55 -0600 Patro, Sumant [EMAIL PROTECTED] wrote: -Original Message- From: FUJITA Tomonori [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 02, 2007 5:01 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; DL-MegaRAID Linux; linux-scsi@vger.kernel.org Subject: Re: 2.6.23-rc9 boot failure (megaraid?) On Tue, 02 Oct 2007 15:38:13 -0500 James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2007-10-02 at 20:15 +0200, Adrian Bunk wrote: Cc's added, the complete bug report is at http://lkml.org/lkml/2007/10/2/243 On Tue, Oct 02, 2007 at 12:48:26PM -0400, Burton Windle wrote: 2.6.23-rc9 fails to boot for me; 2.6.22.9 works fine. System is a Dell Poweredge with PERC 2/DC with RAID1 volume. ... Thanks for your report. Diff'ing the dmesg's shows: -- snip -- scsi0: scanning scsi channel 4 [P0] for physical devices. scsi0: scanning scsi channel 5 [P1] for physical devices. st: Version 20070203, fixed bufsize 32768, s/g segs 256 -sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB) +sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512. +sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Asking for cache data failed sd 0:0:0:0: [sda] Assuming drive cache: write through -sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB) +sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512. +sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Asking for cache data failed sd 0:0:0:0: [sda] Assuming drive cache: write through sda: sda1 + sda: p1 exceeds device capacity -- snip -- - case MEGA_BULK_DATA: - if (scb-cmd-use_sg == 0) - length = scb-cmd-request_bufflen; - else { - struct scatterlist *sgl = - (struct scatterlist *)scb-cmd-request_buffer; - length = sgl-length; - } - pci_unmap_page(adapter-dev, scb-dma_h_bulkdata, -length, scb-dma_direction); - break; - This is the problem piece I think. We've reintroduced a very old bug: commit 51c928c34fa7cff38df584ad01de988805877dba Author: James Bottomley [EMAIL PROTECTED] Date: Sat Oct 1 09:38:05 2005 -0500 [SCSI] Legacy MegaRAID: Fix READ CAPACITY Some Legacy megaraid cards can't actually cope with the scatter/gather version of the READ CAPACITY command (which is what we now send them since altering all SCSI internal I/O to go via the block layer). Fix this (and a few other broken megaraid driver assumptions) by sending the non-sg version of the command if the sg list only has a single element. Signed-off-by: James Bottomley [EMAIL PROTECTED] So what we have to do is put back the check for use_sg == 1 and send that as a bulk transfer command. Sorry about this. Can this fix the problem? Thanks, diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 3907f67..da56163 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -1753,6 +1753,14 @@ mega_build_sglist(adapter_t *adapter, scb_t *scb, u32 *buf, u32 *len) *len = 0; + if (scsi_sg_count(cmd) == 1 !adapter-has_64bit_addr) { + sg = scsi_sglist(cmd); + scb-dma_h_bulkdata = sg_dma_address(sg); + *buf = (u32)scb-dma_h_bulkdata; + *len = sg_dma_len(sg); + return 0; + } + scsi_for_each_sg(cmd, sg, sgcnt, idx) { if (adapter-has_64bit_addr) { scb-sgl64[idx].address = sg_dma_address(sg); With this patch I see the correct logical disk size reported. Thanks. Great, thanks for testing! Can you try the following patch instead of the above patch? http://marc.info/?l=linux-scsim=119137033016550w=2 I know the changes are pretty trivial and it should work... Tomo, this is the patch I
Re: 2.6.23-rc9 boot failure (megaraid?)
On Thu, Oct 04, 2007 at 09:28:34AM +0200, Jens Axboe wrote: ... Tomo, this is the patch I added. Please excuse my comment in case this was already clear: You are aware that this bug is a regression in 2.6.23-rc and the patch should therefore go to Linus ASAP and not after the release of 2.6.23? Jens Axboe cu Adrian -- Is there not promise of rain? Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. Only a promise, Lao Er said. Pearl S. Buck - Dragon Seed - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc9 boot failure (megaraid?)
On Thu, 4 Oct 2007 12:48:58 +0200 Adrian Bunk [EMAIL PROTECTED] wrote: On Thu, Oct 04, 2007 at 09:28:34AM +0200, Jens Axboe wrote: ... Tomo, this is the patch I added. Please excuse my comment in case this was already clear: You are aware that this bug is a regression in 2.6.23-rc and the patch should therefore go to Linus ASAP and not after the release of 2.6.23? Oops, you are right. This should go via scsi-rc-fixes tree ASAP. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc9 boot failure (megaraid?)
On Thu, Oct 04 2007, FUJITA Tomonori wrote: On Thu, 4 Oct 2007 12:48:58 +0200 Adrian Bunk [EMAIL PROTECTED] wrote: On Thu, Oct 04, 2007 at 09:28:34AM +0200, Jens Axboe wrote: ... Tomo, this is the patch I added. Please excuse my comment in case this was already clear: You are aware that this bug is a regression in 2.6.23-rc and the patch should therefore go to Linus ASAP and not after the release of 2.6.23? Oops, you are right. This should go via scsi-rc-fixes tree ASAP. Irk, the scsi accessor stuff is already in, I forgot and thought it was pending for 2.6.24. So rush the patch upstream please! -- Jens Axboe - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 2.6.23-rc9 boot failure (megaraid?)
On Thu, 2007-10-04 at 12:36 +0200, Jens Axboe wrote: On Thu, Oct 04 2007, FUJITA Tomonori wrote: On Thu, 4 Oct 2007 09:28:34 +0200 Jens Axboe [EMAIL PROTECTED] wrote: On Thu, Oct 04 2007, FUJITA Tomonori wrote: On Wed, 3 Oct 2007 17:32:55 -0600 Patro, Sumant [EMAIL PROTECTED] wrote: -Original Message- From: FUJITA Tomonori [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 02, 2007 5:01 PM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; Patro, Sumant; DL-MegaRAID Linux; linux-scsi@vger.kernel.org Subject: Re: 2.6.23-rc9 boot failure (megaraid?) On Tue, 02 Oct 2007 15:38:13 -0500 James Bottomley [EMAIL PROTECTED] wrote: On Tue, 2007-10-02 at 20:15 +0200, Adrian Bunk wrote: Cc's added, the complete bug report is at http://lkml.org/lkml/2007/10/2/243 On Tue, Oct 02, 2007 at 12:48:26PM -0400, Burton Windle wrote: 2.6.23-rc9 fails to boot for me; 2.6.22.9 works fine. System is a Dell Poweredge with PERC 2/DC with RAID1 volume. ... Thanks for your report. Diff'ing the dmesg's shows: -- snip -- scsi0: scanning scsi channel 4 [P0] for physical devices. scsi0: scanning scsi channel 5 [P1] for physical devices. st: Version 20070203, fixed bufsize 32768, s/g segs 256 -sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB) +sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512. +sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Asking for cache data failed sd 0:0:0:0: [sda] Assuming drive cache: write through -sd 0:0:0:0: [sda] 17547264 512-byte hardware sectors (8984 MB) +sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512. +sd 0:0:0:0: [sda] 1 512-byte hardware sectors (0 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Asking for cache data failed sd 0:0:0:0: [sda] Assuming drive cache: write through sda: sda1 + sda: p1 exceeds device capacity -- snip -- - case MEGA_BULK_DATA: - if (scb-cmd-use_sg == 0) - length = scb-cmd-request_bufflen; - else { - struct scatterlist *sgl = - (struct scatterlist *)scb-cmd-request_buffer; - length = sgl-length; - } - pci_unmap_page(adapter-dev, scb-dma_h_bulkdata, - length, scb-dma_direction); - break; - This is the problem piece I think. We've reintroduced a very old bug: commit 51c928c34fa7cff38df584ad01de988805877dba Author: James Bottomley [EMAIL PROTECTED] Date: Sat Oct 1 09:38:05 2005 -0500 [SCSI] Legacy MegaRAID: Fix READ CAPACITY Some Legacy megaraid cards can't actually cope with the scatter/gather version of the READ CAPACITY command (which is what we now send them since altering all SCSI internal I/O to go via the block layer). Fix this (and a few other broken megaraid driver assumptions) by sending the non-sg version of the command if the sg list only has a single element. Signed-off-by: James Bottomley [EMAIL PROTECTED] So what we have to do is put back the check for use_sg == 1 and send that as a bulk transfer command. Sorry about this. Can this fix the problem? Thanks, diff --git a/drivers/scsi/megaraid.c b/drivers/scsi/megaraid.c index 3907f67..da56163 100644 --- a/drivers/scsi/megaraid.c +++ b/drivers/scsi/megaraid.c @@ -1753,6 +1753,14 @@ mega_build_sglist(adapter_t *adapter, scb_t *scb, u32 *buf, u32 *len) *len = 0; + if (scsi_sg_count(cmd) == 1 !adapter-has_64bit_addr) { + sg = scsi_sglist(cmd); + scb-dma_h_bulkdata = sg_dma_address(sg); + *buf = (u32)scb-dma_h_bulkdata; + *len = sg_dma_len(sg); + return 0; + } + scsi_for_each_sg(cmd, sg, sgcnt, idx) { if (adapter-has_64bit_addr) { scb-sgl64[idx].address = sg_dma_address(sg); With this patch I see the correct logical disk size reported. Thanks.
SCSI runtime power management
I'm trying to work out a simple framework for SCSI runtime management but am handicapped by lack of detailed knowledge of many of the nuances in the SCSI stack. Here's an initial proposal, modelled after the USB implementation: There are three reasons why a SCSI device might be suspended. They differ in how I/O requests get treated: System suspend: The device is suspended because the system as a whole is being suspended. Incoming requests should be deferred until the system and device are resumed. Manual suspend: The user has told the kernel (via sysfs) that the device should be suspended and should remain that way, unusable until the user says otherwise. Incoming requests should fail immediately. Autosuspend: The device has been idle for an appropriate time and is automatically suspended by the SCSI midlayer. Incoming requests should cause the device to be automatically resumed. Each of these would fall under the SDEV_BLOCK state in the state model, distinguished by a couple of bits in the scsi_device structure. Or maybe there should be a separate SDEV_SUSPENDED state? In general a device would count as idle when there are no requests in flight, although there will be a way for drivers to declare that a device is busy even when no I/O is going on. The autosuspend idle-time limit will be adjustable per-device by means of a sysfs attribute. Periodically, say every 10 seconds, a timer or workqueue routine will walk through all the devices attached to each host, checking to see which of them can be autosuspended. Conversely, there will be a workqueue used for autoresuming; it will be invoked by scsi_prep_fn() whenever a request arrives for an autosuspended device. Each host structure will maintain a count of the number of unsuspended devices on its bus. (It's not clear whether an offline device should contribute to the count.) When that count drops to 0, the midlayer will call the lower-level driver through a new autosuspend method pointer in the host template, and the LLD can then autosuspend the host adapter. When the count rises above 0, the midlayer will call the LLD through a new autoresume method pointer and the LLD can then autoresume the host adapter. It might make sense to add an SHOST_SUSPENDED state, or it might be unnecessary. I don't know how any of this would interact with the link power management work Kristen has been doing. Perhaps they would cooperate nicely. Does this seem reasonable? Too naive? Not workable for some reason? Can anyone provide suggestions or helpful criticism? Thanks, Alan Stern P.S.: James, why is it that some requests are allowed to go through scsi_prep_fn() when the device is in the SDEV_BLOCK state, only to fail later in scsi_dispatch_cmd()? - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PATCH] critical megaraid bug fix for 2.6.23
This is a critical fix for the reported megaraid inability to boot: http://lkml.org/lkml/2007/10/2/243 The patch is here: master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6.git And the description and diffstat: commit d5e89385e92a77b2764d9eb8284808a7628cb2a8 Author: FUJITA Tomonori [EMAIL PROTECTED] Date: Wed Oct 3 09:00:58 2007 +0900 [SCSI] megaraid_old: fix READ_CAPACITY The bulk transfer mode got eleminated by 3f6270ef76f2ce5c134615a470685d6c2a66c07e. Unfortunately, this mode is required for READ_CAPACITY commands on certain cards, so put it back again. This fixes a boot failure regression reported by Burton Windle. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] Signed-off-by: James Bottomley [EMAIL PROTECTED] megaraid.c |8 1 file changed, 8 insertions(+) James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 02/17] PCI Error Recovery: Symbios SCSI base support
On Tue, Oct 02, 2007 at 03:49:26PM -0600, Matthew Wilcox wrote: On Tue, Oct 02, 2007 at 02:38:00PM -0700, [EMAIL PROTECTED] wrote: From: Linas Vepstas [EMAIL PROTECTED] Various PCI bus errors can be signaled by newer PCI controllers. This patch adds the PCI error recovery callbacks to the Symbios SCSI device driver. The patch has been tested, and appears to work well. Linas and I have been discussing the problems with this patch. I think we have a solution; we certainly have something in my tree that's acceptable to me; he'd jus like to test it before it's unleashed on the world. Matthew, your fix was a patch on top of my patch ... I assume you want to submit it that way, instead of reworking this patch? Anyway, I finally got a chance to run it yesterday, it worked fine. I'll try to make final coments in the other thread. --linas - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] [PATCH 0/2] blk request timeout handler patches
I refreshed the blk layer timeout patch from Mike Christie to 2.6.23-rc8-mm1. Fixed couple of places where scsi_dispatch_cmd() wasn't stopping timers. Thanks, Malahal. - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[RFC] [PATCH 2/2] blk request timeout handler patches
Fix scsi_dispatch_cmd() to stop timers. Signed-off-by: Malahal Naineni [EMAIL PROTECTED] diff -r 870bb598c977 drivers/scsi/scsi.c --- a/drivers/scsi/scsi.c Thu Sep 27 00:25:38 2007 -0700 +++ b/drivers/scsi/scsi.c Thu Sep 27 01:04:10 2007 -0700 @@ -471,14 +471,19 @@ int scsi_dispatch_cmd(struct scsi_cmnd * unsigned long timeout; int rtn = 0; + /* +* We will use a queued command if possible, otherwise we will +* emulate the queuing and calling of completion function ourselves. +*/ + atomic_inc(cmd-device-iorequest_cnt); + /* check if the device is still usable */ if (unlikely(cmd-device-sdev_state == SDEV_DEL)) { /* in SDEV_DEL we error all commands. DID_NO_CONNECT * returns an immediate error upwards, and signals * that the device is no longer present */ cmd-result = DID_NO_CONNECT 16; - atomic_inc(cmd-device-iorequest_cnt); - __blk_complete_request(cmd-request); + scsi_done(cmd); /* return 0 (because the command has been processed) */ goto out; } @@ -491,7 +496,8 @@ int scsi_dispatch_cmd(struct scsi_cmnd * * future requests should not occur until the device * transitions out of the suspend state. */ - scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY); + + scsi_queue_retry(cmd, SCSI_MLQUEUE_DEVICE_BUSY); SCSI_LOG_MLQUEUE(3, printk(queuecommand : device blocked \n)); @@ -536,12 +542,6 @@ int scsi_dispatch_cmd(struct scsi_cmnd * scsi_log_send(cmd); /* -* We will use a queued command if possible, otherwise we will -* emulate the queuing and calling of completion function ourselves. -*/ - atomic_inc(cmd-device-iorequest_cnt); - - /* * Before we queue this command, check if the command * length exceeds what the host adapter can handle. */ @@ -571,12 +571,8 @@ int scsi_dispatch_cmd(struct scsi_cmnd * } spin_unlock_irqrestore(host-host_lock, flags); if (rtn) { - if (blk_delete_timer(cmd-request)) { - atomic_inc(cmd-device-iodone_cnt); - scsi_queue_insert(cmd, - (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ? - rtn : SCSI_MLQUEUE_HOST_BUSY); - } + scsi_queue_retry(cmd, (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ? + rtn : SCSI_MLQUEUE_HOST_BUSY); SCSI_LOG_MLQUEUE(3, printk(queuecommand : request rejected\n)); } diff -r 870bb598c977 drivers/scsi/scsi_lib.c --- a/drivers/scsi/scsi_lib.c Thu Sep 27 00:25:38 2007 -0700 +++ b/drivers/scsi/scsi_lib.c Thu Sep 27 01:16:28 2007 -0700 @@ -160,6 +160,36 @@ int scsi_queue_insert(struct scsi_cmnd * return 0; } + +/* + * Function:scsi_queue_retry() + * + * Purpose: Try inserting a command in the midlevel queue. + * + * Arguments: cmd- command that we are adding to queue. + * reason - why we are inserting command to queue. + * + * Lock status: Assumed that lock is not held upon entry. + * + * Returns: Nothing. + * + * Notes: This is very similar to scsi_queue_insert except that we + * call this function when we don't know if the blk layer timer + * is active or not. We could implement this either by calling + * blk_delete_timer and inserting in the midlevel queue if we + * successfully delete the timer OR setting appropriate result + * field in the cmd and letting it go through the normal done + * routines which will retry the command. For now, We call + * blk_delete_timer! + */ +void scsi_queue_retry(struct scsi_cmnd *cmd, int reason) +{ + if (blk_delete_timer(cmd-request)) { + atomic_inc(cmd-device-iodone_cnt); + scsi_queue_insert(cmd, reason); + } +} + /** * scsi_execute - insert request and wait for the result diff -r 870bb598c977 drivers/scsi/scsi_priv.h --- a/drivers/scsi/scsi_priv.h Thu Sep 27 00:25:38 2007 -0700 +++ b/drivers/scsi/scsi_priv.h Thu Sep 27 01:03:39 2007 -0700 @@ -64,6 +64,7 @@ extern int scsi_maybe_unblock_host(struc extern int scsi_maybe_unblock_host(struct scsi_device *sdev); extern void scsi_device_unbusy(struct scsi_device *sdev); extern int scsi_queue_insert(struct scsi_cmnd *cmd, int reason); +extern void scsi_queue_retry(struct scsi_cmnd *cmd, int reason); extern void scsi_next_command(struct scsi_cmnd *cmd); extern void scsi_run_host_queues(struct Scsi_Host *shost); extern struct request_queue *scsi_alloc_queue(struct scsi_device *sdev); - To unsubscribe from this list: send the line unsubscribe
[RFC] [PATCH 1/2] blk request timeout handler patches
Mike Christie's patches refreshed to 2.6.23-rc8-mm1. Signed-off-by: Mike Christie [EMAIL PROTECTED] Signed-off-by: Malahal Naineni [EMAIL PROTECTED] diff -r 3697367c6e4d block/ll_rw_blk.c --- a/block/ll_rw_blk.c Thu Sep 27 00:12:13 2007 -0700 +++ b/block/ll_rw_blk.c Thu Sep 27 00:13:07 2007 -0700 @@ -181,6 +181,19 @@ void blk_queue_softirq_done(struct reque EXPORT_SYMBOL(blk_queue_softirq_done); +void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout) +{ + q-rq_timeout = timeout; +} +EXPORT_SYMBOL_GPL(blk_queue_rq_timeout); + +void blk_queue_rq_timed_out(struct request_queue *q, rq_timed_out_fn *fn) +{ + q-rq_timed_out_fn = fn; +} + +EXPORT_SYMBOL_GPL(blk_queue_rq_timed_out); + /** * blk_queue_make_request - define an alternate make_request function for a device * @q: the request queue for the device to be affected @@ -243,7 +256,9 @@ static void rq_init(struct request_queue { INIT_LIST_HEAD(rq-queuelist); INIT_LIST_HEAD(rq-donelist); - + init_timer(rq-timer); + + rq-timeout = 0; rq-errors = 0; rq-bio = rq-biotail = NULL; INIT_HLIST_NODE(rq-hash); @@ -2305,6 +2320,7 @@ EXPORT_SYMBOL(blk_start_queueing); */ void blk_requeue_request(struct request_queue *q, struct request *rq) { + blk_delete_timer(rq); blk_add_trace_rq(q, rq, BLK_TA_REQUEUE); if (blk_rq_tagged(rq)) @@ -3647,8 +3663,121 @@ static struct notifier_block blk_cpu_not }; /** + * blk_delete_timer - Delete/cancel timer for a given function. + * @req: request that we are canceling timer for + * + * Return value: + * 1 if we were able to detach the timer. 0 if we blew it, and the + * timer function has already started to run. + **/ +int blk_delete_timer(struct request *req) +{ + int rtn; + + if (!req-q-rq_timed_out_fn) + return 1; + + rtn = del_timer(req-timer); + req-timer.data = (unsigned long)NULL; + req-timer.function = NULL; + + return rtn; +} + +EXPORT_SYMBOL_GPL(blk_delete_timer); + +static void blk_rq_timed_out(struct request *req) +{ + struct request_queue *q = req-q; + + switch (q-rq_timed_out_fn(req)) { + case BLK_EH_HANDLED: + __blk_complete_request(req); + return; + case BLK_EH_RESET_TIMER: + blk_add_timer(req); + return; + case BLK_EH_NOT_HANDLED: + /* +* LLD handles this for now but in the future +* we can send a request msg to abort the command +* and we can move more of the generic scsi eh code to +* the blk layer. +*/ + return; + } +} + +/** + * blk_abort_req -- Request request recovery for the specified command + * req:pointer to the request of interest + * + * This function requests that the block layer start recovery for the + * request by deleting the timer and calling the q's timeout function. + * LLDDs who implement their own error recovery MAY ignore the timeout + * event if they generated blk_abort_req. + */ +void blk_abort_req(struct request *req) +{ +if (!blk_delete_timer(req)) +return; +blk_rq_timed_out(req); +} + +EXPORT_SYMBOL_GPL(blk_abort_req); + +/** + * blk_add_timer - Start timeout timer for a single request + * @req: request that is about to start running. + * + * Notes: + *Each request has its own timer, and as it is added to the queue, we + *set up the timer. When the request completes, we cancel the timer. + **/ +void blk_add_timer(struct request *req) +{ + struct request_queue *q = req-q; + + /* +* If the clock was already running for this command, then +* first delete the timer. The timer handling code gets rather +* confused if we don't do this. +*/ + if (req-timer.function) + del_timer(req-timer); + + req-timer.data = (unsigned long)req; + if (req-timeout) + req-timer.expires = jiffies + req-timeout; + else + req-timer.expires = jiffies + q-rq_timeout; + req-timer.function = (void (*)(unsigned long))blk_rq_timed_out; +add_timer(req-timer); +} + +EXPORT_SYMBOL_GPL(blk_add_timer); + +void __blk_complete_request(struct request *req) +{ + struct list_head *cpu_list; + unsigned long flags; + + BUG_ON(!req-q-softirq_done_fn); + + local_irq_save(flags); + + cpu_list = __get_cpu_var(blk_cpu_done); + list_add_tail(req-donelist, cpu_list); + raise_softirq_irqoff(BLOCK_SOFTIRQ); + + local_irq_restore(flags); +} + +EXPORT_SYMBOL_GPL(__blk_complete_request); + +/** * blk_complete_request - end I/O on a request - * @req: the request being processed + * @req: the request being processed * * Description: * Ends all I/O on a request. It does not handle partial completions, @@
Re: [PATCH 2/2]: PCI Error Recovery: Symbios SCSI First Failure
On Mon, Oct 01, 2007 at 07:27:30PM -0600, Matthew Wilcox wrote: The thing to remember is that sym2 is in transition from being a dual BSD/Linux driver to being a purely Linux driver. I was wondering about that; couldn't tell if the split in the code was historical, or being intentionally maintained. My gut instinct is to say ack, although prudence dictates that I should test first. Which might take a few days... Fine by me. I tested the patch, it worked great. It also seemed to recover much more quickly -- so quickly, in fact, that I thought something had gone wrong. I reviewed it one more time, it really does look good. A formal submission and acked by's at earliest convenience would be good. --linas - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] [PATCH 2/2] blk request timeout handler patches
On Thu, 4 Oct 2007 11:20:03 -0700 [EMAIL PROTECTED] wrote: Fix scsi_dispatch_cmd() to stop timers. Signed-off-by: Malahal Naineni [EMAIL PROTECTED] diff -r 870bb598c977 drivers/scsi/scsi.c --- a/drivers/scsi/scsi.c Thu Sep 27 00:25:38 2007 -0700 +++ b/drivers/scsi/scsi.c Thu Sep 27 01:04:10 2007 -0700 @@ -471,14 +471,19 @@ int scsi_dispatch_cmd(struct scsi_cmnd * unsigned long timeout; int rtn = 0; + /* + * We will use a queued command if possible, otherwise we will + * emulate the queuing and calling of completion function ourselves. + */ + atomic_inc(cmd-device-iorequest_cnt); + /* check if the device is still usable */ if (unlikely(cmd-device-sdev_state == SDEV_DEL)) { /* in SDEV_DEL we error all commands. DID_NO_CONNECT * returns an immediate error upwards, and signals * that the device is no longer present */ cmd-result = DID_NO_CONNECT 16; - atomic_inc(cmd-device-iorequest_cnt); - __blk_complete_request(cmd-request); + scsi_done(cmd); /* return 0 (because the command has been processed) */ goto out; } @@ -491,7 +496,8 @@ int scsi_dispatch_cmd(struct scsi_cmnd * * future requests should not occur until the device * transitions out of the suspend state. */ - scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY); + + scsi_queue_retry(cmd, SCSI_MLQUEUE_DEVICE_BUSY); SCSI_LOG_MLQUEUE(3, printk(queuecommand : device blocked \n)); @@ -536,12 +542,6 @@ int scsi_dispatch_cmd(struct scsi_cmnd * scsi_log_send(cmd); /* - * We will use a queued command if possible, otherwise we will - * emulate the queuing and calling of completion function ourselves. - */ - atomic_inc(cmd-device-iorequest_cnt); - - /* * Before we queue this command, check if the command * length exceeds what the host adapter can handle. */ @@ -571,12 +571,8 @@ int scsi_dispatch_cmd(struct scsi_cmnd * } spin_unlock_irqrestore(host-host_lock, flags); if (rtn) { - if (blk_delete_timer(cmd-request)) { - atomic_inc(cmd-device-iodone_cnt); - scsi_queue_insert(cmd, - (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ? - rtn : SCSI_MLQUEUE_HOST_BUSY); - } + scsi_queue_retry(cmd, (rtn == SCSI_MLQUEUE_DEVICE_BUSY) ? + rtn : SCSI_MLQUEUE_HOST_BUSY); SCSI_LOG_MLQUEUE(3, printk(queuecommand : request rejected\n)); } diff -r 870bb598c977 drivers/scsi/scsi_lib.c --- a/drivers/scsi/scsi_lib.c Thu Sep 27 00:25:38 2007 -0700 +++ b/drivers/scsi/scsi_lib.c Thu Sep 27 01:16:28 2007 -0700 @@ -160,6 +160,36 @@ int scsi_queue_insert(struct scsi_cmnd * return 0; } + +/* + * Function:scsi_queue_retry() + * + * Purpose: Try inserting a command in the midlevel queue. + * + * Arguments: cmd- command that we are adding to queue. + * reason - why we are inserting command to queue. + * + * Lock status: Assumed that lock is not held upon entry. + * + * Returns: Nothing. + * + * Notes: This is very similar to scsi_queue_insert except that we + * call this function when we don't know if the blk layer timer + * is active or not. We could implement this either by calling + * blk_delete_timer and inserting in the midlevel queue if we + * successfully delete the timer OR setting appropriate result + * field in the cmd and letting it go through the normal done + * routines which will retry the command. For now, We call + * blk_delete_timer! + */ Please use kernel-doc notation for the function interface doc. Thanks. +void scsi_queue_retry(struct scsi_cmnd *cmd, int reason) +{ + if (blk_delete_timer(cmd-request)) { + atomic_inc(cmd-device-iodone_cnt); + scsi_queue_insert(cmd, reason); + } +} --- ~Randy - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] [PATCH 1/2] blk request timeout handler patches
On Thu, 4 Oct 2007 11:17:50 -0700 [EMAIL PROTECTED] wrote: Mike Christie's patches refreshed to 2.6.23-rc8-mm1. Signed-off-by: Mike Christie [EMAIL PROTECTED] Signed-off-by: Malahal Naineni [EMAIL PROTECTED] diff -r 3697367c6e4d block/ll_rw_blk.c --- a/block/ll_rw_blk.c Thu Sep 27 00:12:13 2007 -0700 +++ b/block/ll_rw_blk.c Thu Sep 27 00:13:07 2007 -0700 @@ -181,6 +181,19 @@ void blk_queue_softirq_done(struct reque EXPORT_SYMBOL(blk_queue_softirq_done); +void blk_queue_rq_timeout(struct request_queue *q, unsigned int timeout) +{ + q-rq_timeout = timeout; +} +EXPORT_SYMBOL_GPL(blk_queue_rq_timeout); + +void blk_queue_rq_timed_out(struct request_queue *q, rq_timed_out_fn *fn) +{ + q-rq_timed_out_fn = fn; +} + Drop that blank line. +EXPORT_SYMBOL_GPL(blk_queue_rq_timed_out); + /** * blk_queue_make_request - define an alternate make_request function for a device * @q: the request queue for the device to be affected @@ -3647,8 +3663,121 @@ static struct notifier_block blk_cpu_not }; /** + * blk_delete_timer - Delete/cancel timer for a given function. + * @req: request that we are canceling timer for + * + * Return value: + * 1 if we were able to detach the timer. 0 if we blew it, and the + * timer function has already started to run. + **/ +int blk_delete_timer(struct request *req) +{ + int rtn; + + if (!req-q-rq_timed_out_fn) + return 1; + + rtn = del_timer(req-timer); + req-timer.data = (unsigned long)NULL; + req-timer.function = NULL; + + return rtn; +} + ditto. +EXPORT_SYMBOL_GPL(blk_delete_timer); + +static void blk_rq_timed_out(struct request *req) +{ + struct request_queue *q = req-q; + + switch (q-rq_timed_out_fn(req)) { + case BLK_EH_HANDLED: + __blk_complete_request(req); + return; + case BLK_EH_RESET_TIMER: + blk_add_timer(req); + return; + case BLK_EH_NOT_HANDLED: + /* + * LLD handles this for now but in the future + * we can send a request msg to abort the command + * and we can move more of the generic scsi eh code to + * the blk layer. + */ + return; + } +} + +/** + * blk_abort_req -- Request request recovery for the specified command + * req: pointer to the request of interest + * s/req:/@req:/ + * This function requests that the block layer start recovery for the + * request by deleting the timer and calling the q's timeout function. + * LLDDs who implement their own error recovery MAY ignore the timeout + * event if they generated blk_abort_req. + */ +void blk_abort_req(struct request *req) +{ +if (!blk_delete_timer(req)) +return; +blk_rq_timed_out(req); +} + drop blank line between closing } of function and the following EXPORT_SYMBOL... +EXPORT_SYMBOL_GPL(blk_abort_req); + +/** + * blk_add_timer - Start timeout timer for a single request + * @req: request that is about to start running. + * + * Notes: + *Each request has its own timer, and as it is added to the queue, we + *set up the timer. When the request completes, we cancel the timer. + **/ */ +void blk_add_timer(struct request *req) +{ + struct request_queue *q = req-q; + + /* + * If the clock was already running for this command, then + * first delete the timer. The timer handling code gets rather + * confused if we don't do this. + */ + if (req-timer.function) + del_timer(req-timer); + + req-timer.data = (unsigned long)req; + if (req-timeout) + req-timer.expires = jiffies + req-timeout; + else + req-timer.expires = jiffies + q-rq_timeout; + req-timer.function = (void (*)(unsigned long))blk_rq_timed_out; +add_timer(req-timer); +} + drop blank line. Please just check the rest of them. +EXPORT_SYMBOL_GPL(blk_add_timer); + +/** * blk_complete_request - end I/O on a request - * @req: the request being processed + * @req: the request being processed * * Description: * Ends all I/O on a request. It does not handle partial completions, @@ -3657,25 +3786,24 @@ static struct notifier_block blk_cpu_not * through a softirq handler. The user must have registered a completion * callback through blk_queue_softirq_done(). **/ */ - void blk_complete_request(struct request *req) { --- ~Randy - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC] [PATCH 1/2] blk request timeout handler patches
ACK for the trivial portion surrounding aacraid and ips! Sincerely -- Mark Salyzyn -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Thursday, October 04, 2007 2:18 PM To: linux-scsi@vger.kernel.org; [EMAIL PROTECTED] Subject: [RFC] [PATCH 1/2] blk request timeout handler patches Mike Christie's patches refreshed to 2.6.23-rc8-mm1. Signed-off-by: Mike Christie [EMAIL PROTECTED] Signed-off-by: Malahal Naineni [EMAIL PROTECTED] . . . diff -r 3697367c6e4d drivers/scsi/aacraid/aachba.c --- a/drivers/scsi/aacraid/aachba.c Thu Sep 27 00:12:13 2007 -0700 +++ b/drivers/scsi/aacraid/aachba.c Thu Sep 27 00:13:07 2007 -0700 @@ -1125,7 +1125,7 @@ static struct aac_srb * aac_scsi_common( srbcmd-id = cpu_to_le32(scmd_id(cmd)); srbcmd-lun = cpu_to_le32(cmd-device-lun); srbcmd-flags= cpu_to_le32(flag); - timeout = cmd-timeout_per_command/HZ; + timeout = cmd-request-timeout/HZ; if (timeout == 0) timeout = 1; srbcmd-timeout = cpu_to_le32(timeout); // timeout in seconds . . . diff -r 3697367c6e4d drivers/scsi/ips.c --- a/drivers/scsi/ips.c Thu Sep 27 00:12:13 2007 -0700 +++ b/drivers/scsi/ips.c Thu Sep 27 00:13:07 2007 -0700 @@ -3862,7 +3862,7 @@ ips_send_cmd(ips_ha_t * ha, ips_scb_t * scb-cmd.dcdb.segment_4G = 0; scb-cmd.dcdb.enhanced_sg = 0; - TimeOut = scb-scsi_cmd-timeout_per_command; + TimeOut = scb-scsi_cmd-request-timeout; if (ha-subsys-param[4] 0x0010) { /* If NEW Tape DCDB is Supported */ if (!scb-sg_len) { . . . - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html