Re: [PATCH 2/5] scsi: improved eh timeout handler
Hi Hannes, I'll need a little reminder how we came to the conclusion that the cancel_delayed_work in scsi_put_command was safe and we didn't need a cancel_delayed_work_sync or flush_delayed_work. I remember we had that discussion, but it seems that same doesn't apply to the equivalent call in the blk-mq codepath in my tree. If you remember anything that might make my life debugging this a bit easier. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 02/11/2014 03:01 PM, Christoph Hellwig wrote: Hi Hannes, I'll need a little reminder how we came to the conclusion that the cancel_delayed_work in scsi_put_command was safe and we didn't need a cancel_delayed_work_sync or flush_delayed_work. I remember we had that discussion, but it seems that same doesn't apply to the equivalent call in the blk-mq codepath in my tree. If you remember anything that might make my life debugging this a bit easier. Well, _actually_ the cancel_delayed_work should be pointless; I've just added it as a terminal measure here. (It'd actually be an idea to insert a BUG_ON() here ...) Thing is whenever the eh_timeout thingie kicks in we most definitely know there's a command in flight, and hence scsi_command_put() should _never_ be called. Only after eh_abort has finished the command will be returned via scsi_command_put(), but then eh_abort is done for, too, and no item should remain in the workqueue. HTH. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On Tue, Feb 11, 2014 at 03:29:10PM +0100, Hannes Reinecke wrote: Well, _actually_ the cancel_delayed_work should be pointless; I've just added it as a terminal measure here. (It'd actually be an idea to insert a BUG_ON() here ...) Thing is whenever the eh_timeout thingie kicks in we most definitely know there's a command in flight, and hence scsi_command_put() should _never_ be called. Only after eh_abort has finished the command will be returned via scsi_command_put(), but then eh_abort is done for, too, and no item should remain in the workqueue. The issue I saw actually was with a different workqueue, sorry for the noise. I have to say I really hate the generic workqueue workers which make it almost impossible to debug issues before they hit the actual worker function.. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 02/12/2014 08:45 AM, Christoph Hellwig wrote: On Tue, Feb 11, 2014 at 03:29:10PM +0100, Hannes Reinecke wrote: Well, _actually_ the cancel_delayed_work should be pointless; I've just added it as a terminal measure here. (It'd actually be an idea to insert a BUG_ON() here ...) Thing is whenever the eh_timeout thingie kicks in we most definitely know there's a command in flight, and hence scsi_command_put() should _never_ be called. Only after eh_abort has finished the command will be returned via scsi_command_put(), but then eh_abort is done for, too, and no item should remain in the workqueue. The issue I saw actually was with a different workqueue, sorry for the noise. I have to say I really hate the generic workqueue workers which make it almost impossible to debug issues before they hit the actual worker function.. Oh, workqueues are fun, no doubt. _Especially_ when some poor deluded soul executes I/O from userspace tasks running with RT priorities. Handling that properly would be a fitting subject for LSF ... Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] scsi: improved eh timeout handler
When a command runs into a timeout we need to send an 'ABORT TASK' TMF. This is typically done by the 'eh_abort_handler' LLDD callback. Conceptually, however, this function is a normal SCSI command, so there is no need to enter the error handler. This patch implements a new scsi_abort_command() function which invokes an asynchronous function scsi_eh_abort_handler() to abort the commands via the usual 'eh_abort_handler'. If abort succeeds the command is either retried or terminated, depending on the number of allowed retries. However, 'eh_eflags' records the abort, so if the retry would fail again the command is pushed onto the error handler without trying to abort it (again); it'll be cleared up from SCSI EH. Signed-off-by: Hannes Reinecke h...@suse.de Cc: Ren Mingxin re...@cn.fujitsu.com Cc: Christoph Hellwig h...@infradead.org --- drivers/scsi/hosts.c | 14 - drivers/scsi/scsi.c | 3 + drivers/scsi/scsi_error.c | 151 ++ drivers/scsi/scsi_priv.h | 2 + include/scsi/scsi_cmnd.h | 1 + include/scsi/scsi_host.h | 10 +++ 6 files changed, 167 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c index f334859..3b619819 100644 --- a/drivers/scsi/hosts.c +++ b/drivers/scsi/hosts.c @@ -169,6 +169,7 @@ void scsi_remove_host(struct Scsi_Host *shost) spin_unlock_irqrestore(shost-host_lock, flags); scsi_autopm_get_host(shost); + flush_workqueue(shost-tmf_work_q); scsi_forget_host(shost); mutex_unlock(shost-scan_mutex); scsi_proc_host_rm(shost); @@ -294,6 +295,8 @@ static void scsi_host_dev_release(struct device *dev) scsi_proc_hostdir_rm(shost-hostt); + if (shost-tmf_work_q) + destroy_workqueue(shost-tmf_work_q); if (shost-ehandler) kthread_stop(shost-ehandler); if (shost-work_q) @@ -360,7 +363,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) INIT_LIST_HEAD(shost-eh_cmd_q); INIT_LIST_HEAD(shost-starved_list); init_waitqueue_head(shost-host_wait); - mutex_init(shost-scan_mutex); /* @@ -443,9 +445,19 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template *sht, int privsize) goto fail_kfree; } + shost-tmf_work_q = alloc_workqueue(scsi_tmf_%d, + WQ_UNBOUND | WQ_MEM_RECLAIM, + 1, shost-host_no); + if (!shost-tmf_work_q) { + printk(KERN_WARNING scsi%d: failed to create tmf workq\n, + shost-host_no); + goto fail_kthread; + } scsi_proc_hostdir_add(shost-hostt); return shost; + fail_kthread: + kthread_stop(shost-ehandler); fail_kfree: kfree(shost); return NULL; diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index fe0bcb1..2b04a57 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask) cmd-device = dev; INIT_LIST_HEAD(cmd-list); + INIT_DELAYED_WORK(cmd-abort_work, scmd_eh_abort_handler); spin_lock_irqsave(dev-list_lock, flags); list_add_tail(cmd-list, dev-cmd_list); spin_unlock_irqrestore(dev-list_lock, flags); @@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd) list_del_init(cmd-list); spin_unlock_irqrestore(cmd-device-list_lock, flags); + cancel_delayed_work(cmd-abort_work); + __scsi_put_command(cmd-device-host, cmd, sdev-sdev_gendev); } EXPORT_SYMBOL(scsi_put_command); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 67c0014..cab3c9b 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -53,6 +53,8 @@ static void scsi_eh_done(struct scsi_cmnd *scmd); #define HOST_RESET_SETTLE_TIME (10) static int scsi_eh_try_stu(struct scsi_cmnd *scmd); +static int scsi_try_to_abort_cmd(struct scsi_host_template *, +struct scsi_cmnd *); /* called with shost-host_lock held */ void scsi_eh_wakeup(struct Scsi_Host *shost) @@ -100,6 +102,116 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) } /** + * scmd_eh_abort_handler - Handle command aborts + * @work: command to be aborted. + */ +void +scmd_eh_abort_handler(struct work_struct *work) +{ + struct scsi_cmnd *scmd = + container_of(work, struct scsi_cmnd, abort_work.work); + struct scsi_device *sdev = scmd-device; + unsigned long flags; + int rtn; + + spin_lock_irqsave(sdev-host-host_lock, flags); + if (scsi_host_eh_past_deadline(sdev-host)) { + spin_unlock_irqrestore(sdev-host-host_lock, flags); + SCSI_LOG_ERROR_RECOVERY(3, +
Re: [PATCH 2/5] scsi: improved eh timeout handler
On Thu, 2013-11-07 at 07:45 +0100, Hannes Reinecke wrote: On 11/06/2013 06:23 PM, Mike Christie wrote: On 11/05/2013 10:48 PM, Hannes Reinecke wrote: On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + +scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; +SCSI_LOG_ERROR_RECOVERY(3, +scmd_printk(KERN_INFO, scmd, +scmd %p abort scheduled\n, scmd)); +schedule_delayed_work(scmd-abort_work, HZ / 100); +return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? We all share the above workqueue_structs pool of threads, so if we get stuck behind code doing GFP_KERNEL allocs that end up needing to write data to the disk we are now trying to aborts on, then we could get stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets created at workqueue_struct create time which can get used in cases like that so we can always make forward progress. Ah. Right. Yes, that makes sense. I guess I'll have to redo the patches _yet again_. So when were you going to send the redo? The merge window is now open and Linus was shirty with me about a late merge last time ... James -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/09/2013 09:35 AM, James Bottomley wrote: On Thu, 2013-11-07 at 07:45 +0100, Hannes Reinecke wrote: On 11/06/2013 06:23 PM, Mike Christie wrote: On 11/05/2013 10:48 PM, Hannes Reinecke wrote: On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? We all share the above workqueue_structs pool of threads, so if we get stuck behind code doing GFP_KERNEL allocs that end up needing to write data to the disk we are now trying to aborts on, then we could get stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets created at workqueue_struct create time which can get used in cases like that so we can always make forward progress. Ah. Right. Yes, that makes sense. I guess I'll have to redo the patches _yet again_. So when were you going to send the redo? The merge window is now open and Linus was shirty with me about a late merge last time ... Patch is prepared, I'll just have to run a verification on it. Will be sending the updated patchset on Monday. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/07/2013 07:33 PM, Douglas Gilbert wrote: On 13-11-07 01:45 AM, Hannes Reinecke wrote: On 11/06/2013 06:23 PM, Mike Christie wrote: On 11/05/2013 10:48 PM, Hannes Reinecke wrote: On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + +scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; +SCSI_LOG_ERROR_RECOVERY(3, +scmd_printk(KERN_INFO, scmd, +scmd %p abort scheduled\n, scmd)); +schedule_delayed_work(scmd-abort_work, HZ / 100); +return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? We all share the above workqueue_structs pool of threads, so if we get stuck behind code doing GFP_KERNEL allocs that end up needing to write data to the disk we are now trying to aborts on, then we could get stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets created at workqueue_struct create time which can get used in cases like that so we can always make forward progress. Ah. Right. Yes, that makes sense. I guess I'll have to redo the patches _yet again_. I wonder if it might be useful to flag a LU (disk) with try really hard to recover me, perhaps at the expense of other LUs. Seems like a LU containing the rootfs or swap might qualify for setting such a flag. And LUs that have this flag cleared could be assumed to not get wedged in the fashion that Mike pointed out. While this would be a good idea in general, I would _very much_ see to have this patch accepted first. Without that proviso any discussion is pretty much moot anyway. So I would like to defer that until the patch has been accepted. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 13-11-07 01:45 AM, Hannes Reinecke wrote: On 11/06/2013 06:23 PM, Mike Christie wrote: On 11/05/2013 10:48 PM, Hannes Reinecke wrote: On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? We all share the above workqueue_structs pool of threads, so if we get stuck behind code doing GFP_KERNEL allocs that end up needing to write data to the disk we are now trying to aborts on, then we could get stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets created at workqueue_struct create time which can get used in cases like that so we can always make forward progress. Ah. Right. Yes, that makes sense. I guess I'll have to redo the patches _yet again_. I wonder if it might be useful to flag a LU (disk) with try really hard to recover me, perhaps at the expense of other LUs. Seems like a LU containing the rootfs or swap might qualify for setting such a flag. And LUs that have this flag cleared could be assumed to not get wedged in the fashion that Mike pointed out. Doug Gilbert -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On Tue, Nov 05, 2013 at 11:19:27AM -0800, Mike Christie wrote: + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? That's probably required, I can think of all kinds of issues otherwise. Not in this patch series, but in the near future it might make sense to convert the whole EH thread to that workqueue. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/05/2013 10:48 PM, Hannes Reinecke wrote: On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? We all share the above workqueue_structs pool of threads, so if we get stuck behind code doing GFP_KERNEL allocs that end up needing to write data to the disk we are now trying to aborts on, then we could get stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets created at workqueue_struct create time which can get used in cases like that so we can always make forward progress. -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/06/2013 03:47 PM, Christoph Hellwig wrote: On Tue, Nov 05, 2013 at 11:19:27AM -0800, Mike Christie wrote: + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? That's probably required, I can think of all kinds of issues otherwise. Not in this patch series, but in the near future it might make sense to convert the whole EH thread to that workqueue. And given the recent issues we had with RT scheduling I guess this is the way to go. _After_ this patchset gets in, that is ... Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/06/2013 06:23 PM, Mike Christie wrote: On 11/05/2013 10:48 PM, Hannes Reinecke wrote: On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? We all share the above workqueue_structs pool of threads, so if we get stuck behind code doing GFP_KERNEL allocs that end up needing to write data to the disk we are now trying to aborts on, then we could get stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets created at workqueue_struct create time which can get used in cases like that so we can always make forward progress. Ah. Right. Yes, that makes sense. I guess I'll have to redo the patches _yet again_. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort scheduled\n, scmd)); + schedule_delayed_work(scmd-abort_work, HZ / 100); + return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/5] scsi: improved eh timeout handler
On 11/05/2013 08:19 PM, Mike Christie wrote: On 11/04/2013 11:05 PM, Hannes Reinecke wrote: + +scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED; +SCSI_LOG_ERROR_RECOVERY(3, +scmd_printk(KERN_INFO, scmd, +scmd %p abort scheduled\n, scmd)); +schedule_delayed_work(scmd-abort_work, HZ / 100); +return SUCCESS; +} Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set? Errm. Yes, why? I must admit I'm not _that_ familiar with workqueues ... Care to explain? Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] scsi: improved eh timeout handler
When a command runs into a timeout we need to send an 'ABORT TASK' TMF. This is typically done by the 'eh_abort_handler' LLDD callback. Conceptually, however, this function is a normal SCSI command, so there is no need to enter the error handler. This patch implements a new scsi_abort_command() function which invokes an asynchronous function scsi_eh_abort_handler() to abort the commands via the usual 'eh_abort_handler'. If abort succeeds the command is either retried or terminated, depending on the number of allowed retries. However, 'eh_eflags' records the abort, so if the retry would fail again the command is pushed onto the error handler without trying to abort it (again); it'll be cleared up from SCSI EH. Signed-off-by: Hannes Reinecke h...@suse.de Cc: Ren Mingxin re...@cn.fujitsu.com Cc: Christoph Hellwig h...@infradead.org --- drivers/scsi/scsi.c | 3 + drivers/scsi/scsi_error.c | 151 ++ drivers/scsi/scsi_priv.h | 2 + include/scsi/scsi_cmnd.h | 1 + include/scsi/scsi_host.h | 5 ++ 5 files changed, 149 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c index fe0bcb1..2b04a57 100644 --- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, gfp_t gfp_mask) cmd-device = dev; INIT_LIST_HEAD(cmd-list); + INIT_DELAYED_WORK(cmd-abort_work, scmd_eh_abort_handler); spin_lock_irqsave(dev-list_lock, flags); list_add_tail(cmd-list, dev-cmd_list); spin_unlock_irqrestore(dev-list_lock, flags); @@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd) list_del_init(cmd-list); spin_unlock_irqrestore(cmd-device-list_lock, flags); + cancel_delayed_work(cmd-abort_work); + __scsi_put_command(cmd-device-host, cmd, sdev-sdev_gendev); } EXPORT_SYMBOL(scsi_put_command); diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 67c0014..7eecbb5 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -53,6 +53,8 @@ static void scsi_eh_done(struct scsi_cmnd *scmd); #define HOST_RESET_SETTLE_TIME (10) static int scsi_eh_try_stu(struct scsi_cmnd *scmd); +static int scsi_try_to_abort_cmd(struct scsi_host_template *, +struct scsi_cmnd *); /* called with shost-host_lock held */ void scsi_eh_wakeup(struct Scsi_Host *shost) @@ -100,6 +102,116 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost) } /** + * scmd_eh_abort_handler - Handle command aborts + * @work: command to be aborted. + */ +void +scmd_eh_abort_handler(struct work_struct *work) +{ + struct scsi_cmnd *scmd = + container_of(work, struct scsi_cmnd, abort_work.work); + struct scsi_device *sdev = scmd-device; + unsigned long flags; + int rtn; + + spin_lock_irqsave(sdev-host-host_lock, flags); + if (scsi_host_eh_past_deadline(sdev-host)) { + spin_unlock_irqrestore(sdev-host-host_lock, flags); + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p eh timeout, not aborting\n, + scmd)); + } else { + spin_unlock_irqrestore(sdev-host-host_lock, flags); + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + aborting command %p\n, scmd)); + rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd); + if (rtn == SUCCESS) { + scmd-result |= DID_TIME_OUT 16; + if (!scsi_noretry_cmd(scmd) + (++scmd-retries = scmd-allowed)) { + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_WARNING, scmd, + scmd %p retry + aborted command\n, scmd)); + scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY); + } else { + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_WARNING, scmd, + scmd %p finish + aborted command\n, scmd)); + scsi_finish_command(scmd); + } + return; + } + SCSI_LOG_ERROR_RECOVERY(3, + scmd_printk(KERN_INFO, scmd, + scmd %p abort failed, rtn %d\n, + scmd, rtn)); + } + + if (!scsi_eh_scmd_add(scmd, 0)) { +