Re: [PATCH 2/5] scsi: improved eh timeout handler

2014-02-11 Thread Christoph Hellwig
Hi Hannes,

I'll need a little reminder how we came to the conclusion that the
cancel_delayed_work in scsi_put_command was safe and we didn't need
a cancel_delayed_work_sync or flush_delayed_work.  I remember we had
that discussion, but it seems that same doesn't apply to the equivalent
call in the blk-mq codepath in my tree.  If you remember anything that
might make my life debugging this a bit easier.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2014-02-11 Thread Hannes Reinecke
On 02/11/2014 03:01 PM, Christoph Hellwig wrote:
 Hi Hannes,
 
 I'll need a little reminder how we came to the conclusion that the
 cancel_delayed_work in scsi_put_command was safe and we didn't need
 a cancel_delayed_work_sync or flush_delayed_work.  I remember we had
 that discussion, but it seems that same doesn't apply to the equivalent
 call in the blk-mq codepath in my tree.  If you remember anything that
 might make my life debugging this a bit easier.
 

Well, _actually_ the cancel_delayed_work should be pointless; I've
just added it as a terminal measure here.
(It'd actually be an idea to insert a BUG_ON() here ...)

Thing is whenever the eh_timeout thingie kicks in we most definitely
know there's a command in flight, and hence scsi_command_put()
should _never_ be called.
Only after eh_abort has finished the command will be returned via
scsi_command_put(), but then eh_abort is done for, too, and no item
should remain in the workqueue.

HTH.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2014-02-11 Thread Christoph Hellwig
On Tue, Feb 11, 2014 at 03:29:10PM +0100, Hannes Reinecke wrote:
 Well, _actually_ the cancel_delayed_work should be pointless; I've
 just added it as a terminal measure here.
 (It'd actually be an idea to insert a BUG_ON() here ...)
 
 Thing is whenever the eh_timeout thingie kicks in we most definitely
 know there's a command in flight, and hence scsi_command_put()
 should _never_ be called.
 Only after eh_abort has finished the command will be returned via
 scsi_command_put(), but then eh_abort is done for, too, and no item
 should remain in the workqueue.

The issue I saw actually was with a different workqueue, sorry for the
noise.  I have to say I really hate the generic workqueue workers which
make it almost impossible to debug issues before they hit the actual
worker function..

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2014-02-11 Thread Hannes Reinecke
On 02/12/2014 08:45 AM, Christoph Hellwig wrote:
 On Tue, Feb 11, 2014 at 03:29:10PM +0100, Hannes Reinecke wrote:
 Well, _actually_ the cancel_delayed_work should be pointless; I've
 just added it as a terminal measure here.
 (It'd actually be an idea to insert a BUG_ON() here ...)

 Thing is whenever the eh_timeout thingie kicks in we most definitely
 know there's a command in flight, and hence scsi_command_put()
 should _never_ be called.
 Only after eh_abort has finished the command will be returned via
 scsi_command_put(), but then eh_abort is done for, too, and no item
 should remain in the workqueue.
 
 The issue I saw actually was with a different workqueue, sorry for the
 noise.  I have to say I really hate the generic workqueue workers which
 make it almost impossible to debug issues before they hit the actual
 worker function..
 
Oh, workqueues are fun, no doubt.

_Especially_ when some poor deluded soul executes I/O from userspace
tasks running with RT priorities.

Handling that properly would be a fitting subject for LSF ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] scsi: improved eh timeout handler

2013-11-11 Thread Hannes Reinecke
When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new scsi_abort_command() function which
invokes an asynchronous function scsi_eh_abort_handler() to
abort the commands via the usual 'eh_abort_handler'.

If abort succeeds the command is either retried or terminated,
depending on the number of allowed retries. However, 'eh_eflags'
records the abort, so if the retry would fail again the
command is pushed onto the error handler without trying to
abort it (again); it'll be cleared up from SCSI EH.

Signed-off-by: Hannes Reinecke h...@suse.de
Cc: Ren Mingxin re...@cn.fujitsu.com
Cc: Christoph Hellwig h...@infradead.org
---
 drivers/scsi/hosts.c  |  14 -
 drivers/scsi/scsi.c   |   3 +
 drivers/scsi/scsi_error.c | 151 ++
 drivers/scsi/scsi_priv.h  |   2 +
 include/scsi/scsi_cmnd.h  |   1 +
 include/scsi/scsi_host.h  |  10 +++
 6 files changed, 167 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f334859..3b619819 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -169,6 +169,7 @@ void scsi_remove_host(struct Scsi_Host *shost)
spin_unlock_irqrestore(shost-host_lock, flags);
 
scsi_autopm_get_host(shost);
+   flush_workqueue(shost-tmf_work_q);
scsi_forget_host(shost);
mutex_unlock(shost-scan_mutex);
scsi_proc_host_rm(shost);
@@ -294,6 +295,8 @@ static void scsi_host_dev_release(struct device *dev)
 
scsi_proc_hostdir_rm(shost-hostt);
 
+   if (shost-tmf_work_q)
+   destroy_workqueue(shost-tmf_work_q);
if (shost-ehandler)
kthread_stop(shost-ehandler);
if (shost-work_q)
@@ -360,7 +363,6 @@ struct Scsi_Host *scsi_host_alloc(struct scsi_host_template 
*sht, int privsize)
INIT_LIST_HEAD(shost-eh_cmd_q);
INIT_LIST_HEAD(shost-starved_list);
init_waitqueue_head(shost-host_wait);
-
mutex_init(shost-scan_mutex);
 
/*
@@ -443,9 +445,19 @@ struct Scsi_Host *scsi_host_alloc(struct 
scsi_host_template *sht, int privsize)
goto fail_kfree;
}
 
+   shost-tmf_work_q = alloc_workqueue(scsi_tmf_%d,
+   WQ_UNBOUND | WQ_MEM_RECLAIM,
+  1, shost-host_no);
+   if (!shost-tmf_work_q) {
+   printk(KERN_WARNING scsi%d: failed to create tmf workq\n,
+  shost-host_no);
+   goto fail_kthread;
+   }
scsi_proc_hostdir_add(shost-hostt);
return shost;
 
+ fail_kthread:
+   kthread_stop(shost-ehandler);
  fail_kfree:
kfree(shost);
return NULL;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index fe0bcb1..2b04a57 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, 
gfp_t gfp_mask)
 
cmd-device = dev;
INIT_LIST_HEAD(cmd-list);
+   INIT_DELAYED_WORK(cmd-abort_work, scmd_eh_abort_handler);
spin_lock_irqsave(dev-list_lock, flags);
list_add_tail(cmd-list, dev-cmd_list);
spin_unlock_irqrestore(dev-list_lock, flags);
@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(cmd-list);
spin_unlock_irqrestore(cmd-device-list_lock, flags);
 
+   cancel_delayed_work(cmd-abort_work);
+
__scsi_put_command(cmd-device-host, cmd, sdev-sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 67c0014..cab3c9b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -53,6 +53,8 @@ static void scsi_eh_done(struct scsi_cmnd *scmd);
 #define HOST_RESET_SETTLE_TIME  (10)
 
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
+static int scsi_try_to_abort_cmd(struct scsi_host_template *,
+struct scsi_cmnd *);
 
 /* called with shost-host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -100,6 +102,116 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
 }
 
 /**
+ * scmd_eh_abort_handler - Handle command aborts
+ * @work:  command to be aborted.
+ */
+void
+scmd_eh_abort_handler(struct work_struct *work)
+{
+   struct scsi_cmnd *scmd =
+   container_of(work, struct scsi_cmnd, abort_work.work);
+   struct scsi_device *sdev = scmd-device;
+   unsigned long flags;
+   int rtn;
+
+   spin_lock_irqsave(sdev-host-host_lock, flags);
+   if (scsi_host_eh_past_deadline(sdev-host)) {
+   spin_unlock_irqrestore(sdev-host-host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+ 

Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-09 Thread James Bottomley
On Thu, 2013-11-07 at 07:45 +0100, Hannes Reinecke wrote:
 On 11/06/2013 06:23 PM, Mike Christie wrote:
  On 11/05/2013 10:48 PM, Hannes Reinecke wrote:
  On 11/05/2013 08:19 PM, Mike Christie wrote:
  On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
  +
  +scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
  +SCSI_LOG_ERROR_RECOVERY(3,
  +scmd_printk(KERN_INFO, scmd,
  +scmd %p abort scheduled\n, scmd));
  +schedule_delayed_work(scmd-abort_work, HZ / 100);
  +return SUCCESS;
  +}
 
  Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?
 
  Errm. Yes, why?
 
  I must admit I'm not _that_ familiar with workqueues ...
  Care to explain?
 
  
  We all share the above workqueue_structs pool of threads, so if we get
  stuck behind code doing GFP_KERNEL allocs that end up needing to write
  data to the disk we are now trying to aborts on, then we could get
  stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
  created at workqueue_struct create time which can get used in cases like
  that so we can always make forward progress.
  
 Ah. Right. Yes, that makes sense.
 
 I guess I'll have to redo the patches _yet again_.

So when were you going to send the redo?  The merge window is now open
and Linus was shirty with me about a late merge last time ...

James

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-09 Thread Hannes Reinecke

On 11/09/2013 09:35 AM, James Bottomley wrote:

On Thu, 2013-11-07 at 07:45 +0100, Hannes Reinecke wrote:

On 11/06/2013 06:23 PM, Mike Christie wrote:

On 11/05/2013 10:48 PM, Hannes Reinecke wrote:

On 11/05/2013 08:19 PM, Mike Christie wrote:

On 11/04/2013 11:05 PM, Hannes Reinecke wrote:

+
+   scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   scmd %p abort scheduled\n, scmd));
+   schedule_delayed_work(scmd-abort_work, HZ / 100);
+   return SUCCESS;
+}


Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?


Errm. Yes, why?

I must admit I'm not _that_ familiar with workqueues ...
Care to explain?



We all share the above workqueue_structs pool of threads, so if we get
stuck behind code doing GFP_KERNEL allocs that end up needing to write
data to the disk we are now trying to aborts on, then we could get
stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
created at workqueue_struct create time which can get used in cases like
that so we can always make forward progress.


Ah. Right. Yes, that makes sense.

I guess I'll have to redo the patches _yet again_.


So when were you going to send the redo?  The merge window is now open
and Linus was shirty with me about a late merge last time ...


Patch is prepared, I'll just have to run a verification on it.
Will be sending the updated patchset on Monday.

Cheers,

Hannes
--
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-08 Thread Hannes Reinecke
On 11/07/2013 07:33 PM, Douglas Gilbert wrote:
 On 13-11-07 01:45 AM, Hannes Reinecke wrote:
 On 11/06/2013 06:23 PM, Mike Christie wrote:
 On 11/05/2013 10:48 PM, Hannes Reinecke wrote:
 On 11/05/2013 08:19 PM, Mike Christie wrote:
 On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
 +
 +scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 +SCSI_LOG_ERROR_RECOVERY(3,
 +scmd_printk(KERN_INFO, scmd,
 +scmd %p abort scheduled\n, scmd));
 +schedule_delayed_work(scmd-abort_work, HZ / 100);
 +return SUCCESS;
 +}

 Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM
 set?

 Errm. Yes, why?

 I must admit I'm not _that_ familiar with workqueues ...
 Care to explain?


 We all share the above workqueue_structs pool of threads, so if
 we get
 stuck behind code doing GFP_KERNEL allocs that end up needing to
 write
 data to the disk we are now trying to aborts on, then we could get
 stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
 created at workqueue_struct create time which can get used in
 cases like
 that so we can always make forward progress.

 Ah. Right. Yes, that makes sense.

 I guess I'll have to redo the patches _yet again_.
 
 I wonder if it might be useful to flag a LU (disk)
 with try really hard to recover me, perhaps at the
 expense of other LUs. Seems like a LU containing the
 rootfs or swap might qualify for setting such a flag.
 And LUs that have this flag cleared could be assumed
 to not get wedged in the fashion that Mike pointed out.
 
While this would be a good idea in general, I would _very much_ see
to have this patch accepted first. Without that proviso
any discussion is pretty much moot anyway.
So I would like to defer that until the patch has been accepted.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-07 Thread Douglas Gilbert

On 13-11-07 01:45 AM, Hannes Reinecke wrote:

On 11/06/2013 06:23 PM, Mike Christie wrote:

On 11/05/2013 10:48 PM, Hannes Reinecke wrote:

On 11/05/2013 08:19 PM, Mike Christie wrote:

On 11/04/2013 11:05 PM, Hannes Reinecke wrote:

+
+   scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   scmd %p abort scheduled\n, scmd));
+   schedule_delayed_work(scmd-abort_work, HZ / 100);
+   return SUCCESS;
+}


Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?


Errm. Yes, why?

I must admit I'm not _that_ familiar with workqueues ...
Care to explain?



We all share the above workqueue_structs pool of threads, so if we get
stuck behind code doing GFP_KERNEL allocs that end up needing to write
data to the disk we are now trying to aborts on, then we could get
stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
created at workqueue_struct create time which can get used in cases like
that so we can always make forward progress.


Ah. Right. Yes, that makes sense.

I guess I'll have to redo the patches _yet again_.


I wonder if it might be useful to flag a LU (disk)
with try really hard to recover me, perhaps at the
expense of other LUs. Seems like a LU containing the
rootfs or swap might qualify for setting such a flag.
And LUs that have this flag cleared could be assumed
to not get wedged in the fashion that Mike pointed out.

Doug Gilbert


--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-06 Thread Christoph Hellwig
On Tue, Nov 05, 2013 at 11:19:27AM -0800, Mike Christie wrote:
  +   scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
  +   SCSI_LOG_ERROR_RECOVERY(3,
  +   scmd_printk(KERN_INFO, scmd,
  +   scmd %p abort scheduled\n, scmd));
  +   schedule_delayed_work(scmd-abort_work, HZ / 100);
  +   return SUCCESS;
  +}
 
 Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?

That's probably required, I can think of all kinds of issues otherwise.

Not in this patch series, but in the near future it might make sense 
to convert the whole EH thread to that workqueue.
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-06 Thread Mike Christie
On 11/05/2013 10:48 PM, Hannes Reinecke wrote:
 On 11/05/2013 08:19 PM, Mike Christie wrote:
 On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
 +
 +   scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 +   SCSI_LOG_ERROR_RECOVERY(3,
 +   scmd_printk(KERN_INFO, scmd,
 +   scmd %p abort scheduled\n, scmd));
 +   schedule_delayed_work(scmd-abort_work, HZ / 100);
 +   return SUCCESS;
 +}

 Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?

 Errm. Yes, why?
 
 I must admit I'm not _that_ familiar with workqueues ...
 Care to explain?
 

We all share the above workqueue_structs pool of threads, so if we get
stuck behind code doing GFP_KERNEL allocs that end up needing to write
data to the disk we are now trying to aborts on, then we could get
stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
created at workqueue_struct create time which can get used in cases like
that so we can always make forward progress.

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-06 Thread Hannes Reinecke
On 11/06/2013 03:47 PM, Christoph Hellwig wrote:
 On Tue, Nov 05, 2013 at 11:19:27AM -0800, Mike Christie wrote:
 +   scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 +   SCSI_LOG_ERROR_RECOVERY(3,
 +   scmd_printk(KERN_INFO, scmd,
 +   scmd %p abort scheduled\n, scmd));
 +   schedule_delayed_work(scmd-abort_work, HZ / 100);
 +   return SUCCESS;
 +}

 Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?
 
 That's probably required, I can think of all kinds of issues otherwise.
 
 Not in this patch series, but in the near future it might make sense 
 to convert the whole EH thread to that workqueue.
 
And given the recent issues we had with RT scheduling I guess this
is the way to go.

_After_ this patchset gets in, that is ...

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-06 Thread Hannes Reinecke
On 11/06/2013 06:23 PM, Mike Christie wrote:
 On 11/05/2013 10:48 PM, Hannes Reinecke wrote:
 On 11/05/2013 08:19 PM, Mike Christie wrote:
 On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
 +
 +  scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 +  SCSI_LOG_ERROR_RECOVERY(3,
 +  scmd_printk(KERN_INFO, scmd,
 +  scmd %p abort scheduled\n, scmd));
 +  schedule_delayed_work(scmd-abort_work, HZ / 100);
 +  return SUCCESS;
 +}

 Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?

 Errm. Yes, why?

 I must admit I'm not _that_ familiar with workqueues ...
 Care to explain?

 
 We all share the above workqueue_structs pool of threads, so if we get
 stuck behind code doing GFP_KERNEL allocs that end up needing to write
 data to the disk we are now trying to aborts on, then we could get
 stuck. With WQ_MEM_RECLAIM, we have our own backup thread that gets
 created at workqueue_struct create time which can get used in cases like
 that so we can always make forward progress.
 
Ah. Right. Yes, that makes sense.

I guess I'll have to redo the patches _yet again_.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-05 Thread Mike Christie
On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
 +
 + scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 + SCSI_LOG_ERROR_RECOVERY(3,
 + scmd_printk(KERN_INFO, scmd,
 + scmd %p abort scheduled\n, scmd));
 + schedule_delayed_work(scmd-abort_work, HZ / 100);
 + return SUCCESS;
 +}

Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] scsi: improved eh timeout handler

2013-11-05 Thread Hannes Reinecke
On 11/05/2013 08:19 PM, Mike Christie wrote:
 On 11/04/2013 11:05 PM, Hannes Reinecke wrote:
 +
 +scmd-eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 +SCSI_LOG_ERROR_RECOVERY(3,
 +scmd_printk(KERN_INFO, scmd,
 +scmd %p abort scheduled\n, scmd));
 +schedule_delayed_work(scmd-abort_work, HZ / 100);
 +return SUCCESS;
 +}
 
 Do we want to use our own workqueue_struct with WQ_MEM_RECLAIM set?
 
Errm. Yes, why?

I must admit I'm not _that_ familiar with workqueues ...
Care to explain?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/5] scsi: improved eh timeout handler

2013-11-04 Thread Hannes Reinecke
When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new scsi_abort_command() function which
invokes an asynchronous function scsi_eh_abort_handler() to
abort the commands via the usual 'eh_abort_handler'.

If abort succeeds the command is either retried or terminated,
depending on the number of allowed retries. However, 'eh_eflags'
records the abort, so if the retry would fail again the
command is pushed onto the error handler without trying to
abort it (again); it'll be cleared up from SCSI EH.

Signed-off-by: Hannes Reinecke h...@suse.de
Cc: Ren Mingxin re...@cn.fujitsu.com
Cc: Christoph Hellwig h...@infradead.org
---
 drivers/scsi/scsi.c   |   3 +
 drivers/scsi/scsi_error.c | 151 ++
 drivers/scsi/scsi_priv.h  |   2 +
 include/scsi/scsi_cmnd.h  |   1 +
 include/scsi/scsi_host.h  |   5 ++
 5 files changed, 149 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index fe0bcb1..2b04a57 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -297,6 +297,7 @@ struct scsi_cmnd *scsi_get_command(struct scsi_device *dev, 
gfp_t gfp_mask)
 
cmd-device = dev;
INIT_LIST_HEAD(cmd-list);
+   INIT_DELAYED_WORK(cmd-abort_work, scmd_eh_abort_handler);
spin_lock_irqsave(dev-list_lock, flags);
list_add_tail(cmd-list, dev-cmd_list);
spin_unlock_irqrestore(dev-list_lock, flags);
@@ -353,6 +354,8 @@ void scsi_put_command(struct scsi_cmnd *cmd)
list_del_init(cmd-list);
spin_unlock_irqrestore(cmd-device-list_lock, flags);
 
+   cancel_delayed_work(cmd-abort_work);
+
__scsi_put_command(cmd-device-host, cmd, sdev-sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_put_command);
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 67c0014..7eecbb5 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -53,6 +53,8 @@ static void scsi_eh_done(struct scsi_cmnd *scmd);
 #define HOST_RESET_SETTLE_TIME  (10)
 
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
+static int scsi_try_to_abort_cmd(struct scsi_host_template *,
+struct scsi_cmnd *);
 
 /* called with shost-host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -100,6 +102,116 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host 
*shost)
 }
 
 /**
+ * scmd_eh_abort_handler - Handle command aborts
+ * @work:  command to be aborted.
+ */
+void
+scmd_eh_abort_handler(struct work_struct *work)
+{
+   struct scsi_cmnd *scmd =
+   container_of(work, struct scsi_cmnd, abort_work.work);
+   struct scsi_device *sdev = scmd-device;
+   unsigned long flags;
+   int rtn;
+
+   spin_lock_irqsave(sdev-host-host_lock, flags);
+   if (scsi_host_eh_past_deadline(sdev-host)) {
+   spin_unlock_irqrestore(sdev-host-host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   scmd %p eh timeout, not aborting\n,
+   scmd));
+   } else {
+   spin_unlock_irqrestore(sdev-host-host_lock, flags);
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   aborting command %p\n, scmd));
+   rtn = scsi_try_to_abort_cmd(sdev-host-hostt, scmd);
+   if (rtn == SUCCESS) {
+   scmd-result |= DID_TIME_OUT  16;
+   if (!scsi_noretry_cmd(scmd) 
+   (++scmd-retries = scmd-allowed)) {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   scmd %p retry 
+   aborted command\n, scmd));
+   scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+   } else {
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_WARNING, scmd,
+   scmd %p finish 
+   aborted command\n, scmd));
+   scsi_finish_command(scmd);
+   }
+   return;
+   }
+   SCSI_LOG_ERROR_RECOVERY(3,
+   scmd_printk(KERN_INFO, scmd,
+   scmd %p abort failed, rtn %d\n,
+   scmd, rtn));
+   }
+
+   if (!scsi_eh_scmd_add(scmd, 0)) {
+