Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Jeremy Linton
On 5/13/2013 12:46 AM, Hannes Reinecke wrote:

 True. But at the end of the day, we _do_ want to recover the failed LUN.
 If we were to disable that faulty LUN and continue running with the others
 we won't have a chance of _ever_ recovering that one LUN.

I don't buy this. Especially for FC devices, the vast majority of errors I
see are related to zoning, SFP and cabling problems. Once one of those
happens you tend to get a lot of shotgun debugging, which injects all kinds
of further errors. None of these errors are fixed by the Linux error
recovery paths.

That said, if the admin fixes something, for FC/SAS (and potentially others)
you _WILL_ get notification that the device is online again.


 SET when the link is down). So we basically _have_ to escalate it to the
 next level. Even though that will mean to stop I/O to other, hitherto
 unaffected instances.

And a single failure turns into performance bubbles and further errors on
other devices, particularly if the functional devices are stateful and the
error recovery mechanism isn't sufficiently intelligent about that state (see
tape drives). Think about what happens when a marginal SFP on a target causes
a device to repeatedly drop off and reappear at some random point in the
future.

Anyway, it is possible to make a determination about the topology and make
decisions about the likelihood of any given portion being at fault. For
example, if one LUN on a target has failed and the remainder continue to
work, then if abort and LUN reset fail it's unlikely that anything higher up
in the stack is going to succeed.

I feel pretty strongly that at that point you're better off providing good
diagnostics about the failure and expecting user interaction rather than
muddying the waters by causing other device interruptions. If the user tries
everything and determines that an HBA reset is the right choice, provide that
option; don't do it for them.

If every device attached to the HBA fails then resetting the HBA is a valid
choice, not before. Same for I_T.



--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Hannes Reinecke
On 05/13/2013 04:40 PM, Jeremy Linton wrote:
 On 5/13/2013 12:46 AM, Hannes Reinecke wrote:
 
 True. But at the end of the day, we _do_ want to recover the failed LUN.
 If we were to disable that faulty LUN and continue running with the others
 we won't have a chance of _ever_ recovering that one LUN.
 
 I don't buy this. Especially for FC devices, the vast majority of errors I
 see are related to zoning, SFP and cabling problems. Once one of those
 happens you tend to get a lot of shotgun debugging, which injects all kinds
 of further errors. None of these errors are fixed by the Linux error
 recovery paths.
 
 That said, if the admin fixes something, for FC/SAS (and potentially others)
 you _WILL_ get notification that the device is online again.
 

Well, yes, of course.
Sadly, these kinds of errors tend to be very erratic and very hard to
diagnose. There simply is no way of telling that the error you've had
is due to a bad cable or bad SFP.
Bad zoning is easy; then the device is simply not reachable anymore.

So for error recovery we first have to assume that the error is
fixable. And then we have a standard way of trying to fix this error.
The problem we have is that we lose all information about the error
once it's 'fixed' (i.e. after eh is done). Which is the main problem
with bad cabling: we're running the same sequence all over again,
without ever figuring out 'hey, I've done this already'.

sd.c has some _very_ limited support for this. But trying to
generalise things here will be _hard_.

So yeah, I see your point. In fact, I've been bitten by this, too.
But the error scenarios I've seen are far too complex to have them
modelled into something re-usable.

 SET when the link is down). So we basically _have_ to escalate it to the
 next level. Even though that will mean to stop I/O to other, hitherto
 unaffected instances.
 
 And a single failure turns into performance bubbles and further errors on
 other devices, particularly if the functional devices are stateful and the
 error recovery mechanism isn't sufficiently intelligent about that state (see
 tape drives). Think about what happens when a marginal SFP on a target causes
 a device to repeatedly drop off and reappear at some random point in the
 future.
 
 Anyway, it is possible to make a determination about the topology and make
 decisions about the likelihood of any given portion being at fault. For
 example, if one LUN on a target has failed and the remainder continue to
 work, then if abort and LUN reset fail it's unlikely that anything higher up
 in the stack is going to succeed.
 
Which is why I suggested 'ABORT TASK SET' instead of 'LUN reset'.
That will be restricted to the I_T_L nexus, and leave the rest of
the LUN alone (or so one hopes).

 I feel pretty strongly that at that point you're better off providing good
 diagnostics about the failure and expecting user interaction rather than
 muddying the waters by causing other device interruptions. If the user tries
 everything and determines that an HBA reset is the right choice, provide that
 option; don't do it for them.
 
 If every device attached to the HBA fails then resetting the HBA is a valid
 choice, not before. Same for I_T.
 
Hmm. Really not sure.

Take the 'target not responding' case. (which is what triggered this
whole issue anyway). Say a target port went out to lunch and doesn't
respond to FC commands anymore.

With our current EH it'll take _ages_, but eventually the big hammer
hits (or the device comes back) and everything is back to normal again.
So LUN reset (or ABORT TASK SET) fails.
The other LUNs haven't reported an error. But how do you know
whether they are still okay? The other LUNs might simply be idle,
and no commands have been sent to them.
So the state's still good. Do we reset the I_T nexus or not?

If we do, we would find that the entire rport doesn't respond, so
the devloss_tmo mechanism would trigger, and eventually the rport
will disappear and we're back to normal operation.

If we don't, the LUN will be stuck forever, until someone actually
issues I/O to the other LUNs for that rport. And only when I/O is
issued to the _last_ LUN we'll decide to reset the I_T nexus.
Not a very appealing scenario.

And 'reset I_T nexus' should be a rather fast operation; with a bit
of luck the other rports wouldn't even notice.
I've had a prototype running which would just kick off the
dev_loss_tmo mechanism; that worked like a charm.
(Agreed, as James Smart indicated 'only by luck', but nevertheless)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

RE: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Elliott, Robert (Server Storage)


 -----Original Message-----
 From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Ewan Milne
 Sent: Friday, 10 May, 2013 11:59 AM
 To: Hannes Reinecke
 Cc: Baruch Even; Martin K. Petersen; linux-scsi; michaelc
 Subject: Re: [PATCH] scsi: Allow error handling timeout to be specified
 
 On Fri, 2013-05-10 at 16:24 +0200, Hannes Reinecke wrote:
  On 05/10/2013 04:01 PM, Ewan Milne wrote:
   On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
   On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
  
  
   I would argue that waiting for the eh to timeout before you switch to
   another path is most likely to be wrong. If you did the first pass of
   error recovery (task abort) and that failed the
   path/hba/logical-device is doomed. If you will switch to another path
   it will either work (meaning the path/hba were bad) or not (logical
   device was the culprit).
  
   It is necessary to either know the disposition of a command or
   else wait for a defined amount of time before retrying the command on
   another path.  Otherwise you run the risk that the command will
   eventually complete on the first path.  So yes, we need to do the abort
   (and its timeout).
  
  Strictly speaking that's not true.
  Yes, we do need to wait for a certain amount of time for the command
  completion to come in.
 
  However, this time is only defined _on the initiator_.
  The specification does _NOT_ have any fixed timeout values for _any_
  command. As such it could in theory (and does, if you happen to run
  against certain arrays under certain conditions) take several
  minutes to return a completion.

The REPORT SUPPORTED OPERATION CODES command (see SPC-4) 
returns nominal and recommended timeout values for each supported
command.  Similarly, REPORT SUPPORTED TASK MANAGEMENT FUNCTIONS
returns timeouts for task management functions.

Those times are from the device server's perspective, so any fabric 
overhead needs to be added.

Those commands and the command timeout descriptors are optional.
They are proposed to be mandatory in the Base feature set, though.
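
As a rough illustration of the point about adding fabric overhead, the following sketch derives an initiator-side timeout from a device-reported value. The `ReportedTimeouts` container, the function name, and the fallback handling are all hypothetical; in particular, treating a zero recommended value as "nothing reported" is an assumption of this sketch, and nothing here reflects an existing kernel interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedTimeouts:
    """Per-command timeouts as reported by a device, in seconds (hypothetical)."""
    nominal_s: int
    recommended_s: int


def initiator_timeout_s(reported: ReportedTimeouts,
                        fabric_overhead_s: int,
                        fallback_s: int) -> int:
    """Return the timeout the initiator might arm for one command."""
    if fabric_overhead_s < 0 or fallback_s <= 0:
        raise ValueError("overhead must be >= 0 and fallback must be > 0")
    # Assumption: a recommended value of zero means the device reported nothing,
    # so the initiator falls back to its own default.
    if reported.recommended_s <= 0:
        return fallback_s
    # The reported time is from the device server's perspective,
    # so the fabric overhead is added on top.
    return reported.recommended_s + fabric_overhead_s
```

Since the commands and descriptors are optional, a caller would still need the fallback path for devices that report nothing.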

 Granted.  (e.g. in the case of WRITE SAME, it could be a while before
 the command completes, and retrying it on another path too quickly,
 followed by other WRITE commands could be a disaster).  So the timeout
 used for the original command has to be appropriate for the command.
 Reducing that timeout and issuing an abort / lun reset / target reset
 to try to fail over to another path earlier won't work if the device
 never gets the abort / lun reset / target reset and the command is still
 executing.

One problem with the ABORT TASK and I_T NEXUS RESET task management
functions is they must be sent down the same I_T nexus as the command(s)
that ran into timeouts.  If that I_T nexus is the source of the problem,
then they are likely to time out as well.

The REMOVE I_T NEXUS command (standardized in March 2012 in 
SPC-4 revision 35) is designed to be sent down a different I_T nexus - the 
failover path.  It ensures that commands on the original I_T nexus won't 
suddenly resume.  That command is optional and still very new in
standards time.




Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Jeremy Linton
On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
 The other LUNs haven't reported an error. But how do you know whether they
 are still okay? The other LUNs might simply be idle, and no commands have
 been sent to them.

Well, how about generating a standard INQUIRY against them if they are idle
and the given HBA has a device in error state? Then you can make a rough
approximation of what has failed, and escalate the error handling if all the
devices at a particular level have failed.
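
A minimal sketch of that idea, assuming a caller-supplied `probe` callback that stands in for sending a standard INQUIRY and returns True when the device answers. The function and parameter names are hypothetical and this is a model of the decision only, not midlayer code.

```python
from typing import Callable, Iterable, Set


def all_peers_failed(peers: Iterable[str],
                     known_failed: Set[str],
                     probe: Callable[[str], bool]) -> bool:
    """Return True only if every device at this level is failed or fails a probe."""
    for dev in peers:
        if dev in known_failed:
            continue  # already in error state, no need to probe again
        if probe(dev):
            # At least one device at this level still answers,
            # so escalating past this level is not justified yet.
            return False
    return True
```

With LUNs `lun0` (failed), `lun1` and `lun2` on one target, escalation to the target level would be considered only if probing both `lun1` and `lun2` also fails.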

The midlayer may not even need to send the inquiries. If the individual
device drivers (sd/st/etc) are responsible for monitoring and error recovery
then they can be tasked with determining device availability as well. I think
this solves other problems too. For example, the use of TUR in the midlayer
is a problem because it doesn't have enough knowledge about the possible check
conditions being returned to act on them appropriately.









Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Baruch Even
On Mon, May 13, 2013 at 6:58 PM, Jeremy Linton jlin...@tributary.com wrote:
 On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
 The other LUNs haven't reported an error. But how do you know whether they
 are still okay? The other LUNs might simply be idle, and no commands have
 been sent to them.

 Well, how about generating a standard INQUIRY against them if they are idle
 and the given HBA has a device in error state? Then you can make a rough
 approximation of what has failed, and escalate the error handling if all the
 devices at a particular level have failed.

 The midlayer may not even need to send the inquiries. If the individual
 device drivers (sd/st/etc) are responsible for monitoring and error recovery
 then they can be tasked with determining device availability as well. I think
 this solves other problems too. For example, the use of TUR in the midlayer
 is a problem because it doesn't have enough knowledge about the possible check
 conditions being returned to act on them appropriately.

Such an approach is preferable IMO to the big hammer, especially if
we are talking about a likely condition of using multipath and having
other links over the same host that do have traffic flowing through
them. If there is traffic already on the same host there is no reason
to do a host reset. If there is no traffic and there are no other
LUNs, go for the big gun; it will not matter to anything else. If there
are other inactive LUNs, some mechanism to trigger some basic traffic
(inquiry/TUR) on them is much preferable to just a plain big hammer
application.
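
The three cases in the paragraph above can be written down as a small decision function. This is only a sketch of the reasoning: the names are hypothetical, and the counts refer to the other LUNs behind the same host, excluding the one that is already in error.

```python
from enum import Enum


class Action(Enum):
    NO_HOST_RESET = "no host reset"
    HOST_RESET = "host reset"
    PROBE_IDLE_LUNS = "probe idle LUNs"


def host_reset_decision(other_active_luns: int, other_idle_luns: int) -> Action:
    """Pick the next step for a host that has one LUN stuck in error handling."""
    if other_active_luns > 0:
        # Traffic is flowing through the same host, so the host itself works.
        return Action.NO_HOST_RESET
    if other_idle_luns == 0:
        # Nothing else is attached; a host reset cannot hurt anyone.
        return Action.HOST_RESET
    # Idle LUNs exist: generate basic traffic (inquiry/TUR) before deciding.
    return Action.PROBE_IDLE_LUNS
```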

It might be that the kernel is not the right place for all of this
diagnostic work, but then some interface for an external daemon to do
these diagnostics is preferable to just wielding the big hammer and
killing all traffic.

In my experience if the device doesn't respond it has usually just
disappeared from the network. If it is on the network and the task
abort or target reset do not return successfully, then either the host
reset is unlikely to help (the host is fine, the device is gone) or
the host reset is the only way since the host is dead, but then a
simple check on all other LUNs would reveal that quite fast. In many
cases the host controller itself is mostly dead and the driver could
detect that on its own without waiting for the traffic to time out, but
that's an issue for each driver to handle.

Baruch



Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Martin K. Petersen
 Jeremy == Jeremy Linton jlin...@tributary.com writes:

Jeremy Well, how about generating std inquiry against them if they are
Jeremy idle and the given HBA has a device in error state? Then you can
Jeremy make a rough approximation of what has failed, and escalate the
Jeremy error handling if all the devices at a particular level have
Jeremy failed.

It's not that simple, unfortunately. Some HBAs keep more state than
others. We see cases fairly often where a misbehaving target has
confused the HBA enough that we cannot bring the device back without
doing an HBA firmware reset. Despite I/O completing successfully on
other targets connected to the same HBA.

So at some point we do need to give up and escalate to a full HBA
reset. We would just like to defer that hammer until we have run out of
other options.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Jeremy Linton
On 5/13/2013 3:29 PM, Martin K. Petersen wrote:

 others. We see cases fairly often where a misbehaving target has
 confused the HBA enough that we can not bring the device back without
 doing an HBA firmware reset. Despite I/O completing successfully on
 other targets connected to the same HBA.

This would seem to indicate an HBA/driver bug...

 So at some point we do need to give up and escalate to a full HBA
 reset. We would just like to defer that hammer until we have run out of
 other options.

Except that I've seen the Linux error recovery cause more problems than it
solves on a fairly regular basis. I would rather have a solution designed to
isolate failures than one that makes a lot of mistakes and causes further
problems (sometimes with other machines). I'm pretty convinced that attempting
everything possible to recover a device when the underlying problem is unknown
is a bad strategy.

I think maybe it's a perspective difference. If the device that is failing is
an OS disk, then giving up is tantamount to crashing the machine. On the other
hand, if the failing device is some shared tape drive in a SAN with a few
hundred alternatives then killing the OS in an attempt to recover that drive
is a problem.

Maybe the super aggressive recovery paths should be reserved for devices
marked critical to system operation.




Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-12 Thread Hannes Reinecke
On 05/10/2013 09:27 PM, Baruch Even wrote:
 On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke h...@suse.de wrote:
 On 05/10/2013 07:51 PM, Baruch Even wrote:

 The error handling I have in mind (admittedly, not fully thought out)
 should work for both FC and SAS. Currently the error recovery
 progresses at the host level regardless of if the errors are on one
 device or all of them, it also stops the IOs on all devices and LUNs.
 It would be nice if that was taken into account. My ideas may be more
 suitable to the environment I work in (enterprise storage devices
 rather than hosts) but I believe the same approach would benefit the
 hosts as well.

 It would be interesting to see what approach the new error handling will
 take.

 So, my general idea is this:

 1) Send command aborts from scsi_times_out(). There is no requirement
    on stopping I/O on the host simply because a single command times
    out. And as scsi_times_out() is run from a separate thread anyway
    we should be able to send ABORT TASK TMFs without a problem.
 2) Modify recovery sequence.
    One of the major pitfalls of the current scsi_eh is that it
    spills over onto unrelated LUNs for higher levels. So for the
    new EH we should be using a sequence of
    - ABORT TASK
    - ABORT TASK SET
    - (Terminate I_T nexus)
    - (Host reset)
    'Terminate I_T nexus' for FibreChannel is equivalent to a LOGO.
    'Host reset' is the current host reset function.
 3) Fine-grained recovery setting.
    There is no need to stop the entire host when doing a recovery;
    it should be sufficient to stop I/O to the unit
    (LUN, I_T nexus, host) when the error recovery is at the
    respective level.
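
The proposed sequence and its fine-grained scoping can be modelled in a few lines. This is a sketch under stated assumptions: the pairing of each step with the unit whose I/O is stopped is an interpretation of points 1 to 3 above (in particular, mapping ABORT TASK SET to the LUN), and `try_step` is a hypothetical callback standing in for actually issuing the step.

```python
from typing import Callable, List, Tuple

# (recovery step, unit whose I/O is stopped while the step runs).
# Assumption: ABORT TASK only touches the timed-out command itself.
ESCALATION: List[Tuple[str, str]] = [
    ("ABORT TASK", "command"),
    ("ABORT TASK SET", "LUN"),
    ("Terminate I_T nexus", "I_T nexus"),
    ("Host reset", "host"),
]


def recover(try_step: Callable[[str], bool]) -> Tuple[str, List[str]]:
    """Run the steps in order until one succeeds.

    Returns the name of the step that succeeded (or "failed") together with
    the list of units that had to be stopped along the way.
    """
    stopped: List[str] = []
    for step, unit in ESCALATION:
        stopped.append(unit)
        if try_step(step):
            return step, stopped
    return "failed", stopped
```

If ABORT TASK SET succeeds, only the command and its LUN were ever stopped; the I_T nexus and the host keep running, which is the point of item 3.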
 
 This looks great and much in line with what I'm thinking.
 
 What about not going to the higher level if not everything at that
 level had failed?
 I mean that if at the target not all LUNs failed it will be quite
 troublesome to other LUNs if I-T-Nexus is terminated and that at the
 host level if there are still targets that are functioning it will
 kill them too to reset the host.
 

True. But at the end of the day, we _do_ want to recover the failed
LUN. If we were to disable that faulty LUN and continue running with
the others we won't have a chance of _ever_ recovering that one LUN.

Plus we have to keep in mind that the attempted error recovery might
not have succeeded for totally unrelated reasons (i.e. sending an ABORT
TASK SET when the link is down). So we basically _have_ to escalate it
to the next level. Even though that will mean stopping I/O to other,
hitherto unaffected instances.

Cheers,

Hannes


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bart Van Assche

On 05/10/13 05:11, Martin K. Petersen wrote:

Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterations (bus device, target, bus,
controller) before we give up.


Hello Martin,

    sdev->max_queue_depth = sdev->queue_depth;
+   sdev->eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;

Have you considered moving the eh_timeout assignment statement to just
before the transport_configure_device() and slave_configure() calls?
That would allow transport drivers and LLDs to override the
default eh_timeout value.
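
The ordering argument can be shown with a toy model: if the default is assigned before the configure hooks run, a hook can replace it; if it were assigned afterwards, any override would be lost. Everything below is hypothetical illustration (the class, the hook list, and the 10 second default taken from the patch description), not the kernel's scan code.

```python
from typing import Callable, Iterable, Optional

DEFAULT_EH_TIMEOUT_S = 10  # the patch description says 10 seconds was hardcoded


class Device:
    """Stand-in for a SCSI device structure (hypothetical)."""

    def __init__(self) -> None:
        self.eh_timeout_s: Optional[int] = None


def add_device(dev: Device,
               configure_hooks: Iterable[Callable[[Device], None]]) -> Device:
    # Assign the default first ...
    dev.eh_timeout_s = DEFAULT_EH_TIMEOUT_S
    # ... so that transport or low-level driver hooks may override it.
    for hook in configure_hooks:
        hook(dev)
    return dev


def fast_fail_hook(dev: Device) -> None:
    """Example override a driver might apply (hypothetical value)."""
    dev.eh_timeout_s = 3
```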


Bart.


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Ewan Milne
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
 Introduce eh_timeout which can be used for error handling purposes. This
 was previously hardcoded to 10 seconds in the SCSI error handling
 code. However, for some fast-fail scenarios it is necessary to be able
 to tune this as it can take several iterations (bus device, target, bus,
 controller) before we give up.
 
 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
 

Thanks for posting this.  It will be very helpful to have this
capability, particularly when alternate paths to the device exist.

Acked-by: Ewan D. Milne emi...@redhat.com




Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Hannes Reinecke
On 05/10/2013 02:43 PM, Ewan Milne wrote:
 On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
 Introduce eh_timeout which can be used for error handling purposes. This
 was previously hardcoded to 10 seconds in the SCSI error handling
 code. However, for some fast-fail scenarios it is necessary to be able
 to tune this as it can take several iterations (bus device, target, bus,
 controller) before we give up.

 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com

 
 Thanks for posting this.  It will be very helpful to have this
 capability, particularly when alternate paths to the device exist.
 
 Acked-by: Ewan D. Milne emi...@redhat.com
 
 
Seconded.

Acked-by: Hannes Reinecke h...@suse.de

Cheers,

Hannes


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bryn M. Reeves

On 05/10/2013 01:43 PM, Ewan Milne wrote:

On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:

Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterations (bus device, target, bus,
controller) before we give up.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com



Thanks for posting this.  It will be very helpful to have this
capability, particularly when alternate paths to the device exist.


Ack - this is definitely a step forward but until we have better eh 
behaviour for FC the benefits are pretty limited. This is especially the 
case with large LU counts and certain LLDDs since some impose much 
longer timeouts (e.g. lpfc's 60s TMF timeout).


With 5 LUs presented and a single dd driving IO on lpfc I see a time to 
fail an IO of 10-11m when inducing a fabric fault that blackholes all 
traffic to a particular target port on my test setup.


Looking at where the time is being spent in this example there's around 
200s of TUR waits (3m20) and 500s waiting on TMF timeouts (foreach 
device, BDR, foreach target, etc.):


http://paste.fedoraproject.org/11473/81911241/

Environments with 100s of devices can easily spend an hour or more 
waiting for the eh to do its thing.


Regards,
Bryn.





Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:

 On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
  Introduce eh_timeout which can be used for error handling purposes. This
  was previously hardcoded to 10 seconds in the SCSI error handling
  code. However, for some fast-fail scenarios it is necessary to be able
  to tune this as it can take several iterations (bus device, target, bus,
  controller) before we give up.
 
  Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
 

 Thanks for posting this.  It will be very helpful to have this
 capability, particularly when alternate paths to the device exist.

 Acked-by: Ewan D. Milne emi...@redhat.com


I would argue that waiting for the eh to time out before you switch to
another path is most likely to be wrong. If you did the first pass of
error recovery (task abort) and that failed, the
path/hba/logical-device is doomed. If you switch to another path
it will either work (meaning the path/hba were bad) or not (logical
device was the culprit).

Actually reducing the timeouts is probably not a good approach since
it will cause the host to take a more radical approach without waiting
sufficiently for a potential recovery. In addition, the more radical
error handling steps such as host reset will destroy other paths for
completely unrelated devices/links. From my experience a host reset is
usually not required and the Linux kernel currently reaches for this
big hammer too fast.

Not that I have any qualms about the patch itself, I've been down this
path myself and was proven wrong by real life. Though my experience
was mostly on the SAS network rather than the FC network.

Baruch


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Ewan Milne
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
 On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
 
  On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
   Introduce eh_timeout which can be used for error handling purposes. This
   was previously hardcoded to 10 seconds in the SCSI error handling
   code. However, for some fast-fail scenarios it is necessary to be able
   to tune this as it can take several iterations (bus device, target, bus,
   controller) before we give up.
  
   Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
  
 
  Thanks for posting this.  It will be very helpful to have this
  capability, particularly when alternate paths to the device exist.
 
  Acked-by: Ewan D. Milne emi...@redhat.com
 
 
 I would argue that waiting for the eh to timeout before you switch to
 another path is most likely to be wrong. If you did the first pass of
 error recovery (task abort) and that failed the
 path/hba/logical-device is doomed. If you will switch to another path
 it will either work (meaning the path/hba were bad) or not (logical
 device was the culprit).

It is necessary to either know the disposition of a command or
else wait for a defined amount of time before retrying the command on
another path.  Otherwise you run the risk that the command will
eventually complete on the first path.  So yes, we need to do the abort
(and its timeout).
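
Stated as a predicate, the rule above reads as follows. This is a sketch only: the names are hypothetical, and choosing the value of the required wait is exactly the open question in this thread.

```python
def may_retry_on_other_path(disposition_known: bool,
                            waited_s: float,
                            required_wait_s: float) -> bool:
    """Return True if reissuing the command on another path is considered safe.

    Either the fate of the original command is known (for example the abort
    completed), or a defined amount of time has passed since it was issued.
    """
    return disposition_known or waited_s >= required_wait_s
```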

 
 Actually reducing the timeouts is probably not a good approach since
 it will cause the host to take a more radical approach without waiting
 sufficiently for a potential recovery. In addition the more radical
 error handlings such as host reset will destroy other paths for
 completely unrelated devices/links, from my experience a host reset is
 usually not required and the Linux kernel currently reaches to this
 big hammer too fast.

I believe that Hannes is working on a better error handling algorithm
that e.g. does not cause an emulated bus reset in an FC environment
by resetting all the targets (and affecting I/O to unrelated targets in
the process).

 
 Not that I have any qualms about the patch itself, I've been down this
 path myself and was proven wrong by real life. Though my experience
 was mostly on the SAS network rather than the FC network.
 
 Baruch





Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Hannes Reinecke
On 05/10/2013 04:01 PM, Ewan Milne wrote:
 On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
 On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:

 On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
 Introduce eh_timeout which can be used for error handling purposes. This
 was previously hardcoded to 10 seconds in the SCSI error handling
 code. However, for some fast-fail scenarios it is necessary to be able
 to tune this as it can take several iterations (bus device, target, bus,
 controller) before we give up.

 Signed-off-by: Martin K. Petersen martin.peter...@oracle.com


 Thanks for posting this.  It will be very helpful to have this
 capability, particularly when alternate paths to the device exist.

 Acked-by: Ewan D. Milne emi...@redhat.com


 I would argue that waiting for the eh to timeout before you switch to
 another path is most likely to be wrong. If you did the first pass of
 error recovery (task abort) and that failed the
 path/hba/logical-device is doomed. If you will switch to another path
 it will either work (meaning the path/hba were bad) or not (logical
 device was the culprit).
 
 It is necessary to either know the disposition of a command or
 else wait for a defined amount of time before retrying the command on
 another path.  Otherwise you run the risk that the command will
 eventually complete on the first path.  So yes, we need to do the abort
 (and its timeout).
 
Strictly speaking that's not true.
Yes, we do need to wait for a certain amount of time for the command
completion to come in.

However, this time is only defined _on the initiator_.
The specification does _NOT_ have any fixed timeout values for _any_
command. As such it could in theory (and does, if you happen to run
against certain arrays under certain conditions) take several
minutes to return a completion.

So we have to accept that a command completion might arrive in the
window between deciding that a command abort has to be sent and the
actual submission of the command abort by the HBA. Which is totally
independent of any command timeout we set. It's just that a short
command timeout increases the likelihood of the race happening; the
race itself is always present.


 Actually reducing the timeouts is probably not a good approach since
 it will cause the host to take a more radical approach without waiting
 sufficiently for a potential recovery. In addition the more radical
 error handlings such as host reset will destroy other paths for
 completely unrelated devices/links, from my experience a host reset is
 usually not required and the Linux kernel currently reaches to this
 big hammer too fast.
 
 I believe that Hannes is working on a better error handling algorithm
 that e.g. does not cause an emulated bus reset in an FC environment
 by resetting all the targets (and affecting I/O to unrelated targets in
 the process).
 
Yes, that was the idea.
Which I'll get down to eventually; if only customers wouldn't have
all these obnoxious issues no-one has ever seen...

And there is nothing wrong with reducing the timeout per se. It's
just that the current error recovery strategy isn't well equipped to
handle it :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bryn M. Reeves

On 05/10/2013 03:24 PM, Hannes Reinecke wrote:

However, this time is only defined _on the initiator_.
The specification does _NOT_ have any fixed timeout values for _any_
command. As such it could in theory (and does, if you happen to run
against certain arrays under certain conditions) take several
minutes to return a completion.


That's my understanding too - in a multipath configuration we're 
waiting only for our own fast_io_fail_tmo (if set), which is essentially 
an arbitrary, administrator-controlled interval. You can tune it between 
extremes of rapid fault identification vs. paths twitching at every 
transient glitch.



Yes, that was the idea.
Which I'll get down to eventually; if only customers wouldn't have
all these obnoxious issues no-one has ever seen...


The class I've been looking at is really very easy to reproduce and 
we've seen it at least a half dozen times at different sites with 
different FC switches (so it's certainly not that unusual).


To recreate it artificially you just need a target, a host, and a switch
that can block RSCN propagation on a per-port basis. I've been using
Brocade switches with the rscnsupr portcfg attribute.


It's important that you block a port on the switch-target side;
otherwise the host will see a link event, which short-circuits everything.


E.g. if you have one port of an array attached to port 1 on a brocade 
the following two commands will set up this scenario:


portcfg rscnsupr 1 --enable
portdisable 1

Regards,
Bryn.



Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Martin K. Petersen
>>>>> "Bart" == Bart Van Assche bvanass...@acm.org writes:

Bart> Have you considered moving the eh_timeout assignment statement to
Bart> just before the transport_configure_device() and slave_configure()
Bart> calls? That would allow transport drivers and LLD drivers to
Bart> override the default eh_timeout value.

I'm ok with that...
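
To illustrate why the placement matters, here is a minimal userspace sketch, not kernel code. The struct toy_sdev, the toy_slave_configure() hook, its 3-second value and the HZ value of 250 are all made up for the illustration; only the ordering question comes from Bart's suggestion.

```c
#include <stddef.h>

#define HZ 250                              /* assumption: illustrative tick rate */
#define SCSI_DEFAULT_EH_TIMEOUT (10 * HZ)   /* mirrors the default in the patch */

/* Toy stand-in for struct scsi_device; only the field under discussion. */
struct toy_sdev {
    unsigned int eh_timeout;                /* in jiffies */
};

/* Hypothetical LLD hook that wants a 3-second EH timeout. */
static int toy_slave_configure(struct toy_sdev *sdev)
{
    sdev->eh_timeout = 3 * HZ;
    return 0;
}

/* Default assigned first, hook called afterwards: the driver's value survives. */
static unsigned int add_lun_default_first(int (*configure)(struct toy_sdev *))
{
    struct toy_sdev sdev = { 0 };

    sdev.eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;
    if (configure)
        configure(&sdev);
    return sdev.eh_timeout;
}

/* Hook called first, default assigned afterwards: the driver's value is lost. */
static unsigned int add_lun_default_last(int (*configure)(struct toy_sdev *))
{
    struct toy_sdev sdev = { 0 };

    if (configure)
        configure(&sdev);
    sdev.eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;
    return sdev.eh_timeout;
}
```

With the default assigned before the configure hooks, a driver that sets its own value keeps it; with the opposite order the default silently wins.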


scsi: Allow error handling timeout to be specified

Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterations (bus device, target, bus,
controller) before we give up.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..91adc52 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -44,8 +44,6 @@
 
 static void scsi_eh_done(struct scsi_cmnd *scmd);
 
-#define SENSE_TIMEOUT  (10*HZ)
-
 /*
  * These should *probably* be handled by the host itself.
  * Since it is allowed to sleep, it probably should.
@@ -864,7 +862,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
  */
 static int scsi_request_sense(struct scsi_cmnd *scmd)
 {
-   return scsi_send_eh_cmnd(scmd, NULL, 0, SENSE_TIMEOUT, ~0);
+   return scsi_send_eh_cmnd(scmd, NULL, 0, scmd->device->eh_timeout, ~0);
 }
 
 /**
@@ -965,7 +963,8 @@ static int scsi_eh_tur(struct scsi_cmnd *scmd)
    int retry_cnt = 1, rtn;
 
 retry_tur:
-   rtn = scsi_send_eh_cmnd(scmd, tur_command, 6, SENSE_TIMEOUT, 0);
+   rtn = scsi_send_eh_cmnd(scmd, tur_command, 6,
+                           scmd->device->eh_timeout, 0);
 
    SCSI_LOG_ERROR_RECOVERY(3, printk("%s: scmd %p rtn %x\n",
        __func__, scmd, rtn));
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index b9e39e0..9091d6d 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -927,6 +927,8 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
    if (*bflags & BLIST_SKIP_VPD_PAGES)
        sdev->skip_vpd_pages = 1;
 
+   sdev->eh_timeout = SCSI_DEFAULT_EH_TIMEOUT;
+
    transport_configure_device(&sdev->sdev_gendev);
 
    if (sdev->host->hostt->slave_configure) {
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 931a7d9..38db310 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -560,6 +560,35 @@ sdev_store_timeout (struct device *dev, struct device_attribute *attr,
 static DEVICE_ATTR(timeout, S_IRUGO | S_IWUSR, sdev_show_timeout, sdev_store_timeout);
 
 static ssize_t
+sdev_show_eh_timeout (struct device *dev, struct device_attribute *attr, char *buf)
+{
+   struct scsi_device *sdev;
+   sdev = to_scsi_device(dev);
+   return snprintf(buf, 20, "%u\n", sdev->eh_timeout / HZ);
+}
+
+static ssize_t
+sdev_store_eh_timeout (struct device *dev, struct device_attribute *attr,
+                       const char *buf, size_t count)
+{
+   struct scsi_device *sdev;
+   unsigned int eh_timeout;
+   int err;
+
+   if (!capable(CAP_SYS_ADMIN))
+       return -EACCES;
+
+   sdev = to_scsi_device(dev);
+   err = kstrtouint(buf, 10, &eh_timeout);
+   if (err)
+       return err;
+   sdev->eh_timeout = eh_timeout * HZ;
+
+   return count;
+}
+static DEVICE_ATTR(eh_timeout, S_IRUGO | S_IWUSR, sdev_show_eh_timeout, sdev_store_eh_timeout);
+
+static ssize_t
 store_rescan_field (struct device *dev, struct device_attribute *attr,
                     const char *buf, size_t count)
 {
@@ -723,6 +752,7 @@ static struct attribute *scsi_sdev_attrs[] = {
    &dev_attr_delete.attr,
    &dev_attr_state.attr,
    &dev_attr_timeout.attr,
+   &dev_attr_eh_timeout.attr,
    &dev_attr_iocounterbits.attr,
    &dev_attr_iorequest_cnt.attr,
    &dev_attr_iodone_cnt.attr,
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 66216c1..4b87d99 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -10,9 +10,14 @@
 
 #include <linux/types.h>
 #include <linux/scatterlist.h>
+#include <linux/kernel.h>
 
 struct scsi_cmnd;
 
+enum scsi_timeouts {
+   SCSI_DEFAULT_EH_TIMEOUT = 10 * HZ,
+};
+
 /*
  * The maximum number of SG segments that we will put inside a
  * scatterlist (unless chaining is used). Should ideally fit inside a
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index a7f9cba..7eb9b20 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -113,6 +113,7 @@ struct scsi_device {
 * scsi_devinfo.[hc]. For now used only to
 * pass settings from slave_alloc to scsi
 * core. */
+   unsigned int eh_timeout; /* Error handling timeout */
unsigned writeable:1;
unsigned removable:1;
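
The new sysfs attribute takes and reports whole seconds while the field itself holds jiffies. The following is a rough userspace model of that round trip, not the kernel handlers themselves. The toy_ function names and the HZ value of 250 are assumptions for illustration, and strtoul() stands in for kstrtouint(). The range check before the multiplication is an addition of this sketch; the patch as posted multiplies without one.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define HZ 250u   /* assumption: illustrative tick rate */

/*
 * Parse a decimal number of seconds and convert it to jiffies.
 * Returns 0 on success or a negative errno value, roughly in the
 * spirit of the store handler in the patch.
 */
static int toy_store_eh_timeout(const char *buf, unsigned int *jiffies_out)
{
    char *end;
    unsigned long secs;

    /* Reject empty input, signs and leading whitespace. */
    if (!isdigit((unsigned char)buf[0]))
        return -EINVAL;

    errno = 0;
    secs = strtoul(buf, &end, 10);
    /* Sketch-only guard: refuse values whose jiffies count would wrap. */
    if (errno == ERANGE || secs > UINT_MAX / HZ)
        return -ERANGE;

    /* sysfs writes usually carry a trailing newline; allow exactly one. */
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;

    *jiffies_out = (unsigned int)secs * HZ;
    return 0;
}

/* Format jiffies back to whole seconds, as the show handler does. */
static int toy_show_eh_timeout(unsigned int jiffies, char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", jiffies / HZ);
}
```

Because the show side divides by HZ, any value that is not a whole number of seconds (for example one set directly by a driver) would read back truncated.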
   

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Martin K. Petersen
>>>>> "Baruch" == Baruch Even bar...@ev-en.org writes:

Baruch> Actually reducing the timeouts is probably not a good approach
Baruch> since it will cause the host to take a more radical approach
Baruch> without waiting sufficiently for a potential recovery.

Reducing the eh timeout is a requirement in many clustered setups. We've
been shipping a predecessor to this patch in our kernels for a long
time.
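
As a back-of-the-envelope illustration of why the default hurts fast-fail setups, the sketch below multiplies the timeout by the four escalation stages named in the patch description. It assumes, hypothetically, that exactly one EH probe command times out after each stage for a stuck device, and it ignores any time spent inside the driver's own reset handlers; the function name is made up.

```c
/*
 * Escalation stages as listed in the patch description:
 * bus device reset, target reset, bus reset, controller (host) reset.
 */
enum { EH_ESCALATION_STAGES = 4 };

/*
 * Rough upper bound, in seconds, on the time spent waiting for EH probe
 * commands for one stuck device, under the simplifying assumption of one
 * timed-out probe per stage.
 */
static unsigned int worst_case_probe_seconds(unsigned int eh_timeout_secs)
{
    return EH_ESCALATION_STAGES * eh_timeout_secs;
}
```

Under these assumptions the hardcoded 10 seconds gives 40 seconds of waiting before giving up, while a 2-second setting gives 8.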


Baruch> In addition the more radical error handlings such as host reset
Baruch> will destroy other paths for completely unrelated devices/links,
Baruch> from my experience a host reset is usually not required and the
Baruch> Linux kernel currently reaches to this big hammer too fast.

I'm also working on a patch to add some heuristics to avoid the HBA and
bus resets if I/O is completing successfully on other attached
targets. But that's an orthogonal issue.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Ewan Milne
On Fri, 2013-05-10 at 16:24 +0200, Hannes Reinecke wrote: 
 On 05/10/2013 04:01 PM, Ewan Milne wrote:
  On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
  On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
 
  On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
  Introduce eh_timeout which can be used for error handling purposes. This
  was previously hardcoded to 10 seconds in the SCSI error handling
  code. However, for some fast-fail scenarios it is necessary to be able
  to tune this as it can take several iterations (bus device, target, bus,
  controller) before we give up.
 
  Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
 
 
  Thanks for posting this.  It will be very helpful to have this
  capability, particularly when alternate paths to the device exist.
 
  Acked-by: Ewan D. Milne emi...@redhat.com
 
 
  I would argue that waiting for the eh to timeout before you switch to
  another path is most likely to be wrong. If you did the first pass of
  error recovery (task abort) and that failed the
  path/hba/logical-device is doomed. If you will switch to another path
  it will either work (meaning the path/hba were bad) or not (logical
  device was the culprit).
  
  It is necessary to either know the disposition of a command or
  else wait for a defined amount of time before retrying the command on
  another path.  Otherwise you run the risk that the command will
  eventually complete on the first path.  So yes, we need to do the abort
  (and its timeout).
  
 Strictly speaking that's not true.
 Yes, we do need to wait for a certain amount of time for the command
 completion to come in.
 
 However, this time is only defined _on the initiator_.
 The specification does _NOT_ have any fixed timeout values for _any_
 command. As such it could in theory (and does, if you happen to run
 against certain arrays under certain conditions) take several
 minutes to return a completion.

Granted.  (e.g. in the case of WRITE SAME, it could be a while before
the command completes, and retrying it on another path too quickly,
followed by other WRITE commands could be a disaster).  So the timeout
used for the original command has to be appropriate for the command.
Reducing that timeout and issuing an abort / lun reset / target reset
to try to fail over to another path earlier won't work if the device
never gets the abort / lun reset / target reset and the command is still
executing.

In the case of commands / TMFs issued by the error handling, the timeout
needs to be long enough to account for the delay in the driver / HBA,
switches (i.e. in an FC environment), and the target's device server.
But this time might very well be much shorter than the worst case for
other commands.  So I think allowing EH timeouts to be specified is a
good thing.  They just have to be set properly, the same as timeouts
for other commands (which can already be adjusted, but are overridden
for SYNCHRONIZE CACHE and WRITE SAME).

 
 So we have to accept that a command completion might happen in
 between the time we take between deciding that a command abort has
 to be send and the actual submission of the command abort by the
 HBA. Which is totally independent of any command timeout we set.
 It's just that a short command timeout increases the likelyhood of
 the race to happen; the race itself is always present.
 
 
  Actually reducing the timeouts is probably not a good approach since
  it will cause the host to take a more radical approach without waiting
  sufficiently for a potential recovery. In addition the more radical
  error handlings such as host reset will destroy other paths for
  completely unrelated devices/links, from my experience a host reset is
  usually not required and the Linux kernel currently reaches to this
  big hammer too fast.
  
  I believe that Hannes is working on a better error handling algorithm
  that e.g. does not cause an emulated bus reset in an FC environment
  by resetting all the targets (and affecting I/O to unrelated targets in
  the process).
  
 Yes, that was the idea.
 Which I'll get down to eventually; if only customers wouldn't have
 all these obnoxious issues no-one has ever seen...
 
 And there is nothing wrong with reducing the timeout per se. It's
 just that the current error recovery strategy isn't well equipped to
 handle it :-)
 
 Cheers,
 
 Hannes




Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne emi...@redhat.com wrote:
 On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
 On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
 
  On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
   Introduce eh_timeout which can be used for error handling purposes. This
   was previously hardcoded to 10 seconds in the SCSI error handling
   code. However, for some fast-fail scenarios it is necessary to be able
   to tune this as it can take several iterations (bus device, target, bus,
   controller) before we give up.
  
   Signed-off-by: Martin K. Petersen martin.peter...@oracle.com
  
 
  Thanks for posting this.  It will be very helpful to have this
  capability, particularly when alternate paths to the device exist.
 
  Acked-by: Ewan D. Milne emi...@redhat.com


 I would argue that waiting for the eh to timeout before you switch to
 another path is most likely to be wrong. If you did the first pass of
 error recovery (task abort) and that failed the
 path/hba/logical-device is doomed. If you will switch to another path
 it will either work (meaning the path/hba were bad) or not (logical
 device was the culprit).

 It is necessary to either know the disposition of a command or
 else wait for a defined amount of time before retrying the command on
 another path.  Otherwise you run the risk that the command will
 eventually complete on the first path.  So yes, we need to do the abort
 (and its timeout).


 Actually reducing the timeouts is probably not a good approach since
 it will cause the host to take a more radical approach without waiting
 sufficiently for a potential recovery. In addition the more radical
 error handlings such as host reset will destroy other paths for
 completely unrelated devices/links, from my experience a host reset is
 usually not required and the Linux kernel currently reaches to this
 big hammer too fast.

 I believe that Hannes is working on a better error handling algorithm
 that e.g. does not cause an emulated bus reset in an FC environment
 by resetting all the targets (and affecting I/O to unrelated targets in
 the process).

The error handling I have in mind (admittedly, not fully thought out)
should work for both FC and SAS. Currently the error recovery
progresses at the host level regardless of whether the errors are on
one device or all of them; it also stops the I/Os on all devices and
LUNs. It would be nice if the scope of the failure were taken into
account. My ideas may be more suitable to the environment I work in
(enterprise storage devices rather than hosts) but I believe the same
approach would benefit the hosts as well.

It would be interesting to see what approach the new error handling will take.

Baruch


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 5:53 PM, Martin K. Petersen
martin.peter...@oracle.com wrote:
 Baruch == Baruch Even bar...@ev-en.org writes:

 Baruch Actually reducing the timeouts is probably not a good approach
 Baruch since it will cause the host to take a more radical approach
 Baruch without waiting sufficiently for a potential recovery.

 Reducing the eh timeout is a requirement in many clustered setups. We've
 been shipping a predecessor to this patch in our kernels for a long
 time.

 Baruch In addition the more radical error handlings such as host reset
 Baruch will destroy other paths for completely unrelated devices/links,
 Baruch from my experience a host reset is usually not required and the
 Baruch Linux kernel currently reaches to this big hammer too fast.

 I'm also working on a patch to add some heuristics to avoid the HBA and
 bus resets if I/O is completing successfully on other attached
 targets. But that's an orthogonal issue.

Why?

In my experience (again, SAS based inside a storage device) the
reduced eh timeout is more likely to cause escalated problems rather
than resolve the issue.

I actually find that the higher level should have a small timeout of
its own to do its own recovery work, which normally entails going to
other copies of the data where available and letting the device try to
get the I/O done if possible. Not sure how applicable it is to the
kernel itself but I do feel it could be relevant.

Baruch


Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Hannes Reinecke

On 05/10/2013 07:51 PM, Baruch Even wrote:

On Fri, May 10, 2013 at 5:01 PM, Ewan Milne emi...@redhat.com wrote:

On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:

On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:


On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:

Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterations (bus device, target, bus,
controller) before we give up.

Signed-off-by: Martin K. Petersen martin.peter...@oracle.com



Thanks for posting this.  It will be very helpful to have this
capability, particularly when alternate paths to the device exist.

Acked-by: Ewan D. Milne emi...@redhat.com



I would argue that waiting for the eh to timeout before you switch to
another path is most likely to be wrong. If you did the first pass of
error recovery (task abort) and that failed the
path/hba/logical-device is doomed. If you will switch to another path
it will either work (meaning the path/hba were bad) or not (logical
device was the culprit).


It is necessary to either know the disposition of a command or
else wait for a defined amount of time before retrying the command on
another path.  Otherwise you run the risk that the command will
eventually complete on the first path.  So yes, we need to do the abort
(and its timeout).



Actually reducing the timeouts is probably not a good approach since
it will cause the host to take a more radical approach without waiting
sufficiently for a potential recovery. In addition the more radical
error handlings such as host reset will destroy other paths for
completely unrelated devices/links, from my experience a host reset is
usually not required and the Linux kernel currently reaches to this
big hammer too fast.


I believe that Hannes is working on a better error handling algorithm
that e.g. does not cause an emulated bus reset in an FC environment
by resetting all the targets (and affecting I/O to unrelated targets in
the process).


The error handling I have in mind (admittedly, not fully thought out)
should work for both FC and SAS. Currently the error recovery
progresses at the host level regardless of if the errors are on one
device or all of them, it also stops the IOs on all devices and LUNs.
It would be nice if that was taken into account. My ideas may be more
suitable to the environment I work in (enterprise storage devices
rather than hosts) but I believe the same approach would benefit the
hosts as well.

It would be interesting to see what approach the new error handling will take.


So, my general idea is this:

1) Send command aborts from scsi_times_out(). There is no need to
   stop I/O on the host simply because a single command times
   out. And as scsi_times_out() is run from a separate thread anyway
   we should be able to send ABORT TASK TMFs without a problem.
2) Modify recovery sequence.
   One of the major pitfalls of the current scsi_eh is that it
   spills over onto unrelated LUNs at the higher levels. So for the
   new EH we should be using a sequence of
   - ABORT TASK
   - ABORT TASK SET
   - (Terminate I_T nexus)
   - (Host reset)
   'Terminate I_T nexus' for FibreChannel is equivalent to a LOGO.
   'Host reset' is the current host reset function.
3) Fine-grained recovery setting.
   There is no need to stop the entire host when doing a recovery;
   it should be sufficient to stop I/O to the unit
   (LUN, I_T nexus, host) when the error recovery is at the
   respective level.
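
The three points above can be sketched as a small escalation ladder. This is only a toy model in userspace C, not a proposal for the actual implementation: every identifier is invented for the illustration, and mapping ABORT TASK SET to a LUN-wide quiesce is an assumption made here, since the list only names LUN, I_T nexus and host as units.

```c
#include <stdbool.h>

/* Recovery levels in the order proposed in point 2. */
enum eh_level {
    EH_ABORT_TASK,
    EH_ABORT_TASK_SET,
    EH_TERMINATE_IT_NEXUS,
    EH_HOST_RESET,
    EH_GIVE_UP              /* sentinel: nothing left to try */
};

/* How much I/O has to be stopped while a level runs (point 3). */
enum eh_scope {
    SCOPE_NONE,             /* point 1: a single abort stops nothing else */
    SCOPE_LUN,
    SCOPE_IT_NEXUS,
    SCOPE_HOST
};

static enum eh_scope eh_quiesce_scope(enum eh_level level)
{
    switch (level) {
    case EH_ABORT_TASK:         return SCOPE_NONE;
    case EH_ABORT_TASK_SET:     return SCOPE_LUN;      /* assumption, see above */
    case EH_TERMINATE_IT_NEXUS: return SCOPE_IT_NEXUS;
    default:                    return SCOPE_HOST;
    }
}

static enum eh_level eh_next_level(enum eh_level level)
{
    return level == EH_GIVE_UP ? EH_GIVE_UP : (enum eh_level)(level + 1);
}

/*
 * Walk the ladder. succeeds[l] says whether the action at level l would
 * recover the device; returns the first level that does, or EH_GIVE_UP.
 */
static enum eh_level eh_run(const bool succeeds[EH_GIVE_UP])
{
    enum eh_level l;

    for (l = EH_ABORT_TASK; l != EH_GIVE_UP; l = eh_next_level(l))
        if (succeeds[l])
            return l;
    return EH_GIVE_UP;
}
```

In this model a failure that is cured by ABORT TASK SET never disturbs I/O beyond the affected LUN, which is the property the current host-wide EH lacks.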

As usual, comments are welcome.

Cheers,

Hannes



Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke h...@suse.de wrote:
 On 05/10/2013 07:51 PM, Baruch Even wrote:

 The error handling I have in mind (admittedly, not fully thought out)
 should work for both FC and SAS. Currently the error recovery
 progresses at the host level regardless of if the errors are on one
 device or all of them, it also stops the IOs on all devices and LUNs.
 It would be nice if that was taken into account. My ideas may be more
 suitable to the environment I work in (enterprise storage devices
 rather than hosts) but I believe the same approach would benefit the
 hosts as well.

 It would be interesting to see what approach the new error handling will
 take.

 So, my general idea is this:

 1) Send command aborts from scsi_times_out(). There is no requirement
on stopping I/O on the host simply because a single command times
out. And as scsi_times_out() is run from a separate thread anyway
we should be able to send ABORT TASK TMFs without a problem
 2) Modify recovery sequence.
One of the major pitfalls of the current scsi_eh is that it
spills over onto unrelated LUNs for higher levels. So for the
new EH we should be using a sequence of
- ABORT TASK
- ABORT TASK SET
- (Terminate I_T nexus)
- (Host reset)
'Terminate I_T nexus' for FibreChannel is equivalent to a LOGO.
'Host reset' is the current host reset function.
 3) Finegrained recovery setting.
There is no need to stop the entire host when doing a recovery;
it should be sufficient to stop I/O to the unit
(LUN, I_T nexus, host) when the error recovery is at the
respective level.

This looks great and much in line with what I'm thinking.

What about not escalating to the next level unless everything at the
current level has failed?
I mean that if not all LUNs on a target have failed, terminating the
I_T nexus will disrupt the LUNs that still work; likewise, if some
targets on the host are still functioning, resetting the host will
kill them too.
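
A minimal sketch of that rule, assuming the EH code could count how many units under a given scope are currently failed. The struct and function names are hypothetical and this is userspace C for illustration only.

```c
#include <stdbool.h>

/* Health summary for one scope: the LUNs of a target, or the targets of a host. */
struct scope_health {
    unsigned int total;     /* units known under this scope */
    unsigned int failed;    /* units currently in error recovery */
};

/*
 * Escalate to the wider action (terminate the I_T nexus, reset the host)
 * only when nothing under this scope is still working; otherwise the wider
 * action would only hurt the healthy units.
 */
static bool should_escalate(struct scope_health h)
{
    return h.total > 0 && h.failed == h.total;
}
```

For a target with four LUNs of which one has failed, should_escalate() returns false and recovery would stay at the LUN level.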

Will this replace all of EH or just FC EH?

Baruch