Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Jeremy Linton
On 5/13/2013 12:46 AM, Hannes Reinecke wrote: True. But at the end of the day, we _do_ want to recover the failed LUN. If we were to disable that faulty LUN and continue running with the others, we won't have a chance of _ever_ recovering that one LUN. I don't buy this. Especially for

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Hannes Reinecke
On 05/13/2013 04:40 PM, Jeremy Linton wrote: On 5/13/2013 12:46 AM, Hannes Reinecke wrote: True. But at the end of the day, we _do_ want to recover the failed LUN. If we were to disable that faulty LUN and continue running with the others, we won't have a chance of _ever_ recovering that one

RE: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Elliott, Robert (Server Storage)
-----Original Message----- From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Ewan Milne Sent: Friday, 10 May, 2013 11:59 AM To: Hannes Reinecke Cc: Baruch Even; Martin K. Petersen; linux-scsi; michaelc Subject: Re: [PATCH] scsi: Allow error

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Jeremy Linton
On 5/13/2013 10:03 AM, Hannes Reinecke wrote: The other LUNs haven't reported an error. But how do you know whether they are still okay? The other LUNs might simply be idle, and no commands have been sent to them. Well, how about generating std inquiry against them if they are idle
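Jeremy's proposal uses a standard INQUIRY as a cheap liveness probe for idle LUNs that share the HBA. As a sketch of what such a probe would send, the function below fills the 6-byte INQUIRY CDB defined in SPC (operation code 0x12, EVPD clear, allocation length big-endian in bytes 3-4). The function and macro names are hypothetical, and how the CDB is actually submitted (SG_IO from user space, or an in-kernel helper) is deliberately left out.

```c
#include <stdint.h>
#include <string.h>

/* SCSI INQUIRY operation code, as defined in SPC. */
#define INQUIRY_OPCODE 0x12
/* Standard INQUIRY data is at least 36 bytes long. */
#define STD_INQUIRY_MIN_LEN 36

/*
 * Hypothetical helper: fill a 6-byte CDB for a *standard* INQUIRY
 * (EVPD = 0, page code = 0). The allocation length occupies bytes 3-4,
 * most significant byte first (SPC-3 and later).
 */
static void build_std_inquiry_cdb(uint8_t cdb[6], uint16_t alloc_len)
{
    memset(cdb, 0, 6);
    cdb[0] = INQUIRY_OPCODE;
    /* cdb[1] stays 0: EVPD bit clear, so the target returns standard data. */
    /* cdb[2] stays 0: page code must be zero when EVPD is clear. */
    cdb[3] = (uint8_t)(alloc_len >> 8);
    cdb[4] = (uint8_t)(alloc_len & 0xff);
    /* cdb[5] stays 0: CONTROL byte. */
}
```

A probe would typically request `STD_INQUIRY_MIN_LEN` bytes, which is enough to read the peripheral qualifier and device type without depending on vendor-specific fields.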

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Baruch Even
On Mon, May 13, 2013 at 6:58 PM, Jeremy Linton jlin...@tributary.com wrote: On 5/13/2013 10:03 AM, Hannes Reinecke wrote: The other LUNs haven't reported an error. But how do you know whether they are still okay? The other LUNs might simply be idle, and no commands have been sent to them.

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Martin K. Petersen
Jeremy == Jeremy Linton jlin...@tributary.com writes: Jeremy> Well, how about generating std inquiry against them if they are Jeremy> idle and the given HBA has a device in error state? Then you can Jeremy> make a rough approximation of what has failed, and escalate the Jeremy> error handling if all
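The decision Jeremy describes can be reduced to a small rule, sketched here under the assumption that each idle sibling LUN on the same HBA has already been probed and its result recorded as pass or fail. All names (`enum eh_scope`, `choose_eh_scope`) are hypothetical and are not part of the SCSI midlayer API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of the recovery-scope decision. */
enum eh_scope {
    EH_SCOPE_LUN,  /* only the failing LUN looks bad: recover or offline it alone */
    EH_SCOPE_HOST, /* every probed sibling failed too: escalate to the HBA */
};

/*
 * probe_ok[i] holds the result of a standard INQUIRY sent to the i-th idle
 * sibling LUN on the same HBA (true = valid response within the timeout).
 */
static enum eh_scope choose_eh_scope(const bool *probe_ok, size_t nsiblings)
{
    /* No siblings to compare against: fall back to host-level escalation. */
    if (nsiblings == 0)
        return EH_SCOPE_HOST;

    /* One healthy sibling suggests the path through the HBA still works. */
    for (size_t i = 0; i < nsiblings; i++)
        if (probe_ok[i])
            return EH_SCOPE_LUN;

    /* All siblings failed as well: the HBA (or its link) is the likely culprit. */
    return EH_SCOPE_HOST;
}
```

The rule only narrows the scope when some sibling answers. As Jeremy points out in his reply to Martin, a healthy sibling does not prove the HBA is fine: a misbehaving target can leave it needing a firmware reset while other targets keep completing I/O, so a real implementation would still need a way to escalate.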

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-13 Thread Jeremy Linton
On 5/13/2013 3:29 PM, Martin K. Petersen wrote: others. We see cases fairly often where a misbehaving target has confused the HBA enough that we cannot bring the device back without doing an HBA firmware reset, despite I/O completing successfully on other targets connected to the same HBA.

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-12 Thread Hannes Reinecke
On 05/10/2013 09:27 PM, Baruch Even wrote: On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke h...@suse.de wrote: On 05/10/2013 07:51 PM, Baruch Even wrote: The error handling I have in mind (admittedly, not fully thought out) should work for both FC and SAS. Currently the error recovery

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bart Van Assche
On 05/10/13 05:11, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for some fast-fail scenarios it is necessary to be able to tune this as it can take several

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Ewan Milne
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for some fast-fail scenarios it is necessary to be able to tune this as it can

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Hannes Reinecke
On 05/10/2013 02:43 PM, Ewan Milne wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for some fast-fail scenarios it is

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bryn M. Reeves
On 05/10/2013 01:43 PM, Ewan Milne wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for some fast-fail scenarios it is

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Ewan Milne
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote: On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Hannes Reinecke
On 05/10/2013 04:01 PM, Ewan Milne wrote: On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote: On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be used for error handling purposes.

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Bryn M. Reeves
On 05/10/2013 03:24 PM, Hannes Reinecke wrote: However, this time is only defined _on the initiator_. The specification does _NOT_ have any fixed timeout values for _any_ command. As such it could in theory (and does, if you happen to run against certain arrays under certain conditions) take

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Martin K. Petersen
Bart == Bart Van Assche bvanass...@acm.org writes: Bart> Have you considered moving the eh_timeout assignment statement to Bart> just before the transport_configure_device() and slave_configure() Bart> calls? That would allow transport drivers and LLD drivers to Bart> override the default
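Bart's point is about ordering: if the default is assigned after the configure hooks run, it silently overwrites whatever value the transport class or LLD chose. The sketch below models that with simplified, hypothetical stand-ins for `struct scsi_device` and the `slave_configure()` hook. Only one hook is modelled (`transport_configure_device()` would sit between the default and the LLD hook), and `HZ` is fixed at 100 purely for illustration.

```c
#include <stddef.h>

#define HZ 100                       /* assumed tick rate for this sketch */
#define DEFAULT_EH_TIMEOUT (10 * HZ) /* the previously hardcoded 10 seconds */

/* Hypothetical, trimmed-down stand-in for struct scsi_device. */
struct sdev {
    unsigned int eh_timeout;         /* in jiffies */
};

/* Hypothetical LLD hook table; a NULL hook means "keep the default". */
struct lld_ops {
    int (*slave_configure)(struct sdev *sdev);
};

static int configure_device(struct sdev *sdev, const struct lld_ops *ops)
{
    /* 1. Assign the midlayer default first... */
    sdev->eh_timeout = DEFAULT_EH_TIMEOUT;

    /* 2. ...so that a driver hook running afterwards can override it. */
    if (ops != NULL && ops->slave_configure != NULL)
        return ops->slave_configure(sdev);
    return 0;
}

/* Example LLD hook for a fast-failover setup. */
static int fast_fail_slave_configure(struct sdev *sdev)
{
    sdev->eh_timeout = 2 * HZ;
    return 0;
}
```

Swapping steps 1 and 2 would make every device end up with `DEFAULT_EH_TIMEOUT`, regardless of what the hook requested.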

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Martin K. Petersen
Baruch == Baruch Even bar...@ev-en.org writes: Baruch> Actually reducing the timeouts is probably not a good approach Baruch> since it will cause the host to take a more radical approach Baruch> without waiting sufficiently for a potential recovery. Reducing the eh timeout is a requirement in many

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Ewan Milne
On Fri, 2013-05-10 at 16:24 +0200, Hannes Reinecke wrote: On 05/10/2013 04:01 PM, Ewan Milne wrote: On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote: On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne emi...@redhat.com wrote: On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote: On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote: Introduce eh_timeout which can be

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 5:53 PM, Martin K. Petersen martin.peter...@oracle.com wrote: Baruch == Baruch Even bar...@ev-en.org writes: Baruch> Actually reducing the timeouts is probably not a good approach Baruch> since it will cause the host to take a more radical approach Baruch> without waiting

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Hannes Reinecke
On 05/10/2013 07:51 PM, Baruch Even wrote: On Fri, May 10, 2013 at 5:01 PM, Ewan Milne emi...@redhat.com wrote: On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote: On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote: On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen

Re: [PATCH] scsi: Allow error handling timeout to be specified

2013-05-10 Thread Baruch Even
On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke h...@suse.de wrote: On 05/10/2013 07:51 PM, Baruch Even wrote: The error handling I have in mind (admittedly, not fully thought out) should work for both FC and SAS. Currently the error recovery progresses at the host level regardless of whether

[PATCH] scsi: Allow error handling timeout to be specified

2013-05-09 Thread Martin K. Petersen
Introduce eh_timeout which can be used for error handling purposes. This was previously hardcoded to 10 seconds in the SCSI error handling code. However, for some fast-fail scenarios it is necessary to be able to tune this as it can take several iterations (bus device, target, bus, controller)
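To see why a fixed 10-second value hurts fast-fail setups, here is a rough worst-case estimate over the four escalation levels the patch lists. The model is an assumption, not the kernel's exact behaviour: each level is charged one follow-up command that may block for the full `eh_timeout` before the handler escalates, while aborts, reset settle delays and per-device iteration are ignored, so the result is only a lower bound. The function names are hypothetical.

```c
#include <stdio.h>

/* Escalation ladder named in the patch description. */
static const char *const eh_levels[] = {
    "bus device reset", "target reset", "bus reset", "controller reset",
};
#define EH_NLEVELS (sizeof(eh_levels) / sizeof(eh_levels[0]))

/*
 * Simplified model (assumption): each level ends with one follow-up
 * command that may block for the full eh_timeout before escalating.
 */
static unsigned int worst_case_eh_seconds(unsigned int eh_timeout_s)
{
    return (unsigned int)EH_NLEVELS * eh_timeout_s;
}

/* Print the cumulative bound reached at each level of the ladder. */
static void print_ladder(unsigned int eh_timeout_s)
{
    unsigned int elapsed = 0;

    for (size_t i = 0; i < EH_NLEVELS; i++) {
        elapsed += eh_timeout_s;
        printf("%-18s up to %3u s elapsed\n", eh_levels[i], elapsed);
    }
}
```

With the old hardcoded 10 seconds, the four levels alone account for at least 40 s; tuning `eh_timeout` to 2 s brings that bound down to 8 s. That is the trade-off debated above: Martin wants the shorter bound for fast failover, while Baruch worries that less waiting means escalating to more radical resets before the device has had a chance to recover.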