On 5/13/2013 12:46 AM, Hannes Reinecke wrote:
True. But at the end of the day, we _do_ want to recover the failed LUN.
If we were to disable that faulty LUN and continue running with the others
we won't have a chance of _ever_ recovering that one LUN.
I don't buy this. Especially for
On 05/13/2013 04:40 PM, Jeremy Linton wrote:
On 5/13/2013 12:46 AM, Hannes Reinecke wrote:
True. But at the end of the day, we _do_ want to recover the failed LUN.
If we were to disable that faulty LUN and continue running with the others
we won't have a chance of _ever_ recovering that one
-----Original Message-----
From: linux-scsi-ow...@vger.kernel.org [mailto:linux-scsi-ow...@vger.kernel.org] On Behalf Of Ewan Milne
Sent: Friday, 10 May, 2013 11:59 AM
To: Hannes Reinecke
Cc: Baruch Even; Martin K. Petersen; linux-scsi; michaelc
Subject: Re: [PATCH] scsi: Allow error
On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
The other LUNs haven't reported an error. But how do you know whether they
are still okay? The other LUNs might simply be idle, and no commands have
been sent to them.
Well, how about generating std inquiry against them if they are idle
On Mon, May 13, 2013 at 6:58 PM, Jeremy Linton jlin...@tributary.com wrote:
On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
The other LUNs haven't reported an error. But how do you know whether they
are still okay? The other LUNs might simply be idle, and no commands have
been sent to them.
Jeremy == Jeremy Linton jlin...@tributary.com writes:
Jeremy> Well, how about generating std inquiry against them if they are
Jeremy> idle and the given HBA has a device in error state? Then you can
Jeremy> make a rough approximation of what has failed, and escalate the
Jeremy> error handling if all
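Jeremy's proposal can be expressed as a small decision function. The sketch below is a user-space model, not kernel code: the enum, struct and function names are hypothetical, how the INQUIRY is actually issued is left out, and because the quoted sentence is cut off, the escalation criterion ("every probed idle LUN on the HBA is also silent") is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical escalation levels: recover one device, or reset the HBA. */
enum eh_level { EH_DEVICE_RESET, EH_HOST_RESET };

/* Outcome of sending a standard INQUIRY to one idle LUN on the same HBA. */
struct probe_result {
    bool responded; /* true if the INQUIRY completed successfully */
};

/*
 * Rough approximation of what has failed.
 * ASSUMPTION: if no probed peer answers, suspect the HBA (or its link)
 * and escalate straight to a host reset; otherwise keep recovery local.
 */
enum eh_level choose_eh_level(const struct probe_result *peers, size_t npeers)
{
    size_t silent = 0;

    assert(peers != NULL || npeers == 0);

    if (npeers == 0)
        return EH_DEVICE_RESET; /* no idle peers to compare against */

    for (size_t i = 0; i < npeers; i++)
        if (!peers[i].responded)
            silent++;

    return silent == npeers ? EH_HOST_RESET : EH_DEVICE_RESET;
}
```

Martin's reply below marks the limit of this heuristic: a misbehaving target can wedge the HBA badly enough to need a firmware reset while I/O to other targets still completes, so "peers respond" does not prove the HBA is healthy.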
On 5/13/2013 3:29 PM, Martin K. Petersen wrote:
others. We see cases fairly often where a misbehaving target has
confused the HBA enough that we can not bring the device back without
doing an HBA firmware reset. Despite I/O completing successfully on
other targets connected to the same HBA.
On 05/10/2013 09:27 PM, Baruch Even wrote:
On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke h...@suse.de wrote:
On 05/10/2013 07:51 PM, Baruch Even wrote:
The error handling I have in mind (admittedly, not fully thought out)
should work for both FC and SAS. Currently the error recovery
On 05/10/13 05:11, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can
On 05/10/2013 02:43 PM, Ewan Milne wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is
On 05/10/2013 01:43 PM, Ewan Milne wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10
On 05/10/2013 04:01 PM, Ewan Milne wrote:
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes.
On 05/10/2013 03:24 PM, Hannes Reinecke wrote:
However, this time is only defined _on the initiator_.
The specification does _NOT_ have any fixed timeout values for _any_
command. As such it could in theory (and does, if you happen to run
against certain arrays under certain conditions) take
Bart == Bart Van Assche bvanass...@acm.org writes:
Bart> Have you considered moving the eh_timeout assignment statement to
Bart> just before the transport_configure_device() and slave_configure()
Bart> calls? That would allow transport drivers and LLD drivers to
Bart> override the default
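Bart's point is about ordering: whichever assignment runs last wins. The model below is a hypothetical user-space stand-in (the struct, hook type and function names are invented for illustration and are not the midlayer's real types); the hook plays the role of slave_configure().

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_EH_TIMEOUT_SECS 10u /* the previously hard-coded value */

/* Hypothetical, heavily simplified stand-in for a SCSI device. */
struct fake_sdev {
    unsigned int eh_timeout; /* seconds */
};

/* Hypothetical driver hook, standing in for slave_configure(). */
typedef void (*configure_fn)(struct fake_sdev *sdev);

/* Default assigned AFTER the hook: any driver override is clobbered. */
void configure_default_last(struct fake_sdev *sdev, configure_fn hook)
{
    assert(sdev != NULL);
    if (hook)
        hook(sdev);
    sdev->eh_timeout = DEFAULT_EH_TIMEOUT_SECS;
}

/* Default assigned BEFORE the hook: the driver gets the last word. */
void configure_default_first(struct fake_sdev *sdev, configure_fn hook)
{
    assert(sdev != NULL);
    sdev->eh_timeout = DEFAULT_EH_TIMEOUT_SECS;
    if (hook)
        hook(sdev);
}

/* Example (hypothetical) driver that wants faster failure. */
void fast_fail_hook(struct fake_sdev *sdev)
{
    sdev->eh_timeout = 2;
}
```

With configure_default_last() the driver's 2 s setting is overwritten by the 10 s default; with configure_default_first() it survives, and devices whose driver has no hook still get 10 s.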
Baruch == Baruch Even bar...@ev-en.org writes:
Baruch> Actually reducing the timeouts is probably not a good approach
Baruch> since it will cause the host to take a more radical approach
Baruch> without waiting sufficiently for a potential recovery.
Reducing the eh timeout is a requirement in many
On Fri, 2013-05-10 at 16:24 +0200, Hannes Reinecke wrote:
On 05/10/2013 04:01 PM, Ewan Milne wrote:
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne emi...@redhat.com wrote:
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be
On Fri, May 10, 2013 at 5:53 PM, Martin K. Petersen
martin.peter...@oracle.com wrote:
Baruch == Baruch Even bar...@ev-en.org writes:
Baruch> Actually reducing the timeouts is probably not a good approach
Baruch> since it will cause the host to take a more radical approach
Baruch> without waiting
On 05/10/2013 07:51 PM, Baruch Even wrote:
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne emi...@redhat.com wrote:
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne emi...@redhat.com wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen
On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke h...@suse.de wrote:
On 05/10/2013 07:51 PM, Baruch Even wrote:
The error handling I have in mind (admittedly, not fully thought out)
should work for both FC and SAS. Currently the error recovery
progresses at the host level regardless of if
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterations (bus device, target, bus,
controller)
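To see why a hard-coded 10 seconds hurts fast-fail setups, here is a back-of-the-envelope model. It is an assumption, not how scsi_error.c accounts for time: each of the four escalation steps named above is taken to wait up to eh_timeout for a follow-up command (for example a TEST UNIT READY) before the next step starts, and reset settle delays, retries and the original command timeout are ignored. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Escalation steps listed in the patch description, in order. */
static const char *const eh_steps[] = {
    "bus device reset",
    "target reset",
    "bus reset",
    "controller reset",
};

#define EH_NSTEPS (sizeof(eh_steps) / sizeof(eh_steps[0]))

/* Name of step i, or NULL if i is out of range. */
const char *eh_step_name(size_t i)
{
    return i < EH_NSTEPS ? eh_steps[i] : NULL;
}

/*
 * Simplified model (assumption): every step waits up to eh_timeout_secs
 * before escalating, so only the eh_timeout contribution is counted.
 */
unsigned int eh_worst_case_secs(unsigned int eh_timeout_secs, size_t steps)
{
    assert(steps <= EH_NSTEPS);
    return eh_timeout_secs * (unsigned int)steps;
}
```

Under this model, running all four steps with the old 10 s value spends up to 40 s in eh_timeout waits alone; tuning eh_timeout to 1 s reduces that share to 4 s.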