> "Jeremy" == Jeremy Linton writes:
>> others. We see cases fairly often where a misbehaving target has
>> confused the HBA enough that we can not bring the device back without
>> doing an HBA firmware reset. Despite I/O completing successfully on
>> other targets connected to the same HBA.
On 5/13/2013 3:29 PM, Martin K. Petersen wrote:
> others. We see cases fairly often where a misbehaving target has
> confused the HBA enough that we can not bring the device back without
> doing an HBA firmware reset. Despite I/O completing successfully on
> other targets connected to the same HBA
> "Jeremy" == Jeremy Linton writes:
Jeremy> Well, how about generating std inquiry against them if they are
Jeremy> idle and the given HBA has a device in error state? Then you can
Jeremy> make a rough approximation of what has failed, and escalate the
Jeremy> error handling if all the device
On Mon, May 13, 2013 at 6:58 PM, Jeremy Linton wrote:
> On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
>> The other LUNs haven't reported an error. But how do you know whether they
>> are still okay? The other LUNs might simply be idle, and no commands have
>> been sent to them.
>
> Well, h
On 5/13/2013 10:03 AM, Hannes Reinecke wrote:
> The other LUNs haven't reported an error. But how do you know whether they
> are still okay? The other LUNs might simply be idle, and no commands have
> been sent to them.
Well, how about generating std inquiry against them if they are idle
: [PATCH] scsi: Allow error handling timeout to be specified
>
> On Fri, 2013-05-10 at 16:24 +0200, Hannes Reinecke wrote:
> > On 05/10/2013 04:01 PM, Ewan Milne wrote:
> > > On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
> > >> On Fri, May 10, 2013 at 3:43
On 05/13/2013 04:40 PM, Jeremy Linton wrote:
> On 5/13/2013 12:46 AM, Hannes Reinecke wrote:
>
>> True. But at the end of the day, we _do_ want to recover the failed LUN.
>> If we were to disable that faulty LUN and continue running with the others
>> we won't have a chance of _ever_ recovering t
On 5/13/2013 12:46 AM, Hannes Reinecke wrote:
> True. But at the end of the day, we _do_ want to recover the failed LUN.
> If we were to disable that faulty LUN and continue running with the others
> we won't have a chance of _ever_ recovering that one LUN.
I don't buy this. Especially f
On 05/10/2013 09:27 PM, Baruch Even wrote:
> On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke wrote:
>> On 05/10/2013 07:51 PM, Baruch Even wrote:
>>>
>>> The error handling I have in mind (admittedly, not fully thought out)
>>> should work for both FC and SAS. Currently the error recovery
>>> pr
On Fri, May 10, 2013 at 11:18 PM, Hannes Reinecke wrote:
> On 05/10/2013 07:51 PM, Baruch Even wrote:
>>
>> The error handling I have in mind (admittedly, not fully thought out)
>> should work for both FC and SAS. Currently the error recovery
>> progresses at the host level regardless of if the er
On 05/10/2013 07:51 PM, Baruch Even wrote:
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne wrote:
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can
On Fri, May 10, 2013 at 5:53 PM, Martin K. Petersen wrote:
>> "Baruch" == Baruch Even writes:
>
> Baruch> Actually reducing the timeouts is probably not a good approach
> Baruch> since it will cause the host to take a more radical approach
> Baruch> without waiting sufficiently for a potentia
On Fri, May 10, 2013 at 5:01 PM, Ewan Milne wrote:
> On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
>> On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
>> >
>> > On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
>> > > Introduce eh_timeout which can be used for error handling
On Fri, 2013-05-10 at 16:24 +0200, Hannes Reinecke wrote:
> On 05/10/2013 04:01 PM, Ewan Milne wrote:
> > On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
> >> On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
> >>>
> >>> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
> I
> "Martin" == Martin K Petersen writes:
Martin> I'm also working on a patch to add some heuristics to avoid the
Martin> HBA and bus resets
Or rather: Defer the HBA and bus resets...
Martin> if I/O is completing successfully on other attached targets. But
Martin> that's an orthogonal issue.
> "Baruch" == Baruch Even writes:
Baruch> Actually reducing the timeouts is probably not a good approach
Baruch> since it will cause the host to take a more radical approach
Baruch> without waiting sufficiently for a potential recovery.
Reducing the eh timeout is a requirement in many cluste
> "Bart" == Bart Van Assche writes:
Bart> Have you considered to move the eh_timeout assignment statement to
Bart> just before the transport_configure_device() and slave_configure()
Bart> calls ? That would allow transport drivers and LLD drivers to
Bart> override the default eh_timeout valu
On 05/10/2013 03:24 PM, Hannes Reinecke wrote:
However, this time is only defined _on the initiator_.
The specification does _NOT_ have any fixed timeout values for _any_
command. As such it could in theory (and does, if you happen to run
against certain arrays under certain conditions) take seve
On 05/10/2013 04:01 PM, Ewan Milne wrote:
> On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
>> On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
>>>
>>> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
On Fri, 2013-05-10 at 16:22 +0300, Baruch Even wrote:
> On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
> >
> > On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
> > > Introduce eh_timeout which can be used for error handling purposes. This
> > > was previously hardcoded to 10 second
On Fri, May 10, 2013 at 3:43 PM, Ewan Milne wrote:
>
> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
> > Introduce eh_timeout which can be used for error handling purposes. This
> > was previously hardcoded to 10 seconds in the SCSI error handling
> > code. However, for some fast-fa
On 05/10/2013 01:43 PM, Ewan Milne wrote:
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is nece
On 05/10/2013 02:43 PM, Ewan Milne wrote:
> On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
>> Introduce eh_timeout which can be used for error handling purposes. This
>> was previously hardcoded to 10 seconds in the SCSI error handling
>> code. However, for some fast-fail scenarios it
On Thu, 2013-05-09 at 23:11 -0400, Martin K. Petersen wrote:
> Introduce eh_timeout which can be used for error handling purposes. This
> was previously hardcoded to 10 seconds in the SCSI error handling
> code. However, for some fast-fail scenarios it is necessary to be able
> to tune this as it c
On 05/10/13 05:11, Martin K. Petersen wrote:
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterat