On Fri, Feb 02, 2007 at 05:58:04PM -0500, Mark Lord wrote:
> Matt Mackall wrote:
> >..
> >Also worth considering is that spending minutes trying to reread
> >damaged sectors is likely to accelerate your death spiral. More data
> >may be recoverable if you give up quickly in a first pass, then go
>
Matt Mackall wrote:
..
Also worth considering is that spending minutes trying to reread
damaged sectors is likely to accelerate your death spiral. More data
may be recoverable if you give up quickly in a first pass, then go
back and manually retry damaged bits with smaller I/Os.
All good
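
A minimal sketch of the two-pass idea Matt describes above, written as a userspace illustration rather than anything from the kernel. The names `two_pass_read` and `make_fake_disk`, the 512-byte sector size, and the 8-sector chunk size are all assumptions made for this example; the fake disk simply fails any read that touches a bad sector.

```python
from typing import Callable, Dict, List, Set, Tuple

SECTOR_SIZE = 512  # assumed sector size in bytes


def two_pass_read(
    read: Callable[[int, int], bytes],
    total_sectors: int,
    chunk_sectors: int = 8,
) -> Tuple[Dict[int, bytes], List[int]]:
    """Return (recovered sectors keyed by LBA, LBAs that stayed unreadable)."""
    recovered: Dict[int, bytes] = {}
    deferred: List[Tuple[int, int]] = []

    # Pass 1: large reads; give up on a chunk at its first error and move on.
    for start in range(0, total_sectors, chunk_sectors):
        count = min(chunk_sectors, total_sectors - start)
        try:
            data = read(start, count)
        except OSError:
            deferred.append((start, count))
            continue
        for i in range(count):
            recovered[start + i] = data[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]

    # Pass 2: revisit only the failed chunks, one sector per read.
    bad: List[int] = []
    for start, count in deferred:
        for lba in range(start, start + count):
            try:
                recovered[lba] = read(lba, 1)
            except OSError:
                bad.append(lba)
    return recovered, bad


def make_fake_disk(total_sectors: int, bad_sectors: Set[int]) -> Callable[[int, int], bytes]:
    """Hypothetical in-memory device: a read touching a bad sector raises OSError."""
    def read(start: int, count: int) -> bytes:
        lbas = range(start, min(start + count, total_sectors))
        if any(lba in bad_sectors for lba in lbas):
            raise OSError("medium error")
        return b"".join(bytes([lba % 256]) * SECTOR_SIZE for lba in lbas)
    return read
```

With a 32-sector fake disk whose sectors 10 and 11 are bad, the first pass skips the whole 8-sector chunk containing them, and the second pass recovers the six good sectors in that chunk while reporting only 10 and 11 as lost.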
Alan wrote:
>> The interesting point of this question is about the typical pattern of
>> IO errors. On a read, it is safe to assume that you will have issues
>> with some bounded number of adjacent sectors.
>
> Which in theory you can get by asking the drive for the real sector size
> from
On Fri, Feb 02, 2007 at 11:06:19AM -0500, Mark Lord wrote:
> Alan wrote:
> >
> >If this is the right strategy for disk recovery for a given type of
> >device then this ought to be an automatic strategy. Most end users will
> >not have the knowledge to frob about in sysfs, and if the bad sector
James Bottomley wrote:
On Fri, 2007-02-02 at 14:42 +, Alan wrote:
The interesting point of this question is about the typical pattern of
IO errors. On a read, it is safe to assume that you will have issues
with some bounded number of adjacent sectors.
Which in theory you
Alan wrote:
If this is the right strategy for disk recovery for a given type of
device then this ought to be an automatic strategy. Most end users will
not have the knowledge to frob about in sysfs, and if the bad sector hits
at the wrong moment a sensible automatic recovery strategy is going
On Fri, 2007-02-02 at 14:42 +, Alan wrote:
> > The interesting point of this question is about the typical pattern of
> > IO errors. On a read, it is safe to assume that you will have issues
> > with some bounded number of adjacent sectors.
>
> Which in theory you can get by asking the
> your system requirements are, what the system is trying to do (i.e.,
> when trying to recover a failing but not dead yet disk, IO errors should
> be as quick as possible and we should choose an IO scheduler that does
> not combine IO's).
If this is the right strategy for disk recovery for a
> The interesting point of this question is about the typical pattern of
> IO errors. On a read, it is safe to assume that you will have issues
> with some bounded number of adjacent sectors.
Which in theory you can get by asking the drive for the real sector size
from the ATA7 info. (We
James Bottomley wrote:
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote:
I believe you made the first change in response to my prodding at the time,
when libata was not returning valid sense data (no LBA) for media errors.
The SCSI EH handling of that was rather poor at the time,
and so
The interesting point of this question is about the typical pattern of
IO errors. On a read, it is safe to assume that you will have issues
with some bounded number of adjacent sectors.
Which in theory you can get by asking the drive for the real sector size
from the ATA7 info. (We ought
your system requirements are, what the system is trying to do (i.e.,
when trying to recover a failing but not dead yet disk, IO errors should
be as quick as possible and we should choose an IO scheduler that does
not combine IO's).
If this is the right strategy for disk recovery for a
On Fri, 2007-02-02 at 14:42 +, Alan wrote:
The interesting point of this question is about the typical pattern of
IO errors. On a read, it is safe to assume that you will have issues
with some bounded number of adjacent sectors.
Which in theory you can get by asking the drive for
On Fri, Feb 02, 2007 at 11:06:19AM -0500, Mark Lord wrote:
Alan wrote:
If this is the right strategy for disk recovery for a given type of
device then this ought to be an automatic strategy. Most end users will
not have the knowledge to frob about in sysfs, and if the bad sector hits
at the
Alan wrote:
The interesting point of this question is about the typical pattern of
IO errors. On a read, it is safe to assume that you will have issues
with some bounded number of adjacent sectors.
Which in theory you can get by asking the drive for the real sector size
from the ATA7
On Fri, Feb 02, 2007 at 05:58:04PM -0500, Mark Lord wrote:
Matt Mackall wrote:
..
Also worth considering is that spending minutes trying to reread
damaged sectors is likely to accelerate your death spiral. More data
may be recoverable if you give up quickly in a first pass, then go
back and
James Bottomley wrote:
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote:
..
One thing that could be even better than the patch below,
would be to have it perhaps skip the entire bio that includes
the failed sector, rather than only the bad sector itself.
Er ... define "skip over the bio".
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote:
> I believe you made the first change in response to my prodding at the time,
> when libata was not returning valid sense data (no LBA) for media errors.
> The SCSI EH handling of that was rather poor at the time,
> and so having it not retry the
James Bottomley wrote:
On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote:
Kernels since about 2.6.16 or so have been broken in this regard.
They "complete" the good sectors before the error,
and then fail the entire remaining portions of the request.
What was the commit that introduced the
James Bottomley wrote:
On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote:
Kernels since about 2.6.16 or so have been broken in this regard.
They complete the good sectors before the error,
and then fail the entire remaining portions of the request.
What was the commit that introduced the
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote:
I believe you made the first change in response to my prodding at the time,
when libata was not returning valid sense data (no LBA) for media errors.
The SCSI EH handling of that was rather poor at the time,
and so having it not retry the
James Bottomley wrote:
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote:
..
One thing that could be even better than the patch below,
would be to have it perhaps skip the entire bio that includes
the failed sector, rather than only the bad sector itself.
Er ... define skip over the bio. A
James Bottomley wrote:
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote:
Alan wrote:
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
"with retry" ATA opcodes for this).
This depends on the
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote:
> Alan wrote:
> >> When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
> >> as the drive itself has already done internal retries (libata uses the
> >> "with retry" ATA opcodes for this).
> >
> > This depends on the
Alan wrote:
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
"with retry" ATA opcodes for this).
This depends on the firmware. Some of the "raid firmware" drives don't
appear to do retries in
Douglas Gilbert wrote:
Ric,
Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added
command support to flag a block as "uncorrectable". There
is no need to send bad "long" data to it and suppress the
disk's automatic re-allocation logic.
That'll be useful in a couple of years, once drives
Ric Wheeler wrote:
>
>
> Jeff Garzik wrote:
>> Mark Lord wrote:
>>> Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the
drives, that's about 3 seconds per retry on average, or 12
On Wed, 2007-01-31 at 10:13 -0500, Mark Lord wrote:
> James Bottomley wrote:
> >
> > For the MD case, this is what REQ_FAILFAST is for.
> I cannot find where SCSI honours that flag. James?
Er, it's in scsi_error.c:scsi_decide_disposition():
maybe_retry:
/* we requeue for retry
Mark Lord wrote:
James Bottomley wrote:
For the MD case, this is what REQ_FAILFAST is for.
I cannot find where SCSI honours that flag. James?
Scratch that thought.. SCSI honours it in scsi_end_request().
But I'm not certain that the block layer handles it correctly,
at least not in the
James Bottomley wrote:
For the MD case, this is what REQ_FAILFAST is for.
I cannot find where SCSI honours that flag. James?
And for that matter, even when I patch SCSI so that it *does* honour it,
I don't actually see the flag making it into the SCSI layer from above.
And I don't see
> When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
> as the drive itself has already done internal retries (libata uses the
> "with retry" ATA opcodes for this).
This depends on the firmware. Some of the "raid firmware" drives don't
appear to do retries in firmware.
>
Ric Wheeler wrote:
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries.
(note: libata does *not* generate retries for medium errors;
the looping is driven by the SCSI mid-layer code).
It really beats the alternative
Jeff Garzik wrote:
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the
drives, that's about 3 seconds per retry on average, or 12 seconds
per failure. Multiply that by the
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the drives,
that's about 3 seconds per retry on average, or 12 seconds per
failure. Multiply that by the number of blocks past
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
with retry ATA opcodes for this).
This depends on the firmware. Some of the raid firmware drives don't
appear to do retries in firmware.
But
On Wed, 2007-01-31 at 10:13 -0500, Mark Lord wrote:
James Bottomley wrote:
For the MD case, this is what REQ_FAILFAST is for.
I cannot find where SCSI honours that flag. James?
Er, it's in scsi_error.c:scsi_decide_disposition():
maybe_retry:
/* we requeue for retry because
Ric Wheeler wrote:
Jeff Garzik wrote:
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the
drives, that's about 3 seconds per retry on average, or 12 seconds
per failure.
Douglas Gilbert wrote:
Ric,
Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added
command support to flag a block as uncorrectable. There
is no need to send bad long data to it and suppress the
disk's automatic re-allocation logic.
That'll be useful in a couple of years, once drives that
Alan wrote:
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
with retry ATA opcodes for this).
This depends on the firmware. Some of the raid firmware drives don't
appear to do retries in
Alan wrote:
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
with retry ATA opcodes for this).
This depends on the firmware. Some of the raid firmware drives don't
appear to do retries in firmware.
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote:
Alan wrote:
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
with retry ATA opcodes for this).
This depends on the firmware. Some of the
James Bottomley wrote:
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote:
Alan wrote:
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable,
as the drive itself has already done internal retries (libata uses the
with retry ATA opcodes for this).
This depends on the
Ric Wheeler wrote:
>
>
> Mark Lord wrote:
>
>> Eric D. Mudama wrote:
>>
>>>
>>> Actually, it's possibly worse, since each failure in libata will
>>> generate 3-4 retries. With existing ATA error recovery in the
>>> drives, that's about 3 seconds per retry on average, or 12 seconds
>>> per
On Tue, 2007-01-30 at 22:20 -0500, Ric Wheeler wrote:
> Mark Lord wrote:
> > The number of retries is an entirely separate issue.
> > If we really care about it, then we should fix SD_MAX_RETRIES.
> >
> > The current value of 5 is *way* too high. It should be zero or one.
> >
> > Cheers
> >
> I
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the
drives, that's about 3 seconds per retry on average, or 12 seconds
per failure. Multiply that by the number of blocks
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the drives,
that's about 3 seconds per retry on average, or 12 seconds per failure.
Multiply that by the number of blocks past the error to
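
As a rough worked version of Eric's arithmetic: the figures of 3-4 retries and about 3 seconds per retry come from his message, while the function name `stall_seconds` and its parameters are hypothetical, introduced only for this illustration. It assumes every block past the error costs the same full set of retries, which is a simplification.

```python
def stall_seconds(
    failing_blocks: int,
    retries_per_failure: int = 4,
    seconds_per_retry: float = 3.0,
) -> float:
    """Estimate total time spent retrying, assuming a uniform cost per block."""
    return failing_blocks * retries_per_failure * seconds_per_retry


one_failure = stall_seconds(1)          # 4 retries at about 3 s each
hundred_blocks = stall_seconds(100)     # same cost repeated for 100 blocks
single_retry = stall_seconds(100, retries_per_failure=1)
```

Under these assumptions one failure costs about 12 seconds, 100 failing blocks cost about 20 minutes, and cutting the retry count to one brings the same 100 blocks down to about 5 minutes.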
James Bottomley wrote:
First off, please send SCSI patches to the SCSI list:
Fixed already, thanks!
This patch fixes the behaviour to be similar to what we had originally.
When a bad sector is encountered, SCSI will now work around it again,
failing *only* the bad sector itself.
Erm, but
First off, please send SCSI patches to the SCSI list:
On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote:
> In ancient kernels, the SCSI disk code used to continue after
> encountering a MEDIUM_ERROR. It would "complete" the good
> sectors before the error, fail the bad sector/block, and then
>
In ancient kernels, the SCSI disk code used to continue after
encountering a MEDIUM_ERROR. It would "complete" the good
sectors before the error, fail the bad sector/block, and then
continue with the rest of the request.
Kernels since about 2.6.16 or so have been broken in this regard.
They
In ancient kernels, the SCSI disk code used to continue after
encountering a MEDIUM_ERROR. It would complete the good
sectors before the error, fail the bad sector/block, and then
continue with the rest of the request.
Kernels since about 2.6.16 or so have been broken in this regard.
They
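
The difference between the two behaviours described in this thread can be sketched as a small simulation. This is not kernel code: the function names and the per-sector status map are assumptions made for illustration, and the second function models the post-2.6.16 behaviour as Mark describes it elsewhere in the thread (good sectors before the error complete, the entire remainder fails).

```python
from typing import Dict, Iterable, Set


def complete_skip_bad(request: Iterable[int], bad: Set[int]) -> Dict[int, bool]:
    """Older behaviour as described: fail only the bad sectors, continue with the rest."""
    return {lba: lba not in bad for lba in request}


def complete_fail_remainder(request: Iterable[int], bad: Set[int]) -> Dict[int, bool]:
    """Behaviour described for kernels since about 2.6.16: everything from the first error on fails."""
    status: Dict[int, bool] = {}
    failed = False
    for lba in request:
        if lba in bad:
            failed = True  # from here on, the rest of the request is failed
        status[lba] = not failed
    return status


request = list(range(100, 108))  # an 8-sector request
bad = {103}                      # one unreadable sector in the middle

old = complete_skip_bad(request, bad)
new = complete_fail_remainder(request, bad)
```

For this 8-sector request with one bad sector, the older behaviour returns 7 good sectors, while the newer one returns only the 3 sectors that precede the error.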
First off, please send SCSI patches to the SCSI list:
linux-scsi@vger.kernel.org
On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote:
In ancient kernels, the SCSI disk code used to continue after
encountering a MEDIUM_ERROR. It would complete the good
sectors before the error, fail the bad
James Bottomley wrote:
First off, please send SCSI patches to the SCSI list:
linux-scsi@vger.kernel.org
Fixed already, thanks!
This patch fixes the behaviour to be similar to what we had originally.
When a bad sector is encountered, SCSI will now work around it again,
failing *only* the bad
On Tue, 2007-01-30 at 22:20 -0500, Ric Wheeler wrote:
Mark Lord wrote:
The number of retries is an entirely separate issue.
If we really care about it, then we should fix SD_MAX_RETRIES.
The current value of 5 is *way* too high. It should be zero or one.
Cheers
I think that drives
Ric Wheeler wrote:
Mark Lord wrote:
Eric D. Mudama wrote:
Actually, it's possibly worse, since each failure in libata will
generate 3-4 retries. With existing ATA error recovery in the
drives, that's about 3 seconds per retry on average, or 12 seconds
per failure. Multiply that by