Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Matt Mackall
On Fri, Feb 02, 2007 at 05:58:04PM -0500, Mark Lord wrote: > Matt Mackall wrote: > >.. > >Also worth considering is that spending minutes trying to reread > >damaged sectors is likely to accelerate your death spiral. More data > >may be recoverable if you give up quickly in a first pass, then go >

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Mark Lord
Matt Mackall wrote: .. Also worth considering is that spending minutes trying to reread damaged sectors is likely to accelerate your death spiral. More data may be recoverable if you give up quickly in a first pass, then go back and manually retry damaged bits with smaller I/Os. All good

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Douglas Gilbert
Alan wrote: >> The interesting point of this question is about the typical pattern of >> IO errors. On a read, it is safe to assume that you will have issues >> with some bounded numbers of adjacent sectors. > > Which in theory you can get by asking the drive for the real sector size > from

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Matt Mackall
On Fri, Feb 02, 2007 at 11:06:19AM -0500, Mark Lord wrote: > Alan wrote: > > > >If this is the right strategy for disk recovery for a given type of > >device then this ought to be an automatic strategy. Most end users will > >not have the knowledge to frob about in sysfs, and if the bad sector

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Ric Wheeler
James Bottomley wrote: On Fri, 2007-02-02 at 14:42 +, Alan wrote: The interesting point of this question is about the typical pattern of IO errors. On a read, it is safe to assume that you will have issues with some bounded numbers of adjacent sectors. Which in theory you

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Mark Lord
Alan wrote: If this is the right strategy for disk recovery for a given type of device then this ought to be an automatic strategy. Most end users will not have the knowledge to frob about in sysfs, and if the bad sector hits at the wrong moment a sensible automatic recovery strategy is going

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread James Bottomley
On Fri, 2007-02-02 at 14:42 +, Alan wrote: > > The interesting point of this question is about the typical pattern of > > IO errors. On a read, it is safe to assume that you will have issues > > with some bounded numbers of adjacent sectors. > > Which in theory you can get by asking the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Alan
> your system requirements are, what the system is trying to do (i.e., > when trying to recover a failing but not dead yet disk, IO errors should > be as quick as possible and we should choose an IO scheduler that does > not combine IO's). If this is the right strategy for disk recovery for a

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Alan
> The interesting point of this question is about the typically pattern of > IO errors. On a read, it is safe to assume that you will have issues > with some bounded numbers of adjacent sectors. Which in theory you can get by asking the drive for the real sector size from the ATA7 info. (We

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Ric Wheeler
James Bottomley wrote: On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: I believe you made the first change in response to my prodding at the time, when libata was not returning valid sense data (no LBA) for media errors. The SCSI EH handling of that was rather poor at the time, and so

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Alan
The interesting point of this question is about the typical pattern of IO errors. On a read, it is safe to assume that you will have issues with some bounded numbers of adjacent sectors. Which in theory you can get by asking the drive for the real sector size from the ATA7 info. (We ought

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Alan
your system requirements are, what the system is trying to do (i.e., when trying to recover a failing but not dead yet disk, IO errors should be as quick as possible and we should choose an IO scheduler that does not combine IO's). If this is the right strategy for disk recovery for a

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread James Bottomley
On Fri, 2007-02-02 at 14:42 +, Alan wrote: The interesting point of this question is about the typical pattern of IO errors. On a read, it is safe to assume that you will have issues with some bounded numbers of adjacent sectors. Which in theory you can get by asking the drive for

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Matt Mackall
On Fri, Feb 02, 2007 at 11:06:19AM -0500, Mark Lord wrote: Alan wrote: If this is the right strategy for disk recovery for a given type of device then this ought to be an automatic strategy. Most end users will not have the knowledge to frob about in sysfs, and if the bad sector hits at the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Douglas Gilbert
Alan wrote: The interesting point of this question is about the typical pattern of IO errors. On a read, it is safe to assume that you will have issues with some bounded numbers of adjacent sectors. Which in theory you can get by asking the drive for the real sector size from the ATA7

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-02 Thread Matt Mackall
On Fri, Feb 02, 2007 at 05:58:04PM -0500, Mark Lord wrote: Matt Mackall wrote: .. Also worth considering is that spending minutes trying to reread damaged sectors is likely to accelerate your death spiral. More data may be recoverable if you give up quickly in a first pass, then go back and

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01 Thread Mark Lord
James Bottomley wrote: On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: .. One thing that could be even better than the patch below, would be to have it perhaps skip the entire bio that includes the failed sector, rather than only the bad sector itself. Er ... define "skip over the bio".

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01 Thread James Bottomley
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: > I believe you made the first change in response to my prodding at the time, > when libata was not returning valid sense data (no LBA) for media errors. > The SCSI EH handling of that was rather poor at the time, > and so having it not retry the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01 Thread Mark Lord
James Bottomley wrote: On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote: Kernels since about 2.6.16 or so have been broken in this regard. They "complete" the good sectors before the error, and then fail the entire remaining portions of the request. What was the commit that introduced the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01 Thread Mark Lord
James Bottomley wrote: On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote: Kernels since about 2.6.16 or so have been broken in this regard. They complete the good sectors before the error, and then fail the entire remaining portions of the request. What was the commit that introduced the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01 Thread James Bottomley
On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: I believe you made the first change in response to my prodding at the time, when libata was not returning valid sense data (no LBA) for media errors. The SCSI EH handling of that was rather poor at the time, and so having it not retry the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-02-01 Thread Mark Lord
James Bottomley wrote: On Thu, 2007-02-01 at 15:02 -0500, Mark Lord wrote: .. One thing that could be even better than the patch below, would be to have it perhaps skip the entire bio that includes the failed sector, rather than only the bad sector itself. Er ... define skip over the bio. A

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
James Bottomley wrote: On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote: Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the "with retry" ATA opcodes for this). This depends on the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread James Bottomley
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote: > Alan wrote: > >> When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, > >> as the drive itself has already done internal retries (libata uses the > >> "with retry" ATA opcodes for this). > > > > This depends on the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the "with retry" ATA opcodes for this). This depends on the firmware. Some of the "raid firmware" drives don't appear to do retries in

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Ric Wheeler
Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the "with retry" ATA opcodes for this). This depends on the firmware. Some of the "raid firmware" drives don't appear to do retries in

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
Douglas Gilbert wrote: Ric, Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added command support to flag a block as "uncorrectable". There is no need to send bad "long" data to it and suppress the disk's automatic re-allocation logic. That'll be useful in a couple of years, once drives

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Douglas Gilbert
Ric Wheeler wrote: > > > Jeff Garzik wrote: >> Mark Lord wrote: >>> Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread James Bottomley
On Wed, 2007-01-31 at 10:13 -0500, Mark Lord wrote: > James Bottomley wrote: > > > > For the MD case, this is what REQ_FAILFAST is for. > I cannot find where SCSI honours that flag. James? Er, it's in scsi_error.c:scsi_decide_disposition(): maybe_retry: /* we requeue for retry

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
Mark Lord wrote: James Bottomley wrote: For the MD case, this is what REQ_FAILFAST is for. I cannot find where SCSI honours that flag. James? Scratch that thought.. SCSI honours it in scsi_end_request(). But I'm not certain that the block layer handles it correctly, at least not in the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
James Bottomley wrote: For the MD case, this is what REQ_FAILFAST is for. I cannot find where SCSI honours that flag. James? And for that matter, even when I patch SCSI so that it *does* honour it, I don't actually see the flag making it into the SCSI layer from above. And I don't see

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Alan
> When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, > as the drive itself has already done internal retries (libata uses the > "with retry" ATA opcodes for this). This depends on the firmware. Some of the "raid firmware" drives don't appear to do retries in firmware. >

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
Ric Wheeler wrote: Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. (note: libata does *not* generate retries for medium errors; the looping is driven by the SCSI mid-layer code). It really beats the alternative

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Ric Wheeler
Jeff Garzik wrote: Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Jeff Garzik
Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the number of blocks past

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Alan
When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the with retry ATA opcodes for this). This depends on the firmware. Some of the raid firmware drives don't appear to do retries in firmware. But

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread James Bottomley
On Wed, 2007-01-31 at 10:13 -0500, Mark Lord wrote: James Bottomley wrote: For the MD case, this is what REQ_FAILFAST is for. I cannot find where SCSI honours that flag. James? Er, it's in scsi_error.c:scsi_decide_disposition(): maybe_retry: /* we requeue for retry because

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Douglas Gilbert
Ric Wheeler wrote: Jeff Garzik wrote: Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure.

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
Douglas Gilbert wrote: Ric, Both ATA (ATA8-ACS) and SCSI (SBC-3) have recently added command support to flag a block as uncorrectable. There is no need to send bad long data to it and suppress the disk's automatic re-allocation logic. That'll be useful in a couple of years, once drives that

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Ric Wheeler
Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the with retry ATA opcodes for this). This depends on the firmware. Some of the raid firmware drives don't appear to do retries in

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the with retry ATA opcodes for this). This depends on the firmware. Some of the raid firmware drives don't appear to do retries in firmware.

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread James Bottomley
On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote: Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the with retry ATA opcodes for this). This depends on the firmware. Some of the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-31 Thread Mark Lord
James Bottomley wrote: On Wed, 2007-01-31 at 12:57 -0500, Mark Lord wrote: Alan wrote: When libata reports a MEDIUM_ERROR to us, we *know* it's non-recoverable, as the drive itself has already done internal retries (libata uses the with retry ATA opcodes for this). This depends on the

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Douglas Gilbert
Ric Wheeler wrote: > > > Mark Lord wrote: > >> Eric D. Mudama wrote: >> >>> >>> Actually, it's possibly worse, since each failure in libata will >>> generate 3-4 retries. With existing ATA error recovery in the >>> drives, that's about 3 seconds per retry on average, or 12 seconds >>> per

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread James Bottomley
On Tue, 2007-01-30 at 22:20 -0500, Ric Wheeler wrote: > Mark Lord wrote: > > The number of retries is an entirely separate issue. > > If we really care about it, then we should fix SD_MAX_RETRIES. > > > > The current value of 5 is *way* too high. It should be zero or one. > > > > Cheers > > > I

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Ric Wheeler
Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the number of blocks

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Mark Lord
Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by the number of blocks past the error to
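
For a rough sense of scale, the arithmetic quoted above can be worked through as follows. This is only a sketch using the figures from the quoted text (3-4 retries, about 3 seconds each); the function name and the block count of 100 are hypothetical, chosen for illustration.

```python
def stall_seconds(failed_blocks, retries_per_failure=4, seconds_per_retry=3.0):
    """Estimate time spent retrying, using the figures quoted in the thread.

    failed_blocks is a hypothetical count of blocks that each fail
    separately; the defaults are the upper retry count (4) and the
    average of about 3 seconds per retry mentioned above.
    """
    return failed_blocks * retries_per_failure * seconds_per_retry


# One failure: 4 retries x 3 s = 12 s, matching the quoted figure.
print(stall_seconds(1))

# A hypothetical run of 100 failing blocks: 1200 s, i.e. 20 minutes.
print(stall_seconds(100) / 60)
```

The retry count is the disputed part of the estimate, so it is left as a parameter rather than fixed.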

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Mark Lord
James Bottomley wrote: First off, please send SCSI patches to the SCSI list: Fixed already, thanks! This patch fixes the behaviour to be similar to what we had originally. When a bad sector is encountered, SCSI will now work around it again, failing *only* the bad sector itself. Erm, but

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread James Bottomley
First off, please send SCSI patches to the SCSI list: On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote: > In ancient kernels, the SCSI disk code used to continue after > encountering a MEDIUM_ERROR. It would "complete" the good > sectors before the error, fail the bad sector/block, and then >

[PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Mark Lord
In ancient kernels, the SCSI disk code used to continue after encountering a MEDIUM_ERROR. It would "complete" the good sectors before the error, fail the bad sector/block, and then continue with the rest of the request. Kernels since about 2.6.16 or so have been broken in this regard. They
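
The two behaviours described in this posting can be compared with a small user-space model. This is a sketch only, not the kernel code or the patch itself: the function name, the sector numbers, and the flag are all hypothetical, and a real request is completed in units that the model reduces to single sectors.

```python
def complete_request(sectors, bad, continue_after_error):
    """Model how one request is completed when a MEDIUM_ERROR is hit.

    sectors: ordered sector numbers making up the request
    bad: set of sectors the drive reports as unreadable
    continue_after_error: True models the older behaviour (fail only the
        bad sector, then carry on); False models the behaviour described
        for kernels since about 2.6.16 (fail everything after the error)
    Returns (completed, failed) as two lists of sector numbers.
    """
    completed, failed = [], []
    for i, sector in enumerate(sectors):
        if sector not in bad:
            completed.append(sector)
            continue
        failed.append(sector)
        if not continue_after_error:
            # Everything after the first bad sector is failed unread.
            failed.extend(sectors[i + 1:])
            break
    return completed, failed


request = list(range(100, 108))   # hypothetical 8-sector request
bad = {103}                       # one unreadable sector

print(complete_request(request, bad, continue_after_error=True))
print(complete_request(request, bad, continue_after_error=False))
```

With one bad sector in the middle, the first call loses only that sector, while the second loses it and every sector after it in the request.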

[PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Mark Lord
In ancient kernels, the SCSI disk code used to continue after encountering a MEDIUM_ERROR. It would complete the good sectors before the error, fail the bad sector/block, and then continue with the rest of the request. Kernels since about 2.6.16 or so have been broken in this regard. They

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread James Bottomley
First off, please send SCSI patches to the SCSI list: linux-scsi@vger.kernel.org On Tue, 2007-01-30 at 19:47 -0500, Mark Lord wrote: In ancient kernels, the SCSI disk code used to continue after encountering a MEDIUM_ERROR. It would complete the good sectors before the error, fail the bad

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Mark Lord
James Bottomley wrote: First off, please send SCSI patches to the SCSI list: linux-scsi@vger.kernel.org Fixed already, thanks! This patch fixes the behaviour to be similar to what we had originally. When a bad sector is encountered, SCSI will now work around it again, failing *only* the bad

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread James Bottomley
On Tue, 2007-01-30 at 22:20 -0500, Ric Wheeler wrote: Mark Lord wrote: The number of retries is an entirely separate issue. If we really care about it, then we should fix SD_MAX_RETRIES. The current value of 5 is *way* too high. It should be zero or one. Cheers I think that drives

Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

2007-01-30 Thread Douglas Gilbert
Ric Wheeler wrote: Mark Lord wrote: Eric D. Mudama wrote: Actually, it's possibly worse, since each failure in libata will generate 3-4 retries. With existing ATA error recovery in the drives, that's about 3 seconds per retry on average, or 12 seconds per failure. Multiply that by