On Thu, 22 Dec 2011, Gareth de Vaux wrote:

> On Thu 2011-12-22 (10:09), Bob Friesenhahn wrote:
>> One of your disks failed to return a sector.  Due to redundancy, the
>> original data was recreated from the remaining disks.  This is normal
>> good behavior (other than the disk failing to read the sector).
>
> So those checksum counts were historical?

Yes. When a problem is detected and there is enough redundancy to resolve it, the bad data block is no longer used and the corrected data is relocated somewhere else on the drive (I am not sure if zfs does this itself, or if it requests that the drive firmware do it). The count reflects that problems were found, but not whether a correction was made. Other text describes any continuing issues which were found.
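For example, the per-device error counters can be inspected with 'zpool status' (the pool name 'tank' here is a hypothetical stand-in for your pool):

```shell
# Show per-device READ/WRITE/CKSUM counters; the -v flag also lists
# any files with unrecoverable errors.
zpool status -v tank

# Scrub the pool to re-verify every block's checksum against the
# redundant copies; errors found (and repaired) show up in the counters.
zpool scrub tank
```

The counters accumulate until an administrator resets them, which is why historical errors can still appear after a clean scrub.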

> I did a scrub and what worries me is that it came back with 0 issues
> when clearly there were considering what happens when I kick 1 disk

Zero issues seems like a good thing. Resilvering the disk back into the pool performs most of the same verification that a scrub does, so it should not be surprising that no issues remain.

> Similarly I've seen that 'zpool clear' just sets you up for problems
> down the line. It just pretends there aren't errors.

As far as I am aware, the counters cleared by 'zpool clear' exist so that administrators can confirm they are aware the issue occurred. A good administrator will consider the implications before clearing them. The decision made for a high-capacity SATA drive should likely be different from that made for a low-capacity enterprise SAS drive. Studies by Google suggest that SATA drives will experience many more block errors than enterprise SAS drives, and that a higher error rate should be tolerated from SATA drives than from SAS drives when deciding whether to replace a disk. Drives which experience a continually growing number of block failures are doomed to fail.
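A minimal sketch of that workflow, assuming a hypothetical pool named 'tank': record the counters before clearing them, so that a later increase is visible rather than hidden.

```shell
# Save the current error counters before acknowledging them.
zpool status tank > /var/tmp/tank-status-$(date +%Y%m%d).txt

# Reset the READ/WRITE/CKSUM counters to zero; this acknowledges the
# errors but does not repair anything on the disks.
zpool clear tank

# Re-check later: counters that climb again after a clear suggest a
# drive that is accumulating failures and is a replacement candidate.
zpool status tank
```

Keeping the saved status files around gives a crude error-rate history per device, which is what the replace/keep decision above really turns on.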

Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
zfs-discuss mailing list
