2012-05-24 18:55, Richard Elling wrote:
This is a big assumption -- that the disk will operate normally, even for data it cannot read. In my experience, this assumption is not valid for the majority of HDD failure modes. Also, in the case of consumer-grade disks, a single sector media error could take a very long time to retry/fail.
Indeed it is, and I've covered this earlier in the thread: the bulk copying phase ("DD-phase") should monitor its real progress. If it detects lag in comparison to the average or expected speed (expected = some tuning variable, e.g. 50Mb/s), the process should skip over some (arbitrary) range of sectors and go on from another location, or fall back to the original resilver method completely. Such skipped sectors are indeed in danger until the scrub-phase detects and reconstructs them. I described this in as much detail as I had thought of at the time of that posting, and I can't add much to it yet.

From what I've seen of faulty sectors, they are usually either single errors or a "scratched" range, which can be worked around with, e.g., partitioning for legacy FSes (if the SMART relocation doesn't deal with them properly for any reason), while most of the rest of the disk is okay. Retries may be lengthy, ranging from several seconds up to a minute, but they are often confined to a few locations and *may* add little delay in the overall scheme of things.

If the delay is more than acceptable and/or we can't find a "working location" on the source disk, we just fall back to the old method: either the original resilver or, if much data has already been copied to the new disk, the new selective scrub (which is much like the resilver, but takes into account those sectors on the target disk which may have been copied over correctly).

A somewhat worse case is intermittent errors at random times and logical disk locations due to who knows what: overheating, firmware overflow errors, bus resets, or whatever. Those are perhaps the real reason for scrub-validation of data after a mass migration (as well as a reason for preventive regular scrubs)...

//Jim
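
P.S. To make the DD-phase idea above a bit more concrete, here is a rough Python sketch of the skip-or-fall-back logic. It is only an illustration under my own assumptions, not ZFS code: the function name dd_phase, the callbacks read_chunk and write_chunk, and the tunables min_rate (taken here as bytes per second), skip_chunks and max_skips are all hypothetical. A chunk that fails or reads slower than min_rate is treated as suspect, a range starting there is skipped and remembered for the scrub-phase, and after too many skips the copy gives up so the caller can fall back to a regular resilver.

```python
import time


def dd_phase(read_chunk, write_chunk, total_chunks, chunk_bytes,
             min_rate=50e6, skip_chunks=8, max_skips=4,
             clock=time.monotonic):
    """Sequentially copy chunks, skipping ahead past slow or unreadable areas.

    All names and defaults here are hypothetical tuning knobs.
    Returns (skipped, completed): skipped is a list of [start, end) chunk
    ranges left for a later scrub to reconstruct; completed is False when
    the skip budget ran out and the caller should fall back to a resilver.
    """
    skipped = []
    pos = 0
    while pos < total_chunks:
        start = clock()
        try:
            data = read_chunk(pos)
        except OSError:
            data = None          # read error on the source disk
        elapsed = clock() - start

        # Compare the measured rate for this chunk with the expected rate.
        too_slow = elapsed > 0 and (chunk_bytes / elapsed) < min_rate

        if data is None or too_slow:
            if len(skipped) >= max_skips:
                return skipped, False   # give up: fall back to old method
            end = min(pos + skip_chunks, total_chunks)
            skipped.append((pos, end))  # leave this range to the scrub-phase
            pos = end
            continue

        write_chunk(pos, data)
        pos += 1
    return skipped, True
```

The clock is passed in so the timing logic can be exercised without real hardware. Judging speed per chunk instead of against a running average is a simplification; a real implementation would probably want both.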