About 2 years ago I ran snv_55b with a raidz on top of 5 500GB SATA 
drives. After 10 months I ran out of space and added a mirror of 2 250GB 
drives to my pool with "zpool add". No problems. I scrubbed it weekly. I 
only saw 1 CKSUM error one day (ZFS healed it automatically, of course). I 
never had any problems with that server.

After running out of space again I replaced it with a new system running 
snv_82, configured with a raidz on top of 7 750GB drives. To burn in the 
machine, I wrote a Python script that read random sectors from the drives. I 
let it run for 48 hours to subject each disk to 10+ million I/O operations. 
After it passed this test, I created the pool and ran some more scripts to 
create/delete files off it continuously. To test disk failures (and SATA 
hotplug), I disconnected and reconnected a drive at random while the scripts 
were running. The system was always able to redetect the drive immediately 
after being plugged in (you need "set sata:sata_auto_online=1" for this to 
work). Depending on how long the drive had been disconnected, I either needed 
to do a "zpool replace" or nothing at all, for the system to re-add the disk 
to the pool and initiate a resilver. After these tests, I trusted the system 
enough to move all my data to it, so I rsync'd everything and double-checked 
it with MD5 sums.
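
For readers who want to run a similar burn-in, here is a minimal sketch of a random-sector read test. It is not the original script: the function name random_read_test, the 512-byte sector size, and the use of lseek to find the target's size are all assumptions made for illustration.

```python
import os
import random

SECTOR_SIZE = 512  # assumed sector size in bytes


def random_read_test(path, num_reads, size=None, sector_size=SECTOR_SIZE, seed=None):
    """Read num_reads randomly chosen sectors from path.

    Returns the number of sectors read in full; raises OSError on a failed
    or short read.
    """
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        if size is None:
            # Assumption: seeking to the end reports the target's size.
            # This holds for regular files; for raw devices on some
            # platforms the size may have to be passed in explicitly.
            size = os.lseek(fd, 0, os.SEEK_END)
        num_sectors = size // sector_size
        if num_sectors == 0:
            raise ValueError("target is smaller than one sector")
        completed = 0
        for _ in range(num_reads):
            # Pick a sector uniformly at random and read it in full.
            sector = rng.randrange(num_sectors)
            os.lseek(fd, sector * sector_size, os.SEEK_SET)
            data = os.read(fd, sector_size)
            if len(data) != sector_size:
                raise OSError("short read at sector %d" % sector)
            completed += 1
        return completed
    finally:
        os.close(fd)
```

Pointing this at a raw device (so reads are not served from the OS cache) with num_reads in the millions, one process per drive, would approximate the test described above. Any I/O error surfaces as an exception.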

I have another ZFS server, at work, on which 1 disk one day started acting 
weirdly (timeouts). I physically replaced it, and ran "zpool replace". The 
resilver completed successfully. On this server, we have seen 2 CKSUM errors 
over the last 18 months or so. We read about 3 TB of data every day from it 
(daily rsync), which amounts to about 1.5 PB over 18 months. I guess 2 silent 
data corruptions while reading that quantity of data is about the expected 
error rate of modern SATA drives. (Again, ZFS healed the errors by itself, so 
this was completely transparent to us.)
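
The back-of-the-envelope arithmetic behind that estimate can be sketched as follows. The 30-day month and decimal units (1 TB = 10^12 bytes) are assumptions. With them the total comes out slightly above the rounded 1.5 PB figure, at roughly 1.6 PB, or about one checksum error per 0.8 PB read.

```python
TB = 10**12  # decimal terabyte, assumed
PB = 10**15  # decimal petabyte, assumed


def total_read_bytes(tb_per_day, months, days_per_month=30):
    """Total bytes read, assuming a fixed number of days per month."""
    return tb_per_day * TB * months * days_per_month


total = total_read_bytes(3, 18)   # 3 TB/day for 18 months
total_pb = total / PB             # total expressed in petabytes
bytes_per_error = total // 2      # 2 CKSUM errors observed over that period

print("read about %.2f PB, one checksum error per %.2f PB"
      % (total_pb, bytes_per_error / PB))
```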

-marc

