Re: [zfs-discuss] ZFS Success Stories

2008-10-21 Thread Robert Milkowski
Hello Marc,

Tuesday, October 21, 2008, 8:14:17 AM, you wrote:

MB> About 2 years ago I used to run snv_55b with a raidz on top of 5 500GB SATA
MB> drives. After 10 months I ran out of space and added a mirror of 2 250GB
MB> drives to my pool with "zpool add". No problems. I scrubbed it weekly. I only
MB> saw 1 CKSUM error one day (ZFS self-healed itself automatically, of course).
MB> I never had any problems with that server.

MB> After running out of space again, I replaced it with a new system running
MB> snv_82, configured with a raidz on top of 7 750GB drives. To burn in the
MB> machine, I wrote a Python script that read random sectors from the drives. I
MB> let it run for 48 hours to subject each disk to 10+ million I/O operations.
MB> After it passed this test, I created the pool and ran some more scripts to
MB> create/delete files on it continuously. To test disk failures (and SATA
MB> hotplug), I disconnected and reconnected a drive at random while the scripts
MB> were running. The system was always able to redetect the drive immediately
MB> after it was plugged back in (you need "set sata:sata_auto_online=1" for this
MB> to work). Depending on how long the drive had been disconnected, I either
MB> needed to run "zpool replace" or do nothing at all for the system to re-add
MB> the disk to the pool and initiate a resilver. After these tests, I trusted
MB> the system enough to move all my data to it, so I rsync'd everything and
MB> double-checked it with MD5 sums.

MB> I have another ZFS server, at work, on which one disk started acting up one
MB> day (timeouts). I physically replaced it and ran "zpool replace". The
MB> resilver completed successfully. On this server, we have seen 2 CKSUM errors
MB> over the last 18 months or so. We read about 3 TB of data from it every day
MB> (daily rsync), which amounts to about 1.5 PB over 18 months. I guess 2 silent
MB> data corruptions while reading that quantity of data is about the expected
MB> error rate of modern SATA drives. (Again, ZFS self-healed itself, so this was
MB> completely transparent to us.)

Which means that, thanks to ZFS, you haven't actually experienced any
silent data corruption. :)

-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] ZFS Success Stories

2008-10-21 Thread Marc Bevand
About 2 years ago I used to run snv_55b with a raidz on top of 5 500GB SATA 
drives. After 10 months I ran out of space and added a mirror of 2 250GB 
drives to my pool with "zpool add". No problems. I scrubbed it weekly. I only 
saw 1 CKSUM error one day (ZFS self-healed itself automatically, of course). I 
never had any problems with that server.

After running out of space again, I replaced it with a new system running 
snv_82, configured with a raidz on top of 7 750GB drives. To burn in the 
machine, I wrote a Python script that read random sectors from the drives. I 
let it run for 48 hours to subject each disk to 10+ million I/O operations. 
After it passed this test, I created the pool and ran some more scripts to 
create/delete files on it continuously. To test disk failures (and SATA 
hotplug), I disconnected and reconnected a drive at random while the scripts 
were running. The system was always able to redetect the drive immediately 
after it was plugged back in (you need "set sata:sata_auto_online=1" for this 
to work). Depending on how long the drive had been disconnected, I either 
needed to run "zpool replace" or do nothing at all for the system to re-add 
the disk to the pool and initiate a resilver. After these tests, I trusted the 
system enough to move all my data to it, so I rsync'd everything and 
double-checked it with MD5 sums.
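
(For what it's worth, here is a simplified sketch of that kind of burn-in 
script -- the device paths, disk size and read count below are placeholders 
rather than my actual configuration, so adjust them for your own hardware.)

#!/usr/bin/env python
# Random-sector read burn-in (sketch). Placeholder raw-device paths and an
# approximate 750GB disk size are assumed; adjust for your own disks. In
# practice you would run one copy per disk in parallel to stress all
# spindles at the same time.
import os
import random

SECTOR = 512                              # bytes per sector
DISK_SECTORS = 750 * 10**9 // SECTOR      # approximate sectors on a 750GB drive
DISKS = ["/dev/rdsk/c1t%dd0s0" % i for i in range(7)]  # placeholder paths
READS_PER_DISK = 10 * 1000 * 1000         # ~10 million reads per disk

for path in DISKS:
    fd = os.open(path, os.O_RDONLY)
    done = 0
    while done < READS_PER_DISK:
        # Seek to a random sector-aligned offset and read one sector.
        os.lseek(fd, random.randrange(DISK_SECTORS) * SECTOR, 0)  # 0 == SEEK_SET
        os.read(fd, SECTOR)
        done += 1
    os.close(fd)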

I have another ZFS server, at work, on which one disk started acting up one 
day (timeouts). I physically replaced it and ran "zpool replace". The resilver 
completed successfully. On this server, we have seen 2 CKSUM errors over the 
last 18 months or so. We read about 3 TB of data from it every day (daily 
rsync), which amounts to about 1.5 PB over 18 months. I guess 2 silent data 
corruptions while reading that quantity of data is about the expected error 
rate of modern SATA drives. (Again, ZFS self-healed itself, so this was 
completely transparent to us.)
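
(Rough arithmetic behind those figures -- round approximations, not exact 
measurements:)

# Back-of-envelope check of the read volume and error rate quoted above.
tb_per_day = 3.0
days = 18 * 30                            # ~18 months of daily rsyncs
total_bytes = tb_per_day * 1e12 * days    # ~1.6e15 bytes, i.e. ~1.6 PB
bits_per_error = total_bytes * 8 / 2      # 2 CKSUM errors observed
print("about one error per %.1e bits read" % bits_per_error)  # ~6.5e15 bits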

-marc




[zfs-discuss] ZFS Success Stories

2008-10-20 Thread gm_sjo
Hi all,

I have built out an 8TB SAN at home using OpenSolaris + ZFS. I have
yet to put it into 'production', as a lot of the issues raised on this
mailing list are putting me off trusting my data to the platform
right now.

Over the years I have stored my personal data on NetWare and now NT,
and that solution has been 100% reliable for the last 12 years. Never
a single problem (nor have I had any issues with NTFS on the tens of
thousands of spindles I've worked with over the years).

I appreciate that 99% of the time people only comment when they have a
problem, which is why I think it'd be nice if some people who have
successfully implemented ZFS, including making use of its various
features (recovery, replacing disks, etc.), could just reply to this
post with a sentence or paragraph describing how well it has worked
for them. I'm not necessarily interested in very small implementations
of one or two disks that haven't changed config since the day they
were installed, but rather in setups that are 'organic' and have been
changed/administered over time (to show the functionality of the
tools, the resilience of the platform, etc.).

Of course, I guess a lot of people who have never had a problem
wouldn't even be signed up to this list! :-)


Thanks!