Bill Sommerfeld wrote:
> On Tue, 2006-07-18 at 15:32, Daniel Rock wrote:
>>> Stop right here! :)  If you have a large number of identical disks which
>>> operate in the same environment[1], and possibly the same enclosure, it's
>>> quite likely that you'll see 2 or more disks die within the same,
>>> relatively short, timeframe.
>>
>> Not my experience. I work with, and have worked with, several disk arrays
>> (EMC, IBM, Sun, etc.), and the failure rates of individual disks were
>> fairly random.

> My observation is that occasionally -- very occasionally -- you will get
> a bad batch of disks, which, due to a subtle design or manufacturing
> defect, will all pass their tests, etc., run fine for some small number
> of months or years (and, of course, long enough for you to believe
> they're ready for production use...), and then start dying in droves.

Yes, something like the stiction problem that plagued old Quantum ProDrives,
or the phosphorus contamination which plagued some control electronics, or
the defect growth rates caused by crystal growth in the mechanics, and so on.
In my experience, there are also cases where bad power supplies cause
a whole bunch of unhappiness.  I've also heard horror stories about failed
air conditioning and extreme vibration problems (e.g., a stamping plant).
From a modelling perspective, these are difficult because we don't know
how to assign reasonable failure rates to them.  So beware: most availability
models assume perfect manufacturing and operating environments, as well as
bug-free software and firmware.  YMMV.
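
To make that concrete, the usual back-of-the-envelope model for mean time
to data loss (MTTDL) on a 2-way mirror assumes independent, exponentially
distributed failures.  Roughly (the MTTF and MTTR numbers below are made
up for illustration, not measured):

   MTTDL ~= MTTF^2 / (2 * MTTR)

   e.g. MTTF = 1,000,000 hours, MTTR = 24 hours:
   MTTDL ~= (10^6)^2 / (2 * 24) ~= 2 x 10^10 hours (a couple million years)

That number is only as good as the independence assumption; a bad batch,
a sick power supply, or a dead air conditioner makes the real-world answer
much worse than the model predicts.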

> The paranoid in me wonders whether it would be worthwhile to buy pairs
> of disks of the same size from each of two different manufacturers, and
> mirror between unlike pairs, to control against this risk...

First, let's convince everyone to mirror and not RAID-Z[2] -- boil one
ocean at a time, there are only 5 you know... :-)
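
That said, if you did want to hedge across vendors, the layout is easy
enough to express.  A sketch, assuming vendor A's disks show up as c1t*d0
and vendor B's as c2t*d0 (pool and device names made up for illustration):

   zpool create tank mirror c1t0d0 c2t0d0 \
                     mirror c1t1d0 c2t1d0

Each mirror pair gets one disk from each vendor; ZFS doesn't care who made
them, the pair is simply sized by its smaller disk.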
 -- richard