RE: [SLUG] HDD Failures (Was Linux and Clustering..)

John Wiltshire Thu, 03 Aug 2000 21:37:48 -0700
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]

> We recently upgraded several of our servers.  We had a total of 4 HDD
> failures on reboots, including 2 failures from a RAID 5 array (luckily in
> that case we were shutting it down prior to formatting the whole thing, so
> the data loss wasn't an issue - and the other systems used RAID 1 so we
> could just rebuild the drives).
>
> I've only ever seen this on unix/linux systems - windows reboots so often
> it causes the failures to be detected much sooner.  Unfortunately
> rebooting the systems regularly to check for faulty hardware isn't really
> an option with a live network.  How many other people have seen systems
> that have been working fine for a very long period of time fail when
> rebooted?  And how many people have lost a redundant RAID array to
> multiple HDD failures?

I've seen failures on reboots once or twice for machines that have been up a
significant period of time (some of our machines that run the traffic
control systems stay up for a while).  No comment on the OS running - not
really relevant.  I've not had a multiple drive failure on RAID yet (thank
goodness) - restoring from backups is a real pain.

I remember talking to someone who was about to move a VMS cluster that
hadn't been rebooted in about 8 years, staying up through all sorts of OS
and application upgrades.  They were pretty seriously worried about
whether the boot code would still work properly.

John Wiltshire


--
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://slug.org.au/lists/listinfo/slug