From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> We recently upgraded several of our servers. We had a total of 4 HDD
> failures on reboots, including 2 failures from a RAID 5 array (luckily
> in that case we were shutting it down prior to formatting the whole
> thing, so the data loss wasn't an issue, and the other systems used
> RAID 1 so we could just rebuild the drives).
>
> I've only ever seen this on Unix/Linux systems; Windows reboots so often
> that it causes the failures to be detected much sooner. Unfortunately,
> rebooting the systems regularly to check for faulty hardware isn't
> really an option with a live network. How many other people have seen
> systems that have been working fine for a very long period of time fail
> when rebooted? And how many people have lost a redundant RAID array to
> multiple HDD failures?
I've seen failures on reboots once or twice for machines that have been up
for a significant period of time (some of our machines that run the traffic
control systems stay up for a while). No comment on the OS running - it's
not really relevant. I've not had a multiple drive failure on RAID yet
(thank goodness) - restoring from backups is a real pain.
I remember talking to someone who was about to move a VMS cluster that
hadn't been rebooted in about 8 years, through all sorts of OS upgrades and
application upgrades. They were pretty seriously worried about whether the
boot code would work properly.
John Wiltshire
--
SLUG - Sydney Linux User Group Mailing List - http://slug.org.au/
More Info: http://slug.org.au/lists/listinfo/slug