Jan-Benedict Glaw wrote on 2006-07-31:

> > Massive hardware problems don't count. ext2/ext3 doesn't look much better in
> > such cases. I had a machine with RAM gone bad (no ECC - I wonder what
> 
> They do! Very much, actually. These happen In Real Life, so I have to
> pay attention to them. Once you're in setups with > 10000 machines,
> everything counts. At some certain point, you can even use HDD's
> temperature sensors in old machines to diagnose dead fans.
> 
> Everything that eases recovery for whatever reason is something you
> have to pay attention to. The simplicity of ext{2,3} is something I
> really fail to find proper words for. As well as the really good fsck.
> Once seen a SIGSEGV'ing fsck, you really don't want to go there.

The point is: if you've written data with broken hardware (RAM, bus,
any of the many controllers, CPU), what is on your disks is
untrustworthy anyway, and fsck isn't going to repair your gzip file
where every 64th bit has become a 1, or bring back the 60 MB that the
battery-backed write cache threw down the drain...
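
To make the gzip case concrete, here is a minimal Python sketch. It is
only an illustration, not taken from a real failure: the helper names
set_every_64th_bit and is_readable are made up, and which bit of each
64-bit word gets stuck is an arbitrary assumption. It shows that gzip's
own checks notice the damage but cannot undo it, and a filesystem
checker never looks at file contents in the first place.

```python
import gzip
import zlib


def set_every_64th_bit(data: bytes) -> bytes:
    """Return a copy of data with one bit per 8-byte word forced to 1.

    Assumption for illustration: the stuck bit is the lowest bit of the
    last byte of each 64-bit word.
    """
    damaged = bytearray(data)
    for i in range(7, len(damaged), 8):
        damaged[i] |= 0x01
    return bytes(damaged)


def is_readable(blob: bytes) -> bool:
    """True if blob decompresses cleanly, including gzip's CRC32 and length checks."""
    try:
        gzip.decompress(blob)
        return True
    except (OSError, EOFError, zlib.error):
        # BadGzipFile (an OSError) covers CRC/length mismatches,
        # zlib.error covers an invalid deflate stream.
        return False


# Some payload that compresses to a few kilobytes.
original = b"".join(str(i).encode() + b"\n" for i in range(5000))
archive = gzip.compress(original)
damaged = set_every_64th_bit(archive)

print("intact archive readable: ", is_readable(archive))
print("damaged archive readable:", is_readable(damaged))
```

The filesystem metadata for such a file can be perfectly consistent, so
a clean fsck run says nothing about whether the contents survived.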

Of course, an fsck that crashes is unbearable, but that doesn't apply to
"broken hardware" failures. You need several generations of backups to
avoid massive data loss.

-- 
Matthias Andree
