On 2007-04-11, Rick Moen wrote:
> Quoting Bill Broadley ([EMAIL PROTECTED]):
> > Rick Moen wrote:
> > > A bad bit in memory, if indicative of a physical defect, will quickly
> > > manifest unmistakeably on Linux in the manner I described. If not thus
> > > indicative, (from empirical observation over a long period of time:)
> > > it's extremely unlikely to have detectable long-term consequences.
> >
> > You speculate that it contributes to premature httpd deaths but is
> > undetectable long term?
>
> I didn't think what I was saying was that difficult to follow, but here
> is what I said, again: "I'd speculate that some non-zero percentage of
> prematurely deceased httpd instances owed to that...." I figure that
> possibly (i.e., speculate that) some quite small number of such events
> ultimately owe to uncorrected single-bit memory errors that are not
> associated with actually bad RAM -- but, effectively, it's way down in
> the noise of undiagnosable oddities.
>
> > $10 a dimm requires you to "pay through the nose" and the "wealth of
> > midas"?
>
> If you assumed I was endorsing your figure, you assumed wrong. ;->
>
> Ironically, the most recent RAM I purchased _was_ ECC, because it was
> a gig for the Intel L440GX+ "Lancewood" motherboard in my old VA Linux
> Systems model 2230 server. However, let's talk about the HP ProLiant
> 380 I was working on recently: 128 MB ECC Registered is $42 at SA
> Technologies, Inc. (where I would buy such things by preference).
> Without ECC, $32. ECC thus exacts a 31% premium in that case.
>
> Now, would I pay that premium? I might, or I might put the money
> somewhere else, where it's more likely to yield significant benefit.
> (In 2007, I'd actually try not to drop cash on a 800MHz PIII that I
> didn't dearly love, but five years ago might have been different.)

Here's my perspective on that. Suppose one of those uncorrected single-bit
errors turned out to be in the worst possible place (say, a pointer in the
kernel or in postgresql in a journaling memory structure) and caused data
corruption that cost a day of work (i.e., the last good backup was 24 hours
old). Then:

- assuming a man-hour is worth $50 (that's probably low)
- assuming that the machine is used by four people (other people's
  servers have more users)

the problem would cost $1600 to recover from (4 people x 8 hours x $50),
plus whatever additional time was required to take the system down to
restore the backup, fsck the filesystem, etc.

That notwithstanding, I agree with Rick about disk failures being an order
of magnitude more likely. I've experienced the pain of a failing disk more
times than I care to remember.

-- 
Henry House
+1 530 753 3361 ext. 13
Please don't send me HTML mail! My mail system frequently rejects it.
The unintelligible text that may follow is a digital signature.
See <http://hajhouse.org/pgp> to find out how to use it.
My OpenPGP key: <http://hajhouse.org/hajhouse.asc>.
_______________________________________________
vox-tech mailing list
[email protected]
http://lists.lugod.org/mailman/listinfo/vox-tech
