On 22.12.2010 17:31, Simon Riggs wrote:
On Wed, 2010-12-22 at 17:01 +0200, Heikki Linnakangas wrote:
There's plenty of stuff in memory that's not covered by an
application-level CRC. That's what ECC RAM is for.
http://www.google.com/research/pubs/archive/35162.pdf
Google research shows that each DIMM has an 8% chance per annum of
uncorrectable memory errors, even on ECC.
You misread that paper. From the summary:
About a third of machines and over 8% of DIMMs in
our fleet saw at least one *correctable* error per year.
Emphasis mine.
Our per-DIMM rates of correctable errors translate to an average of
25,000–75,000 FIT (failures in time per billion hours of operation)
per Mbit and a median FIT range of 778–25,000 per Mbit (median for
DIMMs with errors), while previous studies report 200-5,000 FIT per
Mbit. The number of correctable errors per DIMM is highly variable,
with some DIMMs experiencing a huge number of errors, compared to
others. The annual incidence of uncorrectable errors was 1.3% per
machine and 0.22% per DIMM.
So the real figure for uncorrectable errors is 0.22% per DIMM per year.
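To put the quoted FIT figures in concrete terms, here is a rough sketch of the arithmetic. FIT is failures per 10^9 hours of operation, and the paper normalizes it per Mbit, so expected errors per year scale with DIMM capacity. The 1 GiB (8192 Mbit) DIMM size is a hypothetical assumption chosen for illustration, and the year is taken as 8760 hours.

```python
# Convert a per-Mbit FIT rate into expected errors per DIMM per year.
# Assumption: FIT = failures per 1e9 device-hours, normalized per Mbit,
# and a year is 24 * 365 = 8760 hours (leap years ignored).
HOURS_PER_YEAR = 24 * 365


def fit_to_errors_per_year(fit_per_mbit: float, dimm_mbit: float) -> float:
    """Expected errors per year for a DIMM holding `dimm_mbit` megabits."""
    return fit_per_mbit * dimm_mbit * HOURS_PER_YEAR / 1e9


if __name__ == "__main__":
    dimm_mbit = 8192  # hypothetical 1 GiB DIMM
    for fit in (25_000, 75_000):  # the quoted average correctable-error range
        errors = fit_to_errors_per_year(fit, dimm_mbit)
        print(f"{fit} FIT/Mbit -> about {errors:.0f} correctable errors/year")
```

These are fleet averages of correctable errors. Since the paper stresses that the per-DIMM count is highly variable and only about 8% of DIMMs saw any correctable error at all, a few DIMMs with huge error counts dominate the mean; the figure for uncorrectable errors remains 0.22% per DIMM per year.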
Anyway, unreliable RAM calls for more ECC bits in DIMMs, not invasive
architectural changes to every single application in the system.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)