On Sun, Jan 22, 2017 at 7:37 AM, Stephen Frost <sfr...@snowman.net> wrote:
> Exactly, and that awareness will allow a user to prevent further data
> loss or corruption. Slow corruption over time is a very much known and
> accepted real-world case that people do experience, as well as bit
> flipping enough for someone to write a not-that-old blog post about
> them:
>
> https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1

I have no doubt that low frequency cosmic ray bit flipping in main memory is a real phenomenon, having worked at a company that runs enough computers to see ECC messages in kernel logs on a regular basis. But our checksums can't actually help with that, can they? We verify checksums on the way into shared buffers, and compute new checksums on the way back to disk, so any bit-flipping that happens in between those two times -- while your data is a sitting duck in shared buffers -- would not be detected by this scheme. That's ECC's job.

So the risk being defended against is corruption while in the disk subsystem, whatever that might consist of (and certainly that includes more buffers in strange places that themselves are susceptible to memory faults etc, and hopefully they have their own error detection and correction). Certainly the ZFS community thinks that pile of turtles can't be trusted and that extra checks are worthwhile, and you can find anecdotal reports and studies about filesystem corruption being detected, for example in the links from https://en.wikipedia.org/wiki/ZFS#Data_integrity .

So +1 for enabling it by default. I always turn that on.

--
Thomas Munro
http://www.enterprisedb.com
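
To make the detection window described above concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL's code: the dict standing in for the disk, the helper names (write_page, read_page, flip_bit) and the use of CRC-32 are all assumptions made for illustration, and PostgreSQL's real page checksum is a different algorithm. It only shows the shape of the argument: a flip that happens between verify-on-read and checksum-on-write gets a fresh, valid checksum, while a flip that happens in storage is caught on the next read.

```python
import zlib

PAGE_SIZE = 8192  # illustrative page size


def checksum(payload: bytes) -> int:
    # Stand-in checksum (CRC-32); an assumption, not PostgreSQL's algorithm.
    return zlib.crc32(payload)


def write_page(disk: dict, blkno: int, payload) -> None:
    # On the way back to "disk": compute a new checksum over whatever is in memory.
    data = bytes(payload)
    disk[blkno] = (data, checksum(data))


def read_page(disk: dict, blkno: int) -> bytearray:
    # On the way into the "buffer": verify the stored checksum.
    data, stored = disk[blkno]
    if checksum(data) != stored:
        raise ValueError(f"checksum mismatch in block {blkno}")
    return bytearray(data)


def flip_bit(buf: bytearray, bit: int) -> None:
    # Simulate a single-bit fault.
    buf[bit // 8] ^= 1 << (bit % 8)


def memory_flip_goes_undetected() -> bool:
    disk = {}
    original = bytes(PAGE_SIZE)
    write_page(disk, 0, original)
    buf = read_page(disk, 0)      # verified on the way in
    flip_bit(buf, 12345)          # fault while sitting in the buffer
    write_page(disk, 0, buf)      # new checksum now covers the bad bit
    reread = read_page(disk, 0)   # passes verification
    return reread != original     # corrupted data, yet no error raised


def disk_flip_is_detected() -> bool:
    disk = {}
    write_page(disk, 0, bytes(PAGE_SIZE))
    data, stored = disk[0]
    damaged = bytearray(data)
    flip_bit(damaged, 12345)      # fault while at rest in storage
    disk[0] = (bytes(damaged), stored)
    try:
        read_page(disk, 0)
    except ValueError:
        return True
    return False
```

Calling memory_flip_goes_undetected() and disk_flip_is_detected() exercises the two cases; in this model only the second fault is ever reported, which is why the in-memory case is left to ECC.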