On Sat, Jan 21, 2017 at 9:09 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Not at all; I just think that it's not clear that they are a net win
> for the average user, and so I'm unconvinced that turning them on by
> default is a good idea. I could be convinced otherwise by suitable
> evidence. What I'm objecting to is turning them on without making
> any effort to collect such evidence.
+1

One insight from Jim Gray's classic paper "Why Do Computers Stop and
What Can Be Done About It?" [1] is that fault-tolerant hardware is
table stakes, and so most failures are related to operator error and,
to a lesser extent, software bugs. The paper is about 30 years old.

I don't recall ever seeing a checksum failure on a Heroku Postgres
database, even though checksums were enabled as soon as the feature
became available. I have seen a few corruption problems brought to
light by amcheck, though, all of which were due to bugs in software.

Apparently, before I joined Heroku there were real reliability
problems with the storage subsystem that Heroku Postgres runs on (it's
a pluggable storage service from a popular cloud provider -- the
"pluggable" functionality would have made it fairly novel at the
time). These problems were something that the Heroku Postgres team
dealt with about 6 years ago. However, anecdotal evidence suggests
that the reliability of the same storage system *vastly* improved
roughly a year or two later. We still occasionally lose drives, but
drives seem to fail fast, in a fashion that lets us recover easily
without data loss.

In practice, Postgres checksums do *not* seem to catch problems.
That's been my experience, at least. Obviously every additional check
helps, and it may be something we can enable without any appreciable
downside. I'd like to see a benchmark.

[1] http://www.hpl.hp.com/techreports/tandem/TR-85.7.pdf

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
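P.S. For anyone unfamiliar with the amcheck verification mentioned
above, here is a minimal sketch of how a B-Tree index is checked
against a live server (the index name is only illustrative; the call
raises an error if an invariant violation is found):

```sql
-- Install the extension (once per database).
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Verify B-Tree invariants of one index, e.g. key ordering across pages.
-- Returns nothing on success; raises an error on detected corruption.
SELECT bt_index_check('pg_class_oid_index');
```

bt_index_parent_check() performs a stricter parent/child check, at the
cost of taking heavier locks.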