On 01/21/2017 05:35 PM, Tom Lane wrote:
> Stephen Frost <sfr...@snowman.net> writes:
>> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>>> Have we seen *even one* report of checksums catching problems in
>>> a useful way?
>>
>> This isn't the right question.
>
> I disagree. If they aren't doing something useful for people who
> have turned them on, what's the reason to think they'd do something
> useful for the rest?


I believe Stephen is right. The fact that you don't see something (e.g. reports of checksums catching something in production deployments) proves nothing, because of the "survivorship bias" described by Abraham Wald during WWII [1]. Not seeing bombers with bullet holes in their engines does not mean you don't need to armor the engines. Quite the opposite.

[1] https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d#.j9d9c35mb

Applied to checksums, we're quite unlikely to see reports about data corruption caught by checksums, because "ERROR: invalid page in block X" is such a clear sign of data corruption that people don't even ask us about it. Combine that with the fact that most people run with the defaults (i.e. no checksums) and that data corruption is rare by nature, and we're bound to get no such reports.

What we do get, however, are reports of strange errors from instances without checksums enabled that were either determined to be data corruption, or disappeared after a dump/restore or reindex. It's hard to say for sure whether those were cases of data corruption (where checksums might have helped) or some other bug (producing a corrupted page whose checksum was computed over the already-corrupted contents, so checksums would not have caught it).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

