On Tue, Mar 06, 2012 at 09:25:17AM -0500, Robert Haas wrote:
> > 2. Turning checksums on/off/on/off in rapid succession can cause false
> > positive reports of checksum failure if crashes occur and are ignored.
> > That may lead to the feature and PostgreSQL being held in disrepute.
> 
> This I do think is a problem, although not for precisely the reason
> stated here.  In my experience, in data corruption situations, the
> first thing customers do is blame PostgreSQL: they don't believe it's
> the hardware; they accuse us of having bugs in our code.  Having a
> checksum feature would be valuable, because, first, we'd perhaps
> detect problems sooner and, second, people understand what checksums
> are and that checksum failures really shouldn't happen unless the
> hardware is bad.  More generally, one of the purposes of checksums is
> to distinguish hardware failure from other possible causes of data
> corruption problems.  If there are code paths where checksum failures
> can happen despite the hardware being good, I think that the patch will
> fail to accomplish its goal of giving us confidence that the hardware
> is bad.

I think the "turning checksums on/off/on/off" is really a killer
problem, and obviously many of the actions needed to make it safe make
the checksum feature itself less useful.  

One crazy idea would be to have a checksum _version_ number somewhere on
the page and in pg_controldata.  When you turn on checksums, you
increment that value, and every page checksummed from then on gets that
checksum version; if you turn off checksums, we just don't check them
anymore, but they might become incorrect due to a hint bit write and a
crash.  When
you turn on checksums again, you increment the checksum version again,
and only check pages having the _new_ checksum version.
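
A rough sketch of the versioning idea, in Python rather than backend C just to keep it short. Everything here is hypothetical: the names Control, Page, enable_checksums, write_page and verify_page are made up for illustration, zlib.crc32 stands in for whatever checksum the patch actually uses, and version 0 is assumed to mean "no checksum was ever stored".

```python
import zlib
from dataclasses import dataclass


@dataclass
class Control:
    """Stand-in for the cluster-wide control data (hypothetical layout)."""
    checksums_enabled: bool = False
    checksum_version: int = 0  # assumption: 0 means checksums never enabled


@dataclass
class Page:
    """Stand-in for a data page carrying a checksum and its version."""
    data: bytes = b""
    checksum: int = 0
    checksum_version: int = 0  # assumption: 0 means no checksum stored


def enable_checksums(ctl: Control) -> None:
    # Every enable starts a new checksum "epoch".
    ctl.checksum_version += 1
    ctl.checksums_enabled = True


def disable_checksums(ctl: Control) -> None:
    # The version is kept; pages simply stop being checked.
    ctl.checksums_enabled = False


def write_page(ctl: Control, page: Page, data: bytes) -> None:
    page.data = data
    if ctl.checksums_enabled:
        page.checksum = zlib.crc32(data)
        page.checksum_version = ctl.checksum_version
    # While disabled, the stored checksum and version are left stale,
    # which models a hint bit write that does not update the checksum.


def verify_page(ctl: Control, page: Page) -> str:
    """Return 'ok', 'corrupt', or 'unchecked'."""
    if not ctl.checksums_enabled:
        return "unchecked"
    if page.checksum_version != ctl.checksum_version:
        # Checksum was written in an older epoch; it cannot be trusted.
        return "unchecked"
    return "ok" if zlib.crc32(page.data) == page.checksum else "corrupt"
```

With this, a page modified while checksums were off reports "unchecked" after the next enable instead of a false checksum failure, and it becomes checkable again the first time it is rewritten under the new version.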

Yes, this does add additional storage requirements for the checksum, but
I don't see another clean option.  If you can spare one byte, that gives
you 255 times to turn on checksums; after that, you have to
dump/reload to use the checksum feature.
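
The one-byte arithmetic can be sketched as follows, again in Python with made-up names (MAX_CHECKSUM_VERSION, next_checksum_version, remaining_enables). It assumes the value 0 is reserved for "checksums never enabled", which is what leaves 255 usable versions out of 256.

```python
MAX_CHECKSUM_VERSION = 0xFF  # largest value a one-byte field can hold


def next_checksum_version(current: int) -> int:
    """Return the version for the next enable (assumes 0 = never enabled)."""
    if not 0 <= current <= MAX_CHECKSUM_VERSION:
        raise ValueError("checksum version does not fit in one byte")
    if current == MAX_CHECKSUM_VERSION:
        # No wraparound: reusing an old version could make stale
        # checksums look current again.
        raise OverflowError("checksum versions exhausted; dump/reload required")
    return current + 1


def remaining_enables(current: int) -> int:
    """How many more times checksums can be turned on."""
    return MAX_CHECKSUM_VERSION - current
```

Refusing to wrap is the point of the dump/reload requirement: a reused version number could not be told apart from a stale one.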

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
