Martijn van Oosterhout <[EMAIL PROTECTED]> writes:
> Actually, the real problem to me seems to be that to check the checksum
> when you read the page in, you need to look at the contents of the page
> and "assume" some of the values in there are correct, before you can
> even calculate the checksum. If the page really is corrupted, chances
> are the item pointers are going to be bogus, but you need to read them
> to calculate the checksum...

Hmm.  You could verify the values closely enough to ensure you don't
crash while redoing the CRC calculation, which ought to be sufficient.
Still, I agree that the whole thing looks too Rube Goldbergian to count
as a reliability enhancer, which is what the point is after all.
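Here is a rough sketch of the "verify closely enough not to crash" idea. It rests on two loud assumptions. First, it supposes the CRC has to skip hint bits, which is why the item pointers must be read at all. Second, it uses a made-up, much simplified page layout rather than PostgreSQL's real PageHeaderData/ItemIdData. The function crc_excluding_hints, the layout, and the constants are all hypothetical. The sketch bounds-checks the header and every line pointer before following it, so a corrupted page produces an error return instead of a wild memory access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified layout, NOT PostgreSQL's real page format:
 *   [0..3]  CRC            [4..5] lower (end of line-pointer array)
 *   [8..lower) line pointers, 4 bytes each: uint16 offset, uint16 length
 *   the first byte of each tuple carries hint bits in HINT_MASK. */
#define BLCKSZ    8192  /* PostgreSQL's default block size */
#define HDR_SIZE  8
#define LP_SIZE   4
#define HINT_MASK 0x0Fu

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320), standing in for COMP_CRC32. */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read a little-endian uint16 (byte order is an assumption of this sketch). */
static uint16_t get_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* CRC over bytes 4.. with hint bits cleared. Returns -1, without touching
 * memory outside the page, if the header or any line pointer is implausible. */
int crc_excluding_hints(const unsigned char *page, uint32_t *crc_out)
{
    unsigned char scratch[BLCKSZ];
    assert(crc_out != NULL);

    uint16_t lower = get_u16(page + 4);
    /* Sanity-check the header before trusting it. */
    if (lower < HDR_SIZE || lower > BLCKSZ || (lower - HDR_SIZE) % LP_SIZE != 0)
        return -1;

    memcpy(scratch, page, BLCKSZ);
    for (unsigned pos = HDR_SIZE; pos < lower; pos += LP_SIZE) {
        uint16_t off = get_u16(page + pos);
        uint16_t len = get_u16(page + pos + 2);
        if (len == 0)
            continue;  /* unused slot */
        /* The tuple must lie between the line-pointer array and the page end. */
        if (off < lower || (uint32_t)off + len > BLCKSZ)
            return -1;
        scratch[off] &= (unsigned char)~HINT_MASK;  /* ignore hint bits */
    }
    *crc_out = crc32_buf(scratch + 4, BLCKSZ - 4);
    return 0;
}
```

The checks are only strong enough to make the CRC pass memory-safe; they do not prove the page is sane, which is part of why the whole arrangement looks fragile.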

> Double-buffering allows you to simply checksum the whole page, so
> creating a COMP_CRC32_WITH_COPY() macro would do it. Just allocate a
> block on the stack, copy/checksum it there, do the write() syscall and
> forget it.
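A minimal sketch of the double-buffering idea follows. A plain bitwise CRC-32 stands in for PostgreSQL's COMP_CRC32 macros. The one-pass copy-and-checksum loop is only a guess at what a COMP_CRC32_WITH_COPY() macro might do. seal_page, write_page_with_crc, and the CRC-in-the-first-four-bytes layout are hypothetical. What the sketch shows is that the CRC is computed over a private stack copy. A hint bit set in the shared buffer after the copy therefore cannot make the written page disagree with its own checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ    8192  /* PostgreSQL's default block size */
#define CRC_BYTES 4     /* hypothetical: CRC kept in the first 4 bytes */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    return crc32_update(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}

/* Copy and checksum in one pass: each source byte is read once, so the
 * CRC describes exactly the bytes that end up in dst. */
static uint32_t crc32_with_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = src[i];
        dst[i] = b;
        crc = crc32_update(crc, &b, 1);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Freeze a private copy of the shared page and stamp its CRC into it. */
void seal_page(unsigned char *local, const unsigned char *shared)
{
    assert(local != shared);
    uint32_t crc = crc32_with_copy(local + CRC_BYTES, shared + CRC_BYTES,
                                   BLCKSZ - CRC_BYTES);
    memcpy(local, &crc, CRC_BYTES);  /* host byte order, for simplicity */
}

/* Returns 0 on success, -1 on a failed or short write. */
int write_page_with_crc(int fd, const unsigned char *shared)
{
    unsigned char local[BLCKSZ];  /* stack buffer, discarded after write() */
    seal_page(local, shared);
    return write(fd, local, BLCKSZ) == BLCKSZ ? 0 : -1;
}

/* Check a page image (e.g. one just read from disk) against its stored CRC. */
int page_crc_ok(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, CRC_BYTES);
    return stored == crc32_buf(page + CRC_BYTES, BLCKSZ - CRC_BYTES);
}
```

The trade-off is an extra 8 KB copy per page written, in exchange for not having to stop other backends from setting hint bits while the checksum is computed.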

I think the argument is about whether we increase our vulnerability to
torn-page problems if we just add a CRC and don't do anything else to
the overall writing process.  Right now, a partial write on a
hint-bit-only update merely results in some hint bits getting lost
(as long as you discount the scenario where the disk fails to read a
partially-written sector at all; maybe we're fooling ourselves by
ignoring that?).  With a CRC added, that suddenly becomes a corrupted-page
situation, and it's not easy to tell that no real harm was done.
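The torn-page concern can be made concrete with a toy simulation. It assumes an 8 KB page reaches disk as sixteen 512-byte sectors and that the CRC sits in the first four bytes. Both are assumptions, and set_crc, torn_write, and page_crc_ok are hypothetical names. The old and new page images differ only in one hint bit. Suppose the crash leaves the new header sector on disk alongside the old sector that holds that bit. The page then fails its CRC check, even though nothing beyond a hint bit was lost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ      8192  /* PostgreSQL's default block size */
#define SECTOR_SIZE 512   /* assumed atomic write unit of the disk */
#define CRC_BYTES   4     /* hypothetical: CRC kept in the first 4 bytes */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_buf(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Stamp the page's CRC into its first CRC_BYTES bytes. */
void set_crc(unsigned char *page)
{
    uint32_t crc = crc32_buf(page + CRC_BYTES, BLCKSZ - CRC_BYTES);
    memcpy(page, &crc, CRC_BYTES);
}

int page_crc_ok(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, CRC_BYTES);
    return stored == crc32_buf(page + CRC_BYTES, BLCKSZ - CRC_BYTES);
}

/* Simulate a crash mid-write: the first sectors_written sectors hold the
 * new image, the remaining sectors still hold the old one. */
void torn_write(unsigned char *disk, const unsigned char *old_page,
                const unsigned char *new_page, int sectors_written)
{
    assert(sectors_written >= 0 && sectors_written <= BLCKSZ / SECTOR_SIZE);
    size_t split = (size_t)sectors_written * SECTOR_SIZE;
    memcpy(disk, new_page, split);
    memcpy(disk + split, old_page + split, BLCKSZ - split);
}
```

Without the CRC, the same torn write would just leave the old hint bit in place, which is the harmless outcome described above. With it, the checker cannot distinguish this case from genuine corruption.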

Again, the real bottom line here is whether there will be a *net*
gain in reliability.  If a CRC adds too many false-positive
reports of bad data, it's not going to be a win.

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)