On 12.04.2013 22:31, Bruce Momjian wrote:
On Fri, Apr 12, 2013 at 09:28:42PM +0200, Andres Freund wrote:
Only point worth discussing is that this change would make backup blocks be
covered by a 16-bit checksum, not the CRC-32 it is now. i.e. the record
header is covered by a CRC32 but the backup blocks only by 16-bit.

That means we will have to do the verification for this in
ValidXLogRecord() *not* in RestoreBkpBlock or somesuch. Otherwise we
won't always recognize the end of WAL correctly.
And I am a bit wary of reducing the likelihood of noticing the proper
end-of-recovery by reducing the crc width.
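
A minimal sketch of why the check has to sit in the record-validation path: replay stops at the first record whose checksum does not match, and that failure is what marks the end of WAL. The record layout (length, CRC, payload), the function names, and the use of zlib.crc32 are all assumptions made for illustration; none of this is PostgreSQL's actual format or code.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")

def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record with a valid checksum to the log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log: bytes) -> list:
    """Return payloads up to the first record that fails validation."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        # A short read or a checksum mismatch is treated as end of log.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        records.append(bytes(payload))
        pos = start + length
    return records

log = bytearray()
append_record(log, b"insert 1")
append_record(log, b"insert 2")
# Simulate leftover bytes after the last valid record: the stored
# checksum is deliberately wrong, so validation must reject it.
log += HEADER.pack(7, zlib.crc32(b"garbage") ^ 1) + b"garbage"

print(replay(bytes(log)))
```

If the checksum were verified only later, when the block is restored, the loop above would already have accepted the stale record as valid.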

Why again are we doing this now? Just to reduce the overhead of CRC
computation for full page writes? Or are we foreseeing issues with the
page checksums being wrong because of non-zero data in the hole being
zero after the restore from bkp blocks?

I thought the idea is that we were going to re-use the already-computed
CRC checksum on the page, and we only have 16 bits of storage for that.

No, the patch has to compute the 16-bit checksum for the page when the full-page image is added to the WAL record. There would otherwise be no need to calculate the page checksum at that point, but only later when the page is written out from the shared buffer cache.

I think this is a bad idea. It complicates the WAL format significantly. Simon's patch didn't include the changes to recovery to validate the checksum, but I suspect it would be complicated. And it reduces the error-detection capability of WAL recovery. Keep in mind that page checksums are never expected to fail, so even if we miss a few errors it's still better than nothing. The WAL checksum, by contrast, is used to detect end-of-WAL: a failure is expected every time we do crash recovery. Thus far, we've considered a probability of one in 2^32 small enough for that purpose, but IMHO one in 2^16 is much too weak.
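
As a rough worked example of the two widths, assuming an idealised checksum and uniformly random garbage after the true end of WAL (real stale data is not uniformly random, so treat the numbers as illustrative only; the function name is made up for this sketch):

```python
def false_accept_probability(checksum_bits: int) -> float:
    """Chance that random garbage happens to match an n-bit checksum."""
    return 1.0 / (1 << checksum_bits)

p16 = false_accept_probability(16)
p32 = false_accept_probability(32)

for bits, p in ((16, p16), (32, p32)):
    print(f"{bits}-bit checksum: 1 in {1 << bits} ({p:.3e})")

# How many times more likely a false match is with the narrower checksum.
print(f"ratio: {p16 / p32:.0f}")
```

Under those assumptions, a bogus record slips through 65536 times more often with 16 bits than with 32.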

If you want to speed up the CRC calculation of full-page images, you could have an optimized version of the WAL CRC algorithm, using e.g. SIMD instructions. Because typical WAL records are small, at most 100-200 bytes, and consist of several even smaller chunks, the normal WAL CRC calculation is quite resistant to common optimization techniques. But it might work for the full-page images. Let's not conflate it with the page checksums, though.
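
To show the difference in shape between the two cases, here is a sketch that uses zlib.crc32 as a stand-in for the WAL CRC (it is not the same routine). The chunk contents and the helper name are invented; 8192 bytes is the default block size.

```python
import zlib

def crc_of_chunks(chunks) -> int:
    """Accumulate one running CRC over a sequence of separate buffers."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc

# A small record: a header plus a couple of short data chunks.
small_record = [b"hdr:" + bytes(24), b"rdata-1" * 4, b"rdata-2" * 8]

# A full-page image: one contiguous 8192-byte buffer.
full_page = bytes(range(256)) * 32

# Feeding the chunks one by one gives the same CRC as one pass over
# the concatenation; only the call pattern differs.
assert crc_of_chunks(small_record) == zlib.crc32(b"".join(small_record))

print(sum(len(c) for c in small_record), "bytes in", len(small_record), "chunks")
print(len(full_page), "bytes in 1 chunk")
```

The small record is processed as a handful of short calls, which leaves little room for a bulk-optimized inner loop, while the page image is a single long buffer where such a loop could plausibly pay off.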

- Heikki

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)