Aidan Van Dyk <[EMAIL PROTECTED]> writes:

> * Gregory Stark <[EMAIL PROTECTED]> [081001 11:59]:
>
>> If setting a hint bit cleared a flag on the buffer header then the
>> checksumming process could set that flag, begin checksumming, and check
>> that the flag is still set when it's finished.
>>
>> Actually, I suppose that wouldn't be good enough. It would have to do the
>> i/o and check that the checksum was still valid after the i/o. If not, it
>> would have to recalculate the checksum and repeat the i/o. That might make
>> the idea a loser, since I think the only way it wins is if hint bits are
>> rarely set during the i/o anyway.
>
> A doubled write is essentially "free" with PostgreSQL because it's not
> doing direct IO, rather relying on the OS page cache to be efficient.
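For what it's worth, here's a minimal sketch of the flag idea quoted above. The names (`BM_CHECKSUM_STABLE`, the toy checksum, the struct layout) are all invented for illustration and are not PostgreSQL's actual buffer-header code; the point is just the protocol of clearing a flag on every hint-bit update and re-checking it after the checksum pass:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192
#define BM_CHECKSUM_STABLE 0x01   /* invented flag bit, for illustration */

typedef struct
{
    uint32_t flags;
    char     data[BLCKSZ];
} Buffer;

/* Called whenever a backend sets a hint bit on the page: the flag is
 * cleared so an in-flight checksum pass knows its result may be stale. */
static void
set_hint_bit(Buffer *buf, int byte, int bit)
{
    buf->data[byte] |= (char) (1 << bit);
    buf->flags &= ~BM_CHECKSUM_STABLE;
}

/* Toy additive checksum standing in for a real CRC. */
static uint32_t
checksum_page(const char *data)
{
    uint32_t sum = 0;
    for (int i = 0; i < BLCKSZ; i++)
        sum += (unsigned char) data[i];
    return sum;
}

/* Set the flag, checksum, and redo the pass if a hint-bit update
 * cleared the flag while we were reading the page. */
static uint32_t
checksum_until_stable(Buffer *buf)
{
    uint32_t sum;
    do
    {
        buf->flags |= BM_CHECKSUM_STABLE;
        sum = checksum_page(buf->data);
    } while (!(buf->flags & BM_CHECKSUM_STABLE));
    return sum;
}
```

As the quoted text notes, in the real thing the re-check would have to happen after the i/o as well, not just after the checksum pass.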
All things are relative. What we're talking about here is all cpu and
memory-bandwidth cost anyway, so yes, it'll be cheap compared to the disk
i/o, but it'll still double the memory bandwidth and cpu cost of these
routines. That said, you would only have to do it in cases where the hint
bits actually get twiddled, which might not happen often.

> But the problem is if something crashes (or interrupts PG) between those
> two writes, you've got a block of data into the pagecache (and possibly
> to the disks) that PG will no longer read in, because the CRC/checksum
> fails despite the actual content being valid...

I don't think this is a problem, because we're still doing WAL logging.
The i/o isn't allowed to happen until the page has been WAL logged and
fsynced anyway.

Incidentally, I think the JUST_DIRTIED bit might actually be sufficient
here. Hint bits already cause the buffer to be marked dirty. So the only
case where I see a real problem is when we're writing a block as part of
a checkpoint and find it's JUST_DIRTIED after writing it. In that case we
would have to start over and write it again, rather than leave it marked
dirty.

If we're writing the block as part of normal i/o then we could just
decide to leave the possibly-bogus checksum in the table, since it'll be
overwritten by a full-page write anyway. It'll be overwritten in normal
use when the newly dirtied buffer is eventually written out again.

If you're not doing full-page writes, though, then you would have to
restore from backup in cases where the page might previously have been
perfectly valid. That's unfortunate. In theory it hasn't actually changed
the risks of running without full-page writes, but it has certainly
increased the likelihood of having to deal with "corruption" in the form
of a gratuitously invalid checksum. (Of course, without checksums you
never actually know whether you have corruption -- including real
corruption.)
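To make the checkpoint case concrete, here's a rough sketch of the rewrite loop I have in mind. This is not the actual bufmgr code -- `flush_buffer`, the struct, and the race simulation are stand-ins -- it only shows the control flow: the write path clears JUST_DIRTIED up front, a concurrent hint-bit setter can re-set it, and the checkpoint path rewrites until the flag stays clear:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BM_JUST_DIRTIED 0x01    /* illustrative bit value, not PG's */

typedef struct
{
    uint32_t flags;
    int      writes;            /* instrumentation for the sketch */
} BufferDesc;

/* Stand-in for the actual flush: JUST_DIRTIED is cleared before the
 * write so a concurrent hint-bit update during the i/o re-sets it.
 * hint_set_during_io simulates that concurrent update. */
static void
flush_buffer(BufferDesc *buf, bool hint_set_during_io)
{
    buf->flags &= ~BM_JUST_DIRTIED;
    buf->writes++;              /* pretend we wrote the page here */
    if (hint_set_during_io)
        buf->flags |= BM_JUST_DIRTIED;
}

/* Checkpoint path: rather than leaving the buffer marked dirty, keep
 * rewriting until nobody dirtied the page mid-write, so the on-disk
 * checksum matches the bytes that actually went out. */
static void
checkpoint_write(BufferDesc *buf, int races)
{
    do
    {
        flush_buffer(buf, races-- > 0);
    } while (buf->flags & BM_JUST_DIRTIED);
}
```

The normal (non-checkpoint) write path would skip the loop and tolerate the possibly-bogus checksum, per the reasoning above.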
> One possibility would be to "double-buffer" the write... i.e. as you
> calculate your CRC, you're doing it on a local copy of the block, which
> you hand to the OS to write... If you're touching the whole block of
> memory to CRC it, it isn't *ridiculously* more expensive to copy the
> memory somewhere else as you do it...

Hm. Well, that might actually work. You can do the CRC at the same time
as copying to the buffer, effectively getting the copy for the same cost
as the CRC alone.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
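A single-pass sketch of that combined copy-and-CRC, using a plain table-driven CRC-32 (IEEE polynomial) purely for illustration -- not the CRC PostgreSQL would necessarily use:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

static uint32_t crc_table[256];

/* Build the reflected CRC-32 (poly 0xEDB88320) lookup table once. */
static void
crc32_init(void)
{
    for (uint32_t i = 0; i < 256; i++)
    {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[i] = c;
    }
}

/* Copy src into dst and return its CRC-32 in a single pass over the
 * data, so the copy rides along with the memory traversal the CRC
 * needs anyway. */
static uint32_t
copy_and_crc(char *dst, const char *src, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        dst[i] = src[i];
        crc = crc_table[(crc ^ (unsigned char) src[i]) & 0xFF] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The private copy in `dst` is what gets handed to the OS to write, so later hint-bit twiddling on the shared buffer can no longer invalidate the checksum that was written.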