On 10/18/2016 07:01 AM, Pavan Deolasee wrote:
On Mon, Oct 17, 2016 at 4:34 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:

visibilitymap_truncate is actually also wrong, in a different way. The
truncation WAL record is written only after the VM (and FSM) are truncated.
But visibilitymap_truncate() has already modified and dirtied the page. If
the VM page change is flushed to disk before the WAL record, and you crash,
you might have a torn VM page and a checksum failure.

Right, I missed the problem with checksums.

Simply replacing the MarkBufferDirtyHint() call with MarkBufferDirty() in
FreeSpaceMapTruncateRel would have the same issue. If you call
MarkBufferDirty(), you must WAL-log the change, and also set the page's LSN
to make sure the WAL record is flushed first.

I'm not sure I fully understand this. If the page is flushed before the WAL
record, we might have a truncated FSM where as the relation truncation is
not replayed. But that's not the same problem, right? I mean, you might
have an FSM which is not accurate, but it won't at the least return the
blocks outside the range. Having said that, I agree your proposed changes
are more robust.

Actually, this is still not 100% safe. Flushing the WAL before modifying the FSM page is not enough. We also need to WAL-log a full-page image of the FSM page, otherwise we are still vulnerable to the torn page problem.

I came up with the attached. This is fortunately much simpler than my previous attempt. I replaced the MarkBufferDirtyHint() calls with MarkBufferDirty(), to fix the original issue, plus WAL-logging a full-page image to fix the torn page issue.

BTW any thoughts on race-condition on the primary? Comments at
MarkBufferDirtyHint() seems to suggest that a race condition is possible
which might leave the buffer without the DIRTY flag, but I'm not sure if
that can only happen when the page is locked in shared mode.

I think the race condition can only happen when the page is locked in shared mode. In any case, with this proposed fix, we'll use MarkBufferDirty() rather than MarkBufferDirtyHint(), so it's moot.

- Heikki

Attachment: truncation-vm-fsm-2.patch
Description: application/download

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to