> Nicolas Barbier wrote:
> 2011/12/30 Ants Aasma:
>> Kevin Grittner wrote:
>>> positives. To get this right for a checksum in the page header,
>>> double-write would need to be used for all cases where
>>> full_page_writes now are used (i.e., the first write of a page
>>> after a checkpoint), and for all unlogged writes (e.g.,
>>> hint-bit-only writes). There would be no correctness problem for
>>> always using double-write, but it would be unnecessary overhead
>>> for other page writes, which I think we can avoid.
>> Unless I'm missing something, double-writes are needed for all
>> writes, not only the first page after a checkpoint. Consider this
>> sequence of events:
>> 1. Checkpoint
>> 2. Double-write of page A (DW buffer write, sync, heap write)
>> 3. Sync of heap, releasing DW buffer for new writes.
>> ... some time goes by
>> 4. Regular write of page A
>> 5. OS writes one part of page A
>> 6. Crash!
>> Now recovery comes along, page A is broken in the heap with no
>> double-write buffer backup nor anything to recover it by in the
>> WAL.
> I guess the assumption is that the write in (4) is either backed by
> the WAL, or made safe by double writing. ISTM that such reasoning
> is only correct if the change that is expressed by the WAL record
> can be applied in the context of inconsistent (i.e., partially
> written) pages, which I assume is not the case (excuse my ignorance
> regarding such basic facts).
> So I think you are right.
Hmm.  It appears that I didn't think that through all the way.  I see
two alternatives.
(1)  We don't eliminate full_page_writes and we only need to use
double-writes for unlogged writes.
(2)  We double-write all writes and on recovery we only apply WAL to
a page from pd_lsn onward.  We would start from the same point and
follow the same rules except that when we read a page and find a
pd_lsn past the location of the record we are applying, we do nothing
because we are 100% sure everything to that point is safely written
and not torn.  full_page_writes to WAL would not be needed.
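
A minimal sketch of the recovery rule in alternative (2), in Python rather than PostgreSQL's C. The Page and WalRecord classes and the redo() function are hypothetical names made up for illustration, and each record carries a single LSN, which simplifies how real WAL positions work. It only shows the "skip anything the page already reflects" check; it assumes the page it reads is not torn, which is exactly what double-writing every write is meant to guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical stand-in for a heap page: pd_lsn is the LSN of the
    # last WAL record whose change is already reflected in the page.
    pd_lsn: int = 0
    data: dict = field(default_factory=dict)


@dataclass
class WalRecord:
    # Simplified WAL record: one LSN and one key/value change.
    lsn: int
    key: str
    value: str


def redo(page, records):
    """Replay records in LSN order; return the LSNs actually applied."""
    applied = []
    for rec in sorted(records, key=lambda r: r.lsn):
        if page.pd_lsn >= rec.lsn:
            # The page is at or past this record, so the change is
            # already on disk and (by assumption) not torn: do nothing.
            continue
        page.data[rec.key] = rec.value
        page.pd_lsn = rec.lsn
        applied.append(rec.lsn)
    return applied


# Page was last written after the record at LSN 20 had been applied.
page = Page(pd_lsn=20, data={"a": "2"})
wal = [WalRecord(10, "a", "1"), WalRecord(20, "a", "2"), WalRecord(30, "b", "3")]
applied = redo(page, wal)
```

Here only the record at LSN 30 is replayed; the two earlier records are skipped because the page's pd_lsn shows they are already included.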

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)