Greg Stark <st...@mit.edu> writes:
> A single WAL record can be over 24kB.

<pedantic>
Actually, WAL records can run to megabytes.  Consider for example a
commit record for a transaction that dropped thousands of tables ---
there'll be info about each such table in the commit record, to cue
replay to remove those files.
</pedantic>
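
(For concreteness, here's a toy C sketch of why such a record scales
with the number of drops.  The struct names are made up for
illustration and only loosely modeled on the commit-record idea; this
is not the actual xl_xact_commit layout.)

    /* Toy sketch (not the real commit record): a WAL record whose
     * size grows with the number of relations the transaction dropped. */
    #include <stdint.h>

    typedef struct
    {
        uint32_t    db_oid;         /* database containing the relation */
        uint32_t    rel_filenode;   /* on-disk file to remove at replay */
    } DroppedRel;

    typedef struct
    {
        int64_t     commit_time;    /* when the transaction committed */
        int32_t     nrels;          /* how many relations were dropped */
        DroppedRel  rels[];         /* one entry per dropped relation */
    } ToyCommitRecord;

    /* Record size = header + nrels * sizeof(DroppedRel).  Dropping
     * thousands of tables puts thousands of entries in this single
     * record, so it can easily run to megabytes. */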

> If you replayed the following record but not this record you would
> have an inconsistent database. ...
> Or it could be an index insert for that tuple, which would result in a
> physically inconsistent database with index pointers that point to
> incorrect tuples. Index scans would return tuples that didn't match
> the index or would miss tuples that should be returned.

Skipping actions such as index page splits would lead to even more fun.

Even in simple cases such as successive inserts and deletions in the
same heap page, failing to replay some of the actions is going to be
disastrous.  The *best case* scenario for that is that WAL replay
PANICs when it notices that the action it's trying to replay is
inconsistent with the current state of the page, eg it's trying to
insert at a TID that already exists.
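
To make that best case concrete, here's a toy C sketch of a redo
check of that shape.  The names and the page representation are
invented for illustration; this is not PostgreSQL's actual heap redo
code:

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_OFFSETS 256

    /* Toy stand-in for a heap page: which line-pointer slots are used. */
    typedef struct
    {
        bool in_use[MAX_OFFSETS];
    } ToyPage;

    static void
    redo_insert(ToyPage *page, int offnum)
    {
        /* Best case: replay notices the record contradicts the page
         * (the slot it wants is already occupied) and PANICs rather
         * than silently producing a corrupt page. */
        if (page->in_use[offnum])
        {
            fprintf(stderr, "PANIC: redo found offset %d already in use\n",
                    offnum);
            abort();
        }
        page->in_use[offnum] = true;
    }

    int
    main(void)
    {
        ToyPage page = {0};
        redo_insert(&page, 5);  /* normal replay: fine */
        redo_insert(&page, 5);  /* skipped-record scenario: PANICs */
        return 0;
    }

And that's the *best* case; if the action happens to be compatible
with the page's current state, replay proceeds and the corruption goes
unnoticed.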

IMO we can't proceed past a broken WAL record.  The actually useful
suggestion upthread was that we try to notice whether there seem
to be valid WAL records past the broken one, so that we could warn
the DBA that some commits might have been lost.  I don't think we
can do much in the way of automatic data recovery, but we could give
the DBA a chance to do forensics rather than blindly starting up (and
promptly overwriting all the evidence).
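
Roughly, the forward scan could look like the toy C sketch below.
The header layout and the checksum are simplified stand-ins, not the
real xlog reader:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct
    {
        uint32_t    tot_len;    /* total record length  */
        uint32_t    xid;        /* transaction id       */
        uint32_t    crc;        /* checksum of the body */
    } SketchRecHdr;

    static uint32_t
    toy_checksum(const uint8_t *buf, size_t len)
    {
        /* Trivial stand-in for the real WAL CRC. */
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum = sum * 31 + buf[i];
        return sum;
    }

    /* Return true if anything past 'broken_at' looks like a valid
     * record, so the DBA can be warned that commits may have been
     * lost before recovery overwrites the evidence. */
    bool
    valid_records_after(const uint8_t *wal, size_t wal_len, size_t broken_at)
    {
        for (size_t pos = broken_at;
             pos + sizeof(SketchRecHdr) <= wal_len;
             pos++)
        {
            SketchRecHdr hdr;

            memcpy(&hdr, wal + pos, sizeof(hdr));
            if (hdr.tot_len < sizeof(hdr) ||
                pos + hdr.tot_len > wal_len)
                continue;       /* implausible length: keep scanning */
            if (toy_checksum(wal + pos + sizeof(hdr),
                             hdr.tot_len - sizeof(hdr)) == hdr.crc)
                return true;    /* header and checksum line up */
        }
        return false;
    }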

                        regards, tom lane

