On 4/5/13 6:39 PM, Jeff Davis wrote:
On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:
Maybe we could scan forward to check whether a corrupted WAL record is
followed by one or more valid ones with sensible LSNs. If it is,
chances are high that we haven't actually hit the end of the WAL. In
that case, we could either log a warning, or (better, probably) abort
crash recovery.

+1.

Corruption of fields which we require to scan past the record would
cause false negatives, i.e. no trigger an error even though we do
abort recovery mid-way through. There's a risk of false positives too,
but they require quite specific orderings of writes and thus seem
rather unlikely. (AFAICS, the OS would have to write some parts of
record N followed by the whole of record N+1 and then crash to cause a
false positive).

Does the xlp_pageaddr help solve this?

Also, we'd need to be a little careful when written-but-not-flushed WAL
data makes it to disk, which could cause a false positive and may be a
fairly common case.

Apologies if this is a stupid question, but is this mostly an issue due to torn 
pages? IOW, if we had a way to ensure we never see torn pages, would that mean 
an invalid CRC on a WAL page indicated there really was corruption on that page?

Maybe it's worth putting (yet more) thought into the torn page issue... :/
--
Jim C. Nasby, Data Architect                       j...@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to