On Apr 4, 2013, at 23:21, Jeff Janes <jeff.ja...@gmail.com> wrote:
> This brings up a pretty frightening possibility to me, unrelated to data 
> checksums.  If a bit gets twiddled in the WAL file due to a hardware issue or 
> a "cosmic ray", and then a crash happens, automatic recovery will stop early 
> with the failed WAL checksum with an innocuous looking message.  The system 
> will start up but will be invisibly inconsistent, and will proceed to 
> overwrite that portion of the WAL file which contains the old data (real data 
> that would have been necessary to reconstruct, once the corruption is finally 
> realized) with an end-of-recovery checkpoint record and continue to chew up 
> real data from there.

Maybe we could scan forward to check whether a corrupted WAL record is followed 
by one or more valid ones with sensible LSNs. If it is, chances are high that 
we haven't actually hit the end of the WAL. In that case, we could either log a 
warning, or (better, probably) abort crash recovery. The user would then need 
to either restore the broken WAL segment from backup, or override the check by 
e.g. setting recovery_target_record="invalid_record". (The default would be 
recovery_target_record="last_record". The name of the GUC tries to be 
consistent with existing recovery.conf settings, even though it affects crash 
recovery, not archive recovery.)

Corruption of the fields we need in order to scan past the record (e.g. its 
length) would cause false negatives, i.e. we would not raise an error even 
though recovery does stop mid-way through the WAL. There's a risk of false 
positives too, but they require quite specific orderings of writes and thus 
seem rather unlikely. (AFAICS, the OS would have to write some parts of record 
N followed by the whole of record N+1 and then crash to cause a false positive.)
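
To make the idea concrete, here is a rough sketch of the forward-scan check and 
its two failure modes. It is only an illustration under stated assumptions: the 
record layout (length, LSN, CRC32), the helper names and the verdict strings are 
all made up for this example and are not PostgreSQL's actual XLogRecord format 
or recovery code.

```python
import struct
import zlib

# Toy record layout (an assumption, NOT PostgreSQL's XLogRecord):
# total length (uint32), LSN (uint64), CRC32 over length + LSN + payload.
HEADER = struct.Struct("<IQI")


def make_record(lsn, payload):
    """Build one toy WAL record with a valid checksum."""
    total = HEADER.size + len(payload)
    crc = zlib.crc32(struct.pack("<IQ", total, lsn) + payload)
    return HEADER.pack(total, lsn, crc) + payload


def read_record(buf, pos):
    """Return (total_len, lsn, crc_ok), or None if the header is implausible."""
    if pos + HEADER.size > len(buf):
        return None
    total, lsn, crc = HEADER.unpack_from(buf, pos)
    if total < HEADER.size or pos + total > len(buf):
        return None  # cannot even tell where the next record would start
    payload = bytes(buf[pos + HEADER.size:pos + total])
    ok = zlib.crc32(struct.pack("<IQ", total, lsn) + payload) == crc
    return total, lsn, ok


def classify_end_of_wal(buf):
    """Replay valid records, then decide why replay stopped.

    Returns (verdict, last_valid_lsn), where verdict is "end_of_wal" or
    "corruption_suspected" (hypothetical labels for this sketch).
    """
    pos, last_lsn = 0, -1
    while True:
        rec = read_record(buf, pos)
        if rec is None:
            return "end_of_wal", last_lsn
        total, lsn, ok = rec
        if not ok:
            break  # checksum failure: today this is treated as end of WAL
        if lsn <= last_lsn:
            return "end_of_wal", last_lsn  # stale record, LSN goes backwards
        last_lsn = lsn
        pos += total
    # Scan forward: is the bad record followed by a valid one with a
    # sensible (larger) LSN? If so, we probably did not reach the real end.
    nxt = read_record(buf, pos + total)
    if nxt is not None and nxt[2] and nxt[1] > last_lsn:
        return "corruption_suspected", last_lsn
    return "end_of_wal", last_lsn


r1, r2, r3 = (make_record(lsn, f"payload-{lsn}".encode()) for lsn in (100, 200, 300))
tail = bytes(32)  # zeroed space after the last record

intact = r1 + r2 + r3 + tail

# Bit flip in the payload of record 2: caught by the forward scan.
bit_flip = bytearray(intact)
bit_flip[len(r1) + HEADER.size] ^= 0x01

# Bit flip in the length field of record 2: we cannot scan past it,
# so the corruption goes unnoticed (false negative).
bad_length = bytearray(intact)
bad_length[len(r1) + 3] ^= 0x80

# Record 2 only partly written, record 3 fully written, then a crash:
# a genuine end of WAL that the check flags anyway (false positive).
torn_write = r1 + r2[:-2] + bytes(2) + r3 + tail

for name, wal in [("intact", intact), ("bit_flip", bit_flip),
                  ("bad_length", bad_length), ("torn_write", torn_write)]:
    print(name, classify_end_of_wal(wal))
```

In this toy model the check only looks at the single record following the bad 
one; requiring several valid successors would make false positives less likely 
still, at the cost of a longer scan.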

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
