Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

Amit Kapila Tue, 19 Jun 2012 21:42:49 -0700

>>  I've got a problem with the assumption that, when pg_control is trash,
>>  megabytes or gigabytes of WAL can still be relied on completely.
>>
>>  I'm almost inclined to suggest that we not get next-LSN from WAL, but
>>  by scanning all the pages in the main data store and computing the max
>>  observed LSN.  This is clearly not very attractive from a performance
>>  standpoint, but it would avoid the obvious failure mode where you lost
>>  some recent WAL segments along with pg_control.


> I think it could be useful to have a tool that scans all the blocks
> and computes that value, but I'd want it to just print the value out
> and let me decide what to do about it.  There are cases where you
> don't necessarily want to clobber pg_control, but you do have future
> LSNs in your data file pages.  This can be either because the disk ate
> your WAL, or because you didn't create recovery.conf, or because your
> disk corrupted the LSNs on the data file pages.  I'd want a tool that
> could be either run on an individual file, or recursively on a
> directory.

The whole point is we need to find a valid next-LSN (Redo Replay location as
I understand).
If we let user decide about it, I think it can lead to inconsistent
database.

As per my understanding postgres database can come to consistent point only
if it has
both datafiles and WAL after crash. 
So I am not able to think if it lost WAL, how we can it make a consistent
database.

> If these values seem acceptable, use -f to force reset.
> [rhaas pgsql]$ pg_resetxlog -f ~/pgdata
> pg_resetxlog: pg_control exists but is broken or unknown version; ignoring
it
> Transaction log reset
> [rhaas pgsql]$ postgres
> LOG:  database system was shut down at 2012-06-19 15:25:28 EDT
> LOG:  database system is ready to accept connections
> LOG:  autovacuum launcher started

> So I still don't understand what problem we're solving here.

1. The values (like nextoid, nextxid, etc) are guessed values which can be
improved by having
   these values from last checkpoint record using WAL files. 

2. The value for next-LSN (ControlFile.checkPointCopy.redo) will be guessed
value which if
   directly used for recovery after pg_resetxlog will lead to inconsistent
database.
   So I want to improve the logic to have either appropriate value for
next-LSN or more reliable value.

In documentation, it is mentioned that starting database after using
pg_resetxlog can contain inconsistent data.
The exact wording is mentioned below in mail.

My purposal to work on this Todo item is to improve the values generated for
pg_control, so that it becomes more easy for users to recover from database
corruption scenario's.
I don't think even after working on this feature, user can recover database
for all corruption scenario's. However it can improve the situation from
now. 

Pg_resetxlog documentation related excerpts- 
"After running this command, it should be possible to start the server, but
bear in mind that the database might contain inconsistent data due to
partially-committed transactions. You should immediately dump your data, run
initdb, and reload. After reload, check for inconsistencies and repair as
needed."

   

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

Reply via email to