>> I've got a problem with the assumption that, when pg_control is trash, >> megabytes or gigabytes of WAL can still be relied on completely. >> >> I'm almost inclined to suggest that we not get next-LSN from WAL, but >> by scanning all the pages in the main data store and computing the max >> observed LSN. This is clearly not very attractive from a performance >> standpoint, but it would avoid the obvious failure mode where you lost >> some recent WAL segments along with pg_control.
> I think it could be useful to have a tool that scans all the blocks > and computes that value, but I'd want it to just print the value out > and let me decide what to do about it. There are cases where you > don't necessarily want to clobber pg_control, but you do have future > LSNs in your data file pages. This can be either because the disk ate > your WAL, or because you didn't create recovery.conf, or because your > disk corrupted the LSNs on the data file pages. I'd want a tool that > could be either run on an individual file, or recursively on a > directory. The whole point is we need to find a valid next-LSN (Redo Replay location as I understand). If we let user decide about it, I think it can lead to inconsistent database. As per my understanding postgres database can come to consistent point only if it has both datafiles and WAL after crash. So I am not able to think if it lost WAL, how we can it make a consistent database. > If these values seem acceptable, use -f to force reset. > [rhaas pgsql]$ pg_resetxlog -f ~/pgdata > pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it > Transaction log reset > [rhaas pgsql]$ postgres > LOG: database system was shut down at 2012-06-19 15:25:28 EDT > LOG: database system is ready to accept connections > LOG: autovacuum launcher started > So I still don't understand what problem we're solving here. 1. The values (like nextoid, nextxid, etc) are guessed values which can be improved by having these values from last checkpoint record using WAL files. 2. The value for next-LSN (ControlFile.checkPointCopy.redo) will be guessed value which if directly used for recovery after pg_resetxlog will lead to inconsistent database. So I want to improve the logic to have either appropriate value for next-LSN or more reliable value. In documentation, it is mentioned that starting database after using pg_resetxlog can contain inconsistent data. The exact wording is mentioned below in mail. My purposal to work on this Todo item is to improve the values generated for pg_control, so that it becomes more easy for users to recover from database corruption scenario's. I don't think even after working on this feature, user can recover database for all corruption scenario's. However it can improve the situation from now. Pg_resetxlog documentation related excerpts- "After running this command, it should be possible to start the server, but bear in mind that the database might contain inconsistent data due to partially-committed transactions. You should immediately dump your data, run initdb, and reload. After reload, check for inconsistencies and repair as needed." With Regards, Amit Kapila. -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers