Hi,

Yes, the error logs were there, but only from the point when DB1 got corrupted. Here is an excerpt from the logs:
LOG: database system was shut down at 2013-06-19 09:40:55 UTC
LOG: restored log file "00000004.history" from archive
cp: cannot stat `/data/pgsql/archivedir/00000005.history': No such file or directory
LOG: restored log file "00000004.history" from archive
LOG: entering standby mode
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/000000030000000000000013': No such file or directory
LOG: consistent recovery state reached at 0/13000080
LOG: record with zero length at 0/13000080
LOG: database system is ready to accept read only connections
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/000000030000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/00000005.history': No such file or directory
LOG: streaming replication successfully connected to primary
ERROR: cannot execute CREATE ROLE in a read-only transaction
STATEMENT: CREATE USER replicationuser REPLICATION;
LOG: invalid record length at 0/13000080
FATAL: terminating walreceiver process due to administrator command
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/00000005.history': No such file or directory
LOG: invalid record length at 0/13000080

This is what happened: when DB1 came up as the new standby, it connected to DB2 (the new master) over streaming replication, and after some time it terminated the walreceiver process. The "invalid record length" messages then kept repeating until I took a base backup on DB1 and restarted it.

Also, on DB1 (before DB1 was started in standby mode):

pg_controldata /data/pgsql/9.2/data/
pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5891091665573732008
Database cluster state:               shut down
pg_control last modified:             Wed Jun 19 09:40:55 2013
Latest checkpoint location:           0/13000020
Prior checkpoint location:            0/124F1BC0
Latest checkpoint's REDO location:    0/13000020
Latest checkpoint's TimeLineID:       3
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0/8409
Latest checkpoint's NextOID:          18470
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        669
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Wed Jun 19 09:40:54 2013
Minimum recovery ending location:     0/0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      300
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value

Would it be wrong to compare the last replayed xlog location on DB2 (the new master) with the "Latest checkpoint location" on DB1 (the old master) before starting DB1 in standby mode, and, if DB1's "Latest checkpoint location" is greater than DB2's last replayed xlog location, go for a base backup?

regards,
Prakhar.
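P.S. Roughly, the check I have in mind would look like this (untested sketch; the host name "db2" and the psql connection details are placeholders, and it relies on pg_xlog_location_diff(), which is available in 9.2):

# Run on the DB1 host, before starting DB1 in standby mode.
DB1_CKPT=$(pg_controldata /data/pgsql/9.2/data/ | awk '/Latest checkpoint location/ {print $4}')
# Ask the new master whether DB1's last checkpoint is ahead of what it has replayed.
AHEAD=$(psql -h db2 -At -c "SELECT pg_xlog_location_diff('$DB1_CKPT', pg_last_xlog_replay_location()) > 0")
if [ "$AHEAD" = "t" ]; then
    echo "DB1 is ahead of DB2 -- take a fresh base backup before starting DB1 as standby"
else
    echo "DB2 has replayed at least up to DB1's last checkpoint"
fi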
On Wed, Jun 19, 2013 at 1:11 PM, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:

> prakhar jauhari wrote:
> > I am facing a similar kind of problem, but in a two node setup. Streaming
> > replication is being used with a high wal_keep_segments, using log shipping
> > to bump up timelines on the standby server to set up SR (streaming replication).
> >
> > DB1 - master
> > DB2 - standby
> >
> > When a switchover happens DB2 becomes the new master and when DB1 comes up
> > it will act as the standby to the new master (.history files from the new
> > master are copied to DB1 to bump up its timeline so as to set up SR). DB1 is
> > not recreated from scratch. This runs fine in a normal switchover, but there
> > seems to be a problem in the following situation, leading to database corruption:
> >
> > Current state:
> > DB1 - master
> > DB2 - standby
> >
> > Now the failing scenario:
> >
> > 1. DB2 machine goes down.
> > 2. After some time DB1 machine also goes down (DB2 is still down).
> > 3. Now DB2 comes up (it will join the cluster as master as DB1 is still down).
> > 4. DB2 is started as master postgresql.
> > 5. Now DB1 comes up (it will join the cluster as standby to DB2).
> > 6. Now when DB1 attempts to SR with DB2, DB1 gets corrupted.
> >
> > Looked into the issue and found that when DB1 went down initially, it created
> > some WALs which were not synced to DB2 as it was already down.
> >
> > Now when DB2 started as master it still had not replayed the last few WALs
> > created by DB1 (when it was master). DB2 starts as master properly.
> >
> > When DB1 came up as standby to DB2, it bumped its timeline using the history
> > file from DB2, but when SR was set up with DB2, DB1 got corrupted.
> >
> > Now the question is:
> >
> > 1. Is this a theoretically valid approach?
> > 2. If it is a valid approach, then how can I detect such a scenario (where SR
> > will corrupt the DB), so that I can go for a basebackup in such a situation?
>
> If you want to use the old primary as new standby without a new backup,
> you have to ascertain that all transactions from the former have
> been replayed at the latter.
>
> To figure out where the primary currently is, you can
>   SELECT pg_current_xlog_location();
>
> To figure out how much the standby has replayed, you can
>   SELECT pg_last_xlog_replay_location();
>
> Of course this only works if both are up.
>
> I think that it would be tricky to automatize that; I'd choose
> making a new backup after each failover.
>
> In the event of a controlled failover it might be an option.
>
> I am surprised that the scenario you described leads to
> corruption; I would have expected an error message.
>
> Yours,
> Laurenz Albe
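For reference, the check Laurenz describes could be scripted roughly like this while both nodes are up (untested sketch; the host names "db1" and "db2" and the connection details are placeholders):

# Where the primary currently is:
PRIMARY_LSN=$(psql -h db1 -At -c "SELECT pg_current_xlog_location()")
# Has the standby replayed at least that far?
CAUGHT_UP=$(psql -h db2 -At -c "SELECT pg_xlog_location_diff('$PRIMARY_LSN', pg_last_xlog_replay_location()) <= 0")
if [ "$CAUGHT_UP" = "t" ]; then
    echo "standby has replayed everything the primary has written so far"
else
    echo "standby is still behind the primary -- plan on a base backup after failover"
fi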